Re: [PR] [SPARK-56903][SQL] Spread NULL outer join keys across shuffle partitions [spark]

via GitHub Mon, 18 May 2026 08:11:04 -0700


sunchao commented on code in PR #55927:
URL: https://github.com/apache/spark/pull/55927#discussion_r3259965820



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledJoin.scala:
##########
@@ -28,6 +28,21 @@ import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, Dist
 trait ShuffledJoin extends JoinCodegenSupport {
   def isSkewJoin: Boolean
 
+  private def containsNullSafeJoinMarker(keys: Seq[Expression]): Boolean = {
+    keys.exists(_.exists(_.isInstanceOf[IsNull]))
+  }
+
+  private lazy val canSpreadNullJoinKeys: Boolean = {

Review Comment:
   For most types the `coalesce` key is non-null, but 
`Literal.default(NullType)` is itself null, so it seems the extracted shuffle 
key can still contain nulls even though those rows remain matchable under `<=>`.
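
   A minimal sketch of the concern, written as a toy model in plain Scala rather than against Spark's classes. It assumes the null-safe shuffle key has the shape `coalesce(key, default(type))`. `ToyType`, `defaultValue`, `shuffleKey`, and `nullSafeEq` are hypothetical names, and `Option` stands in for a nullable SQL value:

```scala
// Toy model only, not Spark's implementation.
// None plays the role of SQL NULL; Some(v) is a non-null value.
sealed trait ToyType
case object ToyInt extends ToyType
case object ToyString extends ToyType
case object ToyNull extends ToyType // stands in for NullType

object NullKeySketch {
  // Assumption: every type has a non-null default except the null type,
  // whose only possible value is null.
  def defaultValue(t: ToyType): Option[Any] = t match {
    case ToyInt    => Some(0)
    case ToyString => Some("")
    case ToyNull   => None
  }

  // SQL-style coalesce: the first non-null argument, or null if all are null.
  def coalesce(values: Option[Any]*): Option[Any] =
    values.find(_.isDefined).flatten

  // Assumed shape of the rewritten shuffle key for a null-safe join.
  def shuffleKey(key: Option[Any], t: ToyType): Option[Any] =
    coalesce(key, defaultValue(t))

  // Null-safe equality (<=>): two nulls compare equal.
  def nullSafeEq(a: Option[Any], b: Option[Any]): Boolean = a == b

  def main(args: Array[String]): Unit = {
    // A null integer key is replaced by the non-null default.
    println(shuffleKey(None, ToyInt))
    // A null key of the null type stays null after coalesce.
    println(shuffleKey(None, ToyNull))
    // Both sides still match under null-safe equality.
    println(nullSafeEq(shuffleKey(None, ToyNull), shuffleKey(None, ToyNull)))
  }
}
```

   In this model the null-typed key comes out of `shuffleKey` as null yet still compares equal under `nullSafeEq`, which is the case a null-spreading rule would have to leave alone.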



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

