cloud-fan commented on a change in pull request #32476:
URL: https://github.com/apache/spark/pull/32476#discussion_r629891534



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##########
@@ -431,6 +433,41 @@ case class SortMergeJoinExec(
     // Copy the streamed keys as class members so they could be used in next 
function call.
     val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
 
+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row. Only call `matches.clear()` when 
`matches.isEmpty()` is
+        // false, to reduce unnecessary computation.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;
+         """.stripMargin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    // Handle the case when streamed keys less than buffered keys.
+    val handleStreamedLessThanBuffered = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"$streamedRow = null;"
+      case LeftOuter | RightOuter =>
+        // Eagerly return with streamed row.
+        "return false;"
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
     ctx.addNewFunction("findNextJoinRows",

Review comment:
       Can we add some comments to explain `findNextJoinRows`? Its "return 
value" is not just the boolean value, it can also change some states, like 
`bufferedRow = null`. We should explain how the states can be changed and what 
does it mean.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to