Github user sujithjay commented on a diff in the pull request: https://github.com/apache/spark/pull/22168#discussion_r211695230 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner( * @return true if a valid match is found, false otherwise. */ private def scanNextInBuffered(): Boolean = { - while (leftIndex < leftMatches.size) { - while (rightIndex < rightMatches.size) { - joinedRow(leftMatches(leftIndex), rightMatches(rightIndex)) - if (boundCondition(joinedRow)) { - leftMatched.set(leftIndex) - rightMatched.set(rightIndex) + val leftMatchesIterator = leftMatches.generateIterator(leftIndex) + + while (leftMatchesIterator.hasNext) { + val leftCurRow = leftMatchesIterator.next() + val rightMatchesIterator = rightMatches.generateIterator(rightIndex) --- End diff -- Hi @viirya, Thank you for reviewing the code. I agree the code will be a bit complicated if we had to keep scanning the iterators. In this particular case, I was following a pattern observed throughout the rest of the class. For eg., https://github.com/apache/spark/blob/35f7f5ce83984d8afe0b7955942baa04f2bef74f/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L307-L329 Having said that, I do feel the penalty for following a similar approach in case of full outer joins could be higher. I will try & see what I can do.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org