GitHub user gczsjdy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19862#discussion_r154563897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -674,8 +674,9 @@ private[joins] class SortMergeJoinScanner(
       private[this] val bufferedMatches =
         new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, spillThreshold)
     
    -  // Initialization (note: do _not_ want to advance streamed here).
    -  advancedBufferedToRowWithNullFreeJoinKey()
    +  // Initialization (note: do _not_ want to advance streamed here). This is made lazy to prevent
    +  // unnecessary trigger of calculation.
    +  private lazy val advancedBufferedIterRes = advancedBufferedToRowWithNullFreeJoinKey()
    --- End diff --
    
    This function should be called (to try to set `BufferedRow`) before 
`BufferedRow` is checked, and it should be called only once. That is the 
original requirement imposed by the logic. Given that we want to add this 
optimization, I think a `lazy val` is the best way to satisfy it.
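
To illustrate the point, here is a minimal sketch of the pattern, not the actual Spark code. `LazyInitScanner`, `initCalls`, `hasBufferedRow`, and the `Option[Int]` rows are hypothetical stand-ins for the scanner and its buffered iterator. The initialization runs the first time the buffered row is checked, and never more than once.

```scala
// Hypothetical stand-in for the scanner; names and types are illustrative only.
class LazyInitScanner(buffered: Iterator[Option[Int]]) {
  // Counts how many times the initialization body actually runs.
  var initCalls = 0

  private var bufferedRow: Option[Int] = None

  // Advances past rows whose key is "null" (None here) and sets bufferedRow.
  // Returns true if a row with a non-null key was found.
  private def advanceBufferedToRowWithNullFreeKey(): Boolean = {
    initCalls += 1
    bufferedRow = None
    while (bufferedRow.isEmpty && buffered.hasNext) {
      bufferedRow = buffered.next()
    }
    bufferedRow.isDefined
  }

  // Not evaluated at construction time; evaluated on first access only.
  private lazy val advancedBufferedIterRes: Boolean =
    advanceBufferedToRowWithNullFreeKey()

  // Every check of bufferedRow goes through the lazy val first, so the
  // initialization is guaranteed to have happened exactly once by then.
  def hasBufferedRow: Boolean =
    advancedBufferedIterRes && bufferedRow.isDefined
}
```

In this sketch, constructing the scanner does no work on the buffered side; the first call to `hasBufferedRow` triggers the advance, and later calls reuse the cached result.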


---

