[ https://issues.apache.org/jira/browse/SPARK-37341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-37341: ------------------------------------ Assignee: Apache Spark > Avoid unnecessary buffer and copy in full outer sort merge join > --------------------------------------------------------------- > > Key: SPARK-37341 > URL: https://issues.apache.org/jira/browse/SPARK-37341 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.3.0 > Reporter: Cheng Su > Assignee: Apache Spark > Priority: Minor > > FULL OUTER sort merge join (non-code-gen path) copies join keys and buffers > input rows, even when rows from both sides do have matched keys > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1637-L1641] > ). This is unnecessary, as we can just output the row with smaller join > keys, and only buffer when both sides have matched keys. This would save us > from unnecessary copy and buffer, when both join sides have a lot of rows not > matched with each other. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org