[ https://issues.apache.org/jira/browse/SPARK-32104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146839#comment-17146839 ]
Zuo Dao commented on SPARK-32104:
---------------------------------

Yes, same problem. [~viirya]

> Avoid full outer join OOM on skewed dataset
> -------------------------------------------
>
>                 Key: SPARK-32104
>                 URL: https://issues.apache.org/jira/browse/SPARK-32104
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1, 2.4.6, 3.0.0
>            Reporter: Zuo Dao
>            Priority: Minor
>
> SPARK-24985 changed {{SortMergeJoinExec}} to use
> {{ExternalAppendOnlyUnsafeRowArray}} in {{SortMergeFullOuterJoinScanner}}.
> Its performance is very poor, however, because matching keys requires
> repeatedly fetching the element at a given index.
> This PR adds a {{FileBasedAppendOnlyUnsafeRowArray}}.
> It can quickly locate the row for a given index because it keeps an array
> of offsets in memory.
> The number of rows buffered in memory can be limited by
> {{spark.sql.sortMergeJoinExec.buffer.in.memory.threshold}}.
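
To illustrate the offsets-array idea from the description, here is a minimal sketch in Python. It is not the code from the PR: the class name FileBackedAppendOnlyArray and its methods are hypothetical, rows are plain byte strings rather than UnsafeRow objects, and the in-memory buffering threshold is left out. It only shows how keeping one file offset per row in memory lets a lookup by index seek straight to the row instead of scanning from the start.

```python
import tempfile


class FileBackedAppendOnlyArray:
    """Hypothetical sketch: row bytes live in a temp file, offsets stay in memory."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # binary read/write temp file
        self._offsets = []  # _offsets[i] is the byte position where row i starts
        self._end = 0       # byte position just past the last row written

    def append(self, row: bytes) -> None:
        # Always write at the logical end, since get() may have moved the cursor.
        self._file.seek(self._end)
        self._file.write(row)
        self._offsets.append(self._end)
        self._end += len(row)

    def get(self, index: int) -> bytes:
        if not 0 <= index < len(self._offsets):
            raise IndexError(index)
        start = self._offsets[index]
        # The row ends where the next row starts, or at the end of the data.
        if index + 1 < len(self._offsets):
            stop = self._offsets[index + 1]
        else:
            stop = self._end
        self._file.seek(start)
        return self._file.read(stop - start)

    def __len__(self) -> int:
        return len(self._offsets)

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    rows = FileBackedAppendOnlyArray()
    for payload in (b"alpha", b"be", b"gamma-row"):
        rows.append(payload)
    # Fetch the middle row directly by index, without reading the first row.
    print(rows.get(1))
    rows.close()
```

The memory cost in this sketch is one integer per row regardless of row size, which is the trade-off the description relies on for skewed keys with many matching rows.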