Ian created SPARK-13872:
---------------------------

             Summary: Memory leak SortMergeOuterJoin
                 Key: SPARK-13872
                 URL: https://issues.apache.org/jira/browse/SPARK-13872
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1
            Reporter: Ian

SortMergeJoin composes its partition/iterator from org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to UnsafeExternalRowSorter.

UnsafeExternalRowSorter's implementation cleans up its resources when:
1. the org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully iterated, or
2. the task finishes execution.

In the outer join case of SortMergeJoin, when the left or right iterator is not fully iterated, the only occasion for the resources to be cleaned up is at the end of the Spark task.

This is probably OK most of the time. However, when a SortMergeOuterJoin is nested within a CartesianProduct, the "deferred" resource cleanup becomes a memory leak, amplified by CartesianRDD's outer loop: each outer iteration drives the join again, and the resources of every partially consumed iterator stay allocated until the task ends.
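
To illustrate the two cleanup paths and why an outer loop amplifies the problem, here is a toy model in Python. It is not Spark code: TaskContext, SortedIterator and run_nested are hypothetical names invented for this sketch, and a "buffer" is just a counter standing in for the sorter's memory. It assumes the inner iterator is re-created once per outer row, as described above.

```python
class TaskContext:
    """Toy stand-in for a task: runs registered cleanup callbacks at task end."""

    def __init__(self):
        self.live_buffers = 0      # buffers currently held by sorters
        self._callbacks = []

    def add_completion_listener(self, callback):
        self._callbacks.append(callback)

    def complete(self):
        # Cleanup path 2: the task is done, so release whatever is still held.
        for callback in self._callbacks:
            callback()
        self._callbacks.clear()


class SortedIterator:
    """Toy sorter: holds one buffer until exhausted or until the task ends."""

    def __init__(self, rows, ctx):
        self._ctx = ctx
        self._rows = iter(sorted(rows))
        self._released = False
        ctx.live_buffers += 1
        ctx.add_completion_listener(self.release)

    def release(self):
        # Idempotent, so both cleanup paths may call it safely.
        if not self._released:
            self._released = True
            self._ctx.live_buffers -= 1

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            self.release()         # Cleanup path 1: fully iterated.
            raise


def run_nested(outer_rows, inner_rows, ctx, take):
    """Re-create the inner iterator per outer row and read at most `take` rows.

    Returns the number of buffers still held when the loop ends,
    that is, before the task completes.
    """
    for _ in outer_rows:
        inner = SortedIterator(inner_rows, ctx)
        for _ in range(take):
            next(inner, None)
    return ctx.live_buffers


ctx = TaskContext()
held = run_nested(range(100), [3, 1, 2], ctx, take=1)
print(held)              # 100: one unreleased buffer per outer row
ctx.complete()
print(ctx.live_buffers)  # 0: everything is released only at task end
```

With take=4 the inner iterator is exhausted on every pass, so run_nested returns 0 and nothing accumulates. In this model the leak comes entirely from partially consumed iterators whose cleanup is deferred to task completion.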