[ 
https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian updated SPARK-13872:
------------------------
    Description: 
SortMergeJoin composes its partition/iterator from 
org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to 
UnsafeExternalRowSorter.
UnsafeExternalRowSorter's implementation cleans up the resources when:
1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully 
iterated.
2. The task finishes execution.
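
The two cleanup paths above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: ToyTask and ToySortedIterator are hypothetical names, and the `released` flag merely stands in for the memory that a sorter would free.

```python
class ToyTask:
    """Hypothetical stand-in for a task that runs callbacks on completion."""

    def __init__(self):
        self._callbacks = []

    def on_completion(self, callback):
        self._callbacks.append(callback)

    def complete(self):
        for callback in self._callbacks:
            callback()


class ToySortedIterator:
    """Hypothetical sorter-backed iterator; `released` models freed memory."""

    def __init__(self, rows, task):
        self._rows = iter(sorted(rows))
        self.released = False
        # Path 2: register cleanup to run when the task completes.
        task.on_completion(self.release)

    def release(self):
        self.released = True  # safe to call more than once

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            # Path 1: the iterator has been fully consumed.
            self.release()
            raise


task = ToyTask()
drained = ToySortedIterator([3, 1, 2], task)
partial = ToySortedIterator([3, 1, 2], task)

rows = list(drained)   # consumes every row, so path 1 fires
first = next(partial)  # stops early, so nothing is released yet
released_before = (drained.released, partial.released)

task.complete()        # path 2 releases whatever is still held
released_after = (drained.released, partial.released)
```

In this model, an iterator that is abandoned partway through holds its resources until the task completes.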

In the outer join case of SortMergeJoin, when the left or right iterator is not 
fully iterated, the only occasion for the resources to be cleaned up is at 
the end of the Spark task. This is probably OK most of the time; however, when a 
SortMergeOuterJoin is nested within a CartesianProduct, the "deferred" 
resource cleanup becomes a memory leak that is amplified by the CartesianRdd's 
outer loop iteration.
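
The amplification can be sketched with a toy model, assuming the inner side is re-evaluated (and so a fresh sorter is created) on every outer-loop iteration. This is plain Python, not Spark code: ToyTask, ToySorter and nested_partial_scan are hypothetical names, and `live_sorters` stands in for sorter memory that has not yet been freed.

```python
class ToyTask:
    """Hypothetical task that tracks live sorters and completion callbacks."""

    def __init__(self):
        self.live_sorters = 0
        self._callbacks = []

    def on_completion(self, callback):
        self._callbacks.append(callback)

    def complete(self):
        for callback in self._callbacks:
            callback()


class ToySorter:
    """Hypothetical sorter that frees itself when drained or at task end."""

    def __init__(self, rows, task):
        self._task = task
        self._rows = iter(sorted(rows))
        self._released = False
        task.live_sorters += 1
        task.on_completion(self.release)

    def release(self):
        if not self._released:
            self._released = True
            self._task.live_sorters -= 1

    def next_row(self):
        row = next(self._rows, None)
        if row is None:
            self.release()  # fully iterated, so clean up immediately
        return row


def nested_partial_scan(outer_rows, inner_rows, task):
    """Return the number of live sorters after each outer iteration."""
    live_counts = []
    for _ in outer_rows:
        sorter = ToySorter(inner_rows, task)  # fresh sorter per outer row
        sorter.next_row()                     # the join stops before draining it
        live_counts.append(task.live_sorters)
    return live_counts


task = ToyTask()
live_counts = nested_partial_scan(range(5), [3, 1, 2], task)
live_at_end = task.live_sorters
task.complete()
live_after_completion = task.live_sorters
```

In this model the number of unreleased sorters grows by one per outer row and only drops back to zero when the task completes, which is the growth pattern described above.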



  was:
SortMergeJoin composes its partition/iterator from 
org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to 
UnsafeExternalRowSorter.
UnsafeExternalRowSorter's implementation cleans up the resources when:
1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully 
iterated.
2. The task finishes execution.

In case of outer join case of SortMergeJoin, when the left or right iterator is 
not fully iterated, the only occasion for the resources to be cleaned up 
is at the end of the Spark task. This is probably OK most of the time; however, 
when a SortMergeOuterJoin is nested within a CartesianProduct, the "deferred" 
resource cleanup becomes a memory leak that is amplified by the CartesianRdd's 
outer loop iteration.




> Memory leak in SortMergeOuterJoin
> ---------------------------------
>
>                 Key: SPARK-13872
>                 URL: https://issues.apache.org/jira/browse/SPARK-13872
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Ian
>         Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin composes its partition/iterator from 
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to 
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up the resources when:
> 1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully 
> iterated.
> 2. The task finishes execution.
> In the outer join case of SortMergeJoin, when the left or right iterator is not 
> fully iterated, the only occasion for the resources to be cleaned up is 
> at the end of the Spark task. This is probably OK most of the time; however, when 
> a SortMergeOuterJoin is nested within a CartesianProduct, the "deferred" 
> resource cleanup becomes a memory leak that is amplified by the CartesianRdd's 
> outer loop iteration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
