[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951165#comment-16951165 ]
Min Shen edited comment on SPARK-21492 at 10/14/19 5:13 PM:
------------------------------------------------------------

I want to further clarify the scope of the fix in PR [https://github.com/apache/spark/pull/25888].

Based on previous work by [~taoluo], this PR further fixes the issue for SMJ codegen.

[~hvanhovell] raised two concerns on [~taoluo]'s PR [https://github.com/apache/spark/pull/23762]:
# This only works for an SMJ with Sorts as its direct input.
# Not sure if it is safe to assume that you can close an underlying child like this.

The fix in PR [https://github.com/apache/spark/pull/25888] should address concern #2, i.e. it guarantees that closing the iterator of a Sort operator early is safe.

This fix does not yet propagate the requests to close the iterators of an SMJ's two child operators down through the plan tree to the Sort operators. However, in our experience operating all Spark workloads at LinkedIn, an SMJ most commonly lacks a Sort as its direct input when multiple SMJs are stacked together. In that case, even without propagating the requests, each SMJ can still properly handle its own direct child operators, which still helps release resources early.

was (Author: mshen):
I want to further clarify the scope of the fix in PR [https://github.com/apache/spark/pull/25888].

Based on previous work by [~taoluo], this PR further fixes the issue for SMJ codegen.

[~hvanhovell] raised two concerns on [~taoluo]'s PR [https://github.com/apache/spark/pull/23762]:
# This only works for an SMJ with Sorts as its direct input.
# Not sure if it is safe to assume that you can close an underlying child like this.

The fix in PR [https://github.com/apache/spark/pull/25888] should address concern #2, i.e. it guarantees that closing the iterator of a Sort operator early is safe.

This fix does not yet propagate the requests to close the iterators of an SMJ's two child operators down through the plan tree to the Sort operators. However, in our experience operating all Spark workloads at LI, an SMJ most commonly lacks a Sort as its direct input when multiple SMJs are stacked together. In that case, even without propagating the requests, each SMJ can still properly handle its own direct child operators, which still helps release resources early.

> Memory leak in SortMergeJoin
> ----------------------------
>
>                 Key: SPARK-21492
>                 URL: https://issues.apache.org/jira/browse/SPARK-21492
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>            Reporter: Zhan Zhang
>            Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory leak
> caused by the Sort. The memory is not released until the task ends, and cannot
> be used by other operators, causing a performance drop or OOM.
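
The leak described in the issue, and the early close discussed in the comment, can be illustrated with a minimal sketch. It is not Spark code and not the code in either PR: SortedInput, mergeJoin, and closeEarly are hypothetical names, the sorter's memory is simulated by an int array, and join keys are assumed to be unique on each side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EarlyCloseJoinSketch {

    /** Hypothetical stand-in for a Sort operator's output: holds a sorted buffer until released. */
    static final class SortedInput {
        private int[] buffer; // simulates memory held by the sorter
        private int pos = 0;

        SortedInput(int... keys) {
            buffer = keys.clone();
            Arrays.sort(buffer);
        }

        boolean hasNext() {
            if (buffer != null && pos >= buffer.length) {
                close(); // natural release: only happens once every row has been read
            }
            return buffer != null;
        }

        int peek() { return buffer[pos]; }

        void advance() { pos++; }

        /** Early release; idempotent, so calling it more than once is harmless. */
        void close() { buffer = null; }

        boolean holdsMemory() { return buffer != null; }
    }

    /** Inner merge join on unique keys; stops as soon as either side runs out. */
    static List<Integer> mergeJoin(SortedInput left, SortedInput right, boolean closeEarly) {
        List<Integer> matches = new ArrayList<>();
        while (left.hasNext() && right.hasNext()) {
            int l = left.peek();
            int r = right.peek();
            if (l == r) {
                matches.add(l);
                left.advance();
                right.advance();
            } else if (l < r) {
                left.advance();
            } else {
                right.advance();
            }
        }
        if (closeEarly) {
            // The join will read no more rows, so it releases both direct children now.
            left.close();
            right.close();
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedInput leakyRight = new SortedInput(1, 2, 3, 4, 5, 6);
        mergeJoin(new SortedInput(1, 2), leakyRight, false);
        System.out.println("without early close, right holds memory: " + leakyRight.holdsMemory());

        SortedInput closedRight = new SortedInput(1, 2, 3, 4, 5, 6);
        mergeJoin(new SortedInput(1, 2), closedRight, true);
        System.out.println("with early close, right holds memory: " + closedRight.holdsMemory());
    }
}
```

In the first call the left side runs out after two rows, so the join stops while the right side is only partly read and its buffer stays allocated. In the second call the join closes both of its direct inputs once it finishes, which is the local handling the comment refers to; propagating the close through intermediate operators is outside this sketch.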