[ 
https://issues.apache.org/jira/browse/SPARK-22713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-22713.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 21369
[https://github.com/apache/spark/pull/21369]

> OOM caused by the memory contention and memory leak in TaskMemoryManager
> ------------------------------------------------------------------------
>
>                 Key: SPARK-22713
>                 URL: https://issues.apache.org/jira/browse/SPARK-22713
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.1.1, 2.1.2
>            Reporter: Lijie Xu
>            Assignee: Eyal Farago
>            Priority: Critical
>             Fix For: 2.4.0
>
>
> The pdf version of this issue with high-quality figures is available at 
> https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/report/OOM-TaskMemoryManager.pdf.
> *[Abstract]* 
> I recently encountered an OOM error in a PageRank application 
> (_org.apache.spark.examples.SparkPageRank_). After profiling the application, 
> I found the OOM error is related to the memory contention in shuffle spill 
> phase. Here, the memory contention means that a task tries to release some 
> old memory consumers from memory for keeping the new memory consumers. After 
> analyzing the OOM heap dump, I found the root cause is a memory leak in 
> _TaskMemoryManager_. Since memory contention is common in shuffle phase, this 
> is a critical bug/defect. In the following sections, I will use the 
> application dataflow, execution log, heap dump, and source code to identify 
> the root cause.
> *[Application]* 
> This is a PageRank application from Spark’s example library. The following 
> figure shows the application dataflow. The source code is available at \[1\].
> !https://raw.githubusercontent.com/JerryLead/Misc/master/OOM-TasksMemoryManager/figures/PageRankDataflow.png|width=100%!
> *[Failure symptoms]*
> This application has a map stage and many iterative reduce stages. An OOM 
> error occurs in a reduce task (Task-28) as follows.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/Stage.png?raw=true|width=100%!
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/task.png?raw=true|width=100%!
>  
> *[OOM root cause identification]*
> Each executor has 1 CPU core and 6.5GB memory, so it only runs one task at a 
> time. After analyzing the application dataflow, error log, heap dump, and 
> source code, I found the following steps lead to the OOM error. 
> => The MemoryManager found that there is not enough memory to cache the 
> _links:ShuffledRDD_ (rdd-5-28, red circles in the dataflow figure).
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/ShuffledRDD.png?raw=true|width=100%!
> => The task needs to shuffle twice (1st shuffle and 2nd shuffle in the 
> dataflow figure).
> => The task needs to generate two _ExternalAppendOnlyMap_ (E1 for 1st shuffle 
> and E2 for 2nd shuffle) in sequence.
> => The 1st shuffle begins and ends. E1 aggregates all the shuffled data of 
> 1st shuffle and achieves 3.3 GB.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/FirstShuffle.png?raw=true|width=100%!
> => The 2nd shuffle begins. E2 is aggregating the shuffled data of 2nd 
> shuffle, and finding that there is not enough memory left. This triggers the 
> memory contention.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/SecondShuffle.png?raw=true|width=100%!
> => To handle the memory contention, the _TaskMemoryManager_ releases E1 
> (spills it onto disk) and assumes that the 3.3GB space is free now.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/MemoryContention.png?raw=true|width=100%!
> => E2 continues to aggregates the shuffled records of 2nd shuffle. However, 
> E2 encounters an OOM error while shuffling.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/OOMbefore.png?raw=true|width=100%!
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/OOMError.png?raw=true|width=100%!
> *[Guess]* 
> The task memory usage below reveals that there is not memory drop down. So, 
> the cause may be that the 3.3GB _ExternalAppendOnlyMap_ (E1) is not actually 
> released by the _TaskMemoryManger_. 
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/GCFigure.png?raw=true|width=100%!
> *[Root cause]* 
> After analyzing the heap dump, I found the guess is right (the 3.3GB 
> _ExternalAppendOnlyMap_ is actually not released). The 1.6GB object is 
> _ExternalAppendOnlyMap (E2)_.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/heapdump.png?raw=true|width=100%!
> *[Question]* 
> Why the released _ExternalAppendOnlyMap_ is still in memory?
> The source code of _ExternalAppendOnlyMap_ shows that the _currentMap_ 
> (_AppendOnlyMap_) has been set to _null_ when the spill action is finished.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/SourceCode.png?raw=true|width=100%!
> *[Root cause in the source code]* I further analyze the reference chain of 
> unreleased _ExternalAppendOnlyMap_. The reference chain shows that the 3.3GB 
> _ExternalAppendOnlyMap_ is still referenced by the _upstream/readingIterator_ 
> and further referenced by _TaskMemoryManager_ as follows. So, the root cause 
> in the source code is that the _ExternalAppendOnlyMap_ is still referenced by 
> other iterators (setting the _currentMap_ to _null_ is not enough).
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/References.png?raw=true|width=100%!
> *[Potential solution]*
> Setting the _upstream/readingIterator_ to _null_ after the _forceSpill_() 
> action. I will try this solution in these days.
> [References]
> [1] PageRank source code. 
> https://github.com/JerryLead/SparkGC/blob/master/src/main/scala/applications/graph/PageRank.scala
> [2] Task execution log. 
> https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/log/TaskExecutionLog.txt
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to