[ https://issues.apache.org/jira/browse/SPARK-22713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481468#comment-16481468 ]
Eyal Farago commented on SPARK-22713:
-------------------------------------

[~jerrylead], excellent investigation and description of the issue, I'll open a PR shortly.

> OOM caused by the memory contention and memory leak in TaskMemoryManager
> ------------------------------------------------------------------------
>
>                 Key: SPARK-22713
>                 URL: https://issues.apache.org/jira/browse/SPARK-22713
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.1.1, 2.1.2
>            Reporter: Lijie Xu
>            Priority: Critical
>
> The PDF version of this issue, with high-quality figures, is available at
> https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/report/OOM-TaskMemoryManager.pdf.
>
> *[Abstract]*
> I recently encountered an OOM error in a PageRank application
> (_org.apache.spark.examples.SparkPageRank_). After profiling the application,
> I found that the OOM error is related to memory contention in the shuffle
> spill phase. Here, memory contention means that a task tries to release some
> old memory consumers to make room for new memory consumers. After analyzing
> the OOM heap dump, I found that the root cause is a memory leak in
> _TaskMemoryManager_. Since memory contention is common in the shuffle phase,
> this is a critical bug/defect. In the following sections, I use the
> application dataflow, execution log, heap dump, and source code to identify
> the root cause.
>
> *[Application]*
> This is a PageRank application from Spark's example library. The following
> figure shows the application dataflow. The source code is available at \[1\].
> !https://raw.githubusercontent.com/JerryLead/Misc/master/OOM-TasksMemoryManager/figures/PageRankDataflow.png|width=100%!
>
> *[Failure symptoms]*
> This application has a map stage and many iterative reduce stages. An OOM
> error occurs in a reduce task (Task-28) as follows.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/Stage.png?raw=true|width=100%!
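To make the iterative two-shuffle structure concrete, here is a minimal, dependency-free sketch of one PageRank iteration in plain Scala (no Spark; immutable collections stand in for RDDs, and the tiny three-page graph is purely illustrative). The join of links with ranks corresponds roughly to the 1st shuffle in the dataflow figure, and the per-page aggregation of contributions to the 2nd shuffle.

```scala
// Plain-Scala sketch of one PageRank iteration (illustrative graph, no Spark).
val links: Map[Int, Seq[Int]] = Map(1 -> Seq(2, 3), 2 -> Seq(3), 3 -> Seq(1))
var ranks: Map[Int, Double] = links.map { case (page, _) => page -> 1.0 }

// 1st aggregation (roughly the 1st shuffle): pair each page's rank with its
// outgoing links and emit a contribution to every destination page.
val contribs: Seq[(Int, Double)] = links.toSeq.flatMap { case (page, dests) =>
  dests.map(d => d -> ranks(page) / dests.size)
}

// 2nd aggregation (roughly the 2nd shuffle): sum the contributions per page
// and apply the damping factor, as org.apache.spark.examples.SparkPageRank does.
ranks = contribs.groupBy(_._1).map { case (page, cs) =>
  page -> (0.15 + 0.85 * cs.map(_._2).sum)
}
```

In the real application each reduce task performs these two aggregations with two _ExternalAppendOnlyMap_ instances in sequence, which is what sets up the contention described below.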
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/task.png?raw=true|width=100%!
>
> *[OOM root cause identification]*
> Each executor has 1 CPU core and 6.5GB memory, so it only runs one task at a
> time. After analyzing the application dataflow, error log, heap dump, and
> source code, I found that the following steps lead to the OOM error.
> => The MemoryManager finds that there is not enough memory to cache the
> _links:ShuffledRDD_ (rdd-5-28, red circles in the dataflow figure).
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/ShuffledRDD.png?raw=true|width=100%!
> => The task needs to shuffle twice (the 1st and 2nd shuffles in the dataflow
> figure).
> => The task generates two _ExternalAppendOnlyMap_ instances (E1 for the 1st
> shuffle and E2 for the 2nd shuffle) in sequence.
> => The 1st shuffle begins and ends. E1 aggregates all the shuffled data of
> the 1st shuffle and reaches 3.3 GB.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/FirstShuffle.png?raw=true|width=100%!
> => The 2nd shuffle begins. While aggregating the shuffled data of the 2nd
> shuffle, E2 finds that there is not enough memory left. This triggers the
> memory contention.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/SecondShuffle.png?raw=true|width=100%!
> => To handle the memory contention, the _TaskMemoryManager_ releases E1
> (spills it onto disk) and assumes that the 3.3GB space is now free.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/MemoryContention.png?raw=true|width=100%!
> => E2 continues to aggregate the shuffled records of the 2nd shuffle.
> However, E2 encounters an OOM error while shuffling.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/OOMbefore.png?raw=true|width=100%!
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/OOMError.png?raw=true|width=100%!
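The contention-handling steps above can be modeled with a toy sketch. The names (ToyMemoryManager, ToyMap) are hypothetical and this is not Spark's actual API (the real path is _TaskMemoryManager.acquireExecutionMemory_ asking other _MemoryConsumer_ instances to spill); the point it illustrates is that the manager's bookkeeping trusts the byte counts reported by spill(), whether or not the JVM heap actually shrinks.

```scala
// Toy model (hypothetical names, not Spark's real API) of spill-on-contention:
// when a consumer cannot acquire memory, the manager asks other consumers to
// spill and adds the bytes they *report* as freed back to its free pool.
trait Consumer {
  def spill(): Long // bytes the consumer claims to have released
}

class ToyMemoryManager(total: Long) {
  var free: Long = total
  private val consumers = scala.collection.mutable.ArrayBuffer.empty[Consumer]

  def register(c: Consumer): Unit = consumers += c

  // Grant up to `needed` bytes, spilling other consumers on contention.
  def acquire(needed: Long, requester: Consumer): Long = {
    if (free < needed) {
      for (c <- consumers if !(c eq requester) && free < needed) {
        free += c.spill() // accounting now says this memory is free again...
      }
    }
    val granted = math.min(needed, free)
    free -= granted
    granted
  }
}

// A consumer that reports its in-memory bytes as freed when spilled. Whether
// the JVM can actually reclaim them depends on who else still references the
// underlying map, which is exactly where the leak below comes in.
class ToyMap(var inMemoryBytes: Long) extends Consumer {
  def spill(): Long = { val n = inMemoryBytes; inMemoryBytes = 0; n }
}
```

In this model, if E1 spills but its in-heap data is still strongly referenced elsewhere, the manager grants E2 the "freed" memory while the heap remains full, matching the OOM observed here.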
> *[Guess]*
> The task memory usage below shows no drop in memory usage. So the cause may
> be that the 3.3GB _ExternalAppendOnlyMap_ (E1) is not actually released by
> the _TaskMemoryManager_.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/GCFigure.png?raw=true|width=100%!
>
> *[Root cause]*
> After analyzing the heap dump, I found that this guess is correct: the 3.3GB
> _ExternalAppendOnlyMap_ is indeed not released. The 1.6GB object is
> _ExternalAppendOnlyMap_ (E2).
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/heapdump.png?raw=true|width=100%!
>
> *[Question]*
> Why is the released _ExternalAppendOnlyMap_ still in memory?
> The source code of _ExternalAppendOnlyMap_ shows that the _currentMap_
> (_AppendOnlyMap_) is set to _null_ when the spill action finishes.
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/SourceCode.png?raw=true|width=100%!
>
> *[Root cause in the source code]*
> I further analyzed the reference chain of the unreleased
> _ExternalAppendOnlyMap_. The reference chain shows that the 3.3GB
> _ExternalAppendOnlyMap_ is still referenced by the _upstream/readingIterator_
> and further referenced by _TaskMemoryManager_ as follows. So the root cause
> in the source code is that the _ExternalAppendOnlyMap_ is still referenced by
> other iterators (setting the _currentMap_ to _null_ is not enough).
> !https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/figures/References.png?raw=true|width=100%!
>
> *[Potential solution]*
> Set the _upstream/readingIterator_ to _null_ after the _forceSpill_() action.
> I will try this solution in the coming days.
>
> [References]
> [1] PageRank source code.
> https://github.com/JerryLead/SparkGC/blob/master/src/main/scala/applications/graph/PageRank.scala
> [2] Task execution log.
> https://github.com/JerryLead/Misc/blob/master/OOM-TasksMemoryManager/log/TaskExecutionLog.txt

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
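The leak mechanism identified in the root-cause section, and the proposed fix, can be sketched in a few lines of plain Scala. The class names (SpillableMap, BigMap) are hypothetical stand-ins, not Spark's actual ExternalAppendOnlyMap; only the field names echo _currentMap_ and _readingIterator_ from the report.

```scala
// Hypothetical sketch of the leak: nulling `currentMap` alone does not make
// the map collectible while an iterator still holds a strong reference to it.
class BigMap { val data = new Array[Byte](8) } // stands in for the 3.3 GB map

class SpillableMap {
  var currentMap: BigMap = new BigMap
  var readingIterator: AnyRef = null

  def iterator(): AnyRef = {
    readingIterator = currentMap // the iterator keeps a reference to the map
    readingIterator
  }

  def forceSpill(): Unit = {
    // ... write currentMap to disk, then drop the field, as the real code does
    currentMap = null
    // BUG (as reported): readingIterator still references the map, so its
    // data remains strongly reachable and the GC cannot reclaim it.
  }

  def forceSpillFixed(): Unit = {
    currentMap = null
    readingIterator = null // proposed fix: drop the iterator reference too
  }
}
```

With forceSpill(), the map stays reachable through readingIterator even though currentMap is null; with forceSpillFixed(), no strong reference remains and the GC can reclaim the map's data.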