[ 
https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Zhou updated SPARK-21961:
----------------------------
    Description: 
As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of 
memory in the driver. Recently we noticed the same issue in the Spark History 
Server. Although SPARK-20084 stops these accumulators from being written to the 
event logs, multiple Spark versions, including 1.6.x and 2.1.0, are deployed in 
our production cluster, and none of them include these two patches.
As a result, the block statuses still show up in the event logs, and the Spark 
History Server replays them. The Spark History Server continuously hits severe 
full GCs, even though we tried limiting the cache size and enlarging the heap 
to 40GB. We also tried different GC tuning parameters, such as CMS and G1GC; 
none of them helped.
We took a heap dump and found that the top memory-consuming objects are 
BlockStatus instances. One thread that was replaying a single log file even 
held 24GB of heap.
Since the former two tickets resolved the related issues in the driver and in 
writing to history logs, we should also consider adding this filter to the 
Spark History Server to reduce the memory consumed when replaying a single 
history log. For use cases like ours, with multiple older Spark versions 
deployed, this filter should be quite useful. A sketch of the idea follows the 
screenshot below.
!https://issues.apache.org/jira/secure/attachment/12886191/Objects_Count_in_Heap.png!
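As a rough illustration only, the filter could drop the block-status payloads 
from each raw event-log line before it is converted into a SparkListenerEvent 
during replay. This is a minimal sketch, not the actual patch: the object and 
method names (ReplayEventFilter, stripBlockStatuses) are hypothetical, while 
the JSON field names ("Updated Blocks", "internal.metrics.updatedBlockStatuses") 
are the ones JsonProtocol writes in Spark 2.x.
{code:scala}
// Minimal sketch, not the actual patch: ReplayEventFilter and
// stripBlockStatuses are hypothetical names. The idea is to drop the
// BlockStatus payloads from each JSON event line before it is turned into
// a SparkListenerEvent, so replay never retains them on the heap.
import org.json4s._
import org.json4s.jackson.JsonMethods._

object ReplayEventFilter {

  // Accumulator name Spark 2.x uses for updated block statuses.
  private val BlockStatusAccum = "internal.metrics.updatedBlockStatuses"

  /** Strips block-status data from one event-log JSON line. */
  def stripBlockStatuses(line: String): JValue = {
    parse(line)
      // Remove the "Updated Blocks" array nested inside "Task Metrics".
      .removeField { case (name, _) => name == "Updated Blocks" }
      // Remove the matching named accumulator from any "Accumulables" list.
      .transformField {
        case ("Accumulables", JArray(accums)) =>
          ("Accumulables", JArray(accums.filterNot { acc =>
            (acc \ "Name") == JString(BlockStatusAccum)
          }))
      }
  }
}
{code}
Replay would still have to parse each line, but the BlockStatus objects would 
be discarded before any listener caches them, which is what the 24GB replay 
thread in the heap dump was holding.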





> Filter out BlockStatuses Accumulators during replaying history logs in Spark 
> History Server
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21961
>                 URL: https://issues.apache.org/jira/browse/SPARK-21961
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Ye Zhou
>         Attachments: Objects_Count_in_Heap.png, One_Thread_Took_24GB.png
>


