[ 
https://issues.apache.org/jira/browse/SPARK-33710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246329#comment-17246329
 ] 

liangtianlun commented on SPARK-33710:
--------------------------------------

Thank you. I'll change it to English.

> Shuffle Index use Guava cache OOM, Yarn NodeManage GC Alarm
> -----------------------------------------------------------
>
>                 Key: SPARK-33710
>                 URL: https://issues.apache.org/jira/browse/SPARK-33710
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 3.2.0
>            Reporter: liangtianlun
>            Priority: Major
>
> h2. On CDH 6.3, the YARN NodeManager runs GC frequently and then generates a 
> dump file because it runs out of memory
> !https://upload-images.jianshu.io/upload_images/18249296-24acecfcc46dc744.png?imageMogr2/auto-orient/strip|imageView2/2/w/607/format/webp!
>   
> h2. Use the Memory Analyzer Tool to locate the shuffle index module
>  
> The shuffle index cache uses Guava to cap its memory use, but the limit does 
> not cover the cache keys, so a large amount of path information accumulates 
> in memory. When each ShuffleIndexInformation entry in the cache is very 
> small, the number of keys becomes very large, which eventually leads to an 
> out-of-memory error. I think this is a defect: the size of the keys should 
> be counted toward the 100MB limit.
>  
> !https://upload-images.jianshu.io/upload_images/18249296-ed0cfee76b6f6bf2.png?imageMogr2/auto-orient/strip|imageView2/2/w/630/format/webp!
> According to MAT, ExternalShuffleBlockHandler, which uses Guava's local 
> cache, takes up 82.88% of the heap memory.
>  
>  
> !https://upload-images.jianshu.io/upload_images/18249296-43ec91771f3c68b7.png?imageMogr2/auto-orient/strip|imageView2/2/w/760/format/webp!
> !https://upload-images.jianshu.io/upload_images/18249296-f85e27a501605260.png?imageMogr2/auto-orient/strip|imageView2/2/w/1147/format/webp!
> The analysis shows a very large number of shuffle index file paths in 
> memory, taking up more than 400 MB in total. Each path is a key of the 
> shuffle index cache in ExternalShuffleBlockResolver. After reading the 
> source code, we believe there may be a defect in the cache management, 
> because the 100MB limit does not count the size of the keys.
>   
> !https://upload-images.jianshu.io/upload_images/18249296-87118ce13744c2ca.png?imageMogr2/auto-orient/strip|imageView2/2/w/1200/format/webp!
>  
>   
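
The accounting gap described in the report can be illustrated with a small, self-contained sketch. This is not Spark's or Guava's code: WeightBoundedCache, KeyWeightDemo, the path pattern, and the 16-byte value size are all hypothetical, and key length in characters is used only as a rough stand-in for key size. It simply shows that a weight-bounded cache whose weigher looks only at the value retains far more entries (and therefore far more untracked key data) than one whose weigher also counts the key.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongBiFunction;

/** Toy weight-bounded LRU cache for illustration only (hypothetical, not Guava). */
final class WeightBoundedCache<K, V> {
    private final long maxWeight;
    private final ToLongBiFunction<K, V> weigher;
    // accessOrder=true: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long trackedWeight = 0;

    WeightBoundedCache(long maxWeight, ToLongBiFunction<K, V> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    void put(K key, V value) {
        V old = entries.remove(key);
        if (old != null) {
            trackedWeight -= weigher.applyAsLong(key, old);
        }
        entries.put(key, value);
        trackedWeight += weigher.applyAsLong(key, value);
        // Evict least recently used entries until the tracked weight fits the budget.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (trackedWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            trackedWeight -= weigher.applyAsLong(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    int size() {
        return entries.size();
    }
}

final class KeyWeightDemo {
    /** Inserts `count` small values under path-like keys and returns how many stay cached. */
    static int retained(long maxWeight, int count, boolean countKeys) {
        ToLongBiFunction<String, byte[]> weigher;
        if (countKeys) {
            // Key-inclusive weigher: key length (in chars, an approximation) plus value size.
            weigher = (k, v) -> (long) k.length() + v.length;
        } else {
            // Value-only weigher: the key's footprint is invisible to the budget.
            weigher = (k, v) -> (long) v.length;
        }
        WeightBoundedCache<String, byte[]> cache = new WeightBoundedCache<>(maxWeight, weigher);
        for (int i = 0; i < count; i++) {
            // Hypothetical path; real index file paths are typically longer.
            String key = String.format("/hypothetical/local-dir/shuffle_0_%06d_0.index", i);
            cache.put(key, new byte[16]);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("value-only weigher retains:    " + retained(1600, 1000, false));
        System.out.println("key-inclusive weigher retains: " + retained(1600, 1000, true));
    }
}
```

With the value-only weigher, the number of retained entries is bounded only by the budget divided by the value size, so the smaller the values, the more keys stay on the heap outside the budget. Counting the key in the weight is one possible way to close that gap; whether it is the right fix for Spark is for the maintainers to decide.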



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

