[ https://issues.apache.org/jira/browse/SPARK-33710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246329#comment-17246329 ]
liangtianlun commented on SPARK-33710:
--------------------------------------

Thank you. I'll translate it into English.

> Shuffle Index use Guava cache OOM, Yarn NodeManage GC Alarm
> -----------------------------------------------------------
>
>                 Key: SPARK-33710
>                 URL: https://issues.apache.org/jira/browse/SPARK-33710
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 3.2.0
>            Reporter: liangtianlun
>            Priority: Major
>
> h2. On CDH 6.3, the YARN NodeManager runs GC frequently and then generates a
> heap dump because of memory overflow
> !https://upload-images.jianshu.io/upload_images/18249296-24acecfcc46dc744.png?imageMogr2/auto-orient/strip|imageView2/2/w/607/format/webp!
>
> h2. Using the Memory Analyzer Tool (MAT) to locate the shuffle index module
>
> The shuffle index cache is a Guava cache with a memory limit, but the limit
> does not account for the cache keys, so a large amount of path information
> accumulates in memory. If each ShuffleIndexInformation in the cache is very
> small, the number of keys becomes very large and eventually leads to memory
> overflow. I think this is a defect: the size of the keys should be counted
> toward the 100MB limit.
>
> !https://upload-images.jianshu.io/upload_images/18249296-ed0cfee76b6f6bf2.png?imageMogr2/auto-orient/strip|imageView2/2/w/630/format/webp!
> According to MAT, the Guava local cache used by ExternalShuffleBlockHandler
> takes up 82.88% of the heap memory.
>
> !https://upload-images.jianshu.io/upload_images/18249296-43ec91771f3c68b7.png?imageMogr2/auto-orient/strip|imageView2/2/w/760/format/webp!
> !https://upload-images.jianshu.io/upload_images/18249296-f85e27a501605260.png?imageMogr2/auto-orient/strip|imageView2/2/w/1147/format/webp!
> The analysis shows a very large number of shuffle index paths in memory,
> taking up more than 400 MB. These paths are the keys of the shuffle index
> cache in ExternalShuffleBlockResolver. After looking at the source code, we
> found that there may be a defect in the cache management, because the 100MB
> limit does not include the size of the keys.
>
> !https://upload-images.jianshu.io/upload_images/18249296-87118ce13744c2ca.png?imageMogr2/auto-orient/strip|imageView2/2/w/1200/format/webp!
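
To illustrate the accounting gap described in the report, here is a minimal sketch in plain Java. It is not Spark's code and does not call Guava, so it runs on its own. The class and method names (ShuffleIndexWeightSketch, valueOnlyWeight, keyAwareWeight, maxEntries) and the sample path and sizes are hypothetical. It compares a weight that counts only the cached value with one that also counts the key, and shows how many entries a 100MB budget admits in each case. A Guava Weigher receives both the key and the value, so the same arithmetic could be applied inside a weigher.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the weight accounting discussed in the report.
// It only models the arithmetic; it is not the Spark implementation.
final class ShuffleIndexWeightSketch {

    // Weight that counts only the cached value and ignores the key,
    // which is the behavior the report describes.
    public static int valueOnlyWeight(String indexFilePath, int indexSizeBytes) {
        return indexSizeBytes;
    }

    // Weight that also counts the key. The UTF-8 length is only a rough
    // estimate of the path's footprint; the real heap cost of a String
    // also includes object headers.
    public static int keyAwareWeight(String indexFilePath, int indexSizeBytes) {
        int keyBytes = indexFilePath.getBytes(StandardCharsets.UTF_8).length;
        return keyBytes + indexSizeBytes;
    }

    // Number of entries that fit under a weight budget when every entry
    // has the same weight.
    public static long maxEntries(long budgetBytes, int weightPerEntry) {
        return budgetBytes / weightPerEntry;
    }

    public static void main(String[] args) {
        long budget = 100L * 1024 * 1024;          // the 100MB limit from the report
        String path = "/hypothetical/local/dir/shuffle_0_0_0.index"; // made-up key
        int indexSize = 16;                        // assumed tiny index payload

        long byValue = maxEntries(budget, valueOnlyWeight(path, indexSize));
        long byKeyAndValue = maxEntries(budget, keyAwareWeight(path, indexSize));

        System.out.println("entries admitted, value-only weight: " + byValue);
        System.out.println("entries admitted, key-aware weight:  " + byKeyAndValue);
    }
}
```

With a value-only weight, small index payloads let millions of entries into the cache, and the memory held by their path keys is never charged against the limit. Counting the key keeps the total closer to the configured budget.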