chen zhejia created SPARK-39569:
-----------------------------------

             Summary: Spark Shuffle Index Cache ignore the weight of index Path
                 Key: SPARK-39569
                 URL: https://issues.apache.org/jira/browse/SPARK-39569
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.1.2, 2.4.0
            Reporter: chen zhejia
We hit the same OOM problem as [SPARK-33206|https://issues.apache.org/jira/browse/SPARK-33206]. The fix for that issue corrected the weight calculation used when the external shuffle service caches ShuffleIndexInformation, but the weigher still ignores the cache key, which is the index file path:

{code:java}
shuffleIndexCache = CacheBuilder.newBuilder()
    .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
    // Only the value (the index) is weighed; the String key (filePath) is not counted.
    .weigher((Weigher<String, ShuffleIndexInformation>)
        (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
    .build(indexCacheLoader);
{code}

In our case the index path can be longer than 100 characters, e.g.
/data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index

These path strings take up a lot of memory in our jmap dump. Should we include the key size in the weight calculation so that the cache's memory accounting is more accurate?
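A rough sketch of what counting the key could look like. It is plain Java (no Guava) so it runs standalone; the class and method names (KeyAwareIndexWeigher, estimateStringRetainedBytes, weigh) are hypothetical and not Spark's API. The per-String overhead is an assumption for a 64-bit HotSpot JVM with compressed oops and compact strings (JDK 9+); real sizes differ on other JVMs and settings. The value size is passed in as a number standing in for indexInfo.getRetainedMemorySize().

```java
/**
 * Hypothetical sketch (not Spark's API): weigh a shuffle index cache entry by the
 * estimated retained size of its String key plus the retained size of the cached value.
 */
final class KeyAwareIndexWeigher {
    // Assumption: 64-bit HotSpot with compressed oops and compact strings (JDK 9+).
    // A String object is ~24 bytes; its backing byte[] has a 16-byte header.
    private static final long STRING_OBJECT_BYTES = 24;
    private static final long ARRAY_HEADER_BYTES = 16;

    private KeyAwareIndexWeigher() {}

    // Objects are padded to 8-byte boundaries.
    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Compact strings store Latin-1 text at 1 byte/char, anything else at 2 bytes/char.
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Rough estimate of the heap retained by a String (object + backing array).
    static long estimateStringRetainedBytes(String s) {
        long bytesPerChar = isLatin1(s) ? 1 : 2;
        return STRING_OBJECT_BYTES + alignTo8(ARRAY_HEADER_BYTES + bytesPerChar * s.length());
    }

    // Key-aware weight: key estimate + value size, clamped to int because
    // Guava's Weigher.weigh returns an int.
    static int weigh(String filePath, long indexRetainedBytes) {
        if (indexRetainedBytes < 0) {
            throw new IllegalArgumentException("indexRetainedBytes must be >= 0");
        }
        if (indexRetainedBytes >= Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        long total = estimateStringRetainedBytes(filePath) + indexRetainedBytes;
        return (int) Math.min(Integer.MAX_VALUE, total);
    }

    public static void main(String[] args) {
        String path = "/data/data2/yarn/nm/usercache/hive/appcache/"
            + "application_1654741161919_1249246/"
            + "blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index";
        // Hypothetical value size standing in for indexInfo.getRetainedMemorySize().
        long hypotheticalIndexBytes = 1_024L;
        long keyBytes = estimateStringRetainedBytes(path);
        System.out.printf("path length=%d chars, key ~%d bytes, weight=%d%n",
            path.length(), keyBytes, weigh(path, hypotheticalIndexBytes));
    }
}
```

In the shuffle service, the same sum would be returned from the Guava weigher lambda shown above. The estimate is deliberately approximate: it only needs to be close enough that long paths are no longer counted as free.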