chen zhejia created SPARK-39569:
-----------------------------------

             Summary: Spark Shuffle Index Cache ignore the weight of index Path
                 Key: SPARK-39569
                 URL: https://issues.apache.org/jira/browse/SPARK-39569
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.1.2, 2.4.0
            Reporter: chen zhejia


We hit the same OOM problem as in
[SPARK-33206|https://issues.apache.org/jira/browse/SPARK-33206]. The fix for that
issue corrected the weight calculation for the ShuffleIndexInformation values
that the external shuffle service caches, but the weigher still ignores the
cache key, which is the index file path (a String):
 
shuffleIndexCache = CacheBuilder.newBuilder()
      .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
      .weigher((Weigher<String, ShuffleIndexInformation>)
        (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
      .build(indexCacheLoader);
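A rough estimate of why this matters. Assume, for illustration only, that every cached entry has a key of about k bytes and a value weighing v bytes, and ignore Guava's own per-entry overhead (k, v, n and M are symbols introduced here, not Spark names). The cache is bounded by the configured maximum weight W, but only the values are weighed, so it holds about n entries while the heap retains about M:

```latex
\[
n \approx \frac{W}{v}, \qquad
M_{\text{retained}} \approx n\,(k + v) = W\left(1 + \frac{k}{v}\right)
\]
```

Under these assumptions, if the keys are about as large as the values, the cache can retain roughly twice the configured size.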
 
In our case an index path can be longer than 100 characters, e.g.
/data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index
A jmap heap dump shows that these path strings use a lot of memory that the
cache weight does not count. Should the weigher also take the key size into
account, so that the cache's memory usage is bounded more accurately?
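A minimal sketch of what the question above proposes, written in plain Java without Guava so it runs on its own. The class `KeyAwareWeigher` and its methods are hypothetical, not Spark code. The byte counts are rough estimates that assume a 64-bit HotSpot JVM (JDK 9+) with compressed oops, compressed class pointers and compact strings; other JVM configurations lay objects out differently. `valueRetainedBytes` stands in for `indexInfo.getRetainedMemorySize()`, which returns an int since the original lambda is used as Guava's int-valued `Weigher`.

```java
/**
 * Hypothetical weigher that counts the String key (the index file path)
 * in addition to the value's retained size. Sizes ASSUME a 64-bit HotSpot
 * JVM with compressed oops/class pointers and compact strings (JDK 9+).
 */
public final class KeyAwareWeigher {

  // Assumed shallow size of a java.lang.String (12-byte header + fields, padded to 8).
  static final int STRING_SHALLOW_BYTES = 24;
  // Assumed byte[] header: 12-byte object header + 4-byte length field.
  static final int ARRAY_HEADER_BYTES = 16;

  private KeyAwareWeigher() {}

  /** Rounds n up to the next multiple of 8 (assumed object alignment). */
  static long align8(long n) {
    return (n + 7) & ~7L;
  }

  /** Estimated heap bytes retained by a String, including its backing byte[]. */
  static long estimateStringBytes(String s) {
    boolean latin1 = true;
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) > 0xFF) {
        latin1 = false;
        break;
      }
    }
    // Compact strings store 1 byte per char for Latin-1 content, else 2 (UTF-16).
    long payload = (long) s.length() * (latin1 ? 1 : 2);
    return STRING_SHALLOW_BYTES + align8(ARRAY_HEADER_BYTES + payload);
  }

  /**
   * Combined weight of one cache entry: estimated key size plus the value's
   * retained size. Guava requires a non-negative int, so the sum is clamped.
   */
  static int weigh(String filePath, int valueRetainedBytes) {
    long total = estimateStringBytes(filePath) + valueRetainedBytes;
    return (int) Math.min(total, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    String path = "/data/data2/yarn/nm/usercache/hive/appcache/"
        + "application_1654741161919_1249246/"
        + "blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index";
    System.out.println("path length = " + path.length());
    System.out.println("estimated key bytes = " + estimateStringBytes(path));
  }
}
```

With Guava, the lambda in the snippet above could then become `(filePath, indexInfo) -> KeyAwareWeigher.weigh(filePath, indexInfo.getRetainedMemorySize())`. The estimate is only as accurate as its layout assumptions; a simpler alternative would be to add a fixed per-character cost for the path.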



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
