[ https://issues.apache.org/jira/browse/SPARK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33206:
------------------------------------

    Assignee: Apache Spark

> Spark Shuffle Index Cache calculates memory usage wrong
> -------------------------------------------------------
>
>                 Key: SPARK-33206
>                 URL: https://issues.apache.org/jira/browse/SPARK-33206
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.4.0, 3.0.1
>            Reporter: Lars Francke
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: image001(1).png
>
> SPARK-21501 changed the Spark shuffle index cache to be bounded by memory usage instead of by the number of cached files.
> Unfortunately, there is a problem with that calculation: the size information provided by `ShuffleIndexInformation` is based purely on the on-disk size of the cached index file.
> We are running into OOMs with very small index files (~16 bytes on disk) where the in-memory overhead of the `ShuffleIndexInformation` object wrapping them is much larger (e.g. 184 bytes, see the attached screenshot). We need to take this overhead into account and should probably add a fixed per-entry overhead of somewhere between 152 and 180 bytes, according to my tests. I'm not 100% sure what the correct number is, and it will also depend on the architecture etc., so we can't be exact anyway.
> If we do that, we could perhaps also drop the size field in `ShuffleIndexInformation` to save a few more bytes per entry.
> In effect this means that for small index files we use roughly 70-100 times as much memory as intended. Our NodeManagers OOM with 4 GB and more of index shuffle cache.
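The proposed fix can be sketched as follows: charge the cache the file size plus a fixed per-entry overhead rather than the raw file size alone. This is a minimal illustration, not the actual Spark patch; the class name `ShuffleIndexWeight`, the method `retainedSize`, and the overhead constant of 176 bytes (picked from the 152-180 byte range measured in the ticket) are all hypothetical.

```java
// Sketch of the proposed weight calculation (hypothetical names and constant;
// the real code lives in Spark's ShuffleIndexInformation / index cache).
public class ShuffleIndexWeight {
    // Estimated fixed JVM overhead per cached ShuffleIndexInformation entry
    // (object header, ByteBuffer wrapper, cache bookkeeping). The ticket
    // measured roughly 152-180 bytes; the exact value is JVM- and
    // architecture-dependent, so 176 here is an assumption.
    static final long INSTANCE_OVERHEAD_BYTES = 176;

    /** Weight the cache should charge for an entry, instead of raw file size. */
    static long retainedSize(long indexFileSizeBytes) {
        return indexFileSizeBytes + INSTANCE_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        long fileSize = 16; // a tiny shuffle index file, as in the report
        long oldWeight = fileSize;                 // what the cache charges today
        long newWeight = retainedSize(fileSize);   // file size + per-entry overhead
        System.out.println("old weight = " + oldWeight
                + " bytes, corrected weight = " + newWeight + " bytes");
        // With the assumed 176-byte overhead, the cache undercounts a 16-byte
        // entry by a factor of 12; the reporter's heap dumps show 70-100x
        // once the full object graph is accounted for.
        System.out.println("undercount factor = " + (newWeight / oldWeight));
    }
}
```

With weights computed this way, a cache capped at 4 GB would admit far fewer tiny entries, keeping its actual heap footprint near the configured limit.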