[ 
https://issues.apache.org/jira/browse/SPARK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated SPARK-33206:
---------------------------------
    Description: 
SPARK-21501 changed the Spark shuffle index cache to be based on memory 
instead of on the number of files.

Unfortunately, there's a problem with that calculation: the cache weighs each 
entry using the size reported by `ShuffleIndexInformation`, which is purely 
the on-disk size of the cached index file.
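
For context, here is roughly what the current code does (paraphrased from 
ShuffleIndexInformation and the cache setup in ExternalShuffleBlockResolver; 
names and details are from memory, so treat this as a sketch):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.LongBuffer;
    import java.nio.file.Files;

    public class ShuffleIndexInformation {
      private final LongBuffer offsets;
      private final int size;

      public ShuffleIndexInformation(File indexFile) throws IOException {
        size = (int) indexFile.length();   // on-disk size, e.g. ~16 bytes
        ByteBuffer buffer = ByteBuffer.allocate(size);
        offsets = buffer.asLongBuffer();
        try (DataInputStream dis = new DataInputStream(
            Files.newInputStream(indexFile.toPath()))) {
          dis.readFully(buffer.array());
        }
      }

      // This is all the weigher charges against the cache budget:
      // the raw file size, with no JVM object overhead on top.
      public int getSize() {
        return size;
      }
    }

The shuffle service then builds its Guava cache along these lines, so the 
"memory" budget is really just a sum of file sizes:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import com.google.common.cache.Weigher;

    long indexCacheSizeBytes = 100L * 1024 * 1024;  // configured cache size
    CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
        new CacheLoader<File, ShuffleIndexInformation>() {
          @Override
          public ShuffleIndexInformation load(File file) throws IOException {
            return new ShuffleIndexInformation(file);
          }
        };
    LoadingCache<File, ShuffleIndexInformation> shuffleIndexCache =
        CacheBuilder.newBuilder()
            .maximumWeight(indexCacheSizeBytes)
            .weigher((Weigher<File, ShuffleIndexInformation>)
                (file, info) -> info.getSize())
            .build(indexCacheLoader);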

We're running into OOMs with very small index files (~16 bytes on disk), 
because the JVM overhead of the ShuffleIndexInformation object wrapped around 
them is much larger (e.g. 184 bytes, see screenshot). We need to take this 
into account and should probably add a fixed overhead of somewhere between 
152 and 180 bytes, according to my tests. I'm not 100% sure what the correct 
number is, and it will also depend on the architecture etc., so we can't be 
exact anyway.
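
For anyone who wants to reproduce such a measurement: one way (not 
necessarily how the numbers above were obtained) is OpenJDK's JOL, which 
walks the object graph and reports retained bytes:

    // Dependency: org.openjdk.jol:jol-core. Illustrative only.
    import java.io.File;
    import org.openjdk.jol.info.GraphLayout;

    public class MeasureIndexFootprint {
      public static void main(String[] args) throws Exception {
        ShuffleIndexInformation info =
            new ShuffleIndexInformation(new File(args[0]));
        // Bytes retained by the whole object graph -- compare to the ~16
        // bytes the weigher charges for a one-partition index file.
        System.out.println(GraphLayout.parseInstance(info).totalSize());
      }
    }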

If we do that, we can maybe also get rid of the size field in 
ShuffleIndexInformation to save a few more bytes per entry, since the payload 
size can be recomputed from the offsets buffer.
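
A minimal sketch of what that could look like (the constant's value and the 
method name getRetainedMemorySize are illustrative, not a final proposal):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.LongBuffer;
    import java.nio.file.Files;

    public class ShuffleIndexInformation {
      // Estimated fixed JVM overhead of one instance (object header,
      // buffer wrappers, references). Roughly 152-180 bytes in my tests;
      // it is architecture-dependent, so any single value is approximate.
      static final int INSTANCE_MEMORY_OVERHEAD = 176;

      private final LongBuffer offsets;   // the size field is gone

      public ShuffleIndexInformation(File indexFile) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate((int) indexFile.length());
        offsets = buffer.asLongBuffer();
        try (DataInputStream dis = new DataInputStream(
            Files.newInputStream(indexFile.toPath()))) {
          dis.readFully(buffer.array());
        }
      }

      // What the cache weigher should charge: payload plus fixed overhead.
      // offsets.capacity() * 8 recovers the old on-disk size, so the
      // dedicated size field is no longer needed.
      public int getRetainedMemorySize() {
        return offsets.capacity() * Long.BYTES + INSTANCE_MEMORY_OVERHEAD;
      }
    }

The weigher in ExternalShuffleBlockResolver would then call 
getRetainedMemorySize() instead of getSize().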

In effect this means that, for small files, we use up about 70-100 times as 
much memory as we intend to. Our NodeManagers OOM with 4 GB and more of 
indexShuffleCache.
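
To put rough numbers on that (assuming the default 
spark.shuffle.service.index.cache.size of 100m): the weigher admits about 
100 MB / 16 B, i.e. several million tiny entries, and if each of those 
actually retains a few hundred bytes to ~1 KB of heap (the 
ShuffleIndexInformation overhead plus the Guava cache entry and the File 
key), the cache really holds multiple gigabytes against a nominal 100 MB 
budget, which matches the OOMs we see.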



> Spark Shuffle Index Cache calculates memory usage wrong
> -------------------------------------------------------
>
>                 Key: SPARK-33206
>                 URL: https://issues.apache.org/jira/browse/SPARK-33206
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.4.0, 3.0.1
>            Reporter: Lars Francke
>            Priority: Major
>         Attachments: image001(1).png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
