[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chandni Singh updated SPARK-33331:
----------------------------------
    Summary: Limit the number of pending blocks in memory and store blocks that collide  (was: Limit the number of pending blocks in memory when RemoteBlockPushResolver defers a block)

> Limit the number of pending blocks in memory and store blocks that collide
> --------------------------------------------------------------------------
>
>                 Key: SPARK-33331
>                 URL: https://issues.apache.org/jira/browse/SPARK-33331
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Chandni Singh
>            Priority: Major
>
> 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}: when a block cannot be merged, it is added to this list. Currently there is no limit on the number of pending blocks. There has been a discussion around this here:
> https://github.com/apache/spark/pull/30062#discussion_r514026014
> 2. When a stream never gets an opportunity to merge, {{RemoteBlockPushResolver}} currently ignores that stream's data. Another approach is to store the stream's data in {{AppShufflePartitionInfo}} once this worst case is reached. This may increase the memory usage of the shuffle service; however, given the limit introduced in 1, we can try this out.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
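Point 1 above proposes capping the number of deferred blocks a stream callback may hold in memory. A minimal sketch of such a bound is below; the class and method names (`BoundedDeferredBufs`, `tryDefer`, `maxPendingBlocks`) are illustrative assumptions, not Spark's actual API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded list of deferred block buffers.
// When the cap is reached, tryDefer rejects the block instead of
// buffering it, so the caller can ignore the stream (or, per point 2,
// hand the data off to partition-level storage).
class BoundedDeferredBufs {
    private final int maxPendingBlocks;
    private final Deque<ByteBuffer> deferredBufs = new ArrayDeque<>();

    BoundedDeferredBufs(int maxPendingBlocks) {
        this.maxPendingBlocks = maxPendingBlocks;
    }

    /** Defer a block if under the limit; otherwise reject it. */
    boolean tryDefer(ByteBuffer block) {
        if (deferredBufs.size() >= maxPendingBlocks) {
            return false; // limit hit: do not grow memory further
        }
        deferredBufs.addLast(block);
        return true;
    }

    int pending() {
        return deferredBufs.size();
    }

    public static void main(String[] args) {
        BoundedDeferredBufs bufs = new BoundedDeferredBufs(2);
        System.out.println(bufs.tryDefer(ByteBuffer.allocate(4))); // true
        System.out.println(bufs.tryDefer(ByteBuffer.allocate(4))); // true
        System.out.println(bufs.tryDefer(ByteBuffer.allocate(4))); // false
        System.out.println(bufs.pending()); // 2
    }
}
```

The key design point is that the check happens before buffering, so memory held per stream is bounded by `maxPendingBlocks` times the block size regardless of how long merging is blocked.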