[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531 ]
wuyi edited comment on SPARK-33331 at 11/5/20, 7:12 AM:
--------------------------------------------------------

I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have a memory threshold (either memory size or block count). We can always fall back to the original behavior whenever we set the threshold to 0.

Another question: once we have the memory cache, when should the client retry the block? Should it retry immediately, or wait a few seconds depending on the number of deferred blocks?


was (Author: ngone51):
I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have a memory threshold (either memory size or block count). We can always fall back to the original behavior whenever they set the threshold to 0.

Another question: once we have the memory cache, when should the client retry the block? Should it retry immediately, or wait a few seconds depending on the number of deferred blocks?

> Limit the number of pending blocks in memory and store blocks that collide
> --------------------------------------------------------------------------
>
>                 Key: SPARK-33331
>                 URL: https://issues.apache.org/jira/browse/SPARK-33331
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Chandni Singh
>            Priority: Major
>
> This jira addresses the below two points:
> 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately
> are stored in memory. The stream callback maintains a list of
> {{deferredBufs}}. When a block cannot be merged, it is added to this list.
> Currently, there isn't a limit on the number of pending blocks. We can limit
> the number of pending blocks in memory. There has been a discussion around
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then
> {{RemoteBlockPushResolver}} ignores the data from that stream.
> Another approach is to store the data of the stream in
> {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This
> may increase the memory usage of the shuffle service. However, given the
> limit introduced in point 1, we can try this out.
> More information can be found in this discussion:
> [https://github.com/apache/spark/pull/30062#discussion_r517524546]
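
To make the threshold idea concrete, here is a minimal Java sketch of a bounded holder for deferred buffers. It is not the actual {{RemoteBlockPushResolver}} code. Only the name deferredBufs comes from the issue; the class BoundedDeferredBlocks, the limits maxBlocks and maxBytes, and the helper retryDelayMillis are hypothetical names introduced for illustration. The sketch assumes both limits are enforced together and that setting either one to 0 disables deferral, which matches the "fall back to the original behavior" case. The retry helper shows just one possible answer to the open question about retry timing (a capped linear wait), not an agreed design.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a bounded list of deferred buffers.
 * Not Spark's implementation; all names except deferredBufs are assumptions.
 */
final class BoundedDeferredBlocks {
  private final int maxBlocks;   // assumed limit on block count; 0 disables deferral
  private final long maxBytes;   // assumed limit on total bytes; 0 disables deferral
  private final List<ByteBuffer> deferredBufs = new ArrayList<>();
  private long deferredBytes = 0L;

  BoundedDeferredBlocks(int maxBlocks, long maxBytes) {
    if (maxBlocks < 0 || maxBytes < 0) {
      throw new IllegalArgumentException("limits must be non-negative");
    }
    this.maxBlocks = maxBlocks;
    this.maxBytes = maxBytes;
  }

  /**
   * Tries to keep a buffer that cannot be merged yet.
   * Returns false when the caller should drop it, as in the original behavior.
   */
  synchronized boolean tryDefer(ByteBuffer buf) {
    if (maxBlocks == 0 || maxBytes == 0) {
      return false; // threshold 0: never cache
    }
    int size = buf.remaining();
    if (deferredBufs.size() >= maxBlocks || deferredBytes + size > maxBytes) {
      return false; // accepting this buffer would exceed a limit
    }
    deferredBufs.add(buf);
    deferredBytes += size;
    return true;
  }

  /** Removes and returns all deferred buffers, e.g. once the stream may merge. */
  synchronized List<ByteBuffer> drain() {
    List<ByteBuffer> out = new ArrayList<>(deferredBufs);
    deferredBufs.clear();
    deferredBytes = 0L;
    return out;
  }

  synchronized int pendingBlocks() {
    return deferredBufs.size();
  }

  synchronized long pendingBytes() {
    return deferredBytes;
  }

  /** One possible retry policy: wait grows linearly with pending blocks, up to a cap. */
  static long retryDelayMillis(int pendingBlocks, long perBlockMillis, long maxDelayMillis) {
    return Math.min(maxDelayMillis, Math.max(0, pendingBlocks) * perBlockMillis);
  }
}
```

With new BoundedDeferredBlocks(2, 100), a third buffer is rejected even if it is empty, and a buffer that would push the total past 100 bytes is rejected even when the block count still has room. A caller that gets false back would ignore the data as today.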