[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandni Singh updated SPARK-33331:
----------------------------------
    Description: 
This jira addresses the following two points:
 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}; 
when a block cannot be merged, it is added to this list. Currently there is no 
limit on the number of pending blocks, so we can cap the number of pending 
blocks held in memory. There has been a discussion around this here:
[https://github.com/apache/spark/pull/30062#discussion_r514026014]
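
A minimal sketch of what such a cap could look like (the names 
{{DeferredBufferQueue}}, {{maxDeferredBlocks}}, and {{tryDefer}} are 
hypothetical illustrations, not the actual Spark API):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded deferred-buffer list: once the cap is
// reached, new blocks are rejected instead of accumulating in memory.
class DeferredBufferQueue {
    private final int maxDeferredBlocks;
    private final Deque<ByteBuffer> deferredBufs = new ArrayDeque<>();

    DeferredBufferQueue(int maxDeferredBlocks) {
        this.maxDeferredBlocks = maxDeferredBlocks;
    }

    /** Returns true if the block was deferred, false if the cap was hit. */
    boolean tryDefer(ByteBuffer block) {
        if (deferredBufs.size() >= maxDeferredBlocks) {
            return false; // caller can then drop the block or fail the stream
        }
        deferredBufs.addLast(block);
        return true;
    }

    int size() {
        return deferredBufs.size();
    }
}
```

With a cap in place, the stream callback gets a clear signal when memory 
pressure is too high and can react, rather than growing the list unboundedly.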

2. When a stream doesn't get an opportunity to merge, 
{{RemoteBlockPushResolver}} currently ignores the data from that stream. 
Another approach is to store the stream's data in {{AppShufflePartitionInfo}} 
when this worst-case scenario is reached. This may increase the memory usage 
of the shuffle service, but given the limit introduced in point 1, we can try 
this out.
 More information can be found in this discussion:
 [https://github.com/apache/spark/pull/30062#discussion_r517524546]
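
A sketch of the idea, assuming a per-partition guard where a stream that 
loses the merge race parks its bytes instead of dropping them (the class and 
method names here are hypothetical, not the real {{AppShufflePartitionInfo}} 
API):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of ignoring a stream that could not merge,
// park its bytes on the partition info so they can be merged later.
class PartitionInfoSketch {
    private boolean mergeInProgress = false;
    private final List<ByteBuffer> collidedData = new ArrayList<>();

    /** Returns true if the caller may merge now; otherwise its data is parked. */
    synchronized boolean mergeOrPark(ByteBuffer data) {
        if (mergeInProgress) {
            collidedData.add(data); // kept instead of dropped; costs memory
            return false;
        }
        mergeInProgress = true;
        return true;
    }

    /** Ends the current merge and hands back any parked data for merging. */
    synchronized List<ByteBuffer> finishMerge() {
        mergeInProgress = false;
        List<ByteBuffer> parked = new ArrayList<>(collidedData);
        collidedData.clear();
        return parked;
    }
}
```

The trade-off is exactly the one noted above: parked buffers consume shuffle 
service memory, which is why the cap from point 1 is a prerequisite.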



> Limit the number of pending blocks in memory and store blocks that collide
> --------------------------------------------------------------------------
>
>                 Key: SPARK-33331
>                 URL: https://issues.apache.org/jira/browse/SPARK-33331
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle
>    Affects Versions: 3.1.0
>            Reporter: Chandni Singh
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
