Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388

Ideally we do both. I think https://github.com/apache/spark/pull/18487 will already help you on the reducer side: it lets you limit the number of blocks being fetched per remote address in one call, so you should see at most 1k * spark.reducer.maxBlocksInFlightPerAddress blocks in flight at any given time. Say you set it to 20: 20 * 1k is much better than 3.5M, and even at the 5k you mention, that's only 100k. Note that MapReduce has a very similar concept, set to 20, and it has been working well for many years. I think this should be easier to tune than the other configs because you are still fetching from the same number of hosts as before; it's just chunked. We've done some testing with that and haven't seen any performance difference between Int.MaxValue and, say, 20. We are still doing more perf testing as well.

We could also add flow control on the shuffle server side, where the server only creates a configurable number of outbound buffers before waiting for them to be sent; once one is sent, it creates another. This might be a bit more work on the shuffle server, but I haven't looked at it in detail.

I assume in your case the 1k reducers were spread across multiple different jobs? Is it always the same big jobs that cause issues? If it were the same job, I would use spark.reducer.maxReqsInFlight temporarily. We use that for some of our larger jobs that were causing issues and didn't see any performance impact with it set to 100, but it's going to be job-specific.
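The per-address cap arithmetic above can be sketched as a quick back-of-envelope calculation. This is only an illustration of the bound described in the comment, assuming spark.reducer.maxBlocksInFlightPerAddress caps blocks per remote host; the function name is hypothetical, not a Spark API.

```python
def max_blocks_in_flight(num_map_hosts: int, max_blocks_per_address: int) -> int:
    """Upper bound on shuffle blocks one reducer can have in flight,
    assuming a per-remote-address cap on concurrently fetched blocks."""
    return num_map_hosts * max_blocks_per_address

# With the cap set to 20: 1k hosts gives at most 20k blocks in flight
# (versus ~3.5M uncapped), and even 5k hosts gives only 100k.
print(max_blocks_in_flight(1_000, 20))  # 20000
print(max_blocks_in_flight(5_000, 20))  # 100000
```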
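The server-side flow-control idea (a configurable number of outbound buffers, creating a new one only after a previous one is sent) is essentially a counting-semaphore pattern. A minimal sketch, with entirely hypothetical names; this is not Spark's shuffle-server code, just the shape of the proposal:

```python
import threading

class OutboundBufferPool:
    """Bound the number of outbound buffers that exist at once.

    create_buffer() blocks when `limit` buffers are outstanding;
    on_sent() frees a slot once the network layer reports a send.
    """

    def __init__(self, limit: int):
        self._slots = threading.Semaphore(limit)

    def create_buffer(self, block: bytes) -> bytes:
        self._slots.acquire()  # wait until a buffer slot is free
        return block           # stand-in for allocating a real network buffer

    def on_sent(self) -> None:
        self._slots.release()  # a buffer was sent; a new one may be created
```

With limit=2, a third create_buffer() call would block until on_sent() is invoked, which is exactly the back-pressure behavior described: memory on the shuffle server stays bounded regardless of how many reducers are requesting blocks.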