Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388

Ok, sorry, I forgot you had the screenshot there. So, as you mention in that post, if we are just creating too many outbound buffers before they can actually be sent over the network, then we should try to add some flow control. Did you check what the buffers were for? How many connections did you have, and how many blocks was each fetching? A million is a lot either way, but I'm assuming it's something like 500 connections each fetching 2000 blocks. If that is the case, it seems like it would be good to add flow control here rather than just disconnecting based on memory. Really, having both would be good, with this as a fallback, but the flow control part should allow everyone to start fetching without rejecting a bunch, especially if the network can't push it out that fast anyway. For instance, only create a handful of those outgoing buffers and wait for successful-send acknowledgements for those before creating more. This might be a bit more complex.
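To illustrate the flow-control idea above (cap the number of in-flight outbound buffers, and only create a new one once an earlier one has been reported as sent), here is a minimal sketch using a counting semaphore. This is purely hypothetical: `OutboundFlowControl`, `createBuffer`, and `onSendComplete` are made-up names for illustration, not part of Spark's actual network layer.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of flow control for outbound buffers.
// A permit must be taken before a buffer is created, and is returned
// from the (simulated) send-complete callback, so at most maxInFlight
// buffers exist at once instead of one per requested block.
public class OutboundFlowControl {
    private final Semaphore permits;

    public OutboundFlowControl(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks until capacity is available, then allocates a buffer.
    public byte[] createBuffer(int size) {
        permits.acquireUninterruptibly();
        return new byte[size];
    }

    // Invoked by the network layer once a buffer has been flushed,
    // freeing capacity for the next outbound buffer.
    public void onSendComplete() {
        permits.release();
    }

    public int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        OutboundFlowControl fc = new OutboundFlowControl(4);
        byte[] a = fc.createBuffer(1024);
        byte[] b = fc.createBuffer(1024);
        System.out.println("in flight: " + (4 - fc.availablePermits()));
        fc.onSendComplete(); // simulate the network flushing one buffer
        System.out.println("in flight: " + (4 - fc.availablePermits()));
    }
}
```

With the real Netty-based transport, the same effect could likely be had by watching channel writability rather than counting permits by hand, but the semaphore makes the back-pressure idea explicit: producers stall instead of piling up a million buffers in memory.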