Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18388
  
    Ideally we do both. 
    
    I think https://github.com/apache/spark/pull/18487 will already help you on
the reducer side. It allows you to limit the # of blocks it's fetching in one
call, so you should see at most 1k * spark.reducer.maxBlocksInFlightPerAddress
blocks in flight at any given time. Say you set it to 20: 20 * 1k is much
better than 3.5M, and even at the 5k you mention, that's only 100k. Note that
MapReduce has a very similar concept, set to 20, and it has been working well
for many years. I think this should be easier to tune than the other configs
because you are still fetching from the same # of hosts you were before; it's
just chunking the fetches. We've done some testing with that and haven't seen
any performance difference between Int.MaxValue and, say, 20. We are still
doing more perf testing as well.
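
    For reference, a minimal sketch of setting that cap via SparkConf (the
value 20 here is just the illustrative setting from above, not a recommended
default):

        // Sketch: cap how many blocks the reducer fetches from any one
        // address at a time (the config added by #18487).
        import org.apache.spark.SparkConf

        val conf = new SparkConf()
          // With ~1k shuffle servers, at most 20 * 1k = 20k blocks are in
          // flight at once, instead of all 3.5M.
          .set("spark.reducer.maxBlocksInFlightPerAddress", "20")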
    
    We can also add flow control on the shuffle server side, where we only
create a configurable number of outbound buffers before waiting for them to be
sent; once one is sent, you create another. This might be a bit more work on
the shuffle server, but I haven't looked at it in detail. Roughly what I have
in mind is sketched below.
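
    A hypothetical sketch only (the name OutboundBufferLimiter is made up,
and this is not actual shuffle server code): gate buffer creation on a
semaphore that is released when the transport reports the send complete.

        import java.util.concurrent.Semaphore

        // Hypothetical: at most maxOutboundBuffers response buffers exist at
        // once; a permit is released from the send-complete callback, so a
        // new buffer is only created after a previous one has gone out.
        class OutboundBufferLimiter(maxOutboundBuffers: Int) {
          private val permits = new Semaphore(maxOutboundBuffers)

          // `transfer` must invoke the supplied callback when the send finishes.
          def sendWithFlowControl(transfer: (() => Unit) => Unit): Unit = {
            permits.acquire()                 // wait for a free buffer slot
            transfer(() => permits.release()) // free the slot on completion
          }
        }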
    
    I assume in your case the 1k reducers were spread across multiple
different jobs? Is it always the same big jobs that cause issues? If it is the
same job, I would use spark.reducer.maxReqsInFlight temporarily. We use that
for some of our larger jobs that were causing issues and didn't see any
performance impact with it set to 100, but the right value is going to be
job-specific.
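
    For a single problem job, that's just something like the following (100
being the value that worked for us; tune it per job):

        // Sketch: per-job cap on concurrent fetch requests. 100 worked for
        // our large jobs, but the right value is job-specific.
        import org.apache.spark.SparkConf

        val conf = new SparkConf()
          .set("spark.reducer.maxReqsInFlight", "100")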

