thinkharderdev commented on issue #832:
URL: https://github.com/apache/arrow-ballista/issues/832#issuecomment-1622093766

   I agree with @yahoNanJing here. Having task slots = CPU cores is a
reasonable default, but different workloads have different constraints.
Sometimes the bottleneck is CPU, sometimes it is memory (e.g. if you do a lot
of joins and high-cardinality aggregations), and it could even be something
like network bandwidth or disk size. Trying to "derive" the task slots from
some static config values might just get confusing and complicated, so it
seems like a more maintainable approach is to have a sensible, easily
explainable default and then let users configure task concurrency based on
their use case and whatever parameters they want to include. As mentioned,
this should be easy to accomplish with a script that picks the right task
concurrency and then runs the executor binary, passing that value to the
existing config.
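
A rough sketch of what such a wrapper script could look like, in Python. The names `pick_task_slots` and `executor_command` are hypothetical, the memory-per-task figure is something each user would choose for their own workload, and the binary name `ballista-executor` and the `--concurrent-tasks` flag are assumptions that should be checked against the executor's `--help` output:

```python
def pick_task_slots(cpu_cores, mem_bytes, mem_per_task_bytes):
    """Pick a task concurrency bounded by both CPU and memory.

    Hypothetical policy: take the smaller of the core count and the
    number of tasks that fit in memory, but never go below one slot.
    """
    slots_by_memory = mem_bytes // mem_per_task_bytes
    return max(1, min(cpu_cores, slots_by_memory))


def executor_command(slots, binary="ballista-executor"):
    """Build the argv for launching the executor.

    The binary name and the --concurrent-tasks flag are assumptions;
    verify them against the installed executor before relying on this.
    """
    return [binary, "--concurrent-tasks", str(slots)]
```

The resulting list can be handed to `subprocess.run` or `os.execvp`, with `os.cpu_count()` and the host's memory size as inputs. Other constraints such as disk or network would just be extra terms inside the `min`.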


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
