thinkharderdev commented on issue #832: URL: https://github.com/apache/arrow-ballista/issues/832#issuecomment-1622093766
I agree with @yahoNanJing here. Setting task slots = CPU cores is a reasonable default, but different workloads have different constraints. Sometimes the bottleneck is CPU, sometimes it is memory (e.g. if you do a lot of joins and high-cardinality aggregations), and it could even be something like network bandwidth or disk size. Trying to "derive" the task slots from some static config values might just get confusing and complicated, so a more maintainable approach seems to be to have a sensible, easily explainable default and then let users configure task concurrency based on their use case and whatever parameters they want to include. As mentioned, this should be easily accomplished with a script that picks the right task concurrency and then runs the executor binary, passing that value to the existing config.
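
A rough sketch of what such a wrapper script could look like, for illustration only. Everything here is an assumption rather than something taken from the thread: the binary name `ballista-executor`, the `--concurrent-tasks` flag (check the executor's `--help` for the actual option name), the `MEMORY_PER_TASK_BYTES` environment variable, and the 2 GiB per-task budget are all hypothetical choices. The memory lookup uses `os.sysconf` and is assumed to run on Linux.

```python
import os
import shlex
import sys


def pick_task_slots(cpu_cores: int, memory_bytes: int, memory_per_task_bytes: int) -> int:
    """Return the smaller of the CPU-bound and memory-bound slot counts, never below 1."""
    by_cpu = cpu_cores
    by_memory = memory_bytes // memory_per_task_bytes
    return max(1, min(by_cpu, by_memory))


def executor_command(binary: str, task_slots: int) -> list[str]:
    """Build the executor command line for the chosen concurrency."""
    # ASSUMPTION: the flag name is a guess; confirm it against the executor's --help.
    return [binary, "--concurrent-tasks", str(task_slots)]


def total_memory_bytes() -> int:
    """Total physical memory in bytes (assumes a Linux host)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Hypothetical knob: per-task memory budget, defaulting to 2 GiB.
    per_task = int(os.environ.get("MEMORY_PER_TASK_BYTES", 2 * 1024**3))
    slots = pick_task_slots(cores, total_memory_bytes(), per_task)
    cmd = executor_command("ballista-executor", slots) + sys.argv[1:]
    # Dry run: print the command. Swapping this line for
    # os.execvp(cmd[0], cmd) would launch the executor instead.
    print(shlex.join(cmd))
```

The point of keeping `pick_task_slots` separate is that each deployment can swap in its own policy (network bandwidth, disk size, and so on) without touching the executor itself, which only ever sees the final number.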
