wesm edited a comment on pull request #10420:
URL: https://github.com/apache/arrow/pull/10420#issuecomment-857717570


   I'm not able to wade deeply into the implementation details, but in general 
I'm excited about deploying work stealing more broadly in the codebase as a 
means of managing nested parallelism. This is particularly relevant not only 
for query execution but also controlling resource fairness between 
concurrently-executing file readers (CSV, Parquet, etc.) or IPC message 
decompression (which can also be parallelized), and anywhere else where we have 
introduced parallelism. 
   
   That said, there isn't necessarily a *requirement* that everything be made 
work-stealing. Rather, any workload (like reading a file) should have access to 
a non-global task queue to put its child tasks into. Whether the task queue 
happens to be global or not (versus a thread-local queue where idle threads 
will steal work) is up to the application developer. I imagine to reach this 
goal will require some refactoring from where we are now, but it seems to me — 
from first principles — like the right way to go. Let me know if this is 
consistent with your thinking (since I'm thinking out loud a little bit myself 
and you've spent more time thinking about this and working on the 
parallel/asynchronous computing machinery around the codebase in recent times).
   
   Might be good to discuss this on the mailing list, too, for increased 
visibility. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to