Re: [I] Create a pipeline/morsel scheduler [datafusion]

via GitHub Mon, 20 Apr 2026 07:29:54 -0700


Dandandan commented on issue #21719:
URL: https://github.com/apache/datafusion/issues/21719#issuecomment-4281659435


   It probably needs some metric to see how often a thread is moved between cores with the current approach.
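As a rough illustration of such a metric (a sketch under stated assumptions, not existing DataFusion or Tokio code), a future can be wrapped so that it counts how often it is polled on a different OS thread than on its previous poll. `TrackMigrations`, `YieldOnce` and `noop_waker` are hypothetical names. A thread ID is only a proxy for a core, because the OS can also move a worker thread between cores; observing that directly needs OS-specific calls such as `sched_getcpu` on Linux.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, ThreadId};

/// Hypothetical wrapper: counts how often the wrapped future is polled on a
/// different OS thread than on its previous poll. Under a work-stealing
/// runtime this approximates how often a task hops between worker threads.
pub struct TrackMigrations<F> {
    inner: F,
    last_thread: Option<ThreadId>,
    migrations: Arc<AtomicUsize>,
}

impl<F> TrackMigrations<F> {
    pub fn new(inner: F, migrations: Arc<AtomicUsize>) -> Self {
        Self { inner, last_thread: None, migrations }
    }
}

impl<F: Future + Unpin> Future for TrackMigrations<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let current = thread::current().id();
        // A change of thread between consecutive polls counts as one migration.
        if let Some(prev) = self.last_thread {
            if prev != current {
                self.migrations.fetch_add(1, Ordering::Relaxed);
            }
        }
        self.last_thread = Some(current);
        Pin::new(&mut self.inner).poll(cx)
    }
}

/// Tiny demo future: Pending on the first poll, Ready on the second.
pub struct YieldOnce(pub bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing, so the demo can poll futures by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}
```

In a runtime with work stealing, each partition's future could be wrapped before it is spawned and the shared counter reported alongside other execution metrics; `YieldOnce` and `noop_waker` exist only so the wrapper can be exercised by hand.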
   
   I think a metric or example that showcases this in different scenarios would help first. Simple scan+filter+aggregate/join queries (like ClickBench) probably have very good thread locality "by accident", as each task will likely stick to a single thread anyway. In more nested scenarios (e.g. partitions > cores), however, I suspect there are suboptimal cases where we simply spawn too many tasks at once.
   
   But perhaps that doesn't need a "morsel scheduler" per se, just better pipelining.
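To illustrate the "better pipelining" alternative for the partitions > cores case, here is a minimal sketch using plain `std` scoped threads instead of an async runtime: rather than spawning one task per partition up front, a fixed number of workers pull partition indices from a shared counter. `run_partitions_bounded` is a hypothetical helper, not a DataFusion API, and the `job` closure stands in for executing one partition.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: runs `job(p)` for every partition `p` in
/// `0..num_partitions`, but with at most `max_workers` threads. Each worker
/// claims the next unclaimed partition from a shared counter and runs it to
/// completion before claiming another, so no more than `max_workers`
/// partitions are ever in flight. Results are returned in partition order.
pub fn run_partitions_bounded<T, F>(num_partitions: usize, max_workers: usize, job: F) -> Vec<T>
where
    T: Send,
    F: Fn(usize) -> T + Sync,
{
    let next = AtomicUsize::new(0);
    // Never start more workers than partitions, and always at least one.
    let workers = max_workers.clamp(1, num_partitions.max(1));

    let mut results: Vec<(usize, T)> = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut local = Vec::new();
                loop {
                    // Claim the next partition; stop once all are claimed.
                    let p = next.fetch_add(1, Ordering::Relaxed);
                    if p >= num_partitions {
                        break;
                    }
                    local.push((p, job(p)));
                }
                local
            }));
        }
        let mut all = Vec::with_capacity(num_partitions);
        for h in handles {
            all.extend(h.join().expect("worker thread panicked"));
        }
        all
    });

    // Workers finish in arbitrary order; restore partition order.
    results.sort_by_key(|(p, _)| *p);
    results.into_iter().map(|(_, t)| t).collect()
}
```

With `max_workers` taken from `std::thread::available_parallelism()`, at most that many partitions are in flight no matter how many partitions the plan has, and each partition's work stays on the worker thread that claimed it.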
   
   > I think the biggest improvement would be to avoid having to move IO across cores (aka do IO and then read it, rather than stalling while waiting for IO from some other thread/core on a blocking thread)
   
   Yes, for scan performance, agreed!
   

