Dandandan commented on issue #21719: URL: https://github.com/apache/datafusion/issues/21719#issuecomment-4281659435
It probably needs some metric to see how often a thread is moved across cores with the current approach. I think a metric or example that showcases this in different scenarios would help first. Simple scan + filter + aggregate/join queries (like ClickBench) may have very good thread locality "by accident", as the tasks will probably stick to a single thread anyway. In more nested scenarios (e.g. partitions > cores), though, I feel there might be suboptimal cases where we just spawn too many tasks at once. But perhaps that doesn't need a "morsel scheduler" per se, just better pipelining.

> I think the biggest improvement would be to avoid having to move IO across cores (aka do IO and then read it, rather than stalling having to wait for IO from some other thread/core on a blocking thread)

Yes, for scan performance, agreed!
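As a rough illustration of the kind of metric suggested above, here is a minimal sketch that counts how often a future is polled on a different OS thread than on its previous poll. It is only a proxy: it measures task-to-thread movement, not actual core migration, which also depends on OS scheduling and thread pinning. The names `MigrationStats`, `TrackMigrations`, `YieldOnce` and `demo` are hypothetical and are not part of DataFusion or any runtime; only the Rust standard library is used.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, ThreadId};

/// Hypothetical shared counters; not an existing DataFusion metric.
#[derive(Default)]
pub struct MigrationStats {
    pub polls: AtomicU64,
    pub migrations: AtomicU64,
}

/// Wraps a future and records whenever it is polled on a different
/// thread than on its previous poll.
pub struct TrackMigrations<F> {
    inner: Pin<Box<F>>,
    last_thread: Option<ThreadId>,
    stats: Arc<MigrationStats>,
}

impl<F: Future> TrackMigrations<F> {
    pub fn new(inner: F, stats: Arc<MigrationStats>) -> Self {
        Self { inner: Box::pin(inner), last_thread: None, stats }
    }
}

impl<F: Future> Future for TrackMigrations<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // All fields are Unpin, so the wrapper itself is Unpin.
        let this = self.get_mut();
        let current = thread::current().id();
        this.stats.polls.fetch_add(1, Ordering::Relaxed);
        if let Some(prev) = this.last_thread {
            if prev != current {
                this.stats.migrations.fetch_add(1, Ordering::Relaxed);
            }
        }
        this.last_thread = Some(current);
        this.inner.as_mut().poll(cx)
    }
}

/// Toy future: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; sufficient because `demo` polls by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the wrapped future once on the calling thread and once on a
/// freshly spawned thread, then returns (polls, migrations).
pub fn demo() -> (u64, u64) {
    let stats = Arc::new(MigrationStats::default());
    let mut task = TrackMigrations::new(YieldOnce(false), Arc::clone(&stats));

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(Pin::new(&mut task).poll(&mut cx).is_pending());

    let handle = thread::spawn(move || {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        assert!(Pin::new(&mut task).poll(&mut cx).is_ready());
    });
    handle.join().unwrap();

    (
        stats.polls.load(Ordering::Relaxed),
        stats.migrations.load(Ordering::Relaxed),
    )
}
```

In a multi-threaded async runtime, one could presumably wrap each spawned partition task this way and compare the migrations-to-polls ratio across query shapes (for example a simple scan + filter + aggregate versus a plan with partitions > cores). That usage is an assumption for illustration, not something measured here.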
