darmie commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3941410790

   > One other direction I am exploring is to see if morsel-driven execution can help here.
   >
   > One hypothesis is that filter pushdown pushes more CPU work (especially in the case of dynamic queries) and serial IO (i.e. each individual RowFilter) + some additional overhead so slow / skewed partitions will become even more slow.
   >
   > With morsel-driven execution we might be able to mitigate this effect, as we can distribute the work better by planning the work using a queue (and so any overhead or file IO latencies will be spread out more).
   >
   > PoC is here [#20477](https://github.com/apache/datafusion/pull/20477) - it seems it gives quite a bit of speedups on Clickbench(!) (without filter pushdown) though I see some large slowdowns on TPCH SF10 as well, probably as it doesn't benefit much (as far as I remember data / filters are perfectly distributed and files seem to contain many row groups) and probably hurts locality as implemented.
   
   Is the TPC-H regression purely a cache-locality effect, or is there queue
contention overhead too? I'm also curious whether the ClickBench speedups hold
with filter pushdown enabled, since morsel scheduling and filter-induced work
variance could interact and compound.
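
To make the question concrete, here is a toy model of the trade-off. It is
not DataFusion code and does not reflect how the PoC schedules work; every
name in it (`static_makespan`, `queue_makespan`, `dequeue_overhead`) is
hypothetical. It compares fixed per-worker partitions with a shared work
queue, using a flat per-morsel cost as a crude stand-in for queue contention
or lost locality.

```python
import heapq


def static_makespan(costs, workers):
    """Each worker owns a fixed, contiguous slice of morsels (partition-style)."""
    per_worker = [0.0] * workers
    chunk = -(-len(costs) // workers)  # ceiling division
    for i, cost in enumerate(costs):
        per_worker[i // chunk] += cost
    return max(per_worker)


def queue_makespan(costs, workers, dequeue_overhead=0.0):
    """Workers pull the next morsel from a shared queue as soon as they are free.

    dequeue_overhead is an assumed flat cost per morsel, standing in for
    queue contention or lost cache locality.
    """
    free_at = [0.0] * workers  # min-heap of times at which each worker goes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest-idle worker takes the morsel
        heapq.heappush(free_at, start + cost + dequeue_overhead)
    return max(free_at)


# Skewed: the first partition's morsels are 10x as expensive
# (e.g. an expensive pushed-down filter on one file).
skewed = [10.0] * 4 + [1.0] * 12
# Uniform: every morsel costs the same.
uniform = [1.0] * 16

print(static_makespan(skewed, 4), queue_makespan(skewed, 4))
print(static_makespan(uniform, 4), queue_makespan(uniform, 4, dequeue_overhead=0.25))
```

With four workers, the skewed case finishes in 40 time units under static
partitioning and 13 with the queue, while the uniform case goes from 4 to 5
once the assumed overhead of 0.25 per morsel is added. The model cannot tell
contention apart from locality loss, which is why a breakdown for the TPC-H
runs would be useful.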


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


