Re: [I] Make ClickBench Q23 Go Faster [datafusion]

via GitHub Thu, 20 Mar 2025 08:29:35 -0700


alamb commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740855773


   > I did not fully get this part. DF has semi join support and some rewrites 
to utilize it in similar cases?
   
   > The query transformation in SQL as given by @xudong963 is optimized to a 
SEMI join + TopK, so I think it could be implemented as logical optimization 
rule (i.e. adding a filter with subquery on the ids).
   
   @Dandandan  -- 
   
   
   I think it would be interesting to try rewriting q23 manually to that 
pattern and see how fast it goes.
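
   To make the pattern concrete, here is a minimal sketch of the manual 
rewrite on a toy table, using SQLite from Python rather than DataFusion. It 
assumes q23 has the shape `SELECT * ... WHERE url LIKE '%google%' ORDER BY 
event_time LIMIT 10`; the table layout, the unique `id` column, and the data 
are hypothetical stand-ins, and the exact SQL proposed earlier in the thread 
may differ.

```python
import sqlite3

# Toy stand-in for the ClickBench `hits` table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits ("
    "id INTEGER PRIMARY KEY, url TEXT, event_time INTEGER, payload TEXT)"
)
rows = [
    (i, f"https://{'google' if i % 3 == 0 else 'example'}.com/{i}", 1000 - i, f"p{i}")
    for i in range(100)
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)

# Assumed shape of the original query: filter, sort, keep 10 rows.
original = """
SELECT * FROM hits
WHERE url LIKE '%google%'
ORDER BY event_time
LIMIT 10
"""

# Manual rewrite: the subquery finds the 10 winning ids using only the
# filter and sort columns; the outer query fetches all columns for them.
rewritten = """
SELECT * FROM hits
WHERE id IN (
    SELECT id FROM hits
    WHERE url LIKE '%google%'
    ORDER BY event_time
    LIMIT 10
)
ORDER BY event_time
LIMIT 10
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

   The two forms are only guaranteed to agree when the sort key has no ties at 
the cutoff (the toy data uses unique `event_time` values) and `id` is unique.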
   
   I suspect (but have not measured) that if we implemented this rewrite we 
would find it runs much more slowly than the existing code, because the entire 
input file (all columns) would be decoded and all but 10 rows would be thrown 
away.
   
   To avoid this we need to push the join filters into the scan (and get 
predicate pushdown on by default).
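
   As a rough illustration of why pushing the filter into the scan matters, 
the sketch below models a columnar file as plain Python lists and counts how 
many values each column decodes. It is not how DataFusion's Parquet reader 
works; `decode`, `top_k_late`, and the column names are hypothetical.

```python
import heapq

N = 1000
# Hypothetical columnar layout: one list per column.
columns = {
    "url": [("google" if i % 4 == 0 else "example") + f".com/{i}" for i in range(N)],
    "event_time": [N - i for i in range(N)],  # unique sort key
    "title": [f"title {i}" for i in range(N)],
    "referer": [f"ref {i}" for i in range(N)],
}
decoded = []  # (column name, number of values decoded)


def decode(name, row_ids=None):
    """Stand-in for column decoding; row_ids=None decodes every row."""
    values = columns[name]
    if row_ids is None:
        decoded.append((name, len(values)))
        return list(values)
    decoded.append((name, len(row_ids)))
    return [values[i] for i in row_ids]


def top_k_late(k):
    # Phase 1: decode only the filter and sort columns.
    urls = decode("url")
    times = decode("event_time")
    matching = (i for i, u in enumerate(urls) if "google" in u)
    top_ids = heapq.nsmallest(k, matching, key=lambda i: times[i])
    # Phase 2: decode the remaining columns for the k surviving rows only.
    rest = {
        name: decode(name, top_ids)
        for name in columns
        if name not in ("url", "event_time")
    }
    return [
        {"url": urls[i], "event_time": times[i],
         **{name: rest[name][pos] for name in rest}}
        for pos, i in enumerate(top_ids)
    ]


result = top_k_late(10)
print(decoded)
```

   Without the second phase being restricted to the surviving row ids, every 
column would be decoded for all rows, which is the slowdown suspected above.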


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org
