alamb commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740855773
> I did not fully get this part. DF has semi join support and some rewrites to utilize it in similar cases?

> The query transformation in SQL as given by @xudong963 is optimized to a SEMI join + TopK, so I think it could be implemented as a logical optimization rule (i.e. adding a filter with a subquery on the ids).

@Dandandan -- I think it would be interesting to try rewriting q23 manually to that pattern and see how fast it goes.

I suspect (but have not measured) that if we implemented this rewrite, it would run much more slowly than the existing code, because the entire input file (all columns) would be decoded and all but 10 rows would be thrown away.

To avoid this we need to push the join filters into the scan (and get predicate pushdown on by default).
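For anyone who wants to try the manual rewrite, here is a rough sketch of the pattern as I understand it: a filter with a subquery on the ids, wrapped around a TopK. It uses Python's built-in `sqlite3` rather than DataFusion, so it only checks that the two query shapes return the same rows and says nothing about performance. The `hits` table, its columns (`id`, `url`, `event_time`, `payload`), and the `LIKE '%google%'` filter are all made up for illustration; the real q23 text and schema are not reproduced here.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table standing in for the real input (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        "id INTEGER PRIMARY KEY, url TEXT, event_time INTEGER, payload TEXT)"
    )
    rows = [
        (
            i,
            f"https://{'google' if i % 3 == 0 else 'example'}.com/{i}",
            1000 - i,  # unique sort key, so the TopK result is deterministic
            f"wide-column-{i}",  # stands in for the many columns that are costly to decode
        )
        for i in range(100)
    ]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    return conn


# Direct form: filter, sort, and keep the first 10 rows with all columns.
DIRECT = """
SELECT * FROM hits
WHERE url LIKE '%google%'
ORDER BY event_time, id
LIMIT 10
"""

# Rewritten form: compute the TopK over the ids only, then fetch the full
# rows for just those ids (the "filter with a subquery on the ids" shape).
REWRITTEN = """
SELECT * FROM hits
WHERE id IN (
    SELECT id FROM hits
    WHERE url LIKE '%google%'
    ORDER BY event_time, id
    LIMIT 10
)
ORDER BY event_time, id
LIMIT 10
"""

conn = build_demo_db()
direct_rows = conn.execute(DIRECT).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()

# Both forms should return the same 10 rows in the same order.
print(direct_rows == rewritten_rows)
```

Whether the rewritten form is faster in DataFusion would depend on whether the outer scan can use the id filter to skip decoding, which is the concern raised above; this sketch does not measure that.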