alamb opened a new pull request, #16731: URL: https://github.com/apache/datafusion/pull/16731
## Which issue does this PR close?

- Related to https://github.com/apache/datafusion/issues/3463
- Closes https://github.com/apache/datafusion/issues/16729

## Rationale for this change

In order to enable `filter_pushdown` by default, we need to ensure it doesn't regress existing performance.

However, it has been very hard to make forward progress on improving filter pushdown because all our benchmarks compare filter pushdown on against filter pushdown off, so the bar for change is quite high. Here is the most recent example:

- https://github.com/apache/datafusion/pull/16711

It seems obvious, but the right metric for improvements to filter pushdown is a comparison in which filter pushdown is already on. However, we don't have any such benchmark (see https://github.com/apache/datafusion/issues/16729 and https://github.com/apache/datafusion/pull/16730 for why the existing benchmarks are not good enough).

## What changes are included in this PR?

Add a benchmark (`clickbench_pushdown`) that turns on `filter_pushdown` and `reorder_filters`.

You can run it like this:

```shell
./benchmarks/bench.sh run clickbench_pushdown
```

Which then invokes

```shell
cargo run --release --bin dfbench -- clickbench --pushdown --iterations 5 \
  --path /Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned \
  --queries-path /Users/andrewlamb/Software/datafusion/benchmarks/queries/clickbench/queries \
  -o /Users/andrewlamb/Software/datafusion/benchmarks/results/alamb_new_filter_pushdown/clickbench_partitioned.json
```

## Are these changes tested?

I tested it manually and also did some profiling on Q30 to verify that filter pushdown is indeed being invoked (TODO picture)

## Are there any user-facing changes?

No, this is a development process change only.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org