Omega359 opened a new issue, #20002: URL: https://github.com/apache/datafusion/issues/20002
### Describe the bug

While investigating #17261, it became apparent that one of the largest consumers of CPU time during planning of the sql_planner_extended benchmark is the PushDownFilter OptimizerRule. I instrumented DataFusion with some logging and ran the benchmark with the command listed under "To Reproduce".

The full output can be seen in [this gist](https://gist.github.com/Omega359/978e208b401f6af03fdf00fd8af63938), but the pertinent part is below:

```
[2026-01-25T15:40:20Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule push_down_limit took > 50ms: 90ms
[2026-01-25T15:43:14Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule push_down_filter took > 50ms: 174174ms
[2026-01-25T15:43:15Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule single_distinct_aggregation_to_group_by took > 50ms: 164ms
[2026-01-25T15:43:15Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule eliminate_group_by_constant took > 50ms: 159ms
[2026-01-25T15:43:16Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule common_sub_expression_eliminate took > 50ms: 1313ms
[2026-01-25T15:43:20Z INFO datafusion_optimizer::optimizer] Optimization (round 0) for rule optimize_projections took > 50ms: 3389ms
```

As you can see, quite a few optimizer rules use too much CPU during planning, but push_down_filter is the most egregious, taking 174 seconds (174174 ms) to complete. The screenshots of the samply output show where most of that time seems to be going.
<img width="2476" height="1522" alt="Image" src="https://github.com/user-attachments/assets/f20ffad0-1466-487c-a8b7-44c5cafed3ec" />
<img width="3812" height="1995" alt="Image" src="https://github.com/user-attachments/assets/3f785989-98d8-4868-a01b-532ac356a2a9" />

### To Reproduce

`RUST_LOG=info cargo samply --profile=release-nonlto --bench sql_planner_extended -- --nocapture --sample-size 10`

Branch with profiling log: https://github.com/Omega359/arrow-datafusion/tree/profile_optimize

### Expected behavior

Plan optimization should not be exponentially slow for some logical plans.

### Additional context

_No response_
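For anyone who wants to take a similar measurement without checking out the profiling branch, the per-rule timing behind log lines like the ones in the bug description can be sketched roughly as follows. This is a minimal standalone illustration and not the instrumentation on the profile_optimize branch: `time_rule`, `slow_rule_message`, and `SLOW_RULE_THRESHOLD` are hypothetical names, and the closure stands in for applying a single optimizer rule to a plan. Only the 50 ms threshold and the message wording are taken from the log output.

```rust
use std::time::{Duration, Instant};

/// Threshold above which a rule is reported; 50 ms mirrors the
/// "took > 50ms" wording in the log output (the constant name is hypothetical).
const SLOW_RULE_THRESHOLD: Duration = Duration::from_millis(50);

/// Hypothetical helper: run `rule` and return its result together with
/// the wall-clock time it took.
fn time_rule<T>(rule: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = rule();
    (out, start.elapsed())
}

/// Hypothetical helper: build a log message only when the elapsed time
/// is strictly greater than the threshold.
fn slow_rule_message(round: usize, name: &str, elapsed: Duration) -> Option<String> {
    if elapsed > SLOW_RULE_THRESHOLD {
        Some(format!(
            "Optimization (round {}) for rule {} took > {}ms: {}ms",
            round,
            name,
            SLOW_RULE_THRESHOLD.as_millis(),
            elapsed.as_millis()
        ))
    } else {
        None
    }
}

fn main() {
    // Stand-in for applying one optimizer rule to a logical plan.
    let (result, elapsed) = time_rule(|| (0..1_000u64).sum::<u64>());
    if let Some(msg) = slow_rule_message(0, "example_rule", elapsed) {
        println!("{}", msg);
    }
    println!("result = {}", result);
}
```

Wrapping each rule invocation this way keeps the measurement overhead to a pair of `Instant` reads, so it should not noticeably distort the timings being reported.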
