Omega359 opened a new issue, #20002:
URL: https://github.com/apache/datafusion/issues/20002

   ### Describe the bug
   
   While investigating #17261, it became apparent that one of the largest consumers of CPU time during planning in the `sql_planner_extended` benchmark is the `PushDownFilter` `OptimizerRule`. I instrumented DataFusion with some logging and ran the benchmark with the following command:
   
   `RUST_LOG=info cargo samply --profile=release-nonlto --bench sql_planner_extended -- --nocapture --sample-size 10`
   
   The full output is in [this gist](https://gist.github.com/Omega359/978e208b401f6af03fdf00fd8af63938); the pertinent part is below:
   ```
   [2026-01-25T15:40:20Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule push_down_limit took > 50ms: 90ms
   [2026-01-25T15:43:14Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule push_down_filter took > 50ms: 174174ms
   [2026-01-25T15:43:15Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule single_distinct_aggregation_to_group_by took > 50ms: 164ms
   [2026-01-25T15:43:15Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule eliminate_group_by_constant took > 50ms: 159ms
   [2026-01-25T15:43:16Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule common_sub_expression_eliminate took > 50ms: 1313ms
   [2026-01-25T15:43:20Z INFO  datafusion_optimizer::optimizer] Optimization (round 0) for rule optimize_projections took > 50ms: 3389ms
   ```
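   The `> 50ms` lines above come from timing instrumentation on the profiling branch linked under To Reproduce. That branch's code is not reproduced here; the following is only a minimal, self-contained sketch of that kind of per-rule timing. It assumes a simplified, hypothetical `Rule` type whose `apply` function rewrites a plan represented as a plain `String` (this is not DataFusion's `OptimizerRule` trait), times each call with `std::time::Instant`, and reports rules over the threshold using the same message shape as the log.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for an optimizer rule: a name plus a
/// rewrite function. This is NOT DataFusion's `OptimizerRule` trait; the
/// "plan" is just a `String` so the sketch stays self-contained.
struct Rule {
    name: &'static str,
    apply: fn(String) -> String,
}

/// Rules slower than this are reported, mirroring the "took > 50ms" wording.
const SLOW_RULE_THRESHOLD: Duration = Duration::from_millis(50);

/// Applies every rule once (one optimizer "round"), timing each call, and
/// returns the rewritten plan plus one report line per slow rule.
fn run_round(round: usize, rules: &[Rule], mut plan: String) -> (String, Vec<String>) {
    let mut report = Vec::new();
    for rule in rules {
        let start = Instant::now();
        plan = (rule.apply)(plan);
        let elapsed = start.elapsed();
        if elapsed > SLOW_RULE_THRESHOLD {
            report.push(format!(
                "Optimization (round {}) for rule {} took > {}ms: {}ms",
                round,
                rule.name,
                SLOW_RULE_THRESHOLD.as_millis(),
                elapsed.as_millis()
            ));
        }
    }
    (plan, report)
}

/// Hypothetical fast rule: returns the plan unchanged.
fn identity_rule(plan: String) -> String {
    plan
}

/// Hypothetical slow rule: sleeps 60ms to simulate expensive work.
fn sleepy_rule(plan: String) -> String {
    thread::sleep(Duration::from_millis(60));
    plan
}
```

   Running one round over `identity_rule` and `sleepy_rule` yields a single report line, for `sleepy_rule`. The real log lines carry an `INFO  datafusion_optimizer::optimizer` prefix, which suggests they go through a logger enabled by `RUST_LOG=info`; this sketch just returns the strings instead.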
   As the log shows, several optimizer rules consume a lot of CPU during planning, but push_down_filter is by far the worst, taking 174 seconds (174174 ms) to complete. The samply screenshots below show where most of that time appears to be spent.
   
   <img width="2476" height="1522" alt="Image" src="https://github.com/user-attachments/assets/f20ffad0-1466-487c-a8b7-44c5cafed3ec" />
   
   <img width="3812" height="1995" alt="Image" src="https://github.com/user-attachments/assets/3f785989-98d8-4868-a01b-532ac356a2a9" />
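   The slow-rule log lines can also be ranked mechanically to confirm which rule dominates. The sketch below is a hypothetical helper (not part of DataFusion or the profiling branch). It assumes each relevant line ends in the `for rule NAME took > 50ms: Tms` shape shown in the log excerpt, and it ignores any other line.

```rust
/// Hypothetical helper: parses a line shaped like
/// "... Optimization (round N) for rule NAME took > 50ms: Tms"
/// into (rule name, elapsed milliseconds). Other lines yield `None`.
fn parse_rule_timing(line: &str) -> Option<(String, u64)> {
    // Text after the first "for rule ", e.g. "push_down_filter took > 50ms: 174174ms".
    let after_rule = line.split("for rule ").nth(1)?;
    // The rule name is everything before " took".
    let (name, rest) = after_rule.split_once(" took")?;
    // The elapsed time is the last ": "-separated field, with an "ms" suffix.
    let ms_text = rest.rsplit(": ").next()?.trim().strip_suffix("ms")?;
    let ms = ms_text.parse::<u64>().ok()?;
    Some((name.to_string(), ms))
}

/// Parses every matching line and sorts the timings from slowest to fastest.
fn rank_rules(log: &str) -> Vec<(String, u64)> {
    let mut timings: Vec<(String, u64)> = log.lines().filter_map(parse_rule_timing).collect();
    timings.sort_by(|a, b| b.1.cmp(&a.1));
    timings
}
```

   On the six lines in the excerpt, this ranks `push_down_filter` (174174 ms) first, followed by `optimize_projections` (3389 ms) and `common_sub_expression_eliminate` (1313 ms). `push_down_filter` alone accounts for roughly 97% of the logged slow-rule time.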
   
   ### To Reproduce
   
   `RUST_LOG=info cargo samply --profile=release-nonlto --bench sql_planner_extended -- --nocapture --sample-size 10`
   
   Branch with the profiling logging: https://github.com/Omega359/arrow-datafusion/tree/profile_optimize
   
   ### Expected behavior
   
   Plan optimization should not be exponentially slow for some logical plans.
   
   ### Additional context
   
   _No response_

