AdamGS opened a new pull request, #20234:
URL: https://github.com/apache/datafusion/pull/20234

   ## Which issue does this PR close?
   
   - Closes #20134 .
   
   ## Rationale for this change
   
   The check for simplifying const expressions was recursive and expensive, 
repeatedly checking the expression's children in a recursive way. I've tried 
other approached like pre-computing the result for all expressions outside of 
the loop and using that cache during the traversal, but I've found that it only 
yielded between 5-8% improvement while adding complexity, while this approach 
simplifies the code and has shown to be more performant in my benchmarks 
(change is compared to current main branch):
   ```
   tpc-ds/q76/cs/16        time:   [27.112 µs 27.159 µs 27.214 µs]
                           change: [−13.533% −13.167% −12.801%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   tpc-ds/q76/ws/16        time:   [26.175 µs 26.280 µs 26.394 µs]
                           change: [−14.312% −13.833% −13.346%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   tpc-ds/q76/cs/128       time:   [195.79 µs 196.17 µs 196.56 µs]
                           change: [−14.362% −14.080% −13.816%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     3 (3.00%) high mild
   
   tpc-ds/q76/ws/128       time:   [197.08 µs 197.61 µs 198.23 µs]
                           change: [−13.531% −13.142% −12.737%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
   ```
   
   ## What changes are included in this PR?
   
   1. `simplify_const_expr` now only checks itself and whether all of its 
children are literals, because it assumes the order of simplification is 
bottoms-up.
   2. Removes some code from the public API, see the last section for the full 
details.
   
   ## Are these changes tested?
   
   Existing test suite
   
   ## Are there any user-facing changes?
   
   I suggest removing some of the physical expression simplification code from 
the public API, which I believe reduces the maintenance burden here. These 
changes also helps removing code like the distinct `simplify_const_expr` and 
`simplify_const_expr_with_dummy`. 
   
   1. Makes all `datafusion-physical-expr::simplifier` sub-modules (`not` and 
`const_evaluator`) private, including their key functions. They are not used 
externally, and being able to change their behavior seems more valuable long 
term. The simplifier is also not currently an extension point as far as I can 
tell, so there's no value in providing atomic building blocks like them for now.
   2. Removes `has_column_references` completely, its trivial to re-implement 
and isn't used anywhere in the codebase.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to