drin opened a new pull request, #18648:
URL: https://github.com/apache/datafusion/pull/18648

   This decomposes a custom optimizer rule into the datafusion expression 
simplifier (work-in-progress).
   
   ## Which issue does this PR close?
   Closes #18319.
   
   ## Rationale for this change
   To transform binary expressions that compare `date_trunc` with a constant 
value into a form that can be better utilized (improved performance).
   
   For Bauplan, we can see the following (approximate average over a handful of 
runs):
   
   Q1:
   ```sql
   SELECT PULocationID, trip_miles, tips
     FROM taxi_fhvhv
    WHERE date_trunc('month', pickup_datetime) <= '2025-01-08'::DATE
   ```
   
   Q2:
   ```sql
   SELECT PULocationID, trip_miles, tips
     FROM taxi_fhvhv
    WHERE pickup_datetime < date_trunc('month', '2025-02-08'::DATE)
   ```
   
   |  Query  | Time (s) | Options |
   | ------- | -------- | -------- |
   |    Q1     |    ~3    |  no cache, optimization enabled |
   |    Q1     |   ~35   |  no cache, optimization disabled |
   |    Q2     |    ~3    |  no cache, optimization enabled |
   |    Q2     |    ~3    |  no cache, optimization disabled |
   
   ## What changes are included in this PR?
   A few additional support functions and additional match arms in the 
simplifier match expression.
   
   ## Are these changes tested?
   
   Our custom rule has tests of the expression transformations and for correct 
evaluation results. These will be added to the PR after the implementation is 
in approximately good shape.
   
   ## Are there any user-facing changes?
   Better performance and occasionally confusing explain plan. In short, a 
`date_trunc('month', col) = '2025-12-03'::DATE` will always be false (because 
the truncation result can never be a non-truncated value), which may produce an 
unexpected expression (`false`).
   
   Explain plan details below (may be overkill but it was fun to figure out):
   
   Initial query:
   ```sql
   SELECT  PULocationID
              ,pickup_datetime
         FROM taxi_view_2025
        WHERE date_trunc('month', pickup_datetime) = '2025-12-03'
   ```
   
   After simplify_expressions:
   ```sql
   logical_plan after simplify_expressions                    | Projection: 
taxi_view_2025.PULocationID, taxi_view_2025.pickup_datetime                     
                                                                       |
   |                                                            |   Filter: 
date_trunc(Utf8("month"), CAST(taxi_view_2025.pickup_datetime AS 
Timestamp(Nanosecond, None))) = TimestampNanosecond(1764720000000000000, None)  
        |
   |                                                            |     
TableScan: taxi_view_2025
   ```
   
   Before and after `date_trunc_optimizer` (our custom rule):
   ```sql
   logical_plan after optimize_projections                    | Filter: 
date_trunc(Utf8("month"), CAST(taxi_view_2025.pickup_datetime AS 
Timestamp(Nanosecond, None))) = TimestampNanosecond(1764720000000000000, None)  
          |
   |                                                            |   TableScan: 
taxi_view_2025 projection=[PULocationID, pickup_datetime]                       
                                                                      |
   | logical_plan after date_trunc_optimizer                    | Filter: 
Boolean(false)                                                                  
                                                                           |
   |                                                            |   TableScan: 
taxi_view_2025 projection=[PULocationID, pickup_datetime]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to