2010YOUY01 commented on code in PR #18789:
URL: https://github.com/apache/datafusion/pull/18789#discussion_r2544175337


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1956,6 +1962,36 @@ impl<S: SimplifyInfo> TreeNodeRewriter for 
Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================
+            // unwrap_date_part_in_comparison
+            // =======================================
+            //
+            // For case:
+            // date_part(expr as data_type) op literal
+            Expr::BinaryExpr(BinaryExpr { left, op, right })

Review Comment:
   ```suggestion
               //
               // Background:
               // Datasources such as Parquet can prune partitions using simple 
predicates,
               // but they cannot do so for complex expressions.
               // For a complex predicate like `date_part('YEAR', c1) < 2000`, 
pruning is not possible.
               // After rewriting it to `c1 < 2000-01-01`, pruning becomes 
feasible.
               Expr::BinaryExpr(BinaryExpr { left, op, right })
   ```



##########
datafusion/optimizer/src/simplify_expressions/unwrap_date_part.rs:
##########
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::{DataType, TimeUnit};
+use chrono::NaiveDate;
+use datafusion_common::{
+    internal_err, tree_node::Transformed, DataFusionError, Result, ScalarValue,
+};
+use datafusion_expr::{
+    and, expr::ScalarFunction, lit, or, simplify::SimplifyInfo, BinaryExpr, 
Expr,
+    Operator,
+};
+
+pub(super) fn unwrap_date_part_in_comparison_for_binary<S: SimplifyInfo>(
+    info: &S,
+    cast_expr: Expr,
+    literal: Expr,
+    op: Operator,
+) -> Result<Transformed<Expr>> {
+    let (args, lit_value) = match (cast_expr, literal) {
+        (
+            Expr::ScalarFunction(ScalarFunction { func, args }),
+            Expr::Literal(lit_value, _),
+        ) if func.name() == "date_part" => (args, lit_value),

Review Comment:
   I wish we can have a safer way to check it in the future. Related: 
https://github.com/apache/datafusion/issues/18643



##########
datafusion/optimizer/src/simplify_expressions/unwrap_date_part.rs:
##########
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::{DataType, TimeUnit};
+use chrono::NaiveDate;
+use datafusion_common::{
+    internal_err, tree_node::Transformed, DataFusionError, Result, ScalarValue,
+};
+use datafusion_expr::{
+    and, expr::ScalarFunction, lit, or, simplify::SimplifyInfo, BinaryExpr, 
Expr,
+    Operator,
+};
+
+pub(super) fn unwrap_date_part_in_comparison_for_binary<S: SimplifyInfo>(
+    info: &S,
+    cast_expr: Expr,
+    literal: Expr,
+    op: Operator,
+) -> Result<Transformed<Expr>> {
+    let (args, lit_value) = match (cast_expr, literal) {
+        (
+            Expr::ScalarFunction(ScalarFunction { func, args }),
+            Expr::Literal(lit_value, _),
+        ) if func.name() == "date_part" => (args, lit_value),
+        _ => return internal_err!("Expect date_part expr and literal"),
+    };
+    let expr = Box::new(args[1].clone());
+
+    let Ok(expr_type) = info.get_data_type(&expr) else {
+        return internal_err!("Can't get the data type of the expr {:?}", 
&expr);
+    };
+
+    // Helper to cast literal
+    let cast_year = |updated_year: &ScalarValue| -> Result<ScalarValue, 
DataFusionError> {
+        year_literal_to_type(updated_year, &expr_type).ok_or_else(|| {
+            DataFusionError::Internal(format!(

Review Comment:
   nit: I think macro `internal_datafusion_err!` can be used here.



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1956,6 +1962,36 @@ impl<S: SimplifyInfo> TreeNodeRewriter for 
Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================
+            // unwrap_date_part_in_comparison
+            // =======================================
+            //
+            // For case:
+            // date_part(expr as data_type) op literal

Review Comment:
   It would be great to specify all the supported patterns here, like only 
extracting years, and what's the supported `op` and `literal` types.



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1956,6 +1962,36 @@ impl<S: SimplifyInfo> TreeNodeRewriter for 
Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================
+            // unwrap_date_part_in_comparison
+            // =======================================
+            //
+            // For case:
+            // date_part(expr as data_type) op literal
+            Expr::BinaryExpr(BinaryExpr { left, op, right })
+                if 
is_date_part_expr_and_support_unwrap_date_part_in_comparison_for_binary(
+                    info, &left, op, &right,
+                ) && op.supports_propagation() =>

Review Comment:
   I don't get the purpose of `supports_propagation()` here, after reading its 
source comments. Perhaps we can improve the documentation later.



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1956,6 +1962,36 @@ impl<S: SimplifyInfo> TreeNodeRewriter for 
Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================

Review Comment:
   First to make sure I get the issue correctly:
   - The current `ScalarUDFImpl::simplify()` is not working, because it's 
recognizing patterns that the scalar function is the expression root, and swap 
this scalar function root to any other equivalent expression
   - Because we're rewriting expressions like `EXTRACT (YEAR FROM k) = 2024`, 
its  expr tree root node is `=`. If a similar `simplify()` API is available on 
operators like `=`, then we can implement such rewrite easier. 
   
   Probably we can refactor all the operators like `+, -, <` to use `ScalarUDF` 
interface? Fundamentally they're the all a node in the expression tree.
   The refactoring process can get a bit painful, but it's appealing to see a 
unified interface for all the expressions in the long term.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to