adriangb commented on code in PR #19129:
URL: https://github.com/apache/datafusion/pull/19129#discussion_r2600271925


##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -456,10 +456,28 @@ impl PruningPredicate {
     ///
     /// See the struct level documentation on [`PruningPredicate`] for more
     /// details.
-    pub fn try_new(expr: Arc<dyn PhysicalExpr>, schema: SchemaRef) -> 
Result<Self> {
-        // Get a (simpler) snapshot of the physical expr here to use with 
`PruningPredicate`
-        // which does not handle dynamic exprs in general
-        let expr = snapshot_physical_expr(expr)?;
+    ///
+    /// Note that `PruningPredicate` does not attempt to normalize or simplify
+    /// the input expression.
+    /// It is recommended that you pass the expressions through 
[`PhysicalExprSimplifier`]
+    /// before calling this method to get the best results.
+    pub fn try_new(mut expr: Arc<dyn PhysicalExpr>, schema: SchemaRef) -> 
Result<Self> {
+        // Get a (simpler) snapshot of the physical expr here to use with 
`PruningPredicate`.
+        // In particular this unravels any `DynamicFilterPhysicalExpr`s by 
snapshotting them
+        // so that PruningPredicate can work with a static expression.
+        let tf = snapshot_physical_expr_opt(expr)?;
+        if tf.transformed {
+            // If we had an expression such as Dynamic(part_col < 5 and col < 
10)
+            // (this could come from something like `select * from t order by 
part_col, col, limit 10`)
+            // after snapshotting and because `DynamicFilterPhysicalExpr` 
applies child replacements to its
+            // children after snapshotting and previously 
`replace_columns_with_literals` may have been called with partition values
+            // the expression we have now is `8 < 5 and col < 10`.
+            // Thus we need as simplifier pass to get `false and col < 10` => 
`false` here.
+            let simplifier = PhysicalExprSimplifier::new(&schema);

Review Comment:
   Hmm interesting. I think maybe best to keep things as is. E.g. if you're 
going to evaluate the expression against data (as opposed to doing the kind of 
weird stuff PruningPredicate does) then maybe you don't want to pay the 
simplify cost?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to