alamb commented on code in PR #19129:
URL: https://github.com/apache/datafusion/pull/19129#discussion_r2600201502


##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -456,10 +456,28 @@ impl PruningPredicate {
     ///
     /// See the struct level documentation on [`PruningPredicate`] for more
     /// details.
-    pub fn try_new(expr: Arc<dyn PhysicalExpr>, schema: SchemaRef) -> 
Result<Self> {
-        // Get a (simpler) snapshot of the physical expr here to use with 
`PruningPredicate`
-        // which does not handle dynamic exprs in general
-        let expr = snapshot_physical_expr(expr)?;
+    ///
+    /// Note that `PruningPredicate` does not attempt to normalize or simplify

Review Comment:
   it seems to me like PruningPredicate does now actually call simplify 🤔 (if 
it is a snapshot )



##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -1545,6 +1588,50 @@ fn build_predicate_expression(
         .unwrap_or_else(|_| unhandled_hook.handle(expr))
 }
 
+/// Count of distinct column references in an expression.
+/// This is the same as [`collect_columns`] but optimized to stop counting
+/// once more than one distinct column is found.
+///
+/// For example, in expression `col1 + col2`, the count is `Many`.
+/// In expression `col1 + 5`, the count is `One`.
+/// In expression `5 + 10`, the count is `Zero`.
+///
+/// [`collect_columns`]: datafusion_physical_expr::utils::collect_columns
+#[derive(Debug, PartialEq, Eq)]
+enum ColumnReferenceCount {
+    /// no column references
+    Zero,
+    /// Only one column reference
+    One(phys_expr::Column),
+    /// More than one column reference
+    Many,
+}
+
+impl ColumnReferenceCount {
+    /// Count the number of distinct column references in an expression
+    fn from_expression(expr: &Arc<dyn PhysicalExpr>) -> Self {
+        let mut seen = HashSet::<phys_expr::Column>::new();
+        expr.apply(|expr| {
+            if let Some(column) = 
expr.as_any().downcast_ref::<phys_expr::Column>() {
+                seen.insert(column.clone());
+                if seen.len() > 1 {

Review Comment:
   I am surprised clippy didn't complain about this not using `is_empty` 🤔 



##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -456,10 +456,28 @@ impl PruningPredicate {
     ///
     /// See the struct level documentation on [`PruningPredicate`] for more
     /// details.
-    pub fn try_new(expr: Arc<dyn PhysicalExpr>, schema: SchemaRef) -> 
Result<Self> {
-        // Get a (simpler) snapshot of the physical expr here to use with 
`PruningPredicate`
-        // which does not handle dynamic exprs in general
-        let expr = snapshot_physical_expr(expr)?;
+    ///
+    /// Note that `PruningPredicate` does not attempt to normalize or simplify
+    /// the input expression.
+    /// It is recommended that you pass the expressions through 
[`PhysicalExprSimplifier`]
+    /// before calling this method to get the best results.
+    pub fn try_new(mut expr: Arc<dyn PhysicalExpr>, schema: SchemaRef) -> 
Result<Self> {
+        // Get a (simpler) snapshot of the physical expr here to use with 
`PruningPredicate`.
+        // In particular this unravels any `DynamicFilterPhysicalExpr`s by 
snapshotting them
+        // so that PruningPredicate can work with a static expression.
+        let tf = snapshot_physical_expr_opt(expr)?;
+        if tf.transformed {
+            // If we had an expression such as Dynamic(part_col < 5 and col < 
10)
+            // (this could come from something like `select * from t order by 
part_col, col, limit 10`)
+            // after snapshotting and because `DynamicFilterPhysicalExpr` 
applies child replacements to its
+            // children after snapshotting and previously 
`replace_columns_with_literals` may have been called with partition values
+            // the expression we have now is `8 < 5 and col < 10`.
+            // Thus we need as simplifier pass to get `false and col < 10` => 
`false` here.
+            let simplifier = PhysicalExprSimplifier::new(&schema);

Review Comment:
   Since this code is specific to dynamic expressions, maybe the call to 
simplify would make more sense in the `snapshot_physical_expr_opt` method 
itself?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to