adriangb commented on code in PR #18644:
URL: https://github.com/apache/datafusion/pull/18644#discussion_r2544491321


##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -418,6 +504,13 @@ pub struct AggregateExec {
     /// Describes how the input is ordered relative to the group by columns
     input_order_mode: InputOrderMode,
     cache: PlanProperties,
+    /// During initialization, if the plan supports dynamic filtering (see 
[`AggrDynFilter`]),
+    /// it is set to `Some(..)` regardless of whether it can be pushed down to 
a child node.
+    ///
+    /// During filter pushdown optimization, if a child node can accept this 
filter,
+    /// it remains `Some(..)` to enable dynamic filtering during aggregate 
execution;
+    /// otherwise, it is cleared to `None`.

Review Comment:
   > One thing I feel also ambiguous is when a child replies, (e.g. for 
parquet) do we differentiate if they're using the dynamic filter for stat 
pruning, or evaluate filter row by row
   
   That's the crux of the issue: 
https://github.com/apache/datafusion/blob/068f96f04906dac223246d6f3608196f194b864f/datafusion/datasource-parquet/src/source.rs#L767-L774
   
   So currently Parquet says `No` even when it stores a copy of the filters for 
stats pruning. So for this PR we should change it to always update the filters 
even if the child responded `No`.
   
   If Parquet responded `Yes` instead it would possibly lead to incorrect 
results if it can't actually evaluate the filter row by row.
   
   For operators like `HashJoinExec` that may want to push down an expression 
that cannot be used for stats pruning the "wasted effort" would be if they 
still do compute to update the filters but then ParquetSource can't use them 
for stats pruning and doesn't push them down row-by-row.
   
   So in summary:
   1. I think this PR should always push down the filters even if the answer it 
got back is `No` because ParquetSource may still use them for stats pruning.
   2. We should make a ticket to improve the system to have 3 states 
`Exact/Inexact/Unsupported`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to