berkaysynnada commented on PR #15566:
URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2792203825
> The TLDR is that because each filter may be allowed or not, and maybe
transformed, to be pushed down into each child we end up with a matrix of
filters x children that we need to ask the node for. For example, imagine
joins, each of its children needs to be treated differently and only some
filters can be pushed into some children. That meant that as you suggest it had
to be at least 2 methods on ExecutionPlan (2 with complex APIs or 3 simpler
ones).
You can properly (and IMO more easily) apply the pushdown logic in the rule
recursion as well. There are many rules doing that.
> The recursion is also a bit wonky because you need to:
>
> 1. Recurse down and on your way up transmit not only the new nodes but
also the filter support result.
That's exactly the intended usage of PlanContext
> 2. There's jumps, eg when you hit a node that doesn't support pushdown but
still want to recurse into sub-trees.
That's not a problem as well in transform_down() when you say
Recursion::Continue and Transformed::No
> 3. You're carrying around non-trivial context as you go down.
That's now a problem as well in PlanContext.
> Having implemented it both ways I've come to the conclusion that trying to
generalize the recursion into the optimizer rules makes things harder for both
the simple cases (FilterExec, RepartitionExec) and the complex cases
(HashJoinExec, ProjectionExec). Doing it this way makes the simple
implementations trivial and the complex ones will be complex but will be self
contained and not have to fit into some rigid APIs.
You already keep the operator specific parts in the ExecutionPlan API. For
example if I give an example on the simplest case:
```rust
pub fn try_pushdown_filters_to_input(
plan: &Arc<dyn ExecutionPlan>,
input: &Arc<dyn ExecutionPlan>,
parent_filters: &[PhysicalExprRef],
config: &ConfigOptions,
) -> Result<FilterPushdownResult<Arc<dyn ExecutionPlan>>> {
match input.try_pushdown_filters(input, parent_filters, config)? {
FilterPushdownResult::NotPushed => {
// No pushdown possible, keep this child as is
Ok(FilterPushdownResult::NotPushed)
}
FilterPushdownResult::Pushed {
updated: inner,
support,
} => {
// We have a child that has pushed down some filters
let new_inner =
with_new_children_if_necessary(Arc::clone(plan),
vec![inner])?;
Ok(FilterPushdownResult::Pushed {
updated: new_inner,
support,
})
}
}
}
```
Here, you will not need to do the recursive call, and the last part of
with_new_children...() logic. As you've mentioned, maybe a single API wouldn't
be enough, but perhaps having 2 or 3 is what we really need to comply
separation of concerns.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]