realno commented on pull request #1729:
URL:
https://github.com/apache/arrow-datafusion/pull/1729#issuecomment-1031867012
Sounds good - will proceed with the optimizer route.
@alamb
> In terms of the planning overhead, I agree it isn't ideal, though I think
we can improve things over time by consolidating several of the optimizer
passes together
There are a few things I noticed that are not ideal, I will update the PR
later we can discuss in more details. I think it is a good idea to have a plan
to improve over time - I will create the issues after discussion. Here are the
things I noticed for now:
1. For at least aggregate functions some traits/structs from `use
crate::physical_plan::aggregates` are leaked into the logical planning phase,
e.g. `fun` in this code block
```
match expr {
Expr::AggregateFunction {
fun,
args,
distinct,
} => {
let mut new_args = args.clone();
let mut new_func = fun.clone();
if fun == &aggregates::AggregateFunction::ApproxMedian {
new_args.push(Expr::Literal(ScalarValue::Float64(Some(0.5_f64))));
new_func =
aggregates::AggregateFunction::ApproxPercentileCont;
}
```
2. Changing functions also includes rewriting Projections and Aliases, this
is very tedious especially some util functions are not public and it replicates
part of the work for building the plan. Also it may potentially have conflicts
with other optimize rules if not super careful. Ideally these kind of of
rewrites could happen before building the logical plan, maybe we can introduce
a pre-build phase? I am working the part handling Projection and Alias, it'll
be more clear looking at the code - I am using string replacement.
> If we need more drastic performance improvements I have wanted to make a
LogicalPlanRewriter (like ExprRewriter) for a while now that would avoid so
many copies -- it is a fair amount of work but all pretty mechanical.
I like this idea, it may also help with the issues I mentioned above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]