realno commented on pull request #1729:
URL: 
https://github.com/apache/arrow-datafusion/pull/1729#issuecomment-1031867012


   Sounds good - will proceed with the optimizer route. 
   
   @alamb 
   
   >  In terms of the planning overhead, I agree it isn't ideal, though I think 
we can improve things over time by consolidating several of the optimizer 
passes together
   
   There are a few things I noticed that are not ideal, I will update the PR 
later we can discuss in more details. I think it is a good idea to have a plan 
to improve over time - I will create the issues after discussion. Here are the 
things I noticed for now:
   1. For at least aggregate functions some traits/structs from `use 
crate::physical_plan::aggregates` are leaked into the logical planning phase, 
e.g. `fun` in this code block
   ```
    match expr {
           Expr::AggregateFunction {
               fun,
               args,
               distinct,
           } => {
               let mut new_args = args.clone();
               let mut new_func = fun.clone();
               if fun == &aggregates::AggregateFunction::ApproxMedian {
                   
new_args.push(Expr::Literal(ScalarValue::Float64(Some(0.5_f64))));
                   new_func = 
aggregates::AggregateFunction::ApproxPercentileCont;
               }
   ```
   
   2. Changing functions also includes rewriting Projections and Aliases, this 
is very tedious especially some util functions are not public and it replicates 
part of the work for building the plan. Also it may potentially have conflicts 
with other optimize rules if not super careful. Ideally these kind of of 
rewrites could happen before building the logical plan, maybe we can introduce 
a pre-build phase?  I am working the part handling Projection and Alias, it'll 
be more clear looking at the code - I am using string replacement.
   
   > If we need more drastic performance improvements I have wanted to make a 
LogicalPlanRewriter (like ExprRewriter) for a while now that would avoid so 
many copies -- it is a fair amount of work but all pretty mechanical.
   
   I like this idea, it may also help with the issues I mentioned above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to