alamb opened a new issue, #4968: URL: https://github.com/apache/arrow-datafusion/issues/4968
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

`ProjectionExec` can either perform computations like (`col1 + col2`) or simply reorder / rename the columns.

The first use case benefits from repartitioning, because the calculation can then be spread across multiple cores.

The second use case (reordering / renaming) does not benefit from repartitioning, as it is simply a bookkeeping arrangement.

Basically we have a plan like:

```text
ProjectionExec: expr=[f@0 as f]
  DeduplicateExec: [tag@1 ASC,time@2 ASC]
    SortPreservingMergeExec: [tag@1 ASC,time@2 ASC]
      UnionExec
```

That is then optimized by https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs to repartition before the projection:

```text
ProjectionExec: expr=[f@0 as f]
  RepartitionExec: partitioning=RoundRobinBatch(4)   <-- This repartition node is likely worthless
    DeduplicateExec: [tag@1 ASC,time@2 ASC]
      SortPreservingMergeExec: [tag@1 ASC,time@2 ASC]
        UnionExec
```

**Describe the solution you'd like**

I think `ProjectionExec` should only "benefit from partitioning" when its projection expressions actually contain calculations (i.e. are not just columns / aliases).

This would mean defining `benefits_from_input_partitioning`

https://github.com/apache/arrow-datafusion/blob/906896b7c59ff14d71b3056ec4349274cf6662af/datafusion/core/src/physical_plan/mod.rs#L176-L183

for `impl ExecutionPlan for ProjectionExec`:

https://github.com/apache/arrow-datafusion/blob/906896b7c59ff14d71b3056ec4349274cf6662af/datafusion/core/src/physical_plan/projection.rs#L151

so that it returns `true` only if at least one expression is something other than a column reference or alias.

**Describe alternatives you've considered**

A clear and concise description of any alternative solutions or features you've considered.
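A minimal, self-contained sketch of the proposed check follows. It does not use DataFusion itself: `PhysicalExpr`, `Column`, `Add`, `col`, `add` and `ProjectionExec` here are hypothetical, simplified stand-ins that only model what the check needs. The sketch assumes each projection expression is paired with its output name, so a rename such as `f@0 as f` is still a bare column reference, and it detects "real computation" by trying to downcast each expression to `Column`.

```rust
use std::any::Any;

/// Hypothetical, simplified stand-in for a physical expression trait;
/// only the downcasting hook needed for this check is modeled.
trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

/// A plain column reference such as `f@0`.
#[allow(dead_code)]
struct Column {
    name: String,
    index: usize,
}

impl PhysicalExpr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// A computed expression such as `col1 + col2`.
#[allow(dead_code)]
struct Add {
    left: Box<dyn PhysicalExpr>,
    right: Box<dyn PhysicalExpr>,
}

impl PhysicalExpr for Add {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical helper: build a boxed column reference.
fn col(name: &str, index: usize) -> Box<dyn PhysicalExpr> {
    Box::new(Column { name: name.to_string(), index })
}

/// Hypothetical helper: build a boxed `left + right` expression.
fn add(left: Box<dyn PhysicalExpr>, right: Box<dyn PhysicalExpr>) -> Box<dyn PhysicalExpr> {
    Box::new(Add { left, right })
}

/// Hypothetical, simplified projection node: each entry pairs an expression
/// with its output name, so aliasing never wraps the column in another node.
struct ProjectionExec {
    expr: Vec<(Box<dyn PhysicalExpr>, String)>,
}

impl ProjectionExec {
    /// True only if at least one expression is not a bare column, i.e. the
    /// projection does work that could be spread across partitions.
    fn benefits_from_input_partitioning(&self) -> bool {
        self.expr
            .iter()
            .any(|(e, _name)| e.as_any().downcast_ref::<Column>().is_none())
    }
}

fn main() {
    // Mirrors `ProjectionExec: expr=[f@0 as f]`: a rename, no computation.
    let rename_only = ProjectionExec {
        expr: vec![(col("f", 0), "f".to_string())],
    };
    // Computes `col1 + col2`, which is worth parallelizing.
    let computed = ProjectionExec {
        expr: vec![(add(col("col1", 0), col("col2", 1)), "sum".to_string())],
    };
    println!("rename only: {}", rename_only.benefits_from_input_partitioning()); // prints "rename only: false"
    println!("computed: {}", computed.benefits_from_input_partitioning()); // prints "computed: true"
}
```

Using `any` means a pure reorder or rename (all bare columns) reports `false`, so the repartition rule would have no reason to insert the `RepartitionExec` shown above, while a single computed expression is enough to report `true`. Whether other cheap expressions (for example literals) should also count as "no computation" is a judgment call this sketch leaves open.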
**Additional context**

I think this is a good first issue: the code change and the goal are fairly straightforward, and I suspect it would largely be an exercise in updating tests.