Neal Richardson created ARROW-17463: ---------------------------------------
Summary: [R] Avoid unnecessary projections
Key: ARROW-17463
URL: https://issues.apache.org/jira/browse/ARROW-17463
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 10.0.0

In ExecPlan$Build(), we call Project in a few places. There is also code that makes sure the query contains at least one ProjectNode, so that augmented fields from a Dataset scan are removed (unless the user has added them). As a result, a plan can end up with several ProjectNodes in a row that are essentially no-ops.

One example is grouped aggregation: there is a projection that puts the columns back in the order R expects, and then a no-op projection after that:

{code}
> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
  4:ProjectNode{projection=[cyl, n]}
    3:ProjectNode{projection=[cyl, n]}
      2:GroupByNode{keys=["cyl"], aggregates=[
        hash_sum(n, {skip_nulls=true, min_count=1}),
      ]}
        1:ProjectNode{projection=["n": 1, cyl]}
          0:TableSourceNode{}
{code}

I don't know how significant the performance impact is, but it certainly looks wasteful and should be avoidable.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
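The idea of eliding a no-op projection can be sketched in a few lines of R. This is a hypothetical, simplified model, not the arrow package's ExecPlan code: the node list, `make_node()`, `is_noop_projection()`, and `drop_noop_projections()` are all invented for illustration, and the GroupByNode output order (aggregates before keys) is an assumption inferred from the reordering projection in the plan above. Here a projection counts as a no-op when it emits exactly its input fields, by name and in order, each as a plain field reference.

```r
# Hypothetical, simplified plan model for illustration only; this is NOT the
# arrow package's internal ExecPlan representation.

# A node has a kind, its output column names, and (for ProjectNodes) a named
# character vector: names are output columns, values are expressions, where a
# bare column name stands for a plain field reference.
make_node <- function(kind, outputs, projection = NULL) {
  list(kind = kind, outputs = outputs, projection = projection)
}

# A ProjectNode is a no-op if it emits exactly its input fields, in the same
# order, each as a plain reference to the same-named field.
is_noop_projection <- function(node, input_fields) {
  identical(node$kind, "ProjectNode") &&
    identical(names(node$projection), input_fields) &&
    identical(unname(node$projection), input_fields)
}

# Walk the plan from source to sink, skipping any no-op ProjectNode.
drop_noop_projections <- function(nodes) {
  kept <- list()
  for (node in nodes) {
    if (length(kept) > 0 &&
        is_noop_projection(node, kept[[length(kept)]]$outputs)) {
      next
    }
    kept[[length(kept) + 1]] <- node
  }
  kept
}

# The count(cyl) plan from the issue, listed source first (node 0 to node 5).
plan <- list(
  make_node("TableSourceNode", c("mpg", "cyl", "hp")),  # column subset, for brevity
  make_node("ProjectNode", c("n", "cyl"), c(n = "1", cyl = "cyl")),
  make_node("GroupByNode", c("n", "cyl")),  # assumed: aggregates before keys
  make_node("ProjectNode", c("cyl", "n"), c(cyl = "cyl", n = "n")),  # reorder
  make_node("ProjectNode", c("cyl", "n"), c(cyl = "cyl", n = "n")),  # no-op
  make_node("SinkNode", c("cyl", "n"))
)

optimized <- drop_noop_projections(plan)
vapply(optimized, function(x) x$kind, character(1))
# → c("TableSourceNode", "ProjectNode", "GroupByNode", "ProjectNode", "SinkNode")
```

Because the check requires the output to match the input exactly, a projection that reorders columns (node 3) or drops augmented Dataset fields is kept; only the trailing identity projection (node 4) is removed.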