[ https://issues.apache.org/jira/browse/SPARK-32361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184977#comment-17184977 ]
L. C. Hsieh commented on SPARK-32361:
-------------------------------------

Isn't this already in the physical plan phase? Removing such a Project from the physical plan doesn't seem to give us much performance gain.

> Remove project if output is subset of child
> -------------------------------------------
>
>                 Key: SPARK-32361
>                 URL: https://issues.apache.org/jira/browse/SPARK-32361
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: ulysses you
>            Priority: Minor
>
> We can remove some redundant projects after column pruning has completed.
> e.g.,
> {code:java}
> create table t1(c1 int, c2 int) using parquet;
> explain extended
> select sum(c1) from (
>   select * from t1
> );
> {code}
> Currently we get this plan:
> {code:java}
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
> +- Exchange SinglePartition, true, [id=#86]
>    +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
>       +- *(1) Project [c1#19]
>          +- *(1) ColumnarToRow
>             +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
> {code}
> We can remove the `Project`, like this:
> {code:java}
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[sum(cast(c1#19 as bigint))], output=[sum(c1)#68L])
> +- Exchange SinglePartition, true, [id=#86]
>    +- *(1) HashAggregate(keys=[], functions=[partial_sum(cast(c1#19 as bigint))], output=[sum#70L])
>       +- *(1) ColumnarToRow
>          +- FileScan parquet default.t1[c1#19] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[hdfs:///user/hive/warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c1:int>
> {code}
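As a rough illustration of the rewrite being discussed, here is a minimal Python sketch over a hypothetical toy plan tree. The `Plan` class and the `remove_redundant_projects` and `names` helpers are made up for this example and are not Spark APIs. The sketch walks the tree bottom-up and drops a `Project` only when its output attributes match its child's exactly, in the same order, which is the situation in the example plan (`Project [c1#19]` over a `ColumnarToRow` that already outputs only `c1#19`). It does not attempt the more general "subset" case named in the issue title.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy plan node for illustration only; Spark's real
# physical plan classes look nothing like this.
@dataclass
class Plan:
    name: str
    output: List[str]  # output attribute ids, e.g. "c1#19"
    children: List["Plan"] = field(default_factory=list)


def remove_redundant_projects(plan: Plan) -> Plan:
    """Rebuild the tree bottom-up, skipping any Project whose output
    equals its single child's output (same attributes, same order).
    The input tree is not mutated."""
    children = [remove_redundant_projects(c) for c in plan.children]
    node = Plan(plan.name, plan.output, children)
    if node.name == "Project" and len(children) == 1 and children[0].output == node.output:
        return children[0]  # the Project adds nothing; splice in its child
    return node


def names(plan: Plan) -> List[str]:
    """Pre-order list of node names, handy for inspecting a rewritten tree."""
    result = [plan.name]
    for child in plan.children:
        result.extend(names(child))
    return result


# Toy model of the plan from the issue (attribute ids copied from it).
scan = Plan("FileScan parquet default.t1", ["c1#19"])
c2r = Plan("ColumnarToRow", ["c1#19"], [scan])
proj = Plan("Project", ["c1#19"], [c2r])
partial = Plan("HashAggregate(partial_sum)", ["sum#70L"], [proj])
exchange = Plan("Exchange SinglePartition", ["sum#70L"], [partial])
final = Plan("HashAggregate(sum)", ["sum(c1)#68L"], [exchange])

optimized = remove_redundant_projects(final)
print(names(optimized))
```

Checking for an exact output match keeps the rewrite conservative: a `Project` that actually narrows its child's columns is left in place.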