Dongjoon Hyun created SPARK-27123:
-------------------------------------

             Summary: Improve CollapseProject to handle projects cross limit/repartition/sample
                 Key: SPARK-27123
                 URL: https://issues.apache.org/jira/browse/SPARK-27123
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Dongjoon Hyun

The `CollapseProject` optimizer rule simplifies the plan by merging adjacent projects and performing alias substitution.

{code:java}
scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain
== Physical Plan ==
*(1) Project [a#5 AS c#1]
+- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
{code}

We can do the same in more complex cases, such as the following, where a repartition sits between the two projects.

*BEFORE*
{code:java}
scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM t)").explain
== Physical Plan ==
*(2) Project [b#0 AS c#1]
+- Exchange RoundRobinPartitioning(1)
   +- *(1) Project [a#5 AS b#0]
      +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
{code}

*AFTER*
{code:java}
scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM t)").explain
== Physical Plan ==
Exchange RoundRobinPartitioning(1)
+- *(1) Project [a#11 AS c#7]
   +- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11]
{code}
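To illustrate the proposed rewrite, here is a minimal sketch on a toy plan model. It is not Spark's Catalyst code: `Expr`, `Attr`, `Alias`, `Plan`, `Scan`, `Project`, `Repartition` and `collapse` are hypothetical names defined only for this example. It assumes every project expression is a plain, deterministic column alias, and it covers only the repartition case, not limit or sample.

```scala
// Toy model of the idea; none of these types are Spark classes.
object CollapseProjectSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  case class Scan(table: String, output: Seq[String]) extends Plan
  case class Project(projectList: Seq[Alias], child: Plan) extends Plan
  case class Repartition(numPartitions: Int, child: Plan) extends Plan

  // Alias substitution: rewrite references to the lower project's output
  // names into the expressions those names stand for.
  private def substitute(upper: Seq[Alias], lower: Seq[Alias]): Seq[Alias] = {
    val aliasMap: Map[String, Expr] = lower.map(a => a.name -> a.child).toMap
    upper.map {
      case Alias(Attr(n), out) => Alias(aliasMap.getOrElse(n, Attr(n)), out)
      case other               => other
    }
  }

  def collapse(plan: Plan): Plan = plan match {
    // Existing behaviour: two adjacent projects become one.
    case Project(upper, Project(lower, child)) =>
      collapse(Project(substitute(upper, lower), child))
    // Proposed behaviour: two projects separated by a repartition are merged
    // below the repartition (assumes deterministic, alias-only projections).
    case Project(upper, Repartition(n, Project(lower, child))) =>
      Repartition(n, collapse(Project(substitute(upper, lower), child)))
    case Project(list, child)  => Project(list, collapse(child))
    case Repartition(n, child) => Repartition(n, collapse(child))
    case scan: Scan            => scan
  }

  def main(args: Array[String]): Unit = {
    // SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM t)
    val before = Project(Seq(Alias(Attr("b"), "c")),
      Repartition(1, Project(Seq(Alias(Attr("a"), "b")), Scan("t", Seq("a")))))
    println(collapse(before))
  }
}
```

On the BEFORE plan shape above, `collapse` returns a `Repartition(1, ...)` whose child is a single `Project` aliasing `a` directly to `c`, which mirrors the AFTER plan. A real rule would also need to check determinism and handle non-trivial expressions, which this sketch leaves out.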