Repository: spark
Updated Branches:
  refs/heads/master 256704c77 -> 23369c3bd

[SPARK-13763][SQL] Remove Project when its Child's Output is Nil

#### What changes were proposed in this pull request?

As shown in another PR: https://github.com/apache/spark/pull/11596, we are using `SELECT 1` as a dummy table when a SQL statement requires a table reference but the contents of the table are not important. For example,

```SQL
SELECT value FROM (select 1) dummyTable Lateral View explode(array(1,2,3)) adTable as value
```

Before the PR, the optimized plan contains a useless `Project` after the Optimizer executes the `ColumnPruning` rule, as shown below:

```
== Analyzed Logical Plan ==
value: int
Project [value#22]
+- Generate explode(array(1, 2, 3)), true, false, Some(adtable), [value#22]
   +- SubqueryAlias dummyTable
      +- Project [1 AS 1#21]
         +- OneRowRelation$

== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- Project
   +- OneRowRelation$
```

After the fix, the optimized plan no longer contains the useless `Project`, as shown below:

```
== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- OneRowRelation$
```

This PR removes a `Project` when its child's output is Nil.

#### How was this patch tested?

Added a new unit test case to the suite `ColumnPruningSuite.scala`.

Author: gatorsmile <gatorsm...@gmail.com>

Closes #11599 from gatorsmile/projectOneRowRelation.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23369c3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23369c3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23369c3b

Branch: refs/heads/master
Commit: 23369c3bd2c6a6d7a2b9d1396d6962022676cee7
Parents: 256704c
Author: gatorsmile <gatorsm...@gmail.com>
Authored: Wed Mar 9 10:29:27 2016 -0800
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Wed Mar 9 10:29:27 2016 -0800

----------------------------------------------------------------------
 .../spark/sql/catalyst/optimizer/Optimizer.scala    |  6 +++---
 .../sql/catalyst/optimizer/ColumnPruningSuite.scala | 16 ++++++++++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/23369c3b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 7455e68..586bf3d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -381,12 +381,12 @@ object ColumnPruning extends Rule[LogicalPlan] {
       p
     }
 
-    // Can't prune the columns on LeafNode
-    case p @ Project(_, l: LeafNode) => p
-
     // Eliminate no-op Projects
     case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child
 
+    // Can't prune the columns on LeafNode
+    case p @ Project(_, l: LeafNode) => p
+
     // for all other logical plans that inherits the output from it's children
     case p @ Project(_, child) =>
       val required = child.references ++ p.references

http://git-wip-us.apache.org/repos/asf/spark/blob/23369c3b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
index d09601e..409e922 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
@@ -157,6 +157,22 @@ class ColumnPruningSuite extends PlanTest {
     comparePlans(Optimize.execute(query), expected)
   }
 
+  test("Eliminate the Project with an empty projectList") {
+    val input = OneRowRelation
+    val expected = Project(Literal(1).as("1") :: Nil, input).analyze
+
+    val query1 =
+      Project(Literal(1).as("1") :: Nil, Project(Literal(1).as("1") :: Nil, input)).analyze
+    comparePlans(Optimize.execute(query1), expected)
+
+    val query2 =
+      Project(Literal(1).as("1") :: Nil, Project(Nil, input)).analyze
+    comparePlans(Optimize.execute(query2), expected)
+
+    // to make sure the top Project will not be removed.
+    comparePlans(Optimize.execute(expected), expected)
+  }
+
   test("column pruning for group") {
     val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
     val originalQuery =
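
The Optimizer.scala change only reorders two `case` clauses, so it may help to see why the order matters. Below is a minimal sketch, assuming toy stand-ins for the Catalyst nodes: `Plan`, `Leaf`, `OneRow`, `Relation`, `pruneOld`, and `pruneNew` are hypothetical names, not Spark classes, and plain `Seq` equality stands in for `sameOutput`. Scala tries `match` cases top to bottom, so with the leaf case first, a `Project` with an empty project list over a leaf is returned unchanged before the no-op check ever runs.

```scala
object CaseOrderSketch {
  // Toy stand-ins for Catalyst plan nodes; these are NOT the real Spark classes.
  sealed trait Plan { def output: Seq[String] }
  sealed trait Leaf extends Plan
  case object OneRow extends Leaf { val output: Seq[String] = Nil }
  final case class Relation(output: Seq[String]) extends Leaf
  final case class Project(projectList: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = projectList
  }

  // Case order before the patch: the leaf case comes first, so it also
  // catches a Project whose output equals its leaf child's output.
  def pruneOld(plan: Plan): Plan = plan match {
    case p @ Project(_, _: Leaf) => p
    case p @ Project(_, child) if child.output == p.output => child
    case other => other
  }

  // Case order after the patch: the no-op check runs first, so
  // Project(Nil, OneRow) collapses to OneRow.
  def pruneNew(plan: Plan): Plan = plan match {
    case p @ Project(_, child) if child.output == p.output => child
    case p @ Project(_, _: Leaf) => p
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val emptyProject = Project(Nil, OneRow)
    println(pruneOld(emptyProject)) // the useless Project is kept
    println(pruneNew(emptyProject)) // the Project is removed
    // A Project that actually adds a column over a leaf is kept in both orders.
    println(pruneNew(Project(Seq("1"), OneRow)))
  }
}
```

The last call mirrors the final assertion in the new test: a top-level `Project` whose output differs from its child's must not be removed.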