spark git commit: [SPARK-13763][SQL] Remove Project when its Child's Output is Nil

marmbrus Wed, 09 Mar 2016 10:30:01 -0800

Repository: spark
Updated Branches:
  refs/heads/master 256704c77 -> 23369c3bd



[SPARK-13763][SQL] Remove Project when its Child's Output is Nil

#### What changes were proposed in this pull request?

As shown in another PR: https://github.com/apache/spark/pull/11596, we are 
using `SELECT 1` as a dummy table, when the table is used for SQL statements in 
which a table reference is required, but the contents of the table are not 
important. For example,

```SQL
SELECT value FROM (select 1) dummyTable Lateral View explode(array(1,2,3)) 
adTable as value
```
Before the PR, the optimized plan contains a useless `Project` after Optimizer 
executing the `ColumnPruning` rule, as shown below:

```
== Analyzed Logical Plan ==
value: int
Project [value#22]
+- Generate explode(array(1, 2, 3)), true, false, Some(adtable), [value#22]
   +- SubqueryAlias dummyTable
      +- Project [1 AS 1#21]
         +- OneRowRelation$

== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- Project
   +- OneRowRelation$
```

After the fix, the optimized plan removed the useless `Project`, as shown below:
```
== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- OneRowRelation$
```

This PR is to remove `Project` when its Child's output is Nil

#### How was this patch tested?

Added a new unit test case into the suite `ColumnPruningSuite.scala`

Author: gatorsmile <gatorsm...@gmail.com>

Closes #11599 from gatorsmile/projectOneRowRelation.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23369c3b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23369c3b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23369c3b

Branch: refs/heads/master
Commit: 23369c3bd2c6a6d7a2b9d1396d6962022676cee7
Parents: 256704c
Author: gatorsmile <gatorsm...@gmail.com>
Authored: Wed Mar 9 10:29:27 2016 -0800
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Wed Mar 9 10:29:27 2016 -0800

----------------------------------------------------------------------
 .../spark/sql/catalyst/optimizer/Optimizer.scala    |  6 +++---
 .../sql/catalyst/optimizer/ColumnPruningSuite.scala | 16 ++++++++++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/23369c3b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 7455e68..586bf3d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -381,12 +381,12 @@ object ColumnPruning extends Rule[LogicalPlan] {
         p
       }
 
-    // Can't prune the columns on LeafNode
-    case p @ Project(_, l: LeafNode) => p
-
     // Eliminate no-op Projects
     case p @ Project(projectList, child) if sameOutput(child.output, p.output) 
=> child
 
+    // Can't prune the columns on LeafNode
+    case p @ Project(_, l: LeafNode) => p
+
     // for all other logical plans that inherits the output from it's children
     case p @ Project(_, child) =>
       val required = child.references ++ p.references

http://git-wip-us.apache.org/repos/asf/spark/blob/23369c3b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
----------------------------------------------------------------------
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
index d09601e..409e922 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ColumnPruningSuite.scala
@@ -157,6 +157,22 @@ class ColumnPruningSuite extends PlanTest {
     comparePlans(Optimize.execute(query), expected)
   }
 
+  test("Eliminate the Project with an empty projectList") {
+    val input = OneRowRelation
+    val expected = Project(Literal(1).as("1") :: Nil, input).analyze
+
+    val query1 =
+      Project(Literal(1).as("1") :: Nil, Project(Literal(1).as("1") :: Nil, 
input)).analyze
+    comparePlans(Optimize.execute(query1), expected)
+
+    val query2 =
+      Project(Literal(1).as("1") :: Nil, Project(Nil, input)).analyze
+    comparePlans(Optimize.execute(query2), expected)
+
+    // to make sure the top Project will not be removed.
+    comparePlans(Optimize.execute(expected), expected)
+  }
+
   test("column pruning for group") {
     val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
     val originalQuery =


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-13763][SQL] Remove Project when its Child's Output is Nil

Reply via email to