spark git commit: [SPARK-22141][BACKPORT][SQL] Propagate empty relation before checking Cartesian products

hvanhovell Wed, 27 Sep 2017 08:41:01 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 b0f30b56a -> a406473a5



[SPARK-22141][BACKPORT][SQL] Propagate empty relation before checking Cartesian products

Backport of https://github.com/apache/spark/pull/19362 to branch-2.2.

## What changes were proposed in this pull request?

When constraints are inferred from its children, a Join's condition can be simplified to None.
For example:
```
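import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val testRelation = LocalRelation('a.int)
val x = testRelation.as("x")
// `a = 2 AND NOT(a = 2)` can never be true, so y is known to be empty.
val y = testRelation.where($"a" === 2 && !($"a" === 2)).as("y")
x.join(y).where($"x.a" === $"y.a")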
```
During optimization, the plan becomes:
```
Join Inner
:- LocalRelation <empty>, [a#23]
+- LocalRelation <empty>, [a#224]
```
Because this join has no condition, the Cartesian product check throws an exception for the plan above, even though both of its inputs are empty.

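This patch moves the "Check Cartesian Products" batch so that it runs after the "LocalRelation" batch, which applies PropagateEmptyRelation. The join over empty inputs is then replaced by an empty relation before Cartesian products are checked, which resolves the issue.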
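To illustrate why the batch order matters, here is a minimal, dependency-free Scala sketch. It is a toy model, not Catalyst code: `Plan`, `Relation`, `Join`, `BatchOrderDemo`, `propagateEmpty`, and `checkCartesian` are hypothetical stand-ins, and the "propagate" rule only handles inner joins, whereas the real PropagateEmptyRelation and CheckCartesianProducts rules cover many more cases.

```scala
// Toy model only: these types and rules are hypothetical stand-ins, not Catalyst classes.
sealed trait Plan
case class Relation(name: String, isEmpty: Boolean) extends Plan
case class Join(left: Plan, right: Plan, condition: Option[String]) extends Plan

object BatchOrderDemo {
  // Stand-in for PropagateEmptyRelation, inner joins only:
  // a join with an empty input is replaced by an empty relation.
  def propagateEmpty(plan: Plan): Plan = plan match {
    case Join(l, r, cond) =>
      val (newLeft, newRight) = (propagateEmpty(l), propagateEmpty(r))
      (newLeft, newRight) match {
        case (Relation(_, true), _) | (_, Relation(_, true)) => Relation("empty", isEmpty = true)
        case _ => Join(newLeft, newRight, cond)
      }
    case other => other
  }

  // Stand-in for CheckCartesianProducts: reject any join without a condition.
  def checkCartesian(plan: Plan): Plan = plan match {
    case Join(_, _, None) => throw new IllegalStateException("Detected cartesian product")
    case j @ Join(l, r, _) =>
      checkCartesian(l)
      checkCartesian(r)
      j
    case other => other
  }

  // Apply each "batch" once, in order, like a sequence of optimizer batches.
  def run(plan: Plan, batches: Seq[Plan => Plan]): Plan =
    batches.foldLeft(plan)((current, batch) => batch(current))

  // Batch order before the patch (check first) and after it (propagate first).
  val oldOrder: Seq[Plan => Plan] = Seq[Plan => Plan](checkCartesian, propagateEmpty)
  val newOrder: Seq[Plan => Plan] = Seq[Plan => Plan](propagateEmpty, checkCartesian)
}
```

In this model, running `oldOrder` on a condition-less join of two empty relations throws, while `newOrder` first collapses the join to an empty relation and then passes the check. A condition-less join of two non-empty relations is still rejected under `newOrder`, so genuine Cartesian products are still detected.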

## How was this patch tested?

Added a unit test to JoinSuite.
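The JoinSuite test filters both inputs with a predicate that can never be true (for example `a === 2 && !(a === 2)`), so every join type it covers should return no rows. The Scala sketch below encodes the standard SQL rule for when a join must be empty given which of its inputs are empty. `JoinKind`, `EmptyJoinDemo`, `parse`, and `resultIsEmpty` are hypothetical names, not Spark's `JoinType` API; Catalyst's PropagateEmptyRelation is assumed, not quoted, to reason along these lines.

```scala
// Hypothetical model of standard SQL join semantics; not Spark's JoinType API.
sealed trait JoinKind
case object InnerJoin extends JoinKind
case object LeftOuterJoin extends JoinKind
case object RightOuterJoin extends JoinKind
case object FullOuterJoin extends JoinKind

object EmptyJoinDemo {
  // Maps the join-type strings used in the JoinSuite test to JoinKind values.
  def parse(joinType: String): JoinKind = joinType match {
    case "inner" => InnerJoin
    case "left" | "left_outer" => LeftOuterJoin
    case "right" | "right_outer" => RightOuterJoin
    case "full_outer" => FullOuterJoin
    case other => throw new IllegalArgumentException(s"Unsupported join type: $other")
  }

  // True when the join result is necessarily empty, given which inputs are empty.
  def resultIsEmpty(kind: JoinKind, leftEmpty: Boolean, rightEmpty: Boolean): Boolean =
    kind match {
      case InnerJoin => leftEmpty || rightEmpty      // needs matching rows on both sides
      case LeftOuterJoin => leftEmpty                // keeps every left row
      case RightOuterJoin => rightEmpty              // keeps every right row
      case FullOuterJoin => leftEmpty && rightEmpty  // keeps rows from both sides
    }
}
```

With both inputs empty, every join type in the test yields an empty result. With only one side empty, an outer join that preserves the other side would still return rows, which is why the test makes both sides empty.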

Author: Wang Gengliang <ltn...@gmail.com>

Closes #19366 from gengliangwang/branch-2.2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a406473a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a406473a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a406473a

Branch: refs/heads/branch-2.2
Commit: a406473a525285888dbc29503443173df1d1c490
Parents: b0f30b5
Author: Wang Gengliang <ltn...@gmail.com>
Authored: Wed Sep 27 17:40:31 2017 +0200
Committer: Herman van Hovell <hvanhov...@databricks.com>
Committed: Wed Sep 27 17:40:31 2017 +0200

----------------------------------------------------------------------
 .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala  | 4 ++--
 sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a406473a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index f67daa5..71e03ee 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -113,8 +113,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf)
       SimplifyCreateArrayOps,
       SimplifyCreateMapOps) ++
       extendedOperatorOptimizationRules: _*) ::
-    Batch("Check Cartesian Products", Once,
-      CheckCartesianProducts(conf)) ::
     Batch("Join Reorder", Once,
       CostBasedJoinReorder(conf)) ::
     Batch("Decimal Optimizations", fixedPoint,
@@ -125,6 +123,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf)
     Batch("LocalRelation", fixedPoint,
       ConvertToLocalRelation,
       PropagateEmptyRelation) ::
+    Batch("Check Cartesian Products", Once,
+      CheckCartesianProducts(conf)) ::
     Batch("OptimizeCodegen", Once,
       OptimizeCodegen(conf)) ::
     Batch("RewriteSubquery", Once,

http://git-wip-us.apache.org/repos/asf/spark/blob/a406473a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
index 95dc147..cdfd33d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
@@ -200,6 +200,14 @@ class JoinSuite extends QueryTest with SharedSQLContext {
       Nil)
   }
 
+  test("SPARK-22141: Propagate empty relation before checking Cartesian products") {
+    Seq("inner", "left", "right", "left_outer", "right_outer", "full_outer").foreach { joinType =>
+      val x = testData2.where($"a" === 2 && !($"a" === 2)).as("x")
+      val y = testData2.where($"a" === 1 && !($"a" === 1)).as("y")
+      checkAnswer(x.join(y, Seq.empty, joinType), Nil)
+    }
+  }
+
   test("big inner join, 4 matches per row") {
     val bigData = testData.union(testData).union(testData).union(testData)
     val bigDataX = bigData.as("x")


