spark git commit: [SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL

lixiao Mon, 23 Jul 2018 23:05:16 -0700

Repository: spark
Updated Branches:
  refs/heads/master d2436a852 -> 13a67b070



[SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL

## What changes were proposed in this pull request?
Modified canonicalization so that it no longer depends on the letter case of attribute names.
Before this PR, the cache was not always used when a SQL statement referred to
the same column with different letter case, for example:
    sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")

    sql("select key, sum(case when Key > 0 then 1 else 0 end) as positiveNum " +
      "from src group by key").cache().createOrReplaceTempView("src_cache")
    sql(
      s"""select a.key
           from
           (select key from src_cache where positiveNum = 1) a
           left join
           (select key from src_cache) b
           on a.key = b.key
        """).explain

The physical plan of the sql is:
![image](https://user-images.githubusercontent.com/26834091/42979518-3decf0fa-8c05-11e8-9837-d5e4c334cb1f.png)

The subquery "select key from src_cache where positiveNum = 1" on the left side
of the join can use the cached data, but the subquery "select key from src_cache"
on the right side of the join cannot.
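
To illustrate why letter case can matter here, the following is a toy model only, not Spark's implementation. `Attr`, `ToyCanonicalizer`, `normalizeIdOnly` and `normalizeIdAndName` are hypothetical names invented for this sketch. It assumes that two plans are matched by comparing normalized attributes, and shows that replacing only the expression ID leaves `Key` and `key` distinct, while also erasing the name makes them equal.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark classes.
final case class Attr(name: String, exprId: Long)

object ToyCanonicalizer {
  // Replace the expression ID with the attribute's position in the input,
  // keeping the name as written (names differing only in case stay different).
  def normalizeIdOnly(attr: Attr, input: Seq[Attr]): Attr = {
    val ordinal = input.indexWhere(_.exprId == attr.exprId)
    if (ordinal == -1) attr else attr.copy(exprId = ordinal.toLong)
  }

  // Same as above, but also replace the name with a fixed placeholder,
  // so that letter case can no longer influence the comparison.
  def normalizeIdAndName(attr: Attr, input: Seq[Attr]): Attr = {
    val ordinal = input.indexWhere(_.exprId == attr.exprId)
    if (ordinal == -1) attr else Attr("none", ordinal.toLong)
  }
}

object ToyCanonicalizerDemo extends App {
  import ToyCanonicalizer._

  // Two "plans" exposing the same column, written with different case.
  val upper = Seq(Attr("Key", 10L))
  val lower = Seq(Attr("key", 20L))

  val idOnlyMatch =
    normalizeIdOnly(upper.head, upper) == normalizeIdOnly(lower.head, lower)
  val idAndNameMatch =
    normalizeIdAndName(upper.head, upper) == normalizeIdAndName(lower.head, lower)

  println(s"id only: $idOnlyMatch, id and name: $idAndNameMatch")
  // prints "id only: false, id and name: true"
}
```

In this sketch the placeholder name `"none"` is an arbitrary choice; any constant would do, as long as both sides are normalized the same way.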

## How was this patch tested?

Added a new test.

Author: 10129659 <chen.yans...@zte.com.cn>

Closes #21823 from eatoncys/canonicalized.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13a67b07
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13a67b07
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13a67b07

Branch: refs/heads/master
Commit: 13a67b070d335bb257d13dacadea3450885c3d81
Parents: d2436a8
Author: 10129659 <chen.yans...@zte.com.cn>
Authored: Mon Jul 23 23:05:08 2018 -0700
Committer: Xiao Li <gatorsm...@gmail.com>
Committed: Mon Jul 23 23:05:08 2018 -0700

----------------------------------------------------------------------
 .../apache/spark/sql/catalyst/plans/QueryPlan.scala  |  2 +-
 .../apache/spark/sql/execution/SameResultSuite.scala | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/13a67b07/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index 4b4722b..b1ffdca 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -284,7 +284,7 @@ object QueryPlan extends PredicateHelper {
         if (ordinal == -1) {
           ar
         } else {
-          ar.withExprId(ExprId(ordinal))
+          ar.withExprId(ExprId(ordinal)).canonicalized
         }
     }.canonicalized.asInstanceOf[T]
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/13a67b07/sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala
index aaf51b5..d088e24 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala
@@ -18,8 +18,11 @@
 package org.apache.spark.sql.execution
 
 import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.IntegerType
 
 /**
  * Tests for the sameResult function for [[SparkPlan]]s.
@@ -58,4 +61,16 @@ class SameResultSuite extends QueryTest with SharedSQLContext {
     val df4 = spark.range(10).agg(sumDistinct($"id"))
     assert(df3.queryExecution.executedPlan.sameResult(df4.queryExecution.executedPlan))
   }
+
+  test("Canonicalized result is case-insensitive") {
+    val a = AttributeReference("A", IntegerType)()
+    val b = AttributeReference("B", IntegerType)()
+    val planUppercase = Project(Seq(a), LocalRelation(a, b))
+
+    val c = AttributeReference("a", IntegerType)()
+    val d = AttributeReference("b", IntegerType)()
+    val planLowercase = Project(Seq(c), LocalRelation(c, d))
+
+    assert(planUppercase.sameResult(planLowercase))
+  }
 }

