[jira] [Commented] (SPARK-26402) Accessing nested fields with different cases in case insensitive mode

2019-01-05 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735046#comment-16735046
 ] 

Dongjoon Hyun commented on SPARK-26402:
---

Hi, [~smilegator]. This is not a correctness issue because it fails with 
AnalysisException previously.

> Accessing nested fields with different cases in case insensitive mode
> ---------------------------------------------------------------------
>
> Key: SPARK-26402
> URL: https://issues.apache.org/jira/browse/SPARK-26402
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> {{GetStructField}} with different optional names should be semantically 
> equal. We will use this as building block to compare the nested fields used 
> in the plans to be optimized by catalyst optimizer.
> This PR also fixes a bug where accessing nested fields with different 
> cases in case insensitive mode would result in an {{AnalysisException}}.
> {code:java}
> sql("create table t (s struct<i: Int>) using json")
> sql("select s.I from t group by s.i")
> {code}
> which currently fails with:
> {code:java}
> org.apache.spark.sql.AnalysisException: expression 'default.t.`s`' is neither 
> present in the group by, nor is it an aggregate function
> {code}
>  
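The idea behind the fix can be illustrated with a minimal, self-contained Scala sketch. Note these are not Spark's actual classes: `Field`, `canonicalize`, and `semanticEquals` below are hypothetical stand-ins modeling how stripping the optional, cosmetic field name from `GetStructField` makes case-variant accesses like `s.I` and `s.i` compare as semantically equal.

```scala
object GetStructFieldSketch {
  // Hypothetical stand-in for Spark's GetStructField: a child reference,
  // an ordinal into the struct, and an optional, purely cosmetic name.
  final case class Field(childName: String, ordinal: Int, name: Option[String])

  // Mirror of the fix: canonicalization drops the name, so only the
  // child and ordinal matter for semantic equality.
  def canonicalize(f: Field): Field = f.copy(name = None)

  def semanticEquals(a: Field, b: Field): Boolean =
    canonicalize(a) == canonicalize(b)

  def main(args: Array[String]): Unit = {
    // `s.I` and `s.i` both resolve to ordinal 0; only the stored name differs.
    val upper = Field("s", 0, Some("I"))
    val lower = Field("s", 0, Some("i"))
    assert(semanticEquals(upper, lower))
    println(semanticEquals(upper, lower))
  }
}
```

This is why the GROUP BY example above starts working: once the name is stripped during canonicalization, the expression in the SELECT clause matches the one in the grouping keys.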



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26402) Accessing nested fields with different cases in case insensitive mode

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727529#comment-16727529
 ] 

ASF GitHub Bot commented on SPARK-26402:


asfgit closed pull request #23353: [SPARK-26402][SQL] Accessing nested fields 
with different cases in case insensitive mode
URL: https://github.com/apache/spark/pull/23353
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
index fe6db8b344d3d..4d218b936b3a2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
@@ -26,6 +26,7 @@ package org.apache.spark.sql.catalyst.expressions
  *
  * The following rules are applied:
  *  - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s are stripped.
+ *  - Names for [[GetStructField]] are stripped.
  *  - Commutative and associative operations ([[Add]] and [[Multiply]]) have their children ordered
  *    by `hashCode`.
  *  - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`.
@@ -37,10 +38,11 @@ object Canonicalize {
     expressionReorder(ignoreNamesTypes(e))
   }
 
-  /** Remove names and nullability from types. */
+  /** Remove names and nullability from types, and names from `GetStructField`. */
   private[expressions] def ignoreNamesTypes(e: Expression): Expression = e match {
     case a: AttributeReference =>
       AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
+    case GetStructField(child, ordinal, Some(_)) => GetStructField(child, ordinal, None)
     case _ => e
   }
 
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
index 28e6940f3cca3..9802a6e5891b8 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
 
 class CanonicalizeSuite extends SparkFunSuite {
 
@@ -50,4 +51,32 @@ class CanonicalizeSuite extends SparkFunSuite {
     assert(range.where(arrays1).sameResult(range.where(arrays2)))
     assert(!range.where(arrays1).sameResult(range.where(arrays3)))
   }
+
+  test("SPARK-26402: accessing nested fields with different cases in case insensitive mode") {
+    val expId = NamedExpression.newExprId
+    val qualifier = Seq.empty[String]
+    val structType = StructType(
+      StructField("a", StructType(StructField("b", IntegerType, false) :: Nil), false) :: Nil)
+
+    // GetStructField with different names are semantically equal
+    val fieldA1 = GetStructField(
+      AttributeReference("data1", structType, false)(expId, qualifier),
+      0, Some("a1"))
+    val fieldA2 = GetStructField(
+      AttributeReference("data2", structType, false)(expId, qualifier),
+      0, Some("a2"))
+    assert(fieldA1.semanticEquals(fieldA2))
+
+    val fieldB1 = GetStructField(
+      GetStructField(
+        AttributeReference("data1", structType, false)(expId, qualifier),
+        0, Some("a1")),
+      0, Some("b1"))
+    val fieldB2 = GetStructField(
+      GetStructField(
+        AttributeReference("data2", structType, false)(expId, qualifier),
+        0, Some("a2")),
+      0, Some("b2"))
+    assert(fieldB1.semanticEquals(fieldB2))
+  }
 }
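Besides stripping names, the Canonicalize scaladoc above also mentions ordering the children of commutative operations by `hashCode`. A minimal, self-contained sketch of that rule (not Spark's actual implementation; `Expr`, `Ref`, `Add`, and `reorder` are hypothetical stand-ins, and real canonicalization recurses over the whole tree):

```scala
object CanonicalizeReorderSketch {
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Order the children of a commutative op by hashCode, so `a + b` and
  // `b + a` canonicalize to the same tree (simplified, non-recursive).
  def reorder(e: Expr): Expr = e match {
    case Add(l, r) if l.hashCode > r.hashCode => Add(r, l)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val ab = reorder(Add(Ref("a"), Ref("b")))
    val ba = reorder(Add(Ref("b"), Ref("a")))
    assert(ab == ba) // both canonical forms are identical
    println(ab == ba)
  }
}
```

Combined with the name-stripping rule this PR adds, such normalizations let the optimizer treat syntactically different but semantically identical expressions as equal.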
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
index a313681eeb8f0..5794691a365a9 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
@@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLite
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
+imp