[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-08 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r160115072
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -462,27 +507,139 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(DoubleType, containsNull = false),
   Some(ArrayType(DoubleType, containsNull = true)))
 widenTestWithStringPromotion(
-  ArrayType(TimestampType, containsNull = false),
-  ArrayType(StringType, containsNull = true),
+  ArrayType(ArrayType(IntegerType), containsNull = true),
+  ArrayType(ArrayType(LongType), containsNull = false),
+  Some(ArrayType(ArrayType(LongType), containsNull = true)))
--- End diff --

I think this is more coherent with the behavior in other parts and this is 
the behavior I would expect. But I think that we should follow @gatorsmile's 
suggestion and check Hive's behavior first.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-04 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159676537
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -462,27 +507,139 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(DoubleType, containsNull = false),
   Some(ArrayType(DoubleType, containsNull = true)))
 widenTestWithStringPromotion(
-  ArrayType(TimestampType, containsNull = false),
-  ArrayType(StringType, containsNull = true),
+  ArrayType(ArrayType(IntegerType), containsNull = true),
+  ArrayType(ArrayType(LongType), containsNull = false),
+  Some(ArrayType(ArrayType(LongType), containsNull = true)))
--- End diff --

@mgaido91, thoughts on this?

It's definitely possible for us to revert back to behavior where we don't 
do IntegerType-to-LongType, xType-to-StringType, etc. promotion inside complex 
types, which was how a previous form of this PR handled it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-04 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159675261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,11 +100,22 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
--- End diff --

Sure, see #09e49fb8162b97dec76a6324fa4fac4553b22013


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159575582
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -462,27 +507,139 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(DoubleType, containsNull = false),
   Some(ArrayType(DoubleType, containsNull = true)))
 widenTestWithStringPromotion(
-  ArrayType(TimestampType, containsNull = false),
-  ArrayType(StringType, containsNull = true),
+  ArrayType(ArrayType(IntegerType), containsNull = true),
+  ArrayType(ArrayType(LongType), containsNull = false),
+  Some(ArrayType(ArrayType(LongType), containsNull = true)))
--- End diff --

Do we really wanna do this? Is there any other database have this behavior?

I think for complex type, we should ignore the nullable difference, but I'm 
not sure if we should do type coercion inside complex type.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159575262
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,11 +100,22 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
--- End diff --

`pointType` looks like a weird name, how about `elementType` or `et`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2018-01-02 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159244795
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -148,6 +160,61 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  /**
+   * Case 2 type widening over complex types. `widerTypeFunc` is a 
function that finds the wider
+   * type over point types. The `widerTypeFunc` specifies behavior over 
whether types should be
+   * promoted to StringType.
+   */
+  private def findWiderTypeForTwoComplex(
+  t1: DataType,
+  t2: DataType,
+  widerTypeFunc: (DataType, DataType) => Option[DataType]): 
Option[DataType] = {
+(t1, t2) match {
+  case (_, _) if t1 == t2 => Some(t1)
+  case (NullType, _) => Some(t1)
+  case (_, NullType) => Some(t1)
+
+  case (ArrayType(pointType1, nullable1), ArrayType(pointType2, 
nullable2)) =>
+val dataType = widerTypeFunc.apply(pointType1, pointType2)
+
+dataType.map(ArrayType(_, nullable1 || nullable2))
+
+  case (MapType(keyType1, valueType1, nullable1), MapType(keyType2, 
valueType2, nullable2)) =>
+val keyType = widerTypeFunc.apply(keyType1, keyType2)
+val valueType = widerTypeFunc.apply(valueType1, valueType2)
+
+if (keyType.nonEmpty && valueType.nonEmpty) {
+  Some(MapType(keyType.get, valueType.get, nullable1 || nullable2))
+} else {
+  None
+}
+
+  case (StructType(fields1), StructType(fields2)) =>
+val fieldTypes = fields1.zip(fields2).map { case (f1, f2) =>
--- End diff --

@mgaido91 That's seems to be the assumption already made in 
[`findTightestCommonType`](https://github.com/bdrillard/spark/blob/d42dfa55105c8944b63781fc61b59c98a99338d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L115).
 The 
[`sameType`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L83)
 function on DataType also requires structfields are ordered the same. 

The difference here is that we don't require the structfields strictly have 
the same type, so we can support widening to LongType, StringType, etc. But we 
do require the fields 1. have the same order, and 2. have the same name (either 
with strict case, or ignoring case).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-30 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r159119696
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -148,6 +160,61 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  /**
+   * Case 2 type widening over complex types. `widerTypeFunc` is a 
function that finds the wider
+   * type over point types. The `widerTypeFunc` specifies behavior over 
whether types should be
+   * promoted to StringType.
+   */
+  private def findWiderTypeForTwoComplex(
+  t1: DataType,
+  t2: DataType,
+  widerTypeFunc: (DataType, DataType) => Option[DataType]): 
Option[DataType] = {
+(t1, t2) match {
+  case (_, _) if t1 == t2 => Some(t1)
+  case (NullType, _) => Some(t1)
+  case (_, NullType) => Some(t1)
+
+  case (ArrayType(pointType1, nullable1), ArrayType(pointType2, 
nullable2)) =>
+val dataType = widerTypeFunc.apply(pointType1, pointType2)
+
+dataType.map(ArrayType(_, nullable1 || nullable2))
+
+  case (MapType(keyType1, valueType1, nullable1), MapType(keyType2, 
valueType2, nullable2)) =>
+val keyType = widerTypeFunc.apply(keyType1, keyType2)
+val valueType = widerTypeFunc.apply(valueType1, valueType2)
+
+if (keyType.nonEmpty && valueType.nonEmpty) {
+  Some(MapType(keyType.get, valueType.get, nullable1 || nullable2))
+} else {
+  None
+}
+
+  case (StructType(fields1), StructType(fields2)) =>
+val fieldTypes = fields1.zip(fields2).map { case (f1, f2) =>
--- End diff --

this requires that fields with the same name must also be in the same 
position. Is this assumption correct?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158852830
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
+
 
 // Without string promotion
 widenTestWithoutStringPromotion(IntegerType, StringType, None)
 widenTestWithoutStringPromotion(StringType, TimestampType, None)
 widenTestWithoutStringPromotion(ArrayType(LongType), 
ArrayType(StringType), None)
 widenTestWithoutStringPromotion(ArrayType(StringType), 
ArrayType(TimestampType), None)
+widenTestWithoutStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  None)
--- End diff --

I've grouped non-string promotion tests, and added tests for arrays, maps, 
and then structures nested in those types or having fields of those types, 
https://github.com/apache/spark/pull/20010/files#diff-01ecdd038c5c2f53f38118912210fef8R563


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158852200
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
--- End diff --

I've grouped string promotion tests for arrays and maps, with tests where 
those types are nested in structs or are fields in structs, 
https://github.com/apache/spark/pull/20010/files#diff-01ecdd038c5c2f53f38118912210fef8R504


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158852095
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
+  ArrayType(StringType, containsNull=false),
+  Some(ArrayType(StringType, containsNull=true)))
+widenTest(
+  MapType(StringType, StringType, valueContainsNull=true),
+  MapType(StringType, StringType, valueContainsNull=false),
+  Some(MapType(StringType, StringType, valueContainsNull=true)))
--- End diff --

Here are tests for nested complex types, 
https://github.com/apache/spark/pull/20010/files#diff-01ecdd038c5c2f53f38118912210fef8R405


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158851910
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
--- End diff --

Fixed, see dfb4a3b47d88853be348aafd9802f799639693f6


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158851887
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +213,8 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
+  .orElse(findWiderTypeForTwoComplex(t1, t2, findWiderTypeForTwo))
+
--- End diff --

Fixed, see dfb4a3b47d88853be348aafd9802f799639693f6


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158851832
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
+
--- End diff --

Fixed, see dfb4a3b47d88853be348aafd9802f799639693f6


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844539
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
+
--- End diff --

this empty line is not needed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844501
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
+
 
 // Without string promotion
 widenTestWithoutStringPromotion(IntegerType, StringType, None)
 widenTestWithoutStringPromotion(StringType, TimestampType, None)
 widenTestWithoutStringPromotion(ArrayType(LongType), 
ArrayType(StringType), None)
 widenTestWithoutStringPromotion(ArrayType(StringType), 
ArrayType(TimestampType), None)
+widenTestWithoutStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  None)
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844477
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -469,12 +488,21 @@ class TypeCoercionSuite extends AnalysisTest {
   ArrayType(ArrayType(IntegerType), containsNull = false),
   ArrayType(ArrayType(LongType), containsNull = false),
   Some(ArrayType(ArrayType(LongType), containsNull = false)))
+widenTestWithStringPromotion(
+  StructType(StructField("a", ArrayType(LongType)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType)) :: Nil)))
--- End diff --

please add an analogous test also for map and array


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844437
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
+  ArrayType(StringType, containsNull=false),
+  Some(ArrayType(StringType, containsNull=true)))
+widenTest(
+  MapType(StringType, StringType, valueContainsNull=true),
+  MapType(StringType, StringType, valueContainsNull=false),
+  Some(MapType(StringType, StringType, valueContainsNull=true)))
+
+widenTest(
+  StructType(StructField("a", ArrayType(StringType, 
containsNull=true)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType, 
containsNull=false)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType, 
containsNull=true)) :: Nil)))
+widenTest(
+  StructType(StructField("a", MapType(StringType, StringType, 
valueContainsNull=true)) :: Nil),
+  StructType(StructField("a", MapType(StringType, StringType, 
valueContainsNull=false)) :: Nil),
+  Some(StructType(
+StructField("a", MapType(StringType, StringType, 
valueContainsNull=true)) :: Nil)))
--- End diff --

thanks, may you please add also two tests having Map and Array as outer 
types?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844322
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
+  ArrayType(StringType, containsNull=false),
+  Some(ArrayType(StringType, containsNull=true)))
+widenTest(
+  MapType(StringType, StringType, valueContainsNull=true),
+  MapType(StringType, StringType, valueContainsNull=false),
+  Some(MapType(StringType, StringType, valueContainsNull=true)))
--- End diff --

please add a test also for struct


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158844162
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
--- End diff --

please add a whitespace before and after `=` and the same in the following 
lines


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-27 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158843869
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +213,8 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
+  .orElse(findWiderTypeForTwoComplex(t1, t2, findWiderTypeForTwo))
+
--- End diff --

this line should be removed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-26 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158724861
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +213,8 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
+  .orElse(findWiderTypeForTwoComplex(t1, t2, findWiderTypeForTwo))
--- End diff --

It should not. `findWiderTypeForTwoComplex` will only be called as we 
operate over "complex" types (i.e arrays, maps, structs), and will only recurse 
(calling `findWiderTypeForTwo`) over the child point-types of a complex type, 
so we ensure the recursive computation gets narrower as it recurses, until 
eventually terminating at the leaf level of the schema.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-26 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158728293
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

Yes, this PR should be ready for a Jenkins build.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-21 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158440114
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

I like your implementation, except for the concerns I raised in another 
thread. Have you tested locally?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-21 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158440005
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +213,8 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
+  .orElse(findWiderTypeForTwoComplex(t1, t2, findWiderTypeForTwo))
--- End diff --

Will this cause infinite loop? Calling `findWiderTypeForTwo` in 
`findWiderTypeForTwo`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-20 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r158081192
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

@gczsjdy I've taken a shot at implementing your suggestion with 
`findWiderTypeForTwoComplex`, which takes as an argument a `widerTypeFunc`, 
describing which widening behavior to apply to point types (should they permit 
promotion to string or not). 

Because `ArrayType` instances that would require widening the type could be 
nested in `StructType` and `MapType`, I think it's necessary to have more case 
matching than would be in `findWiderTypeForArray`, hence 
`findWiderTypeForTwoComplex`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-20 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157967667
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

I like your idea @gczsjdy!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157928910
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +102,33 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (ArrayType(pointType1, nullable1), ArrayType(pointType2, 
nullable2)) =>
+  val dataType = if (withStringPromotion) {
+findWiderTypeForTwo(pointType1, pointType2)
--- End diff --

I think we break the `findTightest` semantic by `withStringPromotion` 
judgement and calling `findWiderTypeForTwo`, these are what we should add in 
`Case 2 type widening` -> `findWiderTypeForTwo`. I suggest we put these logic 
in another function`findWiderTypeForDecimal`. Will mention this in the other 
thread.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157929494
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

My suggestion: we define a new function, like `findWiderTypeForArray`. This 
new function can provide 'findWider' functionality compared to the 
'findTightestArray' part(which is basically the first commit in your PR). We 
won't break the original `findTightest` semantic in this way and the code is 
clean.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157926754
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

Thanks for your illustration.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157926722
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

Thanks for your illustration.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157901873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

Sure, I think that's possible. In order to handle instances with and 
without string promotion, I think it may be necessary to add a boolean 
parameter, and then to handle the instances where the pointType/keyType and 
valueType may result in `None`, see 
https://github.com/apache/spark/pull/20010/files#diff-383a8cdd0a9c58cae68e0a79295520a3R105

To support the minor change in function signature for 
`findTightestCommonType`, I have to do some refactoring in the tests. Let me 
know if you think there's a cleaner implementation, but this seems to help 
localize like concerns into `findTightestCommonType`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157794266
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

it makes sense, but I'd love that in your implementation in 
`findTightestCommonType`, you would replicate this logic, ie. removing the 
`sameType` guard and using `findWiderTypeForTwo`, in order to allow casting an 
array of int to an array of long. What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157792439
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -182,12 +188,6 @@ object TypeCoercion {
   t2: DataType): Option[DataType] = {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeWithoutStringPromotionForTwo(et1, et2)
-.map(ArrayType(_, containsNull1 || containsNull2))
-case _ => None
-  })
--- End diff --

Same comment here, as above.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157792361
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -158,11 +169,6 @@ object TypeCoercion {
 findTightestCommonType(t1, t2)
   .orElse(findWiderTypeForDecimal(t1, t2))
   .orElse(stringPromotion(t1, t2))
-  .orElse((t1, t2) match {
-case (ArrayType(et1, containsNull1), ArrayType(et2, 
containsNull2)) =>
-  findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || 
containsNull2))
-case _ => None
-  })
--- End diff --

I think if we now check for ArrayTypes (including MapTypes) in 
`findTightestCommonType`, the match here becomes redundant. @mgaido91, 
@gczsjdy, does this thinking make sense to you both? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157792706
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -389,6 +389,25 @@ class TypeCoercionSuite extends AnalysisTest {
 widenTest(StringType, MapType(IntegerType, StringType, true), None)
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
 
+widenTest(
+  ArrayType(StringType, containsNull=true),
+  ArrayType(StringType, containsNull=false),
+  Some(ArrayType(StringType, containsNull=true)))
+widenTest(
+  MapType(StringType, StringType, valueContainsNull=true),
+  MapType(StringType, StringType, valueContainsNull=false),
+  Some(MapType(StringType, StringType, valueContainsNull=true)))
+
+widenTest(
+  StructType(StructField("a", ArrayType(StringType, 
containsNull=true)) :: Nil),
+  StructType(StructField("a", ArrayType(StringType, 
containsNull=false)) :: Nil),
+  Some(StructType(StructField("a", ArrayType(StringType, 
containsNull=true)) :: Nil)))
+widenTest(
+  StructType(StructField("a", MapType(StringType, StringType, 
valueContainsNull=true)) :: Nil),
+  StructType(StructField("a", MapType(StringType, StringType, 
valueContainsNull=false)) :: Nil),
+  Some(StructType(
+StructField("a", MapType(StringType, StringType, 
valueContainsNull=true)) :: Nil)))
--- End diff --

Here's a test for nested structures where an explicit match case against 
ArrayType/MapType is necessary due to the difference in nullability between the 
two structures.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157787536
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

@gczsjdy `sameType` will ignore the nullability of the two types, only 
checking if the `DataType` is the same. We would expect then that the first 
case of `findTightestCommonType` would pass the types on through, but during 
its equality check, it also checks the type nullability, which for ArrayType 
and MapType, may differ between t1 and t2:

```
case (t1, t2) if t1 == t1 => Some(t1)
```

That means we can establish that two StructFields of ArrayType/MapType are 
the same DataType, but if they have different nullability, then the above case 
match won't match them, nor will any other case in the match set. So in order 
to find the tightest common type where nullabilities of the point-types may 
differ, we'll need to recurse. See a case like:

```
widenTest(
StructType(StructField("a", ArrayType(StringType, containsNull=true)) 
:: Nil),
StructType(StructField("a", ArrayType(StringType, containsNull=false)) 
:: Nil),
Some(StructType(StructField("a", ArrayType(StringType, 
containsNull=true)) :: Nil)))
```

At the moment, since we have no case to match ArrayType/MapType where the 
nullabilities may differ, this case would fail. I can add this test explicitly 
to the suite.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157787266
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

@gczsjdy  I think it is needed, because IIUC in `sameType` the nullability 
is not checked to be the same. So it might vary and calling 
`findTightestCommonType` does the trick.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157780706
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

@mgaido91 Thanks, I got it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157778903
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

@mgaido91, @gczsjdy, sure, I'll add a pair of test cases for nested complex 
structures for `ArrayType` and `MapType`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157775323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

Let's use this example, if we call `findTightestCommonType` on 2 array of 
array of Integer types, then this `dataType` = 
`findTightest(findTightest(IntegralType, IntegralType))` = `IntegralType`, so 
the final answer is `Some(ArrayType(IntegralType))`? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157749627
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

then @bdrillard can we add also some test cases for nested structures?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157749463
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

I think that the reason is for nested structures, for instance an array of 
array or array of map


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157697044
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
+  Some(ArrayType(dataType, nullable1 || nullable2))
+
+case (t1 @ MapType(keyType1, valueType1, nullable1),
+t2 @ MapType(keyType2, valueType2, nullable2)) if t1.sameType(t2) 
=>
+  val keyType = findTightestCommonType(keyType1, keyType2).get
+  val valueType = findTightestCommonType(valueType1, valueType2).get
--- End diff --

Also here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157696626
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
+  val dataType = findTightestCommonType(pointType1, pointType2).get
--- End diff --

If `t1` and `t2` are the sameType, why do we need recursively 
`findTightestCommonType`? Can we just make `val dataType = pointType1`? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157695599
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
--- End diff --

This will make sure the later `get` won't be applied to `None` 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-19 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20010#discussion_r157691450
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -99,6 +99,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ ArrayType(pointType1, nullable1), t2 @ 
ArrayType(pointType2, nullable2))
+if t1.sameType(t2) =>
--- End diff --

why is this `if t1.sameType(t2)` needed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20010: [SPARK-22826][SQL] findWiderTypeForTwo Fails over...

2017-12-18 Thread bdrillard
GitHub user bdrillard opened a pull request:

https://github.com/apache/spark/pull/20010

[SPARK-22826][SQL] findWiderTypeForTwo Fails over StructField of Array


## What changes were proposed in this pull request?

[SPARK-22826](https://issues.apache.org/jira/browse/SPARK-22826)

Attempting to find the tightest common type over a struct holding fields of 
`ArrayType` or `MapType` will fail if the types of those struct fields are not 
exactly equal. See the JIRA for an example.

## How was this patch tested?

This patch adds a test case for `ArrayType` and `MapType` for finding 
tightest common types.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bdrillard/spark SPARK-22826

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20010.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20010


commit 1ac4b0be7926d24b5cbee54e0a8b56381adda462
Author: ALeksander Eskilson 
Date:   2017-12-18T21:09:18Z

[SPARK-22826] adding support for tighest type coercion over array/map types 
in a structfield




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org