[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/16777

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100898554

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -117,49 +115,67 @@ object TypeCoercion {
    * loss of precision when widening decimal and double, and promotion to string.
    */
   private[analysis] def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
-    (t1, t2) match {
-      case (t1: DecimalType, t2: DecimalType) =>
-        Some(DecimalPrecision.widerDecimalType(t1, t2))
-      case (t: IntegralType, d: DecimalType) =>
-        Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-      case (d: DecimalType, t: IntegralType) =>
-        Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-      case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-        Some(DoubleType)
-      case _ =>
-        findTightestCommonTypeToString(t1, t2)
-    }
+    findTightestCommonType(t1, t2)
+      .orElse(findWiderTypeForDecimal(t1, t2))
--- End diff --

Yes. Integer will be promoted to a wider Decimal anyway.
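gatorsmile's point above ("Integer will be promoted to a wider Decimal anyway") can be checked against the precision formulas quoted elsewhere in this thread. The sketch below is a hypothetical Python model, not Spark code: `for_type` and `wider_decimal` are stand-ins for `DecimalType.forType` and `DecimalPrecision.widerDecimalType`, with `(precision, scale)` tuples modeling `DecimalType`.

```python
# Hypothetical model of the decimal-widening formulas discussed in this thread.
# for_type / wider_decimal are stand-ins for DecimalType.forType and
# DecimalPrecision.widerDecimalType; (p, s) tuples model DecimalType(p, s).

def for_type(integral):
    # Integral types map to an exact decimal: Int -> (10, 0), Long -> (20, 0).
    return {"int": (10, 0), "long": (20, 0)}[integral]

def wider_decimal(d1, d2):
    # max(s1, s2) + max(p1 - s1, p2 - s2) digits of precision, max(s1, s2) scale.
    (p1, s1), (p2, s2) = d1, d2
    scale = max(s1, s2)
    return (scale + max(p1 - s1, p2 - s2), scale)

# An integer widened against Decimal(2, 1) becomes the wider Decimal(11, 1).
print(wider_decimal(for_type("int"), (2, 1)))   # (11, 1)
print(wider_decimal(for_type("long"), (2, 1)))  # (21, 1)
```

These results match the expectations in the test cases quoted later in the thread, e.g. `DecimalType(2, 1)` with `IntegerType` giving `DecimalType(11, 1)` and with `LongType` giving `DecimalType(21, 1)`.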
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100861589

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -117,49 +115,67 @@ object TypeCoercion {
    * loss of precision when widening decimal and double, and promotion to string.
    */
   private[analysis] def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
-    (t1, t2) match {
-      case (t1: DecimalType, t2: DecimalType) =>
-        Some(DecimalPrecision.widerDecimalType(t1, t2))
-      case (t: IntegralType, d: DecimalType) =>
-        Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-      case (d: DecimalType, t: IntegralType) =>
-        Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-      case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-        Some(DoubleType)
-      case _ =>
-        findTightestCommonTypeToString(t1, t2)
-    }
+    findTightestCommonType(t1, t2)
+      .orElse(findWiderTypeForDecimal(t1, t2))
--- End diff --

Yeah, we changed the order, but it looks like it won't change the result. `findWiderTypeForDecimal` will always return a result for a decimal type and a numeric type, and if `findTightestCommonType` can return a result, `findWiderTypeForDecimal` will return the same result. So it doesn't matter whether we run `findTightestCommonType` before or after it.
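cloud-fan's argument above — that swapping the order of the two lookups cannot change the result — can be illustrated with a toy model of the `orElse` composition. Everything below is a simplified assumption-laden sketch, not Spark's implementation: `tightest` and `wider_for_decimal` only approximate `findTightestCommonType` and `findWiderTypeForDecimal`, with types modeled as strings or `(p, s)` tuples.

```python
# Toy model of findTightestCommonType(t1, t2).orElse(findWiderTypeForDecimal(t1, t2)).
# Simplified stand-ins: strings model integral types, (p, s) tuples model decimals.

def tightest(t1, t2):
    # Only identical types have a "tightest" common type in this toy model.
    return t1 if t1 == t2 else None

def wider_for_decimal(t1, t2):
    # Answers whenever at least one side is a decimal and both are numeric;
    # simplified from DecimalPrecision.widerDecimalType.
    def to_dec(t):
        if isinstance(t, tuple):
            return t
        return {"int": (10, 0), "long": (20, 0)}.get(t)
    d1, d2 = to_dec(t1), to_dec(t2)
    if d1 is None or d2 is None:
        return None
    if not (isinstance(t1, tuple) or isinstance(t2, tuple)):
        return None
    (p1, s1), (p2, s2) = d1, d2
    s = max(s1, s2)
    return (s + max(p1 - s1, p2 - s2), s)

def find_wider(t1, t2):
    return tightest(t1, t2) or wider_for_decimal(t1, t2)

# Where both branches can answer (identical decimal types) they agree, so
# running tightest() first or last yields the same result for these pairs.
for pair in [((3, 2), (3, 2)), ((2, 1), "int"), ("long", (2, 1))]:
    assert find_wider(*pair) == (wider_for_decimal(*pair) or tightest(*pair))
```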
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100730262

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

I see. Thank you for catching it.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100727682

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

We should make them consistent. That is why I think it is right to make this change, even if it causes behavior changes.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100723132

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

Aha, thank you for correcting me. I overlooked that, but the result should still be the same, shouldn't it?

- `DecimalType.isWiderThan`

  ```
  (p1 - s1) >= (p2 - s2) && s1 >= s2
  ```

- `DecimalPrecision.widerDecimalType`

  ```
  max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2)
  ```

If the two are different, then we were already applying different type coercion rules between `findWiderTypeWithoutStringPromotion` and `findWiderTypeForTwo`. I guess we should make them match, given https://github.com/apache/spark/pull/14439?
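The claim in the message above — that where `findTightestCommonType` succeeds, the decimal path would give the same answer — follows from the two formulas it quotes: if `d1.isWiderThan(d2)` holds, then `widerDecimalType(d1, d2)` is exactly `d1`. A small Python check of just those two formulas (with `(p, s)` tuples standing in for `DecimalType`; this is a model, not Spark code):

```python
# Check that isWiderThan implies widerDecimalType returns the wider operand,
# using the two formulas quoted in the thread. (p, s) tuples model DecimalType.

def is_wider_than(d1, d2):
    # (p1 - s1) >= (p2 - s2) && s1 >= s2
    (p1, s1), (p2, s2) = d1, d2
    return (p1 - s1) >= (p2 - s2) and s1 >= s2

def wider_decimal(d1, d2):
    # max(s1, s2) + max(p1 - s1, p2 - s2) precision, max(s1, s2) scale
    (p1, s1), (p2, s2) = d1, d2
    s = max(s1, s2)
    return (s + max(p1 - s1, p2 - s2), s)

# Exhaustive over small precisions/scales: whenever d1 is wider than d2,
# the widened type is d1 itself.
decimals = [(p, s) for p in range(1, 10) for s in range(0, p + 1)]
for d1 in decimals:
    for d2 in decimals:
        if is_wider_than(d1, d2):
            assert wider_decimal(d1, d2) == d1
```

Algebraically: if `s1 >= s2` then the result scale is `s1`, and if `(p1 - s1) >= (p2 - s2)` the result precision is `s1 + (p1 - s1) = p1`, i.e. `(p1, s1)` unchanged.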
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100719964

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

The original `findTightestCommonTypeToString` does not handle `DecimalType`. However, this PR calls `findTightestCommonType` first.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100719832

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

The result is the same? See the cases in `findTightestCommonType`:
https://github.com/HyukjinKwon/spark/blob/510a0eee43030abbf37ef922684e6165d6f1e1c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L87-L90
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100716751

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

@cloud-fan refactored this logic recently and I believe he didn't miss this part.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100716252

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

Yes, it is true that the type dispatch order was changed, but `findTightestCommonType` does not take care of `DecimalType`, so the results would be the same.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100715180

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
--- End diff --

Previously, the `findWiderTypeForDecimal` handling came before `findTightestCommonTypeToString`, so the results could be different. cc @cloud-fan, you changed the order. I am not sure whether this should be documented in the release notes.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100690775

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +386,67 @@ class TypeCoercionSuite extends PlanTest {
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTestWithStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeForTwo, t1, t2, expected)
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeWithoutStringPromotionForTwo, t1, t2, expected)
+    }
+
+    // Decimal
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTestWithStringPromotion(
+      DoubleType, DecimalType(2, 1), Some(DoubleType))
+    widenTestWithStringPromotion(
+      LongType, DecimalType(2, 1), Some(DecimalType(21, 1)))
+
+    // ArrayType
+    widenTestWithStringPromotion(
+      ArrayType(ShortType, containsNull = true),
+      ArrayType(DoubleType, containsNull = false),
+      Some(ArrayType(DoubleType, containsNull = true)))
+    widenTestWithStringPromotion(
+      ArrayType(TimestampType, containsNull = false),
+      ArrayType(StringType, containsNull = true),
+      Some(ArrayType(StringType, containsNull = true)))
+    widenTestWithStringPromotion(
+      ArrayType(ArrayType(IntegerType), containsNull = false),
+      ArrayType(ArrayType(LongType), containsNull = false),
+      Some(ArrayType(ArrayType(LongType), containsNull = false)))
+
+    // Without string promotion
+    widenTestWithoutStringPromotion(IntegerType, StringType, None)
--- End diff --

The `LongType` test was removed.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100687325

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +386,67 @@ class TypeCoercionSuite extends PlanTest {
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTestWithStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeForTwo, t1, t2, expected)
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeWithoutStringPromotionForTwo, t1, t2, expected)
+    }
+
+    // Decimal
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTestWithStringPromotion(
+      DoubleType, DecimalType(2, 1), Some(DoubleType))
+    widenTestWithStringPromotion(
+      LongType, DecimalType(2, 1), Some(DecimalType(21, 1)))
+
+    // ArrayType
+    widenTestWithStringPromotion(
+      ArrayType(ShortType, containsNull = true),
+      ArrayType(DoubleType, containsNull = false),
+      Some(ArrayType(DoubleType, containsNull = true)))
+    widenTestWithStringPromotion(
+      ArrayType(TimestampType, containsNull = false),
+      ArrayType(StringType, containsNull = true),
+      Some(ArrayType(StringType, containsNull = true)))
+    widenTestWithStringPromotion(
+      ArrayType(ArrayType(IntegerType), containsNull = false),
+      ArrayType(ArrayType(LongType), containsNull = false),
+      Some(ArrayType(ArrayType(LongType), containsNull = false)))
+
+    // Without string promotion
+    widenTestWithoutStringPromotion(IntegerType, StringType, None)
--- End diff --

I think we can just test int or long, not both.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100687309

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +386,67 @@ class TypeCoercionSuite extends PlanTest {
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTestWithStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeForTwo, t1, t2, expected)
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeWithoutStringPromotionForTwo, t1, t2, expected)
+    }
+
+    // Decimal
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTestWithStringPromotion(
+      DoubleType, DecimalType(2, 1), Some(DoubleType))
--- End diff --

Can we add a test for string? (not array(string))
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100687275

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +386,67 @@ class TypeCoercionSuite extends PlanTest {
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTestWithStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeForTwo, t1, t2, expected)
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        expected: Option[DataType]): Unit = {
+      checkWidenType(TypeCoercion.findWiderTypeWithoutStringPromotionForTwo, t1, t2, expected)
+    }
+
+    // Decimal
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTestWithStringPromotion(
+      DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTestWithStringPromotion(
+      DoubleType, DecimalType(2, 1), Some(DoubleType))
--- End diff --

Nit: how about we always put the decimal type on the left? Then we won't mistakenly add symmetric tests.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16777#discussion_r100678810

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +380,75 @@ class TypeCoercionSuite extends PlanTest {
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTest(t1: DataType, t2: DataType, tightestCommon: Option[DataType]) {
+      var found = TypeCoercion.findWiderTypeForTwo(t1, t2)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t1 and $t2, found $found")
+      // Test both directions to make sure the widening is symmetric.
+      found = TypeCoercion.findWiderTypeForTwo(t2, t1)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t2 and $t1, found $found")
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        tightestCommon: Option[DataType]) {
+      var found = TypeCoercion.findWiderTypeWithoutStringPromotionForTwo(t1, t2)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t1 and $t2, found $found")
+      // Test both directions to make sure the widening is symmetric.
+      found = TypeCoercion.findWiderTypeWithoutStringPromotionForTwo(t2, t1)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t2 and $t1, found $found")
+    }
+
+    // Decimal
+    widenTest(DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTest(DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTest(DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTest(DoubleType, DecimalType(2, 1), Some(DoubleType))
+    widenTest(IntegerType, DecimalType(2, 1), Some(DecimalType(11, 1)))
+    widenTest(LongType, DecimalType(2, 1), Some(DecimalType(21, 1)))
+
+    // ArrayType
+    widenTest(ArrayType(NullType), ArrayType(ByteType), Some(ArrayType(ByteType)))
--- End diff --

These tests are too similar. We just need one test case for numeric, one for string, and one for nested array.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100678782

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +380,75 @@ class TypeCoercionSuite extends PlanTest {

```diff
     widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }

+  test("wider common type for decimal and array") {
+    def widenTest(t1: DataType, t2: DataType, tightestCommon: Option[DataType]) {
+      var found = TypeCoercion.findWiderTypeForTwo(t1, t2)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t1 and $t2, found $found")
+      // Test both directions to make sure the widening is symmetric.
+      found = TypeCoercion.findWiderTypeForTwo(t2, t1)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t2 and $t1, found $found")
+    }
+
+    def widenTestWithoutStringPromotion(
+        t1: DataType,
+        t2: DataType,
+        tightestCommon: Option[DataType]) {
+      var found = TypeCoercion.findWiderTypeWithoutStringPromotionForTwo(t1, t2)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t1 and $t2, found $found")
+      // Test both directions to make sure the widening is symmetric.
+      found = TypeCoercion.findWiderTypeWithoutStringPromotionForTwo(t2, t1)
+      assert(found == tightestCommon,
+        s"Expected $tightestCommon as wider common type for $t2 and $t1, found $found")
+    }
+
+    // Decimal
+    widenTest(DecimalType(2, 1), DecimalType(3, 2), Some(DecimalType(3, 2)))
+    widenTest(DecimalType(2, 1), DoubleType, Some(DoubleType))
+    widenTest(DecimalType(2, 1), IntegerType, Some(DecimalType(11, 1)))
+    widenTest(DoubleType, DecimalType(2, 1), Some(DoubleType))
+    widenTest(IntegerType, DecimalType(2, 1), Some(DecimalType(11, 1)))
```

--- End diff --

`widenTest` already tests symmetric
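The decimal expectations in that test follow Catalyst's widening rule: keep the larger scale, keep enough digits for the larger integral part, and promote integral types to their exact decimal form first (e.g. `IntegerType` becomes `DecimalType(10, 0)`). A minimal Python model — not Spark code; `Decimal`, `wider_decimal`, and `INT_AS_DECIMAL` are names made up for illustration — reproduces the expected results:

```python
# Toy model of DecimalPrecision.widerDecimalType, for illustration only.
from collections import namedtuple

Decimal = namedtuple("Decimal", ["precision", "scale"])

# DecimalType.forType(IntegerType): the smallest decimal holding any Int exactly.
INT_AS_DECIMAL = Decimal(10, 0)

def wider_decimal(d1, d2):
    """Widened type: max scale, plus enough digits for the larger integral part."""
    scale = max(d1.scale, d2.scale)
    int_digits = max(d1.precision - d1.scale, d2.precision - d2.scale)
    return Decimal(int_digits + scale, scale)

# Matches the test cases above:
print(wider_decimal(Decimal(2, 1), Decimal(3, 2)))   # -> Decimal(precision=3, scale=2)
print(wider_decimal(INT_AS_DECIMAL, Decimal(2, 1)))  # -> Decimal(precision=11, scale=1)
```

This is why `widenTest(DecimalType(2, 1), IntegerType, ...)` expects `DecimalType(11, 1)`: the promoted integer contributes 10 integral digits, and the decimal contributes one fractional digit.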
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100678739

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -379,6 +380,75 @@ class TypeCoercionSuite extends PlanTest {

```diff
+  test("wider common type for decimal and array") {
+    def widenTest(t1: DataType, t2: DataType, tightestCommon: Option[DataType]) {
```

--- End diff --

`tightestCommon` -> `expected`?
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100668061

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,13 +101,13 @@ object TypeCoercion {

```diff
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonType(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
+  /**
+   * Promotes all the way to StringType.
+   */
+  private def stringPromotion: (DataType, DataType) => Option[DataType] = {
```

--- End diff --

Oh, I will fix it.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100598284

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,13 +101,13 @@ object TypeCoercion {

```diff
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonType(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
+  /**
+   * Promotes all the way to StringType.
+   */
+  private def stringPromotion: (DataType, DataType) => Option[DataType] = {
```

--- End diff --

this becomes a method that returns a function, is it what we need?
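For context on what the `stringPromotion` rule under discussion computes: a `StringType` paired with any atomic type other than `BinaryType` or `BooleanType` widens to `StringType`. A rough Python sketch of that predicate — hypothetical names, with Catalyst types modeled as plain strings rather than `DataType` objects:

```python
# Simplified model of the stringPromotion rule; not Spark's implementation.
ATOMIC_TYPES = {"string", "byte", "short", "int", "long", "float", "double",
                "decimal", "date", "timestamp", "binary", "boolean"}

def string_promotion(t1, t2):
    """Return "string" when one side is a string and the other is an atomic
    type that is not binary or boolean; otherwise no promotion applies."""
    for a, b in ((t1, t2), (t2, t1)):
        if a == "string" and b in ATOMIC_TYPES and b not in ("binary", "boolean"):
            return "string"
    return None

print(string_promotion("string", "int"))     # -> string
print(string_promotion("string", "binary"))  # -> None
```

The guards against binary and boolean mirror the two `case` patterns in the original `findTightestCommonTypeToString`.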
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r99329662

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,24 +101,13 @@ object TypeCoercion {

```diff
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
-  }
   /**
-   * Find the tightest common type of a set of types by continuously applying
-   * `findTightestCommonTypeOfTwo` on these types.
+   * Promotes all the way to StringType.
    */
-  private def findTightestCommonType(types: Seq[DataType]): Option[DataType] = {
```

--- End diff --

Sure, I submitted for this in https://github.com/apache/spark/pull/16786.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r99283834

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,24 +101,13 @@ object TypeCoercion {

```diff
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
-  }
   /**
-   * Find the tightest common type of a set of types by continuously applying
-   * `findTightestCommonTypeOfTwo` on these types.
+   * Promotes all the way to StringType.
    */
-  private def findTightestCommonType(types: Seq[DataType]): Option[DataType] = {
```

--- End diff --

It becomes harder for reviewers to read this PR. Could you submit a separate PR for code cleaning? Thanks!
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r99093765

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -53,7 +53,8 @@ class TypeCoercionSuite extends PlanTest {

(The surrounding context is the suite's implicit-cast compatibility table, whose wide comment rows do not survive extraction; only the changed note lines are shown.)

```diff
-  // Note: ArrayType*, MapType*, StructType* are castable only when the internal child types also match; otherwise, not castable
+  // Note: MapType*, StructType* are castable only when the internal child types also match; otherwise, not castable.
+  // Note: ArrayType* is castable when the element type is castable according to the table.
```

--- End diff --

It seems we now support implicit cast of `ArrayType` via https://issues.apache.org/jira/browse/SPARK-18624.
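The behavior this PR adds can be pictured as the widening function recursing into array element types: two arrays have a common type exactly when their elements do. A heavily simplified Python sketch — `find_wider` is a hypothetical stand-in, and the toy numeric precedence chain replaces Catalyst's full rules:

```python
# Toy model of recursive array widening (not Spark's implementation): an array
# type is an ("array", element) tuple, scalars are strings, and numeric scalars
# widen along a fixed precedence chain.
NUMERIC_CHAIN = ["int", "long", "double"]

def find_wider(t1, t2):
    # Arrays are compatible when their element types are: recurse on elements.
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        elem = find_wider(t1[1], t2[1])
        return ("array", elem) if elem is not None else None
    if t1 == t2:
        return t1
    if t1 in NUMERIC_CHAIN and t2 in NUMERIC_CHAIN:
        return NUMERIC_CHAIN[max(NUMERIC_CHAIN.index(t1), NUMERIC_CHAIN.index(t2))]
    return None

print(find_wider(("array", "int"), ("array", "double")))  # -> ('array', 'double')
```

Because the recursion bottoms out in the scalar rules, nested arrays such as `array<array<int>>` vs `array<array<long>>` widen for free, while an array against a non-array still yields no common type.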
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r99093557

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,24 +101,13 @@ object TypeCoercion {

```diff
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
-  }
   /**
-   * Find the tightest common type of a set of types by continuously applying
-   * `findTightestCommonTypeOfTwo` on these types.
+   * Promotes all the way to StringType.
    */
-  private def findTightestCommonType(types: Seq[DataType]): Option[DataType] = {
```

--- End diff --

`findTightestCommonType` was removed as it seems used nowhere.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16777

[SPARK-19435][SQL] Type coercion between ArrayTypes

## What changes were proposed in this pull request?

This PR proposes to support type coercion between `ArrayType`s where the element types are compatible.

**Before**

```
Seq(Array(1)).toDF("a").selectExpr("greatest(a, array(1D))")
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(`a`, array(1.0D))' due to data type mismatch: The expressions should all have the same type, got GREATEST(array, array).; line 1 pos 0;

Seq(Array(1)).toDF("a").selectExpr("least(a, array(1D))")
org.apache.spark.sql.AnalysisException: cannot resolve 'least(`a`, array(1.0D))' due to data type mismatch: The expressions should all have the same type, got LEAST(array, array).; line 1 pos 0;

sql("SELECT * FROM values (array(0)), (array(1D)) as data(a)")
org.apache.spark.sql.AnalysisException: incompatible types found in column a for inline table; line 1 pos 14

Seq(Array(1)).toDF("a").union(Seq(Array(1D)).toDF("b"))
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. ArrayType(DoubleType,false) <> ArrayType(IntegerType,false) at the first column of the second table;;

sql("SELECT IF(1=1, array(1), array(1D))")
org.apache.spark.sql.AnalysisException: cannot resolve '(IF((1 = 1), array(1), array(1.0D)))' due to data type mismatch: differing types in '(IF((1 = 1), array(1), array(1.0D)))' (array and array).; line 1 pos 7;
```

**After**

```scala
Seq(Array(1)).toDF("a").selectExpr("greatest(a, array(1D))")
res5: org.apache.spark.sql.DataFrame = [greatest(a, array(1.0)): array]

Seq(Array(1)).toDF("a").selectExpr("least(a, array(1D))")
res6: org.apache.spark.sql.DataFrame = [least(a, array(1.0)): array]

sql("SELECT * FROM values (array(0)), (array(1D)) as data(a)")
res8: org.apache.spark.sql.DataFrame = [a: array]

Seq(Array(1)).toDF("a").union(Seq(Array(1D)).toDF("b"))
res10: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [a: array]

sql("SELECT IF(1=1, array(1), array(1D))")
res15: org.apache.spark.sql.DataFrame = [(IF((1 = 1), array(1), array(1.0))): array]
```

## How was this patch tested?

Unit tests in `TypeCoercion`, Jenkins tests, and building with Scala 2.10:

```
./dev/change-scala-version.sh 2.10
./build/mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19435

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16777.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16777

commit bd0d9f790821d8cb9caf77508e6c658fb56222ab
Author: hyukjinkwon
Date: 2017-02-02T10:21:07Z

    Type coercion between ArrayTypes