[GitHub] spark pull request #16063: [SPARK-18622][SQL] Remove TypeCoercion rules for ...

hvanhovell Tue, 29 Nov 2016 15:40:23 -0800

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16063#discussion_r90140461
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -482,21 +482,6 @@ object TypeCoercion {
     
             CreateMap(newKeys.zip(newValues).flatMap { case (k, v) => Seq(k, v) })
     
    -      // Promote SUM, SUM DISTINCT and AVERAGE to largest types to prevent overflows.
    -      case s @ Sum(e @ DecimalType()) => s // Decimal is already the biggest.
    -      case Sum(e @ IntegralType()) if e.dataType != LongType => Sum(Cast(e, LongType))
    -      case Sum(e @ FractionalType()) if e.dataType != DoubleType => Sum(Cast(e, DoubleType))
    -
    -      case s @ Average(e @ DecimalType()) => s // Decimal is already the biggest.
    -      case Average(e @ IntegralType()) if e.dataType != LongType =>
    -        Average(Cast(e, LongType))
    -      case Average(e @ FractionalType()) if e.dataType != DoubleType =>
    -        Average(Cast(e, DoubleType))
    -
    -      // Hive lets you do aggregation of timestamps... for some reason
    -      case Sum(e @ TimestampType()) => Sum(Cast(e, DoubleType))
    -      case Average(e @ TimestampType()) => Average(Cast(e, DoubleType))
    --- End diff --
    
    I was surprised to find out that we actually support this. For example:
    ```scala
    import org.apache.spark.sql.types._
    val df = spark.range(1000).select(
      (current_timestamp() + concat(lit("interval "), $"id", lit(" days")).cast(CalendarIntervalType)).as("time"))
    df.groupBy().agg(sum($"time")).show()
    
    +--------------------+
    |           sum(time)|
    +--------------------+
    |1.523614355600014...|
    +--------------------+
    ```
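
    As a rough sanity check on that number, the removed rule rewrites the
    aggregate to `Sum(Cast(e, DoubleType))`, and casting a timestamp to double
    gives seconds since the Unix epoch. The sketch below redoes the arithmetic
    with plain `java.time`, outside Spark. The object name
    `TimestampSumSketch`, the helper `sumOfEpochSeconds`, and the base instant
    are all assumptions made for illustration; the real base is whatever
    `current_timestamp()` returned when the query ran, and fixed 24-hour days
    are assumed, so daylight-saving shifts are ignored.

    ```scala
    import java.time.{Duration, Instant}

    object TimestampSumSketch {
      // Hypothetical helper: sums (base + id days) for id in [0, n),
      // with each timestamp expressed as seconds since the Unix epoch.
      def sumOfEpochSeconds(base: Instant, n: Long): Double =
        (0L until n)
          .map(id => base.plus(Duration.ofDays(id)).getEpochSecond.toDouble)
          .sum

      def main(args: Array[String]): Unit = {
        // Assumed stand-in for current_timestamp(); not taken from the run above.
        val base = Instant.parse("2016-11-29T22:00:00Z")
        println(sumOfEpochSeconds(base, 1000L))
      }
    }
    ```

    With a base instant in late November 2016 the total lands around 1.52e12,
    the same order of magnitude as the `sum(time)` shown above.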
    Back to the drawing board :)



