Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14777#discussion_r76313519
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
    @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
         case (v: Short, ShortType) => v + "S"
         case (v: Long, LongType) => v + "L"
         // Float type doesn't have a suffix
    -    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
    -    case (v: Double, DoubleType) => v + "D"
    +    case (v: Float, FloatType) =>
    +      val castedValue = v match {
    +        case _ if v.isNaN => "'NaN'"
    +        case Float.PositiveInfinity => "'Infinity'"
    +        case Float.NegativeInfinity => "'-Infinity'"
    +        case _ => v
    +      }
    +      s"CAST($castedValue AS ${FloatType.sql})"
    +    case (v: Double, DoubleType) =>
    +      v match {
    +        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
    +      case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
    +      case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
    +        case _ => v + "D"
    +      }
         case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"
    --- End diff ---
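    As context for the diff above: `NaN` and the infinities have no bare literal syntax in SQL, so the new code round-trips them through string casts such as `CAST('NaN' AS DOUBLE)` and `CAST('-Infinity' AS DOUBLE)`.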
    
    According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-FloatingPointTypes:
    
    > Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported.
    
    However, the professed lack of support for scientific notation seems to be contradicted by https://issues.apache.org/jira/browse/HIVE-2536 and by manual tests.
    
    Here's a test query that demonstrates the precision issue with decimal literals:
    
    ```sql
    SELECT
        CAST(-0.000000000000000006688467811848818630 AS DECIMAL(38, 36)),
        CAST(-6.688467811848818630E-18 AS DECIMAL(38, 36))
    ```
    
    In Hive, these behave equivalently: both forms of the number are interpreted as DOUBLE, so we lose precision and both cases wind up as `0.000000000000000006688467811848818` (the final three digits are lost).
    
    In Spark 2.0, the first, expanded form is parsed as a decimal literal while the scientific-notation form is parsed as a double, so the expanded form preserves full decimal precision while the scientific-notation form suffers the same precision loss as in Hive.
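    
    For illustration, this is the kind of quick check that shows the parser difference (a spark-shell sketch; the exact precision/scale that gets printed may differ):
    
    ```scala
    // Quick spark-shell check of how the two literal forms are parsed.
    // `spark` is the SparkSession that the shell provides.
    spark.sql("SELECT 1.1 AS expanded, 1.1E0 AS scientific").printSchema()
    // root
    //  |-- expanded: decimal(2,1) (nullable = false)
    //  |-- scientific: double (nullable = false)
    ```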
    
    I think there are two possible fixes here: we could either emit the fully-expanded form of the decimal or update Spark's parser to treat scientific-notation floating-point literals as decimals.
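    
    For the first option, `java.math.BigDecimal#toPlainString` already produces the fully-expanded rendering (a minimal sketch of the idea, not the code in this PR):
    
    ```scala
    // Sketch: rendering a decimal value without scientific notation.
    // toPlainString expands the exponent into plain digits.
    val v = new java.math.BigDecimal("-6.688467811848818630E-18")
    println(v.toPlainString)  // -0.000000000000000006688467811848818630
    ```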
    
    From a consistency standpoint, I'm in favor of the latter approach because I don't think it makes sense for `1.1` and `1.1e0` to be treated differently.
    
    Given all of this, I think it would certainly be _safe_ to emit fully-expanded forms of the decimal, but I'm not sure that's the optimal fix: it doesn't resolve the inconsistency between Spark and Hive, and it produces really ugly, hard-to-read expressions.

