[ 
https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Roberts updated SPARK-13552:
---------------------------------
    Description: 
The Long.MinValue test fails on IBM Java 8. With the following slightly 
simplified test case we get an incorrect answer:
{code:SQL}
val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
{code}
The result is _-9,223,372,041,149,743,104_ instead of 
_-9,223,372,036,854,775,808_. The two values differ by exactly 2^32, which is 
a single bit in their binary representation.
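
The single-bit observation can be checked with plain arithmetic. The following is a minimal sketch, not part of the test suite; the object name `BitDifference` is made up for illustration, and the two values are copied from the output quoted in this report.

```scala
object BitDifference {
  // Values copied from the report.
  val expected: BigInt = BigInt("-9223372036854775808") // Long.MinValue, i.e. -2^63
  val actual: BigInt   = BigInt("-9223372041149743104") // answer seen on IBM Java

  // Gap between the correct and the observed answer.
  val delta: BigInt = expected - actual

  // XOR of the magnitudes leaves a 1 in every bit position that differs.
  val differingBits: BigInt = expected.abs ^ actual.abs

  def main(args: Array[String]): Unit = {
    println(s"delta == 2^32: ${delta == BigInt(2).pow(32)}")
    println(s"number of differing bits: ${differingBits.bitCount}")
    println(s"index of the differing bit: ${differingBits.lowestSetBit}")
  }
}
```

If the arithmetic holds, the stray bit is bit 32 of the magnitude, which may hint at a 32-bit word boundary inside the big-number representation; that interpretation is a guess, not something confirmed here.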


Here's the full test output:
{code}
Results do not match for query:
== Parsed Logical Plan ==
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Sort ['key ASC], true
      +- 'Project [unresolvedalias(-9223372036854775808, None)]
         +- 'UnresolvedRelation `testData`, None

== Analyzed Logical Plan ==
(-9223372036854775808): decimal(19,0)
GlobalLimit 1
+- LocalLimit 1
   +- Project [(-9223372036854775808)#4391]
      +- Sort [key#101 ASC], true
         +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
            +- SubqueryAlias testData
               +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187

== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Project [(-9223372036854775808)#4391]
      +- Sort [key#101 ASC], true
         +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
            +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187

== Physical Plan ==
TakeOrderedAndProject(limit=1, orderBy=[key#101 ASC], output=[(-9223372036854775808)#4391])
+- WholeStageCodegen
   :  +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
   :     +- INPUT
   +- Scan ExistingRDD[key#101,value#102]
== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
![-9223372036854775808]     [-9223372041149743104]
{code}
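
The analyzed plan types the literal as decimal(19,0) rather than as a long. One plausible reading, which is an assumption and not confirmed by this report, is that the minus sign is applied to the positive literal 9223372036854775808, which is one past Long.MaxValue and therefore cannot be held in a Long. The sketch below only illustrates that arithmetic; `LiteralRange` is a made-up name.

```scala
object LiteralRange {
  // Magnitude of the literal without its sign (assumed to be parsed first).
  val magnitude: BigDecimal = BigDecimal("9223372036854775808")

  // True if the unsigned literal is too large for a Long.
  val overflowsLong: Boolean = magnitude > BigDecimal(Long.MaxValue)

  // Number of decimal digits, matching the 19 in decimal(19,0).
  val digits: Int = magnitude.precision

  // Negating the magnitude gives a value that does fit in a Long again.
  val negatedIsMinValue: Boolean = -magnitude == BigDecimal(Long.MinValue)

  def main(args: Array[String]): Unit = {
    println(s"magnitude exceeds Long.MaxValue: $overflowsLong")
    println(s"decimal digits: $digits")
    println(s"negated magnitude equals Long.MinValue: $negatedIsMinValue")
  }
}
```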

Debugging in IntelliJ shows that the query appears to be parsed correctly, and 
we eventually have a schema with the correct data in the struct field. However, 
the BigInteger backing the BigDecimal is incorrect by the time we have a 
GenericRowWithSchema.
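
To help separate a Spark problem from a JVM problem, it may be worth checking whether `java.math.BigDecimal` on its own round-trips the literal. This is only a sketch under the assumption that the fault can show up outside Spark at all, which may not be the case (for example, if it depends on JIT-compiled code paths). `BigDecimalCheck` is a hypothetical name.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object BigDecimalCheck {
  // The literal from the failing query.
  val literal: String = "-9223372036854775808"

  val dec: JBigDecimal = new JBigDecimal(literal)

  // The BigInteger that backs the BigDecimal (scale is 0 for this literal).
  val unscaled: BigInteger = dec.unscaledValue()

  // Expected to be true on a correctly behaving JVM.
  val matchesMinValue: Boolean = unscaled == BigInteger.valueOf(Long.MinValue)

  def main(args: Array[String]): Unit = {
    println(s"scale = ${dec.scale()}")
    println(s"unscaledValue = $unscaled")
    // longValueExact throws ArithmeticException if the value does not fit in a Long.
    println(s"longValueExact = ${dec.longValueExact()}")
    println(s"matches Long.MinValue: $matchesMinValue")
  }
}
```

Running the same class on both IBM Java and another JVM and comparing the printed values would show whether the two disagree without Spark involved.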

I've identified that the problem started when SPARK-12575 was implemented and 
suspect the following paragraph is important:

"Hive and the SQL Parser treat decimal literals differently. Hive will turn any 
decimal into a Double whereas the SQL Parser would convert a non-scientific 
decimal into a BigDecimal, and would turn a scientific decimal into a Double. 
We follow Hive's behavior here. The new parser supports a big decimal literal, 
for instance: 81923801.42BD, which can be used when a big decimal is needed."

  was:
The Long.minValue test fails on IBM Java 8, we get the following incorrect 
answer with the slightly simplified test case:
{code:SQL}
val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
{code}
result is

_-9,223,372,041,149,743,104_ instead of _-9,223,372,036,854,775,808_ (there's 
only one bit difference if we convert to binary representation).

The query looks to be parsed OK and we have a schema with the correct data in 
the struct field but the BigDecimal's BigInteger is incorrect when we have a 
GenericRowWithSchema.

I've identified that the problem started when SPARK-12575 was implemented and 
suspect the following paragraph is important:

"Hive and the SQL Parser treat decimal literals differently. Hive will turn any 
decimal into a Double whereas the SQL Parser would convert a non-scientific 
decimal into a BigDecimal, and would turn a scientific decimal into a Double. 
We follow Hive's behavior here. The new parser supports a big decimal literal, 
for instance: 81923801.42BD, which can be used when a big decimal is needed."


> Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
> -------------------------------------------------------------
>
>                 Key: SPARK-13552
>                 URL: https://issues.apache.org/jira/browse/SPARK-13552
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: IBM Java only, all platforms
>            Reporter: Adam Roberts
>            Priority: Minor


