[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175383#comment-15175383 ]

Adam Roberts commented on SPARK-13552:
--------------------------------------

Done: both "value" and "row" return the correct result, -9223372036854775808, 
for both Java implementations.

FWIW, the first place I can see the incorrect row values is in the 
{{withCallback[T]}} method in DataFrame.scala; the specific line of code is

{code}
val result = action(df)
{code}
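
For context, the surrounding method looks roughly like this (paraphrased from 
memory, so the exact body and listener bookkeeping in DataFrame.scala may 
differ slightly):

{code}
private def withCallback[T](name: String, df: DataFrame)(action: DataFrame => T): T = {
  try {
    val start = System.nanoTime()
    val result = action(df)   // the rows are materialized somewhere inside this call
    val end = System.nanoTime()
    sqlContext.listenerManager.onSuccess(name, df.queryExecution, end - start)
    result
  } catch {
    case e: Exception =>
      sqlContext.listenerManager.onFailure(name, df.queryExecution, e)
      throw e
  }
}
{code}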

Unfortunately, when I step into this method it's not clear how the resulting 
rows are being produced (it could be that I'm debugging the wrong thread in 
IntelliJ?).

By the time I can see a value for "result", it's too late: the values are 
already incorrect.

> Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
> -------------------------------------------------------------
>
>                 Key: SPARK-13552
>                 URL: https://issues.apache.org/jira/browse/SPARK-13552
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: IBM Java only, all platforms
>            Reporter: Adam Roberts
>            Priority: Minor
>         Attachments: DefectBadMinValueLongResized.jpg
>
>
> The Long.minValue test fails on IBM Java 8; we get the following incorrect 
> answer with this slightly simplified test case:
> {code:scala}
> val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
> {code}
> The result is _-9,223,372,041,149,743,104_ instead of 
> _-9,223,372,036,854,775,808_ (only one bit differs if we compare the binary 
> representations).
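> A quick way to see the single-bit difference (illustrative, not part of the test itself):
> {code}
> // XOR the magnitudes of the two values to expose the differing bits
> val correct = BigInt("9223372036854775808")  // 2^63, |Long.MinValue|
> val actual  = BigInt("9223372041149743104")  // magnitude of the wrong answer
> val diff = correct ^ actual                  // == 2^32: a single bit (bit 32)
> println(diff.bitCount)                       // prints 1
> {code}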
> Here's the full test output:
> {code}
> Results do not match for query:
> == Parsed Logical Plan ==
> 'GlobalLimit 1
> +- 'LocalLimit 1
>    +- 'Sort ['key ASC], true
>       +- 'Project [unresolvedalias(-9223372036854775808, None)]
>          +- 'UnresolvedRelation `testData`, None
> == Analyzed Logical Plan ==
> (-9223372036854775808): decimal(19,0)
> GlobalLimit 1
> +- LocalLimit 1
>    +- Project [(-9223372036854775808)#4391]
>       +- Sort [key#101 ASC], true
>          +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
>             +- SubqueryAlias testData
>                +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Optimized Logical Plan ==
> GlobalLimit 1
> +- LocalLimit 1
>    +- Project [(-9223372036854775808)#4391]
>       +- Sort [key#101 ASC], true
>          +- Project [-9223372036854775808 AS 
> (-9223372036854775808)#4391,key#101]
>             +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at 
> beforeAll at BeforeAndAfterAll.scala:187
> == Physical Plan ==
> TakeOrderedAndProject(limit=1, orderBy=[key#101 ASC], 
> output=[(-9223372036854775808)#4391])
> +- WholeStageCodegen
>    :  +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>    :     +- INPUT
>    +- Scan ExistingRDD[key#101,value#102]
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![-9223372036854775808]     [-9223372041149743104]
> {code}
> Debugging in IntelliJ shows the query is parsed OK and we eventually have a 
> schema with the correct data in the struct field, but the BigDecimal's 
> underlying BigInteger is incorrect by the time we have a GenericRowWithSchema.
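> For reference, one way to inspect that BigInteger from the returned row 
> (illustrative, using the {{tester}} DataFrame above):
> {code}
> val row = tester.collect().head   // a GenericRowWithSchema
> val bd = row.getDecimal(0)        // java.math.BigDecimal, typed decimal(19,0)
> println(bd.unscaledValue)         // the underlying BigInteger that comes back wrong
> {code}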
> I've identified that the problem started when SPARK-12575 was implemented, and 
> I suspect the following paragraph from that issue is important:
> "Hive and the SQL Parser treat decimal literals differently. Hive will turn 
> any decimal into a Double whereas the SQL Parser would convert a 
> non-scientific decimal into a BigDecimal, and would turn a scientific decimal 
> into a Double. We follow Hive's behavior here. The new parser supports a big 
> decimal literal, for instance: 81923801.42BD, which can be used when a big 
> decimal is needed."
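> To illustrate the quoted behavior (a sketch of the post-SPARK-12575 parser 
> rules, not taken from the test suite):
> {code}
> sql("SELECT 81923801.42")    // plain decimal literal: becomes a Double (Hive behavior)
> sql("SELECT 1.42e3")         // scientific notation: also a Double
> sql("SELECT 81923801.42BD")  // BD suffix: parsed as a big decimal
> {code}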



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
