[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268526#comment-15268526 ]

Herman van Hovell commented on SPARK-13552:
-------------------------------------------

[~aroberts]/[~robbinspg] It would be nice, for completeness' sake, to have a link to a ticket or to documentation describing this. Could you provide one?

> Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
> -------------------------------------------------------------
>
>                 Key: SPARK-13552
>                 URL: https://issues.apache.org/jira/browse/SPARK-13552
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: IBM Java only, all platforms
>            Reporter: Adam Roberts
>            Priority: Minor
>         Attachments: DefectBadMinValueLongResized.jpg
>
> The Long.minValue test fails on IBM Java 8; we get the following incorrect answer with the slightly simplified test case:
> {code:SQL}
> val tester = sql(s"SELECT ${Long.MinValue} FROM testData")
> {code}
> The result is _-9,223,372,041,149,743,104_ instead of _-9,223,372,036,854,775,808_ (there's only one bit of difference if we convert to the binary representation).
> Here's the full test output:
> {code}
> Results do not match for query:
> == Parsed Logical Plan ==
> 'GlobalLimit 1
> +- 'LocalLimit 1
>    +- 'Sort ['key ASC], true
>       +- 'Project [unresolvedalias(-9223372036854775808, None)]
>          +- 'UnresolvedRelation `testData`, None
>
> == Analyzed Logical Plan ==
> (-9223372036854775808): decimal(19,0)
> GlobalLimit 1
> +- LocalLimit 1
>    +- Project [(-9223372036854775808)#4391]
>       +- Sort [key#101 ASC], true
>          +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>             +- SubqueryAlias testData
>                +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187
>
> == Optimized Logical Plan ==
> GlobalLimit 1
> +- LocalLimit 1
>    +- Project [(-9223372036854775808)#4391]
>       +- Sort [key#101 ASC], true
>          +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>             +- LogicalRDD [key#101,value#102], MapPartitionsRDD[3] at beforeAll at BeforeAndAfterAll.scala:187
>
> == Physical Plan ==
> TakeOrderedAndProject(limit=1, orderBy=[key#101 ASC], output=[(-9223372036854775808)#4391])
> +- WholeStageCodegen
>    :  +- Project [-9223372036854775808 AS (-9223372036854775808)#4391,key#101]
>    :     +- INPUT
>    +- Scan ExistingRDD[key#101,value#102]
>
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![-9223372036854775808]     [-9223372041149743104]
> {code}
> Debugging in IntelliJ shows the query seems to be parsed OK and we eventually have a schema with the correct data in the struct field, but the BigDecimal's BigInteger is incorrect once we have a GenericRowWithSchema.
> I've identified that the problem started when SPARK-12575 was implemented and suspect the following paragraph is important:
> "Hive and the SQL Parser treat decimal literals differently. Hive will turn any decimal into a Double whereas the SQL Parser would convert a non-scientific decimal into a BigDecimal, and would turn a scientific decimal into a Double. We follow Hive's behavior here. The new parser supports a big decimal literal, for instance: 81923801.42BD, which can be used when a big decimal is needed."
> Done - both "value" and "row" return the correct result for both Java implementations: -9223372036854775808
> FWIW, the first time we can see the incorrect row values is in the {code}withCallback[T]{code} method in DataFrame.scala; the specific line of code is
> {code}
> val result = action(df)
> {code}
> Stepping into this doesn't clearly indicate how the resulting rows are being produced, though (it could be that I'm debugging with the wrong thread in IntelliJ - the first time I see a value for "result" is when it's too late, when we're already seeing the incorrect values).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
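The "one bit difference" claim in the description can be checked directly: the two values differ by exactly 2^32, i.e. their magnitudes differ in a single bit (bit 32). A minimal check in plain Java, since the suspect class is java.math.BigInteger:

```java
import java.math.BigInteger;

public class OneBitDiff {
    public static void main(String[] args) {
        // Correct and incorrect results from the bug report.
        BigInteger correct = new BigInteger("-9223372036854775808"); // Long.MinValue = -2^63
        BigInteger wrong   = new BigInteger("-9223372041149743104"); // value seen on IBM Java

        // Their magnitudes differ in exactly one bit: bit 32.
        BigInteger diffBits = correct.abs().xor(wrong.abs());
        System.out.println(diffBits.bitCount());   // 1
        System.out.println(diffBits.bitLength());  // 33, i.e. only bit 32 is set

        // Equivalently, the wrong value is exactly 2^32 below the correct one.
        System.out.println(correct.subtract(wrong).equals(BigInteger.ONE.shiftLeft(32))); // true
    }
}
```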
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268232#comment-15268232 ]

Pete Robbins commented on SPARK-13552:
--------------------------------------

[~aroberts] This JIRA can be closed, as this is not a Spark issue.
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262482#comment-15262482 ]

Pete Robbins commented on SPARK-13552:
--------------------------------------

This is looking like an issue with the IBM implementation of java.math.BigInteger. I'm still investigating, and we can close this JIRA if my theory is correct.
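If that theory is right, the fault should be reproducible without Spark at all. A hypothetical standalone probe one could run on the suspect JVM (an assumption about where the bug might surface, not a confirmed reproduction) that exercises BigInteger around 2^63:

```java
import java.math.BigInteger;

public class BigIntegerProbe {
    public static void main(String[] args) {
        // Build 2^63 as a BigInteger two different ways; on a correct JVM all of
        // the checks below hold regardless of how the value was constructed.
        BigInteger viaString = new BigInteger("9223372036854775808");
        BigInteger viaShift  = BigInteger.ONE.shiftLeft(63);

        System.out.println(viaString.equals(viaShift));                           // true
        System.out.println(viaString.negate());                                   // -9223372036854775808
        System.out.println(viaShift.negate().longValueExact() == Long.MIN_VALUE); // true
    }
}
```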
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175383#comment-15175383 ]

Adam Roberts commented on SPARK-13552:
--------------------------------------

Done - both "value" and "row" return the correct result for both Java implementations: -9223372036854775808

FWIW, the first time we can see the incorrect row values is in the {code}withCallback[T]{code} method in DataFrame.scala; the specific line of code is

{code}
val result = action(df)
{code}

Unfortunately, when I step into this method it's not clear how the resulting rows are being produced (it could be that I'm debugging with the wrong thread in IntelliJ?). The first time I see a value for "result" is when it's too late: when we're already seeing the incorrect values.
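The debugging difficulty described in that comment is consistent with the action being lazy: nothing is computed until action(df) actually runs inside withCallback. A hypothetical Java analogue of that shape (an illustration only, not Spark's actual code):

```java
import java.util.function.Function;

public class WithCallbackSketch {
    // Simplified stand-in for a withCallback-style wrapper: the action is just a
    // function value, so no rows exist until action.apply(df) runs. A debugger
    // therefore first shows "result" only after the values (right or wrong)
    // have already been produced.
    static <D, T> T withCallback(D df, Function<D, T> action) {
        T result = action.apply(df); // rows are materialized inside this call
        return result;
    }

    public static void main(String[] args) {
        String result = withCallback("testData", df -> "rows of " + df);
        System.out.println(result); // rows of testData
    }
}
```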
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173958#comment-15173958 ]

Herman van Hovell commented on SPARK-13552:
-------------------------------------------

Could you execute the following code in spark-shell ({{sbt/sparkShell}} is a quick way of doing this)?

{noformat}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

val struct = (new StructType).add("a", DecimalType(19, 0))
val value = UnaryMinus(Literal(BigDecimal("9223372036854775808").underlying())).eval()
val row = new GenericRowWithSchema(Array(value), struct)
println(row)
{noformat}

What does {{println(row)}} return? It should be {{\[-9223372036854775808\]}}.
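Stripped of the Spark wrappers, that repro boils down to negating the BigDecimal 9223372036854775808 (2^63) and inspecting the result. A plain-Java sketch of that core step (using java.math directly rather than Spark's UnaryMinus/Literal):

```java
import java.math.BigDecimal;

public class NegateCheck {
    public static void main(String[] args) {
        // Negate 2^63 as a BigDecimal; on a correct JVM this yields Long.MIN_VALUE.
        BigDecimal negated = new BigDecimal("9223372036854775808").negate();

        System.out.println(negated);                 // -9223372036854775808
        System.out.println(negated.unscaledValue()); // the underlying BigInteger under suspicion
        System.out.println(negated.longValueExact() == Long.MIN_VALUE); // true
    }
}
```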
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173931#comment-15173931 ]

Adam Roberts commented on SPARK-13552:
--------------------------------------

Thanks for the suggestion. Both JDK vendors use the Literal.v(underlying) case, and the value of the "text" variable is correct. Again for both vendors, v.underlying() is correct and so is Literal(v.underlying()).
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173868#comment-15173868 ]

Herman van Hovell commented on SPARK-13552:
-------------------------------------------

[~aroberts] Since you are already in debugging mode, could you check what happens in the following lines of CatalystQl?

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystQl.scala#L829-L843

This is where we convert the string into a Literal. If this checks out (it appears so, because the field name is correct), then we should start looking further down the line.
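The conversion that comment points at follows the SPARK-12575 behavior quoted in the description: an integral literal that fits in a long stays a long, while one that overflows (such as 9223372036854775808, before the unary minus is applied) becomes a decimal literal, which is why the plan shows decimal(19,0). A sketch of that decision (an illustration of the rule, not the actual CatalystQl code):

```java
import java.math.BigDecimal;

public class LiteralSketch {
    // Illustrative only: pick a literal representation for an integral numeric string.
    static Object toLiteral(String text) {
        try {
            return Long.parseLong(text);   // fits in a long -> long literal
        } catch (NumberFormatException overflow) {
            return new BigDecimal(text);   // too big -> decimal literal, e.g. decimal(19,0)
        }
    }

    public static void main(String[] args) {
        // 2^63 - 1 still fits in a long...
        System.out.println(toLiteral("9223372036854775807").getClass().getSimpleName()); // Long
        // ...but 2^63 (the operand under the unary minus) does not.
        System.out.println(toLiteral("9223372036854775808").getClass().getSimpleName()); // BigDecimal
    }
}
```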
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173856#comment-15173856 ]

Herman van Hovell commented on SPARK-13552:
-------------------------------------------

Yeah, -2^32 points for me. Should have thought longer about that before posting :S...
[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171838#comment-15171838 ] Adam Roberts commented on SPARK-13552: -- Thanks for the quick reply Herman, yep Long.MinValue is the same for IBM Java and even when we just create a Row object with Long.MinValue passed as an argument to the constructor, printing the result doesn't indicate a problem. It's a Spark test in the SQLQuerySuite. Here's what I encountered while debugging (a lot of stepping into required to get here) !DefectBadMinValueLongResized.jpg! > Incorrect data for Long.minValue in SQLQuerySuite on IBM Java > - > > Key: SPARK-13552 > URL: https://issues.apache.org/jira/browse/SPARK-13552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: IBM Java only, all platforms >Reporter: Adam Roberts >Priority: Minor > Attachments: DefectBadMinValueLongResized.jpg > > > The Long.minValue test fails on IBM Java 8, we get the following incorrect > answer with the slightly simplified test case: > {code:SQL} > val tester = sql(s"SELECT ${Long.MinValue} FROM testData") > {code} > result is > _-9,223,372,041,149,743,104_ instead of _-9,223,372,036,854,775,808_ (there's > only one bit difference if we convert to binary representation). 
[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171823#comment-15171823 ] Sean Owen commented on SPARK-13552:
---
It certainly must be -9223372036854775808 = -2^63; he's saying the query returns something else, which doesn't make sense. The difference is exactly -2^32 = -4294967296, which means the result is actually less than -2^63. Maybe that adds a clue; looks like you are pretty much on the trail already.
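Sean's arithmetic can be checked directly. A plain-Java sketch (not Spark code) confirming that the wrong answer sits exactly -2^32 below Long.MinValue, differs from it by a single bit of magnitude (bit 32), and lies outside the Long range entirely:

```java
import java.math.BigInteger;

public class MinValueDiffDemo {
    public static void main(String[] args) {
        BigInteger expected = BigInteger.valueOf(Long.MIN_VALUE);     // -2^63
        BigInteger observed = new BigInteger("-9223372041149743104"); // buggy result
        // The difference is exactly -2^32.
        System.out.println(observed.subtract(expected)); // -4294967296
        // The magnitudes 2^63 and 2^63 + 2^32 differ in exactly one bit.
        System.out.println(expected.abs().xor(observed.abs()).bitCount()); // 1
        // observed < Long.MIN_VALUE, so it is not representable as a Long at all.
        System.out.println(observed.compareTo(expected) < 0); // true
    }
}
```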
[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171812#comment-15171812 ] Herman van Hovell commented on SPARK-13552:
---
Hi [~aroberts],
We have reverted the double behavior in SPARK-12848: when we encounter a decimal value we now create a BigDecimal literal for it. Scientific decimals are still converted into doubles. However, the example you are giving deals with integral values, which are treated differently: we convert the integral value into a literal with the most suitable data type (i.e. BigInteger, Long or Int). What makes this interesting is that the parser parses this as a positive number with unary minus applied to it. This is relevant because the value {{9223372036854775808}} is outside the range of possible Long values; the literal will be a BigInt instead of a Long (this can cause testing problems).
Another thing: on my machine Long.MinValue is -9223372036854775808, not -9223372041149743104:
{noformat}
scala> Long.MinValue
res14: Long = -9223372036854775808
{noformat}
Is this the same for IBM JVMs (I can hardly imagine it isn't)? Is the failing test a Spark test, or an internal one?
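Herman's point about literal typing can be sketched with a hypothetical helper (this is not Spark's actual parser code): the parser sees the nonnegative digits 9223372036854775808, which need 64 bits and therefore cannot be a signed Long, so the literal is widened before the unary minus is ever applied:

```java
import java.math.BigInteger;

public class LiteralTypingDemo {
    // Picks the narrowest signed type for a nonnegative digit string,
    // as the parser would see it (the minus sign is handled separately).
    static String narrowestType(String digits) {
        BigInteger v = new BigInteger(digits);
        if (v.bitLength() < 32) return "Int";  // fits a signed 32-bit int
        if (v.bitLength() < 64) return "Long"; // fits a signed 64-bit long
        return "BigInteger/Decimal";
    }

    public static void main(String[] args) {
        System.out.println(narrowestType("9223372036854775808")); // 2^63: BigInteger/Decimal
        System.out.println(narrowestType("9223372036854775807")); // 2^63 - 1: Long
        System.out.println(narrowestType("42"));                  // Int
    }
}
```

This is exactly why SELECT -9223372036854775808 is tricky: the positive half of the literal does not fit in a Long, even though the negated value does.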
[jira] [Commented] (SPARK-13552) Incorrect data for Long.minValue in SQLQuerySuite on IBM Java
[ https://issues.apache.org/jira/browse/SPARK-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171678#comment-15171678 ] Sean Owen commented on SPARK-13552:
---
CC [~hvanhovell]