[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23210
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/23210#discussion_r238471660

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     checkDataset(ds, data: _*)
     checkAnswer(ds.select("x"), Seq(Row(1), Row(2)))
   }
+
+  test("SPARK-26233: serializer should enforce decimal precision and scale") {
--- End diff --

Well, everything is possible, but it is not easy actually, because the issue here happens in codegen, not when we retrieve the output: if we just encode and decode, everything is fine. The problem appears when there is any transformation in codegen in between, because there the underlying decimal is used directly, under the assumption that it has the same precision and scale as its data type - which, without the current change, is not always true. I tried checking the precision and scale of the serialized object, but that is not really feasible, as they are converted when the value is read back (please see `UnsafeRow`)... So I'd avoid this, actually.
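A rough sketch of the distinction being made here, assuming a SparkSession `spark` in scope; the values and the particular query are illustrative, not the JIRA's actual reproducer:

    import spark.implicits._

    // Seq[BigDecimal].toDS() declares the column as DecimalType(38, 18),
    // but the serialized Decimal keeps the external object's own scale.
    val ds = Seq(BigDecimal("0.1")).toDS()

    // Pure encode/decode round-trips fine even without this change:
    ds.collect()

    // A transformation evaluated in generated code uses the underlying
    // Decimal directly, assuming it already matches the declared
    // precision/scale; if it doesn't, results like this can be corrupted:
    ds.select(ds("value") * 2).collect()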
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23210#discussion_r238424196

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     checkDataset(ds, data: _*)
     checkAnswer(ds.select("x"), Seq(Row(1), Row(2)))
   }
+
+  test("SPARK-26233: serializer should enforce decimal precision and scale") {
--- End diff --

Can we have a test case in `RowEncoderSuite`, too?
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/23210

[SPARK-26233][SQL] CheckOverflow when encoding a decimal value

## What changes were proposed in this pull request?

When we encode a Decimal coming from an external source, we don't check for overflow. That check is useful not only to enforce that the value can be represented within the specified range, but also because it rewrites the underlying data to the right precision/scale. Since our code generation assumes that a decimal has exactly the same precision and scale as its data type, failing to enforce this can lead to corrupted output/results when subsequent transformations are applied.

## How was this patch tested?

added UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-26233

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23210.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23210

commit 91d3e1b49667d3d5023663c8507570a118c54254
Author: Marco Gaido
Date: 2018-12-03T16:16:08Z

    [SPARK-26233][SQL] CheckOverflow when encoding a decimal value
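To illustrate the mechanism the description relies on - a minimal sketch, assuming Spark's `org.apache.spark.sql.types.Decimal` API, where `changePrecision` is the primitive that `CheckOverflow`'s normalization ultimately rests on:

    import org.apache.spark.sql.types.Decimal

    // A Decimal built from an external BigDecimal keeps that object's
    // precision/scale (here 3 and 2), not the column type's (e.g. 38 and 18).
    val d = Decimal(BigDecimal("1.23"))
    assert(d.precision == 3 && d.scale == 2)

    // changePrecision rewrites the underlying representation to the target
    // precision/scale, returning false if the value does not fit; this is
    // what makes codegen's "same precision and scale as the data type"
    // assumption hold.
    val fits = d.changePrecision(38, 18)
    assert(fits && d.precision == 38 && d.scale == 18)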