[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...

2018-12-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23210


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...

2018-12-03 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23210#discussion_r238471660
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
 checkDataset(ds, data: _*)
 checkAnswer(ds.select("x"), Seq(Row(1), Row(2)))
   }
+
+  test("SPARK-26233: serializer should enforce decimal precision and scale") {
--- End diff --

Well, everything is possible, but it is not easy, actually. The issue here happens in the codegen, not when we retrieve the output, so if we just encode and decode, everything is fine. The problem arises if there is any transformation in the codegen in between, because there the underlying decimal is used directly, on the assumption that it has the same precision and scale as its data type, which, without the current change, is not always true. I tried checking the precision and scale of the serialized object, but that is not really feasible, as they are converted when the row is read (please see `UnsafeRow`)... So I'd avoid this, actually.
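To make the failure mode concrete, here is a minimal, Spark-free sketch in Python (the names and numbers are illustrative, not Spark's actual code): it shows what happens when generated code reinterprets a decimal's raw unscaled value under a declared scale that the value does not actually carry.

```python
from decimal import Decimal

# A value arriving from an external source: scale 2, unscaled value 123.
external = Decimal("1.23")
declared_scale = 4  # the data type declares e.g. DECIMAL(38, 4)

# Extract the raw unscaled integer, as a columnar format would store it.
sign, digits, exponent = external.as_tuple()
unscaled = int("".join(map(str, digits)))  # 123

# Generated code that trusts the declared scale reads 123 as 123 * 10^-4:
corrupted = Decimal(unscaled).scaleb(-declared_scale)
print(corrupted)  # 0.0123, not 1.23

# Enforcing the declared scale on encode (the rescaling effect described
# in this thread) keeps the unscaled value consistent with the type:
rescaled = external.quantize(Decimal(1).scaleb(-declared_scale))  # 1.2300
sign, digits, exponent = rescaled.as_tuple()
unscaled = int("".join(map(str, digits)))  # 12300
print(Decimal(unscaled).scaleb(-declared_scale))  # 1.2300
```

The second half is why rescaling on encode fixes things: once the stored unscaled value agrees with the declared scale, any code that assumes that scale reads the correct value back.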


---




[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...

2018-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23210#discussion_r238424196
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1647,6 +1647,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
 checkDataset(ds, data: _*)
 checkAnswer(ds.select("x"), Seq(Row(1), Row(2)))
   }
+
+  test("SPARK-26233: serializer should enforce decimal precision and scale") {
--- End diff --

Can we have a test case in `RowEncoderSuite`, too?


---




[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...

2018-12-03 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/23210

[SPARK-26233][SQL] CheckOverflow when encoding a decimal value

## What changes were proposed in this pull request?

When we encode a Decimal coming from an external source, we don't check for overflow. That check is useful not only to enforce that the value can be represented within the specified range, but also because it adjusts the underlying data to the right precision/scale. Since our code generation assumes that a decimal has exactly the same precision and scale as its data type, failing to enforce this can lead to corrupted output/results when there are subsequent transformations.
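As a rough illustration of what such a check does, here is a hedged sketch using Python's `decimal` module (`check_overflow` is a hypothetical helper written for this explanation, not Spark's implementation): rescale the value to the declared scale, then verify it still fits within the declared precision, yielding None (analogous to null) otherwise.

```python
from decimal import Decimal, ROUND_HALF_UP

def check_overflow(value, precision, scale):
    """Hypothetical sketch: bring `value` to exactly `scale` digits after
    the point, then make sure it fits in `precision` total digits."""
    rescaled = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if len(rescaled.as_tuple().digits) > precision:
        return None  # value does not fit DECIMAL(precision, scale)
    return rescaled

print(check_overflow(Decimal("123.4567"), precision=5, scale=2))  # 123.46
print(check_overflow(Decimal("12345.67"), precision=5, scale=2))  # None
```

Note how the first call both rounds the value and fixes its scale to 2, which is the side effect the description above relies on: after the check, the decimal's actual scale matches its declared data type.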

## How was this patch tested?

Added a unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-26233

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23210


commit 91d3e1b49667d3d5023663c8507570a118c54254
Author: Marco Gaido 
Date:   2018-12-03T16:16:08Z

[SPARK-26233][SQL] CheckOverflow when encoding a decimal value




---
