[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-04-19 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r182820983
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala ---
@@ -80,30 +80,44 @@ case class PromotePrecision(child: Expression) extends UnaryExpression {

 /**
  * Rounds the decimal to given scale and check whether the decimal can fit in provided precision
- * or not, returns null if not.
+ * or not. If not, if `nullOnOverflow` is `true`, it returns `null`; otherwise an
+ * `ArithmeticException` is thrown.
  */
-case class CheckOverflow(child: Expression, dataType: DecimalType) extends UnaryExpression {
+case class CheckOverflow(
+    child: Expression,
+    dataType: DecimalType,
+    nullOnOverflow: Boolean) extends UnaryExpression {

   override def nullable: Boolean = true

   override def nullSafeEval(input: Any): Any =
-    input.asInstanceOf[Decimal].toPrecision(dataType.precision, dataType.scale)
+    input.asInstanceOf[Decimal].toPrecision(
+      dataType.precision,
+      dataType.scale,
+      Decimal.ROUND_HALF_UP,
+      nullOnOverflow)

   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     nullSafeCodeGen(ctx, ev, eval => {
       val tmp = ctx.freshName("tmp")
+      val onOverflow = if (nullOnOverflow) {
--- End diff --

Why are you not just calling `Decimal.toPrecision` here? There seems to be very little value in code-generating this (no specialization).
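
For context, "just calling `Decimal.toPrecision`" from the generated code could look roughly like the sketch below. This is illustrative only, not the code that was eventually merged; `Decimal.ROUND_HALF_UP()` is the static forwarder the generated Java would go through.

```scala
// Sketch of a CheckOverflow.doGenCode that delegates to Decimal.toPrecision:
// the generated Java simply calls back into the Scala method, so the
// null-vs-throw logic lives in one place and nothing is specialized.
override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  nullSafeCodeGen(ctx, ev, eval => {
    s"""
       |${ev.value} = $eval.toPrecision(
       |  ${dataType.precision}, ${dataType.scale}, Decimal.ROUND_HALF_UP(), $nullOnOverflow);
       |${ev.isNull} = ${ev.value} == null;
     """.stripMargin
  })
}
```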


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-23 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163269198
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1074,6 +1074,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val DECIMAL_OPERATIONS_NULL_ON_OVERFLOW =
+    buildConf("spark.sql.decimalOperations.nullOnOverflow")
+      .internal()
+      .doc("When true (default), if an overflow on a decimal occurs, then NULL is returned. " +
+        "Spark's older versions and Hive behave in this way. If turned to false, SQL ANSI 2011 " +
+        "specification, will be followed instead: an arithmetic exception is thrown. This is " +
+        "what most of the SQL databases do.")
--- End diff --

Tiny nit:

If turned to false, SQL ANSI 2011 specification, will be followed instead

This should be

If turned to false, SQL ANSI 2011 specification will be followed instead



---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184979
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        if (log.isDebugEnabled) logDebug(s"$message NULL is returned.")
+        null
--- End diff --

Since @hvanhovell was also suggesting that this is not necessary, I am removing it, even though I think it would be good to have.
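
For reference, this is roughly how the method reads with the log dropped — reconstructed from the diff above rather than copied from the final commit:

```scala
private[sql] def toPrecision(
    precision: Int,
    scale: Int,
    roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
    nullOnOverflow: Boolean = true): Decimal = {
  val copy = clone()
  if (copy.changePrecision(precision, scale, roundMode)) {
    copy
  } else if (nullOnOverflow) {
    null
  } else {
    // The message is only built on the overflow-and-throw path.
    throw new ArithmeticException(
      s"$toDebugString cannot be represented as Decimal($precision, $scale).")
  }
}
```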


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184915
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql ---
@@ -49,7 +49,6 @@ select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss are truncated
 select 123456789123456789.1234567890 * 1.123456789123456789;
-select 0.001 / 9876543210987654321098765432109876543.2
--- End diff --

Yes, unfortunately I somehow missed it previously...


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163134873
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        if (log.isDebugEnabled) logDebug(s"$message NULL is returned.")
+        null
--- End diff --

Do we really need to log for this? 


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163134537
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql ---
@@ -49,7 +49,6 @@ select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss are truncated
 select 123456789123456789.1234567890 * 1.123456789123456789;
-select 0.001 / 9876543210987654321098765432109876543.2
--- End diff --

I think it was missing a terminating `;` before...


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163013386
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I am ok with using debug/trace level logging. Can you make sure we do not 
construct the message unless we are logging or throwing the exception (changing 
`val` into `def` should be enough)? 
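
As an aside, the `val` vs `def` distinction is easy to demonstrate in isolation — a `def` body runs only when the name is referenced, so nothing is interpolated on the non-overflow paths. A minimal, self-contained demo (all names made up):

```scala
object LazyMessageDemo extends App {
  var built = 0
  // A `val` here would build the string eagerly on every overflow check;
  // a `def` builds it only if some branch actually references it.
  def message: String = { built += 1; "overflow message" }

  val nullOnOverflow = true
  val loggingEnabled = false
  if (!nullOnOverflow) throw new ArithmeticException(message)
  if (loggingEnabled) println(message)

  println(s"messages built: $built") // prints 0: no branch needed the message
}
```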


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163008946
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I see your point, and I agree with you. But I wanted to leave some trace of what was happening. What about using DEBUG as the log level? In that case, most of the time we are not logging anything, but if we want to check whether an overflow is happening, we can. What do you think?


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163007885
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals, we log the same message a million times, which is going to cause a significant regression. Moreover, this is logged on the executor, and an end-user typically does not look at those logs (there is also no reason to do so, since the job does not throw an error).

My suggestion would be to not log at all, or to log just once. I prefer not to log at all.
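
For what it's worth, the "log once" variant could be as small as the following sketch; the object, flag, and method names are hypothetical, not part of the PR:

```scala
import java.util.concurrent.atomic.AtomicBoolean

object OverflowWarnOnce {
  private val warned = new AtomicBoolean(false)

  // compareAndSet flips the flag for exactly one caller per JVM, so the
  // message is emitted at most once per executor process. The message is
  // taken by name so it is never built after the first warning.
  def warn(doLog: String => Unit, message: => String): Unit = {
    if (warned.compareAndSet(false, true)) doLog(message)
  }
}
```

Inside `toPrecision` this would be called as, e.g., `OverflowWarnOnce.warn(logWarning, s"$message NULL is returned.")`.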


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r162959471
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

If we hit it often, the result we get is quite useless. I added it only to notify the user of an unexpected/undesired situation that currently happens silently. I think it is bad that the user cannot tell whether a NULL is the result of an operation involving NULLs or the result of an overflow.


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r162957968
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I am not sure we should log this message. If we hit this often, we'll end up with huge logs.


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-22 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20350

[SPARK-23179][SQL] Support option to throw exception if overflow occurs

## What changes were proposed in this pull request?

SQL ANSI 2011 states that in case of overflow during arithmetic operations, an exception should be thrown. This is what most SQL databases do (e.g. SQL Server, DB2). Hive currently returns NULL (as Spark does), but HIVE-18291 is open to make it SQL compliant.

This PR introduces an option to decide which behavior Spark should follow, i.e. returning NULL on overflow or throwing an exception.
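
A sketch of how the proposed option would be exercised from a session — the config key is the one added by this PR, while the overflowing expression and the behavior shown are illustrative:

```scala
// Assumes a running SparkSession `spark`; the product below needs more
// than 38 digits for its integral part, so it cannot fit any DecimalType.
val wide = "select 98765432109876543210.5 * 98765432109876543210.5"

spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "true")
spark.sql(wide).show()   // default: overflow yields NULL (current behavior)

spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "false")
spark.sql(wide).show()   // proposed: overflow throws an ArithmeticException
```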

## How was this patch tested?

Added UTs.
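
For example, one of the added tests could plausibly take this shape (the test name and overflowing expression are illustrative; `withSQLConf` is the standard Spark SQL test helper for temporarily overriding a conf):

```scala
test("decimal arithmetic overflow throws when nullOnOverflow is false") {
  withSQLConf("spark.sql.decimalOperations.nullOnOverflow" -> "false") {
    // The product exceeds 38 digits of precision, so it must overflow.
    val e = intercept[ArithmeticException] {
      sql("select 98765432109876543210.5 * 98765432109876543210.5").collect()
    }
    assert(e.getMessage.contains("cannot be represented as Decimal"))
  }
}
```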


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23179

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20350.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20350


commit 449b69c0804622cc747d280d28415e93144ca272
Author: Marco Gaido 
Date:   2018-01-22T14:32:04Z

[SPARK-23179][SQL] Support option to throw exception if overflow occurs




---
