[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r182820983

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala ---
@@ -80,30 +80,44 @@ case class PromotePrecision(child: Expression) extends UnaryExpression {

 /**
  * Rounds the decimal to given scale and check whether the decimal can fit in provided precision
- * or not, returns null if not.
+ * or not. If not, if `nullOnOverflow` is `true`, it returns `null`; otherwise an
+ * `ArithmeticException` is thrown.
  */
-case class CheckOverflow(child: Expression, dataType: DecimalType) extends UnaryExpression {
+case class CheckOverflow(
+    child: Expression,
+    dataType: DecimalType,
+    nullOnOverflow: Boolean) extends UnaryExpression {

   override def nullable: Boolean = true

   override def nullSafeEval(input: Any): Any =
-    input.asInstanceOf[Decimal].toPrecision(dataType.precision, dataType.scale)
+    input.asInstanceOf[Decimal].toPrecision(
+      dataType.precision,
+      dataType.scale,
+      Decimal.ROUND_HALF_UP,
+      nullOnOverflow)

   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     nullSafeCodeGen(ctx, ev, eval => {
       val tmp = ctx.freshName("tmp")
+      val onOverflow = if (nullOnOverflow) {
--- End diff --

Why are you not just calling `Decimal.toPrecision` here? There seems to be very little value in code generating this (no specialization).

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
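The interpreted-evaluation semantics under discussion can be modeled outside Spark. The sketch below is a standalone illustration using Scala's plain `BigDecimal`, not Spark's `Decimal` API; the helper name `checkOverflow` and its `Option`-based signature are invented for the example (Spark returns `null` rather than `None`).

```scala
import scala.math.BigDecimal.RoundingMode

object CheckOverflowModel {
  // Standalone model of the behavior under review: round to the target scale,
  // then either return None (the NULL case) or throw, depending on the flag.
  // This is a sketch, not Spark's CheckOverflow expression.
  def checkOverflow(
      d: BigDecimal,
      precision: Int,
      scale: Int,
      nullOnOverflow: Boolean): Option[BigDecimal] = {
    val rounded = d.setScale(scale, RoundingMode.HALF_UP)
    if (rounded.precision <= precision) {
      Some(rounded)  // fits in Decimal(precision, scale)
    } else if (nullOnOverflow) {
      None           // legacy Spark/Hive behavior: overflow yields NULL
    } else {
      // SQL ANSI 2011 behavior proposed by the PR: overflow raises an error
      throw new ArithmeticException(
        s"$d cannot be represented as Decimal($precision, $scale).")
    }
  }
}
```

For instance, `checkOverflow(BigDecimal("12.345"), 4, 2, nullOnOverflow = true)` rounds to `12.35` and fits, while a value needing more than 4 digits yields `None` or throws depending on the flag.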
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163269198

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1074,6 +1074,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val DECIMAL_OPERATIONS_NULL_ON_OVERFLOW =
+    buildConf("spark.sql.decimalOperations.nullOnOverflow")
+      .internal()
+      .doc("When true (default), if an overflow on a decimal occurs, then NULL is returned. " +
+        "Spark's older versions and Hive behave in this way. If turned to false, SQL ANSI 2011 " +
+        "specification, will be followed instead: an arithmetic exception is thrown. This is " +
+        "what most of the SQL databases do.")
--- End diff --

Tiny nit:

    If turned to false, SQL ANSI 2011 specification, will be followed instead

This should be

    If turned to false, SQL ANSI 2011 specification will be followed instead
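For reference, the flag introduced in the hunk above would presumably be toggled like any other SQL conf. This is a sketch only, assuming a `SparkSession` named `spark` is already in scope; with the flag set to `false`, a decimal overflow would raise an `ArithmeticException` instead of yielding NULL, per the PR description.

```scala
// Sketch: opting in to the ANSI-style behavior proposed by this PR.
// `spark` is an assumed, already-created SparkSession.
spark.conf.set("spark.sql.decimalOperations.nullOnOverflow", "false")
```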
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184979

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        if (log.isDebugEnabled) logDebug(s"$message NULL is returned.")
+        null
--- End diff --

Since @hvanhovell was also suggesting that this is not necessary, I am removing it, even though I think it would be good to have.
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184915

--- Diff: sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql ---
@@ -49,7 +49,6 @@ select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss are truncated
 select 123456789123456789.1234567890 * 1.123456789123456789;
-select 0.001 / 9876543210987654321098765432109876543.2
--- End diff --

Yes, unfortunately I somehow missed it previously...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163134873

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        if (log.isDebugEnabled) logDebug(s"$message NULL is returned.")
+        null
--- End diff --

Do we really need to log for this?
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163134537

--- Diff: sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql ---
@@ -49,7 +49,6 @@ select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss are truncated
 select 123456789123456789.1234567890 * 1.123456789123456789;
-select 0.001 / 9876543210987654321098765432109876543.2
--- End diff --

I think it is missing a `;` before...
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163013386

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I am ok with using debug/trace level logging. Can you make sure we do not construct the message unless we are logging or throwing the exception (changing `val` into `def` should be enough)?
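The `val`-to-`def` change requested here works because a `def` body is evaluated on each reference rather than eagerly at the definition site. A minimal standalone sketch (all names invented for illustration, not Spark code) showing that the interpolated string is only built when a branch actually uses it:

```scala
object LazyMessageDemo {
  var built = 0  // counts how many times the message string was constructed

  // Mimics the shape of the toPrecision error path: with `def`, the
  // happy path and the NULL path never pay for building the message.
  def toPrecisionLike(fits: Boolean, nullOnOverflow: Boolean): Option[String] = {
    def message = { built += 1; "value cannot be represented" }
    if (fits) Some("copy")            // message is never constructed here
    else if (nullOnOverflow) None     // nor here (debug logging assumed off)
    else throw new ArithmeticException(message)  // constructed only here
  }
}
```

Had `message` been a `val`, `built` would increment on every overflow regardless of branch; with `def`, only the throwing branch pays for the interpolation.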
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163008946

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I see your point, and I agree with you. But I wanted to leave some trace of what was happening. What about using DEBUG as the log level? In that case, most of the time we would not log anything, but if we want to check whether an overflow is happening, we can. What do you think?
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163007885

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals, we log the same message a million times, which is going to cause a significant regression. Moreover, this is logged on the executor; an end user typically does not look at those logs (and there is no reason to, since the job does not throw an error). My suggestion would be to not log at all, or to log just once. I prefer not to log at all.
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r162959471

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

If we hit it often, the result we get is quite useless. I added it only to notify the user of an unexpected/undesired situation which currently happens silently. I think it is bad that the user cannot know whether a NULL is the result of an operation involving NULLs or the result of an overflow.
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r162957968

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
--- End diff --

I am not sure we should log this message. If we hit this often, we'll end up with huge logs.
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20350

[SPARK-23179][SQL] Support option to throw exception if overflow occurs

## What changes were proposed in this pull request?

SQL ANSI 2011 states that in case of overflow during arithmetic operations, an exception should be thrown. This is what most SQL databases do (e.g. SQL Server, DB2). Hive currently returns NULL (as Spark does), but HIVE-18291 is open to make it SQL compliant. The PR introduces an option to decide which behavior Spark should follow, i.e. returning NULL on overflow or throwing an exception.

## How was this patch tested?

Added UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-23179

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20350.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20350

commit 449b69c0804622cc747d280d28415e93144ca272
Author: Marco Gaido
Date: 2018-01-22T14:32:04Z

    [SPARK-23179][SQL] Support option to throw exception if overflow occurs