Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20350#discussion_r163007885

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---

```diff
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {

   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        logWarning(s"$message NULL is returned.")
```

--- End diff ---

I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals, we log the same message a million times, which is going to cause a significant performance regression. Moreover, this is logged on the executor, and an end user typically does not look at those logs (there is also no reason to do so, since the job does not throw an error). My suggestion would be to not log at all, or to log just once. I prefer not to log at all.
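For reference, the "log once" alternative mentioned above could be sketched as follows. This is a hypothetical illustration, not code from the PR: the `OverflowWarning` object, `warnOnce` method, and the injected `log` callback are all made-up names; in Spark itself the warning would go through the `Logging` trait rather than a passed-in function.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: emit the overflow warning at most once per JVM,
// instead of once per non-convertible row.
object OverflowWarning {
  // Flips from false to true exactly once, even under concurrent callers.
  private val warned = new AtomicBoolean(false)

  // `log` stands in for logWarning from Spark's Logging trait.
  def warnOnce(message: String)(log: String => Unit): Unit = {
    if (warned.compareAndSet(false, true)) {
      log(message)
    }
  }
}
```

A guard like this bounds the logging cost to a single line per executor JVM, though it still does one atomic check per overflow, which is why not logging at all is cheaper still.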