Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20350#discussion_r163007885
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
    @@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
       /**
        * Create new `Decimal` with given precision and scale.
        *
    -   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
    +   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
    +   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
        */
       private[sql] def toPrecision(
           precision: Int,
           scale: Int,
    -      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
    +      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
    +      nullOnOverflow: Boolean = true): Decimal = {
         val copy = clone()
    -    if (copy.changePrecision(precision, scale, roundMode)) copy else null
    +    if (copy.changePrecision(precision, scale, roundMode)) {
    +      copy
    +    } else {
    +      val message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
    +      if (nullOnOverflow) {
    +        logWarning(s"$message NULL is returned.")
    --- End diff --
    
    I agree that a result becomes less useful if we return nulls often. My problem is more that if we process a million non-convertible decimals, we log the same message a million times, which is going to cause a significant performance regression. Moreover, this is logged on the executor, and an end-user typically does not look at those logs (there is also no reason to do so, since the job does not throw an error).
    
    My suggestion would be to either not log at all or to log only once; I would prefer not logging at all.
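    
    For illustration, a minimal sketch of the log-once option, assuming a process-wide atomic guard; the object `DecimalOverflowWarning`, the method `warnOnce`, and their placement are hypothetical and not part of this patch:
    
        import java.util.concurrent.atomic.AtomicBoolean
        
        // Hypothetical sketch: emit the overflow warning at most once per JVM,
        // instead of once per non-convertible value. Names are illustrative.
        object DecimalOverflowWarning {
          private val logged = new AtomicBoolean(false)
        
          /** Invokes `logFn(message)` only on the first call; later calls are no-ops. */
          def warnOnce(message: String)(logFn: String => Unit): Unit = {
            // compareAndSet flips false -> true exactly once across all threads
            if (logged.compareAndSet(false, true)) {
              logFn(message)
            }
          }
        }
        
        // Possible usage inside the overflow branch of toPrecision (sketch):
        //   DecimalOverflowWarning.warnOnce(s"$message NULL is returned.")(m => logWarning(m))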

