[ https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790107#comment-17790107 ]

Thomas Graves commented on SPARK-40129:
---------------------------------------

This looks like a duplicate of https://issues.apache.org/jira/browse/SPARK-45786

> Decimal multiply can produce the wrong answer because it rounds twice
> ---------------------------------------------------------------------
>
>                 Key: SPARK-40129
>                 URL: https://issues.apache.org/jira/browse/SPARK-40129
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: Robert Joseph Evans
>            Priority: Major
>              Labels: pull-request-available
>
> This looks like it has been around for a long time, but I have reproduced it 
> in 3.2.0+.
> The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), 
> but I think it can be reproduced with other number combinations, and possibly 
> with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF
>   .selectExpr("CAST(value as DECIMAL(38,10)) as a")
>   .selectExpr("a * CAST(-12 as DECIMAL(38,10))")
>   .show(truncate = false)
> {code}
> This produces an answer in Spark of 
> {{-110083130231976019291714061058.575920}}, but if I do the calculation with 
> plain Java BigDecimal I get {{-110083130231976019291714061058.575919}}:
> {code:java}
> import java.math.BigDecimal;
> import java.math.RoundingMode;
>
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
> BigDecimal prod = l.multiply(r); // exact: -110083130231976019291714061058.57591949560000000000
> BigDecimal roundedProd = prod.setScale(6, RoundingMode.HALF_UP); // -110083130231976019291714061058.575919
> {code}
> Spark does essentially the same operations, but it uses its own Decimal class 
> instead of Java's BigDecimal directly. Spark, by way of Decimal, sets a 
> MathContext for the multiply operation with a maximum precision of 38 and 
> HALF_UP rounding. That means the result of the multiply operation in Spark is 
> {{-110083130231976019291714061058.57591950}}, already rounded to 38 digits, 
> whereas the plain Java BigDecimal code produces 
> {{-110083130231976019291714061058.57591949560000000000}}. Then setScale is 
> called (as part of Decimal.setPrecision), in CheckOverflow for 3.2.0 and 3.3.0 
> or in the regular Multiply expression in 3.4.0. At that point the 
> already-rounded number is rounded a second time, yielding what is arguably a 
> wrong answer from Spark.
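> The double rounding can be reproduced with BigDecimal alone; here is a minimal 
> sketch mirroring what Decimal does internally (prod38 and twiceRounded are 
> illustrative names, not Spark identifiers):
> {code:java}
> import java.math.BigDecimal;
> import java.math.MathContext;
> import java.math.RoundingMode;
>
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
>
> // First rounding: multiply under the precision-38, HALF_UP MathContext.
> BigDecimal prod38 = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP));
> // prod38 = -110083130231976019291714061058.57591950
>
> // Second rounding: the setScale applied in Decimal.setPrecision.
> BigDecimal twiceRounded = prod38.setScale(6, RoundingMode.HALF_UP);
> // twiceRounded = -110083130231976019291714061058.575920
> {code}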
> I have not fully tested this, but it looks like we could either remove the 
> MathContext entirely in Decimal or set it to UNLIMITED; all of the decimal 
> operations appear to do their own overflow checking and rounding anyway. If we 
> want to limit memory usage, we could instead set the max precision in the 
> MathContext to 39 and truncate (round down) the result, which would let 
> setPrecision round the result correctly afterwards.
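> As a rough sketch of the second option (an illustration only, not Spark 
> code): with one truncated guard digit, the final setScale performs the only 
> value-changing rounding, at least in this example.
> {code:java}
> import java.math.BigDecimal;
> import java.math.MathContext;
> import java.math.RoundingMode;
>
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
>
> // Multiply with one guard digit (precision 39), truncating toward zero so
> // nothing is rounded away before the final setScale.
> BigDecimal prod39 = l.multiply(r, new MathContext(39, RoundingMode.DOWN));
> // prod39 = -110083130231976019291714061058.575919495
>
> // A single HALF_UP rounding now agrees with plain BigDecimal.
> BigDecimal result = prod39.setScale(6, RoundingMode.HALF_UP);
> // result = -110083130231976019291714061058.575919
> {code}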


