Benito Kestelman created SPARK-25585:
----------------------------------------

             Summary: Allow users to specify scale of result in Decimal arithmetic
                 Key: SPARK-25585
                 URL: https://issues.apache.org/jira/browse/SPARK-25585
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Benito Kestelman


The current behavior of Spark Decimal arithmetic makes it difficult for 
users to achieve their desired level of precision. Numeric literals are 
automatically cast to unlimited precision during arithmetic, but the final 
result is cast down based on the precision and scale of the operands, 
according to MS SQL rules (discussed in other JIRAs). This final cast can 
cause a substantial loss of scale.

For example:
{noformat}
scala> spark.sql("select 1.3/3.41").show(false)
+--------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(3,2)) / CAST(3.41 AS DECIMAL(3,2)))|
+--------------------------------------------------------+
|0.381232                                                |
+--------------------------------------------------------+{noformat}
To get higher scale in the result, a user must cast the operands to higher 
scale:
{noformat}
scala> spark.sql("select cast(1.3 as decimal(5,4))/cast(3.41 as decimal(5,4))").show(false)
+--------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(5,4)) / CAST(3.41 AS DECIMAL(5,4)))|
+--------------------------------------------------------+
|0.3812316716                                            |
+--------------------------------------------------------+

scala> spark.sql("select cast(1.3 as decimal(10,9))/cast(3.41 as decimal(10,9))").show(false)
+----------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(10,9)) / CAST(3.41 AS DECIMAL(10,9)))|
+----------------------------------------------------------+
|0.38123167155425219941                                    |
+----------------------------------------------------------+{noformat}
But if the user casts too high, the result's scale decreases.
{noformat}
scala> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as decimal(25,24))").show(false)
+------------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(25,24)) / CAST(3.41 AS DECIMAL(25,24)))|
+------------------------------------------------------------+
|0.3812316715543                                             |
+------------------------------------------------------------+{noformat}
Thus, the user has no way of knowing how to cast to get the desired scale. 
This problem is even harder to deal with when the operands are variables 
(e.g. columns) rather than literals. 
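The non-monotonic behavior above follows from Spark's result-type rules for division. A minimal sketch of those rules (the object and method names here are illustrative, not Spark's internal API; the constants mirror {{DecimalType.MAX_PRECISION}} and {{DecimalType.MINIMUM_ADJUSTED_SCALE}} as of Spark 2.3):

```scala
// Sketch of Spark 2.3's result-type rule for Decimal division, assuming
// spark.sql.decimalOperations.allowPrecisionLoss=true (the default).
// Names are illustrative; only the arithmetic mirrors Spark's behavior.
object DecimalDivScale {
  val MaxPrecision = 38          // DecimalType.MAX_PRECISION
  val MinimumAdjustedScale = 6   // DecimalType.MINIMUM_ADJUSTED_SCALE

  // Result (precision, scale) of DECIMAL(p1,s1) / DECIMAL(p2,s2).
  def divide(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
    val scale = math.max(6, s1 + p2 + 1)
    val precision = p1 - s1 + s2 + scale
    if (precision <= MaxPrecision) {
      (precision, scale)
    } else {
      // Precision overflows 38: keep the integral digits, sacrifice scale,
      // but never go below MinimumAdjustedScale.
      val intDigits = precision - scale
      val minScale = math.min(scale, MinimumAdjustedScale)
      val adjustedScale = math.max(MaxPrecision - intDigits, minScale)
      (MaxPrecision, adjustedScale)
    }
  }
}
```

Running the four casts above through this rule reproduces the observed scales: (3,2)/(3,2) gives DECIMAL(9,6), (5,4)/(5,4) gives DECIMAL(15,10), (10,9)/(10,9) gives DECIMAL(30,20), but (25,24)/(25,24) overflows to DECIMAL(38,13), which is why the scale drops back to 13.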

The user should be able to explicitly set the desired scale of the result. 
MySQL offers this capability in the form of a system variable called 
"div_precision_increment."

From the MySQL docs: "In division performed with 
[{{/}}|https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_divide], 
the scale of the result when using two exact-value operands is the scale of 
the first operand plus the value of the 
[{{div_precision_increment}}|https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_div_precision_increment] 
system variable (which is 4 by default). For example, the result of the 
expression {{5.05 / 0.014}} has a scale of six decimal places 
({{360.714286}})."
{noformat}
mysql> SELECT 1/7;
+--------+
| 1/7    |
+--------+
| 0.1429 |
+--------+
mysql> SET div_precision_increment = 12;
mysql> SELECT 1/7;
+----------------+
| 1/7            |
+----------------+
| 0.142857142857 |
+----------------+{noformat}
This gives the user full control of the result's scale after arithmetic and 
obviates the need for casting all over the place.

Since Spark 2.3, we already have DecimalType.MINIMUM_ADJUSTED_SCALE, which 
plays a role similar to div_precision_increment. It just needs to be made 
modifiable by the user. 
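To illustrate, a hypothetical sketch of the scale adjustment with the minimum adjusted scale as a user-settable parameter (the function name and signature are illustrative, not a proposed API):

```scala
// Hypothetical: the same adjustment Spark applies on precision overflow,
// but with MINIMUM_ADJUSTED_SCALE passed in instead of fixed at 6.
def adjustedScale(p1: Int, s1: Int, p2: Int, s2: Int, minAdjusted: Int): Int = {
  val rawScale = math.max(6, s1 + p2 + 1)
  val rawPrecision = p1 - s1 + s2 + rawScale
  if (rawPrecision <= 38) rawScale
  else math.max(38 - (rawPrecision - rawScale), math.min(rawScale, minAdjusted))
}
```

With the fixed value 6, the DECIMAL(25,24)/DECIMAL(25,24) example above keeps only 13 decimal places; were the minimum raised to 24, the same expression would keep 24. (Raising it trades integer digits for scale within the 38-digit cap, so overflow risk grows, the same trade-off div_precision_increment exposes in MySQL.)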

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
