Benito Kestelman created SPARK-25585:
----------------------------------------
             Summary: Allow users to specify scale of result in Decimal arithmetic
                 Key: SPARK-25585
                 URL: https://issues.apache.org/jira/browse/SPARK-25585
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Benito Kestelman


The current behavior of Spark Decimal arithmetic makes it difficult for users to achieve their desired level of precision. Numeric literals are automatically cast to unlimited precision during arithmetic, but the final result is cast down depending on the precision and scale of the operands, according to MS SQL rules (discussed in other JIRAs). This final cast can cause substantial loss of scale. For example:

{noformat}
scala> spark.sql("select 1.3/3.41").show(false)
+--------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(3,2)) / CAST(3.41 AS DECIMAL(3,2)))|
+--------------------------------------------------------+
|0.381232                                                |
+--------------------------------------------------------+
{noformat}

To get higher scale in the result, a user must cast the operands to a higher scale:

{noformat}
scala> spark.sql("select cast(1.3 as decimal(5,4))/cast(3.41 as decimal(5,4))").show(false)
+--------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(5,4)) / CAST(3.41 AS DECIMAL(5,4)))|
+--------------------------------------------------------+
|0.3812316716                                            |
+--------------------------------------------------------+

scala> spark.sql("select cast(1.3 as decimal(10,9))/cast(3.41 as decimal(10,9))").show(false)
+----------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(10,9)) / CAST(3.41 AS DECIMAL(10,9)))|
+----------------------------------------------------------+
|0.38123167155425219941                                    |
+----------------------------------------------------------+
{noformat}

But if the user casts too high, the result's scale decreases:

{noformat}
scala> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as decimal(25,24))").show(false)
+------------------------------------------------------------+
|(CAST(1.3 AS DECIMAL(25,24)) / CAST(3.41 AS DECIMAL(25,24)))|
+------------------------------------------------------------+
|0.3812316715543                                             |
+------------------------------------------------------------+
{noformat}

Thus, the user has no way of knowing how to cast to get the desired scale. This problem is even harder to deal with when using variables instead of literals.

The user should be able to explicitly set the desired scale of the result. MySQL offers this capability in the form of a system variable called "div_precision_increment." From the MySQL docs: "In division performed with [{{/}}|https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_divide], the scale of the result when using two exact-value operands is the scale of the first operand plus the value of the [{{div_precision_increment}}|https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_div_precision_increment] system variable (which is 4 by default). For example, the result of the expression {{5.05 / 0.014}} has a scale of six decimal places ({{360.714286}})."

{noformat}
mysql> SELECT 1/7;
+--------+
| 1/7    |
+--------+
| 0.1429 |
+--------+

mysql> SET div_precision_increment = 12;

mysql> SELECT 1/7;
+----------------+
| 1/7            |
+----------------+
| 0.142857142857 |
+----------------+
{noformat}

This gives the user full control of the result's scale after arithmetic and obviates the need for casting all over the place.
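For reference, the scales above follow mechanically from the MS SQL-style rules. Below is a minimal Scala sketch of those rules as a simplified model, assuming division yields scale max(6, s1 + p2 + 1) and precision p1 - s1 + s2 + scale, with the Spark 2.3 adjustment applied when precision exceeds 38; this is for illustration, not the actual Spark source:

{noformat}
// Simplified model of Spark's result type for DECIMAL(p1,s1) / DECIMAL(p2,s2).
// Assumes the MS SQL-style formulas plus the precision/scale adjustment
// introduced in Spark 2.3; not the actual Spark implementation.
object DecimalDivisionModel {
  val MAX_PRECISION = 38
  val MINIMUM_ADJUSTED_SCALE = 6 // hardcoded today

  def divisionType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
    val scale = math.max(MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)
    val precision = p1 - s1 + s2 + scale
    if (precision <= MAX_PRECISION) {
      (precision, scale)
    } else {
      // Result is too wide: keep the integral digits and shrink the scale,
      // but never below min(scale, MINIMUM_ADJUSTED_SCALE).
      val intDigits = precision - scale
      val minScale = math.min(scale, MINIMUM_ADJUSTED_SCALE)
      (MAX_PRECISION, math.max(MAX_PRECISION - intDigits, minScale))
    }
  }

  def main(args: Array[String]): Unit = {
    println(divisionType(3, 2, 3, 2))     // (9,6)   -> 0.381232
    println(divisionType(5, 4, 5, 4))     // (15,10) -> 0.3812316716
    println(divisionType(10, 9, 10, 9))   // (30,20) -> 0.38123167155425219941
    println(divisionType(25, 24, 25, 24)) // (38,13) -> 0.3812316715543
  }
}
{noformat}

The last case is the surprising one: the requested precision overflows 38, the adjustment kicks in, and the scale silently drops from 50 to 13.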
Since Spark 2.3, we already have DecimalType.MINIMUM_ADJUSTED_SCALE, which is similar to div_precision_increment. It just needs to be made modifiable by the user.
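As a hedged sketch of what that could look like, the model above can be re-run with the hardcoded 6 replaced by a user-settable value. The parameter below is purely illustrative; Spark currently exposes no such setting:

{noformat}
// Hypothetical: MINIMUM_ADJUSTED_SCALE as a user-settable knob, analogous to
// MySQL's div_precision_increment. The parameter is illustrative only; Spark
// currently hardcodes the value 6 and exposes no conf for it.
object ConfigurableAdjustedScaleModel {
  val MAX_PRECISION = 38

  def divisionType(p1: Int, s1: Int, p2: Int, s2: Int,
                   minAdjustedScale: Int): (Int, Int) = {
    val scale = math.max(minAdjustedScale, s1 + p2 + 1)
    val precision = p1 - s1 + s2 + scale
    if (precision <= MAX_PRECISION) (precision, scale)
    else {
      val intDigits = precision - scale
      val minScale = math.min(scale, minAdjustedScale)
      (MAX_PRECISION, math.max(MAX_PRECISION - intDigits, minScale))
    }
  }

  def main(args: Array[String]): Unit = {
    // Today's fixed value of 6 caps the example above at DECIMAL(38,13):
    println(divisionType(25, 24, 25, 24, minAdjustedScale = 6))  // (38,13)
    // A user asking for at least 20 digits of scale would get DECIMAL(38,20):
    println(divisionType(25, 24, 25, 24, minAdjustedScale = 20)) // (38,20)
  }
}
{noformat}

With such a knob, a user could trade integral headroom for scale explicitly instead of reverse-engineering the cast that happens to produce the desired result.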