[ https://issues.apache.org/jira/browse/SPARK-25585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-25585. ---------------------------------- Resolution: Incomplete > Allow users to specify scale of result in Decimal arithmetic > ------------------------------------------------------------ > > Key: SPARK-25585 > URL: https://issues.apache.org/jira/browse/SPARK-25585 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.3.0 > Reporter: Benito Kestelman > Priority: Major > Labels: bulk-closed, decimal, usability > > The current behavior of Spark Decimal during arithmetic makes it difficult > for users to achieve their desired level of precision. Numeric literals are > automatically cast to unlimited precision during arithmetic, but the final > result is cast down depending on the precision and scale of the operands, > according to MS SQL rules (discussed in other JIRA's). This final cast can > cause substantial loss of scale. > For example: > {noformat} > scala> spark.sql("select 1.3/3.41").show(false) > +--------------------------------------------------------+ > |(CAST(1.3 AS DECIMAL(3,2)) / CAST(3.41 AS DECIMAL(3,2)))| > +--------------------------------------------------------+ > |0.381232 | > +--------------------------------------------------------+{noformat} > To get higher scale in the result, a user must cast the operands to higher > scale: > {noformat} > scala> spark.sql("select cast(1.3 as decimal(5,4))/cast(3.41 as > decimal(5,4))").show(false) > +--------------------------------------------------------+ > |(CAST(1.3 AS DECIMAL(5,4)) / CAST(3.41 AS DECIMAL(5,4)))| > +--------------------------------------------------------+ > |0.3812316716 | > +--------------------------------------------------------+ > scala> spark.sql("select cast(1.3 as decimal(10,9))/cast(3.41 as > decimal(10,9))").show(false) > +----------------------------------------------------------+ > |(CAST(1.3 AS DECIMAL(10,9)) / CAST(3.41 AS DECIMAL(10,9)))| > +----------------------------------------------------------+ > |0.38123167155425219941 | > +----------------------------------------------------------+{noformat} > But if the user casts too high, the result's scale decreases. > {noformat} > scala> spark.sql("select cast(1.3 as decimal(25,24))/cast(3.41 as > decimal(25,24))").show(false) > +------------------------------------------------------------+ > |(CAST(1.3 AS DECIMAL(25,24)) / CAST(3.41 AS DECIMAL(25,24)))| > +------------------------------------------------------------+ > |0.3812316715543 | > +------------------------------------------------------------+{noformat} > Thus, the user has no way of knowing how to cast to get the scale he wants. > This problem is even harder to deal with when using variables instead of > literals. > The user should be able to explicitly set the desired scale of the result. > MySQL offers this capability in the form of a system variable called > "div_precision_increment." > From the MySQL docs: "In division performed with > [{{/}}|https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_divide], > the scale of the result when using two exact-value operands is the scale of > the first operand plus the value of the > [{{div_precision_increment}}|https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_div_precision_increment] > system variable (which is 4 by default). For example, the result of the > expression {{5.05 / 0.014}} has a scale of six decimal places > ({{360.714286}})." > {noformat} > mysql> SELECT 1/7; > +--------+ > | 1/7 | > +--------+ > | 0.1429 | > +--------+ > mysql> SET div_precision_increment = 12; > mysql> SELECT 1/7; > +----------------+ > | 1/7 | > +----------------+ > | 0.142857142857 | > +----------------+{noformat} > This gives the user full control of the result's scale after arithmetic and > obviates the need for casting all over the place. > Since Spark 2.3, we already have DecimalType.MINIMUM_ADJUSTED_SCALE, which is > similar to div_precision_increment. It just needs to be made modifiable by > the user. > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org