[ https://issues.apache.org/jira/browse/SPARK-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated SPARK-8800:
-----------------------------
    Description: 
According to the specification defined in the Javadoc for BigDecimal:

http://docs.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html

When a MathContext object is supplied with a precision setting of 0 (for 
example, MathContext.UNLIMITED), arithmetic operations are exact, as are the 
arithmetic methods which take no MathContext object. (This is the only behavior 
that was supported in releases prior to 5.) As a corollary of computing the 
exact result, the rounding mode setting of a MathContext object with a 
precision setting of 0 is not used and thus irrelevant. In the case of divide, 
the exact quotient could have an infinitely long decimal expansion; for 
example, 1 divided by 3. If the quotient has a nonterminating decimal expansion 
and the operation is specified to return an exact result, an 
ArithmeticException is thrown. Otherwise, the exact result of the division is 
returned, as done for other operations.
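
For reference, this contract is easy to reproduce directly against java.math.BigDecimal (a minimal sketch in the Scala REPL; only standard JDK APIs are used):

import java.math.{BigDecimal => JBigDecimal, MathContext}

// Terminating expansion: the exact quotient is returned with no MathContext.
new JBigDecimal(1).divide(new JBigDecimal(8))                    // => 0.125

// Nonterminating expansion: exact division throws, as the Javadoc states.
// new JBigDecimal(1).divide(new JBigDecimal(3))                 // ArithmeticException

// A finite precision yields a rounded quotient instead (34 digits here).
new JBigDecimal(1).divide(new JBigDecimal(3), MathContext.DECIMAL128)
// => 0.3333333333333333333333333333333333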

When Decimal data is defined as DecimalType.Unlimited in Spark SQL, the exact 
result of the division should be returned, or truncated to precision = 38, 
which is in line with what Hive supports. The current behavior is shown below; 
it causes a loss of accuracy in the Decimal division operation.

scala> val aa = Decimal(2) / Decimal(3);
aa: org.apache.spark.sql.types.Decimal = 1

Here is another example, where we should return 0.125 instead of 0:

scala> val aa = Decimal(1) / Decimal(8)
aa: org.apache.spark.sql.types.Decimal = 0
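
For what it's worth, the observed results are consistent with a scale-preserving divide: java.math.BigDecimal.divide(divisor, roundingMode) keeps the dividend's scale (0 here), so 2/3 rounds to 1 and 1/8 rounds to 0, while a divide under a precision-38 MathContext would return the expected quotient. The sketch below only illustrates that consistency; it is not necessarily the code path Spark actually takes:

import java.math.{BigDecimal => JBigDecimal, MathContext, RoundingMode}

// Scale-preserving divide: both operands have scale 0, so the quotient
// is rounded to scale 0 and reproduces the results reported above.
new JBigDecimal(2).divide(new JBigDecimal(3), RoundingMode.HALF_UP)  // => 1
new JBigDecimal(1).divide(new JBigDecimal(8), RoundingMode.HALF_UP)  // => 0

// Divide under a 38-digit MathContext, matching Hive's maximum precision.
val mc = new MathContext(38, RoundingMode.HALF_UP)
new JBigDecimal(2).divide(new JBigDecimal(3), mc)
// => 0.66666666666666666666666666666666666667
new JBigDecimal(1).divide(new JBigDecimal(8), mc)                    // => 0.125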


> Spark SQL Decimal Division operation loss of precision/scale when type is 
> defined as DecimalType.Unlimited
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8800
>                 URL: https://issues.apache.org/jira/browse/SPARK-8800
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Jihong MA
>


