[ https://issues.apache.org/jira/browse/SPARK-27283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mats updated SPARK-27283:
-------------------------
    Description: 

When performing arithmetic between doubles and decimals, the resulting value is always a double. This is very surprising to me; when an exact type is present as one of the inputs, I would expect the inexact type to be lifted and the result presented exactly, rather than the exact type being lowered to the inexact one and a result presented that may contain rounding errors. The choice to use a decimal in the first place was probably made because rounding errors were deemed an issue.

When performing arithmetic between decimals and integers, the expected behaviour is seen; the result is a decimal.

See the following example:
{code:java}
import org.apache.spark.sql.functions

val df = sparkSession.createDataFrame(Seq(Tuple1(0L))).toDF("a")

val decimalInt = df.select(functions.lit(BigDecimal(3.14)) + functions.lit(1) as "d")
val decimalDouble = df.select(functions.lit(BigDecimal(3.14)) + functions.lit(1.0) as "d")

decimalInt.schema.printTreeString()
decimalInt.show()
decimalDouble.schema.printTreeString()
decimalDouble.show()
{code}
which produces this output (with possible variation in the rounding error):
{code:java}
root
 |-- d: decimal(4,2) (nullable = true)

+----+
|   d|
+----+
|4.14|
+----+

root
 |-- d: double (nullable = false)

+-----------------+
|                d|
+-----------------+
|4.140000000000001|
+-----------------+
{code}
I would argue that this is a bug, and that the correct thing to do would be to lift the result to a decimal also when one operand is a double.
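A minimal sketch of a possible workaround, not part of the original report: explicitly casting the double operand to a decimal before the arithmetic keeps the computation exact. It reuses sparkSession and df from the example above, and the DecimalType(3, 1) precision/scale is an assumption chosen for this illustration.
{code:java}
import org.apache.spark.sql.functions
import org.apache.spark.sql.types.DecimalType

// Hypothetical workaround: cast the double literal to a decimal so the
// decimal promotion rules apply and the sum is computed exactly.
val decimalCast = df.select(
  (functions.lit(BigDecimal(3.14)) + functions.lit(1.0).cast(DecimalType(3, 1))) as "d")

decimalCast.schema.printTreeString() // d is a decimal type, not a double
decimalCast.show()                   // 4.14, without the floating-point rounding error
{code}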
> BigDecimal arithmetic losing precision
> --------------------------------------
>
>                 Key: SPARK-27283
>                 URL: https://issues.apache.org/jira/browse/SPARK-27283
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Mats
>            Priority: Minor
>              Labels: decimal, float, sql
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org