Miquel created SPARK-26233: ------------------------------ Summary: Incorrect decimal value with java beans and first/last/max... functions Key: SPARK-26233 URL: https://issues.apache.org/jira/browse/SPARK-26233 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 2.4.0, 2.3.1 Reporter: Miquel
Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max... This problem came because Encoders.bean always set Decimal values as _DecimalType(this.MAX_PRECISION(), 18)._ Usually it's not a problem if you use numeric functions like *sum* but for functions like *first*/*last*/*max*... it is a problem. How to reproduce this error: Using this class as an example: {code:java} public class Foo implements Serializable { private String group; private BigDecimal var; public BigDecimal getVar() { return var; } public void setVar(BigDecimal var) { this.var = var; } public String getGroup() { return group; } public void setGroup(String group) { this.group = group; } } {code} And a dummy code to create some objects: {code:java} Dataset<Foo> ds = spark.range(5) .map(l -> { Foo foo = new Foo(); foo.setGroup("" + l); foo.setVar(BigDecimal.valueOf(l + 0.1111)); return foo; }, Encoders.bean(Foo.class)); ds.printSchema(); ds.show(); +-----+------+ |group| var| +-----+------+ | 0|0.1111| | 1|1.1111| | 2|2.1111| | 3|3.1111| | 4|4.1111| +-----+------+ {code} We can see that the DecimalType is precision 38 and 18 scale and all values are show correctly. But if we use a first function, they are scaled incorrectly: {code:java} ds.groupBy(col("group")) .agg( first("var") ) .show(); +-----+-----------------+ |group|first(var, false)| +-----+-----------------+ | 3| 3.1111E-14| | 0| 1.111E-15| | 1| 1.1111E-14| | 4| 4.1111E-14| | 2| 2.1111E-14| +-----+-----------------+ {code} This incorrect behavior cannot be reproduced if we use "numerical "functions like sum or if the column is cast a new Decimal Type. {code:java} ds.groupBy(col("group")) .agg( sum("var") ) .show(); +-----+--------------------+ |group| sum(var)| +-----+--------------------+ | 3|3.111100000000000000| | 0|0.111100000000000000| | 1|1.111100000000000000| | 4|4.111100000000000000| | 2|2.111100000000000000| +-----+--------------------+ ds.groupBy(col("group")) .agg( first(col("var").cast(new DecimalType(38, 8))) ) .show(); +-----+----------------------------------------+ |group|first(CAST(var AS DECIMAL(38,8)), false)| +-----+----------------------------------------+ | 3| 3.11110000| | 0| 0.11110000| | 1| 1.11110000| | 4| 4.11110000| | 2| 2.11110000| +-----+----------------------------------------+ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org