[ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-26233:
----------------------------------
    Affects Version/s: 2.0.2

> Incorrect decimal value with java beans and first/last/max... functions
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26233
>                 URL: https://issues.apache.org/jira/browse/SPARK-26233
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
>            Reporter: Miquel
>            Priority: Minor
>
> Decimal values from Java beans are incorrectly scaled when used with
> functions like *first*/*last*/*max*...
> This happens because Encoders.bean always maps BigDecimal fields to
> _DecimalType(MAX_PRECISION, 18)._
> Usually this is not a problem with numeric functions like *sum*, but it is
> a problem with functions like *first*/*last*/*max*...
> How to reproduce this error, using this class as an example:
> {code:java}
> public class Foo implements Serializable {
>
>   private String group;
>   private BigDecimal var;
>
>   public BigDecimal getVar() {
>     return var;
>   }
>
>   public void setVar(BigDecimal var) {
>     this.var = var;
>   }
>
>   public String getGroup() {
>     return group;
>   }
>
>   public void setGroup(String group) {
>     this.group = group;
>   }
> }
> {code}
> And a dummy snippet to create some objects:
> {code:java}
> Dataset<Foo> ds = spark.range(5)
>     .map(l -> {
>       Foo foo = new Foo();
>       foo.setGroup("" + l);
>       foo.setVar(BigDecimal.valueOf(l + 0.1111));
>       return foo;
>     }, Encoders.bean(Foo.class));
>
> ds.printSchema();
> ds.show();
>
> +-----+------+
> |group|   var|
> +-----+------+
> |    0|0.1111|
> |    1|1.1111|
> |    2|2.1111|
> |    3|3.1111|
> |    4|4.1111|
> +-----+------+
> {code}
> We can see that the DecimalType has precision 38 and scale 18, and all
> values are shown correctly.
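The mis-scaled numbers reported below can be reproduced with plain java.math.BigDecimal, which is consistent with the input's unscaled value (scale 4) being reinterpreted under the schema's fixed scale of 18 rather than being properly rescaled. A minimal sketch, runnable without Spark (the class name is ours for illustration):

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // The bean value 0.1111 has unscaledValue = 1111 and scale = 4.
        BigDecimal original = new BigDecimal("0.1111");

        // Widening to the schema's scale of 18 is lossless -- this is
        // what a correct conversion to DecimalType(38, 18) should do:
        System.out.println(original.setScale(18)); // 0.111100000000000000

        // Reinterpreting the raw unscaled value (1111) directly under
        // scale 18 instead shifts the magnitude by 14 orders:
        System.out.println(new BigDecimal(original.unscaledValue(), 18)); // 1.111E-15
    }
}
```

The second printed value, 1.111E-15, matches the bad first() output reported below for the row holding 0.1111.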
> But if we use the first function, the values are scaled incorrectly:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(first("var"))
>     .show();
>
> +-----+-----------------+
> |group|first(var, false)|
> +-----+-----------------+
> |    3|       3.1111E-14|
> |    0|        1.111E-15|
> |    1|       1.1111E-14|
> |    4|       4.1111E-14|
> |    2|       2.1111E-14|
> +-----+-----------------+
> {code}
> This incorrect behavior does not occur if we use numeric functions like
> sum, or if the column is cast to a new DecimalType:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(sum("var"))
>     .show();
>
> +-----+--------------------+
> |group|            sum(var)|
> +-----+--------------------+
> |    3|3.111100000000000000|
> |    0|0.111100000000000000|
> |    1|1.111100000000000000|
> |    4|4.111100000000000000|
> |    2|2.111100000000000000|
> +-----+--------------------+
>
> ds.groupBy(col("group"))
>     .agg(first(col("var").cast(new DecimalType(38, 8))))
>     .show();
>
> +-----+----------------------------------------+
> |group|first(CAST(var AS DECIMAL(38,8)), false)|
> +-----+----------------------------------------+
> |    3|                              3.11110000|
> |    0|                              0.11110000|
> |    1|                              1.11110000|
> |    4|                              4.11110000|
> |    2|                              2.11110000|
> +-----+----------------------------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org