[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Darabos updated SPARK-25146:
-----------------------------------
    Description:
We compute some 0-10 numbers in a pipeline using Spark SQL. Then we average them. The average in some cases comes out to {{null}} to our surprise (and disappointment).

After a bit of digging it looks like these numbers have ended up with the {{decimal(37,30)}} type. I've got a Spark Shell (2.3.0 and 2.3.1) repro with this type:

{code}
scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")
scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")
scala> spark.sql("select avg(v) from x").show
+------+
|avg(v)|
+------+
|  null|
+------+
{code}

For up to 4471 numbers it is able to calculate the average. For 4472 or more numbers it's {{null}}.

Now I'll just change these numbers to {{double}}. But we got the types entirely automatically. We never asked for {{decimal}}. If this is the default type, it's important to support averaging a handful of them. (Sorry for the bitterness. I like {{double}} more. :))

Curiously, {{sum()}} works. And {{count()}} too. So it's quite the surprise that {{avg()}} fails.

  was:
We compute some 0-10 numbers in a pipeline using Spark SQL. Then we average them. The average in some cases comes out to {{null}} to our surprise (and disappointment).

After a bit of digging it looks like these numbers have ended up with the {{decimal(37,30)}} type. I've got a Spark Shell (2.3.0 and 2.3.1) repro with this type:

{{scala> (1 to 10000).map(_*0.001).toDF.createOrReplaceTempView("x")}}
{{scala> spark.sql("select cast(value as decimal(37, 30)) as v from x").createOrReplaceTempView("x")}}
{{scala> spark.sql("select avg(v) from x").show}}
{{+------+}}
{{|avg(v)|}}
{{+------+}}
{{|  null|}}
{{+------+}}

For up to 4471 numbers it is able to calculate the average. For 4472 or more numbers it's {{null}}.

Now I'll just change these numbers to {{double}}. But we got the types entirely automatically. We never asked for {{decimal}}. If this is the default type, it's important to support averaging a handful of them. (Sorry for the bitterness. I like {{double}} more. :))

Curiously, {{sum()}} works. And {{count()}} too. So it's quite the surprise that {{avg()}} fails.


> avg() returns null on some decimals
> -----------------------------------
>
>                 Key: SPARK-25146
>                 URL: https://issues.apache.org/jira/browse/SPARK-25146
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Daniel Darabos
>            Priority: Major
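The 4471/4472 boundary reported above lines up with a simple piece of arithmetic, sketched here without Spark. Treating each input as an exact multiple of 0.001, the running sum of the first n values is n(n+1)/2 thousandths, which first reaches 10000 at n = 4472. The sketch assumes, purely for illustration, that some intermediate value in {{avg()}} is held as {{decimal(38, 34)}}, which leaves only four digits before the decimal point; the report does not confirm this, and the names {{AvgThresholdSketch}}, {{sumOfFirst}} and {{fits}} are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

object AvgThresholdSketch {
  // ASSUMPTION (not confirmed by the report): an intermediate value is stored
  // as decimal(38, 34), i.e. 38 - 34 = 4 digits before the decimal point.
  val Precision = 38
  val Scale = 34

  // Exact sum of i * 0.001 for i = 1..n: n(n+1)/2 thousandths.
  def sumOfFirst(n: Int): JBigDecimal =
    JBigDecimal.valueOf(n.toLong * (n + 1) / 2, 3)

  // True if the value still fits in decimal(Precision, Scale) once it is
  // rescaled to Scale fractional digits (widening the scale never rounds).
  def fits(value: JBigDecimal): Boolean =
    value.setScale(Scale).precision <= Precision

  def main(args: Array[String]): Unit = {
    for (n <- Seq(4471, 4472)) {
      val s = sumOfFirst(n)
      println(s"n=$n sum=$s fits=${fits(s)}")
    }
  }
}
```

Running {{main}} shows the sum for n = 4471 (9997.156) fitting and the sum for n = 4472 (10001.628) not fitting, which matches the observed boundary under the stated assumption. It is a consistency check on the numbers in the report, not a description of Spark's implementation.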