[ https://issues.apache.org/jira/browse/SPARK-35741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kousuke Saruta resolved SPARK-35741.
------------------------------------
    Resolution: Not A Problem

> Variance of 1 record gives NULL in Spark 3.x and NaN in Spark 2.x
> -----------------------------------------------------------------
>
>                 Key: SPARK-35741
>                 URL: https://issues.apache.org/jira/browse/SPARK-35741
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Abdeali Kothari
>            Priority: Major
>
> A few test cases in my suite started failing after moving to Spark 3.
> The reason is that the VARIANCE() function previously returned NaN when run on a single record; it now returns NULL.
>
> With Spark 2:
> {code:java}
> export SPARK_HOME=/usr/local/hadoop/spark-2.4.6-bin-hadoop2.7/
> python
> >>> import pyspark
> >>> spark = pyspark.sql.SparkSession.builder.getOrCreate()
> >>> spark.sql('SELECT VARIANCE(1)').show()
> +---------------------------+
> |var_samp(CAST(1 AS DOUBLE))|
> +---------------------------+
> |                        NaN|
> +---------------------------+
> {code}
> With Spark 3:
> {code:java}
> export SPARK_HOME=/usr/local/hadoop/spark-3.1.1-bin-hadoop2.7/
> python
> >>> import pyspark
> >>> spark = pyspark.sql.SparkSession.builder.getOrCreate()
> >>> spark.sql('SELECT VARIANCE(1)').show()
> +---------------------------+
> |variance(CAST(1 AS DOUBLE))|
> +---------------------------+
> |                       null|
> +---------------------------+
> {code}
>
> I thought I'd report it here since I didn't see this change mentioned in any of the release notes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
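For anyone whose tests break on this, a minimal user-side sketch of a workaround follows, assuming a standard PySpark 3.x session; the DataFrame, the column name "x", and the use of coalesce are illustrative choices, not anything prescribed by the issue. The idea is simply to map the NULL that Spark 3.x produces for a single-record group back to NaN so existing assertions keep passing. The Spark 3.1 migration notes also describe a legacy flag (spark.sql.legacy.statisticalAggregate) that is said to restore the pre-3.1 NaN behaviour; check the documentation for your exact version before relying on it.

{code:python}
# Minimal sketch of a user-side workaround, assuming PySpark 3.x.
# The DataFrame and the column name "x" are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,)], ["x"])

# VARIANCE over a single record returns NULL in Spark 3.x; coalesce it
# back to NaN to reproduce the Spark 2.x result in existing tests.
df.select(
    F.coalesce(F.variance("x"), F.lit(float("nan"))).alias("var")
).show()

# The same idea expressed in SQL:
spark.sql("SELECT COALESCE(VARIANCE(1), CAST('NaN' AS DOUBLE)) AS var").show()

# Per the Spark 3.1 migration guide, setting
# spark.sql.legacy.statisticalAggregate=true is said to restore the
# pre-3.1 NaN behaviour globally; verify against your version's docs.
{code}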