Barry Becker created SPARK-13411:
------------------------------------

             Summary: change in null aggregation behavior between Spark 1.5.2 and 1.6.0
                 Key: SPARK-13411
                 URL: https://issues.apache.org/jira/browse/SPARK-13411
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Barry Becker
I don't know whether the behavior in 1.5.2 or 1.6.0 is correct, but it is definitely different; I suspect 1.6.0 is wrong.

Suppose I have a dataframe with a double column, "foo", that is all null valued. If I do

    val ext: DataFrame = df.agg(min("foo"), max("foo"), count(col("foo")).alias("nonNullCount"))

then in 1.6.0 I get a completely empty dataframe as the result. In 1.5.2, I got a single row with the aggregate min and max values being Double.NaN, which I believe is correct. I think the 1.5.2 behavior is better; otherwise I need to add special-case handling for when a column is all null.
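A minimal repro sketch, assuming a spark-shell session where sc (the SparkContext) and sqlContext are already in scope; the column name "foo" and the row count are illustrative:

    import org.apache.spark.sql.{DataFrame, Row}
    import org.apache.spark.sql.functions.{col, count, max, min}
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    // Build a one-column DataFrame whose double column "foo" is entirely null.
    val schema = StructType(Seq(StructField("foo", DoubleType, nullable = true)))
    val df: DataFrame = sqlContext.createDataFrame(
      sc.parallelize(Seq.fill(3)(Row.fromSeq(Seq(null)))), schema)

    // Aggregate min, max, and the non-null count over the all-null column.
    val ext: DataFrame = df.agg(min("foo"), max("foo"), count(col("foo")).alias("nonNullCount"))

    // 1.5.2 (as reported): one row, with NaN min/max and nonNullCount 0.
    // 1.6.0 (as reported): a completely empty result.
    ext.show()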