[ https://issues.apache.org/jira/browse/SPARK-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barry Becker updated SPARK-13411:
---------------------------------

    Description:

I don't know if the behavior in 1.5.2 or 1.6.0 is correct, but it's definitely different.

Suppose I have a DataFrame with a double column, "foo", that is all null valued. If I do

    val ext: DataFrame = df.agg(min("foo"), max("foo"), count(col("foo")).alias("nonNullCount"))

In 1.5.2 I could call ext.first().getDouble(0) and get Double.NaN. In 1.6.0, the same call fails with "value in null at index 0".

Maybe the new behavior is correct, but I think there is a typo in the message: it should say "value is null at index 0". Which behavior is correct? If 1.6.0 is correct, then it looks like I will need to add isNull checks everywhere when retrieving values.

  was:

I don't know if the behavior in 1.5.2 or 1.6.0 is correct, but it's definitely different.

Suppose I have a DataFrame with a double column, "foo", that is all null valued. If I do

    val ext: DataFrame = df.agg(min("foo"), max("foo"), count(col("foo")).alias("nonNullCount"))

In 1.5.2 I could call ext.first().getDouble(0) and get Double.NaN. In 1.6.0, when I try this I get an exception instead. Which is correct? I think the 1.5.2 behavior is better; otherwise I need to add special-case handling for when a column is all null.


> change in null aggregation behavior from spark 1.5.2 and 1.6.0
> ---------------------------------------------------------------
>
>                 Key: SPARK-13411
>                 URL: https://issues.apache.org/jira/browse/SPARK-13411
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Barry Becker
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
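The isNull-check workaround mentioned in the report could be sketched as follows. This is a minimal sketch, not confirmed against the reporter's code: it assumes a SparkSession named `spark` is already in scope, and the all-null DataFrame `df` is built here only for illustration. It uses the standard Row accessors `isNullAt` and `getDouble` to fall back to Double.NaN, approximating the 1.5.2 behavior under 1.6.0 semantics.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.{col, count, max, min}

// Illustrative all-null double column "foo" (assumes a SparkSession
// named `spark` is in scope; this DataFrame is a stand-in for the
// reporter's data).
val df: DataFrame = spark.range(3).selectExpr("CAST(NULL AS DOUBLE) AS foo")

val ext: DataFrame = df.agg(min("foo"), max("foo"), count(col("foo")).alias("nonNullCount"))
val row: Row = ext.first()

// Under the 1.6.0 behavior described above, getDouble(0) throws when
// the aggregated value is null, so guard with isNullAt and fall back
// to NaN (the value 1.5.2 reportedly returned).
val minFoo: Double = if (row.isNullAt(0)) Double.NaN else row.getDouble(0)
val maxFoo: Double = if (row.isNullAt(1)) Double.NaN else row.getDouble(1)
val nonNullCount: Long = row.getLong(2) // count never returns null
```

For a column that is entirely null, minFoo and maxFoo would come back as Double.NaN and nonNullCount as 0 under this pattern.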