[ https://issues.apache.org/jira/browse/SPARK-47397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Rueckl updated SPARK-47397:
----------------------------------
    Attachment: image-2024-03-14-16-13-03-107.png

> count_distinct ignores null values
> ----------------------------------
>
>                 Key: SPARK-47397
>                 URL: https://issues.apache.org/jira/browse/SPARK-47397
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Martin Rueckl
>            Priority: Critical
>         Attachments: image-2024-03-14-16-12-35-267.png, image-2024-03-14-16-13-03-107.png
>
>
> The documentation states that in group by and count statements, null values
> will not be ignored / form their own groups.
> !image-2024-03-14-16-09-20-045.png|width=441,height=327!
> However, the behavior of count_distinct does not account for nulls.
> Either the documentation or the implementation is wrong here...
> !image-2024-03-14-16-12-35-267.png!
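
For reference, the two behaviors contrasted in the report can be reproduced outside Spark. The sketch below is not Spark code: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that the same standard SQL null semantics apply (GROUP BY collects NULLs into one group, COUNT(DISTINCT ...) skips them). The table `t`, the column `v`, and the helper `null_counts` are hypothetical names made up for this illustration.

```python
import sqlite3


def null_counts(values):
    """Return (groups, distinct, distinct_with_null) for one column of values.

    Hypothetical helper for illustration only; runs on SQLite, not Spark.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v TEXT)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [(v,) for v in values])

    # GROUP BY treats NULL as its own group, so NULL adds one to this count.
    groups = conn.execute(
        "SELECT COUNT(*) FROM (SELECT v FROM t GROUP BY v)"
    ).fetchone()[0]

    # COUNT(DISTINCT v) only counts non-NULL values.
    distinct = conn.execute("SELECT COUNT(DISTINCT v) FROM t").fetchone()[0]

    # One possible workaround: add 1 when at least one NULL is present.
    distinct_with_null = conn.execute(
        "SELECT COUNT(DISTINCT v)"
        " + COALESCE(MAX(CASE WHEN v IS NULL THEN 1 ELSE 0 END), 0)"
        " FROM t"
    ).fetchone()[0]

    conn.close()
    return groups, distinct, distinct_with_null


if __name__ == "__main__":
    # Two distinct non-null values plus NULLs.
    print(null_counts(["a", "b", None, "a", None]))  # (3, 2, 3)
```

If nulls are meant to count as a distinct value, the third expression shows one way to get that number. Whether the documentation or count_distinct itself should change remains the open question in this issue.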