[ https://issues.apache.org/jira/browse/SPARK-47397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956790#comment-17956790 ]
Nicholas Chammas commented on SPARK-47397:
------------------------------------------

I would change this issue type from a bug to an improvement.

[~martinitus] - If you would like to add a brief note to the docs on NULL semantics about {{count_distinct()}}, [the relevant file is here|https://github.com/apache/spark/blob/1b7df31fa9e81319cde30b78ccdc979c762ad662/docs/sql-ref-null-semantics.md].

> count_distinct ignores null values
> ----------------------------------
>
>                 Key: SPARK-47397
>                 URL: https://issues.apache.org/jira/browse/SPARK-47397
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Martin Rueckl
>            Priority: Minor
>         Attachments: image-2024-03-14-16-12-35-267.png, image-2024-03-14-16-13-03-107.png, image-2024-04-02-10-32-44-461.png
>
>
> The documentation states, that in group by and count statements, null values will not be ignored / form their own groups.
> !image-2024-03-14-16-13-03-107.png|width=491,height=373!
> However, the behavior of count_distinct does not account for nulls.
> Either the documentation or the implementation is wrong here...
> !image-2024-03-14-16-12-35-267.png!
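
For readers without access to the attached screenshots, the difference under discussion can be sketched in a few lines. This is only an illustration of the general SQL NULL semantics, run against SQLite from the Python standard library as a stand-in. It does not execute Spark, and the table name {{t}} and column name {{x}} are made up for the example. The assumption is that Spark's {{count_distinct()}} follows the same rule as {{COUNT(DISTINCT x)}} here, which matches the behavior the reporter describes.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for SQL NULL semantics.
# The table "t" and column "x" are hypothetical names for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany(
    "INSERT INTO t (x) VALUES (?)",
    [(1,), (1,), (2,), (None,), (None,)],
)

# COUNT(DISTINCT x) skips NULLs: only the values 1 and 2 are counted.
distinct_count = conn.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()[0]

# GROUP BY puts all NULLs into one group of their own: NULL, 1 and 2.
groups = conn.execute(
    "SELECT x, COUNT(*) FROM t GROUP BY x ORDER BY x"
).fetchall()
group_count = len(groups)

print(distinct_count)  # 2
print(group_count)     # 3
print(groups)
```

The two numbers differ by exactly the NULL group, which is the gap a short note in the NULL semantics docs would need to call out.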