[ 
https://issues.apache.org/jira/browse/SPARK-47397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Rueckl updated SPARK-47397:
----------------------------------
    Attachment: image-2024-03-14-16-13-03-107.png

> count_distinct ignores null values
> ----------------------------------
>
>                 Key: SPARK-47397
>                 URL: https://issues.apache.org/jira/browse/SPARK-47397
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Martin Rueckl
>            Priority: Critical
>         Attachments: image-2024-03-14-16-12-35-267.png, 
> image-2024-03-14-16-13-03-107.png
>
>
> The documentation states that in group by and count statements, null values 
> are not ignored and instead form their own groups.
> !image-2024-03-14-16-09-20-045.png|width=441,height=327!
> However, count_distinct does not count null values. 
> Either the documentation or the implementation is wrong here.
> !image-2024-03-14-16-12-35-267.png!
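
The discrepancy described above can be sketched without the screenshots. The snippet below is only an illustration and does not run Spark: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in, because standard SQL shows the same pattern the report describes, where COUNT(DISTINCT x) skips NULLs while GROUP BY places them in a group of their own. The table name t and column name x are hypothetical.

```python
import sqlite3

# Stand-in for Spark: an in-memory SQLite table with two NULL rows.
# Table "t" and column "x" are hypothetical names used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany(
    "INSERT INTO t (x) VALUES (?)",
    [(1,), (1,), (2,), (None,), (None,)],
)

# COUNT(DISTINCT x) skips NULLs, so only the values 1 and 2 are counted.
distinct_count = conn.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()[0]

# GROUP BY keeps the NULL rows together as one group of their own.
group_rows = conn.execute("SELECT x, COUNT(*) FROM t GROUP BY x").fetchall()

# A NULL-aware distinct count: deduplicate first, then count the rows.
null_aware_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT x FROM t)"
).fetchone()[0]

print("COUNT(DISTINCT x):", distinct_count)
print("GROUP BY groups:", len(group_rows))
print("rows after SELECT DISTINCT:", null_aware_count)
```

If the NULL group needs to be included in a distinct count, counting the rows of a deduplicated subquery, as in the last query, is one way to get it; whether the same rewrite is the right workaround in Spark is an assumption that would need checking against the affected version.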



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

