[jira] [Commented] (SPARK-34165) Add countDistinct option to Dataset#summary

Apache Spark (Jira) Tue, 19 Jan 2021 20:05:06 -0800


    [ https://issues.apache.org/jira/browse/SPARK-34165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268356#comment-17268356 ]


Apache Spark commented on SPARK-34165:
--------------------------------------

User 'MrPowers' has created a pull request for this issue:
https://github.com/apache/spark/pull/31254

> Add countDistinct option to Dataset#summary
> -------------------------------------------
>
>                 Key: SPARK-34165
>                 URL: https://issues.apache.org/jira/browse/SPARK-34165
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Matthew Powers
>            Priority: Minor
>
> The Dataset#summary function supports statistics such as count, mean, min, 
> and max. It is a handy function for lightweight exploratory data analysis.
> Counting the distinct values in each column is a common step in exploratory 
> data analysis. This should be easy to add (piggybacking off the existing 
> countDistinct code), entirely backwards compatible, and helpful to a lot of 
> users.
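
To make the requested statistic concrete, here is a minimal plain-Python sketch of what a per-column distinct count alongside the existing count could report. It is not Spark code and does not reflect how the pull request implements the feature; the function name `summarize_counts`, the `count_distinct` key, and the sample rows are all hypothetical. Skipping nulls is an assumption chosen to mirror the usual SQL COUNT(DISTINCT ...) behavior.

```python
from typing import Any, Dict, List, Optional

# A row is modeled as a plain dict; None stands in for a null value.
Row = Dict[str, Optional[Any]]


def summarize_counts(rows: List[Row], columns: List[str]) -> Dict[str, Dict[str, int]]:
    """Return, per column, the non-null count and the distinct non-null count.

    Hypothetical helper for illustration only; values must be hashable.
    """
    result: Dict[str, Dict[str, int]] = {}
    for col in columns:
        # Assumption: nulls are ignored by both statistics.
        values = [row.get(col) for row in rows if row.get(col) is not None]
        result[col] = {
            "count": len(values),
            "count_distinct": len(set(values)),
        }
    return result


# Hypothetical sample data.
rows = [
    {"name": "alice", "city": "Paris"},
    {"name": "bob", "city": "Paris"},
    {"name": "alice", "city": None},
]

summary = summarize_counts(rows, ["name", "city"])
for col, stats in summary.items():
    print(col, stats)
```

A column whose distinct count is far below its count is likely categorical, while one where the two are equal may be a key. That contrast is the kind of quick signal the issue is asking summary to surface.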



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
