Matthew Powers created SPARK-34165:
--------------------------------------

             Summary: Add countDistinct option to Dataset#summary
                 Key: SPARK-34165
                 URL: https://issues.apache.org/jira/browse/SPARK-34165
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Matthew Powers


The Dataset#summary function supports options like count, mean, min, and max.  
It's a great little function for lightweight exploratory data analysis.

Counting the distinct values in each column is a common step in exploratory
data analysis. This should be easy to add (piggybacking off the existing
countDistinct code) and entirely backwards compatible, and it will help a lot
of users.
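
To make the request concrete, here is a minimal plain-Python sketch of what a distinct-count statistic could compute alongside the existing ones. It is not Spark code and not a proposed implementation. The `summary` helper, the `count_distinct` statistic name, and the sample rows are all hypothetical, and skipping null values is an assumption about the desired behaviour.

```python
from statistics import mean


def summary(rows, *stats):
    """Return {statistic: {column: value}} for a list of dict rows.

    Plain-Python stand-in for the idea behind Dataset#summary.
    None values are skipped (an assumption, not confirmed Spark behaviour).
    """
    # Hypothetical statistic names; "count_distinct" is the requested addition.
    funcs = {
        "count": len,
        "mean": mean,
        "min": min,
        "max": max,
        "count_distinct": lambda values: len(set(values)),
    }
    columns = list(rows[0]) if rows else []
    result = {}
    for stat in stats:
        result[stat] = {}
        for column in columns:
            values = [row[column] for row in rows if row[column] is not None]
            # Counts of an all-null column are 0; other statistics are undefined.
            if values or stat.startswith("count"):
                result[stat][column] = funcs[stat](values)
            else:
                result[stat][column] = None
    return result


# Hypothetical sample data: three rows, one null score.
rows = [
    {"age": 30, "score": 1.5},
    {"age": 30, "score": 2.5},
    {"age": 41, "score": None},
]
report = summary(rows, "count", "count_distinct")
print(report)
```

Because the new statistic would only be computed when a caller asks for it by name, existing calls would keep returning the same rows, which is the sense in which the change is backwards compatible.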



