Matthew Powers created SPARK-34165:
--------------------------------------

             Summary: Add countDistinct option to Dataset#summary
                 Key: SPARK-34165
                 URL: https://issues.apache.org/jira/browse/SPARK-34165
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Matthew Powers

The Dataset#summary function supports options like count, mean, min, and max. It's a great little function for lightweight exploratory data analysis.

Counting the distinct values in each column is a common step in exploratory data analysis, but summary does not currently offer it.

This should be easy to add (piggybacking off the existing countDistinct code), would be entirely backwards compatible, and would help a lot of users.
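
For context, here is a minimal sketch of how a per-column distinct count can be obtained today with the existing `countDistinct` function, next to a plain `summary` call. It is an illustration only, not the proposed implementation: the object name `DistinctCounts`, the helper `distinctCounts`, and the sample data are made up for this example, and the name the new summary option would take is left open.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, countDistinct}

// Hypothetical helper for illustration; not part of Spark.
object DistinctCounts {

  // Returns a one-row DataFrame holding the exact number of distinct
  // non-null values in each column, using the existing countDistinct.
  // Assumes column names contain no dots or other special characters.
  def distinctCounts(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => countDistinct(col(c)).as(c)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-counts-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq((1, "a"), (1, "b"), (2, "b")).toDF("id", "label")

    // What summary offers today: a selected set of statistics per column.
    df.summary("count", "mean", "min", "max").show()

    // The extra step users currently run separately.
    distinctCounts(df).show()

    spark.stop()
  }
}
```

Folding this into summary would let both results come from a single call, which is the convenience the issue asks for.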