Wenchen Fan reassigned SPARK-27581:
-----------------------------------

Assignee: Liang-Chi Hsieh

> DataFrame countDistinct("*") fails with AnalysisException: "Invalid usage of '*' in expression 'count'"
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27581
>                 URL: https://issues.apache.org/jira/browse/SPARK-27581
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Josh Rosen
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
> If I have a DataFrame, I can use {{count("*")}} as an expression, e.g.:
> {code:java}
> import org.apache.spark.sql.functions._
> val df = sql("select id % 100 from range(100000)")
> df.select(count("*")).first()
> {code}
> However, if I try to do the same thing with {{countDistinct}}, I get an error:
> {code:java}
> import org.apache.spark.sql.functions._
> val df = sql("select id % 100 from range(100000)")
> df.select(countDistinct("*")).first()
> org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'count';
> {code}
> As a workaround, I need to use {{expr}}, e.g.:
> {code:java}
> import org.apache.spark.sql.functions._
> val df = sql("select id % 100 from range(100000)")
> df.select(expr("count(distinct(*))")).first()
> {code}
> You might be wondering "why not just use {{df.count()}} or {{df.distinct().count()}}?", but in my case I'd ultimately like to compute both counts as part of the same aggregation, e.g.:
> {code:java}
> val (cnt, distinctCnt) = df.select(count("*"), countDistinct("*")).as[(Long, Long)].first()
> {code}
> I'm reporting this because it's a minor usability annoyance / surprise for inexperienced Spark users.
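A minimal sketch (not from the report itself, just assembled from its snippets) of how the {{expr}} workaround composes with {{count("*")}} for the combined-aggregation use case above, assuming a spark-shell session where {{sql}} and {{spark.implicits._}} are available:

{code:java}
import org.apache.spark.sql.functions._
import spark.implicits._ // encoder for .as[(Long, Long)]

val df = sql("select id % 100 from range(100000)")

// Both counts in a single aggregation pass; expr() sidesteps the
// AnalysisException that countDistinct("*") raises on 2.4.
val (cnt, distinctCnt) = df
  .select(count("*"), expr("count(distinct(*))"))
  .as[(Long, Long)]
  .first()
{code}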
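Another possible workaround, sketched here as an assumption rather than taken from the report: expand the star by hand and pass the DataFrame's columns to {{countDistinct}} explicitly, since {{countDistinct(expr: Column, exprs: Column*)}} accepts concrete columns:

{code:java}
import org.apache.spark.sql.functions._

val df = sql("select id % 100 from range(100000)")

// List every column of df instead of using "*"; for this single-column
// DataFrame the result matches count(distinct(*)).
val cols = df.columns.map(col)
df.select(countDistinct(cols.head, cols.tail: _*)).first()
{code}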