[ https://issues.apache.org/jira/browse/SPARK-41391 ]
Ritika Maheshwari deleted comment on SPARK-41391:
-------------------------------------------

was (Author: ritikam):
Modifying the agg method in RelationalGroupedDataset.scala to

@scala.annotation.varargs
def agg(expr: Column, exprs: Column*): DataFrame = {
  toDF((expr +: exprs).map {
    case typed: TypedColumn[_, _] =>
      typed.withInputType(df.exprEnc, df.logicalPlan.output).expr
    case f: Column =>
      if (f.expr != null &&
          f.expr.isInstanceOf[UnresolvedFunction] &&
          f.expr.asInstanceOf[UnresolvedFunction].arguments(0).isInstanceOf[UnresolvedStar]) {
        f.expr.asInstanceOf[UnresolvedFunction].copy(arguments = df.numericColumns)
      } else {
        f.expr
      }
    case c => c.expr
  })
}

seems to work, though I am not able to get the distinct into the count function's output name:

scala> df.groupBy("id").agg(count_distinct($"*"))
res7: org.apache.spark.sql.DataFrame = [id: bigint, COUNT(id, value): bigint]

Somehow the alias formed for this function is missing the distinct keyword, even though the resolved expression has it:

'COUNT(distinct id#0L, value#2) AS COUNT(id, value)#47

Any ideas why distinct is not showing up? Also, is this a possible solution for the UnresolvedStar? We resolve the UnresolvedStar in RelationalGroupedDataset rather than in the Analyzer.
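One plausible explanation for the missing keyword is that the displayed column name is produced by pretty-printing the expression, and a printer that ignores the distinct flag will render the same alias for the distinct and non-distinct variants. The toy Scala sketch below (hypothetical names, not Spark's actual internals) illustrates the effect:

```scala
// Toy sketch only: Func and the two printers below are made up for
// illustration and do not correspond to Spark's real classes.
object AliasSketch {
  final case class Func(name: String, args: Seq[String], isDistinct: Boolean)

  // "Buggy" printer: never looks at isDistinct, so the DISTINCT keyword
  // is dropped from the generated alias.
  def prettyBuggy(f: Func): String =
    s"${f.name}(${f.args.mkString(", ")})"

  // "Fixed" printer: emits DISTINCT when the flag is set.
  def prettyFixed(f: Func): String = {
    val d = if (f.isDistinct) "DISTINCT " else ""
    s"${f.name}($d${f.args.mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    val f = Func("count", Seq("id", "value"), isDistinct = true)
    println(prettyBuggy(f)) // count(id, value)
    println(prettyFixed(f)) // count(DISTINCT id, value)
  }
}
```

If Spark's alias generation follows a similar path, the fix would live in the pretty-printer rather than in RelationalGroupedDataset itself.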
> The output column name of `groupBy.agg(count_distinct)` is incorrect
> --------------------------------------------------------------------
>
>                 Key: SPARK-41391
>                 URL: https://issues.apache.org/jira/browse/SPARK-41391
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> scala> val df = spark.range(1, 10).withColumn("value", lit(1))
> df: org.apache.spark.sql.DataFrame = [id: bigint, value: int]
>
> scala> df.createOrReplaceTempView("table")
>
> scala> df.groupBy("id").agg(count_distinct($"value"))
> res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint]
>
> scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ")
> res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): bigint]
>
> scala> df.groupBy("id").agg(count_distinct($"*"))
> res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): bigint]
>
> scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ")
> res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, value): bigint]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)