Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22944#discussion_r232556359

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
           df.where($"city".contains(new java.lang.Character('A'))),
           Seq(Row("Amsterdam")))
       }
    +
    +  test("SPARK-25942: typed aggregation on primitive type") {
    +    val ds = Seq(1, 2, 3).toDS()
    +
    +    val agg = ds.groupByKey(_ >= 2)
    +      .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
    --- End diff --

    I think we should not make decisions for users. For untyped APIs, users can refer to the grouping columns in the aggregate expressions, and I think the typed APIs should be the same. For this particular case, Spark currently allows grouping columns inside aggregate functions, so the `value` here is indeed ambiguous. There is nothing we can do but fail and ask users to add an alias. BTW, we should check other databases and see if "grouping columns inside aggregate functions" should be allowed.
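For reference, the result the test expects can be sketched with plain Scala collections (a rough analogue only; it does not model Spark's column resolution or the `value` ambiguity under discussion):

```scala
object GroupByKeySketch {
  def main(args: Array[String]): Unit = {
    // Analogue of ds.groupByKey(_ >= 2).agg(sum("value"), sum($"value" + 1)):
    // group the values by the boolean key, then compute both sums per group.
    val data = Seq(1, 2, 3)
    val grouped: Map[Boolean, (Long, Long)] =
      data.groupBy(_ >= 2).map { case (k, vs) =>
        k -> (vs.map(_.toLong).sum, vs.map(v => v.toLong + 1).sum)
      }
    // false -> (1, 2)  (only the value 1)
    // true  -> (5, 7)  (values 2 and 3)
    println(grouped)
  }
}
```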