Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22944#discussion_r232556359
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
           df.where($"city".contains(new java.lang.Character('A'))),
           Seq(Row("Amsterdam")))
       }
    +
    +  test("SPARK-25942: typed aggregation on primitive type") {
    +    val ds = Seq(1, 2, 3).toDS()
    +
    +    val agg = ds.groupByKey(_ >= 2)
    +      .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
    --- End diff --
    
    I think we should not make decisions for users. In the untyped APIs, users can refer to the grouping columns in aggregate expressions, and I think the typed APIs should behave the same way.
    
    For this particular case, Spark currently allows grouping columns inside aggregate functions, so the `value` here is indeed ambiguous. There is nothing we can do but fail and ask users to add an alias.
    
    BTW, we should check other databases and see whether "grouping columns inside aggregate functions" should be allowed at all.


---
