[GitHub] spark issue #20174: [SPARK-22951][SQL] fix aggregation after dropDuplicates ...

liancheng Wed, 10 Jan 2018 10:33:04 -0800

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/20174
  
    @mgaido91 We can't, because we do not know whether there are any input rows. For example:
    
    ```scala
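    // `$"id"` needs `import spark.implicits._` (already in scope in spark-shell).
    val df1 = spark.range(10).select()                    // ten rows, zero columns
    val df2 = spark.range(10).filter($"id" < 0).select()  // zero rows, zero columns
    val df3 = df1.dropDuplicates()                        // expected: one row
    val df4 = df2.dropDuplicates()                        // expected: zero rows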
    ```
    
    `df1` has ten rows and zero columns, while `df2` has zero rows and zero columns. Therefore, `df3` should return one row containing zero columns, while `df4` should return zero rows.
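
The expected row counts above can be checked without a Spark session by modelling each DataFrame as a plain Scala collection of rows. This is only a sketch of the semantics, not Spark's implementation: the names `ZeroColumnDedup`, `Row`, `tenRows`, `noRows`, and `dedup` are hypothetical, and `distinct` stands in for `dropDuplicates()`.

```scala
// Plain Scala collections standing in for DataFrames; no Spark required.
object ZeroColumnDedup {
  // A row with zero columns is modelled as an empty Vector.
  type Row = Vector[Any]
  val emptyRow: Row = Vector.empty

  // Analogue of df1: ten rows, zero columns.
  val tenRows: Seq[Row] = Seq.fill(10)(emptyRow)
  // Analogue of df2: zero rows, zero columns.
  val noRows: Seq[Row] = Seq.empty

  // Analogue of dropDuplicates(): keep one copy of each distinct row.
  def dedup(rows: Seq[Row]): Seq[Row] = rows.distinct

  def main(args: Array[String]): Unit = {
    println(dedup(tenRows).size) // 1: all zero-column rows are equal
    println(dedup(noRows).size)  // 0: nothing to deduplicate
  }
}
```

All zero-column rows compare equal, so deduplication yields at most one row; whether it yields one or none depends entirely on whether the input had any rows, which is not something the schema alone reveals.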


