[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

maropu Mon, 28 May 2018 08:04:34 -0700

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/21443


    [SPARK-24369][SQL] Correct handling for multiple distinct aggregations 
having the same argument set

    ## What changes were proposed in this pull request?
    This PR fixes an issue that occurs when a query has multiple distinct 
aggregations with the same argument set, e.g.,
    ```
    scala> :paste
    val df = sql(
      s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
         | FROM (VALUES (1, 1), (2, 2), (2, 2)) t(x, y)
       """.stripMargin)
    
    java.lang.RuntimeException
    You hit a query analyzer bug. Please report your query to Spark user 
mailing list.
    ```
    The root cause is that `RewriteDistinctAggregates` fails to detect multiple 
distinct aggregations when they share the same argument set. This PR changes 
the rule so that `RewriteDistinctAggregates` counts the aggregate expressions 
with `isDistinct=true`.
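    
    As a rough illustration of the root cause, here is a toy model of the two 
detection strategies applied to the query above. It is only a sketch: `Agg` and 
`DistinctDetectionSketch` are hypothetical names, not Spark classes, and 
grouping by argument set is an assumption about how the old check behaved.
    ```scala
    // Hypothetical stand-in for an aggregate expression (not a Spark class).
    final case class Agg(name: String, args: Seq[String], isDistinct: Boolean)

    object DistinctDetectionSketch {
      // The three aggregates from the example query.
      val aggs: Seq[Agg] = Seq(
        Agg("corr", Seq("x", "y"), isDistinct = true),
        Agg("corr", Seq("y", "x"), isDistinct = true),
        Agg("count", Seq("*"), isDistinct = false)
      )

      // Assumed old behaviour: group distinct aggregates by their argument set.
      // Set(x, y) equals Set(y, x), so both corr calls fall into one group.
      val groupsByArgSet: Int =
        aggs.filter(_.isDistinct).groupBy(_.args.toSet).size

      // Behaviour described in this PR: count expressions flagged as distinct.
      val distinctCount: Int = aggs.count(_.isDistinct)
    }
    ```
    In this model `groupsByArgSet` is 1 while `distinctCount` is 2, so only the 
count reveals that the query contains more than one distinct aggregation.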
    
    ## How was this patch tested?
    Added tests in `DataFrameAggregateSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-24369

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21443
    
----
commit 00f6ad9547f462fd0cc3377cdd3aee44be19ffaf
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-05-28T14:54:21Z

    Fix

----


