GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/19488

    SPARK-22266 The same aggregate function was evaluated multiple times

    ## What changes were proposed in this pull request?
    
    To ensure that an aggregate function appearing multiple times in an 
Aggregate is evaluated only once, we need to deduplicate the aggregate 
expressions. The original code tried to use a "distinct" call to get a set 
of aggregate expressions, but this did not work, because "distinct" does not 
compare by semantic equality. Even if it did, the result expressions would 
still need to be rewritten to reference the deduplicated aggregates.
    In this PR, I changed the "set" to a map from the semantic identity of an 
aggregate expression to the expression itself. Later on, when rewriting result 
expressions (i.e., output expressions), the aggregate expression references can 
be fixed to point at the single retained instance.
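
The idea described above can be illustrated with a small, self-contained sketch. It is not the Spark code from this patch: `Agg`, `canonical`, `dedup_aggregates`, and `rewrite_results` are hypothetical names, and "semantic identity" is approximated here by sorting the operands of a commutative `+`. It only shows why a structural "distinct" keeps `sum(a + b)` and `sum(b + a)` as two entries, while a map keyed by a canonical form keeps one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Agg:
    """Toy aggregate call: Agg("sum", ("a", "b")) stands for sum(a + b)."""
    func: str
    operands: Tuple[str, ...]  # operands of a commutative '+', in written order

    def canonical(self) -> Key:
        # Assumed stand-in for semantic identity: operand order is ignored.
        return (self.func, tuple(sorted(self.operands)))


def dedup_aggregates(aggs: List[Agg]) -> Dict[Key, Agg]:
    """Map each semantic identity to the first aggregate expression that has it."""
    unique: Dict[Key, Agg] = {}
    for agg in aggs:
        unique.setdefault(agg.canonical(), agg)
    return unique


def rewrite_results(results: List[Agg], unique: Dict[Key, Agg]) -> List[Agg]:
    """Point every result expression at the single retained instance."""
    return [unique[r.canonical()] for r in results]


# sum(a + b), sum(b + a), max(a)
aggs = [Agg("sum", ("a", "b")), Agg("sum", ("b", "a")), Agg("max", ("a",))]

# Structural equality only: the two sums are not merged.
plain_distinct = list(dict.fromkeys(aggs))

# Keyed by canonical form: the two sums share one entry.
unique = dedup_aggregates(aggs)
rewritten = rewrite_results(aggs, unique)
```

In this sketch `plain_distinct` still holds three expressions, whereas `unique` holds two, and both `sum` result expressions in `rewritten` refer to the same object, so the aggregate would only need to be computed once.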
    
    ## How was this patch tested?
    
    Added a new test in SQLQuerySuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-22266

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19488
    
----
commit 32bdf771fe70444ac23adf796702b5a26e085805
Author: maryannxue <maryann....@gmail.com>
Date:   2017-10-13T05:31:10Z

    SPARK-22266 The same aggregate function was evaluated multiple times

----

