GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/19488
SPARK-22266 The same aggregate function was evaluated multiple times

## What changes were proposed in this pull request?

To let the same aggregate function that appears multiple times in an Aggregate be evaluated only once, we need to deduplicate the aggregate expressions. The original code tried to use a "distinct" call to get a set of aggregate expressions, but this did not work, since "distinct" does not compare semantic equality. And even if it did, further work would be needed when rewriting the result expressions.

In this PR, I changed the "set" to a map from the semantic identity of an aggregate expression to the expression itself. Thus, later on, when rewriting the result expressions (i.e., the output expressions), the aggregate expression references can be fixed to point at the deduplicated expression.

## How was this patch tested?

Added a new test in SQLQuerySuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-22266

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19488.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19488

----
commit 32bdf771fe70444ac23adf796702b5a26e085805
Author: maryannxue <maryann....@gmail.com>
Date: 2017-10-13T05:31:10Z

    SPARK-22266 The same aggregate function was evaluated multiple times

----
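To illustrate the idea, here is a toy Python model of the approach described above. It is not Spark's Catalyst code: the classes `Column`, `AggCall`, `Add`, `Literal` and `SlotRef`, the `semantic_key` method and the `occurrence_id` field are all hypothetical stand-ins. `occurrence_id` plays the role of whatever per-instance identity makes two textually identical aggregates compare unequal, so a plain "distinct" keeps both copies. Keying a map by a canonical form collapses them, and the output expressions are then rewritten to reference the single evaluated aggregate.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class AggCall:
    """Toy aggregate expression such as sum(a).

    occurrence_id is hypothetical: each occurrence in the query gets its own id,
    so plain equality treats two sum(a) calls as different expressions.
    """
    func: str
    arg: Column
    occurrence_id: int

    def semantic_key(self):
        # Canonical form used for semantic equality: ignore the per-occurrence id.
        return (self.func, self.arg)


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class SlotRef:
    """Reference to the output of an aggregate that is evaluated only once."""
    slot: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Column, AggCall, Literal, SlotRef, Add]


def collect_aggs(expr) -> List[AggCall]:
    """Return every aggregate call in an expression tree, in left-to-right order."""
    if isinstance(expr, AggCall):
        return [expr]
    if isinstance(expr, Add):
        return collect_aggs(expr.left) + collect_aggs(expr.right)
    return []


def deduplicate(result_exprs):
    """Map each semantic key to one slot, then rewrite outputs to use the slots."""
    slots: Dict[tuple, int] = {}
    to_evaluate: List[AggCall] = []
    for agg in (a for e in result_exprs for a in collect_aggs(e)):
        key = agg.semantic_key()
        if key not in slots:
            slots[key] = len(to_evaluate)
            to_evaluate.append(agg)  # first occurrence is the one evaluated

    def rewrite(expr):
        if isinstance(expr, AggCall):
            return SlotRef(slots[expr.semantic_key()])
        if isinstance(expr, Add):
            return Add(rewrite(expr.left), rewrite(expr.right))
        return expr

    return to_evaluate, [rewrite(e) for e in result_exprs]


# Usage: SELECT sum(a), sum(a) + 1 ... (two occurrences of the same aggregate)
a = Column("a")
outputs = [AggCall("sum", a, 1), Add(AggCall("sum", a, 2), Literal(1))]

naive = list(dict.fromkeys(agg for e in outputs for agg in collect_aggs(e)))
print(len(naive))  # 2: plain equality keeps both copies

aggs, rewritten = deduplicate(outputs)
print(len(aggs))   # 1: sum(a) is evaluated once
print(rewritten)   # [SlotRef(slot=0), Add(left=SlotRef(slot=0), right=Literal(value=1))]
```

The map does two jobs at once: it decides which aggregates are actually evaluated, and it gives the rewrite step a lookup from any occurrence to its shared result. A deduplicated set alone would not provide that second lookup, which is why the result expressions also need rewriting.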