GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/15289
[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates

## What changes were proposed in this pull request?

This patch fixes a minor correctness issue impacting the pushdown of filters beneath aggregates. Specifically, if a filter condition references no grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly pushed beneath an aggregate.

Intuitively, the only case where you can push a filter beneath an aggregate is when that filter is deterministic and is defined over the grouping columns / expressions, since in that case the filter is acting to exclude entire groups from the query (like a `HAVING` clause). The existing code would only push deterministic filters beneath aggregates when all of the filter's references were grouping columns, but this logic missed the case where a filter has no references. For example, `WHERE false` is deterministic but is independent of the actual data.

This patch fixes this minor bug by adding a new check to ensure that we don't push filters beneath aggregates when those filters don't reference any columns.

## How was this patch tested?

New regression test in FilterPushdownSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-17712

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15289.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15289

----
commit 09870fc689bc71895d730021f7a8ba3a90973113
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-09-28T23:29:13Z

    Add regression test for SPARK-17712

commit 87504e431800e0a99f05df437f9ce6543ca468a4
Author: Josh Rosen <joshro...@databricks.com>
Date: 2016-09-28T23:30:14Z

    Minimal fix.
----
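To make the pushdown rule in the description concrete, here is a minimal sketch in Python. It is not Spark's optimizer code: the helper `can_push_below_aggregate` and its parameters are hypothetical names that only mirror the rule as stated in the description. The SQLite queries are an assumption-laden illustration of one case where the rewrite matters, namely a global aggregate (no `GROUP BY`), which returns one row even over empty input.

```python
import sqlite3


def can_push_below_aggregate(references, grouping_columns, deterministic=True):
    """Hypothetical helper (not a Spark API) mirroring the rule in the PR text.

    A filter may move beneath an aggregate only if it is deterministic,
    references at least one column, and references only grouping columns.
    """
    refs = set(references)
    return deterministic and bool(refs) and refs <= set(grouping_columns)


# Illustration with SQLite rather than Spark: table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 20), (2, 30)])

# Filter evaluated above a global aggregate: the single aggregate row is dropped.
filter_above = conn.execute(
    "SELECT c FROM (SELECT count(*) AS c FROM t) WHERE 0"
).fetchall()

# The same filter pushed beneath the aggregate: the aggregate now runs over
# empty input and still produces one row.
filter_below = conn.execute(
    "SELECT count(*) AS c FROM (SELECT * FROM t WHERE 0)"
).fetchall()

print(filter_above)  # []
print(filter_below)  # [(0,)]

print(can_push_below_aggregate(["k"], ["k"]))  # True: filter on a grouping column
print(can_push_below_aggregate([], ["k"]))     # False: no references, e.g. WHERE false
```

The two queries return different results, so the rewrite is not semantics-preserving for a filter with no references. The non-empty-references check in the helper corresponds to the new check the description says this patch adds.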