GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/15289

    [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates

    ## What changes were proposed in this pull request?
    
    This patch fixes a minor correctness issue affecting the pushdown of
    filters beneath aggregates. Specifically, a filter condition that
    references no columns at all (e.g. `WHERE false`) was incorrectly pushed
    beneath an aggregate.
    
    Intuitively, the only case where you can push a filter beneath an
    aggregate is when that filter is deterministic and is defined over the
    grouping columns / expressions, since in that case the filter is acting
    to exclude entire groups from the query (like a `HAVING` clause). The
    existing code would only push deterministic filters beneath aggregates
    when all of the filter's references were grouping columns, but this
    logic missed the case where a filter has no references. For example,
    `WHERE false` is deterministic but is independent of the actual data.
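
One way to see why this pushdown is unsafe is to model a global aggregate (one with no grouping columns), which emits exactly one row even when its input is empty. The sketch below is a toy model in plain Python, not Spark code; `global_count`, `where`, and `always_false` are hypothetical helpers introduced only for illustration.

```python
# Toy model of a global aggregate (no GROUP BY), e.g. SELECT count(*) FROM t.
# Illustrative helpers only; these are not Spark APIs.

def global_count(rows):
    """A global aggregate emits exactly one row, even for empty input."""
    return [{"cnt": len(rows)}]

def where(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

# Models `WHERE false`: deterministic, but references no columns.
always_false = lambda row: False

table = [{"a": 1}, {"a": 2}, {"a": 3}]

# Original plan: the filter sits above the aggregate.
filter_above = where(global_count(table), always_false)

# Invalid rewrite: the filter is pushed beneath the aggregate.
filter_below = global_count(where(table, always_false))

print(filter_above)  # []
print(filter_below)  # [{'cnt': 0}]
```

In this model the two plans disagree: the original returns no rows, while the rewritten plan returns a single row with a count of zero.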
    
    This patch fixes the bug by adding a check that prevents filters from
    being pushed beneath aggregates when they do not reference any columns.
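
The eligibility rule described above can be sketched as a small predicate. This is a minimal illustration in plain Python under assumed names (`can_push_beneath_aggregate`, `references`, `grouping_columns`); it is not the actual optimizer code from the patch.

```python
def can_push_beneath_aggregate(references, grouping_columns, deterministic=True):
    """Decide whether a filter may be moved beneath an aggregate.

    references       -- column names the filter condition mentions (assumed input)
    grouping_columns -- column names the aggregate groups by (assumed input)
    deterministic    -- whether the filter condition is deterministic
    """
    references = set(references)
    return (
        deterministic
        # The added check: a filter with no references is never pushed.
        and len(references) > 0
        # The pre-existing check: every reference must be a grouping column.
        and references <= set(grouping_columns)
    )

# A filter over a grouping column excludes whole groups, so it can be pushed.
print(can_push_beneath_aggregate(["a"], ["a", "b"]))  # True

# `WHERE false` has no references, so it stays above the aggregate.
print(can_push_beneath_aggregate([], ["a", "b"]))     # False
```

Without the non-empty check, the subset test alone would accept an empty reference set, because the empty set is a subset of every set. That is the gap the description above refers to.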
    
    ## How was this patch tested?
    
    New regression test in FilterPushdownSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-17712

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15289
    
----
commit 09870fc689bc71895d730021f7a8ba3a90973113
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-09-28T23:29:13Z

    Add regression test for SPARK-17712

commit 87504e431800e0a99f05df437f9ce6543ca468a4
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-09-28T23:30:14Z

    Minimal fix.

----

