GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/3367

    [SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq 
predicates with nulls for Parquet

    Predicates like `a = NULL` and `a < NULL` can't be pushed down, since the 
Parquet `Lt`, `LtEq`, `Gt`, and `GtEq` filters don't accept null values. Note 
that `Eq` and `NotEq` can only be used with `null` to represent predicates like 
`a IS NULL` and `a IS NOT NULL`.
    
    However, this issue normally doesn't cause an NPE, because any value 
compared to `NULL` results in `NULL`, and Spark SQL automatically optimizes 
out such `NULL` predicates in the `SimplifyFilters` rule. Only testing code 
that intentionally disables the optimizer may trigger this issue. (That's why 
this issue is not marked as a blocker, and I don't think we need to backport 
this fix to branch-1.1.)
    
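
    The three-valued-logic behavior that makes these filters removable can be 
sketched as follows. This is hypothetical illustration code, not Spark's 
actual `SimplifyFilters` rule: any comparison against `NULL` evaluates to 
`NULL` (unknown), and a filter whose condition is `NULL` never selects a row, 
so the whole filter can be dropped.

```scala
// Simplified value model: a column value is either NULL or an Int.
sealed trait Value
case object NullValue extends Value
case class IntValue(i: Int) extends Value

// Three-valued comparison: Some(true/false), or None when either side is NULL.
def lessThan(a: Value, b: Value): Option[Boolean] = (a, b) match {
  case (IntValue(x), IntValue(y)) => Some(x < y)
  case _                          => None // comparing with NULL yields NULL
}

// A filter keeps a row only when its predicate is definitely true,
// so a predicate that always evaluates to NULL filters out every row.
def keepsRow(pred: Option[Boolean]): Boolean = pred.contains(true)
```

    For example, `keepsRow(lessThan(IntValue(1), NullValue))` is `false` for 
every row, which is why a `a < NULL` filter can be optimized away entirely.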
    This PR restricts `Lt`, `LtEq`, `Gt`, and `GtEq` to non-null values only, 
and only uses `Eq` with a null value to push down `IsNull` and `IsNotNull`. 
It also adds support for the Parquet `NotEq` filter, for completeness and a 
(tiny) performance gain; `NotEq` with a null value is likewise used to push 
down `IsNotNull`.
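
    The eligibility rule the PR enforces can be sketched like this. The 
predicate types and `canPushDown` helper below are hypothetical, simplified 
stand-ins, not the actual Catalyst expressions or Parquet filter API: 
comparisons require a non-null literal, while `Eq`/`NotEq` with a null 
literal stand for `IS NULL` / `IS NOT NULL` and remain pushable.

```scala
// Simplified predicate model: value = None represents a null literal.
sealed trait Predicate
case class Eq(col: String, value: Option[Int])    extends Predicate
case class NotEq(col: String, value: Option[Int]) extends Predicate
case class Lt(col: String, value: Option[Int])    extends Predicate
case class Gt(col: String, value: Option[Int])    extends Predicate

// Returns true when the predicate may be pushed down to Parquet.
def canPushDown(p: Predicate): Boolean = p match {
  // A null literal here encodes IS NULL / IS NOT NULL, which Parquet supports.
  case Eq(_, _) | NotEq(_, _) => true
  // Ordering comparisons are only pushable with a non-null literal.
  case Lt(_, v)               => v.isDefined
  case Gt(_, v)               => v.isDefined
}
```

    Under this rule, `Eq("a", None)` (i.e. `a IS NULL`) is pushed down, 
while `Lt("a", None)` (i.e. `a < NULL`) is rejected and left to Spark SQL.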

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark filters-with-null

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3367
    
----
commit de7de288e3e609feaee1d70b4cfbfcca624edec2
Author: Cheng Lian <l...@databricks.com>
Date:   2014-11-19T15:36:30Z

    Adds stricter rules for Parquet filters with null

----

