[ https://issues.apache.org/jira/browse/SPARK-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065210#comment-16065210 ]

Michael Styles commented on SPARK-21218:
----------------------------------------

In Parquet 1.7, there was a bug involving corrupt statistics on binary columns 
(https://issues.apache.org/jira/browse/PARQUET-251). This bug prevented earlier 
versions of Spark from generating Parquet filters on any string columns. Spark 
2.1 has moved up to Parquet 1.8.2, so this issue no longer exists.
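The Python sketch below shows why unreliable statistics matter for filter pushdown. It is a simplified, hypothetical model of min/max-based row-group skipping, not Parquet's or Spark's actual code. The names `RowGroup`, `may_contain`, and `scan_with_pushdown` are illustrative assumptions. When a row group's recorded min/max do not match its data, the pushed-down equality filter can skip that group and silently drop matching rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Hypothetical, simplified stand-in for a Parquet row group (one column)."""
    values: List[str]
    min_stat: str  # minimum value as recorded in the column statistics
    max_stat: str  # maximum value as recorded in the column statistics


def may_contain(group: RowGroup, target: str) -> bool:
    # An equality filter may skip a row group only when the target lies
    # outside the recorded [min, max] range.
    return group.min_stat <= target <= group.max_stat


def scan_with_pushdown(groups: List[RowGroup], target: str) -> List[str]:
    # Read only the row groups whose statistics say they may hold the target,
    # then apply the filter row by row.
    return [v for g in groups if may_contain(g, target) for v in g.values if v == target]


def scan_without_pushdown(groups: List[RowGroup], target: str) -> List[str]:
    # Read every row group and filter row by row; statistics are ignored.
    return [v for g in groups for v in g.values if v == target]


good = RowGroup(values=["apple", "kiwi"], min_stat="apple", max_stat="kiwi")
# Statistics that do not describe the data (e.g. corrupted): they claim the
# group only spans "x".."z", although it actually holds "banana" and "cherry".
corrupt = RowGroup(values=["banana", "cherry"], min_stat="x", max_stat="z")
groups = [good, corrupt]

print(scan_without_pushdown(groups, "banana"))  # ['banana']
print(scan_with_pushdown(groups, "banana"))     # [] (the matching row is lost)
```

This is why, while statistics on binary columns could not be trusted, not pushing filters on string columns down to Parquet was the safe choice.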

> Convert IN predicate to equivalent Parquet filter
> -------------------------------------------------
>
>                 Key: SPARK-21218
>                 URL: https://issues.apache.org/jira/browse/SPARK-21218
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.1.1
>            Reporter: Michael Styles
>         Attachments: IN Predicate.png, OR Predicate.png
>
>
> Convert IN predicate to equivalent expression involving equality conditions 
> to allow the filter to be pushed down to Parquet.
> For instance,
> C1 IN (10, 20) is rewritten as (C1 = 10) OR (C1 = 20)
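The Python sketch below illustrates the proposed rewrite. It uses a tiny hypothetical expression tree (`Eq`, `Or`, `In`) rather than Spark's Catalyst classes and folds the IN list into a chain of ORed equality conditions. It assumes a non-empty IN list and ignores SQL NULL semantics.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict, Sequence, Union


# Hypothetical expression nodes (not Spark's Catalyst API).
@dataclass(frozen=True)
class Eq:
    column: str
    value: Any


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class In:
    column: str
    values: Sequence[Any]


Expr = Union[Eq, Or, In]


def rewrite_in(expr: Expr) -> Expr:
    """Replace every IN predicate with an equivalent OR of equalities."""
    if isinstance(expr, In):
        if not expr.values:
            raise ValueError("empty IN list is not handled in this sketch")
        # Left fold: IN (a, b, c) -> Or(Or(Eq a, Eq b), Eq c)
        return reduce(Or, (Eq(expr.column, v) for v in expr.values))
    if isinstance(expr, Or):
        return Or(rewrite_in(expr.left), rewrite_in(expr.right))
    return expr


def to_sql(expr: Expr) -> str:
    """Render an IN-free expression as SQL-like text."""
    if isinstance(expr, Eq):
        return f"({expr.column} = {expr.value!r})"
    if isinstance(expr, Or):
        return f"{to_sql(expr.left)} OR {to_sql(expr.right)}"
    raise TypeError(f"cannot render {expr!r}")


def evaluate(expr: Expr, row: Dict[str, Any]) -> bool:
    """Evaluate an expression against one row (two-valued logic, no NULLs)."""
    if isinstance(expr, Eq):
        return row[expr.column] == expr.value
    if isinstance(expr, Or):
        return evaluate(expr.left, row) or evaluate(expr.right, row)
    if isinstance(expr, In):
        return row[expr.column] in expr.values
    raise TypeError(f"unknown expression {expr!r}")


pred = In("C1", (10, 20))
print(to_sql(rewrite_in(pred)))  # (C1 = 10) OR (C1 = 20)
```

OR is associative, so the left-nested tree produced by the fold prints without extra parentheses. A balanced tree would be an alternative if very long IN lists made the nesting depth a concern.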



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
