[ 
https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321363#comment-14321363
 ] 

Mick Davies edited comment on PARQUET-36 at 2/14/15 11:06 AM:
--------------------------------------------------------------

I tested the IN predicate by integrating it with Spark SQL and running an IN
query that filtered for 20 strings out of a possible 20K. The test query was a
group by plus this filter. On a file containing tens of millions of rows, query
time dropped from 3.9s to about 3.2s, a reduction of roughly 18%. The baseline
is the previous setup, in which the IN filter was evaluated in Spark.
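
As a rough illustration of why evaluating the IN filter against the dictionary
can help, here is a minimal Java sketch. It is not parquet-mr code: the class
DictionaryInFilter and its methods are hypothetical names, and the dictionary is
assumed to be a plain String array indexed by dictionary id. The point is that
the set lookup happens once per dictionary entry rather than once per row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from parquet-mr.
class DictionaryInFilter {

    // Evaluate the IN predicate once per dictionary entry, not once per row.
    static boolean[] buildMatchTable(String[] dictionary, Set<String> inValues) {
        boolean[] matches = new boolean[dictionary.length];
        for (int id = 0; id < dictionary.length; id++) {
            matches[id] = inValues.contains(dictionary[id]);
        }
        return matches;
    }

    // Per row, the filter is now a single array lookup on the dictionary id.
    static int countMatches(int[] rowIds, boolean[] matches) {
        int count = 0;
        for (int id : rowIds) {
            if (matches[id]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        Set<String> inValues = new HashSet<>(Arrays.asList("banana", "date"));
        int[] rowIds = {0, 1, 1, 3, 2, 1}; // dictionary ids, one per row

        boolean[] matches = buildMatchTable(dictionary, inValues);
        System.out.println(countMatches(rowIds, matches)); // prints 4
    }
}
```

With 20 IN values and a 20K-entry dictionary, the table costs 20K set lookups
up front, after which each of the tens of millions of rows needs only an array
access.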

I am going to look at using the run lengths from RLE to further optimise
dictionary filters. The data I am working with often has long runs of repeated
values in the columns being filtered, so this may be a good optimisation for us.
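
A minimal Java sketch of the run-length idea follows, under stated assumptions.
The Run type and the RunLengthDictionaryFilter class are hypothetical and do not
reflect how parquet-mr exposes its decoders; the sketch also assumes a per-id
match table has already been computed from the dictionary. Each run is tested
once, and its whole length is accepted or skipped in one step.

```java
// Hypothetical illustration only; none of these names come from parquet-mr.
class RunLengthDictionaryFilter {

    // A run of identical dictionary ids, as an RLE decoder might expose it.
    static final class Run {
        final int dictionaryId;
        final int length;

        Run(int dictionaryId, int length) {
            this.dictionaryId = dictionaryId;
            this.length = length;
        }
    }

    // One predicate lookup per run; matching runs contribute their full length.
    static long countMatches(Run[] runs, boolean[] matches) {
        long count = 0;
        for (Run run : runs) {
            if (matches[run.dictionaryId]) {
                count += run.length;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] matches = {false, true, false}; // id 1 satisfies the filter
        Run[] runs = {
            new Run(1, 1000), new Run(0, 500), new Run(1, 250), new Run(2, 4)
        };
        System.out.println(countMatches(runs, matches)); // prints 1250
    }
}
```

Here four lookups decide the fate of 1754 rows. Parquet encodes dictionary ids
with an RLE/bit-packing hybrid, so the saving would apply only to the portions
of a page that are actually stored as RLE runs.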



> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
>                 Key: PARQUET-36
>                 URL: https://issues.apache.org/jira/browse/PARQUET-36
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>              Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then 
> FilteringPrimitiveConverter should too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
