[ https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321363#comment-14321363 ]
Mick Davies edited comment on PARQUET-36 at 2/14/15 11:06 AM:
--------------------------------------------------------------
I tested the IN predicate by integrating it with Spark SQL and running an IN
query that filtered for 20 strings out of a possible 20K. My test query was a
group by plus this filter. On a file containing tens of millions of rows, query
time dropped from 3.9s to about 3.2s, an improvement of a little under 20%.
The baseline is the previous setup, where IN filter evaluation was done in
Spark.
I am going to take a look at using run lengths from RLE to further optimise
dictionary filters. The data I am working with often has long runs of repeated
values in the columns being filtered, so this may be a good optimisation for us.
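To make the run-length idea concrete, here is a minimal sketch of how it could
work. It is not code from parquet-mr: the class RunLengthDictionaryFilter, the
Run type, and both method names are hypothetical, and the sketch assumes the
column has already been decoded into (dictionary id, run length) pairs. The IN
predicate is evaluated once per dictionary entry, and each run is then accepted
or skipped with a single table lookup instead of one check per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from parquet-mr. */
final class RunLengthDictionaryFilter {

    /** One RLE run: the same dictionary id repeated `length` times. */
    static final class Run {
        final int dictionaryId;
        final int length;

        Run(int dictionaryId, int length) {
            this.dictionaryId = dictionaryId;
            this.length = length;
        }
    }

    /** Evaluate the IN predicate once per dictionary entry, not once per row. */
    static boolean[] matchTable(String[] dictionary, Set<String> inValues) {
        boolean[] matches = new boolean[dictionary.length];
        for (int id = 0; id < dictionary.length; id++) {
            matches[id] = inValues.contains(dictionary[id]);
        }
        return matches;
    }

    /** Return {startRow, endRowExclusive} ranges to keep, one lookup per run. */
    static List<long[]> keptRanges(List<Run> runs, boolean[] matches) {
        List<long[]> kept = new ArrayList<>();
        long row = 0;
        for (Run run : runs) {
            if (matches[run.dictionaryId]) {
                kept.add(new long[] {row, row + run.length});
            }
            // Advance past the whole run, whether it was kept or skipped.
            row += run.length;
        }
        return kept;
    }
}
```

With 20 IN values and a 20K-entry dictionary, the match table costs 20K set
lookups up front, after which the per-run cost does not depend on run length.
Values that are bit-packed rather than run-length encoded would still need a
lookup per value, which this sketch does not cover.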
was (Author: michael davies):
I tested the IN predicate by integrating with Spark SQL and running an IN query
which filtered for 20 strings out of a possible 20K. For my test query which
was a group by with an In filter I found that for a file containing many 10s of
millions of rows query time was down from 3.9s to about 3.2s so around 20%.
This is compared with previous time where IN filter evaluation was done in
Spark.
I am going to take a look at using run lengths from RLE to further optimise
dictionary filters. The data I am working with often has long runs of columns
that are being filtered and so this may be a good optimisation for us.
> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
> Key: PARQUET-36
> URL: https://issues.apache.org/jira/browse/PARQUET-36
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Alex Levenson
> Priority: Minor
> Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then
> FilteringPrimitiveConverter should too.