[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486884#comment-16486884
 ] 

Jim Apple commented on PARQUET-41:
----------------------------------

In response to [~junjie]'s question above, "Sure, it is feasible, then we are 
comparing bloom filter vs disadvantaged dictionary filter. The result may be 
not as useful as comparing with plain encoding, right?" I'm not sure I 
understand. The suggestion I was making would not, in my opinion, disadvantage 
the dictionary filter. It would perhaps allow it to compete on a more even 
playing field with the Bloom filter.

Before we jump into the nitty-gritty of the code reviews, I would like us to 
be clear about what advantage we expect from Bloom filters over dictionary 
filtering.



> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>            Priority: Major
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
