[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486884#comment-16486884 ]
Jim Apple commented on PARQUET-41:
----------------------------------

In response to [~junjie]'s question above, "Sure, it is feasible, then we are comparing bloom filter vs disadvantaged dictionary filter. The result may be not as useful as comparing with plain encoding, right?"

I'm not sure I understand. The suggestion I was making would not, in my opinion, disadvantage the dictionary filter. It would perhaps allow it to compete on a more even playing field with the Bloom filter.

Before we jump into the nitty-gritty of the code reviews, I would like for us to be clear about what advantage we expect from this over using dictionary filtering.

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>            Priority: Major
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)