[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005755#comment-16005755 ]
Ferdinand Xu commented on PARQUET-41:
-------------------------------------

It's very useful when filtering on a non-partitioning column. With this patch, we obtained the following speedup in a customer's environment. For the Bloom filter in Impala, initial test results show that queries are about 2X faster when looking up an existent item and about 15X faster when looking up a nonexistent item.

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)