[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019802#comment-16019802
 ] 

Ryan Blue commented on PARQUET-41:
----------------------------------

[~Ferd], it shouldn't matter whether the bloom filter is stored at the column
or page level. The size of the filter depends on the number of unique values
and the false-positive probability (FPP), so its size relative to a page or a
column chunk is the same percentage as long as the average encoded size is
consistent between the two.
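
To make the sizing argument concrete, here is a rough sketch using the textbook formula for an optimally sized classic bloom filter, m = -n * ln(p) / (ln 2)^2 bits for n distinct values at false-positive probability p. The function names, the 8-byte average encoded size, and the assumption that every value is distinct are illustrative only; none of this is taken from the Parquet code or the pull request.

```python
import math


def bloom_filter_bits(distinct_values: int, fpp: float) -> int:
    """Bits needed by an optimally sized classic bloom filter.

    Uses the standard formula m = -n * ln(p) / (ln 2)^2.
    """
    if distinct_values <= 0:
        return 0
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return math.ceil(-distinct_values * math.log(fpp) / (math.log(2) ** 2))


def filter_overhead(num_values: int, distinct_values: int, fpp: float,
                    avg_encoded_bytes: float) -> float:
    """Filter size as a fraction of the encoded data it covers (hypothetical helper)."""
    filter_bytes = bloom_filter_bits(distinct_values, fpp) / 8
    data_bytes = num_values * avg_encoded_bytes
    return filter_bytes / data_bytes


# Assumed workload: all values distinct, 8 bytes per encoded value, 1% FPP.
page_ratio = filter_overhead(1_000, 1_000, 0.01, 8.0)       # one page
chunk_ratio = filter_overhead(100_000, 100_000, 0.01, 8.0)  # one column chunk

print(f"page overhead:  {page_ratio:.4%}")
print(f"chunk overhead: {chunk_ratio:.4%}")
```

Under these assumptions the two ratios agree up to rounding, because the filter needs a fixed number of bits per distinct value. They would diverge if the fraction of distinct values, or the average encoded size, differed between a page and the whole column chunk.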

It sounds like there's a good use case for bloom filters based on your telecom 
example / case study. What's the status of the PARQUET-319 proposal?

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
