[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324126#comment-17324126
 ] 

ASF GitHub Bot commented on PARQUET-41:
---------------------------------------

shannonwells edited a comment on pull request #757:
URL: https://github.com/apache/parquet-mr/pull/757#issuecomment-821722152


   @chenjunjiedada I'm interested in how you arrived at the formula for the 
optimal number of bits. Could you please elaborate? After reading the 
referenced paper 
(http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf), I'm 
unclear which equation from that paper you used, or whether you used a 
different one. We're attempting to implement this algorithm in a different 
language. Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Junjie Chen
>            Priority: Major
>              Labels: filter2, pull-request-available
>             Fix For: format-2.7.0, 1.12.0
>
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
