[ 
https://issues.apache.org/jira/browse/HUDI-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960255#comment-16960255
 ] 

sivabalan narayanan edited comment on HUDI-106 at 10/26/19 3:55 AM:
--------------------------------------------------------------------

Sure, Vinoth. I have yet to get my hands dirty with the code base, but I have
a naive question.

Based on my reading of the concepts, during compaction we know the total
number of entries for a given file group. In that case, why can't we create a
regular bloom filter with the right size rather than using a hard-coded (or
config-based) value? I am wondering whether DynamicBF is really necessary
here. Or is this mainly catered towards the delta logs and not parquet?
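
To make the question concrete, here is a minimal sketch of the textbook
sizing formulas for a regular bloom filter when the entry count is known up
front. The function names are hypothetical and this is not taken from the
Hudi code base; it only illustrates that a known entry count n and a target
false positive ratio p determine the bit count and hash count, and what
happens when more entries arrive than the filter was sized for.

```python
import math


def optimal_bloom_params(num_entries: int, fpp: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) for a standard bloom filter.

    Textbook formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    Hypothetical helper for illustration, not a Hudi API.
    """
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    num_bits = math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / num_entries * math.log(2)))
    return num_bits, num_hashes


def expected_fpp(num_bits: int, num_hashes: int, num_entries: int) -> float:
    """Approximate false positive ratio: (1 - e^(-k * n / m))^k."""
    return (1.0 - math.exp(-num_hashes * num_entries / num_bits)) ** num_hashes


# Size a filter for one million entries at a 1% target ratio.
bits, hashes = optimal_bloom_params(1_000_000, 0.01)
# Ratio at the planned entry count, and at ten times that count.
at_capacity = expected_fpp(bits, hashes, 1_000_000)
overfilled = expected_fpp(bits, hashes, 10_000_000)
```

With these formulas, a filter sized for the exact entry count stays close to
the target ratio, while the same filter holding ten times as many entries
degrades to the point of being nearly useless. That is the cost of a
hard-coded size that turns out to be too small.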



> Dynamically tune bloom filter entries
> -------------------------------------
>
>                 Key: HUDI-106
>                 URL: https://issues.apache.org/jira/browse/HUDI-106
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: realtime-data-lakes
>             Fix For: 0.5.1
>
>
> Tuning bloom filters is currently based on a configuration, which can be 
> cumbersome to tune per dataset to obtain good indexing performance. Let's add 
> support for Dynamic Bloom Filters, which can automatically achieve a 
> configured false positive ratio depending on the number of entries. 
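
As an illustration of the idea in the description, the following is a
minimal sketch of one common dynamic bloom filter design: a list of
fixed-capacity filters, where a new filter is appended once the current one
has received its planned number of keys. The class names, the SHA-256 based
double hashing, and the growth policy are all assumptions made for this
example; this is not the Hudi implementation.

```python
import hashlib
import math


class SimpleBloomFilter:
    """Fixed-size bloom filter sized for `capacity` keys at ratio `fpp`."""

    def __init__(self, capacity: int, fpp: float):
        self.capacity = capacity
        self.num_bits = math.ceil(-capacity * math.log(fpp) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0  # number of add() calls, not distinct keys

    def _positions(self, key: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class DynamicBloomFilter:
    """Grows by appending a fresh filter whenever the newest one is full."""

    def __init__(self, capacity_per_filter: int, fpp: float):
        self.capacity_per_filter = capacity_per_filter
        self.fpp = fpp
        self.filters = [SimpleBloomFilter(capacity_per_filter, fpp)]

    def add(self, key: str) -> None:
        if self.filters[-1].count >= self.capacity_per_filter:
            self.filters.append(SimpleBloomFilter(self.capacity_per_filter, self.fpp))
        self.filters[-1].add(key)

    def might_contain(self, key: str) -> bool:
        # A key may live in any of the filters, so all of them are checked.
        return any(f.might_contain(key) for f in self.filters)


dbf = DynamicBloomFilter(capacity_per_filter=100, fpp=0.01)
keys = [f"record-{i}" for i in range(1000)]
for k in keys:
    dbf.add(k)
```

In this sketch no single filter is ever overfilled, so each one stays near
its configured ratio. The trade-off is that a lookup has to probe every
filter, and the combined false positive ratio rises as more filters are
added, so the growth policy would still need tuning in practice.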


