[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033697#comment-16033697
 ] 

ASF GitHub Bot commented on APEXMALHAR-2366:
--------------------------------------------

GitHub user PramodSSImmaneni opened a pull request:

    https://github.com/apache/apex-malhar/pull/631

    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

    @bhupeshchawda please take a look; this finishes up the work started in
    https://github.com/apache/apex-malhar/pull/521

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PramodSSImmaneni/apex-malhar APEXMALHAR-2366

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #631
    
----
commit 31c15fe839569610edb9c89af1d3371401114956
Author: brightchen <bri...@datatorrent.com>
Date:   2016-12-05T19:34:48Z

    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

commit 58fec176277eca6c6b872552d411ce3b852fc70d
Author: Pramod Immaneni <pra...@datatorrent.com>
Date:   2017-05-25T21:39:23Z

    Merge branch 'APEXMALHAR-2366' of github.com:brightchen/apex-malhar into APEXMALHAR-2366

commit 7ec135b45d2eb99497944bd6608d3b29e89ade9b
Author: Pramod Immaneni <pra...@datatorrent.com>
Date:   2017-06-01T21:02:26Z

    Added license references

----


> Apply BloomFilter to Bucket
> ---------------------------
>
>                 Key: APEXMALHAR-2366
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket's get() first checks the cache and, if the entry is not in the
> cache, checks the stored files. Checking the files is a heavy operation
> because of the file seeks involved.
> The chance of having to check the files is very high when the key range is
> large.
> I suggest applying a BloomFilter to each bucket to reduce the chance of
> reading from files.
> If the buckets are managed by ManagedStateImpl, the number of entries in a
> bucket can become very large and the BloomFilter may stop being useful after
> a while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each
> bucket keeps a certain number of entries and the BloomFilter would be very
> useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our
> case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink
> while Guava 14 uses PrimitiveSink).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
