[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033697#comment-16033697 ]
ASF GitHub Bot commented on APEXMALHAR-2366:
--------------------------------------------

GitHub user PramodSSImmaneni opened a pull request:

    https://github.com/apache/apex-malhar/pull/631

    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

    @bhupeshchawda please see, this is to finish up the work started in https://github.com/apache/apex-malhar/pull/521

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PramodSSImmaneni/apex-malhar APEXMALHAR-2366

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #631

----
commit 31c15fe839569610edb9c89af1d3371401114956
Author: brightchen <bri...@datatorrent.com>
Date:   2016-12-05T19:34:48Z

    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

commit 58fec176277eca6c6b872552d411ce3b852fc70d
Author: Pramod Immaneni <pra...@datatorrent.com>
Date:   2017-05-25T21:39:23Z

    Merge branch 'APEXMALHAR-2366' of github.com:brightchen/apex-malhar into APEXMALHAR-2366

commit 7ec135b45d2eb99497944bd6608d3b29e89ade9b
Author: Pramod Immaneni <pra...@datatorrent.com>
Date:   2017-06-01T21:02:26Z

    Added license references

----

> Apply BloomFilter to Bucket
> ---------------------------
>
>                 Key: APEXMALHAR-2366
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the
> cache, checks the stored files. Checking the files is a heavy operation
> because of the file seeks involved.
> The chance of having to check the files is very high if the key range is
> large.
> I suggest applying a BloomFilter to the bucket to reduce the chance of
> reading from files.
> If the buckets are managed by ManagedStateImpl, the number of entries in a
> bucket can become very large, and the BloomFilter may stop being useful after
> a while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each
> bucket keeps a certain number of entries and the BloomFilter would be very
> useful.
> For the implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our
> case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink while
> Guava 14 uses PrimitiveSink).
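To make the idea in the issue description concrete, here is a minimal sketch of a bucket whose get() consults a Bloom filter before falling back to the stored files. It is not the code from the pull request: SimpleBloomFilter, BucketSketch, flushToFiles, and fileReads are hypothetical names, a HashMap stands in for the on-disk files, and the filter is a hand-rolled BitSet with double hashing rather than Guava's BloomFilter or the internal one added by the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal Bloom filter over byte[] keys (illustration only). */
class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SimpleBloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** Sets numHashes bit positions derived from two base hashes (double hashing). */
  void put(byte[] key) {
    int h1 = Arrays.hashCode(key);
    int h2 = fnv1a(key);
    for (int i = 0; i < numHashes; i++) {
      bits.set(Math.floorMod(h1 + i * h2, numBits));
    }
  }

  /** Returns false only if the key was definitely never added. */
  boolean mightContain(byte[] key) {
    int h1 = Arrays.hashCode(key);
    int h2 = fnv1a(key);
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
        return false;
      }
    }
    return true;
  }

  /** 32-bit FNV-1a, forced odd so the probe step is never zero. */
  private static int fnv1a(byte[] key) {
    int h = 0x811c9dc5;
    for (byte b : key) {
      h ^= (b & 0xff);
      h *= 0x01000193;
    }
    return h | 1;
  }
}

/** Hypothetical bucket: in-memory cache first, then "files", guarded by the filter. */
class BucketSketch {
  private final Map<String, String> cache = new HashMap<>();
  // Stand-in for the stored bucket files; a real lookup here would need a file seek.
  private final Map<String, String> files = new HashMap<>();
  private final SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
  int fileReads = 0; // counts how often get() had to go to the "files"

  void put(String key, String value) {
    cache.put(key, value);
    filter.put(key.getBytes(StandardCharsets.UTF_8));
  }

  /** Moves all cached entries to the simulated file store. */
  void flushToFiles() {
    files.putAll(cache);
    cache.clear();
  }

  String get(String key) {
    String value = cache.get(key);
    if (value != null) {
      return value;
    }
    if (!filter.mightContain(key.getBytes(StandardCharsets.UTF_8))) {
      return null; // definitely absent: skip the expensive file lookup
    }
    fileReads++;
    return files.get(key); // may still be null on a false positive
  }
}
```

A lookup for a key that was never written usually returns without touching the file store, which is the saving the issue is after. The filter in this sketch has a fixed size, so its false-positive rate rises as entries accumulate; that matches the description's concern about buckets whose entry count keeps growing.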