[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040534#comment-16040534 ]
ASF GitHub Bot commented on APEXMALHAR-2366:
--------------------------------------------
GitHub user PramodSSImmaneni reopened a pull request:
https://github.com/apache/apex-malhar/pull/631
APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal
BloomFilter
@bhupeshchawda please take a look; this finishes up the work started in
https://github.com/apache/apex-malhar/pull/521
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/PramodSSImmaneni/apex-malhar APEXMALHAR-2366
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/apex-malhar/pull/631.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #631
----
commit 3c3a01777329252aaa46a39e52ab9a190dbfb74f
Author: brightchen <[email protected]>
Date: 2016-12-05T19:34:48Z
APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal
BloomFilter
commit e08ccd091ff23eac38ecf7997230262c772cbdc1
Author: Pramod Immaneni <[email protected]>
Date: 2017-06-01T21:02:26Z
Added license references, this closes #521
----
> Apply BloomFilter to Bucket
> ---------------------------
>
> Key: APEXMALHAR-2366
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
> Project: Apache Apex Malhar
> Issue Type: Improvement
> Reporter: bright chen
> Assignee: bright chen
> Original Estimate: 192h
> Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the
> cache, falls back to the stored files. Checking the files is a heavy
> operation because of the file seeks involved.
> The chance of having to check the files is very high if the key range is
> large.
> I suggest applying a BloomFilter to each bucket to reduce the chance of
> reading from files.
> If the buckets are managed by ManagedStateImpl, the number of entries per
> bucket can become very large, and the BloomFilter may not be useful after a
> while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each
> bucket keeps a certain number of entries and the BloomFilter would be very
> useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our
> case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink
> while Guava 14 uses PrimitiveSink).
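
The lookup path described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: the class names BloomGuardedBucket and SimpleBloomFilter, the in-memory "files" map standing in for on-disk bucket files, and the sizing parameters (1000 entries, 1% false-positive rate) are all assumptions made for the example. It avoids Guava entirely, in the spirit of the "internal BloomFilter" mentioned in the PR title.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bucket that consults a Bloom filter before touching "files".
class BloomGuardedBucket {

  /** Minimal Bloom filter: k bit positions derived from two seeded hashes. */
  static final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedInsertions, double falsePositiveRate) {
      // Standard sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
      double ln2 = Math.log(2);
      this.numBits = Math.max(64,
          (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2)));
      this.numHashes = Math.max(1,
          (int) Math.round((double) numBits / expectedInsertions * ln2));
      this.bits = new BitSet(numBits);
    }

    void put(byte[] key) {
      int h1 = hash(key, 0x9747b28c);
      int h2 = hash(key, 0x85ebca6b);
      for (int i = 0; i < numHashes; i++) {
        bits.set(Math.floorMod(h1 + i * h2, numBits));
      }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(byte[] key) {
      int h1 = hash(key, 0x9747b28c);
      int h2 = hash(key, 0x85ebca6b);
      for (int i = 0; i < numHashes; i++) {
        if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
          return false;
        }
      }
      return true;
    }

    // Seeded FNV-1a style hash; good enough for a sketch.
    private static int hash(byte[] data, int seed) {
      int h = 0x811c9dc5 ^ seed;
      for (byte b : data) {
        h ^= (b & 0xff);
        h *= 0x01000193;
      }
      return h;
    }
  }

  private final Map<String, String> cache = new HashMap<>();
  // Stand-in for the stored bucket files; a real read would involve a file seek.
  private final Map<String, String> files = new HashMap<>();
  private final SimpleBloomFilter filter = new SimpleBloomFilter(1000, 0.01);
  int fileReads = 0;

  void put(String key, String value) {
    cache.put(key, value);
    filter.put(key.getBytes(StandardCharsets.UTF_8));
  }

  /** Simulates moving cached entries out to files. */
  void flushToFiles() {
    files.putAll(cache);
    cache.clear();
  }

  String get(String key) {
    String value = cache.get(key);
    if (value != null) {
      return value;
    }
    if (!filter.mightContain(key.getBytes(StandardCharsets.UTF_8))) {
      return null; // definitely never written: skip the expensive file read
    }
    fileReads++; // possible hit (or a false positive): pay for the file read
    return files.get(key);
  }

  public static void main(String[] args) {
    BloomGuardedBucket bucket = new BloomGuardedBucket();
    bucket.put("k1", "v1");
    bucket.flushToFiles();
    System.out.println(bucket.get("k1"));
    System.out.println(bucket.get("never-written"));
    System.out.println("file reads: " + bucket.fileReads);
  }
}
```

A Bloom filter never reports a stored key as absent, so the guard cannot hide an existing entry; it only saves reads for keys that were never written, at the cost of occasional false positives. The filter here is sized for a fixed expected entry count, which is why the description above expects it to suit buckets with a bounded number of entries better than buckets that keep growing.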
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)