bright chen created APEXMALHAR-2366:
---------------------------------------
Summary: Apply BloomFilter to Bucket
Key: APEXMALHAR-2366
URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
Project: Apache Apex Malhar
Issue Type: Improvement
Reporter: bright chen
Assignee: bright chen
The bucket get() checks the cache first and, if the entry is not in the cache,
then checks the stored files. Checking the files is a pretty heavy operation
because of the file seeks.
The chance of falling through to the files is very high if the key range is
large.
I suggest applying a BloomFilter to each bucket to reduce the chance of reading
from files.
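To illustrate the idea, here is a minimal sketch of a get() guarded by a Bloom
filter. Everything in it is hypothetical and not Malhar code: the class
BloomGuardedBucket, the hand-rolled SimpleBloomFilter (used only so the example
has no Guava dependency), the in-memory map standing in for the stored files,
and the filter size and hash count are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not part of Malhar.
class BloomGuardedBucket {

  // Minimal Bloom filter: k bit positions derived from two hashes (double hashing).
  static final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
      this.bits = new BitSet(numBits);
      this.numBits = numBits;
      this.numHashes = numHashes;
    }

    void put(byte[] key) {
      int h1 = hash(key, 0x9747b28c);
      int h2 = hash(key, 0x85ebca6b);
      for (int i = 0; i < numHashes; i++) {
        bits.set(index(h1, h2, i));
      }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(byte[] key) {
      int h1 = hash(key, 0x9747b28c);
      int h2 = hash(key, 0x85ebca6b);
      for (int i = 0; i < numHashes; i++) {
        if (!bits.get(index(h1, h2, i))) {
          return false;
        }
      }
      return true;
    }

    private int index(int h1, int h2, int i) {
      return Math.floorMod(h1 + i * h2, numBits);
    }

    // Seeded FNV-1a style hash; good enough for a sketch.
    private static int hash(byte[] data, int seed) {
      int h = 0x811c9dc5 ^ seed;
      for (byte b : data) {
        h ^= (b & 0xff);
        h *= 0x01000193;
      }
      return h;
    }
  }

  private final Map<String, String> cache = new HashMap<>();
  // Stand-in for the stored files; a real bucket would seek on disk here.
  private final Map<String, String> files = new HashMap<>();
  // Sizes are arbitrary assumptions for the sketch.
  private final SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 4);
  int fileReads = 0;

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  void put(String key, String value) {
    cache.put(key, value);
    filter.put(bytes(key)); // remember every key this bucket has ever held
  }

  // Simulates spilling the cache to files.
  void flushToFile() {
    files.putAll(cache);
    cache.clear();
  }

  String get(String key) {
    String value = cache.get(key);
    if (value != null) {
      return value;
    }
    if (!filter.mightContain(bytes(key))) {
      return null; // definitely not in this bucket: skip the expensive file read
    }
    fileReads++; // only now pay for the (simulated) file seek
    return files.get(key);
  }

  public static void main(String[] args) {
    BloomGuardedBucket bucket = new BloomGuardedBucket();
    bucket.put("k1", "v1");
    bucket.flushToFile();
    System.out.println(bucket.get("k1"));     // served from the simulated file
    System.out.println(bucket.get("absent")); // usually rejected by the filter
    System.out.println("file reads: " + bucket.fileReads);
  }
}
```

The filter never gives a false negative, so a key that was put is always
looked up in the files; a false positive only costs one unnecessary file read.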
If the buckets are managed by ManagedStateImpl, the number of entries in a
bucket can become very large and the BloomFilter may stop being useful after a
while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each
bucket keeps only a certain number of entries and the BloomFilter would be very
useful.
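The reason a fixed-size filter degrades as a bucket grows can be seen from the
standard approximation of the false-positive rate. The symbols below are
assumptions introduced for illustration: m is the number of bits in the filter,
k the number of hash functions, and n the number of entries inserted.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
% Example with k = 7:
%   m = 10n  =>  p \approx (1 - e^{-0.7})^{7} \approx 0.008
%   m = n    =>  p \approx (1 - e^{-7})^{7}   \approx 0.99
```

With m fixed, p approaches 1 as n grows, so nearly every lookup falls through
to the files again. When the number of entries per bucket stays limited, the
filter can be sized once and p stays low.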
For implementation:
Guava already has a BloomFilter, and its interface is pretty simple and fits
our case. But Guava 11 is not compatible with Guava 14 here (Guava 11 uses Sink
while Guava 14 uses PrimitiveSink).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)