bright chen created APEXMALHAR-2366:
---------------------------------------

             Summary: Apply BloomFilter to Bucket
                 Key: APEXMALHAR-2366
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
             Project: Apache Apex Malhar
          Issue Type: Improvement
            Reporter: bright chen
            Assignee: bright chen


The bucket's get() checks the cache first and, if the entry is not in the 
cache, looks it up in the stored files. The file lookup is a fairly heavy 
operation because it requires file seeks.

The chance of having to check the files is very high if the key range is large.

I suggest applying a BloomFilter to each bucket to reduce the chance of 
reading from files.
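
A minimal sketch of the idea, for illustration only. FilteredBucket, its 
fields, and the in-memory "files" map are hypothetical stand-ins and are not 
the Malhar Bucket API. The filter is a small hand-rolled one built on 
java.util.BitSet rather than Guava's BloomFilter, so that the example is 
self-contained:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bucket: a cache, a slow backing store, and a Bloom filter
// that lets get() skip the slow store for keys that were never written.
class FilteredBucket {
    private final Map<String, String> cache = new HashMap<>();
    // Stands in for the stored files; a real lookup would need file seeks.
    private final Map<String, String> files = new HashMap<>();
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    int fileReads = 0; // counts how often the slow path is taken

    FilteredBucket(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    void put(String key, String value) {
        cache.put(key, value);
        // Record the key in the filter by setting numHashes bit positions.
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    // Simulates moving cached entries out to the stored files.
    void flush() {
        files.putAll(cache);
        cache.clear();
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        if (!mightContain(key)) {
            return null; // definitely never written: no file read needed
        }
        fileReads++; // possibly present (or a false positive): read the file
        return files.get(key);
    }

    private boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }
}
```

A Bloom filter never gives a false negative, so a key that was put() is always 
found. A key that was never written usually returns without touching the 
files, and only an occasional false positive pays for the file read.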

If the buckets are managed by ManagedStateImpl, the number of entries in a 
bucket can become very large, and the BloomFilter may stop being useful after 
a while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each 
bucket keeps a certain number of entries, and the BloomFilter would be very 
useful.
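
To make the point about growing buckets concrete, the sketch below evaluates 
the standard approximation for a Bloom filter's false-positive rate, 
(1 - e^(-k*n/m))^k, for n entries, m bits, and k hash functions. The class 
name and the sizes used in the note afterwards are illustrative assumptions, 
not values taken from Malhar:

```java
// Hypothetical helper: estimates how a fixed-size Bloom filter degrades
// as more entries are added to the same bucket.
class BloomMath {
    // Standard approximation: p = (1 - e^(-k*n/m))^k
    static double falsePositiveRate(long entries, long bits, int hashes) {
        double exponent = -((double) hashes * entries) / bits;
        return Math.pow(1.0 - Math.exp(exponent), hashes);
    }
}
```

For example, with a filter of 1 << 20 bits and 5 hash functions, the estimate 
stays below 1% at 100,000 entries but is close to 100% at 10,000,000 entries, 
at which point nearly every lookup falls through to the files again.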

For implementation:
Guava already has a BloomFilter, and its interface is simple and fits our 
case. But Guava 11 is not compatible with Guava 14 here (Guava 11 uses Sink 
while Guava 14 uses PrimitiveSink).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
