Benedict created CASSANDRA-6709:
-----------------------------------

             Summary: Changes to KeyCache
                 Key: CASSANDRA-6709
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6709
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benedict
            Priority: Minor


It seems to me that KeyCache can be improved in a number of ways, but first 
let's state the basic goal of KeyCache: to reduce the average query response 
time by providing an exact seek position in a file for a given key.

As it stands, KeyCache is both 100% accurate, but requires a lot of overhead 
per entry.

I propose to make KeyCache *mostly* accurate (say 99.9999%), by which I means 
it will always fail accurately, but may rarely return an incorrect address, and 
code the end users of it to be able to retry to confirm they seeked to the 
correct position in the file, and to retry without the cache if they did not.

The advantage of this is that we can both take the cache off-heap easily, and 
pack a lot more items into the cache. If we permit collisions across files and 
simply use the (full 128-bit) murmur hash of the key + cfId + file generation, 
we should get enough uniqueness to rarely have an erroneuous collision, however 
we will be using only 20 bytes per key, instead of the current 100 + <key 
length> bytes. This should allow us to answer far more queries from the key 
cache than before, so the positive improvement to performance should be greater 
than the negative drain.

For the structure I propose an associative cache, where a single contiguous 
address space is broken up into regions of, say, 8 entries, plus one counter. 
The counter tracks the recency of access of each of the entries, so that on 
write the least recently accessed/written can be replaced. A linear probe 
within the region is used to determine if the entry we're looking for is 
present. This should be very quick, as the entire region should fit into one or 
two lines of L1.

Advantage: we may see 5x bump in cache hit-rate, or even more for large keys.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to