Benedict created CASSANDRA-8931:
-----------------------------------

             Summary: IndexSummary (and Index) should store the token, and the 
minimal key to unambiguously direct a query
                 Key: CASSANDRA-8931
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict


Since these files are likely sticking around a little longer, it is probably 
worth optimising them. A relatively simple change to Index and IndexSummary 
could significantly reduce the space required, reduce the CPU cost of lookups, 
and hopefully bound the space needed as key size grows. On write, we always 
store the token before the key (if it differs from the key), then truncate the 
whole record to the minimum length necessary to answer an inequality search. 
Since the data file also contains the key, we can confirm we have the right key 
once we have looked it up. Since Bloom filters (BFs) already cut out most 
unnecessary lookups, we do not save much by ruling out the remaining false 
positives one step earlier. 
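
To make the truncation idea concrete, here is a minimal sketch in Python (not 
Cassandra code). It assumes records are compared as unsigned bytes, uses a 
truncated MD5 as a stand-in for the partitioner's token, and keeps the full 
records in a list that plays the role of the data file. The names token, 
shortest_separator, build_index and lookup are hypothetical.

```python
import bisect
import hashlib


def token(key: bytes) -> bytes:
    # Stand-in for a partitioner hash (an assumption, not Cassandra's own):
    # 8 bytes whose unsigned bytewise order is used as the token order.
    return hashlib.md5(key).digest()[:8]


def shortest_separator(prev: bytes, cur: bytes) -> bytes:
    # Shortest prefix s of cur such that prev < s <= cur (requires prev < cur).
    n = 0
    while n < len(prev) and n < len(cur) and prev[n] == cur[n]:
        n += 1
    return cur[: n + 1]


def build_index(keys):
    # Full records (token, then key) play the role of the data file here.
    records = sorted({token(k) + k for k in keys})
    # The first entry needs no bytes at all: everything sorts at or after b"".
    entries = [b""] + [
        shortest_separator(records[i - 1], records[i])
        for i in range(1, len(records))
    ]
    return records, entries


def lookup(records, entries, key: bytes):
    if not records:
        return None
    query = token(key) + key
    # Inequality search: the last truncated entry that is <= the query.
    i = bisect.bisect_right(entries, query) - 1
    # Confirm against the full key, which only the "data file" still holds.
    return i if records[i] == query else None
```

Each truncated entry sorts strictly after the previous full record and no later 
than its own, so the inequality search lands on the only record that could 
match; a key that is absent is only detected at the final comparison against 
the full record.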

An improved follow-up version would be to use a trie of the shortest length 
needed to answer inequality lookups, as this would also ensure that very long 
keys with common prefixes do not significantly increase the size of the index 
or summary. This would translate to a trie index for the summary, keying into a 
static trie page for the index.
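
As a rough illustration of the trie variant, the sketch below (Python, in 
memory only, not the on-disk layout proposed here) stores truncated entries in 
a byte trie and answers the same inequality lookup: the greatest stored entry 
that is <= the query. Node, insert, max_value, floor and count_nodes are 
hypothetical names, and the sample entries are made up.

```python
class Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}   # next byte -> Node
        self.value = None    # position of the entry ending here, if any


def insert(root, key: bytes, value: int) -> None:
    node = root
    for b in key:
        node = node.children.setdefault(b, Node())
    node.value = value


def max_value(node):
    # Largest entry in a subtree: keep following the largest child byte.
    while node.children:
        node = node.children[max(node.children)]
    return node.value


def floor(root, query: bytes):
    # Value of the greatest stored entry that is <= query, or None.
    best = None
    node = root
    for b in query:
        if node.value is not None:
            best = node.value            # an entry that is a prefix of query
        smaller = [c for c in node.children if c < b]
        if smaller:
            # Everything under a smaller sibling byte sorts below query.
            best = max_value(node.children[max(smaller)])
        node = node.children.get(b)
        if node is None:
            return best
    return node.value if node.value is not None else best


def count_nodes(node) -> int:
    return sum(1 + count_nodes(c) for c in node.children.values())


# Made-up truncated entries sharing a long common prefix.
entries = [b"", b"user:1000:a", b"user:1000:b", b"user:2"]
root = Node()
for position, entry in enumerate(entries):
    insert(root, entry, position)
```

Because the shared prefix is stored once, the trie here has fewer nodes than 
the entries have bytes in total, which is the effect described above for long 
keys with common prefixes.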



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
