Benedict created CASSANDRA-7520:
-----------------------------------

             Summary: Permit sorting sstables by raw partition key, as opposed to token
                 Key: CASSANDRA-7520
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7520
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict


At the moment we have some counter-intuitive behaviour: with a hashed
partitioner (the recommended choice), the more compacted the data is, the more
randomly it is distributed within each file. This means that data access
locality is about as bad as it can be, and we rely on the OS page cache to do
its best to fix that for us.
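A small sketch of the effect described above, assuming a 64-bit value taken from an MD5 digest as a stand-in for a hashed partitioner's token (this is not Cassandra's exact token function, and the helpers `token`, `layout` and `scatter` are hypothetical). It lays out 30 days of partition keys for one hypothetical sensor as they would sit in a fully compacted file, then measures how far apart keys that are adjacent by raw key end up.

```python
import hashlib

def token(key: bytes) -> int:
    # Hypothetical stand-in for a hashed partitioner's token: the first 8 bytes
    # of an MD5 digest as an unsigned integer. Not Cassandra's real token function.
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")

def layout(keys, by_token):
    # Order keys as they would appear in a fully compacted file.
    return sorted(keys, key=token if by_token else (lambda k: k))

def scatter(keys, order):
    # Average distance (in file positions) between keys adjacent in raw-key order.
    pos = {k: i for i, k in enumerate(order)}
    ordered = sorted(keys)
    gaps = [abs(pos[a] - pos[b]) for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# One partition per day; zero-padded dates make byte order match time order.
keys = [f"sensor42:2014-07-{day:02d}".encode() for day in range(1, 31)]
print("raw-key order:", scatter(keys, layout(keys, by_token=False)))  # 1.0
print("token order:  ", scatter(keys, layout(keys, by_token=True)))
```

Under raw-key ordering, consecutive days sit in consecutive positions (average gap 1.0); under token ordering the same keys are spread across the file, so a scan over consecutive days touches pages all over it.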

[~jasobrown] mentioned this at the NGCC, but thinking on it some more, it seems
that many use cases may benefit from dropping the token at the storage level
and sorting on the raw key data instead. For workloads where nearness of key =>
likelihood of being coreferenced, this could dramatically improve data locality
and cache hit rate. Timeseries workloads spring to mind, but I doubt this is
constrained to them; most likely any non-random access pattern could benefit. A
random access pattern would most likely suffer under this scheme, as we can
index more efficiently into the hashed data. However, there's no reason we
could not support both schemes.
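Supporting both schemes could amount to making the sort order a per-table choice. The sketch below is hypothetical (the `Ordering` enum, `sort_key` and `blocks_touched` are not Cassandra APIs, and MD5 again stands in for the real token): it counts how many fixed-size on-disk blocks must be read to fetch a run of adjacent keys under each ordering, as a rough proxy for the cache hit rate argument.

```python
import hashlib
from enum import Enum

class Ordering(Enum):
    TOKEN = "token"      # today's behaviour with a hashed partitioner
    RAW_KEY = "raw_key"  # proposed: sort by the raw partition key bytes

def sort_key(ordering):
    # Hypothetical per-table sort order; MD5 stands in for the partitioner's token.
    if ordering is Ordering.TOKEN:
        return lambda k: int.from_bytes(hashlib.md5(k).digest()[:8], "big")
    return lambda k: k

def blocks_touched(keys, wanted, ordering, block_size=4):
    # Lay keys out in sort order, block_size partitions per on-disk block,
    # and count the distinct blocks that must be read to fetch every wanted key.
    order = sorted(keys, key=sort_key(ordering))
    pos = {k: i for i, k in enumerate(order)}
    return len({pos[k] // block_size for k in wanted})

keys = [f"sensor42:2014-07-{day:02d}".encode() for day in range(1, 31)]
week = [f"sensor42:2014-07-{day:02d}".encode() for day in range(10, 18)]
for ordering in Ordering:
    print(ordering.value, blocks_touched(keys, week, ordering))
```

With raw-key ordering, eight consecutive days occupy three adjacent blocks; with token ordering the same keys land at effectively random positions and usually need more. For purely random point lookups the issue argues the hashed layout remains preferable, which is why the ordering would be an option alongside the current scheme rather than a replacement.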



--
This message was sent by Atlassian JIRA
(v6.2#6252)
