[ https://issues.apache.org/jira/browse/CASSANDRA-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056269#comment-14056269 ]
Benedict commented on CASSANDRA-7520:
-------------------------------------

With or without vnodes; I don't think they make a huge difference to the idea, although they may increase the odds of it happening with their current distribution quality. Obviously with truly huge clusters only the cluster-wide behaviour patterns are likely to benefit, but with moderately sized clusters (<32 nodes) most of these benefits would emerge for _some_ datasets.

> Permit sorting sstables by raw partition key, as opposed to token
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-7520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7520
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>
> At the moment we have some counter-intuitive behaviour, which is that with a
> hashed partitioner (recommended) the more compacted the data is, the more
> randomly distributed it is within the file. This means that data access
> locality is made pretty much as bad as possible, and we rely on the OS to do
> its best to fix that for us with its page cache.
>
> [~jasobrown] mentioned this at the NGCC, but thinking on it some more it
> seems that many use cases may benefit from dropping the token at the storage
> level and sorting based on the raw key data. For workloads where nearness of
> keys implies a likelihood of being co-referenced, this could improve data
> locality and cache hit rate dramatically. Timeseries workloads spring to mind,
> but I doubt this is constrained to them. Most likely any non-random access
> pattern could benefit. A random access pattern would most likely suffer from
> this scheme, as we can index more efficiently into the hashed data. However
> there's no reason we could not support both schemes.
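
To make the locality argument concrete, here is a rough sketch and not Cassandra code. It lays the same set of keys out in two orders, by hashed token and by raw key, and counts how many fixed-size "pages" a read of 64 adjacent keys would touch. Everything in it is an assumption for illustration: the `token` helper uses MD5 from the standard library as a stand-in for the partitioner hash, the `sensor-NNNNN` key format is made up, and a "page" is simply a run of 64 consecutive entries in the sorted order.

```python
import hashlib


def token(key: str) -> int:
    # Hypothetical stand-in for a hashed partitioner: first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def pages_touched(sorted_keys, wanted, page_size):
    # Map each key to its slot in the on-disk order, then count distinct pages.
    position = {k: i for i, k in enumerate(sorted_keys)}
    return len({position[k] // page_size for k in wanted})


# Made-up keys; zero padding makes lexicographic order match numeric order.
keys = [f"sensor-{i:05d}" for i in range(10_000)]

by_raw_key = sorted(keys)           # proposed layout: sort on the raw key
by_token = sorted(keys, key=token)  # current layout: sort on the hashed token

# A read of 64 keys that are adjacent in raw-key order.
wanted = keys[5000:5064]
PAGE = 64

raw_pages = pages_touched(by_raw_key, wanted, PAGE)
token_pages = pages_touched(by_token, wanted, PAGE)
print("pages touched, raw-key order:", raw_pages)
print("pages touched, token order:  ", token_pages)
```

In this toy model the raw-key layout serves the read from a couple of neighbouring pages, while the token layout scatters the same keys across many pages. It says nothing about index cost, which is where the hashed layout is expected to do better for random access.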