[jira] [Commented] (CASSANDRA-7520) Permit sorting sstables by raw partition key, as opposed to token

Benedict (JIRA) Wed, 09 Jul 2014 07:11:28 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056269#comment-14056269 ]


Benedict commented on CASSANDRA-7520:
-------------------------------------

With or without vnodes. I don't think they make a huge difference to the idea,
although, given their current distribution quality, they may increase the odds
of it happening. Obviously with truly huge clusters only the cluster-wide
behaviour patterns are likely to benefit, but with moderately sized clusters
(<32 nodes) most of these benefits would emerge for _some_ datasets.

> Permit sorting sstables by raw partition key, as opposed to token
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-7520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7520
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>
> At the moment we have some counter-intuitive behaviour, which is that with a 
> hashed partitioner (recommended) the more compacted the data is, the more 
> randomly distributed it is within the file. This means that data access 
> locality is made pretty much as bad as possible, and we rely on the OS to do 
> its best to fix that for us with its page cache.
> [~jasobrown] mentioned this at the NGCC, but thinking on it some more it 
> seems that many use cases may benefit from dropping the token at the storage 
> level and sorting based on the raw key data. For workloads where nearness of 
> key => likelihood of being coreferenced, this could improve data locality and 
> cache hit rate dramatically. Timeseries workloads spring to mind, but I doubt 
> this is constrained to them. Most likely any non-random access pattern could 
> benefit. A random access pattern would most likely suffer from this scheme, 
> as we can index more efficiently into the hashed data. However there's no 
> reason we could not support both schemes. 
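
A minimal sketch of the locality argument above, assuming a toy model in which an sstable is nothing more than a list of partition keys in on-disk order. The helpers `token`, `sstable_order` and `span_for` are hypothetical, and the token is a simplified stand-in (the first 8 bytes of an MD5 digest), not Cassandra's actual token computation. The sketch compares how many adjacent file positions must be covered to read one week of a single sensor's time-series partitions when the file is ordered by token versus by raw key.

```python
import hashlib

# Hypothetical stand-in for a hashed partitioner's token: the first 8 bytes of
# the key's MD5 digest as an unsigned integer. Illustrative only; this is NOT
# how Cassandra computes tokens.
def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def sstable_order(keys, mode):
    """Return keys in on-disk order, sorted by token ("token") or by raw key ("raw")."""
    if mode == "token":
        return sorted(keys, key=token)
    if mode == "raw":
        return sorted(keys)
    raise ValueError(f"unknown mode: {mode}")

def span_for(order, wanted):
    """Number of consecutive file positions covering every wanted key."""
    positions = [i for i, k in enumerate(order) if k in wanted]
    return max(positions) - min(positions) + 1

# Time-series style keys: one partition per sensor per day (hypothetical data).
keys = [f"sensor{s:02d}:2014-07-{d:02d}" for s in range(10) for d in range(1, 31)]
week = {f"sensor03:2014-07-{d:02d}" for d in range(1, 8)}

for mode in ("token", "raw"):
    print(mode, span_for(sstable_order(keys, mode), week))
```

Under raw-key ordering the week of readings occupies exactly 7 adjacent positions; under token ordering the same reads are typically spread across a much larger portion of the file, which is the loss of locality the description refers to. The `mode` switch is only a toy illustration of the suggestion that both schemes could be supported side by side.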



--
This message was sent by Atlassian JIRA
(v6.2#6252)
