[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521484#comment-14521484 ]

Branimir Lambov commented on CASSANDRA-7032:
--------------------------------------------

Merged your changes into the [branch|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment]
and further refactored {{evaluateImprovement}}. It should now be much more 
understandable. I have verified that the running time is not significantly 
affected and that the results are the same as before (disregarding 
insignificant changes caused by floating point rounding).

The {{expandable}} property means that the ownership range of the token can be 
expanded. It is not set when {{currGroup == newGroup}}, but I swapped things 
around in {{findUpdatedReplicationStart}} to make what we do in that case 
clearer.

On your other concern: group clumping quickly creates very heavy 
underutilization in the individual tokens, which the algorithm will not accept. 
A clumping may be a good short-term fix for a heavily overutilized node. I did 
a quick test, and excluding neighbours of the same group does appear to give 
somewhat worse results (e.g. the included long test no longer passes). 
Excluding them could also cause problems when the number of groups is close to 
the replication factor, so I think we should leave it as it is for now. It is 
not hard to change if we want to do so in the future.

> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.x
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes 
> hotspots of ownership. It should be possible to improve dramatically on this 
> with deterministic allocation. I have quickly thrown together a simple greedy 
> algorithm that allocates vnodes efficiently, and will repair hotspots in a 
> randomly allocated cluster gradually as more nodes are added, and also 
> ensures that token ranges are fairly evenly spread between nodes (somewhat 
> tunably so). The allocation still permits slight discrepancies in ownership, 
> but it is bounded by the inverse of the size of the cluster (as opposed to 
> random allocation, which strangely gets worse as the cluster size increases). 
> I'm sure there is a decent dynamic programming solution to this that would be 
> even better.
> If on joining the ring a new node were to CAS a shared table where a 
> canonical allocation of token ranges lives after running this (or a similar) 
> algorithm, we could then get guaranteed bounds on the ownership distribution 
> in a cluster. This will also help for CASSANDRA-6696.
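
For readers who want to see the ownership hotspots mentioned in the description, 
here is a minimal sketch that measures them. It is not the algorithm from the 
branch or from the attached tests: the class {{VnodeOwnershipSketch}}, the unit 
ring with {{double}} tokens, and the "largest share divided by ideal share" 
metric are all assumptions made for illustration. It only compares randomly 
drawn tokens against an evenly spaced baseline.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; none of these names come from Cassandra.
public class VnodeOwnershipSketch {

    /** Fraction of a unit ring owned by each node; a token owns the range that ends at it. */
    public static double[] ownership(double[] tokens, int[] owners, int nodeCount) {
        Integer[] order = new Integer[tokens.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(tokens[a], tokens[b]));

        double[] owned = new double[nodeCount];
        for (int i = 0; i < order.length; i++) {
            int cur = order[i];
            int prev = order[(i + order.length - 1) % order.length];
            double width = tokens[cur] - tokens[prev];
            if (i == 0) width += 1.0; // the first token's range wraps around the ring
            owned[owners[cur]] += width;
        }
        return owned;
    }

    /** Largest per-node share divided by the ideal share 1/nodeCount (1.0 means perfectly even). */
    public static double maxOverIdeal(double[] owned) {
        double max = 0;
        for (double o : owned) max = Math.max(max, o);
        return max * owned.length;
    }

    /** Random allocation: every node draws vnodesPerNode uniformly random tokens. */
    public static double randomImbalance(int nodeCount, int vnodesPerNode, long seed) {
        Random rnd = new Random(seed);
        int n = nodeCount * vnodesPerNode;
        double[] tokens = new double[n];
        int[] owners = new int[n];
        for (int i = 0; i < n; i++) {
            tokens[i] = rnd.nextDouble();
            owners[i] = i / vnodesPerNode;
        }
        return maxOverIdeal(ownership(tokens, owners, nodeCount));
    }

    /** Baseline: evenly spaced tokens handed out round-robin. */
    public static double evenImbalance(int nodeCount, int vnodesPerNode) {
        int n = nodeCount * vnodesPerNode;
        double[] tokens = new double[n];
        int[] owners = new int[n];
        for (int i = 0; i < n; i++) {
            tokens[i] = (double) i / n;
            owners[i] = i % nodeCount;
        }
        return maxOverIdeal(ownership(tokens, owners, nodeCount));
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {10, 100, 1000}) {
            System.out.printf("nodes=%d random=%.3f even=%.3f%n",
                    nodes, randomImbalance(nodes, 256, 42L), evenImbalance(nodes, 256));
        }
    }
}
```

Running {{main}} prints the imbalance ratio for a few cluster sizes; the vnode 
count of 256 and the seed are arbitrary choices, and replication and rack 
groups are ignored entirely.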



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
