[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2017-07-24 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098599#comment-16098599
 ] 

Branimir Lambov commented on CASSANDRA-7032:


I can't think of a way to do such a reduction on an existing DC. Adding a new 
node with the same number of vnodes should improve the distribution of tokens 
within the DC, but adding one with e.g. 8 in a 256-vnode DC will just give it 
32-times smaller ownership. Trying to gradually transition will involve a 
period of severely overloading one or more of the non-converted nodes.

It looks like DC-by-DC is your best bet, unfortunately.
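The 32-times figure follows from ownership being roughly proportional to vnode count. A toy model (an illustration only, not Cassandra code):

```java
// With (near-)random token placement, a node's expected ownership is
// proportional to its share of the total vnode count on the ring.
public class VnodeOwnershipSketch {
    static double ownershipFraction(int nodeVnodes, int totalVnodes) {
        return (double) nodeVnodes / totalVnodes;
    }

    public static void main(String[] args) {
        int nodes = 10, vnodesPerNode = 256;
        int total = nodes * vnodesPerNode + 8; // ring after an 8-vnode node joins
        double newNode = ownershipFraction(8, total);
        double oldNode = ownershipFraction(256, total);
        // An existing 256-vnode node owns 256/8 = 32 times more than the newcomer.
        System.out.printf("new=%.4f old=%.4f ratio=%.0f%n", newNode, oldNode, oldNode / newNode);
    }
}
```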

> Improve vnode allocation
> 
>
> Key: CASSANDRA-7032
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: performance, vnodes
> Fix For: 3.0 alpha 1
>
> Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes 
> hotspots of ownership. It should be possible to improve dramatically on this 
> with deterministic allocation. I have quickly thrown together a simple greedy 
> algorithm that allocates vnodes efficiently, and will repair hotspots in a 
> randomly allocated cluster gradually as more nodes are added, and also 
> ensures that token ranges are fairly evenly spread between nodes (somewhat 
> tunably so). The allocation still permits slight discrepancies in ownership, 
> but it is bound by the inverse of the size of the cluster (as opposed to 
> random allocation, which strangely gets worse as the cluster size increases). 
> I'm sure there is a decent dynamic programming solution to this that would be 
> even better.
> If on joining the ring a new node were to CAS a shared table where a 
> canonical allocation of token ranges lives after running this (or a similar) 
> algorithm, we could then get guaranteed bounds on the ownership distribution 
> in a cluster. This will also help for CASSANDRA-6696.
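The greedy approach described above can be sketched in toy form (an assumption of the general shape, not the patch's actual algorithm; tokens are modelled as points on a unit ring):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy greedy allocation: each new token is dropped at the midpoint of the
// currently largest gap, so the worst hotspot shrinks first.
public class GreedyTokenSketch {
    static List<Double> allocate(List<Double> existing, int newTokens) {
        List<Double> tokens = new ArrayList<>(existing);
        for (int i = 0; i < newTokens; i++) {
            Collections.sort(tokens);
            double bestStart = 0, bestLen = -1;
            for (int j = 0; j < tokens.size(); j++) {
                double a = tokens.get(j);
                // successor on the ring; the last gap wraps around to the first token
                double b = (j + 1 < tokens.size()) ? tokens.get(j + 1) : tokens.get(0) + 1.0;
                if (b - a > bestLen) { bestLen = b - a; bestStart = a; }
            }
            tokens.add((bestStart + bestLen / 2) % 1.0);
        }
        Collections.sort(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        // Two clumped tokens; the two new ones halve the big empty arc.
        System.out.println(allocate(List.of(0.0, 0.25), 2)); // [0.0, 0.25, 0.4375, 0.625]
    }
}
```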



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2017-07-23 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097733#comment-16097733
 ] 

Jay Zhuang commented on CASSANDRA-7032:
---

Hi [~blambov], [~dikanggu], any suggestions for reducing the token count on an 
existing cluster that is serving traffic? For example, one solution is to 
re-create the cluster DC by DC with the new algorithm, but that requires 
tearing down a whole DC. Is it possible to add new nodes one by one with the 
new algorithm and fewer vnodes, then decommission the older nodes one by one? 
Would the tokens still be balanced after that?




[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2017-02-08 Thread burak ibrahim sevindi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857895#comment-15857895
 ] 

burak ibrahim sevindi commented on CASSANDRA-7032:
--

Are there any insights on lowering the number of default vnodes in 
cassandra.yaml?



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-10-05 Thread Sebastian Estevez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943678#comment-14943678
 ] 

Sebastian Estevez commented on CASSANDRA-7032:
--

Is there a way we can allow this to work without specifying the keyspace?

How about using a weighted average of ownership across all existing keyspaces?
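One way the weighted-average idea could look (a hypothetical sketch, not an existing API; weighting keyspaces by replication factor is just one possible choice):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: fold per-keyspace ownership maps into a single target
// by a weighted average (weights could be replication factors, data sizes, ...).
public class WeightedOwnershipSketch {
    static Map<String, Double> combine(List<Map<String, Double>> perKeyspace, double[] weights) {
        Map<String, Double> combined = new HashMap<>();
        double total = Arrays.stream(weights).sum();
        for (int i = 0; i < perKeyspace.size(); i++) {
            double w = weights[i] / total;
            for (Map.Entry<String, Double> e : perKeyspace.get(i).entrySet())
                combined.merge(e.getKey(), e.getValue() * w, Double::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Double> ks1 = Map.of("node1", 0.6, "node2", 0.4); // weight 3
        Map<String, Double> ks2 = Map.of("node1", 0.2, "node2", 0.8); // weight 1
        // node1: 0.6 * 3/4 + 0.2 * 1/4 = 0.5
        System.out.println(combine(List.of(ks1, ks2), new double[]{3, 1}));
    }
}
```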



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-26 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602624#comment-14602624
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Yes, if we change the vnode allocation to be done by this algorithm by default. 
That may be the right thing to do, but I wonder whether we should be a bit 
cautious and let it be tested in the real world first as a non-default.
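For reference, the non-default opt-in ended up as a {{cassandra.yaml}} setting; roughly (the keyspace name is a placeholder, and the option names are worth checking against your version's cassandra.yaml):

```yaml
# Opt-in, per-node: keep num_tokens as desired and point the allocator at a
# keyspace whose replication settings it should balance for.
num_tokens: 256
allocate_tokens_for_keyspace: my_keyspace   # my_keyspace is a placeholder
```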



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-25 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601551#comment-14601551
 ] 

T Jake Luciani commented on CASSANDRA-7032:
---

Should we consider lowering the number of default vnodes in {{cassandra.yaml}}?
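For context, the knob in question is a per-node config value; the number shown here is illustrative, not a recommendation:

```yaml
# cassandra.yaml -- num_tokens controls how many vnodes each node gets.
# 256 was the long-standing default; with non-random allocation a much
# smaller count can balance nearly as well.
num_tokens: 16
```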



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-22 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595838#comment-14595838
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Done, including the static access change for the commitlog code.

[testall|http://cassci.datastax.com/view/Dev/view/blambov/job/blambov-7032-vnode-assignment-testall/]
 is OK, 
[dtest|http://cassci.datastax.com/view/Dev/view/blambov/job/blambov-7032-vnode-assignment-dtest/]
 includes an {{incremental_repair_test.TestIncRepair.sstable_repairedset_test}} 
which last failed ~20 builds ago on trunk.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594566#comment-14594566
 ] 

Benedict commented on CASSANDRA-7032:
-

Ok, it's looking good to me. A couple of final things to do before commit:

# Swap the order of {{static <access scope>}} to {{<access scope> static}} - this 
doesn't appear to be in our code style, which looks like an oversight, but 
there are thousands of occurrences of the latter, and only your occurrences of 
the former :)
#* It may be worth sneaking in the swap for your CL code as well, although I 
can ninja that separately
# Rebase, modify CHANGES.txt and get clean cassci runs

We should then file a follow up for improving how users can access this 
functionality. It may be helpful to introduce a nodetool command for 
rebalancing the cluster through gradual application of this algorithm to more 
nodes. It would be neat to potentially introduce a modification to this 
algorithm for evaluating token _removal_, so that we could reduce the number of 
vnodes in an existing cluster, adding just the necessary number to get a good 
result. A more simple goal be to support bootstraping a vnode cluster without a 
keyspace present.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-11 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582033#comment-14582033
 ] 

Branimir Lambov commented on CASSANDRA-7032:


[Uploaded|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment]
 a new version that interprets the token ranges correctly. Also incorporates 
your changes and fixes a conversion error in the token size calculations that 
was reducing the precision to float and rarely resulting in slightly wrong 
results.

The results achieved are pretty close to what I was getting with the older 
interpretation.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-08 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577307#comment-14577307
 ] 

Branimir Lambov commented on CASSANDRA-7032:


bq. the token ranges we're optimising for in this ticket are not those the 
cluster actually runs on. They start at the token assigned to each endpoint, 
instead of end

I checked this several times because the way I understood it complicates things 
a lot. I do not understand how I still managed to get it wrong. 
{{TokenAllocation.addOwnership()}} does the wrong thing: it should have picked 
a representative of the range rather than the start, and once it does, the 
problem shows up...

This changes (simplifies actually) things a lot; I have to do some extra 
testing to make sure everything is still working as well or better.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-06-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573452#comment-14573452
 ] 

Benedict commented on CASSANDRA-7032:
-

I've pushed some minor suggestions 
[here|https://github.com/belliottsmith/cassandra/tree/7032-suggestions]

I think the integration / mechanism for employing this functionality could be 
improved, but that can be delayed to a follow up ticket.

Before that, though, AFAICT the token ranges we're optimising for in this 
ticket are not those the cluster actually runs on. They _start_ at the token 
assigned to each endpoint, instead of _ending_ there.

Other than that it LGTM.
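The range interpretation at issue can be illustrated with a small sketch (a simplified model, not the patch's code): a node owns the range *ending* at its token, i.e. (previousToken, myToken], so a key belongs to the first token at or above its hash, wrapping past the ring's end.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified ring lookup under the (prev, token] interpretation.
public class RangeOwnerSketch {
    static int ownerOf(TreeMap<Long, Integer> ring, long keyToken) {
        Map.Entry<Long, Integer> e = ring.ceilingEntry(keyToken); // first token >= key
        return e != null ? e.getValue() : ring.firstEntry().getValue(); // wrap around
    }

    public static void main(String[] args) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        ring.put(100L, 1); ring.put(200L, 2); ring.put(300L, 3);
        System.out.println(ownerOf(ring, 150)); // 2: range (100, 200] ends at node 2's token
        System.out.println(ownerOf(ring, 350)); // 1: wraps around to the smallest token
    }
}
```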



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-04-30 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521484#comment-14521484
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Merged your changes into the 
[branch|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment]
 and further refactored {{evaluateImprovement}}. It should now be much more 
understandable. I have verified that the running time is not significantly 
affected and that the results are the same as before (disregarding 
insignificant changes caused by floating point rounding).

The {{expandable}} property means that the ownership range of the token can be 
expanded. It is not set in the case where currGroup == newGroup, but I swapped 
things around in {{findUpdatedReplicationStart}} to make what we do in that 
case clearer.

On your other concern, group clumpings quickly create very heavy 
underutilization in the individual tokens, which the algorithm will not accept. 
A clumping may be a good short-term fix to a heavily overutilized node; I did a 
quick test and excluding neighbours of the same group does appear to give 
somewhat worse results (e.g. the included long test no longer passes). 
Excluding could potentially also cause problems when the number of groups is 
close to the replication factor, hence I think we should leave it as it is for 
now. It is not hard to change if we want to do so in the future.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-04-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394730#comment-14394730
 ] 

Benedict commented on CASSANDRA-7032:
-

One other thing to consider optimising for: hash bit distribution. A lot of 
algorithmic optimisations can be made if we can assume each node has an 
approximately uniform distribution of hash bits. We should introduce some 
scoring based on this. e.g. try the best N candidates by our current 
evaluation, and select the one that delivers the best resulting hash bit 
distribution.
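One possible shape for such a score (an assumption of what "hash bit distribution" scoring could look like, not anything in the patch): for each bit position, the fraction of tokens with that bit set should be near 1/2; summing squared deviations gives a score where lower is more uniform.

```java
// Score a candidate token set by how evenly each bit position is split.
public class BitScoreSketch {
    static double score(long[] tokens) {
        double score = 0;
        for (int bit = 0; bit < 64; bit++) {
            int set = 0;
            for (long t : tokens)
                if (((t >>> bit) & 1) != 0) set++;
            double frac = (double) set / tokens.length;
            score += (frac - 0.5) * (frac - 0.5); // 0 when perfectly balanced
        }
        return score;
    }

    public static void main(String[] args) {
        long[] uniform = {0, 1, 2, 3}; // low two bits perfectly balanced
        long[] clumped = {0, 0, 0, 3};
        System.out.println(score(uniform) < score(clumped)); // true
    }
}
```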



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-04-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392912#comment-14392912
 ] 

Benedict commented on CASSANDRA-7032:
-

I've pushed a version of this 
[here|https://github.com/belliottsmith/cassandra/tree/7032-nits] with a lot of 
comments and some small refactors that I needed in order to understand the 
code.

On the whole it seems like a good approach, and carefully optimised.

I think some further changes are necessary for improved understanding, as the 
functions are currently very dense, with a focus on performance even in areas 
that won't make a tremendous difference. Algorithmically it's good, but we 
should IMO preferably separate the temporary state maintenance and cleanup from 
the main goal steps, and avoid inflating every nested property access into its 
own variable, as it pollutes the method body and makes it hard to track which 
actions are important. Inside {{evaluateImprovement}} the ownership adjustment 
calculations should all preferably be split out into separate methods, and if 
possible the {{unitsChain}} there and {{prevPopulate}} chain in 
{{populateTokenInfo}} should use a class like {{ReplicationVisitor}} that I 
introduced to extract some of the dense state maintenance and make the logic 
more declarative.

I'm still a little confused by the {{expandable}} property, partly because I 
don't understand its name, but also because its definition doesn't seem to 
match how it's used (it's quite likely I'm misunderstanding, though, so could 
you consider my code comment and explain, preferably with clearer 
comments around its stated aim?).

I think there is a potential small modification to be made as well, which may 
or may not be helpful. It bothers me that we can in theory create very 
suboptimal chains for establishing the owners of a token, by having a bunching 
of groups within the same replication range. Since these should _generally_ 
lead to worse distribution outcomes this algorithm should generally fight 
against them, but it is possible to find problematic local minima. It would be 
nice to pre-exclude any candidates that would lead to an increased clumping of 
groups. Perhaps even to exclude any tokens that would shadow the ownership 
range of a partner (same group), if we have more than RF groups, and otherwise 
permitting a maximum traversal of one partner to define the replication range.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-03-25 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380174#comment-14380174
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Patch is up for review 
[here|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment].
 It gives the option to specify an {{allocate_tokens_keyspace}} when bringing up a 
node. The node's tokens are then allocated to optimize the load distribution 
for the replication strategy of that keyspace.

The allocation is currently restricted to Murmur3Partitioner and SimpleStrategy 
or NetworkTopologyStrategy (is there anything else we need to support?). With 
the latter it cannot deal with cases where the number of racks in the dc is 
more than one but less than the replication factor, which should not be a 
common case.

There are a couple of things still left to do or explore, possibly in separate 
patches:
- add a dtest starting several nodes with allocation
- run a cstar_perf to see if it could show improvement for RF 2 in a 3-node 
cluster
- optimization of the selection for the first RF nodes in the cluster to 
guarantee good distribution later (see 
ReplicationAwareTokenAllocator.testNewCluster)
- (if deemed worthwhile) multiple different replication factors in one 
datacentre; the current code works OK when asked to allocate alternately, but 
this could be improved if we consider all relevant strategies in parallel



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-02-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309190#comment-14309190
 ] 

Benedict commented on CASSANDRA-7032:
-

Before going ahead and integrating it, could we confirm that the algorithm 
performs well under heterogeneous data centre / rack configurations, by which I 
mean different numbers of nodes in each data centre / rack (not differing 
numbers of vnodes), for varying numbers of data centres and racks? And that it, 
most importantly, evenly distributes load across the disks within any single 
node?

It would be great to perform a simulation of a wide range of state spaces, 
along each dimension of: 

# number of data centres
# size variability of data centres
# number of disks per node
# variability of number of disks (perhaps only varying between data centres)
# number of racks within a data centre

And graph the variability of ownership _by disk_. Getting balanced disks is the 
really hard thing to do, and is really essential for CASSANDRA-6696. Data 
centres may also behave unexpectedly for replication; you may own an even 
amount of the ring, but it may translate into a disproportionately contiguous 
range within the data centre, so you end up owning an uneven quantity within 
the data centre.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-02-06 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309383#comment-14309383
 ] 

Branimir Lambov commented on CASSANDRA-7032:


The answer relies on a couple of mappings:
- a Cassandra data centre is a separate ring to the algorithm. Data centres 
have their own allocation of replicas, hence one can only balance load within 
them, not across multiple DCs. The algorithm can simply ignore everything 
outside the data centre where the new node is spawned.
- in current Cassandra terms, a node is a node to the algorithm.
- in future Cassandra terms, a disk is a node to the algorithm*.
- a Cassandra rack is a rack to the algorithm if their number within the data 
centre is equal to or higher than the replication factor, otherwise the 
Cassandra node is the rack of the algorithm. Implicitly all disks in a machine 
belong to the same rack; the machine level can be completely ignored if racks 
are defined.
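
The rack-mapping rule above can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are mine, not Cassandra's or the attached code's): use the Cassandra rack as the algorithm's rack when the DC has at least RF distinct racks, otherwise fall back to the node.

```java
// Illustrative sketch of the rack-mapping rule described above; names are
// hypothetical, not Cassandra's actual API.
public class RackMapping {
    /**
     * Returns the algorithm-level "rack" key for a unit: the Cassandra rack
     * if the DC has at least RF distinct racks, otherwise the node itself.
     */
    static String algorithmRack(String node, String rack,
                                int distinctRacksInDc, int replicationFactor) {
        return distinctRacksInDc >= replicationFactor ? rack : node;
    }

    public static void main(String[] args) {
        // 3 racks, RF = 3: Cassandra racks map directly to algorithm racks.
        System.out.println(algorithmRack("node1", "rackA", 3, 3)); // rackA
        // 2 racks, RF = 3: fall back to node-level racks.
        System.out.println(algorithmRack("node1", "rackA", 2, 3)); // node1
    }
}
```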

The only fine point is the starred one. I know your preference is to split the token 
space in a single range per disk. I see little benefit in this compared to 
simply assigning a subset of the machine's vnodes to each disk. The latter is 
more flexible, perhaps even easier to achieve when SSTables are per vnode. 
(Perhaps what's not so easy to see is that each vnode is always responsible for 
replicating a contiguous range of tokens and it is easy to convert between a 
vnode and the token range that it serves (see 
{{ReplicationStrategy.replicationStart}} in the attached).) Slower disks take a 
small number of vnodes (4?), faster ones take more (8/16?). Bigger machines 
have more disks hence more vnodes and automatically take a higher load.

The algorithm handles heterogeneous rack and node configuration without issues 
as long as the number of distinct racks (in algorithm terms) is higher than the 
replication factor. 



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-02-06 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309401#comment-14309401
 ] 

Branimir Lambov commented on CASSANDRA-7032:


I will rename the Node concept in code, it should be Unit (or something 
similar) to make it obvious that its meaning can (and will) be different from a 
Cassandra node.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-02-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309403#comment-14309403
 ] 

Benedict commented on CASSANDRA-7032:
-

Sounds good. I just wanted clarification these things were being explicitly 
addressed. It would be helpful still to simulate these characteristics 
explicitly now, though, and see how well it actually turns out don't you think? 
So we can get numbers on how well it will actually perform at various ends of 
the spectrum.

I still have a nagging suspicion there may be subtleties in mapping DCs into 
the single address space without error, but that's a separate issue I guess, to 
be addressed as we tie it in.

I note I don't have my heart set on any mechanism, so long as we can achieve 
the balance, and it does sound like your approach will generalise well.

The thing I like about this is we can actually add extra nodes if we expand 
the number of disks on a node. We will need to think carefully about how we 
actually expose this node mapping to the user, in general, though, to ensure it 
is easy to reason about and robust.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-01-14 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277652#comment-14277652
 ] 

Branimir Lambov commented on CASSANDRA-7032:


No, I have not tried asymmetric configs yet, but I don't foresee significant 
problems adapting the method to them. Assuming configuration is done by 
specifying a number of vnodes per node, it should be just a matter of setting 
the optimal size based on that number; of course it remains to be seen if that 
won't cause some weird behaviour.

The speed problem is now mostly solved, I'll post a new version soon and try 
that out. 



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2015-01-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275343#comment-14275343
 ] 

Benedict commented on CASSANDRA-7032:
-

Sounds very promising! Have you tried this on asymmetric cluster configs? i.e. 
with racks or DCs of unequal size? Had a glance at the code and it looks like 
not, but in case I misread... it would be great to factor this in.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-11 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243148#comment-14243148
 ] 

Jon Haddad commented on CASSANDRA-7032:
---

Was the original 256 number chosen in hopes that it would minimize the chance 
of imbalance?  If so, would this patch result in recommending fewer vnodes, say 
16?  If so, I imagine that would result in less time-consuming repair as well 
as improvements to the spark driver, which, as of the last time I checked, did 
1 query per token to achieve data locality.

I would assume 16 nodes streaming data to a single one would still achieve the 
benefits of vnodes, but I'm just picking a number out of the air.  



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241332#comment-14241332
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Ignoring replication for the time being (more on that below), and looking at 
what's the best thing we can do when we have an existing setup and we are 
trying to add a node, I came up with the following approach.

We can only assign new vnodes, which means that we can only _take away_ load 
from other nodes, never add to it. On the one hand this means that 
underutilized nodes are hopeless until the cluster grows enough for their share 
to become normal. On the other it means that the best thing to do (aiming for 
the smallest overutilization, i.e. max deviation from mean) is to take the 
highest-load nodes and spread their load evenly between them and the new node.

Adding a new node gives us as many new tokens to issue as it has vnodes (_vn_), i.e. we can 
decrease the load in at most _vn_ other nodes. We can pick up the _vn_ 
highest-load ones, but some of them may already have a lower load than the 
target spread; we thus select the largest _n ≤ vn_ highest-load nodes such 
that the spread load _t_, which is their combined load divided by _n+1_, is 
lower than the load of each individual node. We can then choose how to assign 
_vn_ tokens splitting some of the ranges in these _n_ nodes to reduce the load 
of each node to _t_. This should also leave the new node with a load of _t_.
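
The selection rule above can be sketched as a greedy walk from the highest load downwards. This is a minimal illustration under my own naming, not the attached implementation: grow the candidate set while there are tokens left to issue and the spread target _t_ stays below the load of every selected node.

```java
// Illustrative sketch of the spread-target selection described above; the
// names and the greedy walk are mine, not the attached implementation's.
import java.util.Arrays;

public class SpreadTarget {
    /**
     * Walk nodes from highest load down, growing the candidate set while
     * (a) tokens remain to issue (n <= vn) and (b) the combined load spread
     * over the n nodes plus the new node stays below each selected load.
     * Returns the resulting target load t.
     */
    static double targetSpread(double[] loads, int vn) {
        double[] sorted = loads.clone();
        Arrays.sort(sorted);                  // ascending order
        double best = Double.NaN;
        double sum = 0;
        int n = 0;
        for (int i = sorted.length - 1; i >= 0 && n < vn; --i) {
            sum += sorted[i];
            ++n;
            double t = sum / (n + 1);         // combined load over n + 1 units
            if (t < sorted[i]) {
                best = t;                     // still below the smallest selected load
            } else {
                break;                        // adding this node no longer helps
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Loads 1.3 and 1.2 are selected (t = 2.5/3); adding 0.8 would push
        // t above that node's own load, so it is excluded.
        System.out.println(targetSpread(new double[]{1.3, 1.2, 0.8, 0.7}, 256));
    }
}
```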

The attached code implements a simple version of this which improves 
overutilization very quickly with every new node; a typical simulation looks 
like:
{code}
Random generation of 1000 nodes with 256 tokens each
Size 1000   max 1.24 min 0.80   No replication
Adding 1 node(s) using NoReplicationTokenDistributor
Size 1001   max 1.11 min 0.80   No replication
Adding 9 node(s) using NoReplicationTokenDistributor
Size 1010   max 1.05 min 0.81   No replication
Adding 30 node(s) using NoReplicationTokenDistributor
Size 1040   max 1.02 min 0.83   No replication
Adding 210 node(s) using NoReplicationTokenDistributor
Size 1250   max 1.00 min 1.00   No replication
{code}
It also constructs clusters from empty pretty well.

However, when replication is present the load distribution of this allocation 
does not look good (the added node tends to take much more than it should; one 
reason for this is that it becomes a replica of the token ranges it splits), 
which is not unexpected. I am now trying to see how exactly taking replication 
into account affects the reasoning above. We can still only remove load, but 
the way splitting affects the loads is not that clear any more.

As far as I can see the following simplification of Cassandra's replication 
strategies should suffice for handling the current and planned variations:
* we have units made up of a number of vnodes whose load we want to be able to 
balance (currently unit==node, but in the future the unit could be smaller (a 
disk or core))
* units are bunched up in racks (if racks are not defined, a node is implicitly 
a rack for its units)
* replicas of data must be placed on the closest higher vnodes that belong to 
different racks
* the replication strategy specifies the number of replicas and the set of 
units belonging to each rack

Datacentres are irrelevant as replication is specified within each dc, i.e. we 
can isolate the vnode allocation to the individual dc. If disk/core-level 
allocation is in place, the node boundaries within a rack can be ignored as 
well. Is there anything I'm missing?


[~benedict]: I believe you prefer to split the disk/core workload inside the 
node by assigning a token range (e.g. the vnodes that intersect with a range 
corresponding to _1/n_ of the token ring are to be handled by that disk/core). 
I prefer to just choose _1/n_ of the vnodes, because it lets me directly 
balance them; do you have any objections to this?



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241349#comment-14241349
 ] 

Benedict commented on CASSANDRA-7032:
-

If you mean for V vnode tokens in ascending order [0..V), and e.g. D disks, the 
disks would own one of the token lists in the set { [dV/D..(d+1)V/D) : 0 ≤ d < 
D }, and you guarantee that the owned range of each list is balanced with the 
other lists, this seems pretty analogous to the approach I was describing and 
perfectly reasonable.
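
The partition described above has a simple closed form: vnode index i (in ascending token order) lands on disk floor(i·D/V). A minimal sketch, with hypothetical names of my own choosing:

```java
// Illustrative sketch of the per-disk vnode partition described above:
// disk d of D owns the vnodes whose position in ascending token order
// falls in [d*V/D, (d+1)*V/D). Names are hypothetical.
public class DiskPartition {
    static int diskForVnode(int vnodeIndex, int totalVnodes, int disks) {
        // Integer form of floor(vnodeIndex * D / V), which inverts the
        // [d*V/D, (d+1)*V/D) bucketing; the long cast avoids int overflow.
        return (int) ((long) vnodeIndex * disks / totalVnodes);
    }

    public static void main(String[] args) {
        // V = 8 vnodes over D = 3 disks: buckets {0,1,2}, {3,4,5}, {6,7}.
        for (int i = 0; i < 8; i++)
            System.out.print(diskForVnode(i, 8, 3) + " "); // 0 0 0 1 1 1 2 2
    }
}
```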

The main goal is only that once a range or set of vnode tokens has been 
assigned to a given resource (disk, cpu, node, rack, whatever) that resource 
never needs to reassign its tokens.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232857#comment-14232857
 ] 

Benedict commented on CASSANDRA-7032:
-

Well, NetworkTopologyStrategy already enforces some degree of balance across 
racks, and absolutely guarantees balance across DCs as far as replication 
ownership is concerned. It _would_ be nice to migrate this behaviour into 
token selection so that we could reason about ownership a bit more clearly: 
NTS may enforce our general ownership constraints, but a predictably cheap 
generation strategy for endpoints would be great, since the amount of state 
necessary to route queries could shrink dramatically if, for instance, we 
could rely on a sequence of adjacent tokens ensuring these properties. Still, 
the simpler goal of ensuring that, for any given arbitrary slice of the 
global token range, every node has a share of the range that is within 
epsilon of perfect should be more than sufficient.

TL;DR: our goal should probably be: for any given arbitrary slice of the 
global token range, all nodes have a share of the range that is within epsilon* 
of perfect.

* with epsilon probably inversely proportional to the size of the slice
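
The slice criterion above can be stated directly as code. This is a hypothetical check on a toy ring (the class name and unit-interval tokens are assumptions, not project code): for a slice [a, b) it reports the worst deviation of any node's share of the slice from the ideal 1/N.

```java
import java.util.*;

// Checks the slice-balance criterion on a toy ring over the unit interval:
// for a slice [a, b), every node's share of the slice should be near 1/N.
public class SliceBalance {
    // tokens must be sorted ascending in [0, 1); owner[i] owns
    // the range starting at tokens[i]
    static double maxDeviation(double[] tokens, int[] owner, int nodes,
                               double a, double b) {
        double[] share = new double[nodes];
        for (int i = 0; i < tokens.length; i++) {
            double start = tokens[i];
            double end = tokens[(i + 1) % tokens.length];
            if (end <= start) end += 1.0;           // wrap-around range
            // overlap with [a, b), checking the unwrapped copy [a+1, b+1) too
            double lo = Math.max(start, a), hi = Math.min(end, b);
            if (hi > lo) share[owner[i]] += hi - lo;
            lo = Math.max(start, a + 1); hi = Math.min(end, b + 1);
            if (hi > lo) share[owner[i]] += hi - lo;
        }
        double slice = b - a, worst = 0;
        for (double s : share)
            worst = Math.max(worst, Math.abs(s / slice - 1.0 / nodes));
        return worst;
    }

    public static void main(String[] args) {
        // two nodes with perfectly interleaved vnodes: deviation is zero
        double[] tokens = {0.0, 0.25, 0.5, 0.75};
        int[] owner = {0, 1, 0, 1};
        System.out.println(maxDeviation(tokens, owner, 2, 0.0, 0.5));
    }
}
```

As the footnote suggests, the acceptable return value of such a check would grow as the slice shrinks.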



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231419#comment-14231419
 ] 

Branimir Lambov commented on CASSANDRA-7032:


[~benedict], can you point me to some more information on the imbalance that is 
known to appear and its behaviour with an increasing number of nodes? I'd like 
to get a better understanding of the problem, and of how much of it is caused 
by replica selection rather than token imbalance.

It seems to me that the best approach here is to build some replication-strategy 
/ network-topology awareness into the algorithm so it can account for replica 
selection. This will complicate the algorithm but, in addition to giving better 
balance, could also improve the time spent finding replicas 
(e.g. CASSANDRA-6976).



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231425#comment-14231425
 ] 

Benedict commented on CASSANDRA-7032:
-

It's plain old statistics. Have a look at the Java code I attached, which 
simulates and reports the level of imbalance. Currently we assign tokens 
randomly, and this leaves some nodes with all of their token ranges narrow 
relative to the other existing tokens, and other nodes with wider ones.

Consistent hashing is what Riak uses to achieve balance, which is one approach. 
Rendezvous hashing is another. But these would likely involve changing the 
tokens of every node in the cluster on adding a new node. This would be 
acceptable, but I expect with the amount of state space to work with we can 
design an algorithm that guarantees low bounds of imbalance without having to 
change the tokens assigned to any existing nodes.
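
To see the statistics at work, here is a small Monte-Carlo sketch (a toy model with unit-interval tokens and round-robin vnode ownership; the class name and setup are assumptions, not the attached simulator) that reports the most-loaded node's ownership relative to the mean under purely random assignment:

```java
import java.util.*;

// Monte-Carlo illustration of the imbalance from purely random token
// assignment (toy model; real tokens are 64-bit values).
public class RandomImbalance {
    static double maxOverMean(int nodes, int vnodes, Random rnd) {
        int total = nodes * vnodes;
        double[] tokens = new double[total];
        int[] owner = new int[total];
        for (int i = 0; i < total; i++) {
            tokens[i] = rnd.nextDouble();
            owner[i] = i % nodes;            // vnodes dealt round-robin
        }
        // sort token positions, keeping owners aligned via an index array
        Integer[] idx = new Integer[total];
        for (int i = 0; i < total; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> tokens[i]));
        double[] own = new double[nodes];
        for (int k = 0; k < total; k++) {
            double start = tokens[idx[k]];
            double end = tokens[idx[(k + 1) % total]];
            own[owner[idx[k]]] += (end <= start ? end + 1.0 : end) - start;
        }
        double max = Arrays.stream(own).max().getAsDouble();
        return max * nodes;                  // mean ownership is 1/nodes
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        System.out.printf("4 nodes:  %.2fx mean%n", maxOverMean(4, 256, rnd));
        System.out.printf("64 nodes: %.2fx mean%n", maxOverMean(64, 256, rnd));
    }
}
```

With more nodes there are more draws from the same ownership distribution, so the worst node drifts further from the mean, matching the observation that random allocation gets worse as the cluster grows.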



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231433#comment-14231433
 ] 

Benedict commented on CASSANDRA-7032:
-

I should note that the dovetailing with CASSANDRA-6696 is very important. 
Acceptable imbalance _per node_ is actually not _too_ tricky to deliver, but 
ensuring each disk on each node gets a fair share of the pie is a little 
harder.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231496#comment-14231496
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Per-disk balance is just another level in the hierarchy. Ideally we would like 
per-disk, per-node, per-rack and per-datacentre balance (configurable by number 
of vnodes), wouldn't we? Presumably with highest emphasis on the lower levels.

Ignoring replica selection, this all comes for free if we can ensure equal 
vnode size (e.g. by reassigning all tokens on adding a node). With reassignment 
it should also be trivial to build the network topology into the token 
assignment.

As I see it there are two separate objectives:
- to build clusters incrementally by introducing and maintaining _some_ 
imbalance in the ring, which can be exploited to avoid reassignment.
- to improve the balance in existing, probably highly unbalanced clusters, 
built without the above algorithm in mind.

The former may turn out to solve the latter, but not necessarily. In any case I 
intend to look at it in isolation first and then think about how it would apply 
to existing clusters.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231559#comment-14231559
 ] 

Jeremiah Jordan commented on CASSANDRA-7032:


Reassignment of existing tokens is not really an acceptable way to achieve 
this: it requires shuffling the data that belongs to those tokens as well. 
We have already seen that shuffle is hard to get right and resource-intensive, 
and we removed the previous attempt at it from the code base.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232062#comment-14232062
 ] 

Tyler Hobbs commented on CASSANDRA-7032:


I agree with [~jjordan] here.  One of the main motivations for moving to 
multiple tokens was to avoid expensive rebalance operations when nodes are 
added or removed. Changing that is likely to be a no-go.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-02 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232076#comment-14232076
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Perhaps I wasn't clear enough. Reassigning is not an option I am considering, I 
mentioned it as a way to imagine what an optimal balance would look like. The 
point is to get close enough results without it.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-09-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138983#comment-14138983
 ] 

Jonathan Ellis commented on CASSANDRA-7032:
---

Curious what [~soverton] thinks of this.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-09-18 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139167#comment-14139167
 ] 

Brandon Williams commented on CASSANDRA-7032:
-

It seems that our main problem when choosing a distribution for vnodes is not 
knowing what the existing distribution in the cluster is. Now that we have 
shadow gossip (and already use it during bootstrap), we could learn this before 
choosing our own tokens and so achieve a perfect distribution. Of course, for a 
new cluster where you aren't bootstrapping, algorithmic improvements are still 
worthwhile.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-04-22 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977190#comment-13977190
 ] 

Tupshin Harper commented on CASSANDRA-7032:
---

It seems like there might be a way to constrain vnode RDF (replication 
distribution factor) within the general scope of this ticket as well. Call it a 
nice-to-have, should it present itself.



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968319#comment-13968319
 ] 

Benedict commented on CASSANDRA-7032:
-

Uploaded a new version with debugging info on the per-node disk distribution we 
would get if we evenly split the token range across a given number of disks, as 
suggested for CASSANDRA-6696. It looks pretty promising here too: no more than 
~5% stdev of ownership per disk (declining as more disks are added) as a 
proportion of the node's total owned range.
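
A rough way to sanity-check per-disk spread (a hypothetical sketch, not the scheme in the attached test: the class name, greedy largest-first dealing, and unit-interval widths are all assumptions): deal a node's vnode ranges to disks, largest range to the least-loaded disk, then measure the stdev of disk load as a fraction of the node's total.

```java
import java.util.*;

// Toy check of per-disk balance: assign a node's vnode ranges to disks
// greedily (largest range to least-loaded disk), then measure the spread.
public class DiskBalance {
    static double stdevFraction(double[] rangeWidths, int disks) {
        double[] load = new double[disks];
        double[] widths = rangeWidths.clone();
        Arrays.sort(widths);                        // ascending
        for (int i = widths.length - 1; i >= 0; i--) {
            int least = 0;                          // least-loaded disk so far
            for (int d = 1; d < disks; d++)
                if (load[d] < load[least]) least = d;
            load[least] += widths[i];
        }
        double total = Arrays.stream(load).sum();
        double mean = total / disks, var = 0;
        for (double l : load) var += (l - mean) * (l - mean);
        return Math.sqrt(var / disks) / total;      // stdev as share of total
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] widths = new double[256];          // 256 vnode range widths
        for (int i = 0; i < widths.length; i++) widths[i] = rnd.nextDouble();
        System.out.printf("stdev/total: %.4f%n", stdevFraction(widths, 6));
    }
}
```

With 256 random ranges over 6 disks, this greedy dealing lands well under the ~5% figure quoted above; the interesting case is few, uneven ranges per disk.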



[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-04-14 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969100#comment-13969100
 ] 

Jeremiah Jordan commented on CASSANDRA-7032:


Make sure to have rack awareness be part of the algorithm.
