Just to add on to your response:
*num_tokens* defines the number of vnodes a node can have. The default is 256.
The *initial token* range is predefined (for Murmur3, -2**63 to 2**63-1).
So if you have a one-node cluster (which does not make sense) with num_tokens
set to 256, then you will have 256 vnodes. Scaling up wil
I will try to give a made-up example to show what I understand.
Let us assume our hash function outputs a number between 1 and 10,000.
So hash(primary-key) is between 1 and 10,000
Prior to vnodes, the above 1 to 10k range was split among the nodes.
With vnodes, this 10k range is now split into say
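
To make the pre-vnode picture above concrete, here is a minimal sketch of splitting the made-up 1 to 10,000 hash range into one contiguous range per node. The node count (4), the node names, and the helper names `contiguous_ranges` and `owner` are assumptions for illustration only; this is not how Cassandra itself is implemented.

```python
# Toy hash range from the made-up example in this thread.
HASH_MIN, HASH_MAX = 1, 10_000


def contiguous_ranges(node_names):
    """Split [HASH_MIN, HASH_MAX] into one contiguous range per node (pre-vnode layout)."""
    total = HASH_MAX - HASH_MIN + 1
    size = total // len(node_names)
    ranges = {}
    start = HASH_MIN
    for i, name in enumerate(node_names):
        # The last node absorbs any remainder so the whole range is covered.
        end = HASH_MAX if i == len(node_names) - 1 else start + size - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


def owner(ranges, hashed_key):
    """Return the node whose range contains the hashed key."""
    for name, (lo, hi) in ranges.items():
        if lo <= hashed_key <= hi:
            return name
    raise ValueError("hash outside the toy range")


# Assumption: a 4-node cluster with hypothetical node names.
ranges = contiguous_ranges(["node1", "node2", "node3", "node4"])
print(ranges)
print(owner(ranges, 5700))
```

Each node owns exactly one slice here, which is why adding or removing a node in this layout shifts large contiguous ranges between neighbours.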
I think your mental model here is trying to map a different db concept
(like Elasticsearch shards) to a distributed hash table that doesn't really
map that way.
There's no such physical thing as a vnode. Vnode, as a concept, is "a single
node runs multiple tokens and owns multiple ranges". Multiple rang
Thanks Jeff.
One follow-up question please: Each node specifies num_tokens.
So if there are 4 nodes and each specifies 256 tokens, then it means
together they are responsible for 1024 vnodes.
Now, when a fifth node joins and has num_tokens set to 256 as well, then
does the system have 1024+256 = 1
When a machine starts for the first time, the joining node basically
chooses a number of tokens (num_tokens) randomly within the range of the
partitioner (for murmur3, -2**63 to 2**63-1), and then bootstraps to claim
them.
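
As a rough sketch of the random selection described above (not Cassandra's actual code), a joining node picking num_tokens random tokens inside the Murmur3 range could look like the following. The helper name `pick_random_tokens`, the `taken` parameter, and the fixed seed are assumptions made only for this illustration.

```python
import random

# Murmur3 partitioner token range, as stated in this thread.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def pick_random_tokens(num_tokens, taken=(), rng=None):
    """Hypothetical helper: pick num_tokens distinct random tokens not already in the ring."""
    rng = rng or random.Random()
    taken = set(taken)
    chosen = set()
    while len(chosen) < num_tokens:
        candidate = rng.randint(MIN_TOKEN, MAX_TOKEN)
        # Skip tokens another node already holds (assumed collision rule).
        if candidate not in taken:
            chosen.add(candidate)
    return sorted(chosen)


# Seeded only so the sketch is repeatable; a real node would not do this.
tokens = pick_random_tokens(256, rng=random.Random(42))
print(len(tokens), tokens[0], tokens[-1])
```

Because the tokens are scattered at random, the ranges a node ends up owning are discontiguous and uneven in size.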
This is sort of a lie; in newer versions, we try to make it a bit more
determ
Thanks Jeff.
I think what you explained below is before and after vnodes introduction.
The vnodes part is clear - how each node holds a small range of tokens and
how each node holds a discontiguous set of vnodes.
1. What is not clear is how each node decides what vnodes it will get.
If it we
Vnodes are implemented by giving a single process multiple tokens.
Tokens ultimately determine which data lives on which node. When you hash a
partition key, it gives you a token (let's say 570). The 3 processes that
own token 570 are the ones holding the next 3 tokens in the ring ABOVE 570, so if you had
A = 0
B =
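
The ring walk described above can be sketched roughly as follows. The ring contents, the node names n1 to n4, and the `replicas` helper are all made up for illustration and are not the values from the example in this message. The sketch also assumes a replication factor of 3 and that a node already chosen is skipped, so the 3 replicas land on 3 different nodes.

```python
from bisect import bisect_left

# Hypothetical ring: token -> node name. Each node holds several tokens (vnodes).
ring = {
    100: "n1", 300: "n2", 500: "n3", 600: "n4",
    700: "n1", 800: "n4", 900: "n2", 1000: "n3",
}


def replicas(ring, key_token, rf=3):
    """Walk the ring clockwise from key_token and collect rf distinct nodes."""
    tokens = sorted(ring)
    # First ring token at or above the key's token, wrapping past the end.
    start = bisect_left(tokens, key_token) % len(tokens)
    owners = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        # Assumption: skip a node that already holds a replica.
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


print(replicas(ring, 570))
```

In this toy ring, token 570 falls just below 600, so the walk starts at n4, continues to n1 at 700, skips the second n4 token at 800, and finishes at n2.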
Hello,
Going through
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeDistribute.html
.
But it is not clear how a node decides where each of its vnodes will be
replicated to.
As an example from the above page:
1. Why is vnode A present in nodes 1,2 and