[ 
https://issues.apache.org/jira/browse/CASSANDRA-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-15260:
----------------------------
    Description: 
Similar to DSE's option: {{allocate_tokens_for_local_replication_factor}}

Currently the 
[ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm]
 requires a defined keyspace and a replication factor specified for the current 
datacenter.

This is problematic in a number of ways. The real keyspace cannot be used when 
adding a new datacenter because, in practice, all of the new datacenter's nodes 
need to be up and running before it has the capacity to receive replicated 
data. Adding a new datacenter (or lift-and-shifting a cluster via datacenter 
migration) therefore has to be done using a dummy keyspace that duplicates the 
replication strategy and factor of the real keyspace. This becomes even more 
difficult in version 4.0, as the replication factor cannot even be defined for 
a new datacenter before that datacenter is up and running.
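
As a rough illustration of the workaround described above, a new node's 
cassandra.yaml might contain something like the following. 
{{allocate_tokens_for_keyspace}} is the existing option; the keyspace name 
{{dummy_ks}} and the {{num_tokens}} value are placeholders assumed for this 
sketch, not taken from the patch.

```yaml
# Sketch only: dummy_ks is a hypothetical keyspace, created beforehand with the
# same replication strategy and factor as the real keyspace.
num_tokens: 16                          # illustrative value
allocate_tokens_for_keyspace: dummy_ks  # existing option; the keyspace must already exist
```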

The introduction of an {{allocate_tokens_for_dc_rf}} option removes these 
issues by avoiding the keyspace definition and lookup and presuming the 
replication strategy is per datacenter, i.e. NetworkTopologyStrategy (NTS).
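
A minimal sketch of how the proposed option might be set in cassandra.yaml, 
assuming it takes the local datacenter's replication factor as an integer. The 
option name is the one proposed in this ticket; the {{num_tokens}} value is 
only illustrative.

```yaml
# Sketch only: allocate_tokens_for_dc_rf is the option proposed in this ticket.
num_tokens: 16                # illustrative value
allocate_tokens_for_dc_rf: 3  # replication factor assumed for the local datacenter
```

With this form no keyspace has to exist before the node bootstraps.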

It may also be worth considering whether {{allocate_tokens_for_dc_rf=3}} 
should become the default, as this is the replication factor for the vast 
majority of datacenters in production. I suspect this would be a good 
improvement over the existing random token generation.

Initial patch is available in 
[https://github.com/thelastpickle/cassandra/commit/fc4865b0399570e58f11215565ba17dc4a53da97]

The patch does not remove the existing {{allocate_tokens_for_keyspace}} option, 
as it still provides the code path for handling other replication strategies.

  was:
Similar to option in DSE `allocate_tokens_for_local_replication_factor`

Currently the 
[ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm]
 requires a defined keyspace and a replica factor specified in the current 
datacenter.

This is problematic in a number of ways. Come version 4.0 the replica factor 
can not be defined in new datacenters before those datacenters are up and 
running. Previously even real keyspaces could not be used as a new datacenter 
has to, in practice, have all its nodes up and running before it has the 
capacity to replicate data into it. New datacenters, or lift-and-shifting a 
cluster via datacenter migration, can be done using a dummy keyspace that 
duplicates the replication strategy and factor of the real keyspace.

These issues are reduced by avoiding the keyspace definition and lookup, and 
presuming the replica strategy is by datacenter, ie NTS, with the introduction 
of an `allocate_tokens_for_dc_rf` option.

It may also be of value considering whether `allocate_tokens_for_dc_rf=3` is 
the default, as this is the replication factor for the vast majority of 
datacenters in production. I suspect this would be a good improvement over the 
existing randomly generated tokens algorithm.

Initial patch is available in 
https://github.com/thelastpickle/cassandra/commit/fc4865b0399570e58f11215565ba17dc4a53da97


> Add `allocate_tokens_for_dc_rf` yaml option for token allocation
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15260
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15260
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: mck
>            Assignee: mck
>            Priority: Normal


