[ 
https://issues.apache.org/jira/browse/CASSANDRA-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892392#action_12892392
 ] 

Jeremy Hanna commented on CASSANDRA-1066:
-----------------------------------------

There was some discussion on whether the replication strategy should be part of 
the keyspace metadata.  I'll try to distill here why I think it should be.

Problem:
Currently the DatacenterShardStrategy uses configuration options found in a 
properties file.  Work has been done to reload this automatically.  However, 
what happens when 0.7 adds dynamic keyspaces?  A client can't just add a 
keyspace with DSS.  They would first have to update their DSS configuration 
file to include settings for the new keyspace.  Then they would refresh that 
configuration and create their keyspace.  That's pretty onerous for a client to 
have to do.

Also currently, the replication strategies are separate from the keyspace 
metadata, even though there is a 1:1 relationship between KSM and the 
replication strategy.  That results in various utility methods and special 
casing in StorageService and DatabaseDescriptor to handle these separate quasi 
singleton replication strategies.  For example, the special cases to init and 
clean replication strategies when setting and clearing table definitions in 
DatabaseDescriptor.  In the past we've done it this way because it's worked and 
because there was no state for a strategy.  Now there is - DSS configuration.

Solution:
I proposed making RS an instance variable of KSM.  It would do in a more direct 
way what we had previously been doing in a roundabout way - maintaining their 
1:1 relationship more cleanly.  It's been said that the KSM should contain 
only storage data.  Currently we already store the replication strategy class.  
The configuration options are the only thing that would be added in this 
scenario.  When serializing and deserializing, we just store the keyspace name, 
class name, and configuration options (Map<String,String>).  This is immutable 
data.  The TokenMetadata and Snitch are just references to the current TM and 
Snitch.  Every time a KSM is deserialized, it just gets the current TM and 
Snitch along with the other info to create a new RS instance.
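
To make the shape of this concrete, here is a minimal sketch of what I have in 
mind.  It is not the patch: every class below is a hypothetical stand-in (none 
of them are the real Cassandra classes), the line-based serialized form is just 
for illustration and assumes names and option keys contain no newlines or '=', 
and the DC1/DC2 option keys are made up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins throughout; none of these are Cassandra's real classes.
class KsmSketch {
    /** Placeholders for the node's current TokenMetadata and snitch. */
    static final class TokenMetadata {}
    static final class Snitch {}

    /** A strategy instance: per-keyspace options plus references to shared state. */
    static final class ReplicationStrategy {
        final String className;
        final Map<String, String> options;
        final TokenMetadata tokenMetadata;
        final Snitch snitch;

        ReplicationStrategy(String className, Map<String, String> options,
                            TokenMetadata tokenMetadata, Snitch snitch) {
            this.className = className;
            this.options = options;
            this.tokenMetadata = tokenMetadata;
            this.snitch = snitch;
        }
    }

    /** Keyspace metadata that owns exactly one strategy instance (the 1:1 relationship). */
    static final class KeyspaceMetadata {
        final String name;
        final ReplicationStrategy strategy;

        KeyspaceMetadata(String name, String strategyClass, Map<String, String> options,
                         TokenMetadata tm, Snitch snitch) {
            this.name = name;
            // Defensive, read-only copy so the stored options stay immutable.
            Map<String, String> copy =
                Collections.unmodifiableMap(new LinkedHashMap<String, String>(options));
            this.strategy = new ReplicationStrategy(strategyClass, copy, tm, snitch);
        }

        /** Writes only the immutable parts: keyspace name, class name, options. */
        String serialize() {
            StringBuilder sb = new StringBuilder(name).append('\n').append(strategy.className);
            for (Map.Entry<String, String> e : strategy.options.entrySet()) {
                sb.append('\n').append(e.getKey()).append('=').append(e.getValue());
            }
            return sb.toString();
        }

        /** Rebuilds the KSM with a new strategy wired to the TM and snitch passed in. */
        static KeyspaceMetadata deserialize(String data, TokenMetadata tm, Snitch snitch) {
            String[] lines = data.split("\n");
            Map<String, String> options = new LinkedHashMap<String, String>();
            for (int i = 2; i < lines.length; i++) {
                int eq = lines[i].indexOf('=');
                options.put(lines[i].substring(0, eq), lines[i].substring(eq + 1));
            }
            return new KeyspaceMetadata(lines[0], lines[1], options, tm, snitch);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<String, String>();
        options.put("DC1", "2"); // made-up per-datacenter settings
        options.put("DC2", "1");
        TokenMetadata tm = new TokenMetadata();
        Snitch snitch = new Snitch();
        KeyspaceMetadata ksm =
            new KeyspaceMetadata("Keyspace1", "DatacenterShardStrategy", options, tm, snitch);
        KeyspaceMetadata copy = KeyspaceMetadata.deserialize(ksm.serialize(), tm, snitch);
        System.out.println(copy.name + " " + copy.strategy.options);
    }
}
```

The point of the sketch is that nothing about the TM or Snitch is written out; 
they are handed in again at deserialization time, so the stored form stays 
limited to the name, class name, and options.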

Alternatives?
Are there any alternatives to doing it this way?  We could possibly extend what 
we're doing with the external model for replication strategies so that they 
would include state.  That would make them external to the KSM but be specific 
for each KSM (removing the quasi-singleton behavior).  That would be less of a 
change, but seems more hackish to me.

I would welcome other alternatives.  I just think a dynamic/automatable way 
to create keyspaces shouldn't need to be handled specially for those using 
the DSS in 0.7.

> DatacenterShardStrategy needs enforceable and keyspace based RF
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-1066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 1066-changes-patch.txt, 1066.txt
>
>
> Currently, the DatacenterShardStrategy reads in a properties file - 
> datacenters.properties - to get a per-datacenter replication factor.  So any 
> keyspace that is using the DSS in the cluster is using that same properties 
> file to configure its replication factor.  The implementation doesn't take 
> into account the per-keyspace replication factor, but it is assumed that the 
> sum of all the datacenter RF values equals the per-keyspace replication value 
> that is part of the keyspace metadata.
> It seems that an improvement could be two-fold:
> 1. Enforce that the replication factor for the keyspace always equals the 
> sum of all the datacenter RF values.  Otherwise, if they aren't equal, bad 
> things (tm) can happen.
> 2. Make the datacenter RF values part of the keyspace metadata rather than a 
> global value.  Again, currently if any keyspace in the cluster is configured 
> to use DSS, it will be using the global DC RF values found in the properties 
> file.  An improvement could be to configure that on a per-keyspace basis 
> instead of having the properties file.  That would make the cluster more 
> multi-tenant friendly so it could be flexible with multiple keyspaces.
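
As a rough illustration of point 1 in the description above, the check could 
look something like the following.  This is only a sketch under assumptions: 
the class, the method names, and the DC1/DC2 names are hypothetical and do not 
come from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra or of the attached patches.
class RfCheckSketch {
    /** Adds up the per-datacenter replication factors. */
    static int sumDatacenterRf(Map<String, Integer> datacenterRf) {
        int sum = 0;
        for (int rf : datacenterRf.values()) {
            sum += rf;
        }
        return sum;
    }

    /** Rejects a keyspace whose RF differs from the sum of its datacenter RFs. */
    static void validate(String keyspace, int keyspaceRf, Map<String, Integer> datacenterRf) {
        int sum = sumDatacenterRf(datacenterRf);
        if (sum != keyspaceRf) {
            throw new IllegalArgumentException("Keyspace " + keyspace + " has RF " + keyspaceRf
                + " but its datacenter RFs sum to " + sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> datacenterRf = new LinkedHashMap<String, Integer>();
        datacenterRf.put("DC1", 2); // made-up datacenter names and values
        datacenterRf.put("DC2", 1);
        validate("Keyspace1", 3, datacenterRf); // passes: 2 + 1 == 3
        System.out.println("Keyspace1 replication settings are consistent");
    }
}
```

With the datacenter RF values stored per keyspace (point 2), a check like this 
could run whenever a keyspace definition is created or changed.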

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
