[ 
https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228120#comment-13228120
 ] 

Tyler Hobbs commented on CASSANDRA-3810:
----------------------------------------

bq. I would be in favor of removing the current concept of rack awareness and 
better documenting the way to achieve distribution among racks...

I agree, although we'd presumably want to deprecate NTS (similar to what was 
done for ONTS) and make a new MultiDCStrategy.

bq. I'd rather see nodetool get fixed so that imbalances can easily be seen...

I think the correct way to do that would be to show the percentage of the total 
ring that the node owns for each of the defined keyspaces.  You could add a 
column to the output for each keyspace, but since the current output is already 
fairly wide, perhaps a separate table below the current output or a separate 
command entirely would be appropriate.
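
As a rough illustration of what such a per-keyspace ownership figure would capture, here is a minimal sketch in Python. It is not Cassandra's implementation: the {{replicas}} and {{effective_ownership}} helpers are hypothetical, every node is assumed to have an equally sized primary range, and the six nodes from the issue description are treated as the whole ring.

```python
from fractions import Fraction

# Hypothetical ring in token order as (node, rack) pairs, taken from the
# example in the issue description and treated here as the whole ring.
RING = [
    ("r1-n1", "r1"), ("r1-n2", "r1"), ("r1-n3", "r1"),
    ("r2-n1", "r2"), ("r3-n1", "r3"), ("r4-n1", "r4"),
]


def replicas(ring, start, rf):
    """Return the rf replicas for the range whose primary is ring[start].

    Simplified rack-aware walk: go clockwise from the primary and prefer
    nodes in racks not used yet; nodes skipped for sharing a rack are only
    used once distinct racks run out.
    """
    chosen, skipped, racks = [], [], set()
    for step in range(len(ring)):
        node, rack = ring[(start + step) % len(ring)]
        if rack in racks:
            skipped.append(node)
            continue
        chosen.append(node)
        racks.add(rack)
        if len(chosen) == rf:
            break
    return (chosen + skipped)[:rf]


def effective_ownership(ring, rf, rack_aware=True):
    """Fraction of all data each node stores, counting every replica.

    Assumes each node's primary range is 1/len(ring) of the ring.
    """
    if not rack_aware:
        # Giving every node its own rack turns the walk into plain
        # "next rf nodes on the ring" placement.
        ring = [(node, node) for node, _ in ring]
    share = Fraction(1, len(ring))
    owned = {node: Fraction(0) for node, _ in ring}
    for i in range(len(ring)):
        for node in replicas(ring, i, rf):
            owned[node] += share
    return owned


for node, share in effective_ownership(RING, rf=2).items():
    print(f"{node}: {float(share):.1%}")
```

Under these assumptions and with RF=2, {{r2-n1}} stores 4/6 of the data while {{r1-n2}} and {{r1-n3}} store 1/6 each; with {{rack_aware=False}} every node stores 2/6. That gap is the imbalance the current {{nodetool ring}} output does not reveal.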
                
> reconsider rack awareness
> -------------------------
>
>                 Key: CASSANDRA-3810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> We believed we wanted to be rack aware because we want to ensure that losing 
> a rack only affects a single replica of any given row key.
> When using rack awareness, the first problem you encounter if you aren't 
> careful is that rack-aware replica selection induces hotspots. Using the 
> format {{rackname-nodename}}, consider a part of the 
> ring that looks like this:
> {code}
> ...
> r1-n1
> r1-n2
> r1-n3
> r2-n1
> r3-n1
> r4-n1
> ...
> {code}
> Due to the rack awareness, {{r2-n1}} will be the second replica for all data 
> whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since each of 
> them is forced to skip over the other nodes in its own rack.
> The way we end up allocating nodes in a cluster is to satisfy this criterion:
> * Any node in rack {{r}} in a cluster with a replication factor of {{rf}} must 
> not have another node in {{r}} within {{rf-1}} steps in the ring in either 
> direction.
> Any violation of this criterion implies the induction of hotspots due to rack 
> awareness.
> The realization I had a few days ago, however, is that *the rack awareness 
> is not actually changing replica placement* when using this ring topology. 
> In other words, *the way you have to use* rack awareness is to construct the 
> ring such that *the rack awareness is a NOOP*.
> So, questions:
> * Is there any non-hotspot inducing use-case where rack awareness can be used 
> ("used" in the sense that it actually changes the placement relative to 
> non-awareness) effectively without satisfying the criterion above?
> * Is it misleading and counter-productive to teach people (via documentation 
> for example) to rely on rack awareness in their rings instead of just giving 
> them the rule above for ring topology?
> * Would it be a better service to the user to provide an easy way to *ensure* 
> that the ring topology adheres to this criterion (such as refusing to 
> bootstrap a new node if rack awareness is requested, and taking it into 
> consideration on automatic token selection (does anyone use that?)), than to 
> "silently" generate hotspots by altering the replication strategy? (The 
> "silence" problem is magnified by the fact that {{nodetool ring}} doesn't 
> reflect this; so the user must take into account both the RF *and* the racks 
> when interpreting {{nodetool ring}} output.)
> FWIW, internally we just go with the criterion outlined above, and we have a 
> separate tool which will print the *actual* ownership percentage of a node in 
> the ring (based on the thrift {{describe_ring}} call). Any ring that has node 
> selections that cause a violation of the criterion is effectively a 
> buggy/misconfigured ring, so only in the event of mistakes are we "using" the 
> rack awareness (using the definition of "use" above).
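
The ring-topology rule in the description (no two nodes of the same rack within {{rf-1}} steps of each other) can be sketched as a simple check. This is only an illustration under stated assumptions: {{rack_violations}} is a hypothetical helper, not an existing Cassandra or nodetool feature, and both rings below are made-up examples given as (node, rack) pairs in token order.

```python
def rack_violations(ring, rf):
    """List pairs of same-rack nodes that sit within rf-1 ring steps.

    ring is a list of (node, rack) pairs in token order. Only the clockwise
    direction is scanned, since a counter-clockwise neighbour is found when
    the scan starts from that neighbour.
    """
    n = len(ring)
    found = []
    for i, (node, rack) in enumerate(ring):
        for step in range(1, rf):
            j = (i + step) % n
            if j == i:
                break  # wrapped all the way around a very small ring
            other, other_rack = ring[j]
            if other_rack == rack:
                found.append((node, other))
    return found


# The ring fragment from the description, treated as a complete ring.
CLUSTERED = [
    ("r1-n1", "r1"), ("r1-n2", "r1"), ("r1-n3", "r1"),
    ("r2-n1", "r2"), ("r3-n1", "r3"), ("r4-n1", "r4"),
]

# A hypothetical ring whose racks are interleaved.
INTERLEAVED = [
    ("r1-n1", "r1"), ("r2-n1", "r2"), ("r3-n1", "r3"),
    ("r1-n2", "r1"), ("r2-n2", "r2"), ("r3-n2", "r3"),
]

print(rack_violations(CLUSTERED, rf=2))
print(rack_violations(INTERLEAVED, rf=3))
```

The first ring reports the adjacent {{r1}} nodes as violations even at RF=2, while the interleaved ring passes at RF=3. A check along these lines could, for example, run before bootstrapping a node, which is the kind of safeguard the last question in the description asks about.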


        
