[ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537865#comment-13537865 ]
Dominique De Vito commented on CASSANDRA-3810:
----------------------------------------------

> I think the correct way to do that would be to show the percentage of the
> total ring that the node owns for each of the defined keyspaces.

I think there is a complementary way: using SimpleStrategy with RackInferringSnitch or PropertyFileSnitch. The replica placement strategy would keep SimpleStrategy's current behavior.

The idea is that with such snitches, nodetool could provide a simple placement diagnostic. It could report:
- "optimal" : if all replicas are on different racks
- "sub-optimal" : if some replicas are on the same rack, while there are replicas on different racks
- "spof" : if all replicas are on the same rack
- "?" : if the snitch doesn't provide enough information to make a diagnosis
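A minimal sketch of the classification such a diagnostic could apply to one row key's replica set, assuming a plain endpoint-to-rack map in place of a real snitch (the class and all names here are hypothetical, not Cassandra API):

{code}
import java.util.*;

/** Sketch: classify the rack placement of a single replica set. */
public class PlacementDiagnostic
{
    public static String diagnose(List<String> replicaEndpoints, Map<String, String> rackByEndpoint)
    {
        Set<String> racks = new HashSet<>();
        for (String endpoint : replicaEndpoints)
        {
            String rack = rackByEndpoint.get(endpoint);
            if (rack == null)
                return "?";           // snitch can't place this endpoint on a rack
            racks.add(rack);
        }
        if (racks.size() == 1 && replicaEndpoints.size() > 1)
            return "spof";            // every replica shares one rack
        if (racks.size() == replicaEndpoints.size())
            return "optimal";         // every replica on a distinct rack
        return "sub-optimal";         // some, but not all, replicas share a rack
    }

    public static void main(String[] args)
    {
        Map<String, String> racks = new HashMap<>();
        racks.put("10.0.1.1", "r1");
        racks.put("10.0.1.2", "r1");
        racks.put("10.0.2.1", "r2");

        // Two of three replicas land on r1 -> prints "sub-optimal".
        System.out.println(diagnose(Arrays.asList("10.0.1.1", "10.0.1.2", "10.0.2.1"), racks));
    }
}
{code}

A real nodetool check would presumably run this per token range and report the worst classification seen across the ring.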
> reconsider rack awareness
> -------------------------
>
>                 Key: CASSANDRA-3810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> We believed we wanted to be rack aware because we want to ensure that losing
> a rack only affects a single replica of any given row key.
>
> When using rack awareness, the first problem you encounter immediately if
> you aren't careful is that you induce hotspots as a result of rack-aware
> replica selection. Using the format {{rackname-nodename}}, consider a part
> of the ring that looks like this:
> {code}
> ...
> r1-n1
> r1-n2
> r1-n3
> r2-n1
> r3-n1
> r4-n1
> ...
> {code}
> Due to the rack awareness, {{r2-n1}} will be the second replica for all data
> whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since they
> would all be forced to skip over any identical racks.
>
> The way we end up allocating nodes in a cluster is to satisfy this
> criterion:
> * Any node in rack {{r}}, in a cluster with a replication factor of {{rf}},
> must not have another node in {{r}} within {{rf-1}} steps in the ring in
> either direction.
>
> Any violation of this criterion implies the induction of hotspots due to
> rack awareness.
>
> The realization I had a few days ago, however, is that *the rack awareness
> is not actually changing replica placement* when using this ring topology.
> In other words, *the way you have to use* rack awareness is to construct
> the ring such that *the rack awareness is a NOOP*.
>
> So, questions:
> * Is there any non-hotspot-inducing use case where rack awareness can be
> used ("used" in the sense that it actually changes the placement relative
> to non-awareness) effectively, without satisfying the criterion above?
> * Is it misleading and counter-productive to teach people (via
> documentation, for example) to rely on rack awareness in their rings,
> instead of just giving them the rule above for ring topology?
> * Would it be a better service to the user to provide an easy way to
> *ensure* that the ring topology adheres to this criterion (such as refusing
> to bootstrap a new node if rack awareness is requested, and taking it into
> consideration in automatic token selection (does anyone use that?)), than
> to "silently" generate hotspots by altering the replication strategy? (The
> "silence" problem is magnified by the fact that {{nodetool ring}} doesn't
> reflect this; the user must take into account both the RF *and* the racks
> when interpreting {{nodetool ring}} output.)
>
> FWIW, internally we just go with the criterion outlined above, and we have
> a separate tool which will print the *actual* ownership percentage of a
> node in the ring (based on the Thrift {{describe_ring}} call). Any ring
> with node selections that violate the criterion is effectively a
> bug/misconfigured ring, so only in the event of mistakes are we "using"
> the rack awareness (using the definition of "use" above).
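The spacing criterion above is mechanically checkable. Here is a minimal sketch (this is not the internal tool mentioned in the description, and all names are hypothetical) that scans a token-ordered ring for same-rack nodes within {{rf-1}} steps:

{code}
import java.util.*;

/** Sketch: verify that no two same-rack nodes sit within rf-1 ring steps. */
public class RingTopologyCheck
{
    /** @param racksInTokenOrder the rack of each node, in token order around the ring */
    public static List<String> violations(List<String> racksInTokenOrder, int rf)
    {
        int n = racksInTokenOrder.size();
        List<String> found = new ArrayList<>();
        // Scanning forward from every position covers "either direction":
        // each offending pair is reported once, from its first member.
        for (int i = 0; i < n; i++)
        {
            for (int step = 1; step <= rf - 1; step++)
            {
                int j = (i + step) % n;
                if (j != i && racksInTokenOrder.get(i).equals(racksInTokenOrder.get(j)))
                    found.add(String.format("positions %d and %d share rack %s (%d step(s) apart, rf=%d)",
                                            i, j, racksInTokenOrder.get(i), step, rf));
            }
        }
        return found;
    }

    public static void main(String[] args)
    {
        // The ring from the description: three consecutive r1 nodes violate the rule at rf=3.
        List<String> ring = Arrays.asList("r1", "r1", "r1", "r2", "r3", "r4");
        violations(ring, 3).forEach(System.out::println);
    }
}
{code}

A bootstrap-time refusal, as the third question suggests, would amount to running such a check with the candidate node's token inserted and rejecting it if any violation appears.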