[ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537865#comment-13537865 ]
Dominique De Vito commented on CASSANDRA-3810:
----------------------------------------------

> I think the correct way to do that would be to show the percentage of the
> total ring that the node owns for each of the defined keyspaces.

I think there is a complementary way: using SimpleStrategy with RackInferringSnitch or PropertyFileSnitch. The replica placement strategy would keep SimpleStrategy's current behavior.

The idea is that with such snitches, nodetool could provide a simple placement diagnostic. It could report:
- "optimal" : if all replicas are on different racks
- "sub-optimal" : if some replicas are on the same rack, while there are replicas on different racks
- "spof" : if all replicas are on the same rack
- "?" : if the snitch doesn't provide enough information to make a diagnosis
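A minimal sketch of the classification such a diagnostic could apply to one row key's replica set, assuming a plain endpoint-to-rack map in place of a real snitch (the class and all names here are hypothetical, not Cassandra API):

{code}
import java.util.*;

/** Sketch: classify the rack placement of a single replica set. */
public class PlacementDiagnostic
{
    public static String diagnose(List<String> replicaEndpoints, Map<String, String> rackByEndpoint)
    {
        Set<String> racks = new HashSet<>();
        for (String endpoint : replicaEndpoints)
        {
            String rack = rackByEndpoint.get(endpoint);
            if (rack == null)
                return "?";           // snitch can't place this endpoint on a rack
            racks.add(rack);
        }
        if (racks.size() == 1 && replicaEndpoints.size() > 1)
            return "spof";            // every replica shares one rack
        if (racks.size() == replicaEndpoints.size())
            return "optimal";         // every replica on a distinct rack
        return "sub-optimal";         // some, but not all, replicas share a rack
    }

    public static void main(String[] args)
    {
        Map<String, String> racks = new HashMap<>();
        racks.put("10.0.1.1", "r1");
        racks.put("10.0.1.2", "r1");
        racks.put("10.0.2.1", "r2");

        // Two of three replicas land on r1 -> prints "sub-optimal".
        System.out.println(diagnose(Arrays.asList("10.0.1.1", "10.0.1.2", "10.0.2.1"), racks));
    }
}
{code}

A real nodetool check would presumably run this per token range and report the worst classification seen across the ring.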
> reconsider rack awareness
> -------------------------
>
>                 Key: CASSANDRA-3810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> We believed we wanted to be rack aware because we want to ensure that losing
> a rack only affects a single replica of any given row key.
>
> When using rack awareness, the first problem you encounter immediately if
> you aren't careful is that you induce hotspots as a result of rack-aware
> replica selection. Using the format {{rackname-nodename}}, consider a part
> of the ring that looks like this:
> {code}
> ...
> r1-n1
> r1-n2
> r1-n3
> r2-n1
> r3-n1
> r4-n1
> ...
> {code}
> Due to the rack awareness, {{r2-n1}} will be the second replica for all data
> whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since they
> would all be forced to skip over any identical racks.
>
> The way we end up allocating nodes in a cluster is to satisfy this
> criterion:
> * Any node in rack {{r}}, in a cluster with a replication factor of {{rf}},
> must not have another node in {{r}} within {{rf-1}} steps in the ring in
> either direction.
>
> Any violation of this criterion implies the induction of hotspots due to
> rack awareness.
>
> The realization I had a few days ago, however, is that *the rack awareness
> is not actually changing replica placement* when using this ring topology.
> In other words, *the way you have to use* rack awareness is to construct
> the ring such that *the rack awareness is a NOOP*.
>
> So, questions:
> * Is there any non-hotspot-inducing use case where rack awareness can be
> used ("used" in the sense that it actually changes the placement relative
> to non-awareness) effectively, without satisfying the criterion above?
> * Is it misleading and counter-productive to teach people (via
> documentation, for example) to rely on rack awareness in their rings,
> instead of just giving them the rule above for ring topology?
> * Would it be a better service to the user to provide an easy way to
> *ensure* that the ring topology adheres to this criterion (such as refusing
> to bootstrap a new node if rack awareness is requested, and taking it into
> consideration in automatic token selection (does anyone use that?)), than
> to "silently" generate hotspots by altering the replication strategy? (The
> "silence" problem is magnified by the fact that {{nodetool ring}} doesn't
> reflect this; the user must take into account both the RF *and* the racks
> when interpreting {{nodetool ring}} output.)
>
> FWIW, internally we just go with the criterion outlined above, and we have
> a separate tool which will print the *actual* ownership percentage of a
> node in the ring (based on the Thrift {{describe_ring}} call). Any ring
> with node selections that violate the criterion is effectively a
> bug/misconfigured ring, so only in the event of mistakes are we "using"
> the rack awareness (using the definition of "use" above).
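The spacing criterion above is mechanically checkable. Here is a minimal sketch (this is not the internal tool mentioned in the description, and all names are hypothetical) that scans a token-ordered ring for same-rack nodes within {{rf-1}} steps:

{code}
import java.util.*;

/** Sketch: verify that no two same-rack nodes sit within rf-1 ring steps. */
public class RingTopologyCheck
{
    /** @param racksInTokenOrder the rack of each node, in token order around the ring */
    public static List<String> violations(List<String> racksInTokenOrder, int rf)
    {
        int n = racksInTokenOrder.size();
        List<String> found = new ArrayList<>();
        // Scanning forward from every position covers "either direction":
        // each offending pair is reported once, from its first member.
        for (int i = 0; i < n; i++)
        {
            for (int step = 1; step <= rf - 1; step++)
            {
                int j = (i + step) % n;
                if (j != i && racksInTokenOrder.get(i).equals(racksInTokenOrder.get(j)))
                    found.add(String.format("positions %d and %d share rack %s (%d step(s) apart, rf=%d)",
                                            i, j, racksInTokenOrder.get(i), step, rf));
            }
        }
        return found;
    }

    public static void main(String[] args)
    {
        // The ring from the description: three consecutive r1 nodes violate the rule at rf=3.
        List<String> ring = Arrays.asList("r1", "r1", "r1", "r2", "r3", "r4");
        violations(ring, 3).forEach(System.out::println);
    }
}
{code}

A bootstrap-time refusal, as the third question suggests, would amount to running such a check with the candidate node's token inserted and rejecting it if any violation appears.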