[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984783#comment-14984783 ]
Jeff Jirsa edited comment on CASSANDRA-7306 at 11/2/15 9:52 PM:
----------------------------------------------------------------

I've implemented some of this, primarily for my own education (learning some of the internals of gossip better).

I've approached this by creating a pluggable IDatacenterTopologyProvider, and implemented a full mesh, file-based whitelist, and file-based blacklist provider. I then extended Gossiper to filter its list of live endpoints by calling the IDatacenterTopologyProvider instance to filter non-gossipable endpoints, which seems to fit the goal of this ticket. This enables not only hub/spoke, but arbitrary graphs of database connectivity.

However, the ticket is pretty poorly defined in terms of behaviors.

[~tupshin], this ticket title mentions "more flexible gossip" - does this carry into requests/CL as well?

What's the desired/expected behavior if a KS uses NTS to have rf=3 in dcs a, b, and c, but hosts in dc=b are set not to gossip with hosts in dc=c, and vice versa?

CL=ALL fails, CL=QUORUM fills with a+b, and writes just assume all nodes in c are down? Or should it be smart enough to know that c is disconnected, and not count hosts in c towards quorum/ALL?

My primary hangup is finding the right way to notify the KS replication strategy to reload if the list of whitelisted/blacklisted DCs changes. I know it's a solvable problem, but if it's out of scope, I won't waste time with it.

I realize that this is a {{ponies}} ticket, and there's a ton of bike-shed/ponies opportunity here, but if we can get some consensus on definition, I can try to get this to a point where it can potentially be ready for real review.


was (Author: jjirsa):
This ticket title mentions "more flexible gossip" - does this carry into requests/CL as well?

What's the desired/expected behavior if a KS uses NTS to have rf=3 in dcs {a,b,c}, but hosts in dc=b are set not to gossip with hosts in dc=c, and vice versa?

CL=ALL fails, CL=QUORUM fills with a+b, and writes just assume all nodes in c are down? Or should it be smart enough to know that c is disconnected, and not count hosts in c towards quorum/ALL?


> Support "edge dcs" with more flexible gossip
> --------------------------------------------
>
>                 Key: CASSANDRA-7306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7306
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tupshin Harper
>              Labels: ponies
>
> As Cassandra clusters get bigger and bigger, and their topology becomes more
> complex, there is more and more need for a notion of "hub" and "spoke"
> datacenters.
> One of the big obstacles to supporting hundreds (or thousands) of remote dcs,
> is the assumption that all dcs need to talk to each other (and be connected
> all the time).
> This ticket is a vague placeholder with the goals of achieving:
> 1) better behavioral support for occasionally disconnected datacenters
> 2) explicit support for custom dc to dc routing. A simple approach would be
> an optional per-dc annotation of which other DCs that DC could gossip with.
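For anyone trying to picture the provider approach described in the comment, here is a rough, standalone sketch of the idea. It is not the actual patch: the interface name DatacenterTopologyProvider, the canGossip method, and the GossipFilter helper are all hypothetical, endpoints are plain strings rather than Cassandra's address types, and in-memory maps stand in for the file-based whitelist/blacklist.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the IDatacenterTopologyProvider named in the comment.
// The real interface's methods are not shown there, so this signature is an assumption.
interface DatacenterTopologyProvider {
    boolean canGossip(String localDc, String remoteDc);
}

// Full mesh: every DC may gossip with every other DC (today's behavior).
final class FullMeshProvider implements DatacenterTopologyProvider {
    public boolean canGossip(String localDc, String remoteDc) {
        return true;
    }
}

// Whitelist: a DC gossips only with itself and the DCs explicitly listed for it.
final class WhitelistProvider implements DatacenterTopologyProvider {
    private final Map<String, Set<String>> allowed;

    WhitelistProvider(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    public boolean canGossip(String localDc, String remoteDc) {
        return localDc.equals(remoteDc)
            || allowed.getOrDefault(localDc, Collections.<String>emptySet()).contains(remoteDc);
    }
}

// Blacklist: a DC gossips with everything except the DCs explicitly listed for it.
final class BlacklistProvider implements DatacenterTopologyProvider {
    private final Map<String, Set<String>> denied;

    BlacklistProvider(Map<String, Set<String>> denied) {
        this.denied = denied;
    }

    public boolean canGossip(String localDc, String remoteDc) {
        return !denied.getOrDefault(localDc, Collections.<String>emptySet()).contains(remoteDc);
    }
}

// Hypothetical helper showing where a gossiper could apply the provider:
// drop live endpoints whose DC the local DC is not allowed to gossip with.
final class GossipFilter {
    static List<String> gossipable(DatacenterTopologyProvider provider,
                                   String localDc,
                                   Map<String, String> liveEndpointToDc) {
        return liveEndpointToDc.entrySet().stream()
            .filter(e -> provider.canGossip(localDc, e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }
}
```

With a blacklist entry of b -> {c}, a node in dc=b would keep endpoints in a and b as gossip candidates and drop those in c. Note that each provider here is directional, so the "and vice versa" case needs a matching c -> {b} entry. The sketch says nothing about the open CL/quorum question raised above.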