[ 
https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984783#comment-14984783
 ] 

Jeff Jirsa edited comment on CASSANDRA-7306 at 11/2/15 9:52 PM:
----------------------------------------------------------------

I've implemented some of this, primarily for my own education (learning some of 
the internals of gossip better). I've approached this by creating a pluggable 
IDatacenterTopologyProvider, and implemented full-mesh, file-based whitelist, 
and file-based blacklist providers. I then extended Gossiper to filter its list 
of live endpoints by calling the IDatacenterTopologyProvider instance to drop 
non-gossipable endpoints, which seems to fit the goal of this ticket. This 
enables not only hub/spoke, but arbitrary graphs of database connectivity. 
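
To make the shape of that concrete, here is a rough sketch of the idea, not the 
actual patch. Only the IDatacenterTopologyProvider name comes from the work 
described above; the method name canGossip, the provider class names, and the 
filterLiveEndpoints helper are assumptions made up for illustration, and the 
file loading for the whitelist/blacklist is left out (the sets are passed in 
directly). 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: apart from the IDatacenterTopologyProvider name,
// nothing here is taken from the actual patch or from Cassandra's API.
final class TopologySketch {

    /** Decides whether a node in localDc may gossip with a node in remoteDc. */
    interface IDatacenterTopologyProvider {
        boolean canGossip(String localDc, String remoteDc);
    }

    /** Full mesh: every DC may gossip with every other DC. */
    static final class FullMeshProvider implements IDatacenterTopologyProvider {
        @Override
        public boolean canGossip(String localDc, String remoteDc) {
            return true;
        }
    }

    /** Whitelist: only the listed remote DCs (and the local DC) are allowed. */
    static final class WhitelistProvider implements IDatacenterTopologyProvider {
        private final Set<String> allowedDcs;

        WhitelistProvider(Set<String> allowedDcs) {
            this.allowedDcs = new HashSet<>(allowedDcs);
        }

        @Override
        public boolean canGossip(String localDc, String remoteDc) {
            return localDc.equals(remoteDc) || allowedDcs.contains(remoteDc);
        }
    }

    /** Blacklist: every remote DC is allowed except the listed ones. */
    static final class BlacklistProvider implements IDatacenterTopologyProvider {
        private final Set<String> blockedDcs;

        BlacklistProvider(Set<String> blockedDcs) {
            this.blockedDcs = new HashSet<>(blockedDcs);
        }

        @Override
        public boolean canGossip(String localDc, String remoteDc) {
            return localDc.equals(remoteDc) || !blockedDcs.contains(remoteDc);
        }
    }

    /**
     * Stand-in for the Gossiper-side filter: keeps only the live endpoints
     * whose DC the provider allows. The map is endpoint address -> DC name.
     */
    static List<String> filterLiveEndpoints(IDatacenterTopologyProvider provider,
                                            String localDc,
                                            Map<String, String> liveEndpointToDc) {
        List<String> gossipable = new ArrayList<>();
        for (Map.Entry<String, String> entry : liveEndpointToDc.entrySet()) {
            if (provider.canGossip(localDc, entry.getValue())) {
                gossipable.add(entry.getKey());
            }
        }
        return gossipable;
    }

    public static void main(String[] args) {
        Map<String, String> live = new LinkedHashMap<>();
        live.put("10.0.0.1", "a");
        live.put("10.0.1.1", "b");
        live.put("10.0.2.1", "c");

        // A node in dc=b that is configured not to gossip with dc=c.
        IDatacenterTopologyProvider provider =
            new BlacklistProvider(new HashSet<>(Arrays.asList("c")));

        // Prints [10.0.0.1, 10.0.1.1]
        System.out.println(filterLiveEndpoints(provider, "b", live));
    }
}
```

Keeping the decision behind a single interface is what lets hub/spoke be just 
one configuration among arbitrary graphs. 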

However, the ticket is pretty poorly defined in terms of behaviors. 

[~tupshin], this ticket title mentions "more flexible gossip" - does this 
carry into requests/CL as well? What's the desired/expected behavior if a KS 
uses NTS to have rf=3 in dcs a, b, and c, but hosts in dc=b are set not to 
gossip with hosts in dc=c, and vice versa? Does CL=ALL fail, CL=QUORUM get 
satisfied by a+b, and writes just assume all nodes in c are down? Or should it 
be smart enough to know that c is disconnected, and not count hosts in c 
towards quorum/ALL? 
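
To put numbers on the two options, here is a small sketch of the arithmetic as 
seen from a coordinator in dc=b. It assumes the usual QUORUM definition of 
(sum of RF / 2) + 1 and that every replica in a reachable DC is up; the class 
and method names are made up for illustration and are not Cassandra APIs. 

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two candidate behaviors; not Cassandra code.
final class QuorumSketch {

    /** QUORUM is a majority of the replicas being counted. */
    static int quorum(int totalReplicas) {
        return totalReplicas / 2 + 1;
    }

    /** Sums RF over all DCs, optionally skipping the disconnected ones. */
    static int countedReplicas(Map<String, Integer> rfByDc,
                               Set<String> disconnectedDcs,
                               boolean countDisconnected) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : rfByDc.entrySet()) {
            if (countDisconnected || !disconnectedDcs.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> rfByDc = new LinkedHashMap<>();
        rfByDc.put("a", 3);
        rfByDc.put("b", 3);
        rfByDc.put("c", 3);
        Set<String> disconnected = Collections.singleton("c");

        // Replicas the coordinator in dc=b can actually reach: a + b = 6.
        int reachable = countedReplicas(rfByDc, disconnected, false);

        // Option 1: treat nodes in c as down. 9 replicas counted.
        int allDown = countedReplicas(rfByDc, disconnected, true);
        System.out.println("c treated as down: QUORUM needs " + quorum(allDown)
            + ", ALL needs " + allDown + ", reachable " + reachable);
        // QUORUM needs 5 (met by 6 reachable), ALL needs 9 (fails).

        // Option 2: do not count c at all. 6 replicas counted.
        int excluded = countedReplicas(rfByDc, disconnected, false);
        System.out.println("c not counted: QUORUM needs " + quorum(excluded)
            + ", ALL needs " + excluded + ", reachable " + reachable);
        // QUORUM needs 4 and ALL needs 6, both met by 6 reachable.
    }
}
```

So the practical difference is in CL=ALL, which fails under the first option 
and succeeds under the second, and in how many a+b replicas QUORUM requires. 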

My primary hangup is finding the right way to notify the KS replication 
strategy to reload if the list of whitelisted/blacklisted DCs changes. I 
know it's a solvable problem, but if it's out of scope, I won't waste time on 
it. I realize that this is a {{ponies}} ticket, and there's a ton of 
bike-shed/ponies opportunity here, but if we can get some consensus on the 
definition, I can try to get this to a point where it could be ready 
for real review. 


was (Author: jjirsa):
This ticket title mentions "more flexible gossip" -  does this carry into 
requests/CL as well? What's the desired/expected behavior if a KS uses NTS to 
have rf=3 in dcs {a,b,c}, but hosts in dc=b are set not to gossip with hosts in 
dc=c, and vice versa? CL=ALL fails, CL=QUORUM fills with a+b, and writes just 
assume all nodes in c are down? Or should it be smart enough to know that c is 
disconnected, and not count hosts in c towards quorum/ALL ? 


> Support "edge dcs" with more flexible gossip
> --------------------------------------------
>
>                 Key: CASSANDRA-7306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7306
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tupshin Harper
>              Labels: ponies
>
> As Cassandra clusters get bigger and bigger, and their topology becomes more 
> complex, there is more and more need for a notion of "hub" and "spoke" 
> datacenters.
> One of the big obstacles to supporting hundreds (or thousands) of remote dcs, 
> is the assumption that all dcs need to talk to each other (and be connected 
> all the time).
> This ticket is a vague placeholder with the goals of achieving:
> 1) better behavioral support for occasionally disconnected datacenters
> 2) explicit support for custom dc to dc routing. A simple approach would be 
> an optional per-dc annotation of which other DCs that DC could gossip with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
