On Nov 1, 2013, at 7:41 AM, Bela Ban <[email protected]> wrote:

> On 10/31/13 10:50 PM, Erik Salter wrote:
>> Thanks for the wiki and the discussion. Since I'm encountering this in
>> the wild, allow me to offer my thoughts.
>>
>> Basically, my feeling is that no matter what you do in split-brain
>> handling, it's going to be wrong.
>>
>> In my use case, I have individual blades where each blade runs a suite
>> of application nodes, one of which is a data grid node. Each node is
>> single-homed, and they all wire into the same switch. This setup is
>> mirrored in a second data center across the WAN, and in this deployment
>> the two DCs make up a single cluster. There is a concept of a set of
>> keys for my caches being "owned" by a site, i.e. only one set of
>> clients will access those keys. These keys are striped across the WAN
>> with a TACH (topology-aware consistent hash).
>>
>> So a split brain within a local data center can only occur when a NIC
>> on one of the blades goes bad while the node is still running. The
>> merge will always be of the [subgroups=N-1, 1] variety, where N is the
>> number of running nodes in the cluster. Since these nodes are
>> single-homed, they cannot receive requests while "offline" from the
>> NIC. I don't have to worry about state collision, but I DO have to
>> worry about stale state from the merged node.
>
> In my experience, partitions are almost never caused by malfunctioning
> hardware, but by GC pauses, high CPU spikes and other blocking behavior
> which causes FD/FD_ALL to falsely suspect a node.
>
>> In this case, it's easy to tell when I might be in a split brain: the
>> FD protocol will suspect and exclude a node. Currently, though, I have
>> no way of knowing how or why a node was excluded.
>
> We *could* detect graceful leaves... this would narrow the exclusion
> cases to crashed and partitioned nodes.
>
>> If the WAN goes down, I have a rather large problem. First off is
>> detection. If there's an ACL blockage, or worse, a unidirectional
>> outage (i.e. east can see west, but not vice versa), it takes the
>> cluster a minute (really, about 60 seconds) to figure things out.
>
> 1 minute because FD/FD_ALL is configured with a 60s timeout, correct?
>
> I would definitely *not* lower this threshold, as we don't want entire
> sites to be falsely suspected, only to later get merged back.
>
>> One side will have spurious MERGEs; the other side will have leaves
>> from either FD_SOCK or FD.
>
> You're not referring to xsite here, are you? This is your striped
> architecture, where you have a set of TCP-based stripes (clusters)
> *across* sites, right?
>
> If so, having a cluster spread across sites is challenging, to say the
> least. The risk of partitions is greater than for a purely local
> cluster, as the WAN adds the risk of intermediate switches crashing and
> of failure detection messages getting lost or delayed by high latency.
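Both Bela's "detect graceful leaves" point and Erik's [subgroups=N-1, 1] merges are visible at the JGroups API level: a merge is delivered to the application as a MergeView whose subgroups list the partitions being reconciled. A minimal sketch against the JGroups 3.x API (the class name and the stale-state reaction are illustrative, not something from this thread):

import java.util.List;

import org.jgroups.MergeView;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Hypothetical observer; not from the thread.
public class MergeDetector extends ReceiverAdapter {

    @Override
    public void viewAccepted(View view) {
        if (!(view instanceof MergeView))
            return;                                  // plain join/leave, not a merge
        List<View> subgroups = ((MergeView) view).getSubgroups();
        // Erik's case: one subgroup of size N-1 and one singleton. The
        // singleton is the node whose NIC flapped, so its state is suspect.
        for (View subgroup : subgroups) {
            if (subgroup.size() == 1) {
                System.out.println("Merged-back node, state possibly stale: "
                        + subgroup.getMembers().get(0));
            }
        }
    }
}

Registered via channel.setReceiver(new MergeDetector()) before connecting, this only observes how a merge happened; it doesn't answer Erik's complaint about *why* a node was excluded in the first place.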
Erik, why not use xsite in this deployment? Lack of state transfer perhaps?

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
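For comparison, the xsite deployment Mircea is suggesting keeps each DC as its own local cluster and replicates selected caches to the remote site over a RELAY2 bridge, rather than stretching one JGroups cluster across the WAN. A rough sketch in Infinispan's programmatic configuration (the site name "WEST" and the ASYNC strategy are assumptions for illustration; exact builder details vary by version):

import org.infinispan.configuration.cache.BackupConfiguration.BackupStrategy;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class XSiteBackupSketch {

    public static Configuration eastCacheConfig() {
        // Each DC runs its own local cluster; writes to this cache are
        // shipped to the remote site over the site bridge instead of one
        // cluster spanning the WAN.
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.sites()
               .addBackup()
                   .site("WEST")                     // hypothetical remote site name
                   .strategy(BackupStrategy.ASYNC);  // don't block local writes on the WAN
        return builder.build();
    }
}

The caveat in Mircea's question is the trade-off: at the time, cross-site backups did not ship existing state to a recovering site, which is exactly the stale-state concern Erik raises above.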
