On Thu, Aug 13, 2009 at 10:37 AM, Konstantin Shvachko <s...@yahoo-inc.com> wrote:

> Steve,
>
> There are other groups claimed they work on HA solution.
> We had discussions about it not so long ago in this list.
> Is it possible that your colleagues present their design?
> As you point out the issue gets fairly complex fast,
> particularly because of the split-brain problem you describe.
>

IMHO the split-brain problem is why failover has to either be triggered
manually, or has to be done by an external system like Linux-HA where you
can get multiple media connecting the two masters. In the past I've done
this for firewalls and DB servers using a null modem serial connection plus
a crossover plus pings over the LAN - with 3 separate heartbeats it's very
tough to get a split brain. If you absolutely must avoid it, you can also
trigger a "STONITH" policy: http://linux-ha.org/STONITH

>
> There are several jiras dedicated to the problem already.
> You can post your design there or create a new one.
>
> > Looking at the facebook/google "multi-master" solution, I think they
> > don't worry about consistency, just let the masters drift apart.
>
> Not sure I follow this.
> What facebook/google "multi-master" solution?
> Why would they not worry about consistency?
> Consistency of what?
>
> Thanks,
> --Konstantin
>
>
> Steve Loughran wrote:
>
>> Konstantin Shvachko wrote:
>>
>>> And the only remaining step is to implement fail-over mechanism.
>>>
>>
>> :)
>>
>> Colleagues of mine work on HA stuff; I try and steer clear of it as it
>> gets complex fast. Test case: what happens when a network failure splits
>> the datacentre in two, you now have two clusters each with half the data and
>> possibly a primary/2ary master in each one. Then leave the partition up for
>> a while, do inconsistent operations on each then have the network come back
>> up. Then work out how to merge the state
>>
>> Looking at the facebook/google "multi-master" solution, I think they don't
>> worry about consistency, just let the masters drift apart.
>>
>> see also Johan's recent talk on HDFS:
>> http://www.slideshare.net/steve_l/hdfs
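
For what it's worth, the multi-heartbeat idea from the reply above can be sketched in a few lines of Python. This is only an illustration and not how Linux-HA is implemented or configured: the names Heartbeat, should_take_over and fence_peer are all made up here, and the 10 second timeout is an arbitrary assumption. The point is that the standby only promotes itself when every independent channel has gone quiet, and, if a STONITH-style fencing hook is supplied, only after the old master is confirmed powered off.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Heartbeat:
    """One independent heartbeat channel to the peer (hypothetical type)."""
    name: str         # e.g. "serial", "crossover", "lan-ping"
    last_seen: float  # time (seconds) the peer was last heard on this channel


def should_take_over(
    heartbeats: List[Heartbeat],
    now: float,
    timeout: float = 10.0,  # assumed value, tune for the real links
    fence_peer: Optional[Callable[[], bool]] = None,
) -> bool:
    """Decide whether the standby may promote itself to master."""
    if not heartbeats:
        # With no channels configured we know nothing, so never fail over.
        return False

    # A channel is silent if the peer has not been heard within the timeout.
    silent = [hb for hb in heartbeats if now - hb.last_seen > timeout]

    # If any single channel still hears the peer, the peer is alive and the
    # silence elsewhere is a link failure, not a dead master.
    if len(silent) < len(heartbeats):
        return False

    # Every channel is silent. With a fencing hook (STONITH-style), power the
    # peer off first and take over only if that is reported as successful.
    if fence_peer is not None:
        return bool(fence_peer())

    return True
```

With three channels, losing only the LAN (the usual cause of a split brain) leaves the serial and crossover heartbeats alive, so should_take_over returns False and both masters never run at once.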