On 6/7/2013 12:31 PM, Brian Candler wrote:
> On 07/06/2013 10:41, Jakob Bohm wrote:
>>
>> Which brings us to the option of setting up a file system where two or
>> more servers share physical storage (via NBD, iSCSI, fibre channel etc.)
>> and synchronize file locking etc. transparently.  I have no good overview
>> of such systems
> This is apparently the case :-)
>> , but I know to avoid the system promoted by Red Hat
>> (because the academic team that created it has a crazy idea about
>> deliberately crashing servers in order to solve synchronization issues,
>> they call it "fencing").
> You need to read about "split brain". This is when you have two 
> copies of the data which are being written to independently, so that 
> they diverge. If these are two copies of a filesystem it's almost 
> impossible to recover from - you would typically end up throwing away 
> all the writes made to one filesystem or the other. Worse, having two 
> servers writing to the same filesystem without being aware of the 
> other would result in total filesystem corruption. Hence it's critical 
> to avoid getting into this state in the first place, as far as is 
> possible.
>
> And yes, ensuring that the other server is dead is an accepted way to 
> do this. It's called STONITH - Shoot The Other Node In The Head. 
> Killing a server is less bad than having corrupted data.
But accidentally killing the only working server is even worse, and that
is what you risk with systems that are so hell-bent on using STONITH as
the only recovery mechanism.

There are much saner ways to arrange redundant systems so that split
brain syndrome does not occur.  The best known is to have at least 3
servers and to arrange the protocols so that nothing happens without a
majority of those servers being connected to each other and agreeing to
handle the request.  Any partitioned-off minority will not be able to do
anything bad even though it is not shot in the head, which means it
will recover gracefully as soon as connectivity is reestablished.
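The majority rule described above can be illustrated with a toy model. This is a minimal sketch, assuming a fixed, known cluster membership. The names `has_quorum` and `try_write` are hypothetical and do not correspond to any particular cluster manager's API. It shows why 3 servers are the minimum: with 2, a 1-1 split leaves neither side with a majority. It deliberately does not model timing problems such as a node that has lost quorum but has not yet noticed and still has I/O in flight.

```python
"""Toy model of majority-quorum gating (hypothetical names, not a real API)."""

from typing import Iterable, Set


def has_quorum(cluster: Set[str], reachable: Iterable[str]) -> bool:
    """True if the nodes in `reachable` (including the local node) form a
    strict majority of the configured cluster membership."""
    members = set(reachable) & cluster  # ignore nodes outside the configuration
    return len(members) > len(cluster) // 2


def try_write(node: str, cluster: Set[str], reachable: Iterable[str]) -> str:
    """Accept a write only when this node's partition holds a majority.
    In this simplified model a minority partition refuses the write
    instead of letting its copy of the data diverge."""
    if has_quorum(cluster, reachable):
        return f"{node}: write accepted"
    return f"{node}: no quorum, write refused"


if __name__ == "__main__":
    cluster = {"A", "B", "C"}
    # Network partition: A and B can still talk to each other, C is cut off.
    print(try_write("A", cluster, {"A", "B"}))  # majority side proceeds
    print(try_write("C", cluster, {"C"}))       # minority side waits
    # With only two servers, a 1-1 split leaves neither side with a majority.
    print(has_quorum({"A", "B"}, {"A"}))
```

When connectivity returns, node C simply rejoins the majority and resumes; nothing in this model requires it to be rebooted first.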

The junk that the academics foisted on Red Hat goes out of its way to
make recovery difficult.  For example, unmounting the shared file system
and then mounting it again will not help because they insist you must
reboot all other parts of the affected server for no good reason at all.



Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2730 Herlev, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


_______________________________________________
Courier-imap mailing list
Courier-imap@lists.sourceforge.net
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-imap
