Hi,

On Thu, Jun 19, 2008 at 09:26:13PM +0800, Xinwei Hu wrote:
> 2008/6/19 Keisuke MORI <[EMAIL PROTECTED]>:
> > Hi,
> >
> > "Xinwei Hu" <[EMAIL PROTECTED]> writes:
> >> I'm the one who opposed sfex in the previous discussion.
> >>
> >> My point was simple that:
> >> """
> >> Check-and-reserve on disk is not an atomic CAS operation, and a lock
> >> based on it may silently cause data corruption.
> >> """
> >
> > sfex does not rely on the atomicity of "check-and-reserve".
> > It always _overwrites_ the control data, and loss of ownership
> > is detected by a timeout.
> >
> >
> > Indeed, under a particular condition two nodes may try to write
> > the control data at the same time, but
> >
> > 1) Such a situation will not happen in the typical split-brain
> >   scenario with sfex. It can only happen under a particular
> >   condition, such as a misoperation that tries to launch two
> >   nodes simultaneously _without_ fixing the split-brain
> >   condition.
> >
> > 2) Even if such a situation occurred, sfex resolves it as follows:
> >   - sfex always writes its control data as "one sector" of data
> >     (512 bytes in most cases) through direct I/O.
> >     That is a single write request to the disk controller.
> >   - If two nodes try to write the data at the same time,
> >     the requests are serialized in the disk controller, so
> >     'the latter one' wins.
> >   - sfex then makes sure that the written data is "mine", and
> >     the "loser" returns an error to prevent it from launching resources.
> >
> >
> >
> > Does that explain it?
> 
> No.
> Your basic assumption is that sfex runs in a deterministic
> environment, right?
> I think so, because sfex relies entirely on predictable execution time.
> But Linux (for example) is not such an environment: a process can
> be scheduled out at _any_ point for _any_ length of time.

True. It is possible to break sfex, but the probability of that
happening is extremely low; it would take a very pathological
timing. One way to lower this probability even further is to
implement sfex as a combination of a resource and a daemon
process which locks itself in memory and sends asynchronous
monitor failures to lrmd.
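The race being argued about can be illustrated with a toy model. This
is a minimal sketch, assuming the acquisition works as described
above: read the control sector, overwrite it with the node's own ID,
wait, then read it back. The `Sector` class, the `run` function and
the step names are hypothetical and not taken from the sfex source;
the disk is simulated in memory and lock expiry is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sector:
    """Simulated one-sector control block shared by all nodes (hypothetical).

    Each write replaces the whole sector, mimicking a single direct-I/O
    request that the disk controller serializes: the last write wins.
    """
    owner: Optional[str] = None

    def write(self, node: str) -> None:
        self.owner = node

    def read(self) -> Optional[str]:
        return self.owner


def run(schedule):
    """Replay an interleaving of (node, step) events and return the set of
    nodes that end up believing they hold the lock.

    Per-node steps (an assumed model, not sfex's actual code):
      "check"  - read the sector; give up if another node holds it
      "write"  - overwrite the sector with this node's ID
      "verify" - after waiting, re-read; the lock is ours only if our ID survived
    """
    disk = Sector()
    gave_up = set()
    owners = set()
    for node, step in schedule:
        if node in gave_up:
            continue
        if step == "check":
            if disk.read() not in (None, node):
                gave_up.add(node)  # someone else holds it
        elif step == "write":
            disk.write(node)
        elif step == "verify":
            if disk.read() == node:
                owners.add(node)
            else:
                gave_up.add(node)  # the "loser" returns an error
    return owners


# Both writes reach the disk before either node verifies: B's write is
# serialized last, so B wins and A detects that it lost.
simultaneous = [("A", "check"), ("B", "check"),
                ("A", "write"), ("B", "write"),
                ("A", "verify"), ("B", "verify")]

# A is scheduled out between its check and its write for longer than B's
# whole check/write/wait/verify cycle: both nodes conclude they own the lock.
a_preempted = [("A", "check"), ("B", "check"),
               ("B", "write"), ("B", "verify"),
               ("A", "write"), ("A", "verify")]

print(sorted(run(simultaneous)))  # ['B']
print(sorted(run(a_preempted)))   # ['A', 'B']
```

In the second schedule B still believes it owns the lock even though
the sector now names A; B would only find out at its next
timeout-based check, which is the window the scheduling objection
points at.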

BTW, the asynchronous monitor API has just been implemented and is
waiting for its first users :).

> And this is an essential problem, due to the lack of a CAS operation for disks.
> 
> BTW, dskcm is lockless for the same reason.

I'm going to look into dskcm. Has it been used? Any field experience?

Thanks,

Dejan
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/