Hi,

On Thu, Jun 19, 2008 at 09:26:13PM +0800, Xinwei Hu wrote:
> 2008/6/19 Keisuke MORI <[EMAIL PROTECTED]>:
> > Hi,
> >
> > "Xinwei Hu" <[EMAIL PROTECTED]> writes:
> >> I'm the one who opposed sfex in the previous discussion.
> >>
> >> My point was simply that:
> >> """"
> >> check-and-reserve on disk is not an atomic CAS operation. and lock
> >> based on that may silently cause data corruption.
> >> """
> >
> > sfex does not rely on the atomicity of "check-and-reserve".
> > It's always _overwriting_ the control data, and the detection of
> > losing the ownership is timeout based.
> >
> >
> > Indeed it can happen that two nodes try to write the control
> > data at the same time in a particular condition, but
> >
> > 1) Such a situation will not happen in the scenario of the typical
> >    split-brain condition with sfex. It can only happen in a
> >    particular condition such as a misoperation that tries to
> >    launch two nodes simultaneously _without_ fixing the
> >    split-brain condition.
> >
> > 2) Even if such a situation had occurred, sfex resolves it as follows:
> >    - sfex always writes its control data as "one sector" of data
> >      (512 bytes in most cases) through direct I/O.
> >      That would be a single write request to the disk controller.
> >    - If two nodes tried to write the data at the same time,
> >      the requests will be serialized in the disk controller, so
> >      'the latter one' will win.
> >    - sfex makes sure that the written data is "mine" and
> >      the "loser" will return an error to prevent it from launching
> >      resources.
> >
> >
> >
> > Does that explain it to you?
>
> No.
> Your basic assumption is that sfex can run in a deterministic
> environment. Right ?
> I think so because sfex totally relies on predictable execution time.
> But Linux (for example) indeed is not such an environment, as the
> process can be scheduled out at _any_ point for _any_ time.

True. It is possible to break sfex, but the probability of that
happening is extremely low and it would take very pathological
timing. One way to lower this probability further is to implement
sfex as a combination of a resource and a daemon process which would
lock itself in memory and send asynchronous monitor failures to
lrmd. BTW, the asynchronous monitor API has just been implemented and
is waiting for its first users :).

> And this is an essential problem due to the lack of CAS operation for disk.
>
> btw: dskcm is lockless because of the same problem.

I'm going to look into dskcm. Has it been used? Any field
experience?

Thanks,

Dejan
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
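
A footnote on the overwrite-then-verify step discussed in this thread: it can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual sfex code. The sector layout (`MAGIC`, `HEADER`), the function names `write_control`, `read_owner` and `verify_owner`, and the node names are all hypothetical, and an ordinary file with `fsync` stands in for a shared disk accessed through direct I/O.

```python
import os
import struct
import tempfile

SECTOR_SIZE = 512                # assumed; the thread says "512 bytes in most cases"
MAGIC = b"SFEX"                  # hypothetical marker, not the real sfex format
HEADER = struct.Struct("<4sB")   # hypothetical layout: magic + owner-name length


def write_control(path, node_id):
    """Unconditionally overwrite the control sector with node_id (one write)."""
    name = node_id.encode("ascii")
    if len(name) > 255:
        raise ValueError("node id too long for the one-byte length field")
    sector = (HEADER.pack(MAGIC, len(name)) + name).ljust(SECTOR_SIZE, b"\0")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, sector, 0)   # the whole sector goes out in a single request
        os.fsync(fd)               # stand-in for direct I/O in this sketch
    finally:
        os.close(fd)


def read_owner(path):
    """Return the owner recorded in the control sector, or None if unreadable."""
    fd = os.open(path, os.O_RDONLY)
    try:
        sector = os.pread(fd, SECTOR_SIZE, 0)
    finally:
        os.close(fd)
    if len(sector) < SECTOR_SIZE:
        return None
    magic, length = HEADER.unpack_from(sector)
    if magic != MAGIC:
        return None
    return sector[HEADER.size:HEADER.size + length].decode("ascii")


def verify_owner(path, node_id):
    """Read the sector back; the node whose data is on disk is the owner."""
    return read_owner(path) == node_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        control = os.path.join(tmp, "control")
        write_control(control, "node-a")
        write_control(control, "node-b")   # the later write replaces the sector
        for node in ("node-a", "node-b"):
            print(node, "owns lock:", verify_owner(control, node))
```

Run as a script, this reports node-a as the loser and node-b as the owner, which is the "latter one wins" case. Note that the write and the read-back are two separate steps: a process that is scheduled out between them, or after a successful read-back, can still act on stale information. That gap is what the objection above is about, and, per the thread, sfex covers it with timeouts rather than with an atomic compare-and-swap.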