2007/8/13, Junko IKEDA <[EMAIL PROTECTED]>:
> > > Assume we have 2 nodes.
> > > 1. Node A & B reach step 3) in the same time.
> > > 2. sfex_lock on Node B is scheduled out due to some other reasons.
> > > 3. sfex_lock on Node A goes through step 3 to 6, and Node A holds
> > > the lock now.
>
> Node A is sure to hold the lock at this moment.
> sfex_lock() is going to return the value 0, and RA will start monitoring on
> Node A.
> during the monitor operation, sfex_update() is running, and it can check and
> update the status of Node A.
>
> If Node B updates the lock status _at just the right moment_,
> sfex_update() detects that the other node is trying to update its status,
> and it will be terminated with exit(2).

The time window before sfex_update() notices is long enough to destroy all
data if you are unlucky ;-(
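
To make that window concrete, here is a minimal sketch in Python of the
interleaving described in steps 1 to 4. It is not the sfex code: the dict
standing in for the shared partition and the check()/reserve() helpers are
hypothetical names, and I am assuming steps 3 to 6 boil down to "read the
lock status, then write our own node as owner".

```python
# Hypothetical model of a check-then-reserve lock on shared storage.
# None of these names come from the sfex source; they only illustrate the race.

def check(disk):
    """Read the lock status; True means no node currently owns the lock."""
    return disk["owner"] is None


def reserve(disk, node):
    """Write this node as the owner. The single write is atomic on its own."""
    disk["owner"] = node


def run_interleaving():
    disk = {"owner": None}
    holders = []  # nodes whose lock call reported success

    # 1. Node A and Node B both check at the same time; both see it free.
    a_sees_free = check(disk)
    b_sees_free = check(disk)

    # 2. Node B is scheduled out here: after its check, before its reserve.

    # 3. Node A reserves and is told it holds the lock.
    if a_sees_free:
        reserve(disk, "A")
        holders.append("A")

    # 4. Node B is scheduled back and acts on its stale check result.
    if b_sees_free:
        reserve(disk, "B")
        holders.append("B")

    return disk["owner"], holders


owner, holders = run_interleaving()
print(owner, holders)  # prints: B ['A', 'B']
```

Each read and each write is atomic, yet both nodes were told they hold the
lock. With a real compare-and-swap, Node B's write in step 4 would be rejected
because the owner is no longer empty.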

> > > 4. sfex_lock on Node B is scheduled back, and goes through step 3 to
> > > 6 also.
>
> RA monitor on Node A will also be stopped.
> Node B can get the lock during a situation like this.
>
> > This statement is wrong according to your code.
> > Especially, your check-and-reserve is not an atomic CAS operation.
>
> By the way, the lock status stores on the partition, (not using file system)
> so, as a communication media, it can keep read-write operation atomicity.
> All nodes' action, like read (check) or write (reserve) the status won't
> bump against each other.
> inconsequent remark?

Yes, each read or write is atomic, but the check-and-reserve sequence as a
whole is still not an atomic CAS unless we use some trick like SCSI
reservation.

> Thanks,
> Junko
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/