> > Assume we have 2 nodes.
> > 1. Node A & B reach step 3) at the same time.
> > 2. sfex_lock on Node B is scheduled out for some other reason.
> > 3. sfex_lock on Node A goes through step 3 to 6, and Node A holds
> > the lock now.
Node A is sure to hold the lock at this moment.
sfex_lock() will return 0, and the RA will start monitoring on Node A.
During the monitor operation, sfex_update() runs; it checks and updates
the status of Node A.
If Node B updates the lock status _at just the right moment_,
sfex_update() detects that another node is trying to update the status,
and it terminates with exit(2).

> > 4. sfex_lock on Node B is scheduled back, and goes through step 3 to
> > 6 also.

The RA monitor on Node A will then be stopped as well.
Node B can get the lock in a situation like this.

> This statement is wrong according to your code.
> Especially, your check-and-reserve is not an atomic CAS operation.

By the way, the lock status is stored directly on the partition (not
through a file system), so, as a communication medium, it keeps each
read or write operation atomic.
The nodes' actions, such as reading (check) or writing (reserve) the
status, won't collide with each other.

Is that an inconsequential remark?

Thanks,
Junko
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
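
For illustration only, the interleaving discussed in this message can be replayed deterministically with a small model. This is a rough sketch, not the sfex code: `LockArea`, `check`, `reserve`, `update`, and `replay` are hypothetical names, steps 3 to 6 of sfex_lock are collapsed into one check and one reserve, and the only detail taken from the thread is that the update on Node A ends with status 2 once another node has written to the lock area.

```python
class LockArea:
    """Hypothetical stand-in for the lock record on the shared partition."""

    def __init__(self):
        self.owner = None  # node recorded as the current holder, or None

    def read(self):
        # Assumed to be atomic on its own, as stated in the message.
        return self.owner

    def write(self, node):
        # Assumed to be atomic on its own, as stated in the message.
        self.owner = node


def check(area):
    """Check phase: True if the lock looks free to the caller."""
    return area.read() is None


def reserve(area, node):
    """Reserve phase: record the caller as holder."""
    area.write(node)


def update(area, node):
    """Monitor-time check: 0 if node is still recorded, 2 otherwise."""
    return 0 if area.read() == node else 2


def replay():
    """Replay the four-step scenario in a fixed order."""
    area = LockArea()

    # 1. Both nodes run the check and both see a free lock.
    a_sees_free = check(area)
    b_sees_free = check(area)
    assert a_sees_free and b_sees_free

    # 2. Node B is scheduled out here, keeping its stale check result.

    # 3. Node A reserves and holds the lock; its monitor is happy.
    reserve(area, "A")
    status_before = update(area, "A")

    # 4. Node B resumes and reserves based on the stale check.
    reserve(area, "B")
    status_after = update(area, "A")

    return status_before, status_after, area.read()
```

Under these assumptions, `replay()` returns `(0, 2, "B")`: each single read and write is atomic, yet the check and the reserve together are not one atomic step, so Node B can still write after Node A, and Node A only notices at its next update.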