2007/8/13, Junko IKEDA <[EMAIL PROTECTED]>:
> > > Assume we have 2 nodes.
> > > 1. Nodes A and B reach step 3) at the same time.
> > > 2. sfex_lock on Node B is scheduled out for some other reason.
> > > 3. sfex_lock on Node A goes through steps 3 to 6, and Node A holds
> > > the lock now.
>
> Node A is sure to hold the lock at this moment.
> sfex_lock() will return 0, and the RA will start monitoring on
> Node A.
> During the monitor operation, sfex_update() runs, and it can check and
> update the status of Node A.
>
> If Node B updates the lock status _at just the right moment_,
> sfex_update() detects that the other node is trying to update its status,
> and it will be terminated with exit(2).
This time window is long enough to destroy all data if you are unlucky ;-(
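
To put a rough number on that window, here is a toy model in Python, not the
real sfex code. It assumes the RA monitor runs the update check once every
monitor_interval ticks and that Node A keeps writing data on every tick in
between. update(), unsafe_writes() and the tick counts are made up for
illustration; only the exit code 2 comes from the description above.

```python
def update(owner_on_disk, node):
    """Hypothetical stand-in for sfex_update: 0 if node still owns the lock, else 2."""
    return 0 if owner_on_disk == node else 2


def unsafe_writes(takeover_tick, monitor_interval, total_ticks):
    """Count ticks in which Node A writes data after Node B took the lock."""
    owner = "A"
    unsafe = 0
    for tick in range(total_ticks):
        if tick == takeover_tick:
            owner = "B"  # Node B overwrites the lock status on the partition
        if tick % monitor_interval == 0 and update(owner, "A") == 2:
            return unsafe  # the monitor notices and the resource on A stops
        if owner != "A":
            unsafe += 1  # Node A is still writing without holding the lock
    return unsafe


if __name__ == "__main__":
    print(unsafe_writes(takeover_tick=11, monitor_interval=10, total_ticks=100))
```

With a takeover one tick after a monitor run and a 10-tick interval,
unsafe_writes(11, 10, 100) counts 9 ticks in which Node A still writes
although Node B already owns the lock. Detection happens only after the fact.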

> > > 4. sfex_lock on Node B is scheduled back in, and also goes through
> > > steps 3 to 6.
>
> The RA monitor on Node A will also be stopped.
> Node B can get the lock in a situation like this.
>
> > This statement is wrong according to your code.
> > Especially, your check-and-reserve is not an atomic CAS operation.
>
> By the way, the lock status is stored directly on the partition (not through
> a file system), so, as a communication medium, it keeps each read and write
> operation atomic.
> The nodes' actions, reading (checking) or writing (reserving) the status,
> won't collide with each other.
> An inconsequential remark?
Yes, but still, the check-and-reserve sequence as a whole is not an atomic CAS
operation unless we use some trick like SCSI reservation.
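
The interleaving from steps 1 to 4 above can be replayed in a small Python
sketch. It assumes that steps 3 to 6 boil down to "check the status, write
your own name, read it back", which is my simplification and not the actual
sfex_lock code. SharedDisk, check() and reserve_and_verify() are hypothetical
names.

```python
class SharedDisk:
    """Stands in for the partition: each single read or write is atomic."""

    def __init__(self):
        self.owner = None  # None means the lock is free

    def read(self):
        return self.owner

    def write(self, node):
        self.owner = node


def check(disk):
    """Check step: is the lock free right now?"""
    return disk.read() is None


def reserve_and_verify(disk, node):
    """Reserve step: write our name, then read it back to confirm."""
    disk.write(node)
    return disk.read() == node


def simulate_race():
    disk = SharedDisk()
    a_sees_free = check(disk)  # Node A checks
    b_sees_free = check(disk)  # Node B checks at the same time, then is scheduled out
    a_holds = a_sees_free and reserve_and_verify(disk, "A")
    # Node B is scheduled back in and acts on its stale check result
    b_holds = b_sees_free and reserve_and_verify(disk, "B")
    return a_holds, b_holds, disk.read()


if __name__ == "__main__":
    print(simulate_race())
```

Every single read and write here is atomic, yet simulate_race() returns
(True, True, "B"): both nodes believe they hold the lock, and the disk names
only Node B. The gap between the check and the write is the problem, not the
individual I/O operations.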

> Thanks,
> Junko
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>