Junko IKEDA wrote:
>> OK. I think you are mis-understanding the problem.
>>
>> When the communication between Node A & B is fine, you don't need any
>> kind of lock. Heartbeat itself can ensure the resource runs on one
>> selected node, and on one node only.
>
> sfex_lock() is just checking the status that shows which node succeeded to
> lock.
> It won't be always trying to lock over and over again
>
>> sfex_lock is valuable when the communication between A & B is broken.
>> But when the communication IS broken, you can't assume sfex_lock will run
>> in order any more.
>
> If the interconnect LAN is down, Split-Brain will come.
> the lock status is reserved for Node A at this moment,
> but Node B is also trying to update the status in order to lock because
> Split-Brain has arisen.
> while Node A checks the status, Node B might update it.
> Node A, which is overwrote its status, is going to release the lock.
> sfex_lock() doesn't have such a complex logic.
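
The race described in the quoted text (one node checks the status while the other updates it) can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: `SharedDisk`, `naive_lock`, and `run` are hypothetical names invented for illustration, and the check-then-write logic is a deliberately simplified model, not the actual sfex_lock() code or its on-disk format. Each lock attempt is split into steps so that every possible interleaving of the two nodes can be enumerated and checked for the "at most one winner" property.

```python
import itertools

FREE = None


class SharedDisk:
    """Hypothetical stand-in for the shared lock record (not the sfex format)."""

    def __init__(self):
        self.owner = FREE


def naive_lock(disk, node):
    """Simplified check-then-write attempt, split into interleavable steps."""
    seen = disk.owner            # step 1: read the lock status
    yield
    if seen is FREE or seen == node:
        disk.owner = node        # step 2: write a claim based on a possibly stale read
        yield
        return True              # this node now believes it holds the lock
    return False


def run(schedule):
    """Run both attempts under one interleaving; return the nodes that think they won."""
    disk = SharedDisk()
    attempts = {"A": naive_lock(disk, "A"), "B": naive_lock(disk, "B")}
    winners = set()
    for node in schedule:
        attempt = attempts.get(node)
        if attempt is None:      # this node already finished; ignore the extra step
            continue
        try:
            next(attempt)
        except StopIteration as stop:
            if stop.value:
                winners.add(node)
            del attempts[node]
    return winners


# Every distinct ordering of three steps per node.
schedules = sorted(set(itertools.permutations("AAABBB")))
# Interleavings in which both nodes end up believing they hold the lock.
bad = [s for s in schedules if len(run(s)) > 1]

if __name__ == "__main__":
    print(f"{len(bad)} of {len(schedules)} interleavings violate exclusivity")
    for s in bad:
        print("".join(s))
```

In this model, exclusivity fails in any schedule where both reads happen before either write. A real test would have to exercise the actual shared storage with real timing, but the invariant to verify is the same: no run may ever end with two winners.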

I believe that the point he was trying to make is that it _needs_ the
complexity of the logic to be always correct even in the split-brain case -
and I agree.

If this logic fails and both sides think they have exclusive access in a
split-brain case, then a filesystem on disk may be destroyed. This is a
_very_ bad consequence - much worse than a crash. It doesn't matter if it is
relatively unlikely, because the consequence is so terrible. With hundreds
of thousands of clusters running Heartbeat, even unlikely events eventually
happen.

http://linux-ha.org/BadThingsWillHappen

You should be able to run hundreds of thousands or millions of tests where
both sides are trying to get the lock at the same time, and be able to
verify that only one side got the lock - in every single case.

Please don't be discouraged. Horms started a similar effort a few years
ago, but he wasn't able to spend enough time with it to get it right. What
you're doing is a valuable thing to do, and we all understand very well that
it's difficult.

When I first entered this discussion, I mentioned lockless synchronization
algorithms as being good things to study. In this case, we are trying to
create a lock, but I suspect the lockless methods would be a good way to
synchronize the creation of a lock (even though this sounds odd).

-- 
Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce