> > However, on some occasions, I observe that node A continues in the loop
> > believing that it is successfully writing to the file

node A has the exclusive lock, so it continues writing...

> > but, according to
> > node C, the file stops being updated. (Meanwhile, the file written by
> > node B continues to be up-to-date as read by C.) This is concerning --
> > it looks like I/O writes are being completed on node A even though other
> > nodes in the cluster cannot see the results.

Is node C blocked trying to read the file A is writing? That's what we'd
expect until recovery has removed node A. Or are C's reads completing
while A continues writing the file? That would not be correct.

> However, if A happens to own the DLM lock, it does not need
> to ask DLM's permission because it owns the lock. Therefore, it goes
> on writing. Meanwhile, the other node can't get DLM's permission to
> get the lock back, so it hangs.

The description sounds like C might not be hanging in read as we'd expect
while A continues writing. If that's the case, then it implies that dlm
recovery has been completed by nodes B and C (removing A), which allows
the lock to be granted to C for reading. If dlm recovery on B/C has
completed, it means that A should have been fenced, so A should not be
able to write once C is given the lock.

Dave

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster