Re: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-18 Thread Jonathan Davies



On 15/04/16 17:14, David Teigland wrote:

> > > However, on some occasions, I observe that node A continues in the loop
> > > believing that it is successfully writing to the file


> node A has the exclusive lock, so it continues writing...


> > > but, according to
> > > node C, the file stops being updated. (Meanwhile, the file written by
> > > node B continues to be up-to-date as read by C.) This is concerning --
> > > it looks like I/O writes are being completed on node A even though other
> > > nodes in the cluster cannot see the results.


> Is node C blocked trying to read the file A is writing?  That's what we'd
> expect until recovery has removed node A.  Or are C's reads completing
> while A continues writing the file?  That would not be correct.


> > However, if A happens to own the DLM lock, it does not need
> > to ask DLM's permission because it owns the lock. Therefore, it goes
> > on writing. Meanwhile, the other node can't get DLM's permission to
> > get the lock back, so it hangs.


> The description sounds like C might not be hanging in read as we'd expect
> while A continues writing.  If that's the case, then it implies that dlm
> recovery has been completed by nodes B and C (removing A), which allows
> the lock to be granted to C for reading.  If dlm recovery on B/C has
> completed, it means that A should have been fenced, so A should not be
> able to write once C is given the lock.


Thanks Bob and Dave for your very helpful insights.

Your line of reasoning led me to realise that I am running dlm with
fencing disabled, which explains everything. Node C was not hanging in
read while A continued to write; it was constantly returning an old
value. I presume that's legitimate: C believes the last value it saw
must still be up-to-date, because as far as it knows A has been fenced
and so cannot have updated the file since. (It also explains why I
didn't see anything useful in the logs.)
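
For anyone hitting the same thing: the knob in question is dlm_controld's
fencing setting. A minimal sketch of what a fencing-disabled configuration
looks like -- option and file names may differ between dlm versions, so
treat this as illustrative rather than my exact config:

# /etc/dlm/dlm.conf -- dlm_controld configuration (sketch)
# With fencing disabled, dlm recovery does not wait for a failed node
# to be fenced before granting its locks to other nodes, which is what
# let C read a stale value while A carried on writing.
enable_fencing=0

# For the behaviour I actually want, leave fencing enabled (the default)
# and configure a working fence agent, so recovery blocks until A is fenced:
# enable_fencing=1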


When I run the same test with fencing enabled, then although A continues
writing after the failure, the read on C hangs until A is fenced, at
which point it is able to read the last value A wrote. That's exactly
what I want.
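
For reference, the test loops are essentially of the following shape -- a
sketch only; the mountpoint and file names here are hypothetical stand-ins
for what I actually use:

# On node A (and likewise node B, writing its own file):
i=0
while true; do
    i=$((i + 1))
    echo "$i" > /mnt/gfs2/counter-a   # hypothetical path
done

# On node C, read both files with a timestamp so a hang or a stale
# value is easy to spot in the output:
while true; do
    echo "$(date +%T) a=$(cat /mnt/gfs2/counter-a) b=$(cat /mnt/gfs2/counter-b)"
    sleep 1
done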


Apologies for the noise, and thanks for the explanations.

Jonathan



Re: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-15 Thread David Teigland
> > However, on some occasions, I observe that node A continues in the loop
> > believing that it is successfully writing to the file

node A has the exclusive lock, so it continues writing...

> > but, according to
> > node C, the file stops being updated. (Meanwhile, the file written by
> > node B continues to be up-to-date as read by C.) This is concerning --
> > it looks like I/O writes are being completed on node A even though other
> > nodes in the cluster cannot see the results.

Is node C blocked trying to read the file A is writing?  That's what we'd
expect until recovery has removed node A.  Or are C's reads completing
while A continues writing the file?  That would not be correct.

> However, if A happens to own the DLM lock, it does not need
> to ask DLM's permission because it owns the lock. Therefore, it goes
> on writing. Meanwhile, the other node can't get DLM's permission to
> get the lock back, so it hangs.

The description sounds like C might not be hanging in read as we'd expect
while A continues writing.  If that's the case, then it implies that dlm
recovery has been completed by nodes B and C (removing A), which allows
the lock to be granted to C for reading.  If dlm recovery on B/C has
completed, it means that A should have been fenced, so A should not be
able to write once C is given the lock.
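
One way to check whether dlm recovery and fencing have actually happened
from B/C's point of view is to ask dlm_controld directly -- a sketch, and
the exact output format varies between versions:

# Show dlm_controld's view of cluster membership and fencing state:
dlm_tool status

# Dump dlm_controld's debug buffer and look for fencing-related entries
# around the time of the failure (exact wording varies):
dlm_tool dump | grep -i fenc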

Dave



Re: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-15 Thread Bob Peterson
- Original Message -
> Dear linux-cluster,
> 
> I have made some observations about the behaviour of gfs2 and would
> appreciate confirmation of whether this is expected behaviour or
> something has gone wrong.
> 
> I have a three-node cluster -- let's call the nodes A, B and C. On each
> of nodes A and B, I have a loop that repeatedly writes an increasing
> integer value to a file in the GFS2 mountpoint. On node C, I have a loop
> that reads both these files from the GFS2 mountpoint. The reads on
> node C show the latest values written by A and B, and stay up-to-date.
> All good so far.
> 
> I then cause node A to drop the corosync heartbeat by executing the
> following on node A:
> 
> iptables -I INPUT -p udp --dport 5404 -j DROP
> iptables -I INPUT -p udp --dport 5405 -j DROP
> iptables -I INPUT -p tcp --dport 21064 -j DROP
> 
> After a few seconds, I normally observe that all I/O to the GFS2
> filesystem hangs forever on node A: the latest value read by node C is
> the same as the last successful write by node A. This is exactly the
> behaviour I want -- I want to be sure that node A never completes I/O
> that cannot be seen by other nodes.
> 
> However, on some occasions, I observe that node A continues in the loop
> believing that it is successfully writing to the file but, according to
> node C, the file stops being updated. (Meanwhile, the file written by
> node B continues to be up-to-date as read by C.) This is concerning --
> it looks like I/O writes are being completed on node A even though other
> nodes in the cluster cannot see the results.
> 
> I performed this test 20 times, rebooting node A between each, and saw
> the "I/O hanging" behaviour 16 times and the "I/O appears to continue"
> behaviour 4 times. I couldn't see anything that might cause it to
> sometimes adopt one behaviour and sometimes the other.
> 
> So... is this expected? Should I be able to rely upon I/O hanging? Or
> have I misconfigured something? Advice would be appreciated.
> 
> Thanks,
> Jonathan

Hi Jonathan,

This seems like expected behavior to me. It probably all comes down to
which node "masters" the glock and which node "owns" the glock at the
moment communications are lost.

In your test, the DLM lock is being traded back and forth between the
file's writer on A and the file's reader on C. Then communication to
the DLM is blocked. 

When that happens, if the reader (C) happens to own the DLM lock at the
moment communication is lost, the writer will block on DLM and can't
write a new value. The reader owns the lock, so it keeps reading the
same value over and over.

However, if A happens to own the DLM lock, it does not need
to ask DLM's permission because it owns the lock. Therefore, it goes
on writing. Meanwhile, the other node can't get DLM's permission to
get the lock back, so it hangs.

There's also the question of which node is the DLM lock "master", which
adds another layer of complexity, but let's not go into that now.

Suffice it to say I think it's working as expected.
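
If you want to see which node masters or holds a given lock while running
such a test, dlm_tool can dump the lock state -- a sketch, assuming the
gfs2 filesystem's lockspace is named "testfs" (it takes its name from the
filesystem's lock table):

# List the DLM lockspaces known on this node:
dlm_tool ls

# Dump the lock resources in the gfs2 lockspace, showing which node
# masters each resource and which nodes hold locks on it:
dlm_tool lockdebug testfs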

Regards,

Bob Peterson
Red Hat File Systems

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster