Re: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-15 Thread David Teigland
> > However, on some occasions, I observe that node A continues in the loop
> > believing that it is successfully writing to the file

node A has the exclusive lock, so it continues writing...

> > but, according to
> > node C, the file stops being updated. (Meanwhile, the file written by
> > node B continues to be up-to-date as read by C.) This is concerning --
> > it looks like I/O writes are being completed on node A even though other
> > nodes in the cluster cannot see the results.

Is node C blocked trying to read the file A is writing?  That's what we'd
expect until recovery has removed node A.  Or are C's reads completing
while A continues writing the file?  That would not be correct.

> However, if A happens to own the DLM lock, it does not need
> to ask DLM's permission because it owns the lock. Therefore, it goes
> on writing. Meanwhile, the other node can't get DLM's permission to
> get the lock back, so it hangs.

The description sounds like C might not be hanging in read as we'd expect
while A continues writing.  If that's the case, then it implies that dlm
recovery has been completed by nodes B and C (removing A), which allows
the lock to be granted to C for reading.  If dlm recovery on B/C has
completed, it means that A should have been fenced, so A should not be
able to write once C is given the lock.
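
(If it's useful to check which of these happened, dlm_tool from
dlm_controld can show the lockspace membership on B or C and whether
recovery has run, e.g.:

dlm_tool status
dlm_tool ls

The exact output format varies between dlm versions.)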

Dave



Re: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-15 Thread Bob Peterson
- Original Message -
> Dear linux-cluster,
> 
> I have made some observations about the behaviour of gfs2 and would
> appreciate confirmation of whether this is expected behaviour or
> something has gone wrong.
> 
> I have a three-node cluster -- let's call the nodes A, B and C. On each
> of nodes A and B, I have a loop that repeatedly writes an increasing
> integer value to a file in the GFS2-mountpoint. On node C, I have a loop
> that reads from both these files from the GFS2-mountpoint. The reads on
> node C show the latest values written by A and B, and stay up-to-date.
> All good so far.
> 
> I then cause node A to drop the corosync heartbeat by executing the
> following on node A:
> 
> iptables -I INPUT -p udp --dport 5404 -j DROP
> iptables -I INPUT -p udp --dport 5405 -j DROP
> iptables -I INPUT -p tcp --dport 21064 -j DROP
> 
> After a few seconds, I normally observe that all I/O to the GFS2
> filesystem hangs forever on node A: the latest value read by node C is
> the same as the last successful write by node A. This is exactly the
> behaviour I want -- I want to be sure that node A never completes I/O
> that is not able to be seen by other nodes.
> 
> However, on some occasions, I observe that node A continues in the loop
> believing that it is successfully writing to the file but, according to
> node C, the file stops being updated. (Meanwhile, the file written by
> node B continues to be up-to-date as read by C.) This is concerning --
> it looks like I/O writes are being completed on node A even though other
> nodes in the cluster cannot see the results.
> 
> I performed this test 20 times, rebooting node A between each, and saw
> the "I/O hanging" behaviour 16 times and the "I/O appears to continue"
> behaviour 4 times. I couldn't see anything that might cause it to
> sometimes adopt one behaviour and sometimes the other.
> 
> So... is this expected? Should I be able to rely upon I/O hanging? Or
> have I misconfigured something? Advice would be appreciated.
> 
> Thanks,
> Jonathan

Hi Jonathan,

This seems like expected behavior to me. It probably all comes down to
which node "masters" the glock and which node "owns" the glock at the
moment communications are lost.

In your test, the DLM lock is being traded back and forth between the
file's writer on A and the file's reader on C. Then communication to
the DLM is blocked. 

When that happens, if the reader (C) happens to own the DLM lock when
it loses DLM communications, the writer will block on DLM, and can't
write a new value. The reader owns the lock, so it keeps reading the
same value over and over.

However, if A happens to own the DLM lock, it does not need
to ask DLM's permission because it owns the lock. Therefore, it goes
on writing. Meanwhile, the other node can't get DLM's permission to
get the lock back, so it hangs.
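
(If you want to see this directly, each node's local glock state can be
read from debugfs; a sketch, where <clustername>:<fsname> is a
placeholder for the real lock table name:

mount -t debugfs none /sys/kernel/debug   # if not already mounted
grep s:EX /sys/kernel/debug/gfs2/<clustername>:<fsname>/glocks

Glocks currently cached in exclusive mode show up with s:EX on the node
that holds them.)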

There's also the question of which node "masters" the DLM lock, which
adds another level of complexity, but let's not go into that now.

Suffice it to say I think it's working as expected.

Regards,

Bob Peterson
Red Hat File Systems



[Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

2016-04-15 Thread Jonathan Davies

Dear linux-cluster,

I have made some observations about the behaviour of gfs2 and would 
appreciate confirmation of whether this is expected behaviour or 
something has gone wrong.


I have a three-node cluster -- let's call the nodes A, B and C. On each 
of nodes A and B, I have a loop that repeatedly writes an increasing 
integer value to a file in the GFS2-mountpoint. On node C, I have a loop 
that reads from both these files from the GFS2-mountpoint. The reads on 
node C show the latest values written by A and B, and stay up-to-date. 
All good so far.
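
For illustration, a minimal sketch of this kind of writer/reader loop
(the mount point, file names and exact commands are assumptions for the
sketch, not the literal test scripts; the dd flags correspond to the
O_DIRECT|O_SYNC note at the end of this mail):

# writer loop on node A (node B runs the same against its own file);
# conv=sync pads each write to a full 4096-byte block, since O_DIRECT
# needs block-aligned I/O
i=0
while true; do
    printf '%s' "$i" | dd of=/mnt/gfs2/counter-A bs=4096 count=1 \
        conv=sync oflag=direct,sync 2>/dev/null
    i=$((i + 1))
done

# reader loop on node C; tr strips the NUL padding
while true; do
    echo "A=$(tr -d '\0' < /mnt/gfs2/counter-A) B=$(tr -d '\0' < /mnt/gfs2/counter-B)"
    sleep 1
done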


I then cause node A to drop the corosync heartbeat by executing the 
following on node A:


iptables -I INPUT -p udp --dport 5404 -j DROP
iptables -I INPUT -p udp --dport 5405 -j DROP
iptables -I INPUT -p tcp --dport 21064 -j DROP
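
For reference: 5404 and 5405 are corosync's default UDP ports and 21064
is the default dlm TCP port. Connectivity can be restored afterwards by
deleting the same rules, e.g.:

iptables -D INPUT -p udp --dport 5404 -j DROP
iptables -D INPUT -p udp --dport 5405 -j DROP
iptables -D INPUT -p tcp --dport 21064 -j DROP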

After a few seconds, I normally observe that all I/O to the GFS2 
filesystem hangs forever on node A: the latest value read by node C is 
the same as the last successful write by node A. This is exactly the 
behaviour I want -- I want to be sure that node A never completes I/O 
that is not able to be seen by other nodes.


However, on some occasions, I observe that node A continues in the loop 
believing that it is successfully writing to the file but, according to 
node C, the file stops being updated. (Meanwhile, the file written by 
node B continues to be up-to-date as read by C.) This is concerning -- 
it looks like I/O writes are being completed on node A even though other 
nodes in the cluster cannot see the results.


I performed this test 20 times, rebooting node A between each, and saw 
the "I/O hanging" behaviour 16 times and the "I/O appears to continue" 
behaviour 4 times. I couldn't see anything that might cause it to 
sometimes adopt one behaviour and sometimes the other.


So... is this expected? Should I be able to rely upon I/O hanging? Or 
have I misconfigured something? Advice would be appreciated.


Thanks,
Jonathan

Notes:
 * The I/O from node A uses an fd that is O_DIRECT|O_SYNC, so the page 
cache is not involved.


 * Versions: corosync 2.3.4, dlm_controld 4.0.2, gfs2 as per RHEL 7.2.

 * I don't see anything particularly useful being logged. Soon after I 
insert the iptables rules on node A, I see the following on node A:


2016-04-15T14:15:45.608175+00:00 localhost corosync[3074]:  [TOTEM ] The 
token was lost in the OPERATIONAL state.
2016-04-15T14:15:45.608191+00:00 localhost corosync[3074]:  [TOTEM ] A 
processor failed, forming new configuration.
2016-04-15T14:15:45.608198+00:00 localhost corosync[3074]:  [TOTEM ] 
entering GATHER state from 2(The token was lost in the OPERATIONAL state.).


Around the time node C sees the output from node A stop changing, node A 
reports:


2016-04-15T14:15:58.388404+00:00 localhost corosync[3074]:  [TOTEM ] 
entering GATHER state from 0(consensus timeout).


 * corosync.conf:

totem {
  version: 2
  secauth: off
  cluster_name: 1498d523
  transport: udpu
  token_retransmits_before_loss_const: 10
  token: 1
}

logging {
  debug: on
}

quorum {
  provider: corosync_votequorum
}

nodelist {
  node {
ring0_addr: 10.220.73.6
  }
  node {
ring0_addr: 10.220.73.7
  }
  node {
ring0_addr: 10.220.73.3
  }
}
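
Defaults apply for any totem values not set above; the effective runtime
values can be listed on a running node with something like:

corosync-cmapctl | grep totem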

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster