Re: [ClusterLabs] Failing over NFSv4/TCP exports

2016-08-18 Thread Patrick Zwahlen
Hi,

> -----Original Message-----
> From: Andreas Kurz [mailto:andreas.k...@gmail.com]
> Sent: Wednesday, 17 August 2016 23:16
> To: Cluster Labs - All topics related to open-source clustering welcomed
> 
> Subject: Re: [ClusterLabs] Failing over NFSv4/TCP exports
> 
> This is a known problem ... have a look into the portblock RA - it has
> the feature to send out TCP tickle ACKs to reset such hanging sessions.
> So you can configure a portblock resource that blocks the TCP port
> before starting the VIP and another portblock resource that unblocks the
> port afterwards and sends out the tickle ACKs.

Thanks Andreas for pointing me to the portblock RA. I wasn't aware of it and 
will read/test.
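
From a first read of the RA metadata, I imagine it would look roughly like the
following on top of my existing group (completely untested sketch; the group
g_export1, the VIP resource p_vip_export1 and the 10.0.0.50 address are made
up, so please check "pcs resource describe portblock" before copying anything):

  # Block tcp/2049 towards the VIP before the address comes up
  pcs resource create p_block_export1 ocf:heartbeat:portblock \
      protocol=tcp portno=2049 action=block ip=10.0.0.50 \
      --group g_export1 --before p_vip_export1

  # Unblock it again after the VIP, and send tickle ACKs to reset stale
  # sessions; tickle_dir should sit on storage both heads can read
  pcs resource create p_unblock_export1 ocf:heartbeat:portblock \
      protocol=tcp portno=2049 action=unblock ip=10.0.0.50 \
      tickle_dir=/srv/nfs/tickle \
      --group g_export1 --after p_vip_export1

If I understand the RA correctly, the unblock instance records established
connections in tickle_dir and replays tickle ACKs on the node that takes over.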

I also did some further testing with ESXi and found that the ESXi NFS client
behaves completely differently from the Linux client. At first sight it
actually seems to work where the Linux client fails.

This is mainly due to two things:

1) The ESXi NFS client is much more aggressive about monitoring the server
and restarting sessions.

2) Every new TCP session comes from a different source port, whereas the
Linux client seems to stick to a single source port. This actually avoids the
problem of failing back to a node that still has FIN_WAIT1 sessions.
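
An easy way to see this from the server side is to list the NFS sessions and
look at the peer ports, for example (addresses will obviously differ):

  # Each ESXi datastore connection shows up with a different peer port,
  # while the Linux client keeps re-using the same one
  ss -tn '( sport = :2049 )'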

Regards, Patrick

[ClusterLabs] Failing over NFSv4/TCP exports

2016-08-17 Thread Patrick Zwahlen
Dear list (sorry for the rather long e-mail),

I'm looking for someone who has successfully implemented the "exportfs" RA with 
NFSv4 over TCP (and is willing to share some information).

The final goal is to present NFS datastores to ESXi from two "head" nodes.
Both nodes must be active in the sense that each runs an NFS server, but they
export different file systems (via exportfs resources and floating IPaddr2).

When moving an export to another node, we move the entire
"filesystem/export/ipaddr" stack but keep the NFS server running (as it may
still be exporting other file systems via other IPs).

Both nodes share disks (JBOD for physical hardware, shared VMDKs for testing).
Disks are only accessed by a single "head" node at any given time, so a
clustered file system is not required.

To my knowledge, this setup is best described by Florian Haas here:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
(except that we're not using DRBD and LVM)

Before going into more detail, I should mention that I have already read the
following posts and examples, as well as many of the NFS-related questions on
this list over the past year or so:

http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
http://wiki.linux-nfs.org/wiki/index.php/NFS_Recovery_and_Client_Migration
http://oss.clusterlabs.org/pipermail/pacemaker/2011-July/011000.html
https://access.redhat.com/solutions/42868

I'm forced to use TCP because of ESXi, and I want to use NFSv4 because ESXi
can do "session trunking" or some sort of "multipath" with version 4 (not
tested yet).

The problem I see is the one a lot of people have already mentioned: failover
works nicely, but failback takes a very long time. Many posts mention putting
/var/lib/nfs on a shared disk, but this only makes sense when failing over an
entire NFS server (as opposed to just its exports). Moreover, I don't see any
relevant information written to /var/lib/nfs when a single Linux NFSv4 client
mounts a folder.

NFSv4 LEASE and GRACE times have been reduced to 10 seconds. I'm using the
exportfs RA parameter "wait_for_leasetime_on_stop=true".
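
For reference, those values can be set through the nfsd procfs knobs, as long
as it happens before nfsd starts (I assume a distribution-specific config file
is the cleaner way to make them persistent):

  # On both heads, before nfs-server is started
  echo 10 > /proc/fs/nfsd/nfsv4leasetime
  echo 10 > /proc/fs/nfsd/nfsv4gracetime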

From my investigation, the problem actually happens at the TCP level. Let's
describe the most basic scenario, i.e. a single filesystem moving from node1
to node2 and back.

I first start the NFS servers using a clone resource. Node1 then starts a group 
that mounts a file system, adds it to the export list (exportfs RA) and adds a 
floating IP.
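
Stripped down, think of something like the following (all names, the device
and the addresses are placeholders, and I'm showing nfsd as a plain systemd
clone just to keep the example short):

  # NFS server on both heads
  pcs resource create p_nfsd systemd:nfs-server --clone

  # Per-export group: filesystem -> export -> floating IP
  pcs resource create p_fs_export1 ocf:heartbeat:Filesystem \
      device=/dev/sdb1 directory=/srv/export1 fstype=xfs \
      --group g_export1
  pcs resource create p_exportfs_export1 ocf:heartbeat:exportfs \
      clientspec=10.0.0.0/24 directory=/srv/export1 fsid=1 \
      options=rw,no_root_squash wait_for_leasetime_on_stop=true \
      --group g_export1
  pcs resource create p_vip_export1 ocf:heartbeat:IPaddr2 \
      ip=10.0.0.50 cidr_netmask=24 --group g_export1

  # The export group only makes sense where an NFS server is running
  pcs constraint order start p_nfsd-clone then start g_export1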

I then mount this folder from a Linux NFS client.

When I "migrate" my group off node1, everything correctly moves to node2:
IPaddr2 stops, then the exportfs "stop" action takes about 12 seconds (10
seconds of LEASE time plus the rest) and my file system gets unmounted. During
that time, I see the NFS client still trying to talk to the floating IP (at
its node1 MAC address). Once everything has moved to node2, the client sends
TCP packets to the new MAC address and node2 replies with a TCP RESET. At this
point, the client starts a NEW TCP session and it works fine.

However, on node1, I can still see an ESTABLISHED TCP session between the
client and the floating IP on port 2049 (NFS), even though the IP is gone.
After a short time, the session moves to FIN_WAIT1 and stays there for a while.
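
For anyone reproducing this, the stale session is easy to spot with something
like this ("-o" also shows the retransmission timer):

  ss -tno state fin-wait-1 '( sport = :2049 )'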

When I then "unmigrate" my group back to node1, I see the same behavior,
except that node1 does *not* send TCP RESETs because it still has a TCP
session with the client. I imagine the sequence numbers do not match, so node1
simply doesn't reply at all. It then takes several minutes for the client to
give up and start a new NFS session.

Does anyone have an idea how to handle this problem? I have done this with
iSCSI, where sessions can be explicitly "killed", but I don't think NFS has
anything similar. I also don't see anything in the IPaddr2 RA that would help
kill TCP sessions while removing a floating IP.

My next ideas would be either to tune the TCP stack in order to shorten the
FIN_WAIT1 state, or to synchronize sessions between the nodes (using
conntrackd). That just seems like overkill.
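
If I do go down the TCP tuning route, the first knob I would look at is
probably the orphan retry count, since the socket should be orphaned once the
export stack has closed it (untested assumption on my part):

  # Fewer FIN retransmissions before the kernel drops the orphaned socket,
  # which should shorten the FIN_WAIT1 window
  sysctl -w net.ipv4.tcp_orphan_retries=3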

Thanks for any input! Patrick


Re: [ClusterLabs] Triggered assert at xml.c:594

2016-02-14 Thread Patrick Zwahlen
Replying to myself,

This seems to be related to the latest drbd RA (8.9.4+).

Still, is it something we should worry about?

Regards!

-----Original Message-----
From: Patrick Zwahlen [mailto:p...@navixia.com] 
Sent: Saturday, 13 February 2016 15:47
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: [ClusterLabs] Triggered assert at xml.c:594

Hi,

Short:
I'm getting asserts in my logs and wonder if I should worry.

Long:
I'm running a lab on CentOS 7.2:
  pacemaker-1.1.13-10.el7.x86_64
  corosync-2.3.4-7.el7_2.1.x86_64

Since my latest "yum update", I see the following errors in my logs:

Feb 13 15:22:54 san1.local crmd[1896]:error: pcmkRegisterNode: Triggered 
assert at xml.c:594 : node->type == XML_ELEMENT_NODE

I see these logs during cluster start and when moving DRBD resources from one 
node to the other. Everything seems to work, though.

The strange thing is, I didn't update any Pacemaker-related RPMs:

Feb 13 14:35:02 Updated: 1:openssl-libs-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:02 Updated: openssh-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:02 Updated: 32:bind-license-9.9.4-29.el7_2.2.noarch
Feb 13 14:35:02 Updated: 32:bind-libs-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:02 Updated: nss-3.19.1-19.el7_2.x86_64
Feb 13 14:35:02 Updated: nss-sysinit-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 1:grub2-tools-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: 1:grub2-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: nss-tools-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 32:bind-utils-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: 32:bind-libs-lite-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: openssh-clients-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: openssh-server-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: 1:openssl-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:04 Updated: ntpdate-4.2.6p5-22.el7.centos.1.x86_64
Feb 13 14:35:04 Updated: python-perf-3.10.0-327.4.5.el7.x86_64
Feb 13 14:35:04 Updated: gnutls-3.3.8-14.el7_2.x86_64
Feb 13 14:35:10 Updated: tzdata-2016a-1.el7.noarch
Feb 13 14:35:10 Installed: kernel-3.10.0-327.4.5.el7.x86_64

Could the new kernel be the reason for those asserts?

Thanks for your input on this one. - Patrick -

[ClusterLabs] Triggered assert at xml.c:594

2016-02-13 Thread Patrick Zwahlen
Hi,

Short:
I'm getting asserts in my logs and wonder if I should worry.

Long:
I'm running a lab on CentOS 7.2:
  pacemaker-1.1.13-10.el7.x86_64
  corosync-2.3.4-7.el7_2.1.x86_64

Since my latest "yum update", I see the following errors in my logs:

Feb 13 15:22:54 san1.local crmd[1896]:error: pcmkRegisterNode: Triggered 
assert at xml.c:594 : node->type == XML_ELEMENT_NODE

I see these logs during cluster start and when moving DRBD resources from one 
node to the other. Everything seems to work, though.

The strange thing is, I didn't update any Pacemaker-related RPMs:

Feb 13 14:35:02 Updated: 1:openssl-libs-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:02 Updated: openssh-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:02 Updated: 32:bind-license-9.9.4-29.el7_2.2.noarch
Feb 13 14:35:02 Updated: 32:bind-libs-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:02 Updated: nss-3.19.1-19.el7_2.x86_64
Feb 13 14:35:02 Updated: nss-sysinit-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 1:grub2-tools-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: 1:grub2-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: nss-tools-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 32:bind-utils-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: 32:bind-libs-lite-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: openssh-clients-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: openssh-server-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: 1:openssl-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:04 Updated: ntpdate-4.2.6p5-22.el7.centos.1.x86_64
Feb 13 14:35:04 Updated: python-perf-3.10.0-327.4.5.el7.x86_64
Feb 13 14:35:04 Updated: gnutls-3.3.8-14.el7_2.x86_64
Feb 13 14:35:10 Updated: tzdata-2016a-1.el7.noarch
Feb 13 14:35:10 Installed: kernel-3.10.0-327.4.5.el7.x86_64

Could the new kernel be the reason for those asserts?

Thanks for your input on this one. - Patrick -
