Re: [ClusterLabs] Failing over NFSv4/TCP exports
Hi,

> -----Original Message-----
> From: Andreas Kurz [mailto:andreas.k...@gmail.com]
> Sent: Wednesday, 17 August 2016 23:16
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Failing over NFSv4/TCP exports
>
> This is a known problem ... have a look into the portblock RA - it has
> the feature to send out TCP tickle ACKs to reset such hanging sessions.
> So you can configure a portblock resource that blocks the TCP port
> before starting the VIP and another portblock resource that unblocks the
> port afterwards and sends out those tickle ACKs.

Thanks, Andreas, for pointing me to the portblock RA. I wasn't aware of it and will read up on it and test.

I also did some further testing with ESXi and found that the ESXi NFS client behaves completely differently from the Linux client, and at first sight it actually seems to work (where the Linux client fails). This is mainly due to two things:

1) The ESXi NFS client is much more aggressive in monitoring the server and restarting sessions.
2) Every new TCP session comes from a different source port, whereas the Linux client seems to stick to a single source port. This avoids the issue of failing back to a node that still holds sessions in FIN_WAIT1.

Regards, Patrick
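For reference, the pattern Andreas describes would look roughly like the following in crm shell syntax. This is only a sketch: the resource names, IP address and tickle directory are made-up placeholders, and tickle_dir needs to point at a directory both nodes can read (shared or replicated storage), since that is where portblock records the client connections it later tickles.

    primitive p_nfs_block ocf:heartbeat:portblock \
        params ip=192.168.1.100 portno=2049 protocol=tcp action=block \
        op monitor interval=10s
    primitive p_nfs_ip ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.100 cidr_netmask=24 \
        op monitor interval=10s
    primitive p_nfs_unblock ocf:heartbeat:portblock \
        params ip=192.168.1.100 portno=2049 protocol=tcp action=unblock \
               tickle_dir=/srv/nfs/tickle \
        op monitor interval=10s
    group g_nfs_ip p_nfs_block p_nfs_ip p_nfs_unblock

Because group members start in order and stop in reverse, the TCP port is blocked before the VIP comes up and unblocked (with tickle ACKs sent) once the VIP is in place, which is what resets the stale client sessions after a failover.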
[ClusterLabs] Failing over NFSv4/TCP exports
Dear list (sorry for the rather long e-mail),

I'm looking for someone who has successfully implemented the "exportfs" RA with NFSv4 over TCP (and is willing to share some information).

The final goal is to present NFS datastores to ESXi via two "head" nodes. Both nodes are active in the sense that they both run an NFS server, but they export different file systems (via exportfs and floating IPaddr2 addresses). When moving an export to another node, we move the entire "filesystem/export/ipaddr" stack but keep the NFS server running (as it might be exporting other file systems via other IPs). Both nodes share disks (JBOD for physical, shared VMDKs for testing). Disks are only accessed by a single "head" node at any given time, so a clustered file system is not required.

To my knowledge, this setup is best described by Florian Haas here (except that we're not using DRBD and LVM):
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html

Before going into more details, I'll mention that I have already read the following posts and examples, as well as many of the NFS-related questions on this list from the past year or so:
http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
http://wiki.linux-nfs.org/wiki/index.php/NFS_Recovery_and_Client_Migration
http://oss.clusterlabs.org/pipermail/pacemaker/2011-July/011000.html
https://access.redhat.com/solutions/42868

I'm forced to use TCP because of ESXi, and I want to use NFSv4 because ESXi can do "session trunking" or some sort of "multipath" with version 4 (not tested yet).

The problem I see is what a lot of people have already mentioned: failover works nicely, but failback takes a very long time. Many posts mention putting /var/lib/nfs on a shared disk, but that only makes sense when failing over an entire NFS server (as opposed to just exports). Moreover, I don't see any relevant information written to /var/lib/nfs when a single Linux NFSv4 client mounts a folder. The NFSv4 lease and grace times have been reduced to 10 seconds, and I'm using the exportfs RA parameter "wait_for_leasetime_on_stop=true".

From my investigation, the problem actually happens at the TCP level. Let's describe the most basic scenario, i.e. a single file system moving from node1 to node2 and back.

I first start the NFS servers using a clone resource. Node1 then starts a group that mounts a file system, adds it to the export list (exportfs RA) and adds a floating IP. I then mount this folder from a Linux NFS client.

When I "migrate" the group off node1, everything correctly moves to node2: IPaddr2 stops, then the exportfs "stop" action takes about 12 seconds (the 10-second lease time plus the rest) and the file system gets unmounted. During that time, I see the NFS client trying to talk to the floating IP (at its node1 MAC address). Once everything has moved to node2, the client sends TCP packets to the new MAC address and node2 replies with a TCP RESET. At this point the client starts a NEW TCP session and everything works fine. However, on node1 I can still see an ESTABLISHED TCP session between the client and the floating IP on port 2049 (NFS), even though the IP is gone. After a short time the session moves to FIN_WAIT1 and stays there for a while.

When I then "unmigrate" the group back to node1, I see the same behaviour, except that node1 does *not* send TCP RESETs because it still has a TCP session with the client. I imagine the sequence numbers do not match, so node1 simply doesn't reply at all.
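For what it's worth, that leftover session is easy to see on node1 with ss (a plain netstat works too); 2049 is the standard NFS port, and the state filters below are standard ss syntax, so adjust them to the actual setup:

    # on node1, after the group has moved away:
    ss -tn state established 'sport = :2049'
    ss -tn state fin-wait-1 'sport = :2049'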
It then takes several minutes for the client to give up and establish a new NFS session.

Does anyone have an idea how to handle this problem? I have done this with iSCSI, where sessions can be explicitly "killed", but I don't think NFS has anything similar. I also don't see anything in the IPaddr2 RA that would help kill TCP sessions while removing a floating IP.

The next ideas would be either to tune the TCP stack to shorten the FIN_WAIT1 state, or to synchronize sessions between the nodes (using conntrackd). That just seems like overkill.

Thanks for any input!

Patrick
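A couple of directions along those lines, sketched with placeholder values rather than tested settings:

    # Shorten how long orphaned sockets (FIN_WAIT1 with no owning process)
    # keep retransmitting; note this is system-wide, not NFS-specific:
    sysctl -w net.ipv4.tcp_orphan_retries=2

    # On recent kernels built with CONFIG_INET_DIAG_DESTROY, iproute2's ss
    # can forcibly destroy the stale sockets as part of a failback:
    ss -K -tn state fin-wait-1 'sport = :2049'

Neither is a clean cluster-level answer (the portblock RA with its tickle-ACK support, mentioned in the reply above, is closer to that), but they can help confirm that the stuck FIN_WAIT1 sockets are indeed what delays the client.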
Re: [ClusterLabs] Triggered assert at xml.c:594
Replying to myself: this seems to be related to the latest drbd RA (8.9.4+). Still, is it something we should worry about?

Regards!

-----Original Message-----
From: Patrick Zwahlen [mailto:p...@navixia.com]
Sent: Saturday, 13 February 2016 15:47
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Subject: [ClusterLabs] Triggered assert at xml.c:594

Hi,

Short: I'm getting asserts in my logs and wonder if I should worry.

Long: I'm running a lab on CentOS 7.2:
pacemaker-1.1.13-10.el7.x86_64
corosync-2.3.4-7.el7_2.1.x86_64

Since my latest "yum update", I see the following errors in my logs:

Feb 13 15:22:54 san1.local crmd[1896]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE

I see these logs during cluster start and when moving DRBD resources from one node to the other. Everything seems to work, though. The strange thing is that I didn't update any Pacemaker-related RPMs:

Feb 13 14:35:02 Updated: 1:openssl-libs-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:02 Updated: openssh-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:02 Updated: 32:bind-license-9.9.4-29.el7_2.2.noarch
Feb 13 14:35:02 Updated: 32:bind-libs-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:02 Updated: nss-3.19.1-19.el7_2.x86_64
Feb 13 14:35:02 Updated: nss-sysinit-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 1:grub2-tools-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: 1:grub2-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: nss-tools-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 32:bind-utils-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: 32:bind-libs-lite-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: openssh-clients-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: openssh-server-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: 1:openssl-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:04 Updated: ntpdate-4.2.6p5-22.el7.centos.1.x86_64
Feb 13 14:35:04 Updated: python-perf-3.10.0-327.4.5.el7.x86_64
Feb 13 14:35:04 Updated: gnutls-3.3.8-14.el7_2.x86_64
Feb 13 14:35:10 Updated: tzdata-2016a-1.el7.noarch
Feb 13 14:35:10 Installed: kernel-3.10.0-327.4.5.el7.x86_64

Could the new kernel be the reason for those asserts?

Thanks for your input on this one.

- Patrick -
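For anyone trying to correlate the assert with the drbd RA version, a quick check of the installed DRBD userland (which ships the RA) looks roughly like this; package names vary by distribution (on CentOS the ELRepo package is drbd84-utils) and the RA path may differ:

    rpm -q pacemaker resource-agents drbd-utils drbd84-utils
    ls -l /usr/lib/ocf/resource.d/linbit/drbd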
[ClusterLabs] Triggered assert at xml.c:594
Hi,

Short: I'm getting asserts in my logs and wonder if I should worry.

Long: I'm running a lab on CentOS 7.2:
pacemaker-1.1.13-10.el7.x86_64
corosync-2.3.4-7.el7_2.1.x86_64

Since my latest "yum update", I see the following errors in my logs:

Feb 13 15:22:54 san1.local crmd[1896]: error: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE

I see these logs during cluster start and when moving DRBD resources from one node to the other. Everything seems to work, though. The strange thing is that I didn't update any Pacemaker-related RPMs:

Feb 13 14:35:02 Updated: 1:openssl-libs-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:02 Updated: openssh-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:02 Updated: 32:bind-license-9.9.4-29.el7_2.2.noarch
Feb 13 14:35:02 Updated: 32:bind-libs-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:02 Updated: nss-3.19.1-19.el7_2.x86_64
Feb 13 14:35:02 Updated: nss-sysinit-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 1:grub2-tools-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: 1:grub2-2.02-0.34.el7.centos.x86_64
Feb 13 14:35:03 Updated: nss-tools-3.19.1-19.el7_2.x86_64
Feb 13 14:35:03 Updated: 32:bind-utils-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: 32:bind-libs-lite-9.9.4-29.el7_2.2.x86_64
Feb 13 14:35:04 Updated: openssh-clients-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: openssh-server-6.6.1p1-23.el7_2.x86_64
Feb 13 14:35:04 Updated: 1:openssl-1.0.1e-51.el7_2.2.x86_64
Feb 13 14:35:04 Updated: ntpdate-4.2.6p5-22.el7.centos.1.x86_64
Feb 13 14:35:04 Updated: python-perf-3.10.0-327.4.5.el7.x86_64
Feb 13 14:35:04 Updated: gnutls-3.3.8-14.el7_2.x86_64
Feb 13 14:35:10 Updated: tzdata-2016a-1.el7.noarch
Feb 13 14:35:10 Installed: kernel-3.10.0-327.4.5.el7.x86_64

Could the new kernel be the reason for those asserts?

Thanks for your input on this one.

- Patrick -