On Fri, Nov 06, 2015 at 07:47:35AM -0500, Soumya Koduri wrote:
> Hi,
>
> In a 2-node nfs-ganesha cluster setup, we have noticed that after a
> couple of iterations of failover & failback of the IP between those
> nodes, client I/O gets stuck. We have observed this in RHEL 7.1
> environments (not sure about RHEL 6). While debugging I see that the
> node which takes over the Virtual IP (after a couple of iterations)
> doesn't respond to (acknowledge) the client's TCP SYN packet.
>
> I found a couple of discussions around it in a few forums and I tried
> tuning certain TCP parameters (tcp_timestamp, tcp_window_scaling) as
> mentioned there. But it did not work. The current work-around we
> are left with (to resume the I/Os) is either
> * restart the nfs-ganesha service on the node which has taken over the
>   IP, to clear the existing established TCP connections. Or
> * failback the IP by getting the original node back online to resume
>   the I/O.
>
> Any ideas on what could have been the reason for the TCP ACK not being
> sent in response to the TCP SYN packet coming in on an existing
> connection in ESTABLISHED state? Any pointers on how to fix that?

CTDB has a function to "tickle" connections. This facilitates a faster
fail-over if the client does not detect it needs to re-connect. We
possibly need to do something like this for pacemaker/ganesha too. Some
details about it can be found here:

  https://github.com/samba-team/samba/commit/a104d1d8237979ae9fc5dd332e6624c0392be1d0

Pacemaker does seem to have something like that too, but documentation
about it is difficult to find:

  http://linux-ha.org/doc/man-pages/re-ra-portblock.html

HTH,
Niels

_______________________________________________
Nfs-ganesha-devel mailing list
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
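
To make the "tickle" idea above a bit more concrete, here is a rough Python sketch of the kind of packet involved. As far as I understand it, the node that took over the IP sends the client a bare TCP ACK that claims to belong to the old connection; the client answers with its real sequence numbers, the new node (which has no such connection) replies with a RST, and the client then reconnects. The function names, the window value and the example addresses are made up for illustration and are not CTDB's code. The sketch only builds the 20-byte TCP header for IPv4; actually sending it needs a raw socket and root privileges and is left out.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def tcp_pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that is covered by the TCP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len))


def build_tickle_ack(src_ip, src_port, dst_ip, dst_port, seq=0, ack=0):
    """Build a TCP header with only the ACK flag set (hypothetical helper).

    src_* is the virtual IP and service port, dst_* is the client.
    """
    offset_flags = (5 << 12) | 0x10  # header length 5 words, ACK flag only
    window = 1234                    # arbitrary value, an assumption
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                         offset_flags, window, 0, 0)
    csum = internet_checksum(
        tcp_pseudo_header(src_ip, dst_ip, len(header)) + header)
    # Splice the checksum into bytes 16..17 of the header.
    return header[:16] + struct.pack("!H", csum) + header[18:]
```

For an NFS client this would be called with the virtual IP and port 2049 as the source and the client's address and port as the destination, e.g. build_tickle_ack("192.0.2.10", 2049, "192.0.2.20", 40000), which means the cluster has to remember which client connections existed before the fail-over.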
