On 11/06/2015 06:57 PM, Niels de Vos wrote:
> On Fri, Nov 06, 2015 at 07:47:35AM -0500, Soumya Koduri wrote:
>> Hi,
>>
>> In a 2-node nfs-ganesha cluster setup, we have noticed that after a
>> couple of iterations of failover & failback of the IP between those
>> nodes, client I/O gets stuck. We have observed this in RHEL 7.1
>> environments (not sure about RHEL 6). While debugging, I see that the
>> node which takes over the Virtual IP (after a couple of iterations)
>> doesn't respond to (acknowledge) the client's TCP SYN packet.
>>
>> I found a couple of discussions about it in a few forums and tried
>> tuning certain TCP parameters (tcp_timestamps, tcp_window_scaling) as
>> mentioned there, but it did not work. The current work-around we
>> are left with (to resume the I/Os) is either
>> * restart the nfs-ganesha service on the node which has taken over the
>>   IP, to clear the existing established TCP connections. Or
>> * fail back the IP by getting the original node back online to resume
>>   the I/O.
>>
>> Any ideas on what could have been the reason for no TCP ACK being
>> sent in response to the TCP SYN packet coming in on an existing
>> connection in ESTABLISHED state? Any pointers on how to fix that?
>
> CTDB has a function to "tickle" connections. This facilitates a faster
> fail-over if the client does not detect it needs to re-connect. We
> possibly need to do something like this for pacemaker/ganesha too.
>
> Some details about it can be found here:
>
> https://github.com/samba-team/samba/commit/a104d1d8237979ae9fc5dd332e6624c0392be1d0
>
> Pacemaker does seem to have something like that too, but documentation
> about it is difficult to find:
>
> http://linux-ha.org/doc/man-pages/re-ra-portblock.html
>

Thanks Niels. I shall check out this parameter too.
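
For reference, a rough sketch (not taken from CTDB or portblock) of how the ESTABLISHED connections on the virtual IP could be listed, since those are the ones a tickle would have to target. It assumes the Linux /proc/net/tcp text format on a little-endian host and the default NFS port 2049; the function names and the address 192.168.0.1 are made up for illustration.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code as printed in /proc/net/tcp


def decode_endpoint(field):
    """Turn a field such as '0100A8C0:0801' into ('192.168.0.1', 2049)."""
    hex_ip, hex_port = field.split(":")
    # The kernel prints the IPv4 address as a native-endian 32-bit word;
    # "<I" assumes a little-endian host such as x86_64.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_on(proc_net_tcp_text, vip, port=2049):
    """Return (client_ip, client_port) pairs ESTABLISHED to vip:port."""
    peers = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 4:
            continue
        local, remote, state = cols[1], cols[2], int(cols[3], 16)
        if state == TCP_ESTABLISHED and decode_endpoint(local) == (vip, port):
            peers.append(decode_endpoint(remote))
    return peers
```

Feeding it the contents of /proc/net/tcp on the node that has taken over the IP, e.g. established_on(open("/proc/net/tcp").read(), "192.168.0.1"), should show which client connections the server still considers established after a failback.
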
-Soumya

> HTH,
> Niels
>

_______________________________________________
Nfs-ganesha-devel mailing list
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
