Hi,

In a 2-node nfs-ganesha cluster setup, we have noticed that after a couple of iterations of failover and failback of the IP between the nodes, client I/O gets stuck. We have observed this in RHEL 7.1 environments (not sure about RHEL 6). While debugging, I see that the node which takes over the virtual IP (after a couple of iterations) doesn't respond to (acknowledge) the client's TCP SYN packet.
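To see which client connections the takeover node still tracks for the virtual IP, the kernel's socket table can be inspected. The following is only a rough sketch, not something taken from the setup above: it assumes IPv4, a little-endian host (e.g. x86_64), and the default NFS port 2049, and the function names are made up for illustration.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code for ESTABLISHED in /proc/net/tcp


def decode_addr(field):
    """Decode a 'HEXIP:HEXPORT' field from /proc/net/tcp.

    Assumes IPv4 on a little-endian host, where the address is
    printed as the host-order integer of the network-order bytes.
    """
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def established_on(proc_text, vip, port=2049):
    """Return (remote_ip, remote_port) for every ESTABLISHED socket
    whose local endpoint is vip:port.

    proc_text is the full content of /proc/net/tcp; port 2049 is an
    assumption (the default NFS port).
    """
    conns = []
    for line in proc_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 4:
            continue
        local = decode_addr(parts[1])
        remote = decode_addr(parts[2])
        state = int(parts[3], 16)
        if state == TCP_ESTABLISHED and local == (vip, port):
            conns.append(remote)
    return conns
```

On the takeover node this could be called as `established_on(open("/proc/net/tcp").read(), vip)`, with `vip` set to the cluster's virtual IP. Entries that remain listed for a client while the IP is hosted elsewhere would be the stale connections in question.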
I found a couple of discussions about this in a few forums and tried tuning certain TCP parameters (tcp_timestamps, tcp_window_scaling) as suggested there, but it did not help.

The only workarounds we currently have to resume I/O are either to

* restart the nfs-ganesha service on the node which has taken over the IP, to clear the existing established TCP connections, or
* fail back the IP by bringing the original node back online.

Any ideas on why no TCP ACK is sent in response to a TCP SYN packet arriving on an existing connection in ESTABLISHED state? Any pointers on how to fix that?

Thanks,
Soumya
