I hope you don't mind a top post...
I've been testing network partitioning between the only Linux client
I have (5.2 kernel) and a FreeBSD server with the xprtdied.patch
(does soshutdown(..SHUT_WR) when it knows the socket is broken)
applied to it.

I'm not enough of a TCP guy to know if this is useful, but here's what
I see...

While partitioned:
On the FreeBSD server end, the socket either goes to CLOSED during
the network partition or stays ESTABLISHED.
On the Linux end, the socket seems to remain ESTABLISHED for a
little while, and then disappears.

After unpartitioning:
On the FreeBSD server end, you get another socket showing up at
the same port#
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 nfsv4-new3.nfsd        nfsv4-linux.678        ESTABLISHED
tcp4       0      0 nfsv4-new3.nfsd        nfsv4-linux.678        CLOSED     

The Linux client shows the same connection ESTABLISHED.
(The mount sometimes reports an error. I haven't looked at packet
 traces to see if it retries RPCs or why the errors occur.)
--> However I never get hangs.
Sometimes it goes to SYN_SENT for a while and the FreeBSD server
shows FIN_WAIT_1, but then both ends go to ESTABLISHED and the
mount starts working again.
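
For anyone repeating this test, the socket states above can be tallied with
a small filter. This is only a sketch: the function name nfs_states is made
up here, and matching on a port column ending in "2049" or "nfsd" is an
assumption about how netstat prints the NFS port on a given system.

```shell
#!/bin/sh
# Sketch: count TCP states for connections that involve the NFS port.
# Reads netstat output on stdin and takes the state from the last field.
# Assumption: the NFS port shows up as ".2049", ":2049", ".nfsd" or ":nfsd"
# at the end of the local or foreign address column.
nfs_states() {
    awk '$1 ~ /^tcp/ && ($4 ~ /[.:](2049|nfsd)$/ || $5 ~ /[.:](2049|nfsd)$/) { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

On the FreeBSD server it could be fed with "netstat -an -p tcp | nfs_states",
and on the Linux client with "netstat -tan | nfs_states".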

The most obvious thing is that the Linux client always keeps using
the same port#. (The FreeBSD client will use a different port# when
it does a TCP reconnect after no response from the NFS server for
a little while.)

What do those TCP conversant think?

rick
ps: I can capture packets while doing this, if anyone has a use
      for them.






________________________________________
From: owner-freebsd-...@freebsd.org <owner-freebsd-...@freebsd.org> on behalf 
of Youssef  GHORBAL <youssef.ghor...@pasteur.fr>
Sent: Saturday, March 27, 2021 6:57 PM
To: Jason Breitman
Cc: Rick Macklem; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs





On 27 Mar 2021, at 13:20, Jason Breitman <jbreit...@tildenparkcapital.com> wrote:

The issue happened again so we can say that disabling TSO and LRO on the NIC 
did not resolve this issue.
# ifconfig lagg0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
# ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER>

We can also say that the sysctl settings did not resolve this issue.

# sysctl net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.fast_finwait2_recycle: 0 -> 1

# sysctl net.inet.tcp.finwait2_timeout=1000
net.inet.tcp.finwait2_timeout: 60000 -> 1000

I don’t think those will do anything in your case, since the FIN_WAIT2 sockets
are on the client side and those sysctls are for BSD.
By the way, it seems that Linux automatically recycles TCP sessions in
FIN_WAIT2 after 60 seconds (sysctl net.ipv4.tcp_fin_timeout)

tcp_fin_timeout (integer; default: 60; since Linux 2.2)
              This specifies how many seconds to wait for a final FIN
              packet before the socket is forcibly closed.  This is
              strictly a violation of the TCP specification, but
              required to prevent denial-of-service attacks.  In Linux
              2.2, the default value was 180.

So I don’t get why it gets stuck in the FIN_WAIT2 state anyway.

You really need to have a packet capture during the outage (client and server
side) so you’ll see the over-the-wire chat and can start speculating from there.
No need to capture the beginning of the outage for now. All you have to do is
run a tcpdump for 10 minutes or so when you notice a client stuck.
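
A rough sketch of such a bounded capture follows. The function name
capture_cmd, the default output path and the use of timeout(1) to stop after
600 seconds are assumptions, not something from this thread. The function
only prints the command so it can be checked before it is run as root.

```shell
#!/bin/sh
# Sketch: build a time-limited tcpdump command for one client/server pair.
# Arguments: interface, peer address, seconds (default 600), output file.
# Assumptions: timeout(1) is installed and NFS is on TCP port 2049.
capture_cmd() {
    iface=$1
    peer=$2
    secs=${3:-600}
    out=${4:-/tmp/nfs-hang.pcap}
    # -s 0 keeps whole packets; -w writes a pcap file for later analysis.
    printf 'timeout %s tcpdump -i %s -s 0 -w %s host %s and port 2049\n' \
        "$secs" "$iface" "$out" "$peer"
}
```

For example, capture_cmd lagg0 NFS.Client.IP.X on the server prints a command
that captures only the stuck client's NFS traffic for about 10 minutes. The
same can be done on the client with its own interface and the server address.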

* I have not rebooted the NFS Server nor have I restarted nfsd, but do not 
believe that is required as these settings are at the TCP level and I would 
expect new sessions to use the updated settings.

The issue occurred after 5 days following a reboot of the client machines.
I ran the capture information again to make use of the situation.

#!/bin/sh

while true
do
  /bin/date >> /tmp/nfs-hang.log
  /bin/ps axHl | grep nfsd | grep -v grep >> /tmp/nfs-hang.log
  /usr/bin/procstat -kk 2947 >> /tmp/nfs-hang.log
  /usr/bin/procstat -kk 2944 >> /tmp/nfs-hang.log
  /bin/sleep 60
done


On the NFS Server
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 NFS.Server.IP.X.2049      NFS.Client.IP.X.48286     CLOSE_WAIT

On the NFS Client
tcp        0      0 NFS.Client.IP.X:48286      NFS.Server.IP.X:2049       FIN_WAIT2



You had also asked for the output below.

# nfsstat -E -s
BackChannelCtBindConnToSes
            0            0

# sysctl vfs.nfsd.request_space_throttle_count
vfs.nfsd.request_space_throttle_count: 0

I see that you are testing a patch and I look forward to seeing the results.


Jason Breitman


On Mar 21, 2021, at 6:21 PM, Rick Macklem <rmack...@uoguelph.ca> wrote:

Youssef GHORBAL <youssef.ghor...@pasteur.fr> wrote:
>Hi Jason,
>
>> On 17 Mar 2021, at 18:17, Jason Breitman <jbreit...@tildenparkcapital.com>
>> wrote:
>>
>> Please review the details below and let me know if there is a setting that I 
>> should apply to my FreeBSD NFS Server or if there is a bug fix that I can 
>> apply to resolve my issue.
>> I shared this information with the linux-nfs mailing list and they believe 
>> the issue is on the server side.
>>
>> Issue
>> NFSv4 mounts periodically hang on the NFS Client.
>>
>> During this time, it is possible to manually mount from another NFS Server 
>> on the NFS Client having issues.
>> Also, other NFS Clients are successfully mounting from the NFS Server in 
>> question.
>> Rebooting the NFS Client appears to be the only solution.
>
>I had experienced a similar weird situation with periodically stuck Linux NFS
>clients mounting Isilon NFS servers (Isilon is FreeBSD based but they seem to
>have their own nfsd)
Yes, my understanding is that Isilon uses a proprietary user space nfsd and
not the kernel based RPC and nfsd in FreeBSD.

>We’ve had better luck and we did manage to have packet captures on both sides
>during the issue. The gist of it goes as follows:
>
>- Data flows correctly between SERVER and the CLIENT
>- At some point SERVER starts decreasing its TCP Receive Window until it
>reaches 0
>- The client (eager to send data) can only ack data sent by SERVER.
>- When SERVER was done sending data, the client starts sending TCP Window
>Probes hoping that the TCP Window opens again so it can flush its buffers.
>- SERVER responds with a TCP Zero Window to those probes.
Having the window size drop to zero is not necessarily incorrect.
If the server is overloaded (has a backlog of NFS requests), it can stop doing
soreceive() on the socket (so the socket rcv buffer can fill up and the TCP
window closes). This results in "backpressure" to stop the NFS client from
flooding the NFS server with requests.
--> However, once the backlog is handled, the nfsd should start to soreceive()
again and this should cause the window to open back up.
--> Maybe this is broken in the socket/TCP code. I quickly got lost in
tcp_output() when it decides what to do about the rcvwin.

>- After 6 minutes (the NFS server default Idle timeout) SERVER gracefully
>closes the TCP connection sending a FIN Packet (and still a TCP Window 0)
This probably does not happen for Jason's case, since the 6 minute timeout
is disabled when the TCP connection is assigned as a backchannel (most likely
the case for NFSv4.1).

>- CLIENT ACKs that FIN.
>- SERVER goes into FIN_WAIT_2 state
>- CLIENT closes its half of the socket and goes into LAST_ACK state.
>- FIN is never sent by the client since there is still data in its SendQ and
>the receiver TCP Window is still 0. At this stage the client starts sending
>TCP Window Probes again and again hoping that the server opens its TCP Window
>so it can flush its buffers and terminate its side of the socket.
>- SERVER keeps responding with a TCP Zero Window to those probes.
>=> The last two steps go on and on for hours/days, freezing the NFS mount
>bound to that TCP session.
>
>If we had a situation where CLIENT was responsible for closing the TCP Window
>(and initiating the TCP FIN first) and the server wanted to send data, we’d
>end up in the same state as you, I think.
>
>We never found the root cause of why the SERVER decided to close the TCP
>Window and no longer accept data. The fix on the Isilon side was to recycle
>the FIN_WAIT_2 sockets more aggressively
>(net.inet.tcp.fast_finwait2_recycle=1 & net.inet.tcp.finwait2_timeout=5000).
>Once the socket is recycled, at the next occurrence of a CLIENT TCP Window
>probe, SERVER sends a RST, triggering the teardown of the session on the
>client side, a new TCP handshake, etc. and traffic flows again (NFS starts
>responding)
>
>To avoid rebooting the client (and before the aggressive FIN_WAIT_2 recycling
>was implemented on the Isilon side) we added a check script on the client that
>detects LAST_ACK sockets on the client and, through an iptables rule, enforces
>a TCP RST. Something like: -A OUTPUT -p tcp -d $nfs_server_addr --sport
>$local_port -j REJECT --reject-with tcp-reset (the script removes this
>iptables rule as soon as the LAST_ACK disappears)
>
>The bottom line would be to have a packet capture during the outage (client
>and/or server side); it will show you at least the shape of the TCP exchange
>when NFS is stuck.
Interesting story and good work w.r.t. sleuthing, Youssef, thanks.
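
As a rough illustration only (this is not Youssef's actual script), the
detection half of such a client-side check could look like the sketch below.
The function names last_ack_ports and reset_rules are made up, the input is
assumed to be Linux "netstat -tn" style lines, and the rules are only printed,
never applied.

```shell
#!/bin/sh
# Sketch: find sockets to the NFS server stuck in LAST_ACK and print the
# iptables rule quoted above for each one. Nothing is applied here.
# Assumption: stdin has Linux "netstat -tn" lines, i.e.
#   proto recv-q send-q local:port remote:port state

# Print the local port of every socket to $1:2049 that is in LAST_ACK.
last_ack_ports() {
    awk -v peer="$1:2049" '$1 ~ /^tcp/ && $5 == peer && $6 == "LAST_ACK" {
        n = split($4, a, ":")   # the port is the part after the last ":"
        print a[n]
    }'
}

# Print one REJECT/tcp-reset rule per stuck local port.
reset_rules() {
    server=$1
    last_ack_ports "$server" | while read -r port; do
        printf 'iptables -A OUTPUT -p tcp -d %s --sport %s -j REJECT --reject-with tcp-reset\n' \
            "$server" "$port"
    done
}
```

It would be used as "netstat -tn | reset_rules $nfs_server_addr". A real
script would also need to delete each rule again once the LAST_ACK socket
disappears, as Youssef describes.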

I looked at Jason's log and it shows everything is ok w.r.t the nfsd threads.
(They're just waiting for RPC requests.)
However, I do now think I know why the soclose() does not happen.
When the TCP connection is assigned as a backchannel, that takes a reference
cnt on the structure. This refcnt won't be released until the connection is
replaced by a BindConnectionToSession operation from the client. But that won't
happen until the client creates a new TCP connection.
--> No refcnt release-->no refcnt of 0-->no soclose().

I've created the attached patch (completely different from the previous one)
that adds soshutdown(SHUT_WR) calls in the three places where the TCP
connection is going away. This seems to get it past CLOSE_WAIT without a
soclose().
--> I know you are not comfortable with patching your server, but I do think
this change will get the socket shutdown to complete.

There are a couple more things you can check on the server...
# nfsstat -E -s
--> Look for the count under "BindConnToSes".
--> If non-zero, backchannels have been assigned
# sysctl -a | fgrep request_space_throttle_count
--> If non-zero, the server has been overloaded at some point.

I think the attached patch might work around the problem.
The code that should open up the receive window needs to be checked.
I am also looking at enabling the 6 minute timeout when a backchannel is
assigned.

rick

Youssef

_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
<xprtdied.patch>

<nfs-hang.log.gz>
