[Bug 254244] panics after upgrade to stable/13-n244861-b9773574371
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254244

Richard Scheffenegger changed:

           What        |Removed       |Added
---------------------------------------------------
     Resolution        |---           |FIXED
         Status        |In Progress   |Closed

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
> On 18. Mar 2021, at 21:55, Rick Macklem wrote: > > Michael Tuexen wrote: >>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard >>> wrote: >>> > Output from the NFS Client when the issue occurs # netstat -an | grep > NFS.Server.IP.X > tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 > FIN_WAIT2 I'm no TCP guy. Hopefully others might know why the client would be stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but could be wrong?) >>> >>> When the client is in Fin-Wait2 this is the state you end up when the >>> Client side actively close() the tcp session, and then the server also >>> ACKed the FIN. > Jason noted: > >> When the issue occurs, this is what I see on the NFS Server. >> tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 >> CLOSE_WAIT >> >> which corresponds to the state on the client side. The server received the >> FIN >> from the client and acked it. >> The server is waiting for a close call to happen. >> So the question is: Is the server also closing the connection? > Did you mean to say "client closing the connection here?" Yes. > > The server should call soclose() { it never calls soshutdown() } when > soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates > the socket is broken. > --> The soreceive() call is triggered by an upcall for the rcv side of the > socket. > So, are you saying the FreeBSD NFS server did not call soclose() for this > case? Yes. If the state at the server side is CLOSE_WAIT, no close call has happened yet. The FIN from the client was received, it was ACKED, but no close() call (or shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR)) was issued. Therefore, no FIN was sent and the client should be in the FINWAIT-2 state. This was also reported. So the reported states are consistent. Best regards Michael > > rick > > Best regards > Michael >> This will last for ~2 min or so, but is asynchronous. However, the same >> 4-tuple can not be reused during this time. >> >> With other words, from the socket / TCP, a properly executed active close() >> will end up in this state. (If the other side initiated the close, a passive >> close, will not end in this state) >> >> >> ___ >> freebsd-net@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" > > > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
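As background for the exchange above, the server-side pattern Rick describes (soreceive() returning 0 bytes, then soclose()) has a direct userland analogue. The sketch below is illustrative only and is not the kernel NFS server code: read() stands in for soreceive(), close() for soclose(), and the port number is simply the NFS port mentioned in the thread. Once read() returns 0 the peer's FIN has arrived and this side sits in CLOSE_WAIT; calling close() sends our FIN and lets the peer leave FIN_WAIT_2.

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int
    main(void)
    {
            int ls = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in sin = { 0 };
            char buf[4096];
            ssize_t n;

            sin.sin_family = AF_INET;
            sin.sin_port = htons(2049);             /* illustrative port only */
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            bind(ls, (struct sockaddr *)&sin, sizeof(sin));
            listen(ls, 1);

            int s = accept(ls, NULL, NULL);

            /* Serve requests until read() returns 0 (peer sent its FIN) or
             * fails: at that point this end is in CLOSE_WAIT and the peer is
             * in FIN_WAIT_2. */
            while ((n = read(s, buf, sizeof(buf))) > 0)
                    ;       /* ... handle requests ... */

            /* Sending our FIN here releases the peer from FIN_WAIT_2; never
             * reaching this close() is exactly the stuck CLOSE_WAIT case
             * reported in the thread. */
            close(s);
            close(ls);
            return (0);
    }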
Re: Severe IPv6 TCP transfer issues on 13.0-RC1 and RC2
> On 18. Mar 2021, at 21:09, Michael Tuexen wrote: > >> On 15. Mar 2021, at 12:56, Blake Hartshorn >> wrote: >> >> The short version, when I use FreeBSD 13, delivering data can take 5 minutes >> for 1MB over SSH or HTTP when using IPv6. This problem does not happen with >> IPv4. I installed FreeBSD 12 and Linux on that same device, neither had the >> problem. >> >> Did some troubleshooting with Linode, have ultimately ruled the network >> itself out at this point. When the server is on FreeBSD 13, it can download >> quickly over IPv6, but not deliver. Started investigating after noticing my >> SSH session was lagging when cat'ing large files or running builds. This >> problem even occurs between VMs in the same datacenter. I generated a 1MB >> file of base64 garbage served by nginx for testing. IPv6 is being configured >> by SLAAC and on both 12 and 13 installs was setup by the installer. Linode >> uses Linux/KVM hosts for their virtual machines so it's running on that >> virtual adapter. >> >> I asked on the forums, another user recommended going to the mailing lists >> instead. Does anyone know if config settings need to be different on 13? Did >> I maybe just find a real issue? I can provide any requested details. Thanks! > I was able to reproduce the issue locally. A fix is under review: > https://reviews.freebsd.org/D29331 The fix is now committed in main, stable/13, releng/13.0, and will be included in the upcoming RC3. Best regards Michael > > Best regards > Michael >> >> >> ___ >> freebsd-net@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" > > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
Michael Tuexen wrote: >> On 18. Mar 2021, at 13:42, Scheffenegger, Richard >> wrote: >> Output from the NFS Client when the issue occurs # netstat -an | grep NFS.Server.IP.X tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2 >>> I'm no TCP guy. Hopefully others might know why the client would be stuck >>> in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but >>> could be wrong?) >> >> When the client is in Fin-Wait2 this is the state you end up when the Client >> side actively close() the tcp session, and then the server also ACKed the >> FIN. Jason noted: >When the issue occurs, this is what I see on the NFS Server. >tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 >CLOSE_WAIT > >which corresponds to the state on the client side. The server received the FIN >from the client and acked it. >The server is waiting for a close call to happen. >So the question is: Is the server also closing the connection? Did you mean to say "client closing the connection here?" The server should call soclose() { it never calls soshutdown() } when soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates the socket is broken. --> The soreceive() call is triggered by an upcall for the rcv side of the socket. So, are you saying the FreeBSD NFS server did not call soclose() for this case? rick Best regards Michael > This will last for ~2 min or so, but is asynchronous. However, the same > 4-tuple can not be reused during this time. > > With other words, from the socket / TCP, a properly executed active close() > will end up in this state. (If the other side initiated the close, a passive > close, will not end in this state) > > > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366

Michael Tuexen changed:

           What        |Removed       |Added
---------------------------------------------------
         Status        |In Progress   |Closed
     Resolution        |---           |FIXED
          Flags        |              |mfc-stable13+

--
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #21 from Michael Tuexen --- This will be in the upcoming RC3. -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #20 from commit-h...@freebsd.org --- A commit in branch releng/13.0 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=11660fa28fd39a644cb7d30a4378cf4751b89f15 commit 11660fa28fd39a644cb7d30a4378cf4751b89f15 Author: Michael Tuexen AuthorDate: 2021-03-18 20:25:47 + Commit: Michael Tuexen CommitDate: 2021-03-18 20:35:49 + vtnet: fix TSO for TCP/IPv6 The decision whether a TCP packet is sent over IPv4 or IPv6 was based on ethertype, which works correctly. In D27926 the criteria was changed to checking if the CSUM_IP_TSO flag is set in the csum-flags and then considering it to be TCP/IPv4. However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6, where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO. Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4, which breaks TSO for TCP/IPv6. This patch bases the check again on the ethertype. This fix is instantly MFCed. Approved by:re(gjb) PR: 254366 Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D29331 (cherry picked from commit d4697a6b56168876fc0ffec1a0bb1b24d25b198e) (cherry picked from commit 6064ea8172f078c54954bc6e8865625feb7979fe) sys/dev/virtio/network/if_vtnet.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #19 from commit-h...@freebsd.org --- A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6064ea8172f078c54954bc6e8865625feb7979fe commit 6064ea8172f078c54954bc6e8865625feb7979fe Author: Michael Tuexen AuthorDate: 2021-03-18 20:25:47 + Commit: Michael Tuexen CommitDate: 2021-03-18 20:33:32 + vtnet: fix TSO for TCP/IPv6 The decision whether a TCP packet is sent over IPv4 or IPv6 was based on ethertype, which works correctly. In D27926 the criteria was changed to checking if the CSUM_IP_TSO flag is set in the csum-flags and then considering it to be TCP/IPv4. However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6, where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO. Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4, which breaks TSO for TCP/IPv6. This patch bases the check again on the ethertype. This fix is instantly MFCed. Approved by:re(gjb) PR: 254366 Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D29331 (cherry picked from commit d4697a6b56168876fc0ffec1a0bb1b24d25b198e) sys/dev/virtio/network/if_vtnet.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #18 from commit-h...@freebsd.org --- A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d4697a6b56168876fc0ffec1a0bb1b24d25b198e commit d4697a6b56168876fc0ffec1a0bb1b24d25b198e Author: Michael Tuexen AuthorDate: 2021-03-18 20:25:47 + Commit: Michael Tuexen CommitDate: 2021-03-18 20:32:20 + vtnet: fix TSO for TCP/IPv6 The decision whether a TCP packet is sent over IPv4 or IPv6 was based on ethertype, which works correctly. In D27926 the criteria was changed to checking if the CSUM_IP_TSO flag is set in the csum-flags and then considering it to be TCP/IPv4. However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6, where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO. Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4, which breaks TSO for TCP/IPv6. This patch bases the check again on the ethertype. This fix will be MFC instantly as discussed with re(gjb). MFC after: instantly PR: 254366 Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D29331 sys/dev/virtio/network/if_vtnet.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
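To make the misclassification described in the commit message concrete, here is a small self-contained sketch. It is not the actual if_vtnet.c code; the CSUM_* bit values are placeholders, and only the relationship quoted in the commit matters (CSUM_TSO being the OR of the per-address-family TSO bits). The ethertype constants are the standard 0x0800/0x86DD values.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Placeholder bit values; only the relationship from the commit message
     * matters: CSUM_TSO is the OR of the per-address-family TSO bits. */
    #define CSUM_IP_TSO    0x01
    #define CSUM_IP6_TSO   0x02
    #define CSUM_TSO       (CSUM_IP_TSO | CSUM_IP6_TSO)

    #define ETHERTYPE_IP   0x0800
    #define ETHERTYPE_IPV6 0x86DD

    /* The check introduced in D27926: an IPv6 TSO packet carries CSUM_TSO,
     * which includes CSUM_IP_TSO, so this wrongly reports TCP/IPv4. */
    static bool
    tso_is_ipv4_by_csum(uint64_t csum_flags)
    {
            return ((csum_flags & CSUM_IP_TSO) != 0);
    }

    /* The restored check: classify on the frame's ethertype instead. */
    static bool
    tso_is_ipv4_by_ethertype(uint16_t etype)
    {
            return (etype == ETHERTYPE_IP);
    }

    int
    main(void)
    {
            uint64_t ipv6_tso_flags = CSUM_TSO; /* what the stack sets for TCP/IPv6 TSO */

            printf("csum-flag check says IPv4:  %d (misclassified)\n",
                tso_is_ipv4_by_csum(ipv6_tso_flags));
            printf("ethertype check says IPv4:  %d (correct)\n",
                tso_is_ipv4_by_ethertype(ETHERTYPE_IPV6));
            return (0);
    }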
Re: Severe IPv6 TCP transfer issues on 13.0-RC1 and RC2
> On 15. Mar 2021, at 12:56, Blake Hartshorn wrote: > > The short version, when I use FreeBSD 13, delivering data can take 5 minutes > for 1MB over SSH or HTTP when using IPv6. This problem does not happen with > IPv4. I installed FreeBSD 12 and Linux on that same device, neither had the > problem. > > Did some troubleshooting with Linode, have ultimately ruled the network > itself out at this point. When the server is on FreeBSD 13, it can download > quickly over IPv6, but not deliver. Started investigating after noticing my > SSH session was lagging when cat'ing large files or running builds. This > problem even occurs between VMs in the same datacenter. I generated a 1MB > file of base64 garbage served by nginx for testing. IPv6 is being configured > by SLAAC and on both 12 and 13 installs was setup by the installer. Linode > uses Linux/KVM hosts for their virtual machines so it's running on that > virtual adapter. > > I asked on the forums, another user recommended going to the mailing lists > instead. Does anyone know if config settings need to be different on 13? Did > I maybe just find a real issue? I can provide any requested details. Thanks! I was able to reproduce the issue locally. A fix is under review: https://reviews.freebsd.org/D29331 Best regards Michael > > > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #17 from Michael Tuexen --- (In reply to Michael Tratz from comment #16) Thanks for reporting back! The patch is under review: https://reviews.freebsd.org/D29331 -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366 --- Comment #16 from Michael Tratz --- Thank you so much Michael for the quick fix. Everything is back to normal. IPv6 is speedy again. -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254366] Severe IPv6 TCP transfer issues on 13.0-RC2 with virtio network adapter
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254366

Michael Tuexen changed:

           What        |Removed            |Added
--------------------------------------------------------
       Assignee        |b...@freebsd.org   |tue...@freebsd.org
         Status        |New                |In Progress
             CC        |                   |n...@freebsd.org

--
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 254333] [tcp] sysctl net.inet.tcp.hostcache.list hangs
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254333 --- Comment #2 from Maxim Shalomikhin --- It will take some time to reproduce the issue. Please let me know if there is any other information I can collect when sysctl hangs again. -- You are receiving this mail because: You are the assignee for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: AW: NFS Mount Hangs
> >>Output from the NFS Client when the issue occurs # netstat -an | grep > >>NFS.Server.IP.X > >>tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 > >>FIN_WAIT2 > >I'm no TCP guy. Hopefully others might know why the client would be stuck in > >FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but > >could be wrong?) > > When the client is in Fin-Wait2 this is the state you end up when the Client > side actively close() the tcp session, and then the server also ACKed the > FIN. > This will last for ~2 min or so, but is asynchronous. However, the same > 4-tuple can not be reused during this time. I do not think this is the full story. If in fact the Client did call close() you are correct, you would end up in FIN_WAIT2 for up to whatever the timeout is set to; IIRC the default on that is 60 seconds. Further, in this case the socket is disconnected from the application and only the kernel has knowledge of it. The socket can stay in FIN_WAIT2 indefinitely if the application calls shutdown(2) on it and never follows up with a close(2). Which situation exists can be determined by looking at fstat to see if the socket has an associated PID or not; not sure if that works on Linux though. > > With other words, from the socket / TCP, a properly executed active close() > will end up in this state. (If the other side initiated the close, a passive > close, will not end in this state) But only for a brief period. Being stuck in this state indicates close(2) was not called, but shutdown(2) was. I believe it is also possible to get sockets stuck in this state when client-side port number reuse collides with a server-side socket that is still in a CLOSE_WAIT state... oh wait, that ends up in no response to the SYN, huh... > freebsd-net@freebsd.org mailing list -- Rod Grimes rgri...@freebsd.org ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
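The two cases distinguished above can be sketched in a few lines of userland C (a sketch only, with a hypothetical documentation address and port and no error handling): shutdown(2) without a following close(2) sends the FIN but leaves the descriptor owned by the process, so the connection can sit in FIN_WAIT2 for as long as the process lives, whereas a full close(2) lets the normal FIN_WAIT2 timeout reap it.

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    static int
    dial(void)
    {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in sin = { 0 };

            sin.sin_family = AF_INET;
            sin.sin_port = htons(2049);                      /* illustrative */
            inet_pton(AF_INET, "192.0.2.1", &sin.sin_addr);  /* documentation address */
            connect(s, (struct sockaddr *)&sin, sizeof(sin));
            return (s);
    }

    int
    main(void)
    {
            int leaked = dial();
            int clean = dial();

            /* Case 1: the FIN is sent, but the descriptor stays open; once the
             * peer ACKs it, this connection can sit in FIN_WAIT2 for as long as
             * the process lives -- the suspected "socket leak" above. */
            shutdown(leaked, SHUT_WR);

            /* Case 2: a full close; FIN_WAIT2 is reaped after the timeout. */
            close(clean);

            pause();        /* keep the process (and the leaked fd) around */
            return (0);
    }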
Re: NFS Mount Hangs
The laggproto is lacp and the switch is made by Extreme Networks. Jason Breitman On Mar 18, 2021, at 4:06 AM, Gerrit Kuehn wrote: On Wed, 17 Mar 2021 18:17:14 -0400 Jason Breitman wrote: > I will look into disabling the TSO and LRO options and let the group > know how it goes. Below are the current options on the NFS Server. > lagg0: flags=8943 > metric 0 mtu 1500 > options=e507bb What laggproto are you using, and what kind of switch is connected on the other end? cu Gerrit ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
> On 18. Mar 2021, at 13:53, Rodney W. Grimes > wrote: > > Note I am NOT a TCP expert, but know enough about it to add a comment... > >> Alan Somers wrote: >> [stuff snipped] >>> Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0. >> For the client, yes. For the server, no. >> For the server, it is just a compile time constant NFS_SRVMAXIO. >> >> It's mainly related to the fact that I haven't gotten around to testing >> larger >> sizes yet. >> - kern.ipc.maxsockbuf needs to be several times the limit, which means it >> would >> have to increase for 1Mbyte. >> - The session code must negotiate a maximum RPC size > 1 Mbyte. >> (I think the server code does do this, but it needs to be tested.) >> And, yes, the client is limited to MAXPHYS. >> >> Doing this is on my todo list, rick >> >> The client should acquire the attributes that indicate that and set >> rsize/wsize >> to that. "# nfsstat -m" on the client should show you what the client >> is actually using. If it is larger than 128K, set both rsize and wsize to >> 128K. >> >>> Output from the NFS Client when the issue occurs >>> # netstat -an | grep NFS.Server.IP.X >>> tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 >>> FIN_WAIT2 >> I'm no TCP guy. Hopefully others might know why the client would be >> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, >> but could be wrong?) > > The most common way to get stuck in FIN_WAIT2 is to call > shutdown(2) on a socket, but never following up with a > close(2) after some timeout period. The "client" is still > connected to the socket and can stay in this shutdown state > for ever, the kernel well not reap the socket as it is > associated with a processes, aka not orphaned. I suspect > that the Linux client has a corner condition that is leading > to this socket leak. > > If on the Linux client you can look at the sockets to see > if these are still associated with a process, ala fstat or > what ever Linux tool does this that would be helpfull. > If they are infact connected to a process it is that > process that must call close(2) to clean these up. > > IIRC the server side socket would be gone at this point > and there is nothing the server can do that would allow > a FIN_WAIT2 to close down. Jason reported that the server is in CLOSE-WAIT. This would mean that the server received the FIN, ACKed it, but has not initiated the teardown of the Server->Client direction. So the server side socket is still there and close has not been called yet. > > The real TCP experts can now correct my 30 year old TCP > stack understanding... I wouldn't count myself as a real TCP expert, but the behaviour hasn't changed in the last 30 years, I think... Best regards Michael > >> >>> # cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info >>> netid: tcp >>> addr: NFS.Server.IP.X >>> port: 2049 >>> state: 0x51 >>> >>> syslog >>> Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- >>> --rqstp- ->timeout ---ops-- >>> Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 >>> >143cfadf3 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status >>> [sunrpc] >q:xprt_pending >> I don't know what OPEN_NOATTR means, but I assume it is some variant >> of NFSv4 Open operation. >> [stuff snipped] >>> Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 >>> xprt_connect_status: >connect attempt timed out >>> Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 >>> call_connect_status >>> (status -110) >> I have no idea what status -110 means?
>>> Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout >>> (major) >>> Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind >>> (status 0) >>> Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect >>> xprt >e061831b is not connected >>> Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect >>> xprt >e061831b is not connected >>> Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 >>> xprt_connect_status: >connect attempt timed out >>> Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 >>> call_connect_status >(status -110) >>> Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout >>> (minor) >>> Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind >>> (status 0) >>> Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect >>> xprt >e061831b is not connected >>> Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect >>> xprt >e061831b is not connected >> Is it possible that the client is trying to (re)connect using the same >> client port#? >> I would normally expect the client to create a new TCP connection using a >> different client port# and then retry the outstanding RPCs. >> --> Capturing packets when this happens would show us what is going on.
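Rick's remark above that kern.ipc.maxsockbuf must be several times the I/O size can be illustrated with a small userland sketch. This is an assumption-laden example, not NFS code: it relies on FreeBSD rejecting (or limiting) socket-buffer requests that exceed the kern.ipc.maxsockbuf ceiling, and the 4 Mbyte figure is purely illustrative.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    int
    main(void)
    {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            int want = 4 * 1024 * 1024; /* illustrative: several times a 1 Mbyte RPC */
            int got = 0;
            socklen_t len = sizeof(got);

            /* Expected to fail (or be limited) once the request exceeds what
             * kern.ipc.maxsockbuf allows on this host. */
            if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
                    perror("setsockopt(SO_RCVBUF)");

            getsockopt(s, SOL_SOCKET, SO_RCVBUF, &got, &len);
            printf("SO_RCVBUF actually granted: %d bytes\n", got);
            return (0);
    }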
Re: NFS Mount Hangs
> On 18. Mar 2021, at 13:42, Scheffenegger, Richard > wrote: > >>> Output from the NFS Client when the issue occurs # netstat -an | grep >>> NFS.Server.IP.X >>> tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 >>> FIN_WAIT2 >> I'm no TCP guy. Hopefully others might know why the client would be stuck in >> FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but >> could be wrong?) > > When the client is in Fin-Wait2 this is the state you end up when the Client > side actively close() the tcp session, and then the server also ACKed the > FIN. Jason noted: When the issue occurs, this is what I see on the NFS Server. tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 CLOSE_WAIT which corresponds to the state on the client side. The server received the FIN from the client and acked it. The server is waiting for a close call to happen. So the question is: Is the server also closing the connection? Best regards Michael > This will last for ~2 min or so, but is asynchronous. However, the same > 4-tuple can not be reused during this time. > > With other words, from the socket / TCP, a properly executed active close() > will end up in this state. (If the other side initiated the close, a passive > close, will not end in this state) > > > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
Note I am NOT a TCP expert, but know enough about it to add a comment... > Alan Somers wrote: > [stuff snipped] > >Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0. > For the client, yes. For the server, no. > For the server, it is just a compile time constant NFS_SRVMAXIO. > > It's mainly related to the fact that I haven't gotten around to testing larger > sizes yet. > - kern.ipc.maxsockbuf needs to be several times the limit, which means it > would >have to increase for 1Mbyte. > - The session code must negotiate a maximum RPC size > 1 Mbyte. >(I think the server code does do this, but it needs to be tested.) > And, yes, the client is limited to MAXPHYS. > > Doing this is on my todo list, rick > > The client should acquire the attributes that indicate that and set > rsize/wsize > to that. "# nfsstat -m" on the client should show you what the client > is actually using. If it is larger than 128K, set both rsize and wsize to > 128K. > > >Output from the NFS Client when the issue occurs > ># netstat -an | grep NFS.Server.IP.X > >tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 > >FIN_WAIT2 > I'm no TCP guy. Hopefully others might know why the client would be > stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, > but could be wrong?) The most common way to get stuck in FIN_WAIT2 is to call shutdown(2) on a socket, but never following up with a close(2) after some timeout period. The "client" is still connected to the socket and can stay in this shutdown state forever; the kernel will not reap the socket as it is associated with a process, aka not orphaned. I suspect that the Linux client has a corner condition that is leading to this socket leak. If on the Linux client you can look at the sockets to see if these are still associated with a process, a la fstat or whatever Linux tool does this, that would be helpful. If they are in fact connected to a process, it is that process that must call close(2) to clean these up. IIRC the server side socket would be gone at this point and there is nothing the server can do that would allow a FIN_WAIT2 to close down. The real TCP experts can now correct my 30 year old TCP stack understanding... > > ># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info > >netid: tcp > >addr: NFS.Server.IP.X > >port: 2049 > >state: 0x51 > > > >syslog > >Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- > >--rqstp- ->timeout ---ops-- > >Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 > >>143cfadf3 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] > >>q:xprt_pending > I don't know what OPEN_NOATTR means, but I assume it is some variant > of NFSv4 Open operation. > [stuff snipped] > >Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 > >xprt_connect_status: >connect attempt timed out > >Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 > >call_connect_status > >(status -110) > I have no idea what status -110 means?
> >Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout > >(major) > >Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind > >(status 0) > >Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect > >xprt >e061831b is not connected > >Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect > >xprt >e061831b is not connected > >Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 > >xprt_connect_status: >connect attempt timed out > >Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 > >call_connect_status >(status -110) > >Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout > >(minor) > >Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind > >(status 0) > >Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect > >xprt >e061831b is not connected > >Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect > >xprt >e061831b is not connected > Is it possible that the client is trying to (re)connect using the same client > port#? > I would normally expect the client to create a new TCP connection using a > different client port# and then retry the outstanding RPCs. > --> Capturing packets when this happens would show us what is going on. > > If there is a problem on the FreeBSD end, it is most likely a broken > network device driver. > --> Try disabling TSO , LRO. > --> Try a different driver for the net hardware on the server. > --> Try a different net chip on the server. > If you can capture packets when (not after) the hang > occurs, then you can look at them in wireshark and see > what is actually happening. (Ideally on both client and > server, to check that your network hasn't dropped anything.) > --> I know, if the hangs aren't easily reproducible, this isn't > easily done. > --> Try a
AW: NFS Mount Hangs
>>Output from the NFS Client when the issue occurs # netstat -an | grep >>NFS.Server.IP.X >>tcp0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 >>FIN_WAIT2 >I'm no TCP guy. Hopefully others might know why the client would be stuck in >FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack, but could >be wrong?) When the client is in FIN_WAIT2, this is the state you end up in when the Client side actively close()s the TCP session and the server has also ACKed the FIN. This will last for ~2 min or so, but is asynchronous. However, the same 4-tuple can not be reused during this time. In other words, from the socket / TCP point of view, a properly executed active close() will end up in this state. (If the other side initiated the close, a passive close, it will not end in this state.) ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
On Wed, 17 Mar 2021 18:17:14 -0400 Jason Breitman wrote: > I will look into disabling the TSO and LRO options and let the group > know how it goes. Below are the current options on the NFS Server. > lagg0: flags=8943 > metric 0 mtu 1500 > options=e507bb What laggproto are you using, and what kind of switch is connected on the other end? cu Gerrit ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"