Here is an excerpt from the trace file:

Nov 30 11:13:40 duckseason kernel: [1291756.354728] nfsd_dispatch: vers 4 proc 1
Nov 30 11:13:40 duckseason kernel: [1291756.354731] svc: server 000000007c7e7536, pool 0, transport 000000003fd86d34, inuse=3
Nov 30 11:13:40 duckseason kernel: [1291756.354732] process_renew(6554b87b/4ab45507): starting
Nov 30 11:13:40 duckseason kernel: [1291756.354734] svc: tcp_recv 000000003fd86d34 data 1 conn 0 close 0
Nov 30 11:13:40 duckseason kernel: [1291756.354736] svc: socket 000000003fd86d34 recvfrom(0000000003fecffb, 4) = -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recv_record got -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recvfrom got EAGAIN

We can see that the NFS server returns -11 (EAGAIN), which can be reached
through the following path:

  svc_recv
    -> svc_handle_xprt
      -> xprt->xpt_ops->xpo_recvfrom, i.e. svc_tcp_recvfrom
        -> svc_recvfrom
          -> sock_recvmsg

which probably triggers

  sock_recvmsg_nosec -> ... -> tcp_recvmsg

As mentioned in the recvfrom manpage:

  ERRORS
  The recvfrom() function shall fail if:

  EAGAIN or EWOULDBLOCK
    The socket's file descriptor is marked O_NONBLOCK and no data is
    waiting to be received; or MSG_OOB is set and no out-of-band data is
    available and either the socket's file descriptor is marked
    O_NONBLOCK or the socket does not support blocking to await
    out-of-band data.

I am not sure whether the AIX 7.3 NFS client opened a non-blocking socket
and there was no data on that socket to read.

So I would like to check whether the 7.3 client sent something different
from the 7.2 client, which caused the server to return BAD_SEQID to the
AIX 7.3 client. Please also collect the relevant trace log from the server
side when connecting with a 7.2 client, so that we can investigate the
difference between the good case and the bad one.

If possible, you could also try the latest 5.4 stable kernel (5.4.274)
and the upstream version (6.9-rc4).

--
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl() on
  a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl().
  NFS server is Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64.

  The problem does not appear to affect other combinations of NFS client
  (including AIX 7.2) with this NFS server.

  The AIX team have indicated that the EIO is triggered as follows: the
  NFS server returns a BAD_SEQID error, which leads the AIX NFS client to
  incorrectly zero the stateid; the NFS server then returns a BAD_STATEID
  error, and the NFS client in turn returns the EIO error to the
  application. The AIX team would like to understand why the BAD_SEQID
  was returned.

  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz

  ---Steps to Reproduce---
  We cannot offer a simple way to recreate the problem as it involves
  IBM MQ running on two primary machines (AIX) using the Ubuntu server
  for its HA NFSv4 storage. However, we can provide any requested trace
  or dumps from any or all of the involved machines.
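
For reference, the EAGAIN condition quoted from the recvfrom manpage in the comment above can be reproduced in user space. The following is only a minimal sketch of that condition (a socket marked non-blocking with no data waiting), not of the kernel's svc_tcp_recvfrom path or of anything the AIX client does. The function name recv_errno_on_empty_nonblocking_socket is hypothetical and chosen for illustration.

```python
import errno
import socket


def recv_errno_on_empty_nonblocking_socket():
    """Return the errno raised when reading an empty non-blocking socket.

    Hypothetical helper for illustration only; it mirrors the manpage
    condition "marked O_NONBLOCK and no data is waiting to be received".
    """
    # socketpair() yields two connected sockets without touching the network.
    reader, writer = socket.socketpair()
    try:
        # Equivalent to marking the descriptor O_NONBLOCK.
        reader.setblocking(False)
        try:
            # Nothing has been written to the peer, so no data is waiting.
            reader.recv(4)
        except BlockingIOError as exc:
            return exc.errno
        return None
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    err = recv_errno_on_empty_nonblocking_socket()
    # Print the numeric errno together with its symbolic name.
    print(err, errno.errorcode.get(err))
```

On Linux the numeric value is expected to correspond to EAGAIN, which is the same code that appears negated (-11) in the server trace. That would suggest the EAGAIN lines only indicate that no further data was queued on the transport at that moment, so the comparison with a 7.2 client trace requested above remains the more useful next step.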