From the trace file, per the excerpt below:
Nov 30 11:13:40 duckseason kernel: [1291756.354728] nfsd_dispatch: vers 4 proc 1
Nov 30 11:13:40 duckseason kernel: [1291756.354731] svc: server 000000007c7e7536, pool 0, transport 000000003fd86d34, inuse=3
Nov 30 11:13:40 duckseason kernel: [1291756.354732] process_renew(6554b87b/4ab45507): starting
Nov 30 11:13:40 duckseason kernel: [1291756.354734] svc: tcp_recv 000000003fd86d34 data 1 conn 0 close 0
Nov 30 11:13:40 duckseason kernel: [1291756.354736] svc: socket 000000003fd86d34 recvfrom(0000000003fecffb, 4) = -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recv_record got -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recvfrom got EAGAIN
we can see the NFS server's recvfrom returning -11 (EAGAIN), which can be
reached via the following call path:
svc_recv -> svc_handle_xprt
  -> xprt->xpt_ops->xpo_recvfrom (svc_tcp_recvfrom)
    -> svc_recvfrom
      -> sock_recvmsg, which probably calls sock_recvmsg_nosec -> ... -> tcp_recvmsg
As mentioned in the recvfrom man page:
ERRORS
The recvfrom() function shall fail if:
EAGAIN or EWOULDBLOCK
The socket's file descriptor is marked O_NONBLOCK and no data is
waiting to be received; or MSG_OOB is set and no out-of-band
data is available and either the socket's file descriptor is
marked O_NONBLOCK or the socket does not support blocking to
await out-of-band data.
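
As a user-space illustration of the man page text above (not the kernel
svc_recvfrom path itself), the following minimal sketch reads from a
non-blocking socket that has no data queued. The helper name
recv_nonblocking_errno is made up for this example. On Linux EAGAIN is 11,
which matches the negated value -11 shown in the kernel log.

```python
import errno
import socket


def recv_nonblocking_errno() -> int:
    """Return the errno raised by recv() on an empty non-blocking socket."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)  # same effect as marking the descriptor O_NONBLOCK
        try:
            a.recv(4)  # nothing was written to the peer, so no data is queued
        except BlockingIOError as exc:
            return exc.errno
        return 0  # only reached if recv() unexpectedly succeeded
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    code = recv_nonblocking_errno()
    print(code, errno.errorcode.get(code))
```

Seen this way, an EAGAIN from a non-blocking read only says that no data was
waiting at that moment; by itself it does not indicate which side misbehaved.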
I am not sure whether the AIX 7.3 NFS client opened a non-blocking socket and
there was simply no data to be read on that socket.
So I would like to check whether the 7.3 client sent something different from
the 7.2 client, which caused the server to return BAD_SEQID to the AIX 7.3
client.
Please also collect the relevant trace log from the server side when
connecting with the 7.2 client, so that we can investigate the difference
between the good case and the bad one.
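
One possible way to compare the two captures is sketched below. It assumes
each kernel message sits on a single line in the syslog format shown above,
and that the 16-digit hex values are hashed pointers that differ between runs.
The helper names normalize and diff_traces are hypothetical, not part of any
existing tool.

```python
import difflib
import re

# Syslog prefix such as "Nov 30 11:13:40 duckseason kernel: [1291756.354728] "
_PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s+\[\s*\d+\.\d+\]\s*"
)
# 16-digit hex tokens (assumed to be hashed kernel pointers)
_HEX = re.compile(r"\b[0-9a-f]{16}\b")


def normalize(line: str) -> str:
    """Drop the timestamp/host prefix and mask pointer-like values."""
    line = _PREFIX.sub("", line.strip())
    return _HEX.sub("<ptr>", line)


def diff_traces(good_lines, bad_lines):
    """Return a unified diff of the normalized good and bad traces."""
    good = [normalize(l) for l in good_lines if l.strip()]
    bad = [normalize(l) for l in bad_lines if l.strip()]
    return list(
        difflib.unified_diff(good, bad, fromfile="good", tofile="bad", lineterm="")
    )
```

For example, diff_traces(open("trace-7.2.log"), open("trace-7.3.log")) would
list only the messages whose content differs; both file names are
placeholders.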
If possible, please also try the latest 5.4 stable kernel (5.4.274) and the
upstream version (6.9-rc4).
--
https://bugs.launchpad.net/bugs/2042363
Title:
AIX 7.3 NFS client frequently returns an EIO error to an application
when reading or writing to a file that has been locked with fcntl() on
a Ubuntu 20.04 NFSV4 server
Status in linux package in Ubuntu:
New
Bug description:
---Problem Description---
AIX 7.3 NFS client frequently returns an EIO error to an application when
reading or writing to a file that has been locked with fcntl(). NFS server is
Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64. The problem does not
appear to affect other combinations of NFS client (including AIX 7.2) with this
NFS server.
The AIX team have indicated that the cause of the EIO is triggered by the NFS
server returning a BAD_SEQID error which leads to the AIX NFS client
incorrectly zeroing the stateid, which then leads to the NFS server returning a
BAD_STATEID error and the NFS client then returns the EIO error. The AIX team
would like to understand why the BAD_SEQID has been returned.
---uname output---
Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux
Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @
2.30GHz
---Steps to Reproduce---
We cannot offer a simple way to recreate the problem as it involves IBM MQ
running on two primary machines (AIX) using the Ubuntu server for its HA NFSv4
storage.
However, we can provide any requested trace or dumps from any or all
of the involved machines.