Per the excerpt below from the trace file:

Nov 30 11:13:40 duckseason kernel: [1291756.354728] nfsd_dispatch: vers 4 proc 1
Nov 30 11:13:40 duckseason kernel: [1291756.354731] svc: server 000000007c7e7536, pool 0, transport 000000003fd86d34, inuse=3
Nov 30 11:13:40 duckseason kernel: [1291756.354732] process_renew(6554b87b/4ab45507): starting
Nov 30 11:13:40 duckseason kernel: [1291756.354734] svc: tcp_recv 000000003fd86d34 data 1 conn 0 close 0
Nov 30 11:13:40 duckseason kernel: [1291756.354736] svc: socket 000000003fd86d34 recvfrom(0000000003fecffb, 4) = -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recv_record got -11
Nov 30 11:13:40 duckseason kernel: [1291756.354737] RPC: TCP recvfrom got EAGAIN

we can see that the NFS server returns -11 (EAGAIN), which can be reached
through the following path:

svc_recv -> svc_handle_xprt
            -> xprt->xpt_ops->xpo_recvfrom
               svc_tcp_recvfrom
               -> svc_recvfrom
                  -> sock_recvmsg, which probably triggers
                     sock_recvmsg_nosec -> ... -> tcp_recvmsg

As mentioned in the recvfrom manpage:

ERRORS
       The recvfrom() function shall fail if:
       EAGAIN or EWOULDBLOCK
              The socket's file descriptor is marked O_NONBLOCK and no data is
              waiting  to  be  received;  or MSG_OOB is set and no out-of-band
              data is available and either the  socket's  file  descriptor  is
              marked  O_NONBLOCK  or  the  socket does not support blocking to
              await out-of-band data.
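
To illustrate the condition the manpage describes, here is a minimal
userspace sketch in Python. It is only an analogy and not the kernel code
path: it assumes a local socket pair, marks one end non-blocking, and reads
from it while no data is waiting. The helper name recv_nonblocking_empty is
made up for this example.

```python
import errno
import socket


def recv_nonblocking_empty():
    """Return the errno raised by recv() on an empty non-blocking socket."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)  # equivalent to marking the descriptor O_NONBLOCK
        try:
            a.recv(4)  # nothing has been sent by the peer, so no data is waiting
        except BlockingIOError as exc:
            return exc.errno
        return None  # would only happen if data had unexpectedly arrived
    finally:
        a.close()
        b.close()


err = recv_nonblocking_empty()
# Print the numeric errno and its symbolic name for comparison with the trace.
print(err, errno.errorcode.get(err))
```

On Linux, EAGAIN has the value 11, which corresponds to the -11 in the trace
above.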

I am not sure whether the 7.3 NFS client opened a non-blocking socket and
there was no data to be read on that socket.
So I would like to check whether the 7.3 client sent something different from
the 7.2 client, which caused the server to return BAD_SEQID to the AIX 7.3
client.

Please also collect the relevant trace log from the server side when
connecting with the 7.2 client, so that we can investigate the difference
between the good case and the bad one.
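
As a rough aid for that comparison, the sketch below strips the parts of each
trace line that differ between any two runs (the syslog prefix with its
timestamp, and the hashed kernel pointers) and then diffs what remains. The
prefix layout is assumed from the excerpt above, the labels aix72 and aix73
are placeholders, and the function names are made up for this example.

```python
import difflib
import re

# Assumed prefix shape, taken from the excerpt:
#   "Nov 30 11:13:40 duckseason kernel: [1291756.354728] "
PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s+\[\s*\d+\.\d+\]\s*"
)
# Hashed kernel pointers show up as 16 hex digits in the excerpt.
POINTER = re.compile(r"\b[0-9a-f]{16}\b")


def normalize(line):
    """Drop the syslog prefix and mask pointer values in one trace line."""
    line = PREFIX.sub("", line.rstrip("\n"))
    return POINTER.sub("<ptr>", line)


def diff_traces(good_lines, bad_lines):
    """Return a unified diff of two traces after normalization."""
    good = [normalize(l) for l in good_lines]
    bad = [normalize(l) for l in bad_lines]
    return list(
        difflib.unified_diff(good, bad, "aix72", "aix73", lineterm="")
    )


# Small in-memory usage example; real input would be the two collected logs.
sample = (
    "Nov 30 11:13:40 duckseason kernel: [1291756.354736] "
    "svc: socket 000000003fd86d34 recvfrom(0000000003fecffb, 4) = -11"
)
print(normalize(sample))
```

Note that the client ID and other protocol values inside the messages are
left untouched, so they will still show up as differences.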

If possible, please also try the latest 5.4 stable (5.4.274) and the
upstream version (6.9-rc4).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl() on
  a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl().
  The NFS server is Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic
  x86_64. The problem does not appear to affect other combinations of
  NFS client (including AIX 7.2) with this NFS server.

  The AIX team have indicated that the EIO is triggered by the NFS
  server returning a BAD_SEQID error, which leads the AIX NFS client to
  incorrectly zero the stateid. The NFS server then returns a
  BAD_STATEID error, and the NFS client returns the EIO error. The AIX
  team would like to understand why the BAD_SEQID has been returned.
   
  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
   
  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz

  ---Steps to Reproduce---
  We cannot offer a simple way to recreate the problem as it involves
  IBM MQ running on two primary machines (AIX) using the Ubuntu server
  for its HA NFSv4 storage.

  However, we can provide any requested trace or dumps from any or all
  of the involved machines.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2042363/+subscriptions


