Sorry, I can't distinguish which parts of the logs in the attachments
(#comment11, #comment12 and #comment13) belong to the connection from
the working 7.2 client and which to the non-working 7.3 client. All the
attachments contain "TCP recvfrom got EAGAIN", which should come from
the 7.3 connection.

$ grep "TCP recvfrom got EAGAIN" syslog_16042024_amaliada_primary_adamsongrunter_partner_both_aix73_part1.log -r | wc -l
213127
$ grep "TCP recvfrom got EAGAIN" syslog_16042024_amaliada_primary_adamsongrunter_partner_both_aix73_part2.log -r | wc -l
226005
$ grep "TCP recvfrom got EAGAIN" syslog_17042024_adia_primary_amberjack_partner_both_aix72.log -r | wc -l
20233


May I suggest collecting those logs in two separate files, one from 7.2
and another from 7.3, instead of mixing them together?

I'm not a network expert, but I see some NFS RENEW operations between
9.20.32.85 (server) and 9.20.120.127 (7.2 client) in
tcp_dump17_04_2024_09H_10M, but no such RENEW packets between 9.20.32.85
(server) and 9.20.120.112 (7.3 client) in tcpdump16_04_2024_14H_03M.
Given that NFSv4 is a stateful filesystem based on leases, if the client
does not send an operation to renew its lease, it is possible for the
server to return EAGAIN. Please check whether the 7.3 client behaves
differently from the 7.2 client regarding lease renewal.
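
For anyone who wants to repeat that comparison, here is a rough sketch of
how the RENEW packets could be counted with tshark. It assumes tshark is
installed and that the captures decode as NFSv4; count_renew is just a
helper name I made up. RENEW is operation number 30 in NFSv4.0, and
nfs.opcode is the Wireshark display-filter field for the operation
number.

```shell
# Hypothetical helper: count packets that carry an NFSv4 RENEW operation
# (opcode 30) exchanged between two hosts in a capture file.
#   $1 = capture file, $2 = server IP, $3 = client IP
count_renew() {
    tshark -r "$1" \
        -Y "nfs.opcode == 30 && ip.addr == $2 && ip.addr == $3" \
        | wc -l
}

# Example invocations with the captures and addresses mentioned above:
#   count_renew tcp_dump17_04_2024_09H_10M 9.20.32.85 9.20.120.127
#   count_renew tcpdump16_04_2024_14H_03M  9.20.32.85 9.20.120.112
```

The filter matches both calls and replies, so the number is a packet
count rather than a count of distinct RENEW requests. A count of zero
for the 7.3 capture would be consistent with what I saw by eye.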

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl() on
  a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl().
  NFS server is Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64.
  The problem does not appear to affect other combinations of NFS client
  (including AIX 7.2) with this NFS server.

  The AIX team have indicated that the cause of the EIO is triggered by
  the NFS server returning a BAD_SEQID error which leads to the AIX NFS
  client incorrectly zeroing the stateid, which then leads to the NFS
  server returning a BAD_STATEID error and the NFS client then returns
  the EIO error. The AIX team would like to understand why the BAD_SEQID
  has been returned.
   
  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz

  ---Steps to Reproduce---
  We cannot offer a simple way to recreate the problem as it involves
  IBM MQ running on two primary machines (AIX) using the Ubuntu server
  for its HA NFSv4 storage.

  However, we can provide any requested trace or dumps from any or all
  of the involved machines.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2042363/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
