Hello everyone,

I’m troubleshooting an issue on Rocky Linux 8.10 where an MGS mount appears to succeed (exit code 0) but the server unmounts a few seconds later due to MGC request timeouts. Because of this, MDT/OST targets cannot register with the MGS afterwards and I can't get a working filesystem.

At first I thought this was related to switching from ldiskfs to ZFS (OpenZFS DKMS), because the problem started after installing ZFS DKMS (from the Lustre repo) and rebuilding modules. However, I reproduced the same behavior even when using the kmod-based Lustre packages and also when trying ldiskfs again, so I'm a bit lost on what could have caused the issue.

I'm running Lustre 2.15.7 on Rocky 8.10, and this behaviour happens on the (only) MGS/MDS node. To reproduce the issue, I only need to format the MGT and then mount it normally. The mount command returns success, but shortly after that the server unmounts automatically.
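
A minimal sketch of that reproduction, for reference. The device path and mount point are placeholders (not the real ones), the ldiskfs backend is just one of the two cases described above, and the `run` helper and `DRY_RUN` variable are hypothetical conveniences so that the commands are only printed unless explicitly enabled:

```shell
# Placeholders: adjust to the actual MGT device and mount point.
MGT_DEV="${MGT_DEV:-/dev/sdX}"
MGT_MNT="${MGT_MNT:-/mnt/mgt}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    run mkfs.lustre --mgs --backfstype=ldiskfs "$MGT_DEV"   # format the MGT
    run mkdir -p "$MGT_MNT"
    run mount -t lustre "$MGT_DEV" "$MGT_MNT"               # reports success
    run sleep 10                                            # wait for the automatic unmount
    run dmesg                                               # the timeout should be near the end
}

reproduce
```

Setting DRY_RUN=0 runs the commands for real, which reformats the placeholder device, so only do that on a disposable test target.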

Example output in dmesg:

Lustre: 236012:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1768852634/real 1768852634]  req@000000001ae72b11 x1854776314691712/t0(0) o251->MGC10.0.0.4@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1768852640 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'

Lustre: server umount MGS complete
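
To make the timing in that message easier to read, here is a small sketch that pulls the opcode, the send time and the deadline out of the line with sed. The field meanings are my reading of the message layout (o<N> as the request opcode, dl as the deadline), so treat them as an assumption:

```shell
# The dmesg line from above, shortened to the fields of interest.
line="Request sent has timed out for slow reply: [sent 1768852634/real 1768852634]  req@000000001ae72b11 x1854776314691712/t0(0) o251->MGC10.0.0.4@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1768852640 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1"

# Extract the epoch timestamps and the opcode number.
sent=$(printf '%s\n' "$line" | sed -n 's/.*\[sent \([0-9]*\)\/real.*/\1/p')
dl=$(printf '%s\n' "$line" | sed -n 's/.* dl \([0-9]*\) .*/\1/p')
opc=$(printf '%s\n' "$line" | sed -n 's/.* o\([0-9]*\)->.*/\1/p')

# Seconds between sending the request and its deadline.
echo "opcode=$opc window=$((dl - sent))s"
```

For this line the window comes out at 6 seconds, which matches the server unmounting a few seconds after the mount.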

Here is what I've checked or tried so far:

 * lnetctl ping <mgs_ip>@tcp works, both locally and from another host
 * lctl list_nids shows the correct NID
 * lnetctl net show shows the TCP NI on the correct interface and the
   loopback
 * Port 988 is listening and open
 * Disabling firewalld does not change anything
 * SELinux is disabled
 * Removed all Lustre, zfs, kmod and dkms packages, rebuilt initramfs,
   changed to stock kernel and back to custom Lustre kernel,
   reinstalled all packages, etc.
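
The network and security checks in the list can be sketched as commands roughly like these. The NID is taken from the dmesg line above; the `run` helper and `DRY_RUN` variable are hypothetical and only there so the commands are printed by default instead of executed:

```shell
MGS_NID="${MGS_NID:-10.0.0.4@tcp}"   # NID as it appears in the dmesg output
DRY_RUN="${DRY_RUN:-1}"              # default: print the commands only

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

checks() {
    run lnetctl ping "$MGS_NID"          # LNet-level reachability
    run lctl list_nids                   # NIDs configured on this node
    run lnetctl net show                 # network interfaces known to LNet
    run ss -tln 'sport = :988'           # is anything listening on port 988?
    run systemctl is-active firewalld    # firewall state
    run getenforce                       # SELinux mode
}

checks
```

Running it with DRY_RUN=0 on the MGS node executes the checks; the ping should also be repeated from a second host.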

None of this has helped, and I can't explain the issue or why I can no longer mount even with plain ldiskfs.

Does anyone know the cause of this issue and what I could do to fix it? My last resort would be reinstalling the OS and starting from scratch, but I would much prefer to avoid that. This is a test environment, so I don't mind having to reformat, recreate or reinstall anything.

Thank you very much in advance.

Santiago
