[ceph-users] Re: Issue with CephFS (mds stuck in clientreplay status) since upgrade to 18.2.0.

2023-11-27 Thread Dan van der Ster
Hi Giuseppe, There are likely one or two clients whose op is blocking the reconnect/replay. If you increase debug_mds, you may be able to find the guilty client and either disconnect it or block it from mounting. Or, for a more disruptive recovery, you can try the "Deny all reconnect to clients" option:

[ceph-users] Re: Issue with CephFS (mds stuck in clientreplay status) since upgrade to 18.2.0.

2023-11-27 Thread Lo Re Giuseppe
Hi David, Thanks a lot for your reply. Yes, we have heavy load from clients on the same subtree. We have multiple MDSs that were set up in the hope of distributing the load among them, but this is not really happening: in moments of high load we see most of the load on one MDS. We don't use

[ceph-users] Re: Issue with CephFS (mds stuck in clientreplay status) since upgrade to 18.2.0.

2023-11-27 Thread David C.
Hi Giuseppe, Could it be that you have clients that heavily load the MDS with concurrent access to the same trees? Perhaps also look at the stability of all your clients (even if there are many) [dmesg -T, ...]. How are your 4 active MDS configured (pinning?)? Probably nothing to do but normal for 2