[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-06-09 Thread Emmanuel Jaep
Hi Eugen, thanks for the response! :-) We have (kind of) solved the problem immediately at hand. The whole process was stuck because the MDSes were actually getting 'killed'. In fact, the amount of RAM we allocated to the MDSes was insufficient to accommodate the logs' complete replay. Therefore,

2023-06-08 Thread Eugen Block
Hi, sorry for not responding earlier. Pardon my ignorance, I'm not quite sure I know what you mean by subtree pinning. I quickly googled it and saw it was a new feature in Luminous. We are running Pacific. I would assume this feature was not out yet. Luminous is older than Pacific, so the

2023-05-25 Thread achhen
Hi Emmanuel, regarding the stopping state: we had a similar issue. See subject: MDS Upgrade from 17.2.5 to 17.2.6 not possible. We solved this by failing the MDS, which was in the stop state, but I don't know if that's a good idea in general. What does the log of the MDS (stopping) show? We

2023-05-25 Thread Emmanuel Jaep
Hi Eugen, Also, do you know why you use a multi-active MDS setup? To be completely candid, I don't really know why this choice was made. I assume the goal was to provide fault-tolerance and load-balancing. Was that a requirement for subtree pinning (otherwise multiple active daemons would

2023-05-25 Thread Emmanuel Jaep
Hi Wes, thanks for the heads-up. Best, Emmanuel On Wed, May 24, 2023 at 5:47 PM Wesley Dillingham wrote: > There was a memory issue with standby-replay that may have been resolved > since and fix is in 16.2.10 (not sure), the suggestion at the time was to > avoid standby-replay. > > Perhaps

2023-05-24 Thread Wesley Dillingham
There was a memory issue with standby-replay that may have been resolved since and fix is in 16.2.10 (not sure), the suggestion at the time was to avoid standby-replay. Perhaps a dev can chime in on that status. Your MDSs look pretty inactive. I would consider scaling them down (potentially to

2023-05-24 Thread Eugen Block
Hi, using standby-replay daemons is something to test as it can have a negative impact, it really depends on the actual workload. We stopped using standby-replay in all clusters we (help) maintain, in one specific case with many active MDSs and a high load the failover time decreased and

2023-05-24 Thread Emmanuel Jaep
So I guess, I'll end up doing:
ceph fs set cephfs max_mds 4
ceph fs set cephfs allow_standby_replay true
On Wed, May 24, 2023 at 4:13 PM Hector Martin wrote: > Hi, > > On 24/05/2023 22.02, Emmanuel Jaep wrote: > > Hi Hector, > > > > thank you very much for the detailed explanation and link to

2023-05-24 Thread Hector Martin
Hi, On 24/05/2023 22.02, Emmanuel Jaep wrote: > Hi Hector, > > thank you very much for the detailed explanation and link to the > documentation. > > Given our current situation (7 active MDSs and 1 standby MDS): > RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS > 0

2023-05-24 Thread Emmanuel Jaep
Hi Hector, thank you very much for the detailed explanation and link to the documentation. Given our current situation (7 active MDSs and 1 standby MDS):
RANK  STATE   MDS         ACTIVITY     DNS    INOS   DIRS   CAPS
0     active  icadmin012  Reqs: 82 /s  2345k  2288k  97.2k  307k
1

2023-05-24 Thread Hector Martin
On 24/05/2023 21.15, Emmanuel Jaep wrote: > Hi, > > we are currently running a ceph fs cluster at the following version: > MDS version: ceph version 16.2.10 > (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable) > > The cluster is composed of 7 active MDSs and 1 standby MDS: > RANK STATE
