[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-24 Thread Jan Marquardt
Hi Pablo,

> We are willing to work with a Ceph Consultant Specialist, because the data
> at stake is very critical, so if you're interested, please let me know
> off-list to discuss the details.

I totally understand that you want to communicate with potential consultants
off-list, but I, and I'd guess many others, would appreciate it if you let us
know the outcome, e.g. whether and how you were able to recover all your data,
a post-mortem as far as you can disclose it, etc.

Best Regards

Jan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Test after list GC

2024-06-24 Thread Anthony D'Atri
Here’s a test after de-crufting held messages.  Grok the fullness.

— aad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Replacing SSD disk(metadata, rocksdb) which are associated with HDDs(osd block)

2024-06-24 Thread TaekSoo Lim
Hi all, I have configured a Ceph cluster to be used as an object store with a
combination of SSDs and HDDs, where the block.db is stored on LVM on the SSDs
and the OSD block is stored on the HDDs.
I have set up one SSD for storing metadata (RocksDB), and five HDDs are
associated with it to store the OSD blocks.
During a disk replacement test, if one SSD that stores the block.db fails, all
five associated OSDs go down.
When replacing and recovering the failed SSD, it seems necessary to
reconfigure the five OSDs, which takes a long time to recover the data and has
a significant performance impact.
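
For context, the layout was created roughly as follows with ceph-volume (a
sketch only; the device names are examples, not our real ones):

# one shared block.db SSD in front of five data HDDs (example devices)
ceph-volume lvm batch --bluestore \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
    --db-devices /dev/nvme0n1
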
So here are two questions:

- Is it common practice to configure SSDs with block.db and associate each of
them with five HDDs to store OSD blocks when using eight SSDs and forty HDDs?
  Or would it be better to only store the rgw index on SSDs? I am also curious
about the difference in performance between these configurations.
- If SSDs are configured with block.db as described above, will it be
necessary to reinstall the five associated OSDs (HDDs) if one SSD fails?
  Also, is there a way to reconstruct the metadata on the newly replaced SSD
from the remaining five intact HDDs?

As a novice in ceph clusters, I seek advice from experienced users. 
Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Incomplete PGs. Ceph Consultant Wanted

2024-06-24 Thread cellosof...@gmail.com
Hi community!

Recently we had a major outage in production and after running the
automated ceph recovery, some PGs remain in "incomplete" state, and IO
operations are blocked.

Searching the documentation, forums, and this mailing list archive, I haven't
yet found whether this means the data is recoverable or not. We don't have any
"unknown" objects or PGs, so I believe this is somehow an intermediate stage
where we have to tell Ceph which version of the objects to recover from.
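
In case it helps anyone point us in the right direction, this is roughly how
we have been inspecting the affected PGs (the PG id below is a placeholder,
not one of ours):

ceph health detail
ceph pg ls incomplete
ceph pg 2.1f query    # check e.g. "peering_blocked_by" and the past intervals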

We are willing to work with a Ceph Consultant Specialist, because the data at
stake is very critical, so if you're interested, please let me know off-list
to discuss the details.

Thanks in advance

Best Regards
Pablo
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-24 Thread Patrick Donnelly
On Mon, Jun 24, 2024 at 5:22 PM Dietmar Rieder
 wrote:
>
> (resending this; the original message seems not to have made it through amid
> all the SPAM recently sent to the list, my apologies if it shows up twice at
> some point)
>
> Hi List,
>
> we are still struggling to get our cephfs back online. This is an update to
> inform you of what we have done so far, and we kindly ask for any input on
> how to proceed:
>
> After resetting the journals, Xiubo suggested (in a PM) that we go on with
> the disaster recovery procedure:
>
> cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100
>
> [root@ceph01-b ~]# cephfs-data-scan init
> Inode 0x0x1 already exists, skipping create.  Use --force-init to overwrite 
> the existing object.
> Inode 0x0x100 already exists, skipping create.  Use --force-init to overwrite 
> the existing object.
>
> We did not use --force-init and proceeded with scan_extents using a single 
> worker, which was indeed very slow.
>
> After ~24h we interrupted the scan_extents run and restarted it with 32
> workers, which went through in about 2h15min w/o any issue.
>
> Then I started scan_inodes with 32 workers; this also finished after ~50min
> with no output on stderr or stdout.
>
> I went on with scan_links, which after ~45 minutes threw the following error:
>
> # cephfs-data-scan scan_links
> Error ((2) No such file or directory)

Not sure what this indicates necessarily. You can try to get more
debug information using:

[client]
  debug mds = 20
  debug ms = 1
  debug client = 20

in the local ceph.conf for the node running cephfs-data-scan.

> then "cephfs-data-scan cleanup" went through w/o any message and took about 
> 9hrs 20min.
>
> Unfortunately, when starting the MDS, the cephfs still seems to be damaged.
> I get quite a few "loaded already corrupt dentry:" messages and two "[ERR] :
> bad backtrace on directory inode" errors:

The "corrupt dentry" message is erroneous and fixed already (backports
in flight).

> (In the following log I removed almost all "loaded already corrupt dentry" 
> entries, for clarity reasons)
>
> 2024-06-23T08:06:20.934+ 7ff05728fb00  0 set uid:gid to 167:167 
> (ceph:ceph)
> 2024-06-23T08:06:20.934+ 7ff05728fb00  0 ceph version 18.2.2 
> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, 
> pid 2
> 2024-06-23T08:06:20.934+ 7ff05728fb00  1 main not setting numa affinity
> 2024-06-23T08:06:20.934+ 7ff05728fb00  0 pidfile_write: ignore empty 
> --pid-file
> 2024-06-23T08:06:20.936+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8062 from mon.0
> 2024-06-23T08:06:21.583+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8063 from mon.0
> 2024-06-23T08:06:21.583+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Monitors have assigned me to become a standby.
> 2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8064 from mon.0
> 2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am 
> now mds.0.8064
> 2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
> change up:standby --> up:replay
> 2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 replay_start
> 2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064  waiting for osdmap 
> 34327 (which blocklists prior instance)
> 2024-06-23T08:06:21.627+ 7ff0452b9700  0 mds.0.cache creating system 
> inode with ino:0x100
> 2024-06-23T08:06:21.627+ 7ff0452b9700  0 mds.0.cache creating system 
> inode with ino:0x1
> 2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.journal EResetJournal
> 2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe start
> 2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe result
> 2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe done
> 2024-06-23T08:06:21.656+ 7ff045aba700  1 mds.0.8064 Finished replaying 
> journal
> 2024-06-23T08:06:21.656+ 7ff045aba700  1 mds.0.8064 making mds journal 
> writeable
> 2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8065 from mon.0
> 2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am 
> now mds.0.8064
> 2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
> change up:replay --> up:reconnect
> 2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 reconnect_start
> 2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 reopen_log
> 2024-06-23T08:06:22.605+ 7ff04bac6700  1 mds.0.8064 reconnect_done
> 2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8066 from mon.0
> 2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am 
> now mds.0.8064
> 2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
> change up:reconnect --> up:rejoin
> 2024-06-23T08:06:

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-24 Thread Dietmar Rieder
(resending this; the original message seems not to have made it through amid
all the SPAM recently sent to the list, my apologies if it shows up twice at
some point)

Hi List, 

we are still struggling to get our cephfs back online. This is an update to
inform you of what we have done so far, and we kindly ask for any input on how
to proceed:

After resetting the journals, Xiubo suggested (in a PM) that we go on with the
disaster recovery procedure:

cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100

[root@ceph01-b ~]# cephfs-data-scan init
Inode 0x0x1 already exists, skipping create.  Use --force-init to overwrite the 
existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to overwrite 
the existing object.

We did not use --force-init and proceeded with scan_extents using a single 
worker, which was indeed very slow.

After ~24h we interrupted the scan_extents run and restarted it with 32
workers, which went through in about 2h15min w/o any issue.
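
(For anyone following along: the parallel runs used the worker options from
the disaster-recovery docs, sketched below with a placeholder data pool name.)

# launch 32 scan_extents workers in parallel; "cephfs_data" is a placeholder
for i in $(seq 0 31); do
    cephfs-data-scan scan_extents --worker_n $i --worker_m 32 cephfs_data &
done
wait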

Then I started scan_inodes with 32 workers; this also finished after ~50min
with no output on stderr or stdout.

I went on with scan_links, which after ~45 minutes threw the following error:

# cephfs-data-scan scan_links
Error ((2) No such file or directory)

then "cephfs-data-scan cleanup" went through w/o any message and took about 
9hrs 20min.

Unfortunately, when starting the MDS, the cephfs still seems to be damaged. I
get quite a few "loaded already corrupt dentry:" messages and two "[ERR] : bad
backtrace on directory inode" errors:


(In the following log I removed almost all "loaded already corrupt dentry" 
entries, for clarity reasons)

2024-06-23T08:06:20.934+ 7ff05728fb00  0 set uid:gid to 167:167 (ceph:ceph)
2024-06-23T08:06:20.934+ 7ff05728fb00  0 ceph version 18.2.2 
(531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, pid 
2
2024-06-23T08:06:20.934+ 7ff05728fb00  1 main not setting numa affinity
2024-06-23T08:06:20.934+ 7ff05728fb00  0 pidfile_write: ignore empty 
--pid-file
2024-06-23T08:06:20.936+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Updating MDS map to version 8062 from mon.0
2024-06-23T08:06:21.583+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Updating MDS map to version 8063 from mon.0
2024-06-23T08:06:21.583+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Monitors have assigned me to become a standby.
2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Updating MDS map to version 8064 from mon.0
2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now 
mds.0.8064
2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
change up:standby --> up:replay
2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064 replay_start
2024-06-23T08:06:21.604+ 7ff04bac6700  1 mds.0.8064  waiting for osdmap 
34327 (which blocklists prior instance)
2024-06-23T08:06:21.627+ 7ff0452b9700  0 mds.0.cache creating system inode 
with ino:0x100
2024-06-23T08:06:21.627+ 7ff0452b9700  0 mds.0.cache creating system inode 
with ino:0x1
2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.journal EResetJournal
2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe start
2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe result
2024-06-23T08:06:21.636+ 7ff0442b7700  1 mds.0.sessionmap wipe done
2024-06-23T08:06:21.656+ 7ff045aba700  1 mds.0.8064 Finished replaying 
journal
2024-06-23T08:06:21.656+ 7ff045aba700  1 mds.0.8064 making mds journal 
writeable
2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Updating MDS map to version 8065 from mon.0
2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now 
mds.0.8064
2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
change up:replay --> up:reconnect
2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 reconnect_start
2024-06-23T08:06:22.604+ 7ff04bac6700  1 mds.0.8064 reopen_log
2024-06-23T08:06:22.605+ 7ff04bac6700  1 mds.0.8064 reconnect_done
2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.default.cephmon-01.cepqjp 
Updating MDS map to version 8066 from mon.0
2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.0.8064 handle_mds_map i am now 
mds.0.8064
2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.0.8064 handle_mds_map state 
change up:reconnect --> up:rejoin
2024-06-23T08:06:23.605+ 7ff04bac6700  1 mds.0.8064 rejoin_start
2024-06-23T08:06:23.609+ 7ff04bac6700  1 mds.0.8064 rejoin_joint_start
2024-06-23T08:06:23.611+ 7ff045aba700  1 mds.0.cache.den(0x100 
groups) loaded already corrupt dentry: [dentry #0x1/data/groups [bf,head] 
rep@0.0 NULL (dversion lock) pv=0 v=
7910497 ino=(nil) state=0 0x55cf13e9f400]
2024-06-23T08:06:23.611+ 7ff045aba700  1 mds.0.cache.den(0x1000192ec16.1* 
scad_prj) loaded already corrupt dentry: [dentry #0x1/home/scad_prj [159,head] 
rep@0.0 NULL (dve

[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Anthony D'Atri
I’m not sure if I have access but I can try.

> On Jun 24, 2024, at 4:37 PM, Kai Stian Olstad  wrote:
> 
> On 24.06.2024 19:15, Anthony D'Atri wrote:
>> * Subscription is now moderated
>> * The three worst spammers (you know who they are) have been removed
>> * I’ve deleted tens of thousands of crufty mail messages from the queue
>> The list should work normally now.  Working on the backlog of held messages. 
>>  99% are bogus, but I want to be careful wrt baby and bathwater.
> 
> Will the archive[1] also be cleaned up?
> 
> [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/
> 
> -- 
> Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Kai Stian Olstad

On 24.06.2024 19:15, Anthony D'Atri wrote:

* Subscription is now moderated
* The three worst spammers (you know who they are) have been removed
* I’ve deleted tens of thousands of crufty mail messages from the queue

The list should work normally now.  Working on the backlog of held 
messages.  99% are bogus, but I want to be careful wrt baby and 
bathwater.


Will the archive[1] also be cleaned up?

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] June User + Dev Monthly Meeting [Recording Available]

2024-06-24 Thread Noah Lehman
Hi Ceph users,

A recording of June's user + developer monthly meeting is now available.
Thank you to everyone who participated, asked questions, and shared
insights. Your feedback is crucial to the growth and health of the Ceph
community!

Watch it here: https://youtu.be/7D9otll-kjA?feature=shared

Best,

Noah
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon

On 24/06/2024 20:49, Matthew Vernon wrote:

On 19/06/2024 19:45, Adam King wrote:
I think this is at least partially a code bug in the rgw module. Where 


...the code path seems to have a bunch of places it might raise an
exception; are those likely to result in some entry in a log-file? I've
not found anything, which is making working out what the problem is
quite challenging...


Ah, I do now find:

2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw 
ERROR root] Non-zero return from ['radosgw-admin', '-k', 
'/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 
'mgr.moss-be2001.qvwcaq', 'realm', 'pull', '--url', 
'https://apus.svc.eqiad.wmnet:443', '--access-key', 'REDACTED', 
'--secret', 'REDACTED', '--rgw-realm', 'apus']: request failed: (5) 
Input/output error


EIO is an odd sort of error [doesn't sound very network-y], and I don't
think I see any corresponding request in the radosgw logs in the primary
zone. From the CLI outside the container I can do e.g. curl
https://apus.svc.eqiad.wmnet/ just fine; are there other things worth
checking here? Could it matter that the mgr node isn't an rgw?
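
One thing I may try next is re-running the failing command by hand to get a
more verbose error (a sketch; this assumes running it as client.admin from a
cephadm shell rather than with the mgr's own keyring, credentials redacted):

cephadm shell -- radosgw-admin realm pull \
    --url https://apus.svc.eqiad.wmnet:443 \
    --access-key REDACTED --secret REDACTED --rgw-realm apus \
    --debug-rgw=20 --debug-ms=1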


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon

On 19/06/2024 19:45, Adam King wrote:
I think this is at least partially a code bug in the rgw module. Where 


...the code path seems to have a bunch of places it might raise an 
exception; are those likely to result in some entry in a log-file? I've 
not found anything, which is making working out what the problem is 
quite challenging...


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Alex
Thanks Anthony!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Anthony D'Atri
* Subscription is now moderated
* The three worst spammers (you know who they are) have been removed
* I’ve deleted tens of thousands of crufty mail messages from the queue

The list should work normally now.  Working on the backlog of held messages.  
99% are bogus, but I want to be careful wrt baby and bathwater.



> On Jun 24, 2024, at 1:09 PM, Alex  wrote:
> 
> They seem to use the same few email addresses and then make new ones. It
> should be possible to block them once a day to at least cut down the volume
> of emails, even if not block them completely?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Alex
They seem to use the same few email addresses and then make new ones. It
should be possible to block them once a day to at least cut down the volume
of emails, even if not block them completely?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Lot of spams on the list

2024-06-24 Thread Alain Péan

Hi all,

I already sent an email about this this morning (in France), but curiously I
never received it (blocked by my own email server at University Paris-Saclay?),
although hundreds of spam messages, seemingly coming from India and with
sexual subjects, have arrived in my mailbox (even if marked as spam by
Thunderbird). This has been going on for about two weeks, and I think I am
not the only one on the list getting them.


I am very upset to see that nothing seems to be done against this spam.

Spammers should be unsubscribed (for example priya.yt1 at gmail), and perhaps
new subscriptions blocked for a time?


Best Regards,

Alain

--
System/Network Administrator
C2N Centre de Nanosciences et Nanotechnologies (UMR 9001)
Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau
Tel: 01-70-27-06-88, Office A255
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-24 Thread Ivan Clayson

Hello,

We have been experiencing a serious issue with our CephFS backup cluster
running Quincy (version 17.2.7) on a RHEL8 derivative (Alma 8.9, kernel
4.18.0-513.9.1), where the MDSes for our filesystem are constantly in a
"replay" or "replay(laggy)" state and keep crashing.


We have a single-MDS filesystem called "ceph_backup" with 2 standby MDSes,
along with a 2nd, unused filesystem "ceph_archive" (which holds little to no
data). We use the "ceph_backup" filesystem to back up our data, and it is the
one which is currently broken. The Ceph health outputs currently are:


   root@pebbles-s1 14:05 [~]: ceph -s
  cluster:
    id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631
    health: HEALTH_WARN
    1 filesystem is degraded
    insufficient standby MDS daemons available
    1319 pgs not deep-scrubbed in time
    1054 pgs not scrubbed in time

  services:
    mon: 4 daemons, quorum
   pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 (age 36m)
    mgr: pebbles-s2(active, since 36m), standbys: pebbles-s4,
   pebbles-s3, pebbles-s1
    mds: 2/2 daemons up
    osd: 1380 osds: 1380 up (since 29m), 1379 in (since 3d); 37
   remapped pgs

  data:
    volumes: 1/2 healthy, 1 recovering
    pools:   7 pools, 2177 pgs
    objects: 3.55G objects, 7.0 PiB
    usage:   8.9 PiB used, 14 PiB / 23 PiB avail
    pgs: 83133528/30006841533 objects misplaced (0.277%)
 2090 active+clean
 47   active+clean+scrubbing+deep
 29   active+remapped+backfilling
 8    active+remapped+backfill_wait
 2    active+clean+scrubbing
 1    active+clean+snaptrim

  io:
    recovery: 1.9 GiB/s, 719 objects/s

   root@pebbles-s1 14:09 [~]: ceph fs status
   ceph_backup - 0 clients
   ===
   RANK  STATE MDS  ACTIVITY   DNS    INOS   DIRS CAPS
 0    replay(laggy)  pebbles-s3   0  0 0  0
    POOL    TYPE USED  AVAIL
   mds_backup_fs  metadata  1255G  2780G
   ec82_primary_fs_data    data   0   2780G
  ec82pool  data    8442T  3044T
   ceph_archive - 2 clients
   
   RANK  STATE  MDS ACTIVITY DNS    INOS   DIRS CAPS
 0    active  pebbles-s2  Reqs:    0 /s  13.4k  7105    118 2
    POOL    TYPE USED  AVAIL
   mds_archive_fs metadata  5184M  2780G
   ec83_primary_fs_data    data   0   2780G
  ec83pool  data 138T  2767T
   MDS version: ceph version 17.2.7
   (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
   root@pebbles-s1 14:09 [~]: ceph health detail | head
   HEALTH_WARN 1 filesystem is degraded; insufficient standby MDS
   daemons available; 1319 pgs not deep-scrubbed in time; 1054 pgs not
   scrubbed in time
   [WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ceph_backup is degraded
   [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons
   available
    have 0; want 1 more

When our cluster first came back up after a reboot, Ceph ran through the 2
standby MDSes, crashing them all, until it reached the final MDS, which is now
stuck in this "replay(laggy)" state. Putting our MDSes into debugging mode (see
the note just below), we can see that the MDS crashes when replaying the
journal for a particular inode (this is the same for all the MDSes and they
all crash on the same object).
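
(By "debugging mode" we mean raising the MDS debug level, roughly as in this
sketch; "mds" targets all MDS daemons via the mon config database.)

ceph config set mds debug_mds 20     # raise MDS log verbosity
# ...let the MDS crash again and collect its log...
ceph config set mds debug_mds 1/5    # back to the default afterwards

The MDS log around the crash then looks like this: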


   ...
   2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal
   EMetaBlob.replay for [521,head] had [inode 0x1005ba89481
   [...539,head]
   
/cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/
   auth fragtree_t(*^2 00*^3 0*^
   4 1*^3 00010*^4 00011*^4 00100*^4 00101*^4 00110*^4 00111*^4
   01*^3 01000*^4 01001*^3 01010*^4 01011*^3 01100*^4 01101*^4 01110*^4
   0*^4 10*^3 1*^4 10001*^4 10010*^4 10011*^4 10100*^4 10101*^3
   10110*^4 10111*^4 11*^6) v10880645 f(v0 m2024-06-22
   T05:41:10.213700+0100 1281276=1281276+0) n(v12
   rc2024-06-22T05:41:10.213700+0100 b1348251683896 1281277=1281276+1)
   old_inodes=8 (iversion lock) | dirfrag=416 dirty=1 0x55770a2bdb80]
   2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal
   EMetaBlob.replay dir 0x1005ba89481.011011000*
   2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal
   EMetaBlob.replay updated dir [dir 0x1005ba89481.011011000*
   
/cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/
   [2,head] auth v=436385 cv=0/0 state=107374182
   4 f(v0 m2024-06-22T05:41:10.213700+0100 2502=2502+0) n(v12
   rc2024-06-22T05:41:10.213700+0100 b2120744220 2502=2502+0)
   hs=32+33,ss=0+0 dirty=65 | child=1 0x55770ebcda80]
   2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal
   EMetaBlob.replay added (full) [dentry
   
#0x1/cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_microg