[ceph-users] Re: MDS recovery with existing pools

2023-12-11 Thread Konstantin Shalygin
Good to hear that, Eugen! CC'ed Zac for your docs mention. k > On Dec 11, 2023, at 23:28, Eugen Block wrote: > > Update: apparently, we did it! > We walked through the disaster recovery steps where one of the steps was to reset the journal. I was under the impression that the specified command …

[ceph-users] Re: MDS recovery with existing pools

2023-12-11 Thread Eugen Block
Update: apparently, we did it! We walked through the disaster recovery steps where one of the steps was to reset the journal. I was under the impression that the specified command 'cephfs-journal-tool [--rank=N] journal reset' would simply reset all the journals (mdlog and purge_queue), but …
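For context, the distinction Eugen ran into: cephfs-journal-tool acts only on the journal selected by --journal, so mdlog and purge_queue have to be reset separately. A minimal sketch, assuming the fs name "storage" and rank 0 used elsewhere in this thread:

---snip---
# Each invocation touches only the journal named by --journal.
cephfs-journal-tool --rank=storage:0 --journal=mdlog journal reset
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal reset
---snip---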

[ceph-users] Re: MDS recovery with existing pools

2023-12-11 Thread Eugen Block
So we did walk through the advanced recovery page but didn't really succeed. The CephFS still goes read-only because of the purge_queue error. Is there any chance to recover from that, or should we try to recover with an empty metadata pool next? I'd still appreciate any comments. ;-)
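A hedged way to confirm whether the purge_queue journal itself is the damaged piece, assuming the same fs:rank ("storage:0") used later in this thread:

---snip---
# Inspect the purge_queue journal of the rank that flips to read-only.
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect
# If it reports damage, export a backup before any destructive step:
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal export purge_queue-backup.bin
---snip---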

[ceph-users] Re: MDS recovery with existing pools

2023-12-08 Thread Eugen Block
Some more information on the damaged CephFS; apparently the journal is damaged:
---snip---
# cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect
2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4
2023-12-08T15:35:22.938+0200 7f834d0320c0 -1 Bad entry sta…
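For a damaged mdlog like this, the usual follow-up per the CephFS disaster-recovery docs is to salvage what the journal still holds before truncating it. A sketch under those docs' procedure, not a record of what was actually run here:

---snip---
# Back up the raw journal first, then salvage dentries into the metadata store.
cephfs-journal-tool --rank=storage:0 --journal=mdlog journal export mdlog-backup.bin
cephfs-journal-tool --rank=storage:0 event recover_dentries summary
# Only after that, truncate the damaged journal.
cephfs-journal-tool --rank=storage:0 --journal=mdlog journal reset
---snip---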

[ceph-users] Re: MDS recovery with existing pools

2023-12-08 Thread Eugen Block
I was able to (almost) reproduce the issue in a (Pacific) test cluster. I rebuilt the monmap from the OSDs, brought everything back up, and started the MDS recovery as described in [1]: ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover. Then I added two MDS daemons, which went into standby:
---snip---
Started C…
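For reference, a sketch of that recovery entry point; the placeholder names are illustrative. Per the docs, --recover marks rank 0 as existing-but-failed so an MDS picking it up reads the in-RADOS metadata instead of overwriting it, and standbys cannot join until explicitly allowed:

---snip---
ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover
# ... perform the journal/table repairs, then let MDS daemons take the rank:
ceph fs set <fs_name> joinable true
---snip---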

[ceph-users] Re: MDS recovery

2023-04-27 Thread Kotresh Hiremath Ravishankar
Hi, first of all I would suggest upgrading your cluster to one of the supported releases. I think a full recovery is recommended to get the MDS back.
1. Stop the MDSes and all the clients.
2. Fail the fs.
   a. ceph fs fail <fs_name>
3. Backup the journal: (If the below command fails, make rados level c…
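Step 3 is cut off above; per the disaster-recovery docs it is a journal export, sketched here with placeholder names:

---snip---
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
# Fallback if the export fails: copy the journal objects at the RADOS level,
# e.g. rados -p <metadata_pool> get 200.00000000 200.00000000.bin, one per object.
---snip---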