Good to hear that, Eugen!
CC'ed Zac for your docs mention.
> On Dec 11, 2023, at 23:28, Eugen Block wrote:
>
Update: apparently, we did it!
We walked through the disaster recovery steps, one of which was to reset
the journal. I was under the impression that the specified command
'cephfs-journal-tool [--rank=N] journal reset' would simply reset all the
journals (mdlog and purge_queue), but apparently it only acts on the mdlog
by default. After we reset the purge_queue journal as well (with
'--journal=purge_queue'), the purge_queue error was gone and the MDS came
back up. Maybe the disaster recovery docs could state more explicitly that
both journals may need to be reset.
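For anyone hitting the same issue, the two resets look roughly like this
(a sketch assuming rank 0; replace <fs_name> with your file system name and
double-check the flags against the disaster recovery docs for your release):
---snip---
cephfs-journal-tool --rank=<fs_name>:0 --journal=mdlog journal reset
cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal reset
---snip---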
So we did walk through the advanced recovery page but didn't really
succeed. The CephFS still goes read-only because of the purge_queue
error. Is there any chance to recover from that, or should we try to
recover with an empty metadata pool next?
I'd still appreciate any comments. ;-)
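In case it helps with diagnosis: the purge_queue has its own journal that
can be inspected separately from the mdlog, for example (assuming rank 0 of
the 'storage' file system, matching the inspect command further down):
---snip---
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect
---snip---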
Some more information on the damaged CephFS; apparently, the journal is
damaged:
---snip---
# cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect
2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4
2023-12-08T15:35:22.938+0200 7f834d0320c0 -1 Bad entry sta
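Before resetting anything in that state, the disaster recovery page
suggests first salvaging what it can from the damaged mdlog back into the
metadata store; a sketch for the same rank:
---snip---
cephfs-journal-tool --rank=storage:0 event recover_dentries summary
---snip---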
I was able to (almost) reproduce the issue in a (Pacific) test
cluster. I rebuilt the monmap from the OSDs, brought everything back
up, and started the mds recovery as described in [1]:
ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover
Then I added two mds daemons which went into standby:
---snip---
Started C
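That part seems expected, though: with '--recover' the new file system is
created with MDS activation disabled (the 'joinable' flag), so the daemons
staying in standby is normal until you allow them to join again, e.g.:
---snip---
ceph fs set <fs_name> joinable true
---snip---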
Hi,
First of all, I would suggest upgrading your cluster to one of the
supported releases.
I think a full recovery is recommended to get the mds back.
1. Stop the mdses and all the clients.
2. Fail the fs.
a. ceph fs fail <fs_name>
3. Back up the journal (if the command below fails, make a rados-level copy of the journal objects instead):
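For example, something along these lines (a sketch assuming rank 0;
<fs_name> and <metadata_pool> are placeholders, and the rados loop is only
the fallback for a journal too damaged to export; the rank-0 mdlog objects
are the 200.* objects seen in the inspect output above):
---snip---
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin

# fallback: rados-level copy of the rank-0 mdlog objects
mkdir -p journal-backup
for obj in $(rados -p <metadata_pool> ls | grep '^200\.'); do
    rados -p <metadata_pool> get "$obj" "journal-backup/$obj"
done
---snip---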