Hi Felix,

On Sat, May 13, 2023 at 9:18 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> Hi Patrick,
>
> we have been running one daily snapshot since December and our cephfs crashed
> 3 times because of this: https://tracker.ceph.com/issues/38452
>
> We currently have 19 files with corrupt metadata found by your
> first-damage.py script. We isolated these files from user access and are
> waiting for a fix before we remove them with your script (or maybe a new
> way?)

No other fix is anticipated at this time. Probably one will be
developed after the cause is understood.

> Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the
> MDS servers, cluster health went to ERROR with MDS_DAMAGE. 'ceph tell mds.0
> damage ls' is showing me the same files as your script (initially only a
> subset; after a cephfs scrub, all of them).

This is expected. Once the dentries are marked damaged, the MDS won't
allow operations on those files (like those triggering tracker
#38452).
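
For reference, a rough sketch of how to inspect and re-scan that damage
(the filesystem name "cephfs" and rank 0 are placeholders for your
deployment; adjust the tell target to your MDS naming):

  # List the dentries the rank 0 MDS has marked damaged
  ceph tell mds.cephfs:0 damage ls

  # Re-scan the tree so any remaining damaged dentries get flagged
  ceph tell mds.cephfs:0 scrub start / recursive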

> I noticed "mds: catch damage to CDentry's first member before persisting
> (issue#58482, pr#50781, Patrick Donnelly)" in the changelog for 16.2.13
> and would like to ask you the following questions:
>
> a) can we repair the damaged files online now instead of bringing down the 
> whole fs and using the python script?

Not yet.
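
For anyone following along, the offline route remains the documented one;
a rough sketch, assuming a filesystem named "cephfs" and a metadata pool
named "cephfs_metadata" (check first-damage.py itself for its exact
invocation):

  # Take the filesystem offline so the metadata pool is quiescent
  ceph fs fail cephfs

  # Run first-damage.py against the metadata pool to find/remove the
  # damaged dentries (see the script for its exact options)
  python3 first-damage.py cephfs_metadata

  # Bring the filesystem back up
  ceph fs set cephfs joinable true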

> b) should we set one of the new MDS options in our specific case to avoid our
> file server crashing because of the wrong snap ids?

Has your MDS crashed, or has it just marked the dentries as damaged? If you
can reproduce a crash with detailed logs (debug_mds=20), that would be
incredibly helpful.
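
If it helps, a minimal way to capture that (assuming default log locations
on the MDS hosts):

  # Raise MDS debug logging cluster-wide before reproducing the crash
  ceph config set mds debug_mds 20

  # Logs land in /var/log/ceph/ on the MDS hosts by default;
  # revert afterwards with:
  ceph config rm mds debug_mds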

> c) will your patch prevent wrong snap ids in the future?

It will prevent persisting the damage.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D