Hi Felix,

On Thu, Dec 15, 2022 at 8:03 PM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> Hi Patrick,
>
> we used your script to repair the damaged objects over the weekend and it went 
> smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; the 
> runtime is about 6h. Until Thursday last week, we had exactly the same 17 
> files. On Thursday at 13:05 a snapshot was created, and our active mds 
> crashed once at exactly that time:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 
> /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
> ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
> 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == 
> LOCK_XLOCK || state == LOCK_XLOCKDONE)

This crash is the same one detailed in
https://tracker.ceph.com/issues/49132. The fix is being backported to the
Pacific and Quincy releases.

>
> Twelve minutes later the unlink_local error crashes appeared again, this 
> time with a new file. During debugging we noticed an MTU mismatch between 
> the MDS (1500) and the client (9000) with a cephfs kernel mount. The client 
> is also the one creating the snapshots via mkdir in the .snap directory.
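>
> For reference, the snapshot creation is literally just a mkdir inside the 
> magic .snap directory on the kernel mount; a minimal sketch (the mount 
> point and snapshot name here are placeholders, not our real paths):
>
>     mkdir /mnt/cephfs/shared/.snap/daily-2022-12-08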
>
> We disabled snapshot creation for now, but we really need this feature. I 
> uploaded the mds logs of the first crash, along with the information above, to 
> https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question:
>
> Is the bug related to our MTU mismatch? We also fixed the MTU issue over the 
> weekend, going back to 1500 on all nodes in the Ceph public network.
>
> If you need a debug level 20 log of the ScatterLock crash for further 
> analysis, I could schedule snapshots at the end of our workdays and increase 
> the debug level for 5 minutes around snapshot creation.
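>
> Roughly like this, run around each scheduled snapshot (a sketch only; the 
> mount point is a placeholder, and I am assuming 1/5 is the debug_mds value 
> we normally run with):
>
>     # on an admin node: raise MDS verbosity just before the snapshot
>     ceph config set mds debug_mds 20
>     # on the client: create the snapshot via mkdir in .snap
>     mkdir /mnt/cephfs/shared/.snap/eod-$(date +%F)
>     # keep the window short, then drop verbosity back to our default
>     sleep 300
>     ceph config set mds debug_mds 1/5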
>
> Regards
> Felix
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
>
> On 02.12.2022 at 20:08, Patrick Donnelly <pdonn...@redhat.com> wrote:
>
> On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> The script has been running for ~2 hours, and according to the line count in 
> the memo file we are at 40% (cephfs is still online).
>
> We had to modify the script, putting a try/except around the for loop in 
> lines 78 to 87. For some reason there are some objects (186 at this moment) 
> which throw a UnicodeDecodeError exception during the iteration:
>
> <rados.OmapIterator object at 0x7f9606f8bcf8>
> Traceback (most recent call last):
>   File "first-damage.py", line 138, in <module>
>     traverse(f, ioctx)
>   File "first-damage.py", line 79, in traverse
>     for (dnk, val) in it:
>   File "rados.pyx", line 1382, in rados.OmapIterator.__next__
>   File "rados.pyx", line 311, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte
>
> We don't know if this is because of the filesystem still running. We saved 
> the object names in a separate file, and I will investigate further tomorrow. 
> We should be able to modify the script to check only the objects which threw 
> the exception, instead of searching through the whole pool again.
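>
> Our modification is roughly the following (a sketch, not the exact diff; 
> the log file name is a placeholder, and the omap iteration only mirrors 
> what first-damage.py's traverse() does):
>
>     import rados
>
>     def scan_guarded(ioctx, log_path="bad-objects.txt"):
>         for obj in ioctx.list_objects():
>             try:
>                 with rados.ReadOpCtx() as rctx:
>                     it, ret = ioctx.get_omap_vals(rctx, "", "", 100000)
>                     ioctx.operate_read_op(rctx, obj.key)
>                     for (dnk, val) in it:
>                         pass  # the per-dentry damage check goes here
>             except UnicodeDecodeError:
>                 # rados.decode_cstr fails on non-UTF-8 dentry names;
>                 # record the object and keep scanning instead of aborting.
>                 with open(log_path, "a") as log:
>                     log.write(obj.key + "\n")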
>
> That shouldn't be caused by the fs running. It may be that you have some
> file names which contain invalid unicode characters?
>
> Regarding the mds logfiles with debug 20:
> We cannot run this debug level for longer than one hour, since the log file 
> growth is too high for the local storage on the mds servers where the logs 
> are stored (we don't have central logging yet).
>
> Okay.
>
> But if you are just interested in the time frame around the crash, I could 
> set the debug level to 20, trigger the crash on the weekend, and send you the 
> logs.
>
> The crash is unlikely to point to what causes the corruption. I was
> hoping we could locate an instance of damage while the MDS is running.
>
> Regards
> Felix
>
>
>
> On 01.12.2022 at 20:51, Patrick Donnelly <pdonn...@redhat.com> wrote:
>
> On Thu, Dec 1, 2022 at 3:55 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
>
> I set debug_mds=20 in ceph.conf and also applied it to the running daemon via 
> "ceph daemon mds.mon-e2-1 config set debug_mds 20". I have to check with my 
> superiors if I am allowed to provide you the logs, though.
>
>
> Suggest using `ceph config set` instead of ceph.conf. It's much easier.
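>
> For example, this one command raises the level on all MDS daemons and 
> persists in the mon config database, so there is no ceph.conf to edit and 
> no per-daemon admin socket to hit:
>
>     ceph config set mds debug_mds 20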
>
> Regarding the tool:
> <pool> refers to the cephfs_metadata pool? (just want to be sure)
>
>
> Yes.
>
> How long will the runs take? We have 15M objects in our metadata pool and 
> 330M in the data pools.
>
>
> Not sure. You can monitor the number of lines generated on the memo
> file to get an idea of objects/s.
>
> You can speed test the tool without bringing the file system down by 
> **not** using `--remove`.
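>
> For instance, something like this gives a rough objects/s rate while the 
> scan runs (the memo file path is a placeholder for wherever you write it):
>
>     watch -n 60 wc -l /root/first-damage.memo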
>
> Regarding the root cause:
> As far as I can tell, all damaged inodes have only been accessed via two 
> Samba servers running with ctdb. We are also running nfs gateways on 
> different systems, but there hasn't been a damaged inode there (yet).
>
> The Samba servers run Ubuntu 18.04 with kernel 5.4.0-132 and Samba version 
> 4.7.6. CephFS is accessed via kernel mount.
>
> The ceph version is 16.2.10 across all nodes. We have one filesystem and two 
> data pools, and we are using CephFS snapshots.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>



-- 
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
