On Fri, May 12, 2023 at 5:28 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Dear Xiubo and others.
>
> >> I have never heard about that option until now. How do I check that and
> >> how to I disable it if necessary?
> >> I'm in meetings pretty much all day and will try to send some more info
> >> later.
> >
> > $ mount|grep ceph
>
> I get
>
> MON-IPs:SRC on DST type ceph
> (rw,relatime,name=con-fs2-rit-pfile,secret=<hidden>,noshare,acl,mds_namespace=con-fs2,_netdev)
>
> so async dirop seems disabled.
>
> > Yeah, the kclient just received a corrupted snaptrace from MDS.
> > So the first thing is you need to fix the corrupted snaptrace issue in
> > cephfs and then continue.
>
> Ooookaaayyyy. I will take it as a compliment that you seem to assume I know
> how to do that. The documentation gives 0 hits. Could you please provide me
> with instructions of what to look for and/or what to do first?
>
> > If possible you can parse the above corrupted snap message to check what
> > exactly corrupted.
> > I haven't get a chance to do that.
>
> Again, how would I do that? Is there some documentation and what should I
> expect?
>
> > You seems didn't enable the 'osd blocklist' cephx auth cap for mon:
>
> I can't find anything about an osd blocklist client auth cap in the
> documentation. Is this something that came after octopus? Our caps are as
> shown in the documentation for a ceph fs client
> (https://docs.ceph.com/en/octopus/cephfs/client-auth/), the one for mon is
> "allow r":
>
> caps mds = "allow rw path=/shares"
> caps mon = "allow r"
> caps osd = "allow rw tag cephfs data=con-fs2"
>
>
> > I checked that but by reading the code I couldn't get what had cause the
> > MDS crash.
> > There seems something wrong corrupt the metadata in cephfs.
>
> He wrote something about an invalid xattrib (empty value). It would be really
> helpful to get a clue how to proceed. I managed to dump the MDS cache with
> the critical inode in cache. Would this help with debugging? I also managed
> to get debug logs with debug_mds=20 during a crash caused by an "mds dump
> inode" command. Would this contain something interesting? I can also pull the
> rados objects out and can upload all of these files.

I was just guessing about the invalid xattr based on the very limited crash
info, so if it's clearly broken snapshot metadata from the kclient logs I
would focus on that. I'm surprised/concerned your system managed to generate
one of those, of course... I'll let Xiubo work with you on that.
-Greg
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
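
For anyone wanting to script the mount-option check discussed in this thread, here is a minimal sketch in Python. It only parses the option list printed by `mount | grep ceph` and reports whether `nowsync` or `wsync` appears. The function names are made up for illustration, and the interpretation is an assumption: as far as I know the kernel client shows `nowsync` when async dirops are enabled, and it may print nothing at all when the kernel default is in effect, so "not shown" has to be read against your kernel's default.

```python
def parse_mount_options(mount_line: str) -> dict:
    """Parse the '(opt1,opt2=val,...)' part of a mount(8) output line.

    Flag options map to True, key=value options map to the value string.
    (Hypothetical helper, not part of any Ceph tool.)
    """
    start = mount_line.rfind("(")
    end = mount_line.rfind(")")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no parenthesised option list found")
    options = {}
    for item in mount_line[start + 1:end].split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def async_dirops_state(options: dict) -> str:
    """Report what the option list says about async dirops.

    Assumption: 'nowsync' means enabled, 'wsync' means disabled, and if
    neither is listed the kernel default applies.
    """
    if "nowsync" in options:
        return "enabled"
    if "wsync" in options:
        return "disabled"
    return "not shown"


if __name__ == "__main__":
    # Mount line as posted in the thread (placeholders left as-is).
    line = ("MON-IPs:SRC on DST type ceph "
            "(rw,relatime,name=con-fs2-rit-pfile,secret=<hidden>,"
            "noshare,acl,mds_namespace=con-fs2,_netdev)")
    opts = parse_mount_options(line)
    print("async dirops:", async_dirops_state(opts))
    print("mds_namespace:", opts.get("mds_namespace"))
```

With the mount line from the thread, neither option is listed, which on a kernel where `wsync` is the default would agree with the reading that async dirops are disabled.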