On Fri, May 12, 2023 at 5:28 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Dear Xiubo and others.
>
> >> I have never heard about that option until now. How do I check that and 
> >> how do I disable it if necessary?
> >> I'm in meetings pretty much all day and will try to send some more info 
> >> later.
> >
> > $ mount|grep ceph
>
> I get
>
> MON-IPs:SRC on DST type ceph 
> (rw,relatime,name=con-fs2-rit-pfile,secret=<hidden>,noshare,acl,mds_namespace=con-fs2,_netdev)
>
> so async dirop seems disabled.
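The check described here can be scripted. The sketch below makes two assumptions: the kernel client's async-dirop mount option is spelled `nowsync`, and `wsync` forces synchronous namespace operations. The function names are hypothetical. When neither option appears, whether async dirops are actually on depends on the kernel's default, so a `False` result means only "not explicitly requested".

```python
from typing import List, Optional, Tuple


def parse_mount_options(mount_line: str) -> List[Tuple[str, Optional[str]]]:
    """Return the parenthesised option list of a `mount` output line as
    (key, value) pairs; flags without '=' get the value None."""
    start = mount_line.rfind("(")
    end = mount_line.rfind(")")
    if start == -1 or end < start:
        raise ValueError("no '(...)' option list found")
    pairs = []
    for item in mount_line[start + 1:end].split(","):
        key, sep, value = item.strip().partition("=")
        pairs.append((key, value if sep else None))
    return pairs


def async_dirops_requested(mount_line: str) -> bool:
    """True if the options explicitly ask for async dirops.

    Assumption: `nowsync` enables async dirops and `wsync` disables them;
    if both appear, the last one is taken to win. A line with neither
    option returns False, although the real behaviour then depends on
    the kernel's default.
    """
    requested = False
    for key, _ in parse_mount_options(mount_line):
        if key == "nowsync":
            requested = True
        elif key == "wsync":
            requested = False
    return requested


line = ("MON-IPs:SRC on DST type ceph (rw,relatime,name=con-fs2-rit-pfile,"
        "secret=<hidden>,noshare,acl,mds_namespace=con-fs2,_netdev)")
print(async_dirops_requested(line))  # False: neither nowsync nor wsync is set
```

Scanning the options in order, instead of building a dict, keeps the "last option wins" rule intact even when an option is repeated.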
>
> > Yeah, the kclient just received a corrupted snaptrace from the MDS.
> > So the first thing you need to do is fix the corrupted snaptrace issue in 
> > cephfs and then continue.
>
> Ooookaaayyyy. I will take it as a compliment that you seem to assume I know 
> how to do that. The documentation gives 0 hits. Could you please provide me 
> with instructions on what to look for and/or what to do first?
>
> > If possible, you can parse the above corrupted snap message to check what 
> > exactly is corrupted.
> > I haven't had a chance to do that.
>
> Again, how would I do that? Is there some documentation and what should I 
> expect?
>
> > It seems you didn't enable the 'osd blocklist' cephx auth cap for mon:
>
> I can't find anything about an osd blocklist client auth cap in the 
> documentation. Is this something that came after octopus? Our caps are as 
> shown in the documentation for a ceph fs client 
> (https://docs.ceph.com/en/octopus/cephfs/client-auth/), the one for mon is 
> "allow r":
>
>         caps mds = "allow rw path=/shares"
>         caps mon = "allow r"
>         caps osd = "allow rw tag cephfs data=con-fs2"
>
>
> > I checked that, but by reading the code I couldn't figure out what had 
> > caused the MDS crash.
> > Something seems to have corrupted the metadata in cephfs.
>
> He wrote something about an invalid xattr (empty value). It would be really 
> helpful to get a clue how to proceed. I managed to dump the MDS cache with 
> the critical inode in cache. Would this help with debugging? I also managed 
> to get debug logs with debug_mds=20 during a crash caused by an "mds dump 
> inode" command. Would this contain something interesting? I can also pull the 
> rados objects out and can upload all of these files.

I was just guessing about the invalid xattr based on the very limited
crash info, so if it's clearly broken snapshot metadata from the
kclient logs I would focus on that.

I'm surprised/concerned your system managed to generate one of those,
of course...I'll let Xiubo work with you on that.
-Greg
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
