[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
On Tue, Dec 5, 2023 at 6:34 AM Xiubo Li wrote:
> On 12/4/23 16:25, zxcs wrote:
> > Thanks a lot, Xiubo!
> >
> > We already set 'mds_bal_interval' to 0, and the number of slow MDS requests seems to have decreased.
> >
> > But somehow we still see the MDS complain about slow requests, and in the MDS log we can see:
> >
> > "slow request *** seconds old, received at 2023-12-04T…: internal op exportdir:mds.* currently acquired locks"
> >
> > So our question is: why do we still see "internal op exportdir"? Is there any other config we also need to set to 0? Could you please shed some light on which config we need to set?
>
> IMO, this should be enough.
>
> Venky,
>
> Did I miss something here?

You missed nothing. Setting `mds_bal_interval = 0` disables the balancer. I guess there are in-progress exports that would take some time to back off, and the slow ops should eventually get cleaned up. I'd say wait a bit and see if the slow requests resolve by themselves.

FWIW, there was a feature request a while back to cancel an ongoing export. We should prioritize having that.
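One way to watch whether those in-progress exports have drained, as suggested above, is to poll the cluster health for the MDS_SLOW_REQUEST code. A minimal sketch, assuming it is run from an admin node with a client keyring:

    # Re-check every 30s; the MDS_SLOW_REQUEST health warning should
    # clear once the in-flight exportdir ops finish backing off.
    watch -n 30 "ceph health detail | grep -A 2 MDS_SLOW_REQUEST"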
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
On 12/4/23 16:25, zxcs wrote:
> Thanks a lot, Xiubo!
>
> We already set 'mds_bal_interval' to 0, and the number of slow MDS requests seems to have decreased.
>
> But somehow we still see the MDS complain about slow requests, and in the MDS log we can see:
>
> "slow request *** seconds old, received at 2023-12-04T…: internal op exportdir:mds.* currently acquired locks"
>
> So our question is: why do we still see "internal op exportdir"? Is there any other config we also need to set to 0? Could you please shed some light on which config we need to set?

IMO, this should be enough.

Venky,

Did I miss something here?

Thanks
- Xiubo
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
Thanks a lot, Xiubo!

We already set 'mds_bal_interval' to 0, and the number of slow MDS requests seems to have decreased.

But somehow we still see the MDS complain about slow requests, and in the MDS log we can see:

"slow request *** seconds old, received at 2023-12-04T…: internal op exportdir:mds.* currently acquired locks"

So our question is: why do we still see "internal op exportdir"? Is there any other config we also need to set to 0? Could you please shed some light on which config we need to set?

Thanks,
xz

> On Nov 27, 2023, at 13:19, Xiubo Li wrote:
>
> You can just set 'mds_bal_interval' to 0.
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
On 11/27/23 13:12, zxcs wrote:
> Currently we are using `ceph config set mds mds_bal_interval 3600` to set a fixed interval (1 hour).
>
> We also have a question about how to disable balancing with multiple active MDS daemons. That is, we will enable multiple active MDSes (to improve throughput) and want no balancing between them.
>
> And if we set mds_bal_interval to a big number, does that avoid this issue?

You can just set 'mds_bal_interval' to 0.
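Concretely, that is a one-liner; the `config get` is only there to confirm the stored value (the option should apply at runtime without an MDS restart, though verifying on your release is wise):

    # Disable the CephFS MDS balancer for all MDS daemons by setting
    # the balancing tick interval to 0, then confirm the stored value.
    ceph config set mds mds_bal_interval 0
    ceph config get mds mds_bal_interval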
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
Currently we are using `ceph config set mds mds_bal_interval 3600` to set a fixed interval (1 hour).

We also have a question about how to disable balancing with multiple active MDS daemons. That is, we will enable multiple active MDSes (to improve throughput) and want no balancing between them.

And if we set mds_bal_interval to a big number, does that avoid this issue?

Thanks,
xz

> On Nov 27, 2023, at 10:56, Ben wrote:
>
> With the same MDS configuration, we see exactly the same thing (problem, log, and solution) with 17.2.5, happening again and again at intervals of a couple of days. [...]
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
With the same MDS configuration, we see exactly the same thing (problem, log, and solution) with 17.2.5, happening again and again at intervals of a couple of days. The MDS servers get stuck somewhere, yet ceph status reports no issue. We need to restart some of the MDS daemons (if not all of them) to restore them. Hopefully this can be fixed soon, or the docs updated with a warning about using the balancer in production environments.

thanks and regards

On Thu, Nov 23, 2023 at 15:47, Xiubo Li wrote:
> Okay, as Frank mentioned, you can try to disable the balancer by pinning the directories. As I remember, the balancer is buggy.
>
> You can also raise a Ceph tracker issue and provide the debug logs if you have them.
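On the restart point above: failing the stuck rank over to a standby is usually gentler than restarting daemons wholesale. A sketch, where the filesystem name "cephfs" and rank 1 are hypothetical placeholders:

    # See which daemon holds each rank and which standbys exist
    ceph fs status
    # Fail rank 1 of filesystem "cephfs"; a standby MDS takes over
    ceph mds fail cephfs:1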
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
On 11/23/23 11:25, zxcs wrote:
> Thanks a ton, Xiubo!
>
> It does not disappear, even after we unmounted the Ceph directory on the two old-OS nodes.
>
> After dumping the ops in flight, we can see some requests, and the earliest complains "failed to authpin, subtree is being exported".
>
> How can we avoid this? Would you please help shed some light here?

Okay, as Frank mentioned, you can try to disable the balancer by pinning the directories. As I remember, the balancer is buggy.

You can also raise a Ceph tracker issue and provide the debug logs if you have them.

Thanks
- Xiubo
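For reference, the pinning suggested here is done with an extended attribute set on a directory through a client mount. A minimal sketch — the mount point, directory names, and ranks are assumptions for illustration:

    # Pin two top-level directories to fixed MDS ranks so the balancer
    # no longer migrates these subtrees between MDSes
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB
    # Verify the pin; a value of -1 means "not pinned"
    getfattr -n ceph.dir.pin /mnt/cephfs/projectA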
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
Thanks a ton, Xiubo!

It does not disappear, even after we unmounted the Ceph directory on the two old-OS nodes.

After dumping the ops in flight, we can see some requests, and the earliest complains "failed to authpin, subtree is being exported".

How can we avoid this? Would you please help shed some light here?

Thanks,
xz

> On Nov 22, 2023, at 19:44, Xiubo Li wrote:
>
> BTW, won't the slow requests disappear by themselves later?
>
> It looks like the exporting is slow, or there are too many exports going on.
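For reference, the "dump ops in flight" step mentioned above uses the MDS admin socket, run on the host where the daemon lives (the daemon name is a placeholder):

    # List current in-flight ops on this MDS; slow entries show their
    # age, description (e.g. "internal op exportdir"), and held locks
    ceph daemon mds.<name> dump_ops_in_flight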
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
There are some unhandled race conditions in the MDS cluster in rare circumstances. We had this issue with Mimic and Octopus, and it went away after manually pinning sub-directories to MDS ranks; see https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-to-a-particular-rank. This has the added advantage that one can bypass the internal load balancer, which was horrible for our workloads.

I have a related post about ephemeral pinning on this list from one or two years ago; you should be able to find it.

Short story: after manually pinning all user directories to ranks, all our problems disappeared and performance improved a lot. MDS load dropped from an average of 130% to 10-20%, and so did memory consumption and cache recycling.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
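The ephemeral pinning mentioned here can be sketched as follows on a sufficiently recent release (the path is an assumption): setting the distributed-pin attribute on a parent directory spreads its immediate children across the active ranks automatically, without pinning each one by hand:

    # Hash the immediate subdirectories of /mnt/cephfs/home across
    # all active MDS ranks (ephemeral distributed pinning)
    setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home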
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
On 11/22/23 16:02, zxcs wrote:
> Hi Experts,
>
> We are using CephFS 16.2.* with multiple active MDS daemons. Recently we mounted two nodes with ceph-fuse because of their old OS.
>
> One node runs a Python script doing `glob.glob(path)` while another client does a `cp` on the same path. Then we see logs about `mds slow request`, and the logs complain "failed to authpin, subtree is being exported". We then need to restart the MDS.
>
> Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart affects other users)?

BTW, won't the slow requests disappear by themselves later?

It looks like the exporting is slow, or there are too many exports going on.

Thanks
- Xiubo
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported”
Hi,

we've seen this a year ago in a Nautilus cluster with multi-active MDS as well. It turned up only once within several years, and we decided not to look too closely at it at the time. How often do you see it? Is it reproducible? If it is, I'd recommend creating a tracker issue.

Regards,
Eugen

Quoting zxcs:
> Hi Experts,
>
> We are using CephFS 16.2.* with multiple active MDS daemons. Recently we mounted two nodes with ceph-fuse because of their old OS.
>
> One node runs a Python script doing `glob.glob(path)` while another client does a `cp` on the same path.
>
> Then we see logs about `mds slow request`, and the logs complain "failed to authpin, subtree is being exported".
>
> We then need to restart the MDS.
>
> Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart affects other users)?
>
> Thanks a ton!
>
> xz