[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread Venky Shankar
On Tue, Dec 5, 2023 at 6:34 AM Xiubo Li  wrote:
>
>
> On 12/4/23 16:25, zxcs wrote:
> > Thanks a lot, Xiubo!
> >
> > We already set ‘mds_bal_interval’ to 0, and the slow MDS requests seem to have decreased.
> >
> > But somehow we still see the MDS complaining about slow requests, and in
> > the MDS log we can see:
> >
> > “slow request *** seconds old, received at 2023-12-04T…: internal op
> > exportdir:mds.* currently acquired locks”
> >
> > So our question is: why do we still see "internal op exportdir"? Does any
> > other config also need to be set to 0? Could you please shed some light on
> > which config we need to set?
> >
> IMO, this should be enough.
>
> Venky,
>
> Did I miss something here ?

You missed nothing. Setting `mds_bal_interval = 0` disables the
balancer. I guess there are in-progress exports that will take some
time to back off, and the slow ops should eventually get cleaned up.

I'd say wait a bit and see if the slow request resolves by itself.
FWIW, there was a feature request a while back to cancel an ongoing
export. We should prioritize having that.
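
For reference, a quick way to check both the balancer setting and any lingering export ops is the `ceph` CLI (a sketch; the exact daemon names and output format depend on your release):

```shell
# Confirm the balancer is disabled (0 means no automatic balancing)
ceph config get mds mds_bal_interval

# List in-flight ops on the active MDS daemons; look for "internal op
# exportdir" entries and check whether their age stops growing once the
# in-progress exports back off
ceph tell mds.* dump_ops_in_flight
```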

>
> Thanks
>
> - Xiubo
>
>
> > Thanks,
> > xz
> >
> >> On Nov 27, 2023, at 13:19, Xiubo Li wrote:
> >>
> >>
> >> On 11/27/23 13:12, zxcs wrote:
> >>> Currently, we use `ceph config set mds mds_bal_interval 3600` to set a
> >>> fixed time (1 hour).
> >>>
> >>> We also have a question about how to disable balancing for multiple
> >>> active MDS daemons.
> >>>
> >>> That is, we want to enable multiple active MDS daemons (to improve
> >>> throughput) with no balancing between them.
> >>>
> >>> If we set mds_bal_interval to a big number, can we avoid this issue?
> >>>
> >> You can just set 'mds_bal_interval' to 0.
> >>
> >>
> >>>
> >>> Thanks,
> >>> xz
> >>>
>  On Nov 27, 2023, at 10:56, Ben wrote:
> 
>  With the same MDS configuration, we see exactly the same problem, log,
>  and solution with 17.2.5, happening again and again at intervals of a
>  couple of days. The MDS servers get stuck somewhere, although ceph status
>  reports no issue. We need to restart some of the MDS daemons (if not all
>  of them) to restore them. Hopefully this can be fixed soon, or the docs
>  updated with a warning about using the balancer in a production
>  environment.
> 
>  thanks and regards
> 
>  Xiubo Li wrote on Thu, Nov 23, 2023 at 15:47:
> 
> > On 11/23/23 11:25, zxcs wrote:
> >> Thanks a ton, Xiubo!
> >>
> >> It does not disappear,
> >>
> >> even after we unmount the ceph directory on these two old-OS nodes.
> >>
> >> After dumping the ops in flight, we can see some requests, and the
> >> earliest complains “failed to authpin, subtree is being exported".
> >> How can we avoid this? Would you please help shed some light here?
> > Okay, as Frank mentioned, you can try to disable the balancer by pinning
> > the directories. As I remember, the balancer is buggy.
> >
> > You can also raise a Ceph tracker issue and provide the debug logs if
> > you have them.
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> >> Thanks,
> >> xz
> >>
> >>
> >>> On Nov 22, 2023, at 19:44, Xiubo Li wrote:
> >>>
> >>>
> >>> On 11/22/23 16:02, zxcs wrote:
>  Hi, Experts,
> 
>  We are using CephFS 16.2.* with multiple active MDS daemons, and recently
>  we have two nodes mounted with ceph-fuse due to their old OS.
> 
>  One node runs a Python script with `glob.glob(path)`, and another
>  client does a `cp` operation on the same path.
> 
>  Then we see some logs about `mds slow request`, and the logs complain
>  “failed to authpin, subtree is being exported".
> 
>  We then need to restart the MDS.
> 
>  Our question is: is there a deadlock? How can we avoid this, and how can
>  we fix it without restarting the MDS (which affects other users)?
> >>> BTW, won't the slow requests disappear by themselves later?
> >>>
> >>> It looks like the exporting is slow, or there are too many exports
> >>> going on.
> >>> Thanks
> >>>
> >>> - Xiubo
> >>>
>  Thanks a ton!
> 
> 
>  xz
>  ___
>  ceph-users mailing list -- ceph-users@ceph.io
>  To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Frank Schilder
There are some unhandled race conditions in the MDS cluster in rare 
circumstances.

We had this issue with mimic and octopus and it went away after manually 
pinning sub-dirs to MDS ranks; see 
https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-to-a-particular-rank.

This has the added advantage that one can bypass the internal load balancer,
which was horrible for our workloads. I have a related post about ephemeral
pinning on this list from one or two years ago; you should be able to find it.
Short story: after manually pinning all user directories to ranks, all our
problems disappeared and performance improved a lot. Average MDS load dropped
from 130% to 10-20%, and so did memory consumption and cache recycling.
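
A minimal sketch of the manual pinning described above (the paths are
examples; the rank numbers depend on how many active MDS daemons you run):

```shell
# Pin a user directory (and the subtree below it) to MDS rank 0
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/users/alice

# Pin another subtree to rank 1
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/users/bob

# Verify the pin
getfattr -n ceph.dir.pin /mnt/cephfs/users/alice

# Alternatively, ephemeral distributed pinning spreads the immediate
# children of a directory across ranks automatically (Octopus and later)
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/users
```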

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Wednesday, November 22, 2023 12:30 PM
To: ceph-users@ceph.io
Subject: [ceph-users]  Re: mds slow request with “failed to authpin, subtree is 
being exported"

Hi,

we've seen this a year ago in a Nautilus cluster with multi-active MDS
as well. It turned up only once within several years, and we decided
not to look too closely at that time. How often do you see it? Is it
reproducible? In that case I'd recommend creating a tracker issue.

Regards,
Eugen

Zitat von zxcs :

> Hi, Experts,
>
> We are using CephFS 16.2.* with multiple active MDS daemons, and
> recently we have two nodes mounted with ceph-fuse due to their old
> OS.
>
> One node runs a Python script with `glob.glob(path)`, and
> another client does a `cp` operation on the same path.
>
> Then we see some logs about `mds slow request`, and the logs complain
> “failed to authpin, subtree is being exported".
>
> We then need to restart the MDS.
>
> Our question is: is there a deadlock? How can we avoid this,
> and how can we fix it without restarting the MDS (which affects
> other users)?
>
>
> Thanks a ton!
>
>
> xz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

