[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-04 Thread Wesley Dillingham
Initial indications show "osd_async_recovery_min_cost = 0" to be a huge
win here. Some initial thoughts: were it not for the fact that the index
(and other OMAP pools) are isolated to their own OSDs in this cluster,
this tunable would seemingly cause data/blob objects from data pools to
recover asynchronously when synchronous recovery might be better for
those pools / that data. I can play around with how this affects the RGW
data pools. There was a Ceph code walkthrough video on this topic:
https://www.youtube.com/watch?v=waOtatCpnYs. It seems that
osd_async_recovery_min_cost may previously have been called
osd_async_recover_min_pg_log_entries (both default to 100). For a pool
with OMAP data where some or all of the OMAP objects are very large,
this may not be a dynamic enough factor on which to base the decision.
Thanks for the feedback everybody!
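
For anyone wanting to replicate, something like the following should do
it (a sketch, not gospel; "nvme" and osd.0 below are stand-ins for
whatever device class and daemons your index OSDs actually use):

    # cluster-wide; the option carries the runtime flag, so no restart needed
    ceph config set osd osd_async_recovery_min_cost 0

    # or scope it to just the OSDs backing the OMAP-heavy pools via a mask
    ceph config set osd/class:nvme osd_async_recovery_min_cost 0

    # verify what a given daemon is actually running with
    ceph config show osd.0 osd_async_recovery_min_cost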


Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Wed, Apr 3, 2024 at 1:38 PM Joshua Baergen  wrote:

> We've had success using osd_async_recovery_min_cost=0 to drastically
> reduce slow ops during index recovery.
>
> Josh


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
Thanks. I'll PR up some doc updates reflecting this and run them by the
RGW / RADOS folks.



[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
Hey Anthony,

Like with many other options in Ceph, I think what's missing is the
user-visible effect of what's being altered. I believe the reason why
synchronous recovery is still used is that, assuming that per-object
recovery is quick, it's faster to complete than asynchronous recovery,
which has extra steps on either end of the recovery process. Of
course, as you know, synchronous recovery blocks I/O, so when
per-object recovery isn't quick, as in RGW index omap shards,
particularly large shards, IMO we're better off always doing async
recovery.

I don't know enough about the overheads involved here to evaluate
whether it's worth keeping synchronous recovery at all, but IMO RGW
index/usage(/log/gc?) pools are always better off using asynchronous
recovery.
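
A quick way to see which path a PG takes: async recovery, like backfill,
keeps the recovering OSD out of the acting set via a pg_temp mapping, so
while recovery runs you can watch for those. Treat these as a sketch:

    ceph osd dump | grep pg_temp
    ceph pg ls recovering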

Josh



[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
We currently have in src/common/options/global.yaml.in:

- name: osd_async_recovery_min_cost
  type: uint
  level: advanced
  desc: A mixture measure of number of current log entries difference and
    historical missing objects, above which we switch to use asynchronous
    recovery when appropriate
  default: 100
  flags:
  - runtime

I'd like to rephrase the description there in a PR; might you be able to
share your insight into the dynamics so I can craft a better description?
And do you have any thoughts on the default value? Might appropriate
values vary by pool type and/or media?
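
In the meantime, the rendered help text and current value can be pulled
straight from a live cluster (a sketch, assuming a recent release):

    ceph config help osd_async_recovery_min_cost
    ceph config get osd osd_async_recovery_min_cost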





[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
We've had success using osd_async_recovery_min_cost=0 to drastically
reduce slow ops during index recovery.

Josh

On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham  wrote:
>
> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
> supports the RGW index pool causes crippling slow ops. If the OSD is marked
> with a primary-affinity of 0 prior to the restart, no slow ops are observed;
> if the OSD has a primary affinity of 1, slow ops occur. The slow ops only
> occur during the recovery period of the OMAP data, and then only when client
> activity is allowed to pass to the cluster. Luckily I am able to test this
> during periods when I can disable all client activity at the upstream proxy.
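>
> For the record, the affinity toggle is nothing exotic, just something
> along the lines of (osd.12 standing in for the OSD being restarted):
>
>     ceph osd primary-affinity osd.12 0    # before the restart
>     ceph osd primary-affinity osd.12 1    # once recovery completes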
>
> Given that changing the primary affinity prevents the slow ops, I think this
> may be a case of recovery being more detrimental than backfill. I am thinking
> that inducing a pg_temp acting set by forcing backfill may be the right way
> to mitigate the issue. [1]
>
> I believe that reducing the PG log entries for these OSDs would accomplish
> that, but I am also thinking that tuning osd_async_recovery_min_cost [2] may
> accomplish something similar. I'm not sure of the appropriate value for that
> config at this point, or whether there is a better approach. Seeking any
> input here.
>
> Further, if this issue sounds familiar, or sounds like another condition
> within the OSD may be at hand, I would be interested in hearing your input
> or thoughts. Thanks!
>
> [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
> [2]
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
>
> Respectfully,
>
> *Wes Dillingham*
> LinkedIn 
> w...@wesdillingham.com