[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Stephen Smith6
The 10-minute delay is the default wait period Ceph allows before it attempts 
to heal the data. See "mon_osd_report_timeout" – I believe the default is 900 
seconds.
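
If you want to double-check what a given cluster is actually using, the usual
config query should show it (e.g. from a cephadm shell):

ceph config get mon mon_osd_report_timeout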

From: Sean Matheny 
Date: Monday, December 5, 2022 at 5:20 PM
To: ceph-users@ceph.io 
Cc: Blair Bethwaite, pi...@stackhpc.com, Michal Nasiadka
Subject: [EXTERNAL] [ceph-users] Odd 10-minute delay before recovery IO begins
Hi all,

New Quincy cluster here that I'm just running through some benchmarks against:

ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs

I'm seeing a delay of almost exactly 10 minutes from when I remove an OSD/node 
from the cluster until actual recovery IO begins. This is very different 
behaviour from what I'm used to in Nautilus, where recovery IO would commence 
within seconds. Downed OSDs are reflected in ceph health within a few seconds 
(as expected), and affected PGs show as undersized a few seconds later (as 
expected). I guess this 10-minute delay may even be a feature -- accidentally 
rebooting a node before setting recovery flags would prevent rebalancing, for 
example. Just thought it was worth asking in case it's a bug or something to 
look deeper into.

I've read through the OSD config and all of my recovery tuneables look ok, for 
example:
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ 

[ceph: root@ /]# ceph config get osd osd_recovery_delay_start
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
0.10
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
0.025000

Thanks in advance.

Ngā mihi,

Sean Matheny
HPC Cloud Platform DevOps Lead
New Zealand eScience Infrastructure (NeSI)

e: sean.math...@nesi.org.nz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Tyler Brekke
Sounds like your OSDs were down, but not marked out. Recovery will only
occur once they are actually marked out. The default
mon_osd_down_out_interval is 10 minutes.

You can mark them out explicitly with ceph osd out <osd-id>.
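
For example (osd.12 here is just a placeholder id; the noout flag is only for
planned work where you don't want the automatic marking-out at all):

# mark a specific OSD out right away so recovery can begin
ceph osd out 12
# or, before planned maintenance, suppress the automatic marking-out
ceph osd set noout
# ...and remember to clear the flag again afterwards
ceph osd unset noout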

On Mon, Dec 5, 2022 at 2:20 PM Sean Matheny 
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes from when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is very different
> behaviour from what I'm used to in Nautilus, where recovery IO would commence
> within seconds. Downed OSDs are reflected in ceph health within a few seconds
> (as expected), and affected PGs show as undersized a few seconds later (as
> expected). I guess this 10-minute delay may even be a feature -- accidentally
> rebooting a node before setting recovery flags would prevent rebalancing, for
> example. Just thought it was worth asking in case it's a bug or something to
> look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.10
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.math...@nesi.org.nz
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Tyler Brekke
Senior Engineer I
tbre...@digitalocean.com
--
We're Hiring! | @digitalocean | YouTube

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Wesley Dillingham
I think you are experiencing the mon_osd_down_out_interval

https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval

Ceph waits 10 minutes before marking a down OSD as out for the reasons you
mention, but this would have been the case in Nautilus as well.
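
To confirm, both the interval and any down-but-still-in OSDs can be checked
with something like (600 seconds being the shipped default):

ceph config get mon mon_osd_down_out_interval
ceph osd tree down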

Respectfully,

Wes Dillingham
w...@wesdillingham.com
LinkedIn 


On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny 
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes from when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is very different
> behaviour from what I'm used to in Nautilus, where recovery IO would commence
> within seconds. Downed OSDs are reflected in ceph health within a few seconds
> (as expected), and affected PGs show as undersized a few seconds later (as
> expected). I guess this 10-minute delay may even be a feature -- accidentally
> rebooting a node before setting recovery flags would prevent rebalancing, for
> example. Just thought it was worth asking in case it's a bug or something to
> look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.10
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.math...@nesi.org.nz
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Sean Matheny
Hi all, 

Thanks for the great responses. Confirming that this was the issue (feature). 
No idea why this was set differently for us in Nautilus. 

This should make the recovery benchmarking a bit faster now. :) 
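
For anyone else doing similar failure testing, I gather the interval can be
shortened temporarily and then reverted to the default, along the lines of
(60 is just an example value):

ceph config set mon mon_osd_down_out_interval 60
# run the failure/recovery benchmark here
ceph config rm mon mon_osd_down_out_interval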

Cheers,
Sean

> On 6/12/2022, at 3:09 PM, Wesley Dillingham  wrote:
> 
> I think you are experiencing the mon_osd_down_out_interval 
> 
> https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval
> 
> Ceph waits 10 minutes before marking a down OSD as out for the reasons you 
> mention, but this would have been the case in Nautilus as well. 
> 
> Respectfully,
> 
> Wes Dillingham
> w...@wesdillingham.com 
> LinkedIn 
> 
> 
> On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny wrote:
>> Hi all,
>> 
>> New Quincy cluster here that I'm just running through some benchmarks 
>> against:
>> 
>> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
>> (stable)
>> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>> 
>> I'm seeing a delay of almost exactly 10 minutes from when I remove an OSD/node 
>> from the cluster until actual recovery IO begins. This is very different 
>> behaviour from what I'm used to in Nautilus, where recovery IO would commence 
>> within seconds. Downed OSDs are reflected in ceph health within a few seconds 
>> (as expected), and affected PGs show as undersized a few seconds later (as 
>> expected). I guess this 10-minute delay may even be a feature -- accidentally 
>> rebooting a node before setting recovery flags would prevent rebalancing, for 
>> example. Just thought it was worth asking in case it's a bug or something to 
>> look deeper into.
>> 
>> I've read through the OSD config and all of my recovery tuneables look ok, 
>> for example: 
>> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ 
>> 
>> 
>> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
>> 0.10
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
>> 0.025000
>> 
>> Thanks in advance.
>> 
>> Ngā mihi,
>> 
>> Sean Matheny
>> HPC Cloud Platform DevOps Lead
>> New Zealand eScience Infrastructure (NeSI)
>> 
>> e: sean.math...@nesi.org.nz 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io 
>> To unsubscribe send an email to ceph-users-le...@ceph.io 
>> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io