Hi all, 

Thanks for the great responses. I can confirm that this was indeed the issue (feature). 
No idea why this was set differently for us back on Nautilus. 

This should make the recovery benchmarking a bit faster now. :) 
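
For the archives, in case anyone else trips over this: the interval can be 
checked and adjusted with the usual config commands, or sidestepped entirely. 
A rough sketch below; the shortened value is just an example for benchmarking, 
not a recommendation, and <osd-id> is a placeholder:

# check the current value (default is 600 seconds, i.e. the 10-minute wait)
[ceph: root@ /]# ceph config get mon mon_osd_down_out_interval
# shorten it while benchmarking (60s here is just an example)
[ceph: root@ /]# ceph config set mon mon_osd_down_out_interval 60
# or mark a stopped OSD out immediately rather than waiting it out
[ceph: root@ /]# ceph osd out <osd-id>
# for planned maintenance the opposite is handy: stop OSDs being marked out at all
[ceph: root@ /]# ceph osd set noout
[ceph: root@ /]# ceph osd unset noout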

Cheers,
Sean

> On 6/12/2022, at 3:09 PM, Wesley Dillingham <w...@wesdillingham.com> wrote:
> 
> I think you are running into mon_osd_down_out_interval: 
> 
> https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval
> 
> Ceph waits 10 minutes before marking a down OSD as out, for exactly the reasons 
> you mention, but this default would have been the same in Nautilus as well. 
> 
> Respectfully,
> 
> Wes Dillingham
> w...@wesdillingham.com
> LinkedIn: http://www.linkedin.com/in/wesleydillingham
> 
> 
> On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny <sean.math...@nesi.org.nz> wrote:
>> Hi all,
>> 
>> New Quincy cluster here that I'm running some benchmarks against:
>> 
>> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
>> (stable)
>> 11 nodes, each with 24x 18TB HDD OSDs and 2x 2.9TB SSD OSDs
>> 
>> I'm seeing a delay of almost exactly 10 minutes between removing an OSD/node 
>> from the cluster and the start of actual recovery IO. This is quite different 
>> behaviour from what I was used to in Nautilus, where recovery IO would 
>> commence within seconds. Downed OSDs are reflected in ceph health within a 
>> few seconds (as expected), and affected PGs show as undersized a few seconds 
>> later (as expected). I suspect this 10-minute delay may even be a feature: 
>> it would prevent rebalancing if a node were accidentally rebooted before 
>> recovery flags had been set, for example. Just thought it was worth asking in 
>> case it's a bug or something worth digging into. 
>> 
>> I've read through the OSD config reference and all of my recovery tunables 
>> look OK, for example: 
>> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>> 
>> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
>> 0.000000
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
>> 0.000000
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
>> 0.100000
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
>> 0.000000
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
>> 0.025000
>> 
>> Thanks in advance.
>> 
>> Ngā mihi,
>> 
>> Sean Matheny
>> HPC Cloud Platform DevOps Lead
>> New Zealand eScience Infrastructure (NeSI)
>> 
>> e: sean.math...@nesi.org.nz
>> 
>> 
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
