> On 17 Feb 2023, at 23:20, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> 
> 
> 
>> * if a rebalance starts due to EDAC or SFP degradation, it is faster to have 
>> DC engineers fix the issue and put the node back to work
> 
> A judicious mon_osd_down_out_subtree_limit setting can also do this by not 
> rebalancing when an entire node is detected down. 

Yes. But in the case where a single disk dies, it may not actually be dead. 
Some examples:

* the disk is just stuck - a reboot and/or a physical eject and re-insert 
brings it back to life
* disk read errors - such errors take the OSD down, but after an OSD restart 
it just works normally (Pending Sectors -> Reallocated)

Refilling a single 16TB OSD may take 7-10 days, while the underlying problem 
may be fixed in 10-20 minutes by the duty engineer.
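
As a rough sanity check of those figures, here is a small sketch of the 
arithmetic. It assumes the OSD is refilled to its full 16 TB (decimal TB), 
which overstates the work for a partly filled OSD; the function names are 
made up for illustration only.

```python
# Back-of-the-envelope check: what average backfill rate do 7-10 days imply
# for one 16 TB OSD, and how does that compare with a 10-20 minute fix?
TB = 10**12                # decimal terabyte, as drive vendors count (assumption)
SECONDS_PER_DAY = 86_400


def implied_rate_mb_s(capacity_tb: float, days: float) -> float:
    """Average rate in MB/s needed to write capacity_tb within `days`."""
    return capacity_tb * TB / (days * SECONDS_PER_DAY) / 10**6


def refill_vs_fix_ratio(fill_days: float, fix_minutes: float) -> float:
    """How many times longer the refill takes than the on-site fix."""
    return fill_days * 24 * 60 / fix_minutes


if __name__ == "__main__":
    for days in (7, 10):
        print(f"{days} days -> {implied_rate_mb_s(16, days):.1f} MB/s")
    # Best case for the refill: shortest fill versus slowest fix.
    print(f"refill is at least {refill_vs_fix_ratio(7, 20):.0f}x slower")
```

So even in the most favourable combination (7 days versus 20 minutes), the 
refill takes several hundred times longer than the fix.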

> 
>> * noout prevents unwanted OSD fills and running out of space => outage of 
>> services
> 
> Do you run your clusters very full?

We provide public services. This means a client can rent 1000 disks x 1000GB 
via one terraform command at 02:00 on a Saturday night. It is just physically 
impossible to add nodes in that case. Any data movement without upmap is 
highly undesirable.
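
To put a number on that request, a minimal worst-case sketch follows. The 
3x replication and the 0.85 fill ceiling are assumptions chosen for 
illustration, not a description of our pools, and thin-provisioned volumes 
would consume the space gradually rather than at once.

```python
import math


def raw_tb_needed(volumes: int, volume_gb: int, replicas: int) -> float:
    """Raw capacity in TB if every volume were completely filled."""
    return volumes * volume_gb * replicas / 1000


def osds_needed(raw_tb: float, osd_tb: float, max_fill: float) -> int:
    """Number of OSDs required when each may only be filled to max_fill."""
    return math.ceil(raw_tb / (osd_tb * max_fill))


if __name__ == "__main__":
    # 1000 volumes x 1000 GB; replicas=3 and max_fill=0.85 are assumed values.
    raw = raw_tb_needed(1000, 1000, replicas=3)
    print(f"raw capacity: {raw:.0f} TB")
    print(f"16 TB OSDs needed: {osds_needed(raw, 16, 0.85)}")
```

Under those assumptions a single order can claim up to a couple of hundred 
OSDs' worth of raw space, which is why spare capacity cannot be spent on 
avoidable backfill.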



k
