Hi guys,

Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused
over 37% of the data to rebalance - let's say this is fine (this happened
when I removed it from the CRUSH map).
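
For context, the removal followed more or less the standard procedure,
along these lines (osd.41 is just an example ID):

ceph osd out 41
ceph osd crush remove osd.41
ceph auth del osd.41
ceph osd rm 41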

I'm wondering about the throttling: I had previously set some throttling
parameters, but during the first hour of rebalancing the recovery rate went
up to 1500 MB/s and the VMs were completely unusable. During the last 4
hours of the recovery the rate dropped to, say, 100-200 MB/s; VM
performance was still quite impacted, but at least I could more or less
work.
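
For reference, I was reading those rates off the cluster status output,
roughly like this:

ceph -s    # shows the "recovery io ... MB/s" line while rebalancing
ceph -w    # same, but streaming continuously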

So my question is: is this behaviour expected, and is the throttling
working as intended? During the first hour it looks like almost no
throttling was applied, judging by the 1500 MB/s recovery rate and the
impact on the VMs, while the last 4 hours seemed pretty fine (although
there was still a lot of impact in general).

I changed these throttling settings on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
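
To double-check that the injected values actually landed on the OSDs, I
believe something like this on each OSD host will show them (the socket
path and OSD ID are just examples):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority'

And to make the values survive OSD restarts, I assume the same settings can
go into ceph.conf under the [osd] section:

[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1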

My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD and
6 journals on another SSD) - I have 3 of these hosts.

Any thoughts are welcome.
-- 

Andrija Panić