If I’m not mistaken, marking an osd out will remap its placement groups 
temporarily, while removing it from the crush map will remap the placement 
groups permanently. Additionally, other placement groups from other osds could 
get remapped permanently when an osd is removed from the crush map. I would 
think the only benefit to marking an osd out before stopping it would be a 
cleaner redirection of client I/O before the osd disappears, which may be 
worthwhile if you’re removing a healthy osd.

As for reweighting to 0 prior to removing an osd, it seems like that would let 
the osd keep participating in the recovery, essentially in read-only fashion 
(plus deletes), until it’s empty, so objects wouldn’t become degraded while 
placement groups are backfilling onto other osds. Again, this would really only 
be useful if you’re removing a healthy osd. If you’re removing an osd while 
other osds in different failure domains are known to be unhealthy, reweighting 
to 0 first seems like a really good idea.
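
For illustration, here is a rough sketch of what a drain-first ordering could 
look like. It only prints a plan and runs nothing. The function name drain_plan 
and the systemd unit name are assumptions of mine, and the wait between the 
reweight and the later steps is left to the operator.

```shell
#!/bin/sh
# Prints (does not run) one possible drain-first removal plan for an OSD.
# drain_plan and its argument are illustrative names, not part of Ceph.
drain_plan() {
    id="$1"
    # Set the CRUSH weight to 0 while the OSD is still up and in, so it can
    # keep serving as a source while its PGs backfill elsewhere.
    echo "ceph osd crush reweight osd.$id 0"
    # Assumption: the operator waits here until 'ceph -s' reports all PGs
    # active+clean before continuing with the remaining steps.
    echo "ceph osd out $id"
    echo "systemctl stop ceph-osd@$id"   # assumes systemd; init systems differ
    echo "ceph osd crush remove osd.$id"
    echo "ceph auth del osd.$id"
    echo "ceph osd rm $id"
}

drain_plan 7
```

Since the crush weight is already 0 by the time of the crush remove, I would 
expect the later steps not to move much data, but I haven’t measured that.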

I usually follow the documented steps you’ve outlined myself, but I’m typically 
removing osds due to failed/failing drives while the rest of the cluster is 
healthy.
________________________________
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation 
<http://www.storagecraft.com/>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705
________________________________

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rafael 
Lopez
Sent: Wednesday, January 06, 2016 4:53 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] double rebalance when removing osd

Hi all,

I am curious what practices other people follow when removing OSDs from a 
cluster. According to the docs, you are supposed to:

1. ceph osd out
2. stop daemon
3. ceph osd crush remove
4. ceph auth del
5. ceph osd rm
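
Spelled out as commands, the five steps above might look roughly like the 
sketch below. The run helper, the DRY_RUN switch and the systemd unit name are 
illustrative assumptions (the way to stop the daemon depends on the init 
system); the ceph subcommands are the ones listed above. By default it only 
prints the commands.

```shell
#!/bin/sh
# Sketch of the five documented removal steps for a single OSD.
# run, remove_osd and DRY_RUN are illustrative names, not part of Ceph.

run() {
    # Print the command in dry-run mode (the default), otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_osd() {
    id="$1"
    run ceph osd out "$id"                # 1. mark out (first data movement)
    run systemctl stop "ceph-osd@$id"     # 2. stop daemon (assumes systemd)
    run ceph osd crush remove "osd.$id"   # 3. remove from crush map (second data movement)
    run ceph auth del "osd.$id"           # 4. delete the OSD's auth key
    run ceph osd rm "$id"                 # 5. remove the OSD from the cluster
}
```

Calling remove_osd 12 prints the five commands for osd.12; setting DRY_RUN=0 
would execute them instead.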

What value does ceph osd out (1) add to the removal process, and why is it in 
the docs? We have found (as have others) that marking the OSD out (1) and then 
removing it from the crush map (3) makes the cluster do two recoveries. Is step 
1 necessary? Can you just do a crush remove without it?

I found this earlier message from GregF, in which he seems to affirm that just 
doing the crush remove is fine:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html

There is also this recent blog post from Sebastien that suggests reweighting to 
0 first, but I haven’t tested it:
http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

I thought that marking an OSD out sets its reweight to 0 anyway, so I am not 
sure how this would make a difference in terms of the two rebalances, but maybe 
there is a subtle difference?

Thanks,
Raf

--
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
