On 01/08/17 12:41, Osama Hasebou wrote:
> Hi,
> 
> What would be the best and most efficient way to handle maintenance on
> large Ceph clusters?
> 
> Let's say we have 3 copies of data and one of the servers needs
> maintenance, which might take 1-2 days due to unforeseen issues that
> come up.
> 
> Setting the node to noout is a bit of a risk, since only 2 copies will
> be active. In that case, what would be the proper way to take the node
> down, let the cluster rebalance, and then perform maintenance? And how
> would one bring it back online without rebalancing right away, so the
> server can first be checked, and rebalancing reintroduced once
> everything looks good?
> 
> 
> Thank you.
> 
> Regards,
> Ossi

The recommended practice would be to use "ceph osd crush reweight" to set
the CRUSH weight of the OSDs that will be down to 0. The cluster will then
rebalance, and once it's HEALTH_OK again, you can take those OSDs offline
without losing any redundancy (though you will need to ensure there is
enough spare space in the rest of the cluster that you don't push disk
usage too high on your other nodes).
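
For example, something along these lines (osd.10, osd.11 and osd.12 are
placeholders for whichever OSDs live on the node being serviced):

    # Drain the OSDs on the node ahead of the maintenance window:
    ceph osd crush reweight osd.10 0
    ceph osd crush reweight osd.11 0
    ceph osd crush reweight osd.12 0

    # Watch the rebalance, and keep an eye on free space on the
    # remaining nodes while it runs:
    ceph -s
    ceph osd df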

When you're ready to bring them online again, make sure that you have
"osd_crush_update_on_start = false" set in your ceph.conf, so they don't
mess with their weights when they come back. They will then be up but
still at CRUSH weight 0, so no data will be assigned to them. When you're
happy everything's okay, use "ceph osd crush reweight" again to bring them
back to their original weights. Lots of people like to do that in
increments of 0.1 weight at a time, so the recovery is staggered and
doesn't impact your active I/O too much.
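
As a sketch, again with a hypothetical osd.10 and assuming an original
weight of 1.0:

    # In ceph.conf on the node, before starting the OSDs back up:
    [osd]
    osd_crush_update_on_start = false

    # Once the node checks out, step the weight back up, waiting for
    # recovery to settle (HEALTH_OK) between steps:
    ceph osd crush reweight osd.10 0.1
    ceph osd crush reweight osd.10 0.2
    # ...continue in 0.1 increments until back at the original weight:
    ceph osd crush reweight osd.10 1.0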

This assumes your CRUSH layout is such that you can still place three
replicas with one server missing (with size=3 and a host failure domain,
that means at least four hosts).
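
A quick way to sanity-check that (substitute your own pool name):

    ceph osd pool get <pool> size   # replica count
    ceph osd tree                   # hosts / failure domains in the map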

Rich
