Good morning,

We've been running a medium-sized (88 OSDs, all SSD) Ceph cluster for the past 
20 months. We're very happy with our experience with the platform so far.

Shortly, we will be embarking on an initiative to replace all 88 OSDs with new 
drives (planned maintenance and lifecycle replacement). Before we do so, 
however, I wanted to confirm with the community the proper order of operations 
for such a task.

The OSDs are divided evenly across an even number of hosts, which are in turn 
split evenly between 2 cabinets in 2 physically separate locations. The plan 
is to replace the OSDs one host at a time, cycling back and forth between the 
cabinets and replacing one host every week or two (depending on how long the 
CRUSH rebalancing takes).
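
Between hosts, the idea is to wait until the cluster reports fully healthy 
again before starting on the next one. For reference, these are the standard 
checks we'd be watching in between (nothing exotic, just the stock CLI):

    # Overall cluster health; we'd wait for HEALTH_OK before the next host
    ceph -s

    # PG summary; all PGs should be active+clean before proceeding
    ceph pg stat

    # Follow recovery/backfill progress live
    ceph -w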

For each host, the plan is to mark the OSDs out one at a time, closely 
monitoring each of them and moving on to the next OSD once the current one has 
balanced out. Once all of the host's OSDs are successfully marked out, we will 
delete those OSDs from the cluster, shut down the server, replace the physical 
drives, and, once rebooted, add the new drives to the cluster as new OSDs 
using the same method we've used previously, again one at a time to allow for 
rebalancing as they rejoin the cluster.
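
To make the question concrete, here is the per-OSD sequence we had in mind, 
based on my reading of the standard manual removal procedure; <ID> is a 
placeholder for the OSD number, and the systemd unit name is assumed, so 
please correct anything that looks off:

    # Drain the OSD; CRUSH starts migrating its PGs to the remaining OSDs
    ceph osd out <ID>

    # Repeat until all PGs are back to active+clean
    ceph -s

    # Once drained, stop the daemon (assuming systemd-managed OSDs)
    systemctl stop ceph-osd@<ID>

    # Remove the OSD from the CRUSH map, its auth key, and the cluster
    ceph osd crush remove osd.<ID>
    ceph auth del osd.<ID>
    ceph osd rm <ID>

The replacement drives would then be brought in as brand-new OSDs via our 
usual provisioning method, one at a time.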

My questions are: Does this process sound correct? Should I also mark the OSDs 
as down when I mark them as out? Are there any steps I'm overlooking in this 
process?

Any advice is greatly appreciated.

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.merc...@attainia.com | www.attainia.com
