On 23/02/2018 14:27, Caspar Smit wrote:
Hi All,

What would be the proper way to preventively replace a DB/WAL SSD (when it is nearing its DWPD/TBW limit but has not failed yet)?

It hosts the DB partitions for 5 OSDs.

Maybe something like:

1) ceph osd reweight 0 the 5 OSDs
2) let backfilling complete
3) destroy/remove the 5 OSDs
4) replace the SSD
5) create 5 new OSDs with separate DB partitions on the new SSD
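A rough sketch of those steps in commands; the OSD ids (11-15) and device names are invented here and would need to be adapted to the actual cluster:

```shell
# Hypothetical OSD ids and devices -- adapt to your cluster.
for id in 11 12 13 14 15; do
    ceph osd reweight $id 0          # drain data off the OSDs
done
# ... wait for backfilling to finish (watch 'ceph -s') ...
for id in 11 12 13 14 15; do
    systemctl stop ceph-osd@$id
    ceph osd destroy $id --yes-i-really-mean-it
done
# After swapping the SSD: recreate each OSD with its DB on the new device.
ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/nvme0n1p1 --osd-id 11
```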

When these 5 OSDs are big HDDs (8TB), a LOT of data has to be moved, so I thought maybe the following would work:

1) ceph osd set noout
2) stop the 5 OSDs (systemctl stop)
3) 'dd' the old SSD to a new SSD of the same or bigger size
4) remove the old SSD
5) start the 5 OSDs (systemctl start)
6) let backfilling/recovery complete (only the delta written between the OSD stop and now)
7) ceph osd unset noout
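A sketch of that dd-based variant (OSD ids and device names are placeholders). One caveat on the udev question: dd copies the GPT including the partition GUIDs, so block.db symlinks that go via /dev/disk/by-partuuid should still resolve on the clone -- but only after the old SSD is physically removed, since two disks with identical partition GUIDs would otherwise collide:

```shell
ceph osd set noout                   # suppress rebalancing while the OSDs are down
for id in 11 12 13 14 15; do systemctl stop ceph-osd@$id; done
# Clone the old DB SSD onto the new one (placeholders: sdx = old, sdy = new).
dd if=/dev/sdx of=/dev/sdy bs=4M conv=fsync status=progress
# Pull the old SSD here, so the duplicated partition GUIDs cannot conflict.
for id in 11 12 13 14 15; do systemctl start ceph-osd@$id; done
# ... wait for the delta recovery to finish ...
ceph osd unset noout
```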

Would this be a viable method to replace a DB SSD? Is there any udev/serial-number/UUID handling that would prevent this from working?

What I would do under FreeBSD/ZFS (and perhaps there is something under Linux that works the same):

Promote the disk/zvol holding the DB/WAL to a mirror.
  This is instantaneous, and does not modify anything.
Add the new SSD to the mirror, and wait until the new SSD is updated.
Then I'd detach the old SSD from the mirror.

You'd be left with a one-disk mirror for the DB/WAL, but that costs almost nothing. ZFS does not even consider it degraded, if you removed the disk the correct way.

And no reboot required.
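On ZFS that would be something like the following; the pool and device names ('dbpool', ada1/ada2) are invented for illustration:

```shell
zpool attach dbpool ada1 ada2   # the single disk becomes a two-way mirror
zpool status dbpool             # wait until the resilver completes
zpool detach dbpool ada1        # drop the worn SSD; the pool stays ONLINE
```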

No idea if you can do something similar under LVM or other types of mirroring stuff.
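The closest Linux/LVM equivalent is probably pvmove, which migrates extents between physical volumes while the logical volumes stay online. This only helps if the DB/WAL partitions are LVs; the VG name and device names below are assumptions:

```shell
pvcreate /dev/nvme1n1                # prepare the new SSD
vgextend ceph-db /dev/nvme1n1        # add it to the (hypothetical) VG
pvmove /dev/nvme0n1 /dev/nvme1n1     # live-migrate all extents off the old SSD
vgreduce ceph-db /dev/nvme0n1        # remove the old SSD from the VG
pvremove /dev/nvme0n1
```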

--WjW



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
