On 23/02/2018 14:27, Caspar Smit wrote:
Hi All,
What would be the proper way to preventively replace a DB/WAL SSD (when
it is nearing its DWPD/TBW limit but has not failed yet)?
It hosts DB partitions for 5 OSDs.
Maybe something like:
1) ceph osd reweight the 5 OSDs to 0
2) let backfilling complete
3) destroy/remove the 5 OSDs
4) replace the SSD
5) create 5 new OSDs with separate DB partitions on the new SSD
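The five steps above could be sketched roughly like this (the OSD ids 0-4 and the device names are hypothetical placeholders, and `ceph-volume` syntax varies a bit between releases, so adjust for your cluster):

```shell
# Drain the 5 OSDs backed by the old SSD (ids are example placeholders)
for id in 0 1 2 3 4; do
    ceph osd reweight "$id" 0
done

# Wait until backfilling finishes (all PGs active+clean)
ceph -s

# Stop and remove the drained OSDs
for id in 0 1 2 3 4; do
    systemctl stop ceph-osd@"$id"
    ceph osd purge "$id" --yes-i-really-mean-it
done

# After physically swapping the SSD, recreate each OSD with its DB
# on a partition of the new SSD (device names are placeholders)
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
```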
Since these 5 OSDs are big HDDs (8TB), a LOT of data would have to be
moved, so I thought maybe the following would work instead:
1) ceph osd set noout
2) stop the 5 OSDs (systemctl stop)
3) 'dd' the old SSD to a new SSD of the same or bigger size
4) remove the old SSD
5) start the 5 OSDs (systemctl start)
6) let backfilling/recovery complete (only the delta accumulated between
stopping the OSDs and now)
7) ceph osd unset noout
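A minimal sketch of that clone approach, assuming the copy happens while the OSDs are stopped so the DB is quiescent (device names and OSD ids are hypothetical):

```shell
# Prevent the stopped OSDs from being marked out during the swap
ceph osd set noout

# Stop the OSDs whose DB partitions live on the old SSD
for id in 0 1 2 3 4; do
    systemctl stop ceph-osd@"$id"
done

# Block-level copy of the old SSD onto the replacement;
# the new SSD must be at least as large as the old one
dd if=/dev/sdX of=/dev/sdY bs=4M conv=fsync status=progress

# Remove the old SSD, then restart the OSDs and let recovery
# catch up on the delta written while they were down
for id in 0 1 2 3 4; do
    systemctl start ceph-osd@"$id"
done

ceph osd unset noout
```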
Would this be a viable method to replace a DB SSD? Is there any
udev/serial-number/uuid handling that would prevent this from working?
What I would do under FreeBSD/ZFS (and perhaps there is something under
Linux that works the same):
Promote the disk/zvol holding the DB/WAL to a mirror.
This is instantaneous and does not modify anything.
Add the new SSD to the mirror, and wait until the new SSD has resilvered.
Then I'd detach the old SSD from the mirror.
You'd be left with a single-disk mirror for the DB/WAL, but that costs
very little. ZFS does not even consider it degraded, if you detached the
disk the correct way.
And no reboot required.
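On the ZFS side that sequence is roughly the following (pool and device names are hypothetical placeholders):

```shell
# Attach the new SSD alongside the old one; a single-disk vdev
# becomes a two-way mirror and starts resilvering immediately
zpool attach dbpool old-ssd new-ssd

# Watch until the resilver reports complete
zpool status dbpool

# Detach the old SSD; the remaining single disk stays a valid vdev
zpool detach dbpool old-ssd
```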
No idea whether you can do something similar under LVM or with other
mirroring setups.
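For what it's worth, LVM can do this kind of online migration without a mirror step: if the DB/WAL partitions are logical volumes, `pvmove` relocates their extents to the new SSD while the OSDs keep running. A sketch, with hypothetical device and volume-group names:

```shell
# Prepare the new SSD and add it to the VG holding the DB LVs
pvcreate /dev/nvme1n1
vgextend ceph-db-vg /dev/nvme1n1

# Move all extents off the old SSD onto the new one, online
pvmove /dev/nvme0n1 /dev/nvme1n1

# Drop the emptied old SSD from the VG and release it
vgreduce ceph-db-vg /dev/nvme0n1
pvremove /dev/nvme0n1
```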
--WjW
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com