On Wed, May 2, 2018 at 12:18 PM, Nicolas Huillard <nhuill...@dolomede.fr> wrote:
> On Sunday, April 8, 2018 at 20:40 +0000, Jens-U. Mozdzen wrote:
>> sorry for bringing up that old topic again, but we just faced a
>> corresponding situation and have successfully tested two migration
>> scenarios.
>
> Thank you very much for this update, as I needed to do exactly that,
> due to an SSD crash triggering hardware replacement.
> The block.db on the crashed SSD was lost, so the two OSDs depending
> on it were re-created. I also replaced two other bad SSDs before they
> failed, so I effectively had to replace DB/WAL devices on the live
> cluster (2 SSDs on 2 hosts, 4 OSDs).
>
>> it is possible to move a separate WAL/DB to a new device, though
>> without changing its size. We have done this for multiple OSDs,
>> using only existing (mainstream :) ) tools, and have documented the
>> procedure in
>> http://heiterbiswolkig.blogs.nde.ag/2018/04/08/migrating-bluestores-block-db/
>> . It will *not* let you separate WAL/DB after OSD creation, nor
>> does it allow changing the DB size.
>
> The lost OSDs were still backfilling when I ran the above procedure
> (data redundancy was high enough to risk losing one more node). I even
> mistyped the "ceph osd set noout" command ("ceph osd unset noout"
> instead, effectively a no-op), and replaced 2 OSDs of a single host at
> the same time (thus taking more than the 10 minutes after which down
> OSDs get kicked out, triggering even more data movement).
> Everything went cleanly though, thanks to your detailed commands, which
> I ran one at a time, thinking twice before each [Enter].
>
> I dug a bit into the LVM tags:
> * make a backup of all pv/vg/lv config: vgcfgbackup
> * check the backed-up tags: grep tags /etc/lvm/backup/*
>
> I then noticed that:
> * there are lots of "ceph.*=" tags
> * tags are still present on the old DB/WAL LVs (since I didn't remove
> them)
> * tags are absent from the new DB/WAL LVs (ditto, I didn't create
> them), which may be a problem later on...
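By the way, you can also read the live tags straight off the LVs,
without going through the vgcfgbackup files. A quick sketch (both
commands are standard; the names in the output will of course be
yours):

    # show every LV together with its ceph.* tags
    lvs -o lv_name,vg_name,lv_tags

    # or let ceph-volume decode the tags per OSD
    ceph-volume lvm list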
This is absolutely going to be a problem for you if, as I understand
it, these OSDs are handled by ceph-volume: it reads those tags to be
able to bring the OSDs up.

> * I changed the ceph.db_device= tag, but there is also a ceph.db_uuid=
> tag which was not changed, and may or may not trigger a problem upon
> reboot (I don't know if this UUID is part of the dd'ed data)

You can certainly get into a situation where ceph-volume needs one of
these tags, can't find it, and breaks -- see the sketch at the end of
this mail for one way to set them by hand.

> You effectively helped a lot! Thanks.
>
> --
> Nicolas Huillard
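As promised, a sketch of how the missing or stale tags could be fixed
by hand. The VG/LV names and UUIDs below are made up -- substitute your
own, and compare against an intact OSD to verify what ceph-volume
expects before applying anything:

    # drop the stale tag from the OSD's block LV, then set the new one
    lvchange --deltag "ceph.db_device=/dev/old-vg/old-db-lv" /dev/ceph-block-0/block-0
    lvchange --addtag "ceph.db_device=/dev/new-vg/new-db-lv" /dev/ceph-block-0/block-0

    # same dance for ceph.db_uuid; for an LV I believe ceph-volume
    # stores the lv_uuid there (for a raw partition, the partuuid)
    lvs -o lv_uuid new-vg/new-db-lv
    lvchange --deltag "ceph.db_uuid=OLD-UUID" /dev/ceph-block-0/block-0
    lvchange --addtag "ceph.db_uuid=NEW-UUID" /dev/ceph-block-0/block-0

    # the new DB LV itself also needs its own set of ceph.* tags
    # (compare with the old one: lvs -o lv_tags old-vg/old-db-lv)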