I was able to export the PGs using ceph-objectstore-tool and import them
into the new OSDs.
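For the archives, the commands were along these lines -- a minimal
sketch, where the PG id (1.2a), OSD ids, and file name are illustrative
rather than my exact values, and the OSD in question must be stopped
while the tool runs:

  # on the host with the old drive: export the PG
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 1.2a --op export --file /tmp/pg.1.2a.export

  # on the host with the new OSD: import it
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
      --op import --file /tmp/pg.1.2a.export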
I moved some other OSDs from bare metal on a node into a virtual machine
on the same node and was surprised at how easy it was: install Ceph in
the VM (using ceph-deploy), stop the OSD, unmount the OSD drive from the
physical machine, and mount it in the VM. The OSD was auto-detected, the
ceph-osd process started automatically, and it was up within a few
seconds. Rough sketches of those disk-move steps, of the pg query
mentioned below, and of Greg's pool-removal suggestion are at the end of
this message.

I'm now hitting a different problem, which I'll raise in a separate
message. Thanks!

On Mon, Jul 24, 2017 at 12:52 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>
> On Fri, Jul 21, 2017 at 10:23 PM Daniel K <satha...@gmail.com> wrote:
>
>> Luminous 12.1.0 (RC)
>>
>> I replaced two OSD drives (the old ones were still good, just too
>> small), using:
>>
>> ceph osd out osd.12
>> ceph osd crush remove osd.12
>> ceph auth del osd.12
>> systemctl stop ceph-osd@osd.12
>> ceph osd rm osd.12
>>
>> I later found that I also should have unmounted it from
>> /var/lib/ceph/osd-12
>>
>> (remove old disk, insert new disk)
>>
>> I added the new disk/OSD with:
>>
>> ceph-deploy osd prepare stor-vm3:sdg --bluestore
>>
>> This automatically activated the OSD (I'm not sure why; I thought it
>> needed a ceph-deploy osd activate as well).
>>
>> Then, working on an unrelated issue, I upgraded one node (out of 4
>> total) to 12.1.1 using apt and rebooted.
>>
>> The mon daemon would not form a quorum with the others on 12.1.0, so
>> instead of troubleshooting that, I went ahead and upgraded the other
>> 3 nodes and rebooted.
>>
>> Lots of recovery IO went on afterwards, but now things have stopped
>> at:
>>
>>   pools:   10 pools, 6804 pgs
>>   objects: 1784k objects, 7132 GB
>>   usage:   11915 GB used, 19754 GB / 31669 GB avail
>>   pgs:     0.353% pgs not active
>>            70894/2988573 objects degraded (2.372%)
>>            422090/2988573 objects misplaced (14.123%)
>>            6626 active+clean
>>            129  active+remapped+backfill_wait
>>            23   incomplete
>>            14   active+undersized+degraded+remapped+backfill_wait
>>            4    active+undersized+degraded+remapped+backfilling
>>            4    active+remapped+backfilling
>>            2    active+clean+scrubbing+deep
>>            1    peering
>>            1    active+recovery_wait+degraded+remapped
>>
>> When I run ceph pg query on the incomplete PGs, they all list at
>> least one of the two removed OSDs (12, 17) in
>> "down_osds_we_would_probe".
>>
>> Most pools are size 2, min_size 1 (trusting bluestore to tell me
>> which copy is valid). One pool is size 1, min_size 1, and I'm okay
>> with losing it, except that I had it mounted in a directory on
>> CephFS. I rm'd the directory, but I can't delete the pool because
>> it's "in use by CephFS".
>>
>> I still have the old drives -- can I stick them into another host
>> and re-add them somehow?
>
> Yes, that'll probably be your easiest solution. You may have some
> trouble because you already deleted them, but I'm not sure.
>
> Alternatively, you ought to be able to remove the pool from CephFS
> using some of the monitor commands and then delete it.
>
>> This data isn't super important, but I'd like to learn a bit about
>> how to recover when bad things happen, as we are planning a
>> production deployment in a couple of weeks.
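For anyone who finds this thread later, here are the sketches mentioned
above. First, the bare-metal-to-VM disk move -- a minimal sketch, where
the OSD id (5), device (/dev/sdf), VM name (stor-vm3), and the libvirt
attach step are illustrative assumptions, not my exact setup:

  # on the physical host: stop the OSD and release its disk
  systemctl stop ceph-osd@5
  umount /var/lib/ceph/osd/ceph-5

  # hand the raw disk to the VM (assumes libvirt/KVM)
  virsh attach-disk stor-vm3 /dev/sdf vdb --persistent

  # inside the VM, udev/ceph-disk should detect the OSD partitions and
  # start ceph-osd on its own; if it doesn't, activate it by hand:
  ceph-disk activate /dev/vdb1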
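Second, the pg query used to find the blocking OSDs. The PG id here
(23.1f) is made up for illustration; down_osds_we_would_probe appears
under recovery_state in the JSON output:

  # dump the full peering state of one incomplete PG
  ceph pg 23.1f query

  # or pull out just the field in question
  ceph pg 23.1f query | grep -A 3 down_osds_we_would_probe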
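Finally, Greg's suggestion to detach the pool from CephFS before
deleting it. A sketch, assuming the filesystem is named cephfs and the
pool is mydata (both illustrative), and that the pool is not the
filesystem's default data pool:

  # detach the data pool from the filesystem
  ceph fs rm_data_pool cephfs mydata

  # then delete it (the mons must have mon_allow_pool_delete = true)
  ceph osd pool rm mydata mydata --yes-i-really-really-mean-it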