I am in the process of doing exactly what you are -- this worked for me:

1. Mount the first partition of the bluestore drive that holds the missing PGs (if it's not already mounted):

> mkdir /mnt/tmp
> mount /dev/sdb1 /mnt/tmp
2. Export the PG to a suitable temporary storage location:

> ceph-objectstore-tool --data-path /mnt/tmp --pgid 1.24 --op export --file /mnt/sdd1/recover.1.24

3. Find the acting OSD:

> ceph health detail | grep incomplete
PG_DEGRADED Degraded data redundancy: 23 pgs unclean, 23 pgs incomplete
    pg 1.24 is incomplete, acting [18,13]
    pg 4.1f is incomplete, acting [11]
    ...

4. Set noout:

> ceph osd set noout

5. Find the acting OSD and log into its host -- I used 18 here:

> ceph osd find 18
{
    "osd": 18,
    "ip": "10.0.15.54:6801/9263",
    "crush_location": {
        "building": "building-dc",
        "chassis": "chassis-dc400f5-10",
        "city": "city",
        "floor": "floor-dc4",
        "host": "stor-vm4",
        "rack": "rack-dc400f5",
        "region": "cfl",
        "room": "room-dc400",
        "root": "default",
        "row": "row-dc400f"
    }
}
> ssh user@10.0.15.54

6. Copy the file to somewhere accessible by the new (acting) OSD:

> scp user@10.0.14.51:/mnt/sdd1/recover.1.24 /tmp/recover.1.24

7. Stop the OSD:

> service ceph-osd@18 stop

8. Import the file using ceph-objectstore-tool:

> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op import --file /tmp/recover.1.24

9. Start the OSD:

> service ceph-osd@18 start

This worked for me -- I'm not sure it's the best way, and I may have taken some unnecessary steps; I also have yet to validate that the data is good. I based this partially off your original email and the guide here:
http://ceph.com/geen-categorie/incomplete-pgs-oh-my/
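A couple of follow-up commands that may help. These are standard ceph / ceph-objectstore-tool invocations, but I haven't run this exact sequence end to end, so treat it as a sketch and substitute your own pgid and OSD id. Before step 9, while the OSD is still stopped, you can confirm the PG actually landed in it:

> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op list-pgs | grep '^1\.24$'

And after step 9, remember to unset noout and check that the PG recovers:

> ceph osd unset noout
> ceph pg 1.24 query    # state should move away from "incomplete"
> ceph -w               # watch recovery progress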
On Sat, Jul 22, 2017 at 4:46 PM, mofta7y <moft...@gmail.com> wrote:
> Hi All,
>
> I have a situation here.
>
> I have an EC pool that has a cache tier pool on it (the cache tier is
> replicated with size 2).
>
> There was an issue on the pool and the crush map got changed after
> rebooting some OSDs; in any case, I lost 4 cache tier OSDs.
>
> Those lost OSDs are not really lost -- they look fine to me, but
> bluestore gives me an exception when starting them that I can't get
> past. (I will open a separate question about that exception as well.)
>
> So now I have 14 incomplete PGs on the caching tier.
>
> I am trying to recover them using ceph-objectstore-tool.
>
> The export and import work with no issues, but the OSD fails to start
> afterwards with the same issue as the original OSD: after importing
> the PG on the acting OSD, I get the exact same exception I was getting
> while trying to start the failed OSD. Removing that import resolves
> the issue.
>
> So the question is: how can I use ceph-objectstore-tool to import into
> bluestore? I think I am missing something here.
>
> Here is the procedure and the steps I used:
>
> 1- Stop the old OSD (it cannot start anyway).
>
> 2- Use this command to extract the PG I need:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-116 --pgid 15.371 --op export --file /tmp/recover.15.371
>
> That command works.
>
> 3- Check which OSD is the acting OSD for the PG.
>
> 4- Stop the acting OSD.
>
> 5- Delete the current folder with the same pg name.
>
> 6- Use this command:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --op import --file /tmp/recover.15.371
>
> The error I got in both cases is this bluestore error:
>
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: -257> 2017-07-22 16:20:19.544195 7f7157036a40 -1 osd.116 119691 log_to_monitors {default=true}
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 0> 2017-07-22 16:35:20.142143 7f713c597700 -1 /tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: In function 'virtual int BitMapAllocator::reserve(uint64_t)' thread 7f713c597700 time 2017-07-22 16:35:20.139309
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: /tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: 82: FAILED assert(!(need % m_block_size))
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x562b84558380]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 2: (BitMapAllocator::reserve(unsigned long)+0x2ab) [0x562b8437c5cb]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 3: (BlueFS::reclaim_blocks(unsigned int, unsigned long, std::vector<AllocExtent, mempool::pool_allocator<(mempool::pool_index_t)7, AllocExtent> >*)+0x22a) [0x562b8435109a]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 4: (BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t, std::allocator<bluestore_pextent_t> >*)+0x28e) [0x562b84270dae]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 5: (BlueStore::_kv_sync_thread()+0x164a) [0x562b84273eea]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 6: (BlueStore::KVSyncThread::entry()+0xd) [0x562b842ad9dd]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 7: (()+0x76ba) [0x7f71560c76ba]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: 8: (clone()+0x6d) [0x7f71547953dd]
> Jul 22 16:35:20 alm9 ceph-osd[3799171]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> If anyone has any idea how to restore those PGs, please point me in the right direction.
>
> By the way, restoring the folder that I deleted in step 5 manually makes the OSD go up again.
>
> Thanks
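PS: one thing I noticed in your step 5 -- rather than deleting the PG folder by hand, ceph-objectstore-tool has a remove op that does the same job, and as far as I know a bluestore OSD has no plain on-disk folder per PG anyway, so the tool is the only route there. Untested on my side, so treat this as a sketch and substitute your own data path and pgid:

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --pgid 15.371 --op remove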
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com