Hi All,

Long story short, we’re doing disaster recovery on a CephFS cluster, and are at 
a point where we have 8 pgs stuck incomplete.  Just before the disaster, I 
increased pg_num on two of the pools, and they had not finished 
increasing pgp_num yet.  I’ve since forced pgp_num to the current values.
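
For reference, here is a rough sketch of how pg_num and pgp_num can be compared and aligned for a pool. It is not a transcript of what I ran: the pool name and the value are placeholders, the run helper is just a convenience I made up for this sketch, and with DRY_RUN=1 (the default here) it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: print (or run) the commands that compare pg_num and pgp_num
# for a pool and set pgp_num explicitly. Pool names passed in are placeholders.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: in dry-run mode just echo the command line,
    # otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_pg_counts() {
    # $1 = pool name
    run ceph osd pool get "$1" pg_num
    run ceph osd pool get "$1" pgp_num
}

set_pgp_num() {
    # $1 = pool name, $2 = target pgp_num
    run ceph osd pool set "$1" pgp_num "$2"
}
```

For example, show_pg_counts mypool prints the two "ceph osd pool get" commands for a pool called mypool, and set_pgp_num mypool 128 prints the matching "ceph osd pool set" command.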

So far, I’ve tried mark_unfound_lost, but the pgs don’t report any unfound 
objects, and I’ve tried force-create-pg, but that had no effect except on one 
of the pgs, which went to creating+incomplete.  During the disaster recovery, I 
had to re-create several OSDs (due to unreadable superblocks), and now one of 
the new OSDs, as well as one of the existing OSDs, won’t start.  The log from 
the startup of osd.29 is here: https://pastebin.com/PX9AAj8m.  It seems to 
indicate that the OSD won’t start because it’s supposed to have copies of the 
incomplete placement groups.

ceph pg 5.38 query (one of the incomplete pgs) gives: https://pastebin.com/Jf4GnZTc

I have searched the OSDs listed for each of these placement groups for any 
copy of a pg that I could mark as complete with ceph-objectstore-tool, but 
can’t find any.  I don’t care about the data in the pgs, but I can’t abandon 
the filesystem.
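
To make the search concrete, here is a minimal sketch of the ceph-objectstore-tool invocations involved. The OSD ids are placeholders, the data path assumes the default /var/lib/ceph/osd/ceph-<id> layout (which may not match every deployment), and the run helper is made up for this sketch. With DRY_RUN=1 (the default here) it only prints the commands; the OSD daemon has to be stopped before the tool can open its store.

```shell
#!/bin/sh
# Sketch only: print (or run) the ceph-objectstore-tool commands for checking
# whether a stopped OSD still holds a copy of a PG, and for marking a copy
# complete. OSD ids are placeholders; the data path is an assumed default.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: in dry-run mode just echo the command line,
    # otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

list_pgs_on_osd() {
    # $1 = OSD id; lists every PG present in that OSD's object store.
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$1" --op list-pgs
}

mark_pg_complete() {
    # $1 = OSD id, $2 = pgid; only meaningful if a copy of the PG was found.
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$1" \
        --pgid "$2" --op mark-complete
}
```

For example, list_pgs_on_osd 29 prints the list-pgs command for osd.29, and mark_pg_complete 29 5.38 prints the mark-complete command for pg 5.38 on that OSD.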

Any help would be greatly appreciated.

-TJ Ragan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
