I resolved this situation, and the results are a testament to how amazing Ceph is.
I had the replication factor set to 2 during my data migration to ensure there was enough capacity on the target as I cycled disks from Drobo storage to Ceph storage. I forgot about that setting.

A disk started showing mechanical and inevitable failure, so I started evacuating objects from it via "ceph osd crush reweight". Objects started to fly across OSDs. A bunch of data had been moved and some was left to move when impatient me started tinkering on some other part of the cluster and accidentally "zapped" a disk. Things were still "ok". Then the failing disk finally died and dropped out of the filesystem. I rebooted the node. The xfs filesystem was corrupted, and xfs_repair wiped out inodes and put everything into lost+found. This was a BIG uh-oh. Ceph reacted badly and halted with 46 pg's in active+stuck+stale status, and the filesystem halted. About 800,000 objects were handled by those pg's. Mother eff!

I ended up dropping the two disks completely from the cluster and dropping the pg's that were managing the objects. As for the filesystem, I simply deleted it. Dum dum duuuuum! On a traditional storage system these acts would've been catastrophic to the "volume"! But not for Ceph.

I re-added the good disk to the cluster and added a replacement disk. I recreated the pg's as blank pg's, which got the Ceph cluster healthy. For the filesystem, I rescanned the data objects to rebuild the object-to-inode map and recreate the inodes. That made for a healthy POSIX filesystem, which I mounted right up.

Total loss was a few thousand minor fringe files. My video surveillance archive is 100% intact.

Thanks!
/Chris Callegari

ps... post mortem actions: my pool size got set to 3 since I now have the raw capacity to do so. ;-)

pps... rough sketches of the commands involved are at the bottom, below the quoted message, for the archives.

On Fri, Jun 9, 2017 at 5:11 PM, Mazzystr <mazzy...@gmail.com> wrote:

> Well, I did bad; I just don't know how bad yet. Before we get into it, my
> critical data is backed up to CrashPlan. I'd rather not lose all my
> archive data. Losing some of the data is ok.
>
> I added a bunch of disks to my ceph cluster, so I turned off the cluster
> and dd'd the raw disks around so that the disks and osd's were ordered by
> id on the HBA. I fat-fingered one disk and overwrote it. Another disk
> didn't dd correctly... it seems to have not unmounted correctly, plus it
> has some failures according to smartctl. An xfs_repair run put a whole
> bunch of data into lost+found.
>
> I brought the cluster up and let it settle down. The result is 49 stuck
> pg's, and CephFS is halted.
>
> ceph -s is here <https://pastebin.com/RXSinLjZ>
> ceph osd tree is here <https://pastebin.com/qmE0dhyH>
> ceph pg dump minus the active pg's is here <https://pastebin.com/36kpmA8s>
>
> OSD-2 is gone with no chance to restore it.
>
> OSD-3 had the xfs corruption. After xfs_repair I have a bunch of
> /var/lib/ceph/osd/ceph-3/lost+found/blah/DIR_[0-9]+/blah.blah__head_blah.blah
> files. I for-looped these files through "ceph osd map <pool> $file" and it
> seems they have all been replicated to other OSDs, so it seems safe to
> delete this data.
>
> There are also files named [0-9]+ in the top level of
> /var/lib/ceph/osd/ceph-3/lost+found. I don't know what to do with these
> files.
>
> I have a couple of questions:
>
> 1) Can the top-level lost+found files be used to recreate the stuck pg's?
>
> 2a) Can the pg's be dropped and recreated to bring the cluster to a
> healthy state?
>
> 2b) If I do this, can CephFS be restored with just partial data loss? The
> CephFS documentation isn't quite clear on how to do this.
>
> Thanks for your time and help!
> /Chris
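---

Command sketches for the archives. These are approximate and from memory; OSD ids, pool names, and pg ids are illustrative rather than the exact ones from my cluster.

Draining the failing disk was a CRUSH reweight to 0, which tells Ceph to migrate everything off that OSD while it can still be read:

    # evacuate osd.3 by taking its weight out of the CRUSH map
    ceph osd crush reweight osd.3 0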
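Dropping the two bad disks from the cluster was the usual OSD removal sequence, repeated for each one. Since pg's were still referencing the dead OSD, marking it lost lets them stop waiting for it:

    # tell the cluster the OSD and its data are gone for good
    ceph osd lost 2 --yes-i-really-mean-it

    # remove the OSD from data placement, auth, and the osd map
    ceph osd out osd.2
    ceph osd crush remove osd.2
    ceph auth del osd.2
    ceph osd rm osd.2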
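Recreating the dead pg's as blank pg's is one force_create_pg per pg id (this is the pre-Luminous syntax; later releases moved it to "ceph osd force-create-pg"):

    # recreate a lost pg as an empty one; whatever it held is gone
    ceph pg force_create_pg 1.2f

    # or loop over everything that is stuck stale
    for pg in $(ceph pg dump_stuck stale 2>/dev/null | awk '/^[0-9]/ {print $1}'); do
        ceph pg force_create_pg $pg
    done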
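The filesystem rebuild followed the CephFS disaster-recovery tooling. A sketch, assuming a filesystem named "cephfs" on pools "cephfs_metadata" and "cephfs_data", with the MDS stopped:

    # drop the old filesystem and recreate it on the existing pools
    ceph fs rm cephfs --yes-i-really-mean-it
    ceph fs new cephfs cephfs_metadata cephfs_data --force

    # wipe the old journal and session table; the metadata was toast anyway
    cephfs-journal-tool journal reset
    cephfs-table-tool all reset session

    # walk the data pool to rebuild the object-to-inode map, then the inodes
    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data

The full procedure is in the CephFS disaster recovery docs; read those before running any of it, since the exact steps depend on your release.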
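And the post-mortem change, now that there is raw capacity for three copies:

    # size 3 on both cephfs pools (names illustrative)
    ceph osd pool set cephfs_data size 3
    ceph osd pool set cephfs_metadata size 3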
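For completeness, the "for loop" from my original message above, which checked whether the objects xfs_repair threw into lost+found still existed elsewhere in the cluster, looked roughly like this:

    for f in /var/lib/ceph/osd/ceph-3/lost+found/*/DIR_*/*; do
        # filestore filenames carry a __head_... suffix; strip it to get
        # something close to the object name, then ask where it maps now
        obj=$(basename "$f" | sed 's/__head.*//')
        ceph osd map cephfs_data "$obj"
    done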
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com