I resolved this situation, and the results are a testament to how amazing
Ceph is.

I had the replication factor set to 2 during my data migration to ensure
there was enough capacity on the target as I cycled disks from Drobo
storage to Ceph storage.  I forgot about that setting.  Then one disk
started showing signs of mechanical, and inevitably fatal, failure, so I
started evacuating objects from that disk via a crush reweight of its osd.
Objects started to fly across the osd's.  A bunch of data was moved and
some was left to be moved.  Impatient me started tinkering with some other
part of the cluster and accidentally "zapped" a disk.  Things were still
"ok".  Then the failing disk finally died and dropped out of the
filesystem.  I rebooted the node.  The xfs filesystem was corrupted, and
xfs_repair wiped out inodes and put everything into lost+found.  This was
a BIG uh-oh.  Ceph reacted badly: 46 pg's went into active+stuck+stale
status and the filesystem halted.  About 800,000 objects were handled by
those pg's.  Mother eff!
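
For anyone following along, draining a failing osd before pulling it looks
roughly like this; the osd id is just a placeholder and your weights and
timing will differ:

    # gradually push all placement groups off the failing osd
    ceph osd crush reweight osd.1 0
    # watch backfill/recovery until the osd no longer holds data
    ceph -w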

I ended up dropping the two disks completely from the cluster and dropping
the pg's that were managing the objects.  As for the filesystem, I simply
deleted it.
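
For the record, that was basically the standard manual-removal sequence,
something like the following; the osd ids and filesystem name are
placeholders, and the exact syntax varies a bit by release:

    # remove each dead osd from the cluster (repeated for both disks)
    ceph osd out osd.2
    ceph osd crush remove osd.2
    ceph auth del osd.2
    ceph osd rm osd.2
    # fail the mds and tear down the broken CephFS filesystem
    ceph mds fail 0
    ceph fs rm cephfs --yes-i-really-mean-it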

Dum dum duuuuum!  On a traditional storage system these acts would've been
catastrophic to the "volume"!

But not for Ceph.

I re-added the good disk to the cluster and added a replacement disk.  I
recreated the pg's as blank pg's.  That got the Ceph cluster healthy.  For
the filesystem I rescanned for indexes, which built an object-to-inode map
and recreated the inodes.  That made for a healthy posix filesystem, and I
mounted it up.
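
The recovery side was roughly the following; the pgid and pool name are
placeholders, the pg recreation command is the pre-Luminous syntax, and
the full cephfs-data-scan disaster-recovery procedure has a few more steps
than I'm showing here:

    # recreate each lost/stale pg as an empty pg (repeat per pgid)
    ceph pg force_create_pg 1.2f
    # rebuild CephFS metadata by scanning the objects in the data pool:
    # map objects back to inodes, then recreate the inode entries
    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>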

Total loss was a few thousand minor fringe files.  My video surveillance
archive is 100% intact.

Thanks!
/Chris Callegari

ps... post-mortem actions: my pool size is now set to 3, since I now have
the raw capacity to do so.  ;-)
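
In case anyone wants the one-liner, bumping replication on a pool is just
the following (pool name is a placeholder; min_size is the usual companion
setting):

    ceph osd pool set <pool> size 3
    ceph osd pool set <pool> min_size 2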


On Fri, Jun 9, 2017 at 5:11 PM, Mazzystr <mazzy...@gmail.com> wrote:

> Well I did bad I just don't know how bad yet.  Before we get into it my
> critical data is backed up to CrashPlan.  I'd rather not lose all my
> archive data.  Losing some of the data is ok.
>
> I added a bunch of disks to my ceph cluster so I turned off the cluster
> and dd'd the raw disks around so that the disks and osd's were ordered by
> id's on the HBA.  I fat fingered one disk and overwrote it.  Another disk
> didn't dd correctly... it seems to have not unmounted correctly plus it has
> some failures according to smartctl.  An xfs_repair run put a whole
> bunch of data into lost+found.
>
> I brought the cluster up and let it settle down.  The result is 49 stuck
> pg's and CephFS is halted.
>
> ceph -s is here <https://pastebin.com/RXSinLjZ>
> ceph osd tree is here <https://pastebin.com/qmE0dhyH>
> ceph pg dump minus the active pg's is here <https://pastebin.com/36kpmA8s>
>
> OSD-2 is gone with no chance to restore it.
>
> OSD-3 had the xfs corruption.  I have a bunch of
> /var/lib/ceph/osd/ceph-3/lost+found/blah/DIR_[0-9]+/blah.blah__head_blah.blah
> files after xfs_repair.  I for looped these files through ceph osd map
> <pool> $file and it seems they have all been replicated to other OSD's.
> It seems to be safe to delete this data.
>
> There are files named [0-9]+ in the top level of
> /var/lib/ceph/osd/ceph-3/lost+found.  I don't know what to do with these
> files.
>
>
> I have a couple questions:
> 1) can the top level lost+found files be used to recreate the stuck pg's?
>
> 2a) can the pg's be dropped and recreated to bring the cluster to a
> healthy state?
> 2b) if I do this, can CephFS be restored with just partial data loss?  The
> cephfs documentation isn't quite clear on how to do this.
>
> Thanks for your time and help!
> /Chris
>
>
>