Hello Wido,

Thanks for the advice. While the data center has A/B circuits, redundant
power, etc., if a ground fault happens it apparently travels outside and
takes down the whole building.

The monitors are all identical:
- 2x E5 CPUs
- 64 GB of RAM
- 4x 300 GB 10k RPM SAS drives in RAID 10 (write-through mode)
- Ubuntu 14.04 with the latest updates applied before the power failure
  (2016/Aug/10, 3 AM CST)
- Ceph Hammer LTS 0.94.7

(We are still working on our Jewel test cluster, so an upgrade is planned
but not in place yet.)

The only thing that seems to be corrupt is the monitors' leveldb store. I
see multiple issues on the Google leveldb GitHub from March 2016 about fsync
and power failure, so I assume this is a leveldb issue.
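One quick way to see whether a store shows the typical leveldb power-loss symptom is to check that the CURRENT file inside store.db names a MANIFEST file that actually exists and is non-empty. Below is a minimal Python sketch, assuming the usual mon data layout /var/lib/ceph/mon/<cluster>-<id>/store.db. The function name check_leveldb_current is hypothetical, and a passing check only rules out this one failure mode; it does not prove the store is intact.

```python
import os


def check_leveldb_current(store_dir):
    """Basic sanity check of a leveldb directory's CURRENT -> MANIFEST link.

    Returns (ok, message). Only inspects CURRENT and the MANIFEST it names;
    it does not open or validate the database itself.
    """
    current = os.path.join(store_dir, "CURRENT")
    if not os.path.isfile(current):
        return False, "CURRENT file is missing"
    # errors="replace" so a garbage-filled CURRENT does not raise
    with open(current, errors="replace") as f:
        manifest = f.read().strip()
    if not manifest:
        return False, "CURRENT file is empty"
    if not manifest.startswith("MANIFEST-"):
        return False, f"CURRENT has unexpected content: {manifest!r}"
    path = os.path.join(store_dir, manifest)
    if not os.path.isfile(path):
        return False, f"{manifest} named by CURRENT does not exist"
    if os.path.getsize(path) == 0:
        return False, f"{manifest} is empty"
    return True, f"CURRENT points at {manifest}"
```

Run it against a copy of the store (e.g. from the backup), not the original, so nothing on the monitors is touched.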

I have backed up /var/lib/ceph/mon on all of my monitors before attempting
any form of recovery.
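A backup like this can be scripted per node. Below is a minimal Python sketch, assuming ceph-mon is stopped first so the leveldb files are not changing; backup_mon_store and the example paths are hypothetical, not part of any Ceph tooling.

```python
import os
import tarfile
import time


def backup_mon_store(mon_dir, dest_dir):
    """Archive a monitor data directory into a timestamped .tar.gz.

    Assumes ceph-mon is stopped. Returns the path of the archive written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Name the archive after the mon directory, e.g. "ceph-mon01"
    name = os.path.basename(os.path.normpath(mon_dir))
    archive = os.path.join(dest_dir, f"{name}-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the mon directory name,
        # not the absolute /var/lib/ceph/... path; add() recurses
        tar.add(mon_dir, arcname=name)
    return archive
```

For example, backup_mon_store("/var/lib/ceph/mon/ceph-mon01", "/root/mon-backups") would write /root/mon-backups/ceph-mon01-<timestamp>.tar.gz. A tar archive keeps file modes and ownership, which matter if the store is ever restored in place.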

Is there any way to reconstruct the leveldb store, or to replace the
monitors and recover the data?

I found the following post in which Sage says it is tedious but possible
(http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine if
I have any chance of doing it. I have the fsid and the mon key map, and all
of the OSDs look fine, so all of the previous OSD maps are there.

I just don't understand which keys/values need to go into the store.

On Aug 11, 2016 1:33 AM, "Wido den Hollander" <w...@42on.com> wrote:

>
> > On 11 August 2016 at 0:10, Sean Sullivan <seapasu...@uchicago.edu> wrote:
> >
> >
> > I think it just got worse:
> >
> > All three monitors on my other cluster say that ceph-mon can't open
> > /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose
> > all 3 monitors? I saw a post by Sage saying that the data can be
> > recovered, as all of the data is held on other servers. Is this
> > possible? If so, has anyone had any experience doing so?
>
> I have never done so, so I couldn't tell you.
>
> However, it is weird that on all three it got corrupted. What hardware are
> you using? Was it properly protected against power failure?
>
> If your mon store is corrupted, I'm not sure what might happen.
>
> However, make a backup of ALL monitors right now before doing anything.
>
> Wido
>
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>