Re: [ceph-users] mon leveldb loss

2015-01-30 Thread Sebastien Han
Hi Mike,

Sorry to hear that, I hope this can help you to recover your RBD images:
http://www.sebastien-han.fr/blog/2015/01/29/ceph-recover-a-rbd-image-from-a-dead-cluster/

Since you don’t have your monitors anymore, you can still walk through the OSD
data dirs and look for the RBD identifiers.
Something like this might help:

 sudo find /var/lib/ceph/osd/ -type f -name 'rbd*data.*' | cut -d'.' -f 3 | sort | uniq
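
If that gives you the image identifiers, the next step from the blog post above
is to stitch the 4 MB objects back into a raw image. A rough sketch, not tested
here - the prefix is a hypothetical placeholder, the 10G image size and the
default 4 MB object size are assumptions to check against your cluster, and
replication means the same object will be found (and harmlessly rewritten)
several times:

 PREFIX=1234567890ab   # hypothetical identifier found with the command above
 # create a sparse destination file the size of the original image (10G here, adjust)
 dd if=/dev/zero of=image.raw bs=1 count=0 seek=10G
 sudo find /var/lib/ceph/osd/ -type f -name "rbd*data.${PREFIX}.*" |
 while read -r obj; do
     # the object index is the hex field in the file name, e.g.
     # rbd\udata.<prefix>.000000000000002a__head_...
     idx=$(basename "$obj" | cut -d'.' -f 3 | cut -d'_' -f 1)
     # write each 4 MB object at its offset in the image
     sudo dd conv=notrunc if="$obj" of=image.raw bs=4M seek=$((16#$idx))
 done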

Hope it helps.


> On 29 Jan 2015, at 21:36, Mike Winfield wrote:
> 
> Hi, I'm hoping desperately that someone can help. I have a critical issue 
> with a tiny 'cluster'...
> 
> There was a power glitch earlier today (not an outage, might have been a 
> brownout, some things went down, others didn't) and I came home to a CPU 
> machine check exception on the single host on which I keep a trio of ceph 
> monitors. No option but to hard reset. When the system came back up, the 
> monitors didn't.
> 
> Each mon is reporting possible corruption of its leveldb store: files are 
> missing; one might surmise an fsck decided to discard them. See the attached 
> txt files for the ceph-mon output and corresponding store.db directory listings.
> 
> Is there any way to recover the leveldb for the monitors? I am more than 
> capable and willing to dig into the structure of these files - or any similar 
> measures - if necessary. Perhaps I could piece together a complete picture 
> from the data files that are available?
> 
> I do have a relevant backup of the monitor data but it is now three months 
> old. I would prefer not to have to resort to this if there is any chance of 
> recovering monitor operability by other means.
> 
> Also, what would the consequences be of restoring such a backup when the 
> (12TB worth of) OSDs are perfectly fine and contain the latest PG 
> associations? Would there be a risk of data loss?
> 
> Unfortunately I don't have any backups of the actual user data (being poor, 
> scraping along on a shoestring budget, not exactly conducive to anything 
> approaching an ideal hardware setup), unless one counts a set of old disks 
> from a previously failed cluster six months ago.
> 
> My last recourse will likely be to try to scavenge and piece together my most 
> important files from whatever I find on the OSDs. Far from an exciting 
> prospect, but I am seriously desperate.
> 
> I would be terribly grateful for any input.
> 
> Mike
> 


Cheers.

Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mon leveldb loss

2015-01-29 Thread Mike Winfield
Hi, I'm hoping desperately that someone can help. I have a critical issue
with a tiny 'cluster'...

There was a power glitch earlier today (not an outage, might have been a
brownout, some things went down, others didn't) and I came home to a CPU
machine check exception on the single host on which I keep a trio of ceph
monitors. No option but to hard reset. When the system came back up, the
monitors didn't.

Each mon is reporting possible corruption of its leveldb store: files are
missing; one might surmise an fsck decided to discard them. See the attached
txt files for the ceph-mon output and corresponding store.db directory
listings.

Is there any way to recover the leveldb for the monitors? I am more than
capable and willing to dig into the structure of these files - or any
similar measures - if necessary. Perhaps I could piece together a complete
picture from the data files that are available?
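
For instance, leveldb apparently ships a repair routine that rebuilds the
MANIFEST from whatever .ldb files survive, so I was considering something
along these lines (assuming the plyvel Python bindings are available, and
copying each store.db aside first, since repair rewrites the store in place
and can only recover what is physically left):

 # keep an untouched copy before attempting anything
 cp -a /var/lib/ceph/mon/unimatrix-0/store.db /root/store.db.unimatrix-0.bak
 # ask leveldb to rebuild the MANIFEST from the surviving .ldb files
 python -c "import plyvel; plyvel.repair_db('/var/lib/ceph/mon/unimatrix-0/store.db')"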

I do have a relevant backup of the monitor data but it is now three months
old. I would prefer not to have to resort to this if there is any chance of
recovering monitor operability by other means.

Also, what would the consequences be of restoring such a backup when the
(12TB worth of) OSDs are perfectly fine and contain the latest PG
associations? Would there be a risk of data loss?

Unfortunately I don't have any backups of the actual user data (being poor,
scraping along on a shoestring budget, not exactly conducive to anything
approaching an ideal hardware setup), unless one counts a set of old disks
from a previously failed cluster six months ago.

My last recourse will likely be to try to scavenge and piece together my
most important files from whatever I find on the OSDs. Far from an
exciting prospect, but I am seriously desperate.

I would be terribly grateful for any input.

Mike
2015-01-29 19:49:30.590913 7fa66458d7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store
2015-01-29 19:49:43.279940 7f03e8ec87c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store
2015-01-29 19:49:47.866736 7fb6aeebe7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store
mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root       57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root        0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root       80 Aug 24 14:59 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root       57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root        0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:09 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root        0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:03 ..
-rw-r--r-- 1 root root       57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root     4254 Jan 29 14:23 MANIFEST-399