Re: [ceph-users] mon leveldb loss
Hi Mike,

Sorry to hear that. I hope this can help you recover your RBD images:
http://www.sebastien-han.fr/blog/2015/01/29/ceph-recover-a-rbd-image-from-a-dead-cluster/

Since you don't have your monitors, you can still walk through the OSD data dirs and look for the RBD identifiers. Something like this might help (the pattern is quoted so the shell doesn't expand it before find sees it):

sudo find /var/lib/ceph/osd/ -type f -name 'rbd*data.*' | cut -d'.' -f 3 | sort | uniq

Hope it helps.

> On 29 Jan 2015, at 21:36, Mike Winfield wrote:
>
> Hi, I'm hoping desperately that someone can help. I have a critical issue
> with a tiny 'cluster'...
>
> There was a power glitch earlier today (not an outage, might have been a
> brownout; some things went down, others didn't) and I came home to a CPU
> machine check exception on the singular host on which I keep a trio of ceph
> monitors. No option but to hard reset. When the system came back up, the
> monitors didn't.
>
> Each mon is reporting possible corruption of its leveldb store; files are
> missing, and one might surmise an fsck decided to discard them. See attached
> txt files for the ceph-mon output and corresponding store.db directory
> listings.
>
> Is there any way to recover the leveldb for the monitors? I am more than
> capable and willing to dig into the structure of these files - or any similar
> measures - if necessary. Perhaps correlate a complete picture between the
> data files that are available?
>
> I do have a relevant backup of the monitor data, but it is now three months
> old. I would prefer not to resort to this if there is any chance of
> recovering monitor operability by other means.
>
> Also, what would the consequences be of restoring such a backup when the
> (12TB worth of) OSDs are perfectly fine and contain the latest up-to-date
> pg associations? Would there be a risk of data loss?
>
> Unfortunately I don't have any backups of the actual user data (being poor,
> scraping along on a shoestring budget, not exactly conducive to anything
> approaching an ideal hardware setup), unless one counts a set of old disks
> from a previously failed cluster from six months ago.
>
> My last recourse will likely be to try to scavenge and piece together my
> most important files from whatever I find on the OSDs. Far from an exciting
> prospect, but I am seriously desperate.
>
> I would be terribly grateful for any input.
>
> Mike
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Cheers.
Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address: 11 bis, rue Roquépine - 75008 Paris
Web: www.enovance.com - Twitter: @enovance
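For anyone following the linked recovery method: each RBD data object's name ends in a hex index, and (with the default 4 MiB object size) that index times the object size gives the byte offset of the object's contents within the original image. A minimal sketch of the arithmetic, assuming default-order images; the hex index below is a hypothetical example, not taken from Mike's cluster:

```shell
# Sketch of the offset arithmetic behind the linked recovery approach.
# Assumes the default 4 MiB RBD object size ("order" 22); the hex index
# is a made-up example.
object_size=$((4 * 1024 * 1024))

hex_index=0000000000000010           # trailing hex suffix of an object name
offset=$(( 16#$hex_index * object_size ))
echo "object data starts at byte offset $offset in the image"
```

Writing each object's file back into a sparse output file at its computed offset (for example with dd, seeking in object-size blocks and using conv=notrunc) reassembles the image; the blog post above walks through the full procedure.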
[ceph-users] mon leveldb loss
Hi, I'm hoping desperately that someone can help. I have a critical issue with a tiny 'cluster'...

There was a power glitch earlier today (not an outage, might have been a brownout; some things went down, others didn't) and I came home to a CPU machine check exception on the singular host on which I keep a trio of ceph monitors. No option but to hard reset. When the system came back up, the monitors didn't.

Each mon is reporting possible corruption of its leveldb store; files are missing, and one might surmise an fsck decided to discard them. See attached txt files for the ceph-mon output and corresponding store.db directory listings.

Is there any way to recover the leveldb for the monitors? I am more than capable and willing to dig into the structure of these files - or any similar measures - if necessary. Perhaps correlate a complete picture between the data files that are available?

I do have a relevant backup of the monitor data, but it is now three months old. I would prefer not to resort to this if there is any chance of recovering monitor operability by other means.

Also, what would the consequences be of restoring such a backup when the (12TB worth of) OSDs are perfectly fine and contain the latest up-to-date pg associations? Would there be a risk of data loss?

Unfortunately I don't have any backups of the actual user data (being poor, scraping along on a shoestring budget, not exactly conducive to anything approaching an ideal hardware setup), unless one counts a set of old disks from a previously failed cluster from six months ago.

My last recourse will likely be to try to scavenge and piece together my most important files from whatever I find on the OSDs. Far from an exciting prospect, but I am seriously desperate.

I would be terribly grateful for any input.
Mike

2015-01-29 19:49:30.590913 7fa66458d7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store

2015-01-29 19:49:43.279940 7f03e8ec87c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store

2015-01-29 19:49:47.866736 7fb6aeebe7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store

mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root       57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root        0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root       80 Aug 24 14:59 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root       57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root        0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:09 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root        0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:03 ..
-rw-r--r-- 1 root root       57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root     4254 Jan 29 14:23 MANIFEST-399