Sorry to take so long replying... I ended up evacuating the data and rebuilding on Luminous with BlueStore OSDs. I still need to run my usual drive/host failure testing before going live, and of course other things are burning right now and have my attention. Hopefully I can finish that work in the next few days.
On Wed, Jun 28, 2017 at 10:11 PM, Mazzystr <mazzy...@gmail.com> wrote:
> I should be able to try that tomorrow.
>
> I'll report back in afterward
>
> On Wed, Jun 28, 2017 at 10:09 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>>
>> On Thu, Jun 29, 2017 at 11:58 AM, Mazzystr <mazzy...@gmail.com> wrote:
>> > just one MON
>>
>> Try just replacing that MON then?
>>
>> > On Wed, Jun 28, 2017 at 8:05 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>> >>
>> >> On Wed, Jun 28, 2017 at 10:18 PM, Mazzystr <mazzy...@gmail.com> wrote:
>> >> > The corruption is back in the mons logs...
>> >> >
>> >> > 2017-06-28 08:16:53.078495 7f1a0b9da700  1 leveldb: Compaction error: Corruption: bad entry in block
>> >> > 2017-06-28 08:16:53.078499 7f1a0b9da700  1 leveldb: Waiting after background compaction error: Corruption: bad entry in block
>> >>
>> >> Is this just one MON, or is it in the logs of all of your MONs?
>> >>
>> >> > On Tue, Jun 27, 2017 at 10:42 PM, Mazzystr <mazzy...@gmail.com> wrote:
>> >> >>
>> >> >> 22:16 ccallegar: good grief... talk about a handful of sand in your eye! I've been chasing down a "leveldb: Compaction error: Corruption: bad entry in block" in the mons logs...
>> >> >> 22:17 ccallegar: I ran a python leveldb.repair() and restarted the osds and mons, and my cluster crashed and burned
>> >> >> 22:18 ccallegar: a couple of files ended up in leveldb lost dirs. The path is different depending on whether it's a mon or an osd
>> >> >> 22:19 ccallegar: for the mons, logs showed a MANIFEST file missing. I moved the file that landed in lost back to its normal position, chown'd ceph:ceph, restarted the mons, and the mons came back online!
>> >> >> 22:21 ccallegar: osd logs showed an sst file missing. It looks like leveldb.repair() does the needful but names the new file .ldb. I renamed the file, chown'd ceph:ceph, restarted the osds, and they came back online!
>> >> >>
>> >> >> The leveldb corruption log entries have gone away and my cluster is recovering its way to happiness.
>> >> >>
>> >> >> Hopefully this helps someone else out
>> >> >>
>> >> >> Thanks,
>> >> >> /Chris
>> >> >>
>> >> >> On Tue, Jun 27, 2017 at 6:39 PM, Mazzystr <mazzy...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi Ceph Users,
>> >> >>> I've been chasing down some levelDB corruption messages in my mons logs. I ran a python leveldb repair on the mon and osd leveldbs. The job caused files to disappear and a log file to appear in the lost directory. The mons and osds refuse to boot.
>> >> >>>
>> >> >>> Ceph version is kraken 11.02.
>> >> >>>
>> >> >>> There's not a whole lot of info on the internet regarding this. Anyone have any ideas on how to recover the mess?
>> >> >>>
>> >> >>> Thanks,
>> >> >>> /Chris C
>> >> >
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >> --
>> >> Cheers,
>> >> Brad
>>
>> --
>> Cheers,
>> Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com