Re: [ceph-users] mon: leveldb checksum mismatch

2014-07-03 Thread Jason Harley
Hi Joao,

On Jul 3, 2014, at 7:57 PM, Joao Eduardo Luis  wrote:

> We don't have a way to repair leveldb.  Having multiple monitors usually helps 
> with such tricky situations.

I know this, but for this small dev cluster I wasn’t thinking about corruption 
of my mon’s backing store.  Silly me :)

> 
> According to this [1] the python bindings you're using may not be linked against 
> snappy, which we were using (mistakenly until recently) to compress data as 
> it goes into leveldb.  Not having those snappy bindings may be what's causing 
> all those files to be moved to lost instead.

I found the same posting, and confirmed that the ‘leveldb.so’ that ships with 
the ‘python-leveldb’ package on Ubuntu 13.10 links against ‘snappy’.
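
For anyone who wants to repeat that linkage check, here is a minimal sketch. It assumes a Linux host with `ldd` on the PATH. The helper names `links_against` and `module_links_against` are made up for illustration and are not part of any Ceph or leveldb tooling.

```python
import subprocess


def links_against(ldd_output, library):
    """Return True if any dependency line in ldd output names the library."""
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field of an ldd line is the dependency name, e.g. libsnappy.so.1
        if fields and fields[0].startswith(library):
            return True
    return False


def module_links_against(module_path, library="libsnappy"):
    """Run ldd on a shared object and look for the library among its dependencies."""
    output = subprocess.check_output(["ldd", module_path]).decode()
    return links_against(output, library)
```

With the bindings installed, something like `import leveldb; module_links_against(leveldb.__file__)` should report whether that particular build was linked against snappy.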

> The suggestion that the thread in [1] offers is to have the repair 
> functionality directly in the 'application' itself.  We could do this by 
> adding a repair option to ceph-kvstore-tool -- which could help.
> 
> I'll be happy to get that into ceph-kvstore-tool tomorrow and push a branch 
> for you to compile and test.

I would be more than happy to try this out.  Without fixing these checksums, I 
think I’m reinitializing my cluster. :\

Thank you,
./JRH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon: leveldb checksum mismatch

2014-07-03 Thread Joao Eduardo Luis

On 07/04/2014 12:29 AM, Jason Harley wrote:

Hi list —

I’ve got a small dev. cluster: 3 OSD nodes with 6 disks/OSDs each and a single 
monitor (this, it seems, was my mistake).  The monitor node went down hard and 
it looks like the monitor’s db is in a funny state.  Running ‘ceph-mon’ 
manually with ‘debug_mon 20’ and ‘debug_ms 20’ gave the following:


/usr/bin/ceph-mon -i monhost --mon-data /var/lib/ceph/mon/ceph-monhost 
--debug_mon 20 --debug_ms 20 -d
2014-07-03 23:20:55.800512 7f973918e7c0  0 ceph version 0.67.7 
(d7ab4244396b57aac8b7e80812115bbd079e6b73), process ceph-mon, pid 24930
Corruption: checksum mismatch
Corruption: checksum mismatch
2014-07-03 23:20:56.455797 7f973918e7c0 -1 failed to create new leveldb store


I attempted to make use of the leveldb Python library’s ‘RepairDB’ function, 
which just moves enough files into ‘lost’ that when running the monitor again 
I’m asked if I ran mkcephfs.

Any insight into resolving these two checksum mismatches so I can access my OSD 
data would be greatly appreciated.

Thanks,
./JRH

p.s. I’m assuming that without the maps from the monitor, my OSD data is 
unrecoverable also.


Hello Jason,

We don't have a way to repair leveldb.  Having multiple monitors usually 
helps with such tricky situations.


According to this [1] the python bindings you're using may not be linked 
against
snappy, which we were using (mistakenly until recently) to compress 
data as it goes into leveldb.  Not having those snappy bindings may be 
what's causing all those files to be moved to lost instead.


The suggestion that the thread in [1] offers is to have the repair 
functionality directly in the 'application' itself.  We could do this by 
adding a repair option to ceph-kvstore-tool -- which could help.


I'll be happy to get that into ceph-kvstore-tool tomorrow and push a 
branch for you to compile and test.


  -Joao


[1] - https://groups.google.com/forum/#!topic/leveldb/YvszWNio2-Q

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


[ceph-users] mon: leveldb checksum mismatch

2014-07-03 Thread Jason Harley
Hi list —

I’ve got a small dev. cluster: 3 OSD nodes with 6 disks/OSDs each and a single 
monitor (this, it seems, was my mistake).  The monitor node went down hard and 
it looks like the monitor’s db is in a funny state.  Running ‘ceph-mon’ 
manually with ‘debug_mon 20’ and ‘debug_ms 20’ gave the following:

> /usr/bin/ceph-mon -i monhost --mon-data /var/lib/ceph/mon/ceph-monhost 
> --debug_mon 20 --debug_ms 20 -d
> 2014-07-03 23:20:55.800512 7f973918e7c0  0 ceph version 0.67.7 
> (d7ab4244396b57aac8b7e80812115bbd079e6b73), process ceph-mon, pid 24930
> Corruption: checksum mismatch
> Corruption: checksum mismatch
> 2014-07-03 23:20:56.455797 7f973918e7c0 -1 failed to create new leveldb store

I attempted to make use of the leveldb Python library’s ‘RepairDB’ function, 
which just moves enough files into ‘lost’ that when running the monitor again 
I’m asked if I ran mkcephfs.
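
Because a repair pass moves files into ‘lost’, it seems safest to copy the store aside before trying one. The following is only a sketch of that idea. `backup_store` and `repair_store` are hypothetical helper names, and the call to `leveldb.RepairDB` assumes the py-leveldb bindings are installed.

```python
import os
import shutil
import time


def backup_store(store_path, backup_root):
    """Copy the leveldb directory to a timestamped folder and return its path."""
    if not os.path.isdir(store_path):
        raise ValueError("not a directory: %s" % store_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(store_path))
    dest = os.path.join(backup_root, "%s.%s" % (name, stamp))
    # copytree creates dest (and any missing parents) and fails if dest exists
    shutil.copytree(store_path, dest)
    return dest


def repair_store(store_path, backup_root):
    """Back up first, then hand the original directory to RepairDB."""
    # Imported lazily so the backup helper is usable without the bindings
    import leveldb
    dest = backup_store(store_path, backup_root)
    leveldb.RepairDB(store_path)
    return dest
```

Usage would look like `repair_store(path_to_store, "/root/mon-backup")` with the monitor stopped, where `path_to_store` is the leveldb directory under the mon data path (assumed here to be its `store.db` subdirectory) and the backup location is arbitrary. If the repair makes things worse, the untouched copy is still available.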

Any insight into resolving these two checksum mismatches so I can access my OSD 
data would be greatly appreciated.

Thanks,
./JRH

p.s. I’m assuming that without the maps from the monitor, my OSD data is 
unrecoverable also.

  