I'm upgrading from Giant to Hammer (0.94.1), and I'm seeing a ton of OSDs
die (and stay dead) with this error in the logs:

2015-04-19 11:53:36.796847 7f61fa900900 -1 osd/OSD.h: In function
'OSDMapRef OSDService::get_map(epoch_t)' thread 7f61fa900900 time
2015-04-19 11:53:36.794951
osd/OSD.h: 716: FAILED assert(ret)

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xbc271b]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x70923f]
 3: (OSD::load_pgs()+0x1769) [0x6c35d9]
 4: (OSD::init()+0x71f) [0x6c4c7f]
 5: (main()+0x2860) [0x651fc0]
 6: (__libc_start_main()+0xf5) [0x7f61f7a3fec5]
 7: /usr/bin/ceph-osd() [0x66aff7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
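
For what it's worth, here's my reading of the trace: load_pgs() asks get_map() for the osdmap epoch each PG was last at, and get_map() asserts that the epoch could actually be loaded from the OSD's local store. The following is a tiny self-contained stand-in I put together to illustrate that pattern. It's not Ceph's actual code, just my guess at what the assert means; the OSDMap struct, try_get_map(), and the missing-epoch scenario are all my own stand-ins:

  // Illustrative stand-in, not Ceph's code: an OSD keeps a set of osdmaps
  // on disk, and load_pgs() asks get_map() for the epoch each PG recorded.
  // If that epoch can't be found, try_get_map() returns null and the
  // assert(ret) aborts, matching "osd/OSD.h: 716: FAILED assert(ret)".
  #include <cassert>
  #include <map>
  #include <memory>

  struct OSDMap { unsigned epoch; };
  using OSDMapRef = std::shared_ptr<const OSDMap>;

  // Pretend store of the osdmap epochs this OSD still has on disk.
  std::map<unsigned, OSDMapRef> local_maps;

  OSDMapRef try_get_map(unsigned e)
  {
    auto it = local_maps.find(e);
    if (it == local_maps.end())
      return nullptr;          // epoch missing from the local store
    return it->second;
  }

  OSDMapRef get_map(unsigned e)
  {
    OSDMapRef ret = try_get_map(e);
    assert(ret);               // the check that fails in my logs
    return ret;
  }

  int main()
  {
    local_maps[100] = std::make_shared<OSDMap>(OSDMap{100});
    get_map(100);              // fine: epoch 100 is present
    get_map(101);              // aborts the same way the OSD does at startup
  }

If that reading is right, each affected OSD has at least one PG pointing at an osdmap epoch that is no longer present in its local store, which would explain why the assert fires during load_pgs() at startup.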

This is a small cluster: ~40 OSDs across 5 servers running Ubuntu 14.04. So
far, every server I've upgraded has had at least one OSD fail to restart
with this error, and one server has had several OSDs stuck in this state.

Restarting the OSD after it dies with this error doesn't help.

Thanks to my slow rollout I haven't lost any data over this, but it's
really annoying.

Here are two full logs from OSDs on two different machines:

https://dl.dropboxusercontent.com/u/104949139/ceph-osd.25.log
https://dl.dropboxusercontent.com/u/104949139/ceph-osd.34.log

Any suggestions?


Scott