Did you upgrade from 0.92? If you did, did you flush the logs before upgrading?
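For anyone hitting this mid-rollout, it's worth confirming exactly which version each OSD is actually running and what the cluster's current map epoch is before restarting anything further. A rough sketch using standard ceph CLI commands (assumes an admin keyring on the node you run this from):

```shell
# Report the version each running OSD daemon is built from,
# so partially-upgraded daemons stand out
ceph tell osd.\* version

# Print the cluster's current osdmap epoch (first line of the dump)
ceph osd dump | head -1
```

If some OSDs still report the pre-upgrade version, finish or roll back those hosts one at a time rather than restarting the crashed daemons repeatedly.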
On Sun, Apr 19, 2015 at 1:02 PM, Scott Laird <sc...@sigkill.org> wrote:
> I'm upgrading from Giant to Hammer (0.94.1), and I'm seeing a ton of OSDs
> die (and stay dead) with this error in the logs:
>
> 2015-04-19 11:53:36.796847 7f61fa900900 -1 osd/OSD.h: In function
> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f61fa900900 time
> 2015-04-19 11:53:36.794951
> osd/OSD.h: 716: FAILED assert(ret)
>
> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0xbc271b]
> 2: (OSDService::get_map(unsigned int)+0x3f) [0x70923f]
> 3: (OSD::load_pgs()+0x1769) [0x6c35d9]
> 4: (OSD::init()+0x71f) [0x6c4c7f]
> 5: (main()+0x2860) [0x651fc0]
> 6: (__libc_start_main()+0xf5) [0x7f61f7a3fec5]
> 7: /usr/bin/ceph-osd() [0x66aff7]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> This is on a small cluster, with ~40 OSDs on 5 servers running Ubuntu
> 14.04. So far, every single server that I've upgraded has had at least one
> disk that has failed to restart with this error, and one has had several
> disks in this state.
>
> Restarting the OSD after it dies with this doesn't help.
>
> I haven't lost any data through this due to my slow rollout, but it's
> really annoying.
>
> Here are two full logs from OSDs on two different machines:
>
> https://dl.dropboxusercontent.com/u/104949139/ceph-osd.25.log
> https://dl.dropboxusercontent.com/u/104949139/ceph-osd.34.log
>
> Any suggestions?
>
>
> Scott
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com