Did you upgrade from 0.92? If you did, did you flush the logs before
upgrading?

On Sun, Apr 19, 2015 at 1:02 PM, Scott Laird <sc...@sigkill.org> wrote:

> I'm upgrading from Giant to Hammer (0.94.1), and I'm seeing a ton of OSDs
> die (and stay dead) with this error in the logs:
>
> 2015-04-19 11:53:36.796847 7f61fa900900 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f61fa900900 time 2015-04-19 11:53:36.794951
> osd/OSD.h: 716: FAILED assert(ret)
>
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
>  2: (OSDService::get_map(unsigned int)+0x3f) [0x70923f]
>  3: (OSD::load_pgs()+0x1769) [0x6c35d9]
>  4: (OSD::init()+0x71f) [0x6c4c7f]
>  5: (main()+0x2860) [0x651fc0]
>  6: (__libc_start_main()+0xf5) [0x7f61f7a3fec5]
>  7: /usr/bin/ceph-osd() [0x66aff7]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> This is on a small cluster, with ~40 OSDs on 5 servers running Ubuntu
> 14.04.  So far, every single server that I've upgraded has had at least one
> disk that has failed to restart with this error, and one has had several
> disks in this state.
>
> Restarting the OSD after it dies with this doesn't help.
>
> I haven't lost any data through this due to my slow rollout, but it's
> really annoying.
>
> Here are two full logs from OSDs on two different machines:
>
> https://dl.dropboxusercontent.com/u/104949139/ceph-osd.25.log
> https://dl.dropboxusercontent.com/u/104949139/ceph-osd.34.log
>
> Any suggestions?
>
>
> Scott
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
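
Because only some OSDs on each upgraded host hit this, it may help to inventory exactly which ones did before comparing notes. The script below is a hypothetical helper, not part of Ceph. It assumes the logs live under /var/log/ceph with the default ceph-osd.<id>.log naming (matching the two logs linked above), and it only looks for the text of this particular assert, so it will not catch other failure modes.

```python
#!/usr/bin/env python3
"""Hypothetical helper (not part of Ceph): list OSD ids whose log
contains the OSDService::get_map assertion quoted above."""
import re
import sys
from pathlib import Path

# Assert line copied from the crash in this thread.
ASSERT_RE = re.compile(r"osd/OSD\.h: \d+: FAILED assert\(ret\)")
# Assumes the default "ceph-osd.<id>.log" log file naming.
NAME_RE = re.compile(r"^ceph-osd\.(\d+)\.log$")


def find_get_map_crashes(log_dir):
    """Return sorted OSD ids whose log shows the failed get_map assert."""
    hits = []
    for path in Path(log_dir).glob("ceph-osd.*.log"):
        m = NAME_RE.match(path.name)
        if not m:
            continue  # skip rotated or oddly named files
        text = path.read_text(errors="replace")
        # Require both the assert text and the function name, since
        # OSD.h contains other assert(ret) checks.
        if ASSERT_RE.search(text) and "OSDService::get_map" in text:
            hits.append(int(m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph"
    for osd_id in find_get_map_crashes(log_dir):
        print(f"osd.{osd_id}")
```

Run it on each host (optionally passing a different log directory as the first argument) and it prints one osd.<id> line per affected OSD.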