Can you post more of the log? There should be a line towards the bottom indicating the line with the failed assert. Can you also attach ceph pg dump, ceph osd dump, ceph osd tree? -Sam
On Mon, Aug 12, 2013 at 11:54 AM, John Wilkins <john.wilk...@inktank.com>wrote: > Stephane, > > You should post any crash bugs with stack trace to ceph-devel > ceph-de...@vger.kernel.org. > > > On Mon, Aug 12, 2013 at 9:02 AM, Stephane Boisvert < > stephane.boisv...@gameloft.com> wrote: > >> Hi, >> It seems my OSD processes keep crashing randomly and I don't know >> why. It seems to happens when the cluster is trying to re-balance... In >> normal usange I didn't notice any crash like that. >> >> We running ceph 0.61.7 on an up to date ubuntu 12.04 (all packages >> including kernel are current). >> >> >> Anyone have an idea ? >> >> >> TRACE: >> >> >> ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff) >> 1: /usr/bin/ceph-osd() [0x79219a] >> 2: (()+0xfcb0) [0x7fd692da1cb0] >> 3: (gsignal()+0x35) [0x7fd69155a425] >> 4: (abort()+0x17b) [0x7fd69155db8b] >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d] >> 6: (()+0xb5846) [0x7fd691eaa846] >> 7: (()+0xb5873) [0x7fd691eaa873] >> 8: (()+0xb596e) [0x7fd691eaa96e] >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x1df) [0x84303f] >> 10: >> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state<PG::RecoveryState::Recovered, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f] >> 11: (boost::statechart::state<PG::RecoveryState::Recovered, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<PG::RecoveryState::Active> >> const&, >> boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c] >> 12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered >> const&)+0xb4) [0x6d9454] >> 13: (boost::statechart::simple_state<PG::RecoveryState::Recovering, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base >> const&, void const*)+0xda) [0x6f296a] >> 14: >> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base >> const&)+0x5b) [0x6e320b] >> 15: >> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base >> const&)+0x11) [0x6e34e1] >> 16: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, >> PG::RecoveryCtx*)+0x347) [0x69aaf7] >> 17: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > >> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5] >> 18: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > >> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2] >> 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476] >> 20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0] >> 21: (()+0x7e9a) [0x7fd692d99e9a] >> 22: (clone()+0x6d) [0x7fd691617ccd] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed >> to interpret this. >> >> --- begin dump of recent events --- >> -3> 2013-08-12 15:58:15.561005 7fd683d78700 1 -- >> 10.136.48.18:6814/21240 <== osd.56 10.136.48.14:0/17437 44 ==== >> osd_ping(ping e8959 stamp 2013-08-12 15:58:15.556022) v2 ==== 47+0+0 >> (355096560 0 0) 0xc4e81c0 con 0x12fbeb00 >> -2> 2013-08-12 15:58:15.561038 7fd683d78700 1 -- >> 10.136.48.18:6814/21240 --> 10.136.48.14:0/17437 -- osd_ping(ping_reply >> e8959 stamp 2013-08-12 15:58:15.556022) v2 -- ?+0 0x1683ec40 con 0x12fbeb00 >> -1> 2013-08-12 15:58:15.568600 7fd67e56d700 1 -- >> 10.136.48.18:6813/21240 --> osd.44 10.136.48.15:6820/25671 -- >> osd_sub_op(osd.20.0:1293 25.328 >> 699ac328/rbd_data.ae2732ae8944a.0000000000240828/head//25 [push] v 8424'11 >> snapset=0=[]:[] snapc=0=[]) v7 -- ?+0 0x2df0f400 >> 0> 2013-08-12 15:58:15.581608 7fd681d74700 -1 *** Caught signal >> (Aborted) ** >> in thread 7fd681d74700 >> >> ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff) >> 1: /usr/bin/ceph-osd() [0x79219a] >> 2: (()+0xfcb0) [0x7fd692da1cb0] >> 3: (gsignal()+0x35) [0x7fd69155a425] >> 4: (abort()+0x17b) [0x7fd69155db8b] >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d] >> 6: (()+0xb5846) [0x7fd691eaa846] >> 7: (()+0xb5873) [0x7fd691eaa873] >> 8: (()+0xb596e) [0x7fd691eaa96e] >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x1df) [0x84303f] >> 10: >> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state<PG::RecoveryState::Recovered, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f] >> 11: (boost::statechart::state<PG::RecoveryState::Recovered, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<PG::RecoveryState::Active> >> const&, >> boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c] >> 12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered >> const&)+0xb4) [0x6d9454] >> 13: (boost::statechart::simple_state<PG::RecoveryState::Recovering, >> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, >> mpl_::na, mpl_::na, mpl_::na>, >> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base >> const&, void const*)+0xda) [0x6f296a] >> 14: >> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base >> const&)+0x5b) [0x6e320b] >> 15: >> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, >> PG::RecoveryState::Initial, std::allocator<void>, >> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base >> const&)+0x11) [0x6e34e1] >> 16: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, >> PG::RecoveryCtx*)+0x347) [0x69aaf7] >> 17: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > >> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5] >> 18: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > >> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2] >> 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476] >> 20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0] >> 21: (()+0x7e9a) [0x7fd692d99e9a] >> 22: (clone()+0x6d) [0x7fd691617ccd] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed >> to interpret this. >> >> --- logging levels --- >> 0/ 5 none >> 0/ 1 lockdep >> 0/ 1 context >> 1/ 1 crush >> 1/ 5 mds >> 1/ 5 mds_balancer >> 1/ 5 mds_locker >> 1/ 5 mds_log >> 1/ 5 mds_log_expire >> 1/ 5 mds_migrator >> 0/ 1 buffer >> 0/ 1 timer >> 0/ 1 filer >> 0/ 1 striper >> 0/ 1 objecter >> 0/ 5 rados >> 0/ 5 rbd >> 0/ 5 journaler >> 0/ 5 objectcacher >> 0/ 5 client >> 0/ 5 osd >> 0/ 5 optracker >> 0/ 5 objclass >> 1/ 3 filestore >> 1/ 3 journal >> 0/ 5 ms >> 1/ 5 mon >> 0/10 monc >> 0/ 5 paxos >> 0/ 5 tp >> 1/ 5 auth >> 1/ 5 crypto >> 1/ 1 finisher >> 1/ 5 heartbeatmap >> 1/ 5 perfcounter >> 1/ 5 rgw >> 1/ 5 hadoop >> 1/ 5 javaclient >> 1/ 5 asok >> 1/ 1 throttle >> -2/-2 (syslog threshold) >> -1/-1 (stderr threshold) >> max_recent 10000 >> max_new 1000 >> log_file /var/log/ceph/ceph-osd.20.log >> --- end dump of recent events --- >> >> >> >> -- >> *Stéphane Boisvert* GNS-Shop Technical Coordinator 5800 St-Denis >> suite 1001 Montreal (QC), H2S 3L5 *MSN:* stephane.boisv...@gameloft.com >> *E-mail:* stephane.boisv...@gameloft.com >> >> _______________________________________________ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > > > -- > John Wilkins > Senior Technical Writer > Intank > john.wilk...@inktank.com > (415) 425-9599 > http://inktank.com > > _______________________________________________ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >
<<Inbox.jpg>>
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com