Can you post more of the log?  There should be a line towards the bottom
of the log showing the source file and line number of the failed assert.
Can you also attach the output of ceph pg dump, ceph osd dump, and
ceph osd tree?
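
A rough way to pull that line out of a long OSD log is sketched below in
Python. It assumes the assert line has the form
"<file>: <line>: FAILED assert(<expr>)", which is how Ceph logs of this
vintage usually print it. The function name find_failed_asserts and the
regular expression are illustrative only and are not part of any Ceph tool.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format of the assert line: "<file>: <line>: FAILED assert(<expr>)".
# This pattern is a guess at the usual layout, not an official Ceph format spec.
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)"
)


def find_failed_asserts(lines: Iterable[str]) -> List[Tuple[int, str, int, str]]:
    """Return (log line number, source file, source line, expression) per hit."""
    hits = []
    for number, text in enumerate(lines, start=1):
        match = ASSERT_RE.search(text)
        if match:
            hits.append(
                (
                    number,
                    match.group("file"),
                    int(match.group("line")),
                    match.group("expr"),
                )
            )
    return hits
```

Passing an open file handle for the OSD log (the dump below names
/var/log/ceph/ceph-osd.20.log as log_file) to find_failed_asserts should
list every matching line, if the format assumption holds.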
-Sam


On Mon, Aug 12, 2013 at 11:54 AM, John Wilkins <john.wilk...@inktank.com> wrote:

> Stephane,
>
> You should post any crash bugs, along with the stack trace, to ceph-devel
> (ceph-de...@vger.kernel.org).
>
>
> On Mon, Aug 12, 2013 at 9:02 AM, Stephane Boisvert <
> stephane.boisv...@gameloft.com> wrote:
>
>>  Hi,
>>     It seems my OSD processes keep crashing randomly and I don't know
>> why.  It seems to happen when the cluster is trying to re-balance... In
>> normal usage I didn't notice any crashes like that.
>>
>> We are running ceph 0.61.7 on an up-to-date Ubuntu 12.04 (all packages,
>> including the kernel, are current).
>>
>>
>> Does anyone have an idea?
>>
>>
>> TRACE:
>>
>>
>>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>  1: /usr/bin/ceph-osd() [0x79219a]
>>  2: (()+0xfcb0) [0x7fd692da1cb0]
>>  3: (gsignal()+0x35) [0x7fd69155a425]
>>  4: (abort()+0x17b) [0x7fd69155db8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>>  6: (()+0xb5846) [0x7fd691eaa846]
>>  7: (()+0xb5873) [0x7fd691eaa873]
>>  8: (()+0xb596e) [0x7fd691eaa96e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1df) [0x84303f]
>>  10:
>> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state<PG::RecoveryState::Recovered,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f]
>>  11: (boost::statechart::state<PG::RecoveryState::Recovered,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<PG::RecoveryState::Active>
>> const&,
>> boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c]
>>  12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
>> const&)+0xb4) [0x6d9454]
>>  13: (boost::statechart::simple_state<PG::RecoveryState::Recovering,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
>> const&, void const*)+0xda) [0x6f296a]
>>  14:
>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
>> const&)+0x5b) [0x6e320b]
>>  15:
>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
>> const&)+0x11) [0x6e34e1]
>>  16: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>,
>> PG::RecoveryCtx*)+0x347) [0x69aaf7]
>>  17: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5]
>>  18: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
>>  19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
>>  20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
>>  21: (()+0x7e9a) [0x7fd692d99e9a]
>>  22: (clone()+0x6d) [0x7fd691617ccd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> --- begin dump of recent events ---
>>     -3> 2013-08-12 15:58:15.561005 7fd683d78700  1 --
>> 10.136.48.18:6814/21240 <== osd.56 10.136.48.14:0/17437 44 ====
>> osd_ping(ping e8959 stamp 2013-08-12 15:58:15.556022) v2 ==== 47+0+0
>> (355096560 0 0) 0xc4e81c0 con 0x12fbeb00
>>     -2> 2013-08-12 15:58:15.561038 7fd683d78700  1 --
>> 10.136.48.18:6814/21240 --> 10.136.48.14:0/17437 -- osd_ping(ping_reply
>> e8959 stamp 2013-08-12 15:58:15.556022) v2 -- ?+0 0x1683ec40 con 0x12fbeb00
>>     -1> 2013-08-12 15:58:15.568600 7fd67e56d700  1 --
>> 10.136.48.18:6813/21240 --> osd.44 10.136.48.15:6820/25671 --
>> osd_sub_op(osd.20.0:1293 25.328
>> 699ac328/rbd_data.ae2732ae8944a.0000000000240828/head//25 [push] v 8424'11
>> snapset=0=[]:[] snapc=0=[]) v7 -- ?+0 0x2df0f400
>>      0> 2013-08-12 15:58:15.581608 7fd681d74700 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7fd681d74700
>>
>>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>>  1: /usr/bin/ceph-osd() [0x79219a]
>>  2: (()+0xfcb0) [0x7fd692da1cb0]
>>  3: (gsignal()+0x35) [0x7fd69155a425]
>>  4: (abort()+0x17b) [0x7fd69155db8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd691eac69d]
>>  6: (()+0xb5846) [0x7fd691eaa846]
>>  7: (()+0xb5873) [0x7fd691eaa873]
>>  8: (()+0xb596e) [0x7fd691eaa96e]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1df) [0x84303f]
>>  10:
>> (PG::RecoveryState::Recovered::Recovered(boost::statechart::state<PG::RecoveryState::Recovered,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::my_context)+0x38f) [0x6d932f]
>>  11: (boost::statechart::state<PG::RecoveryState::Recovered,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::shallow_construct(boost::intrusive_ptr<PG::RecoveryState::Active>
>> const&,
>> boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>&)+0x5c) [0x6f270c]
>>  12: (PG::RecoveryState::Recovering::react(PG::AllReplicasRecovered
>> const&)+0xb4) [0x6d9454]
>>  13: (boost::statechart::simple_state<PG::RecoveryState::Recovering,
>> PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
>> mpl_::na, mpl_::na, mpl_::na>,
>> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
>> const&, void const*)+0xda) [0x6f296a]
>>  14:
>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base
>> const&)+0x5b) [0x6e320b]
>>  15:
>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
>> PG::RecoveryState::Initial, std::allocator<void>,
>> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
>> const&)+0x11) [0x6e34e1]
>>  16: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>,
>> PG::RecoveryCtx*)+0x347) [0x69aaf7]
>>  17: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x2f5) [0x632fc5]
>>  18: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x12) [0x66e2d2]
>>  19: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x838476]
>>  20: (ThreadPool::WorkThread::entry()+0x10) [0x83a2a0]
>>  21: (()+0x7e9a) [0x7fd692d99e9a]
>>  22: (clone()+0x6d) [0x7fd691617ccd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>    0/ 5 ms
>>    1/ 5 mon
>>    0/10 monc
>>    0/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 hadoop
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file /var/log/ceph/ceph-osd.20.log
>> --- end dump of recent events ---
>>
>>
>>
>> --
>>     *Stéphane Boisvert*  GNS-Shop Technical Coordinator  5800 St-Denis
>> suite 1001  Montreal (QC), H2S 3L5  *MSN:* stephane.boisv...@gameloft.com
>> *E-mail:* stephane.boisv...@gameloft.com
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> John Wilkins
> Senior Technical Writer
> Inktank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
>
>
