I've been seeing these as well on our SSD cache tier, which has been ravaged by disk failures of late: the same tp_peering assert as above, even running the luminous branch from git.
Let me know if you have a bug filed that I can +1, or if you have found a workaround.

E

On Wed, Nov 15, 2017 at 10:25 AM, Ashley Merrick <[email protected]> wrote:

> Hello,
>
> After replacing a single OSD disk due to a failed disk, I am now seeing 2-3 OSDs randomly stop and fail to start: they go into a boot loop, get as far as load_pgs, and then fail with the following. (I tried setting the OSD logs to 5/5 but didn't get any extra lines around the error, just more information pre-boot.)
>
> Could this be a certain PG causing these OSDs to crash (6.2f2s10 for example)?
>
> -9> 2017-11-15 17:37:14.696229 7fa4ec50f700 1 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] state<Start>: transitioning to Stray
> -8> 2017-11-15 17:37:14.696239 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] exit Start 0.000019 0 0.000000
> -7> 2017-11-15 17:37:14.696250 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f9s1( v 161563'158209 lc 161175'158153 (150659'148187,161563'158209] local-lis/les=161519/161521 n=47572 ec=31534/31534 lis/c 161519/152474 les/c/f 161521/152523/159786 161517/161519/161519) [34,37,13,12,66,69,118,120,28,20,88,0,2]/[34,37,13,12,66,69,118,120,28,20,53,54,2147483647] r=1 lpr=161563 pi=[152474,161519)/1 crt=161562'158208 lcod 0'0 unknown NOTIFY m=21] enter Started/Stray
> -6> 2017-11-15 17:37:14.696324 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] exit Reset 3.363755 2 0.000076
> -5> 2017-11-15 17:37:14.696337 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Started
> -4> 2017-11-15 17:37:14.696346 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Start
> -3> 2017-11-15 17:37:14.696353 7fa4ec50f700 1 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] state<Start>: transitioning to Stray
> -2> 2017-11-15 17:37:14.696364 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] exit Start 0.000018 0 0.000000
> -1> 2017-11-15 17:37:14.696372 7fa4ec50f700 5 osd.37 pg_epoch: 161571 pg[6.2f2s10( v 161570'157712 lc 161175'157648 (160455'154564,161570'157712] local-lis/les=161517/161519 n=47328 ec=31534/31534 lis/c 161517/160962 les/c/f 161519/160963/159786 161517/161517/108939) [96,100,79,4,69,65,57,59,135,134,37,35,18] r=10 lpr=161570 pi=[160962,161517)/2 crt=161560'157711 lcod 0'0 unknown NOTIFY m=5] enter Started/Stray
> 0> 2017-11-15 17:37:14.697245 7fa4ebd0e700 -1 *** Caught signal (Aborted) **
> in thread 7fa4ebd0e700 thread_name:tp_peering
>
> ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
> 1: (()+0xa3acdc) [0x55dfb6ba3cdc]
> 2: (()+0xf890) [0x7fa510e2c890]
> 3: (gsignal()+0x37) [0x7fa50fe66067]
> 4: (abort()+0x148) [0x7fa50fe67448]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27f) [0x55dfb6be6f5f]
> 6: (PG::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ObjectStore::Transaction*)+0x14e3) [0x55dfb670f8a3]
> 7: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x539) [0x55dfb670ff39]
> 8: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x244) [0x55dfb67552a4]
> 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x55dfb6732c1b]
> 10: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x3e3) [0x55dfb6702ef3]
> 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x20a) [0x55dfb664db2a]
> 12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x175) [0x55dfb664e6b5]
> 13: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x27) [0x55dfb66ae5a7]
> 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8f) [0x55dfb6bedb1f]
> 15: (ThreadPool::WorkThread::entry()+0x10) [0x55dfb6beea50]
> 16: (()+0x8064) [0x7fa510e25064]
> 17: (clone()+0x6d) [0x7fa50ff1962d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 rbd_mirror
> 0/ 5 rbd_replay
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 1/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 1/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> 0/ 0 refs
> 1/ 5 xio
> 1/ 5 compressor
> 1/ 5 bluestore
> 1/ 5 bluefs
> 1/ 3 bdev
> 1/ 5 kstore
> 4/ 5 rocksdb
> 4/ 5 leveldb
> 4/ 5 memdb
> 1/ 5 kinetic
> 1/ 5 fuse
> 1/ 5 mgr
> 1/ 5 mgrc
> 1/ 5 dpdk
> 1/ 5 eventtrace
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.37.log
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
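For anyone else hitting this: before purging anything, one avenue worth trying is pulling the suspect PG shard off the crashing OSD with ceph-objectstore-tool, keeping an export so it can be re-imported later. The helper below is only a dry-run sketch, not a tested procedure — it echoes the commands rather than running them, `export_and_remove_pg` is just a name I made up for this post, and the paths assume the default /var/lib/ceph layout (filestore OSDs would also need --journal-path). Review every step, especially the remove, against your own cluster first:

```shell
#!/bin/sh
# Dry-run sketch: print the ceph-objectstore-tool steps for pulling one PG
# shard off a stopped OSD. Nothing here touches the cluster; it only echoes
# the commands so they can be reviewed before being run by hand.
export_and_remove_pg() {
    osd_id="$1"    # e.g. 37
    pgid="$2"      # e.g. 6.2f2s10 (EC pool shard notation)
    data_path="/var/lib/ceph/osd/ceph-${osd_id}"

    echo "systemctl stop ceph-osd@${osd_id}"
    # Confirm the shard is actually present on this OSD before touching it
    echo "ceph-objectstore-tool --data-path ${data_path} --op list-pgs"
    # Keep a copy of the shard so it can be re-imported if removal was wrong
    echo "ceph-objectstore-tool --data-path ${data_path} --pgid ${pgid} --op export --file /root/${pgid}.export"
    echo "ceph-objectstore-tool --data-path ${data_path} --pgid ${pgid} --op remove"
    echo "systemctl start ceph-osd@${osd_id}"
}

export_and_remove_pg 37 6.2f2s10
```

Separately, since the daemon dies during boot (so injectargs won't reach it), setting "debug osd = 20/20" in the [osd] section of ceph.conf before restarting should capture more context around the assert than the 5/5 you tried.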
