Reproduce as follows:

HOST-A    HOST-B    HOST-C
osd 7     osd 21    osd 11

1. Shut down HOST-C.
2. For a long time, the cluster has only HOST-A and HOST-B; write data.
3. Shut down HOST-A, then start HOST-C and restart ceph-osd-all on HOST-B about 5 times; at the same time, start HOST-A and restart ceph-osd-all on HOST-A several times.
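For context on the failure chain analyzed in the quoted discussion below, here is a rough sketch in plain Python (illustrative only, not Ceph code; the epoch and version numbers are taken from the logs later in this thread, and the interval and best-info logic is heavily simplified): the osdmap gap makes the rebooted osd collapse epochs 92~1000 into a single wrong past interval, the stale osd then activates alone and inflates last_epoch_started, and the authoritative-info selection later prefers it over the osd that really has the newest data.

```python
# Sketch (not Ceph code) of the failure chain: an osdmap gap collapses
# epochs 92..1000 into one past interval, so the stale osd.11 believes
# it may have gone rw, later activates alone with an inflated
# last_epoch_started (les), and a simplified find_best_info prefers it.

def past_intervals(acting_at, first, last):
    """Collapse a per-epoch acting-set history into (start, end, acting) intervals."""
    out, start = [], first
    for e in range(first + 1, last + 1):
        if acting_at(e) != acting_at(start):
            out.append((start, e - 1, acting_at(start)))
            start = e
    out.append((start, last, acting_at(start)))
    return out

# What the cluster really did between epochs 92 and 1000 (simplified):
def real_history(e):
    if e < 96:
        return [11, 7]
    if e < 101:
        return [7]
    return [7, 21]

# osd.11's view: it kept map 92 and, because the mon trimmed old maps,
# the next map it receives is full_1001 -- every epoch in the gap looks
# unchanged, so the whole gap becomes a single [11,7] interval.
def osd11_view(e):
    return [11, 7] if e <= 1000 else [7, 21]

print(past_intervals(real_history, 92, 1000))
# -> [(92, 95, [11, 7]), (96, 100, [7]), (101, 1000, [7, 21])]
print(past_intervals(osd11_view, 92, 1000))
# -> [(92, 1000, [11, 7])]   the wrong past interval

# Because of that interval osd.11 maps alone at epoch 1673, activates,
# and records a new les despite holding only data up to 95'397.
# A simplified selection then prefers the higher les over newer data:
def find_best_info(infos):
    # higher last_epoch_started wins first, then newer last_update
    return max(infos, key=lambda i: (i["les"], i["last_update"]))

osd7 = {"osd": 7, "les": 180, "last_update": (1663, 655)}    # newest data
osd11 = {"osd": 11, "les": 1674, "last_update": (95, 397)}   # stale data
print(find_best_info([osd7, osd11])["osd"])  # -> 11, the stale osd
```

The real generate_past_intervals and find_best_info consider much more state (up/acting primaries, up_thru, log bounds); the point is only that a map gap plus an inflated last_epoch_started is enough to make the stale OSD win, after which the OSD with the newer log rewinds and hits assert(newhead >= log.tail).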
Change some config:

[mon]
mon_min_osdmap_epochs = 10
paxos_service_trim_min = 5
paxos_service_trim_max = 10
[osd]
osd_min_pg_log_entries = 1
osd_max_pg_log_entries = 1
osd_pg_log_trim_min = 1

With this config the problem can be reproduced much more easily (on hammer). I pushed PR #16371, please have a look, thanks.

Best regards

-----Original Message-----
From: Sage Weil [mailto:sw...@redhat.com]
Sent: 14 July 2017 21:49
To: chenyehua 11692 (RD)
Cc: 'ceph-us...@ceph.com'
Subject: Re: Re: calculate past_intervals wrong, lead to choose wrong authority osd, then osd assert(newhead >= log.tail)

On Fri, 14 Jul 2017, Chenyehua wrote:
> Thanks, Sage.
>
> It doesn't happen every time, but the probability is high.
>
> Reproduce as follows:
>
> HOST-A    HOST-B    HOST-C
> osd 7     osd 21    osd 11
>
> 1. At osdmap epoch 95, pg 1.20f is on acting set [11,7] / up set [11,7]; then shut down HOST-C
> 2. For a long time, the cluster has only HOST-A and HOST-B; write data
> 3. Shut down HOST-A, then start HOST-C and restart HOST-B about 4 times
> 4. Start HOST-A; osd 7 asserts
>
> And set the pg log configuration:
>     "osd_min_pg_log_entries": "100",
>     "osd_max_pg_log_entries": "300",
>     "osd_pg_log_trim_min": "100"
>
> After analyzing ceph-osd.log, I think the root cause is that some OSDs compute the wrong past interval.
>
> In my test, osd 11 had been down for a long time, so it had very old data. When it came up, it first received full map 1001 and generated past_intervals as follows:
>
>     92~1000 [11,7]/[11,7] ---------------- this is the wrong interval; actually, during osdmap epochs 92 to 1000 the pg had already become active+clean on [7,21], and the data had been updated a lot
>
> This past interval is inappropriate and leads osd 11 to think "I have been alive and primary during osdmap epochs 92~1000, so I already have the same data epoch as osd 7."
> In the next osdmap epoch the pg maps only onto [11], so the pg can become active and modify last_epoch_started. Later, when another osd holding the newest data comes back, find_best_info chooses the one with the bigger last_epoch_started; yes, that is osd 11. So the osd with the older data unexpectedly becomes the best.

Ooh... I think this is because the past intervals generation can't really deal with the osdmap gap.

You can make it easier to trigger this case by setting mon_min_osdmap_epochs = 50 or some other smaller number (the default is 500). (You can, conversely, make this very hard to trigger by setting it to a value larger than 500, at the expense of some mon disk space.)

Can you try reproducing with a lower value? If we can reliably reproduce this, I'm confident we can come up with a real fix (for hammer and jewel, if it is also affected). I'm pretty sure luminous is not, since the past intervals code was almost completely restructured, but with a method to reproduce we can confirm.

Thank you!
sage

> Besides, after using ceph-objectstore-tool import/export, the cluster became healthy and all pgs were active+clean; however, client io still failed.
>
> I have not tested on jewel, but having looked at the jewel code I think the problem still exists there; I will test it later.
>
> I want to solve this problem in the hammer branch; what should I do? Could you give me some advice? Thanks.
>
>
>
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: 13 July 2017 22:51
> To: chenyehua 11692 (RD)
> Cc: 'ceph-us...@ceph.com'
> Subject: Re: calculate past_intervals wrong, lead to choose wrong authority osd, then osd assert(newhead >= log.tail)
>
> Hi Chenyehua,
>
> This looks a lot like a problem we've seen several times on hammer and never managed to find a root cause for. Is this something that happened once, or can you reproduce it? To my knowledge it has not happened on jewel, so my first suggestion is to upgrade (hammer is pretty old now :).
>
> Are you looking for help on resolving this specific issue for this cluster?
> In the past we've used a combination of ceph-objectstore-tool import/export
> and/or osd_find_best_info_ignore_history_les on the primary to resolve it.
>
> sage
>
> On Thu, 13 Jul 2017, Chenyehua wrote:
> > Hi Sage,
> >
> > I find the osd asserts due to wrongly generated past_intervals; could you give me some advice and a solution for this problem?
> >
> > Here is the detail:
> >
> > Ceph version: 0.94.5
> >
> > HOST-A    HOST-B    HOST-C
> > osd 7     osd 21    osd 11
> >
> > 1. At osdmap epoch 95, pg 1.20f is on acting set [11,7] / up set [11,7]; then shut down HOST-C
> > 2. For a long time, the cluster has only HOST-A and HOST-B; write data
> > 3. Shut down HOST-A, then start HOST-C and restart HOST-B about 4 times
> > 4. Start HOST-A; osd 7 asserts (see the osd 7 log below)
> >
> > Analysis:
> > When osd 11 starts, it generates past_intervals wrongly, putting epochs [92~1000] in the same interval.
> > At osdmap epoch 1673 osd 11 becomes the primary, and pg 1.20f changes from peering to activating+undersized+degraded, modifying last_epoch_started.
> > When osd 7 starts, find_best_info chooses the bigger last_epoch_started, although osd 7 has the latest data.
> >
> > past_intervals on osd 7:
> >
> >     ~95        [11,7]/[11,7]
> >     96~100     [7]/[7]
> >     101        [7,21]/[7,21]
> >     102~178    [7,21]/[7]
> >     179~1663   [7,21]/[7,21]
> >     1664~1672  [21]/[21]
> >     1673~1692  [11]/[11]
> >
> > past_intervals on osd 11:
> >
> >     92~1000    [11,7]/[11,7] ---------------- the wrong pi
> >     1001~1663  [7,21]/[7,21]  no rw
> >     1664~1672  [21]/[21]      no rw
> >     1673~1692  [11]/[11]
> >
> > Logs:
> >
> > Assert on osd 7:
> >
> > 2017-07-10 16:08:29.836722 7f4fac24a700 -1 osd/PGLog.cc: In function 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f4fac24a700 time 2017-07-10 16:08:29.833699
> > osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)
> >
> > ceph version 0.94.5 (664cc0b54fdb496233a81ab19d42df3f46dcda50)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbd1ebb]
> > 2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x7840fb]
> > 3: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7df4b7]
> > 4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x80109f]
> > 5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x83c8d6]
> > 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x827edb]
> > 7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7d5dce]
> > 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6b5930]
> > 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x70ef18]
> > 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xbc2aa6]
> > 11: (ThreadPool::WorkThread::entry()+0x10) [0xbc3b50]
> > 12: (()+0x8182) [0x7f4fc6458182]
> > 13: (clone()+0x6d) [0x7f4fc49c347d]
> >
> > osd 11 log:
> >
> > 2017-07-10 15:40:43.742547 7f8b1567a740 10 osd.11 95 pgid 1.20f coll 1.20f_head
> > 2017-07-10 15:40:43.742580 7f8b1567a740 10 osd.11 95 _open_lock_pg 1.20f
> > 2017-07-10 15:40:43.742592 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] enter Initial
> > 2017-07-10 15:40:43.743207 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_loaded
> > 2017-07-10 15:40:43.743214 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.000622 0 0.000000
> > 2017-07-10 15:40:43.743220 7f8b1567a740 5 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Reset
> > 2017-07-10 15:40:43.743224 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked outgoing recovery messages
> > 2017-07-10 15:40:43.743228 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking outgoing recovery messages
> > 2017-07-10 15:40:43.743232 7f8b1567a740 10 osd.11 95 load_pgs loaded pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] log((95'300,95'397], crt=95'394)
> > 2017-07-10 15:40:43.829867 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f(unlocked)] _calc_past_interval_range start epoch 93 >= end epoch 92, nothing to do
> > 2017-07-10 15:42:40.899157 7f8b1567a740 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null
> > 2017-07-10 15:42:40.902520 7f8afa1bd700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_peering_event: epoch_sent: 95 epoch_requested: 95 NullEvt
> > 2017-07-10 15:42:40.917668 7f8b031cf700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null
> > 2017-07-10 15:42:41.846755 7f8af89ba700 10 osd.11 pg_epoch: 95 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_advance_map [7,21]/[7,21] -- 7/7
> > 2017-07-10 15:42:41.846771 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: Reset advmap
> > 2017-07-10 15:42:41.846781 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] _calc_past_interval_range start epoch 1001 >= end epoch 92, nothing to do
> > 2017-07-10 15:42:41.846790 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: should restart peering, calling start_peering_interval again
> > 2017-07-10 15:42:41.846798 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked outgoing recovery messages
> > 2017-07-10 15:42:41.846807 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [11,7] r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking outgoing recovery messages
> > 2017-07-10 15:42:41.846829 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive] start_peering_interval: check_new_interval output: generate_past_intervals interval(92-1000 up [11,7](11) acting [11,7](11)): not rw, up_thru 94 up_from 92 last_epoch_clean 93
> > 2017-07-10 15:42:41.846839 7f8af89ba700 10 osd.11 pg_epoch: 1001 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) [7,21] r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive] noting past interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw)
> >
> > ...
> > 2017-07-10 15:42:52.600376 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started
> > 2017-07-10 15:42:52.600387 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Start
> > 2017-07-10 15:42:52.600397 7f8af91bb700 1 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
> > 2017-07-10 15:42:52.600410 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> > 2017-07-10 15:42:52.600424 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary
> > 2017-07-10 15:42:52.600435 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
> > 2017-07-10 15:42:52.600446 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> > 2017-07-10 15:42:52.600460 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] _calc_past_interval_range: already have past intervals back to 93
> > 2017-07-10 15:42:52.600474 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(1664-1672 up [21](21) acting [21](21))
> > 2017-07-10 15:42:52.608241 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(1001-1663 up [7,21](7) acting [7,21](7))
> > 2017-07-10 15:42:52.608256 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw)
> > 2017-07-10 15:42:52.608265 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior prior osd.7 is down
> > 2017-07-10 15:42:52.608273 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] PriorSet: build_prior final: probe 11 down 7 blocked_by {}
> > 2017-07-10 15:42:52.608281 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] up_thru 94 < same_since 1673, must notify monitor
> > 2017-07-10 15:42:52.608294 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetInfo 0.007847 0 0.000000
> > 2017-07-10 15:42:52.608306 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
> > 2017-07-10 15:42:52.608318 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] calc_acting osd.11 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > 2017-07-10 15:42:52.608338 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] calc_acting newest update on osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > calc_acting primary is osd.11 with 1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > 2017-07-10 15:42:52.608348 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] actingbackfill is 11
> > 2017-07-10 15:42:52.608357 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] choose_acting want [11] (== acting) backfill_targets
> > 2017-07-10 15:42:52.608367 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetLog>: leaving GetLog
> > 2017-07-10 15:42:52.608380 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetLog 0.000073 0 0.000000
> > 2017-07-10 15:42:52.608391 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetMissing
> > 2017-07-10 15:42:52.608400 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering/GetMissing>: still need up_thru update before going active
> > 2017-07-10 15:42:52.608409 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/GetMissing 0.000018 0 0.000000
> > 2017-07-10 15:42:52.608417 7f8af91bb700 5 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] enter Started/Primary/Peering/WaitUpThru
> > 2017-07-10 15:42:52.608424 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 NullEvt
> > 2017-07-10 15:42:52.610190 7f8b06334700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] flushed
> > 2017-07-10 15:42:52.640813 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 MNotifyRec from 21 notify: (query_epoch:1673, epoch_sent:1673, info:1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)) features: 0x483ffffffffffff
> > 2017-07-10 15:42:52.640821 7f8af91bb700 7 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary>: handle_pg_notify from osd.21
> > 2017-07-10 15:42:52.640829 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] got osd.21 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)
> > 2017-07-10 15:42:52.640840 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] osd.21 has stray content: 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)
> > 2017-07-10 15:42:52.640852 7f8af91bb700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] update_heartbeat_peers 11 -> 11,21
> > 2017-07-10 15:42:52.642542 7f8af89ba700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 FlushedEvt
> > 2017-07-10 15:42:53.549732 7f8b031cf700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] null
> > 2017-07-10 15:42:53.556132 7f8af99bc700 10 osd.11 pg_epoch: 1673 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_advance_map [11]/[11] -- 11/11
> > 2017-07-10 15:42:53.556142 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering>: Peering advmap
> > 2017-07-10 15:42:53.556149 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] adjust_need_up_thru now 1673, need_up_thru now false
> > 2017-07-10 15:42:53.556157 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started>: Started advmap
> > 2017-07-10 15:42:53.556164 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] check_recovery_sources no source osds () went down
> > 2017-07-10 15:42:53.556173 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] handle_activate_map
> > 2017-07-10 15:42:53.556179 7f8af99bc700 7 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary>: handle ActMap primary
> > 2017-07-10 15:42:53.556187 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] take_waiters
> > 2017-07-10 15:42:53.556194 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering/WaitUpThru 0.947776 5 0.008159
> > 2017-07-10 15:42:53.556203 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] state<Started/Primary/Peering>: Leaving Peering
> > 2017-07-10 15:42:53.556211 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 peering] exit Started/Primary/Peering 0.955776 0 0.000000
> > 2017-07-10 15:42:53.556221 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Started/Primary/Active
> > 2017-07-10 15:42:53.556232 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Started/Primary/Active>: In Active, about to call activate
> > 2017-07-10 15:42:53.556241 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] activate - snap_trimq []
> > 2017-07-10 15:42:53.556249 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] activate - no missing, moving last_complete 95'397 -> 95'397
> > 2017-07-10 15:42:53.556261 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] 1.20f primary missing_loc.add_active_missing peer 11 missing: {}
> > 2017-07-10 15:42:53.556268 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 inactive] needs_recovery is recovered
> > 2017-07-10 15:42:53.556275 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] state<Started/Primary/Active>: Activate Finished
> > 2017-07-10 15:42:53.556283 7f8af99bc700 5 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] enter Started/Primary/Active/Activating
> > 2017-07-10 15:42:53.556290 7f8af99bc700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] handle_peering_event: epoch_sent: 1674 epoch_requested: 1674 NullEvt
> > 2017-07-10 15:42:53.564228 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] _activate_committed 1674 peer_activated now 11 last_epoch_started 180 same_interval_since 1673
> > 2017-07-10 15:42:53.564250 7f8b06334700 10 osd.11 pg_epoch: 1674 pg[1.20f( v 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 mlcod 0'0 activating+undersized+degraded] all_activated_and_committed
> >
> > osd 7 log:
> >
> > 2017-07-10 16:08:29.812470 7f4fb425a700 7 osd.7 2123 handle_pg_info pg_info(1 pgs e2123:1.20f(6)) v4 from osd.11
> > 2017-07-10 16:08:29.833669 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 2123 epoch_requested: 2123 MInfoRec from 11 info: 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673)
> > 2017-07-10 16:08:29.833684 7f4fac24a700 10 osd.7 pg_epoch: 2123 pg[1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 1898/1898 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod 0'0 inactive NOTIFY] state<Started/Stray>: got info from osd.11 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 1898/1898 2122/2122/1673)
> >
> > ---------------------------------------------------------------------------
> > This e-mail and its attachments contain confidential information from New H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com