Reproduce as follows:
 HOST-A           HOST-B          HOST-C
 osd 7            osd 21          osd 11
1. shutdown HOST-C
2. for a long time, the cluster has only HOST-A and HOST-B; write data
3. shutdown HOST-A => then start HOST-C => restart ceph-osd-all on HOST-B
   about 5 times; at the same time start HOST-A and restart ceph-osd-all
   on HOST-A several times

Change some config:
[mon]
mon_min_osdmap_epochs = 10
paxos_service_trim_min = 5
paxos_service_trim_max = 10
[osd]
osd_min_pg_log_entries = 1
osd_max_pg_log_entries = 1
osd_pg_log_trim_min = 1
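
By the way, the [osd] options can also be injected at runtime; a sketch,
assuming an admin keyring is available (some of these options may only take
full effect after a daemon restart):

ceph tell osd.* injectargs '--osd_min_pg_log_entries 1 --osd_max_pg_log_entries 1 --osd_pg_log_trim_min 1'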


With these settings, the problem can be reproduced much more easily (on
hammer), and I have pushed PR #16371; please have a look, thanks.
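
To make the find_best_info behaviour described in the quoted analysis below
concrete, here is a minimal standalone C++ sketch; this is not the actual
Ceph code, and the numbers are illustrative, loosely based on the logs in
this thread:

#include <cstdio>
#include <vector>

// Simplified stand-in for pg_info_t (not the real struct).
struct Info {
  int osd;
  unsigned last_epoch_started;  // les
  unsigned last_update_epoch;   // epoch part of last_update
};

// A bigger last_epoch_started wins first; only on a tie is the
// newest last_update considered.
const Info* find_best(const std::vector<Info>& infos) {
  const Info* best = nullptr;
  for (const Info& i : infos) {
    if (!best || i.last_epoch_started > best->last_epoch_started ||
        (i.last_epoch_started == best->last_epoch_started &&
         i.last_update_epoch > best->last_update_epoch))
      best = &i;
  }
  return best;
}

int main() {
  // osd.7 has the newest data but an older les; osd.11 went active
  // alone with stale data and bumped its les.
  std::vector<Info> infos = { {7, 180, 1663}, {11, 1674, 95} };
  std::printf("best: osd.%d\n", find_best(infos)->osd);  // prints: best: osd.11
}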

Best regards

-----Original Message-----
From: Sage Weil [mailto:sw...@redhat.com]
Sent: July 14, 2017 21:49
To: chenyehua 11692 (RD)
Cc: 'ceph-us...@ceph.com'
Subject: Re: Re: calculate past_intervals wrong, lead to choose wrong authority
osd, then osd assert(newhead >= log.tail)

On Fri, 14 Jul 2017, Chenyehua wrote:
> Thanks, Sage.
> 
> It doesn't happen every time, but the probability is high
> 
> Reproduce as Follows:
> HOST-A           HOST-B          HOST-C
> osd 7            osd 21          osd 11
> 1. osdmap epoch 95, pg 1.20f on osd acting set [11,7] / up set [11,7],
>    then shutdown HOST-C
> 2. for a long time, cluster has only HOST A and HOST B, write data
> 3. shutdown HOST-A, then start HOST-C, restart HOST-B about 4 times
> 4. start HOST-A, osd 7 asserts
> 
> And set pg log configuration:
>     "osd_min_pg_log_entries": "100",
>     "osd_max_pg_log_entries": "300",
>     "osd_pg_log_trim_min": "100"
> 
> After analyzing ceph-osd.log, I think the root cause is that some osds
> compute the wrong past interval.
> 
> In my test, osd 11 had been down for a long time and had very old data.
> When it came up, it first received the full_1001 osdmap and generated a
> past_interval as follows:
> 92~1000         [11,7]/[11,7] ---------------- this is the wrong interval;
> actually, during osdmaps 92 to 1000 the pg had already become active+clean
> on [7,21] and the data had been updated a lot.
> This past interval is inappropriate and could lead osd 11 to think "I have
> been alive, and the primary, during osdmaps 92~1000, so I already have the
> same data epoch as osd 7".
> In the next osdmap epoch, the pg mapped only onto [11], so this pg could
> become active and modify last_epoch_started. Later, when the osds that have
> the newest data come back, find_best_info chooses the one with the bigger
> last_epoch_started, which is osd 11; so in the end the osd with the older
> data unexpectedly becomes the best.

Ooh... I think this is because the past intervals generation can't really deal 
with the osdmap gap.  You can make it easier to trigger this case by setting 
mon_min_osdmap_epochs = 50 or some other smaller number (the default is 500).  
(You can, conversely, make this very hard to trigger by setting it to a value 
larger than 500, at the expense of some mon disk space.)
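
For example, a minimal ceph.conf sketch on the monitors (50 is just an
illustrative value):

[mon]
mon_min_osdmap_epochs = 50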

Can you try reproducing with a lower value?  If we can reliably reproduce this 
I'm confident we can come up with a real fix (for hammer and jewel, if it is 
also affected).  I'm pretty sure luminous is not since the past intervals code 
was almost completely restructured, but with a method to reproduce we can 
confirm.

Thank you!
sage


> 
> Besides, after using the ceph-objectstore-tool import/export, the cluster
> became healthy and all pgs were active+clean; however, the client io still
> failed.
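> 
> For reference, a rough sketch of the export/import sequence (paths and pgid
> as in this thread, default osd data paths assumed; the osd daemon must be
> stopped while the tool runs, and --journal-path applies to filestore osds):
> 
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
>         --journal-path /var/lib/ceph/osd/ceph-7/journal \
>         --pgid 1.20f --op export --file /tmp/pg1.20f.export
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
>         --journal-path /var/lib/ceph/osd/ceph-11/journal \
>         --pgid 1.20f --op remove
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
>         --journal-path /var/lib/ceph/osd/ceph-11/journal \
>         --op import --file /tmp/pg1.20f.export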
> 
> I have not tested on jewel, but from reading the jewel code I think this
> problem still exists; I will test it later.
> 
> I want to solve this problem in the hammer branch; what should I do? Could
> you give me some advice? Thanks.
> 
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: July 13, 2017 22:51
> To: chenyehua 11692 (RD)
> Cc: 'ceph-us...@ceph.com'
> Subject: Re: calculate past_intervals wrong, lead to choose wrong authority
> osd, then osd assert(newhead >= log.tail)
> 
> Hi Chenyehua,
> 
> This looks a lot like a problem we've seen several times on hammer and never 
> managed to find a root cause for.  Is this something that happened once or 
> can you reproduce it?  To my knowledge it has not happened on jewel, so my 
> first suggestion is to upgrade (hammer is pretty old now :).
> 
> Are you looking for help on resolving this specific issue for this cluster?  
> In the past we've used a combination of ceph-objectstore-tool import/export 
> and/or osd_find_best_info_ignore_history_les on the primary to resolve it.
> 
> sage
> 
> 
> 
> On Thu, 13 Jul 2017, Chenyehua wrote:
> 
> > 
> > Hi Sage
> > 
> > I found that the osd asserts due to the wrongly generated past_intervals;
> > could you give me some advice and a solution to this problem?
> > 
> > Here is the detail:
> > 
> > Ceph version: 0.94.5
> > 
> > HOST-A           HOST-B          HOST-C
> > osd 7            osd 21          osd 11
> > 
> > 1. osdmap epoch 95, pg 1.20f on osd acting set [11,7] / up set [11,7],
> >    then shutdown HOST-C
> > 2. for a long time, cluster has only HOST A and HOST B, write data
> > 3. shutdown HOST-A, then start HOST-C, restart HOST-B about 4 times
> > 4. start HOST-A, osd 7 asserts
> > 
> > Analysis:
> > 
> > When osd 11 starts, it generates past_intervals wrongly, putting [92~1000]
> > into a single interval.
> > At pg map 1673, osd 11 becomes the primary, pg 1.20f changes from peering
> > to activating+undersized+degraded, and last_epoch_started is modified.
> > When osd 7 starts, find_best_info chooses the info with the bigger
> > last_epoch_started, although osd 7 has the latest data.
> > 
> > past_intervals on osd 7:
> > 
> > ~95          [11,7]/[11,7]
> > 96~100       [7]/[7]
> > 101          [7,21]/[7,21]
> > 102~178      [7,21]/[7]
> > 179~1663     [7,21]/[7,21]
> > 1664~1672    [21]/[21]
> > 1673~1692    [11]/[11]
> > 
> > past_intervals on osd 11:
> > 
> > 92~1000      [11,7]/[11,7] ---------------- the wrong pi
> > 1001~1663    [7,21]/[7,21] no rw
> > 1664~1672    [21]/[21]     no rw
> > 1673~1692    [11]/[11]
> > 
> > Logs:
> > 
> > Assert on osd7:
> > 
> > 2017-07-10 16:08:29.836722 7f4fac24a700 -1 osd/PGLog.cc: In function 'void
> > PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t,
> > pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 7f4fac24a700
> > time 2017-07-10 16:08:29.833699
> > 
> > osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)
> > 
> > ceph version 0.94.5 (664cc0b54fdb496233a81ab19d42df3f46dcda50)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbd1ebb]
> > 2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x7840fb]
> > 3: (PG::rewind_divergent_log(ObjectStore::Transaction&, eversion_t)+0x97) [0x7df4b7]
> > 4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) [0x80109f]
> > 5: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x216) [0x83c8d6]
> > 6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x5b) [0x827edb]
> > 7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ce) [0x7d5dce]
> > 8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2c0) [0x6b5930]
> > 9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x70ef18]
> > 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0xbc2aa6]
> > 11: (ThreadPool::WorkThread::entry()+0x10) [0xbc3b50]
> > 12: (()+0x8182) [0x7f4fc6458182]
> > 13: (clone()+0x6d) [0x7f4fc49c347d]
> > 
> > osd 11 log:
> > 
> > 2017-07-10 15:40:43.742547 7f8b1567a740 10 osd.11 95 pgid 1.20f coll 
> > 1.20f_head
> > 
> > 2017-07-10 15:40:43.742580 7f8b1567a740 10 osd.11 95 _open_lock_pg 
> > 1.20f
> > 
> > 2017-07-10 15:40:43.742592 7f8b1567a740  5 osd.11 pg_epoch: 95 
> > pg[1.20f(unlocked)] enter Initial
> > 
> > 2017-07-10 15:40:43.743207 7f8b1567a740 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_loaded
> > 
> > 2017-07-10 15:40:43.743214 7f8b1567a740  5 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] exit Initial
> > 0.000622 0
> > 0.000000
> > 
> > 2017-07-10 15:40:43.743220 7f8b1567a740  5 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=0 crt=95'394 lcod 0'0 mlcod 0'0 inactive] enter Reset
> > 
> > 2017-07-10 15:40:43.743224 7f8b1567a740 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing blocked 
> > outgoing recovery messages
> > 
> > 2017-07-10 15:40:43.743228 7f8b1567a740 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking 
> > outgoing recovery messages
> > 
> > 2017-07-10 15:40:43.743232 7f8b1567a740 10 osd.11 95 load_pgs loaded 
> > pg[1.20f( v 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 
> > 93/93
> > 92/92/92) [11,7] r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] 
> > log((95'300,95'397], crt=95'394)
> > 
> > 2017-07-10 15:40:43.829867 7f8b1567a740 10 osd.11 pg_epoch: 95 
> > pg[1.20f(unlocked)] _calc_past_interval_range start epoch 93 >= end 
> > epoch 92, nothing to do
> > 
> > 2017-07-10 15:42:40.899157 7f8b1567a740 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null
> > 
> > 2017-07-10 15:42:40.902520 7f8afa1bd700 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] handle_peering_event:
> > epoch_sent: 95 epoch_requested: 95 NullEvt
> > 
> > 2017-07-10 15:42:40.917668 7f8b031cf700 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] null
> > 
> > 2017-07-10 15:42:41.846755 7f8af89ba700 10 osd.11 pg_epoch: 95 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] 
> > handle_advance_map [7,21]/[7,21] -- 7/7
> > 
> > 2017-07-10 15:42:41.846771 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: 
> > Reset advmap
> > 
> > 2017-07-10 15:42:41.846781 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] 
> > _calc_past_interval_range start epoch 1001 >= end epoch 92, nothing 
> > to do
> > 
> > 2017-07-10 15:42:41.846790 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=95 crt=95'394 lcod 0'0 mlcod 0'0 inactive] state<Reset>: 
> > should restart peering, calling start_peering_interval again
> > 
> > 2017-07-10 15:42:41.846798 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Clearing 
> > blocked outgoing recovery messages
> > 
> > 2017-07-10 15:42:41.846807 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [11,7]
> > r=0 lpr=1001 crt=95'394 lcod 0'0 mlcod 0'0 inactive] Not blocking 
> > outgoing recovery messages
> > 
> > 2017-07-10 15:42:41.846829 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [7,21]
> > r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive]
> > start_peering_interval: check_new_interval output: 
> > generate_past_intervals
> > interval(92-1000 up [11,7](11) acting [11,7](11)): not rw, up_thru 
> > 94 up_from 92 last_epoch_clean 93
> > 
> > 2017-07-10 15:42:41.846839 7f8af89ba700 10 osd.11 pg_epoch: 1001 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93 92/92/92) 
> > [7,21]
> > r=-1 lpr=1001 pi=92-1000/1 crt=95'394 lcod 0'0 inactive]  noting 
> > past
> > interval(92-1000 up [11,7](11) acting [11,7](11) maybe_went_rw)
> > 
> > ...
> > 
> > 2017-07-10 15:42:52.600376 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] enter Started
> > 
> > 2017-07-10 15:42:52.600387 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] enter Start
> > 
> > 2017-07-10 15:42:52.600397 7f8af91bb700  1 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive]
> > state<Start>: transitioning to Primary
> > 
> > 2017-07-10 15:42:52.600410 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> > 
> > 2017-07-10 15:42:52.600424 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] enter Started/Primary
> > 
> > 2017-07-10 15:42:52.600435 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] enter Started/Primary/Peering
> > 
> > 2017-07-10 15:42:52.600446 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> > 
> > 2017-07-10 15:42:52.600460 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > _calc_past_interval_range: already have past intervals back to 93
> > 
> > 2017-07-10 15:42:52.600474 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > PriorSet: build_prior interval(1664-1672 up [21](21) acting 
> > [21](21))
> > 
> > 2017-07-10 15:42:52.608241 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > PriorSet: build_prior interval(1001-1663 up [7,21](7) acting
> > [7,21](7))
> > 
> > 2017-07-10 15:42:52.608256 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > PriorSet: build_prior interval(92-1000 up [11,7](11) acting 
> > [11,7](11)
> > maybe_went_rw)
> > 
> > 2017-07-10 15:42:52.608265 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > PriorSet: build_prior  prior osd.7 is down
> > 
> > 2017-07-10 15:42:52.608273 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > PriorSet: build_prior final: probe 11 down 7 blocked_by {}
> > 
> > 2017-07-10 15:42:52.608281 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] up_thru 94 < same_since 1673, must notify monitor
> > 
> > 2017-07-10 15:42:52.608294 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] exit Started/Primary/Peering/GetInfo 0.007847 0
> > 0.000000
> > 
> > 2017-07-10 15:42:52.608306 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] enter Started/Primary/Peering/GetLog
> > 
> > 2017-07-10 15:42:52.608318 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] calc_acting osd.11 1.20f( v 95'397 
> > (95'300,95'397]
> > local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > 
> > 2017-07-10 15:42:52.608338 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] calc_acting newest update on osd.11 with 1.20f( v
> > 95'397 (95'300,95'397]
> > local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > 
> > calc_acting primary is osd.11 with 1.20f( v 95'397 (95'300,95'397]
> > local-les=93 n=69 ec=75 les/c 93/93 1673/1673/1673)
> > 
> > 2017-07-10 15:42:52.608348 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] actingbackfill is 11
> > 
> > 2017-07-10 15:42:52.608357 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] choose_acting want [11] (== acting) 
> > backfill_targets
> > 
> > 2017-07-10 15:42:52.608367 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary/Peering/GetLog>: leaving GetLog
> > 
> > 2017-07-10 15:42:52.608380 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] exit Started/Primary/Peering/GetLog 0.000073 0
> > 0.000000
> > 
> > 2017-07-10 15:42:52.608391 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] enter Started/Primary/Peering/GetMissing
> > 
> > 2017-07-10 15:42:52.608400 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary/Peering/GetMissing>:  still need up_thru 
> > update before going active
> > 
> > 2017-07-10 15:42:52.608409 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] exit Started/Primary/Peering/GetMissing 0.000018 
> > 0
> > 0.000000
> > 
> > 2017-07-10 15:42:52.608417 7f8af91bb700  5 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] enter Started/Primary/Peering/WaitUpThru
> > 
> > 2017-07-10 15:42:52.608424 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 NullEvt
> > 
> > 2017-07-10 15:42:52.610190 7f8b06334700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] flushed
> > 
> > 2017-07-10 15:42:52.640813 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 
> > MNotifyRec from
> > 21 notify: (query_epoch:1673, epoch_sent:1673, info:1.20f( v 
> > 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 180/180
> > 1673/1673/1673))
> > features: 0x483ffffffffffff
> > 
> > 2017-07-10 15:42:52.640821 7f8af91bb700  7 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary>: handle_pg_notify from osd.21
> > 
> > 2017-07-10 15:42:52.640829 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 93/93
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]  got
> > osd.21 1.20f( v 1663'655 (1663'600,1663'655] local-les=180 n=69 
> > ec=75 les/c
> > 180/180 1673/1673/1673)
> > 
> > 2017-07-10 15:42:52.640840 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > osd.21 has stray content: 1.20f( v 1663'655 (1663'600,1663'655]
> > local-les=180 n=69 ec=75 les/c 180/180 1673/1673/1673)
> > 
> > 2017-07-10 15:42:52.640852 7f8af91bb700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] update_heartbeat_peers 11 -> 11,21
> > 
> > 2017-07-10 15:42:52.642542 7f8af89ba700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > handle_peering_event: epoch_sent: 1673 epoch_requested: 1673 
> > FlushedEvt
> > 
> > 2017-07-10 15:42:53.549732 7f8b031cf700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] null
> > 
> > 2017-07-10 15:42:53.556132 7f8af99bc700 10 osd.11 pg_epoch: 1673 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] handle_advance_map [11]/[11] -- 11/11
> > 
> > 2017-07-10 15:42:53.556142 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary/Peering>: Peering advmap
> > 
> > 2017-07-10 15:42:53.556149 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] adjust_need_up_thru now 1673, need_up_thru now 
> > false
> > 
> > 2017-07-10 15:42:53.556157 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started>: Started advmap
> > 
> > 2017-07-10 15:42:53.556164 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] check_recovery_sources no source osds () went 
> > down
> > 
> > 2017-07-10 15:42:53.556173 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] handle_activate_map
> > 
> > 2017-07-10 15:42:53.556179 7f8af99bc700  7 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary>: handle ActMap primary
> > 
> > 2017-07-10 15:42:53.556187 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] take_waiters
> > 
> > 2017-07-10 15:42:53.556194 7f8af99bc700  5 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] exit Started/Primary/Peering/WaitUpThru 0.947776 
> > 5
> > 0.008159
> > 
> > 2017-07-10 15:42:53.556203 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering]
> > state<Started/Primary/Peering>: Leaving Peering
> > 
> > 2017-07-10 15:42:53.556211 7f8af99bc700  5 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 peering] exit Started/Primary/Peering 0.955776 0 0.000000
> > 
> > 2017-07-10 15:42:53.556221 7f8af99bc700  5 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] enter Started/Primary/Active
> > 
> > 2017-07-10 15:42:53.556232 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=93 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive]
> > state<Started/Primary/Active>: In Active, about to call activate
> > 
> > 2017-07-10 15:42:53.556241 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] activate - snap_trimq []
> > 
> > 2017-07-10 15:42:53.556249 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] activate - no missing, moving last_complete 
> > 95'397
> > -> 95'397
> > 
> > 2017-07-10 15:42:53.556261 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] 1.20f primary missing_loc.add_active_missing 
> > peer
> > 11 missing: {}
> > 
> > 2017-07-10 15:42:53.556268 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0 inactive] needs_recovery is recovered
> > 
> > 2017-07-10 15:42:53.556275 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0
> > activating+undersized+degraded] state<Started/Primary/Active>: Activate Finished
> > 
> > 2017-07-10 15:42:53.556283 7f8af99bc700  5 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0
> > activating+undersized+degraded] enter Started/Primary/Active/Activating
> > 
> > 2017-07-10 15:42:53.556290 7f8af99bc700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0
> > activating+undersized+degraded] handle_peering_event: epoch_sent: 1674 epoch_requested: 1674 NullEvt
> > 
> > 2017-07-10 15:42:53.564228 7f8b06334700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0
> > activating+undersized+degraded] _activate_committed 1674 peer_activated now 11 last_epoch_started 180 same_interval_since 1673
> > 
> > 2017-07-10 15:42:53.564250 7f8b06334700 10 osd.11 pg_epoch: 1674 
> > pg[1.20f( v
> > 95'397 (95'300,95'397] local-les=1674 n=69 ec=75 les/c 180/180
> > 1673/1673/1673) [11] r=0 lpr=1673 pi=92-1672/3 crt=95'394 lcod 0'0 
> > mlcod 0'0
> > activating+undersized+degraded] all_activated_and_committed
> > 
> > osd 7 log:
> > 
> > 2017-07-10 16:08:29.812470 7f4fb425a700  7 osd.7 2123 handle_pg_info
> > pg_info(1 pgs e2123:1.20f(6)) v4 from osd.11
> > 
> > 2017-07-10 16:08:29.833669 7f4fac24a700 10 osd.7 pg_epoch: 2123 
> > pg[1.20f( v
> > 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 
> > 1898/1898
> > 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod
> > 0'0 inactive NOTIFY] handle_peering_event: epoch_sent: 2123 epoch_requested:
> > 2123 MInfoRec from 11 info: 1.20f( v 95'397 (95'300,95'397]
> > local-les=2123
> > n=69 ec=75 les/c 1898/1898 2122/2122/1673)
> > 
> > 2017-07-10 16:08:29.833684 7f4fac24a700 10 osd.7 pg_epoch: 2123 
> > pg[1.20f( v
> > 1663'655 (1663'600,1663'655] local-les=180 n=69 ec=75 les/c 
> > 1898/1898
> > 2122/2122/1673) [11,7] r=1 lpr=2122 pi=179-2121/15 crt=1663'652 lcod
> > 0'0 inactive NOTIFY] state<Started/Stray>: got info from osd.11 
> > 1.20f( v 95'397 (95'300,95'397] local-les=2123 n=69 ec=75 les/c 
> > 1898/1898
> > 2122/2122/1673)
> > 
> > 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
