I was wrong in my previous analysis: the iterator was not being reset.
The problem I can see now is that during the sync, a new round of
election kicked off and therefore needed to probe the newly added
monitor. However, since that monitor had not been synced yet, it
restarted the sync from there.
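
To make the restart behavior described above concrete, here is a toy
model in Python. It is not Ceph code: the function name, the parameters,
and the assumption that an election discards all progress are made up
for illustration only. It shows why a sync that is interrupted more
often than it can complete never finishes.

```python
def simulate_sync(total_chunks, election_every=None):
    """Toy model of a store sync that elections can interrupt.

    total_chunks   -- chunks needed for one complete sync (hypothetical)
    election_every -- an election interrupts the sync every N steps;
                      None means no election ever happens

    Returns the step at which the sync completed, or None if it never
    completed within the step budget.
    """
    max_steps = total_chunks * 10  # budget so the loop always terminates
    received = 0                   # chunks received in the current attempt
    for step in range(1, max_steps + 1):
        received += 1
        if received == total_chunks:
            return step            # the sync finished
        if election_every is not None and step % election_every == 0:
            # Assumption: the election probes the not-yet-synced monitor
            # and the sync starts over from the beginning.
            received = 0
    return None


if __name__ == "__main__":
    print(simulate_sync(5))                     # no elections
    print(simulate_sync(5, election_every=3))   # elections faster than sync
    print(simulate_sync(5, election_every=10))  # elections slower than sync
```

In this model the sync only completes when the gap between elections
is at least as long as one full sync, which is why I am asking about
freezing the election below.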

Hi Sage and Joao,
Is there a tunable to freeze elections so that the sync can finish?

Thanks,
Guang

On Fri, Nov 13, 2015 at 9:00 AM, Guang Yang <guan...@gmail.com> wrote:
> Hi Joao,
> We have a problem when trying to add new monitors to an unhealthy
> cluster, and I would like to ask for your suggestions.
>
> After adding the new monitor, it started syncing the store and went
> into an infinite loop:
>
> 2015-11-12 21:02:23.499510 7f1e8030e700 10
> mon.mon04c011@2(synchronizing) e5 handle_sync_chunk mon_sync(chunk
> cookie 4513071120 lc 14697737 bl 929616 bytes last_key
> osdmap,full_22530) v2
> 2015-11-12 21:02:23.712944 7f1e8030e700 10
> mon.mon04c011@2(synchronizing) e5 handle_sync_chunk mon_sync(chunk
> cookie 4513071120 lc 14697737 bl 799897 bytes last_key
> osdmap,full_3259) v2
>
>
> We talked earlier this morning on IRC, and at the time I thought it
> was because the osdmap epoch was increasing, which led to this
> infinite loop.
>
> I then set the nobackfill/norecover flags and the osdmap epoch
> froze; however, the problem is still there.
>
> While the osdmap epoch is 22531, the switch always happened at
> osdmap.full_22530 (as shown by the log above).
>
> Looking at the code on both sides, it looks like this check
> (https://github.com/ceph/ceph/blob/master/src/mon/Monitor.cc#L1389)
> is always true, and I can confirm from the log that (sp.last_committed <
> paxos->get_version()) was false, so chances are that
> sp.synchronizer always has a next chunk?
>
> Does this look familiar to you? Or is there any other troubleshooting
> I can try? Thanks very much.
>
> Thanks,
> Guang
