On Tue, Mar 11, 2014 at 9:15 AM, Joao Eduardo Luis <joao.l...@inktank.com> wrote:

> On 03/10/2014 10:30 PM, Pawel Veselov wrote:
>
>>
>> Now, I'm getting this. May be any idea what can be done to straighten
>> this up?
>>
>
> This is weird.  Can you please share the steps taken until this was
> triggered, as well as the rest of the log?
>

At this point, no, sorry.

This whole thing started with migrating from 0.56.7 to 0.72.2. First, we
started seeing failed assertions of (version == pg_map.version) in
PGMonitor.cc:273, but only on one monitor (d). I attempted to resync the
failing monitor (d) from (c) with --force-sync. (d) started to work, but
(c) started to fail with the same (version == pg_map.version) assertion.
So I tried resyncing (c) from (d) with --force-sync. That's when (c)
started to fail with this particular (ret == 0) assertion. I don't think
the resync actually worked at that point.

I didn't find a way to fix this quickly enough, so I restored the mon
directories from backup and started again. The (version ==
pg_map.version) assertion came back, since my backup was taken before I
tried the forced resync, but not before the migration started (it was
stupid of me not to have backed up before the migration). That's the
point when I tried all kinds of crazy stuff for a while.

After some poking around, what I ended up doing was simply removing the
'store.db' directory from the monitor filesystem and starting the
monitors. That just re-initiated the migration, and this time it was done
in the absence of client requests, and one monitor at a time.
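
For anyone repeating this, here is a rough sketch of a safer version of the
"remove store.db" step: copy the whole monitor directory first, then move
store.db aside instead of deleting it. The function name, the backup
location and the directory layout are my own assumptions for illustration,
not anything Ceph provides, and the monitor must already be stopped before
running it.

```python
import shutil
import time
from pathlib import Path


def set_aside_store_db(mon_data_dir, backup_root):
    """Back up a stopped monitor's data directory, then move store.db aside.

    Hypothetical helper: both paths are supplied by the caller, and
    backup_root must live outside mon_data_dir.
    Returns (backup_path, moved_store_path).
    """
    mon_dir = Path(mon_data_dir)
    store = mon_dir / "store.db"
    if not store.is_dir():
        raise FileNotFoundError(f"no store.db under {mon_dir}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = Path(backup_root) / f"{mon_dir.name}-{stamp}"
    # Full copy first, so the pre-change state can always be restored.
    shutil.copytree(mon_dir, backup, symlinks=True)

    # Rename rather than delete; the old store stays around for inspection.
    moved = mon_dir / f"store.db.old-{stamp}"
    store.rename(moved)
    return backup, moved
```

It would be called once per monitor, one monitor at a time, with something
like set_aside_store_db("/var/lib/ceph/mon/ceph-d", "/root/mon-backups")
(both paths are examples, adjust to your own layout), and the monitor
started again afterwards.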



>>       0> 2014-03-10 22:26:23.757166 7fc0397e5700 -1 mon/AuthMonitor.cc:
>> In function 'virtual void AuthMonitor::create_initial()' thread
>> 7fc0397e5700 time 2014-03-10 22:26:23.755442
>> mon/AuthMonitor.cc: 101: FAILED assert(ret == 0)
>>
>>   ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>   1: (AuthMonitor::create_initial()+0x4d8) [0x637bb8]
>>   2: (PaxosService::_active()+0x51b) [0x594fcb]
>>   3: (Context::complete(int)+0x9) [0x565499]
>>   4: (finish_contexts(CephContext*, std::list<Context*,
>> std::allocator<Context*> >&, int)+0x95) [0x5698b5]
>>   5: (Paxos::handle_accept(MMonPaxos*)+0x885) [0x589595]
>>   6: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x58d66b]
>>   7: (Monitor::dispatch(MonSession*, Message*, bool)+0x4f0) [0x563620]
>>   8: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
>>   9: (Monitor::ms_dispatch(Message*)+0x32) [0x57f212]
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
