On Tue, Mar 11, 2014 at 9:15 AM, Joao Eduardo Luis <joao.l...@inktank.com> wrote:
> On 03/10/2014 10:30 PM, Pawel Veselov wrote:
>
>> Now, I'm getting this. May be any idea what can be done to straighten
>> this up?
>
> This is weird. Can you please share the steps taken until this was
> triggered, as well as the rest of the log?

At this point, no, sorry.

This whole thing started with migrating from 0.56.7 to 0.72.2. First, we
started seeing failed assertions of (version == pg_map.version) in
PGMonitor.cc:273, but on one monitor (d) only. I attempted to resync the
failing monitor (d) with --force-sync from (c). (d) started to work, but
(c) started to fail with the (version == pg_map.version) assertion. So I
tried re-syncing (c) from (d) with --force-resync. That's when (c)
started to fail with this particular (ret == 0) assertion. I don't think
the resync actually worked at that point.

I didn't find a way to fix this quickly enough, so I restored the mon
directories from backup and started again. The (version ==
pg_map.version) assertion came back: my backup was taken before I tried
the force-resync, but after the migration had started (it was stupid of
me not to back up before the migration). That's the point when I tried
all kinds of crazy stuff for a while.

After some poking around, what I ended up doing was simply removing the
'store.db' directory from the monitor fs and starting the monitors. That
re-initiated the migration, and this time it was done in the absence of
client requests, and one monitor at a time.
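
For anyone who ends up in the same spot, the "back up first, then take
store.db out of the way" step can be sketched roughly as below. This is
an illustration only, not the commands that were actually run: the
function name set_aside_store, the backup location, and the directory
names in the demo are all made up, and it renames store.db to
store.db.old instead of deleting it so the step stays reversible. It
assumes the monitor is stopped while it runs.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_store(mon_dir: Path, backup_root: Path) -> Path:
    """Copy the whole monitor directory to backup_root, then move store.db aside.

    Hypothetical helper: returns the path of the backup copy.
    Meant to be run only while the monitor daemon is stopped.
    """
    store = mon_dir / "store.db"
    if not store.is_dir():
        raise FileNotFoundError(f"no store.db under {mon_dir}")

    backup = backup_root / f"{mon_dir.name}.bak"
    # copytree raises if the backup already exists, so an older
    # backup is never silently overwritten.
    shutil.copytree(mon_dir, backup)

    # Rename rather than delete (an assumption of this sketch), so the
    # original store can still be put back.
    store.rename(mon_dir / "store.db.old")
    return backup


if __name__ == "__main__":
    # Demo on a throwaway directory; nothing here touches a real monitor.
    with tempfile.TemporaryDirectory() as tmp:
        mon = Path(tmp) / "mon-c"
        (mon / "store.db").mkdir(parents=True)
        (mon / "store.db" / "CURRENT").write_text("dummy")
        saved = set_aside_store(mon, Path(tmp) / "backups")
        print("backup at", saved, "store.db present:", (mon / "store.db").exists())
```

Doing this for one monitor at a time, and taking the backup before any
migration or resync attempt, matches the order that eventually worked
above.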
>>      0> 2014-03-10 22:26:23.757166 7fc0397e5700 -1 mon/AuthMonitor.cc:
>> In function 'virtual void AuthMonitor::create_initial()' thread
>> 7fc0397e5700 time 2014-03-10 22:26:23.755442
>> mon/AuthMonitor.cc: 101: FAILED assert(ret == 0)
>>
>>   ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>   1: (AuthMonitor::create_initial()+0x4d8) [0x637bb8]
>>   2: (PaxosService::_active()+0x51b) [0x594fcb]
>>   3: (Context::complete(int)+0x9) [0x565499]
>>   4: (finish_contexts(CephContext*, std::list<Context*,
>> std::allocator<Context*> >&, int)+0x95) [0x5698b5]
>>   5: (Paxos::handle_accept(MMonPaxos*)+0x885) [0x589595]
>>   6: (Paxos::dispatch(PaxosServiceMessage*)+0x28b) [0x58d66b]
>>   7: (Monitor::dispatch(MonSession*, Message*, bool)+0x4f0) [0x563620]
>>   8: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
>>   9: (Monitor::ms_dispatch(Message*)+0x32) [0x57f212]
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com