Re: [ceph-users] Help! All ceph mons crashed.

2014-03-06 Thread YIP Wai Peng
OK, I think I got bitten by http://tracker.ceph.com/issues/7210, or rather by
the cppool command described in
http://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/

I did use "rados cppool  " on a pool with snapshots
(OpenStack Glance). A user reported that Ceph crashed when he deleted an
image in OpenStack.
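To illustrate how deleting an image could trip the assert in the log, here is a small, hypothetical Python model. It is not Ceph code: `PoolModel`, `copy_objects_only` and the `unmanaged` flag are made-up names. It only mirrors the rule checked by `pg_pool_t::remove_unmanaged_snap()`: a pool uses either pool snapshots or self-managed ("unmanaged") snapshots, such as the ones RBD images use. One plausible reading, assumed here rather than confirmed, is that the copied pool did not carry over the state marking it as being in unmanaged-snaps mode, so a later snapshot deletion hit a pool in the wrong mode.

```python
from dataclasses import dataclass, field


@dataclass
class PoolModel:
    """Hypothetical, simplified stand-in for pg_pool_t snapshot state.

    Not Ceph code: it only models the rule that a pool uses either pool
    snapshots or self-managed ("unmanaged") snapshots, never both.
    """
    snap_seq: int = 0
    snaps: dict = field(default_factory=dict)        # pool snaps: id -> name
    removed_snaps: set = field(default_factory=set)  # removed unmanaged snap ids
    unmanaged: bool = False                          # assumed explicit mode flag

    def is_pool_snaps_mode(self) -> bool:
        return bool(self.snaps)

    def is_unmanaged_snaps_mode(self) -> bool:
        return self.unmanaged

    def add_unmanaged_snap(self) -> int:
        # A pool already using pool snapshots may not switch modes.
        assert not self.is_pool_snaps_mode()
        self.unmanaged = True
        self.snap_seq += 1
        return self.snap_seq

    def remove_unmanaged_snap(self, snap_id: int) -> None:
        # Same invariant as the failed assert in mon.a.log.
        assert self.is_unmanaged_snaps_mode(), "FAILED assert(is_unmanaged_snaps_mode())"
        self.removed_snaps.add(snap_id)


def copy_objects_only(src: PoolModel) -> PoolModel:
    """Assumption: a copy that moves objects but not snapshot bookkeeping."""
    return PoolModel()


glance = PoolModel()
snap = glance.add_unmanaged_snap()   # e.g. a snapshot taken on an RBD image
copied = copy_objects_only(glance)

print(glance.is_unmanaged_snaps_mode())  # True
print(copied.is_unmanaged_snaps_mode())  # False
try:
    copied.remove_unmanaged_snap(snap)   # what deleting the image would request
except AssertionError as e:
    print("would abort:", e)
```

In the monitor the same failed check is a C++ assert, which aborts the whole ceph-mon process rather than raising a catchable error.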

I'm now wondering whether I can skip the operation, or ignore the OpenStack
Glance pool, and get the mons to start up again. Any help would be greatly
appreciated!
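On the question of ignoring the operation: the backtrace in mon.a.log shows the forwarded pool_op (delete unmanaged snap) reaching an assert inside OSDMonitor::prepare_pool_op(), which aborts the monitor instead of answering the client. A minimal Python sketch of the alternative, validating the request and replying with an error, might look like the following. `Pool` and `handle_delete_unmanaged_snap` are hypothetical names; this is not the actual Ceph code or fix.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical pool state; only the snapshot mode matters here."""
    unmanaged_snaps_mode: bool
    removed_snaps: set = field(default_factory=set)


def handle_delete_unmanaged_snap(pool: Pool, snap_id: int) -> int:
    """Validate a 'delete unmanaged snap' request before applying it.

    Instead of asserting (which kills the process handling the request),
    reject a request that does not match the pool's snapshot mode and
    return a negative errno to the caller.
    """
    if not pool.unmanaged_snaps_mode:
        return -errno.EINVAL
    pool.removed_snaps.add(snap_id)
    return 0


print(handle_delete_unmanaged_snap(Pool(unmanaged_snaps_mode=False), 4))  # -22 on Linux
print(handle_delete_unmanaged_snap(Pool(unmanaged_snaps_mode=True), 4))   # 0
```

The design point is only that an inconsistent client request becomes an error reply rather than a monitor crash; whether and how ceph-mon should do this is for the developers to decide.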

- WP


On Thu, Mar 6, 2014 at 5:33 PM, YIP Wai Peng  wrote:

> Hi,
>
> I am currently facing a horrible situation. All my mons are crashing on
> startup.
>
> Here's a dump of mon.a.log. The last few ops are below. It seems to crash
> trying to remove a snap? Any ideas?
>
> - WP
>
> 
> -10> 2014-03-06 17:04:38.838490 7fb2a541a700  1 --
> 192.168.116.24:6789/0 --> osd.9 192.168.116.27:6955/11604 --
> mon_subscribe_ack(300s) v1 -- ?+0 0x32d0540
> -9> 2014-03-06 17:04:38.838511 7fb2a541a700  1 --
> 192.168.116.24:6789/0 <== osd.1 192.168.116.24:6812/30221 6 
> mon_subscribe({monmap=14+,osd_pg_creates=0}) v2  50+0+0 (3009623588 0
> 0) 0x32d71c0 con 0x3224200
> -8> 2014-03-06 17:04:38.838527 7fb2a541a700  1 --
> 192.168.116.24:6789/0 --> osd.1 192.168.116.24:6812/30221 --
> mon_subscribe_ack(300s) v1 -- ?+0 0x32d3640
> -7> 2014-03-06 17:04:38.838545 7fb2a541a700  1 --
> 192.168.116.24:6789/0 <== mon.2 192.168.116.26:6789/0 1868286886 
> forward(pg_stats(0 pgs tid 790 v 0) v1 caps allow rwx) to leader v1 
> 398+0+0 (2470115819 0 0) 0x32f8c80 con 0x2f91760
> -6> 2014-03-06 17:04:38.838570 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
> active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838571
> lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
> -5> 2014-03-06 17:04:38.838595 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
> active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838597
> lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
> -4> 2014-03-06 17:04:38.838626 7fb2a541a700  1 --
> 192.168.116.24:6789/0 <== mon.2 192.168.116.26:6789/0 1868286887 
> forward(osd_pgtemp(e26089 {6.fe=[]} v26089) v1 caps allow rwx) to leader v1
>  297+0+0 (2442013554 0 0) 0x32f8780 con 0x2f91760
> -3> 2014-03-06 17:04:38.838662 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
> active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838665
> lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
> -2> 2014-03-06 17:04:38.838696 7fb2a541a700  1 --
> 192.168.116.24:6789/0 <== mon.2 192.168.116.26:6789/0 1868286888 
> forward(pool_op(delete unmanaged snap pool 10 auid 0 tid 27 name  v0) v4
> caps allow r) to leader v1  313+0+0 (3176715156 0 0) 0x32f8500 con
> 0x2f91760
> -1> 2014-03-06 17:04:38.838715 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
> active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838717
> lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
>  0> 2014-03-06 17:04:38.840833 7fb2a541a700 -1 osd/osd_types.cc: In
> function 'void pg_pool_t::remove_unmanaged_snap(snapid_t)' thread
> 7fb2a541a700 time 2014-03-06 17:04:38.838745
> osd/osd_types.cc: 799: FAILED assert(is_unmanaged_snaps_mode())
>
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>  1: /usr/bin/ceph-mon() [0x6c96e9]
>  2: (OSDMonitor::prepare_pool_op(MPoolOp*)+0x970) [0x5c3ad0]
>  3: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x1ab) [0x5c3d8b]
>  4: (PaxosService::dispatch(PaxosServiceMessage*)+0xa1a) [0x5940ea]
>  5: (Monitor::dispatch(MonSession*, Message*, bool)+0xdb) [0x56320b]
>  6: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
>  7: (Monitor::handle_forward(MForward*)+0x9c2) [0x565092]
>  8: (Monitor::dispatch(MonSession*, Message*, bool)+0x400) [0x563530]
>  9: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
>  10: (Monitor::ms_dispatch(Message*)+0x32) [0x57f212]
>  11: (DispatchQueue::entry()+0x582) [0x7de6c2]
>  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7d994d]
>  13: (()+0x79d1) [0x7fb2ac05e9d1]
>  14: (clone()+0x6d) [0x7fb2aad95b6d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
> 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help! All ceph mons crashed.

2014-03-06 Thread YIP Wai Peng
Hi,

I am currently facing a horrible situation. All my mons are crashing on
startup.

Here's a dump of mon.a.log. The last few ops are below. It seems to crash
trying to remove a snap? Any ideas?

- WP


   -10> 2014-03-06 17:04:38.838490 7fb2a541a700  1 -- 192.168.116.24:6789/0 -->
osd.9 192.168.116.27:6955/11604 -- mon_subscribe_ack(300s) v1 -- ?+0
0x32d0540
-9> 2014-03-06 17:04:38.838511 7fb2a541a700  1 -- 192.168.116.24:6789/0 <==
osd.1 192.168.116.24:6812/30221 6 
mon_subscribe({monmap=14+,osd_pg_creates=0}) v2  50+0+0 (3009623588 0
0) 0x32d71c0 con 0x3224200
-8> 2014-03-06 17:04:38.838527 7fb2a541a700  1 -- 192.168.116.24:6789/0 -->
osd.1 192.168.116.24:6812/30221 -- mon_subscribe_ack(300s) v1 -- ?+0
0x32d3640
-7> 2014-03-06 17:04:38.838545 7fb2a541a700  1 -- 192.168.116.24:6789/0 <==
mon.2 192.168.116.26:6789/0 1868286886  forward(pg_stats(0 pgs tid 790
v 0) v1 caps allow rwx) to leader v1  398+0+0 (2470115819 0 0)
0x32f8c80 con 0x2f91760
-6> 2014-03-06 17:04:38.838570 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838571
lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
-5> 2014-03-06 17:04:38.838595 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838597
lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
-4> 2014-03-06 17:04:38.838626 7fb2a541a700  1 -- 192.168.116.24:6789/0 <==
mon.2 192.168.116.26:6789/0 1868286887  forward(osd_pgtemp(e26089
{6.fe=[]} v26089) v1 caps allow rwx) to leader v1  297+0+0 (2442013554
0 0) 0x32f8780 con 0x2f91760
-3> 2014-03-06 17:04:38.838662 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838665
lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
-2> 2014-03-06 17:04:38.838696 7fb2a541a700  1 -- 192.168.116.24:6789/0 <==
mon.2 192.168.116.26:6789/0 1868286888  forward(pool_op(delete
unmanaged snap pool 10 auid 0 tid 27 name  v0) v4 caps allow r) to leader
v1  313+0+0 (3176715156 0 0) 0x32f8500 con 0x2f91760
-1> 2014-03-06 17:04:38.838715 7fb2a541a700  1 mon.a@0(leader).paxos(paxos
active c 7361285..7361906) is_readable now=2014-03-06 17:04:38.838717
lease_expire=2014-03-06 17:04:43.820707 has v0 lc 7361906
 0> 2014-03-06 17:04:38.840833 7fb2a541a700 -1 osd/osd_types.cc: In
function 'void pg_pool_t::remove_unmanaged_snap(snapid_t)' thread
7fb2a541a700 time 2014-03-06 17:04:38.838745
osd/osd_types.cc: 799: FAILED assert(is_unmanaged_snaps_mode())

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-mon() [0x6c96e9]
 2: (OSDMonitor::prepare_pool_op(MPoolOp*)+0x970) [0x5c3ad0]
 3: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x1ab) [0x5c3d8b]
 4: (PaxosService::dispatch(PaxosServiceMessage*)+0xa1a) [0x5940ea]
 5: (Monitor::dispatch(MonSession*, Message*, bool)+0xdb) [0x56320b]
 6: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
 7: (Monitor::handle_forward(MForward*)+0x9c2) [0x565092]
 8: (Monitor::dispatch(MonSession*, Message*, bool)+0x400) [0x563530]
 9: (Monitor::_ms_dispatch(Message*)+0x1fb) [0x5639fb]
 10: (Monitor::ms_dispatch(Message*)+0x32) [0x57f212]
 11: (DispatchQueue::entry()+0x582) [0x7de6c2]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7d994d]
 13: (()+0x79d1) [0x7fb2ac05e9d1]
 14: (clone()+0x6d) [0x7fb2aad95b6d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
