Can't start ceph mon

2012-11-19 Thread Dave Humphreys (Datatone)

I have a problem in which I can't start my ceph monitor. The log is shown below.

The log shows version 0.54. I was running 0.52 when the problem arose, and I 
moved to the latest in case the newer version fixed the problem.

The original failure happened a week or so ago, and could have been as a result 
of running out of disk space when the ceph monitor log became huge.

What should I do to recover the situation?


David





2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 
(commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 21
2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
magic = 21 bytes
2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 75
2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
feature_set = 75 bytes
2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 205
2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
monmap/latest = 205 bytes
2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 
10.0.1.1:6789/0
2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr 
is 10.0.1.1:6789/0 need_addr=0
2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 
4d7d8d20-338c-4bdc-9918-9bcf04f9a832
2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 37
2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
cluster_uuid = 37 bytes
2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 
check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 75
2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
feature_set = 75 bytes
2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features 
compat={},rocompat={},incompat={1=initial feature set (~v.18)}
2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl 
joined
2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 
has_ever_joined = 1
2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int 
pgmap/last_committed = 13
2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int 
pgmap/first_committed = 132833
2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 239840
2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
pgmap/latest = 239840 bytes
--- begin dump of recent events ---2012-11-19 20:38:51.600509 7fc13fdc6780 -1 
*** Caught signal (Aborted) **
 in thread 7fc13fdc6780

 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
 1: ceph-mon() [0x53adf8]
 2: (()+0xfe90) [0x7fc141830e90]
 3: (gsignal()+0x3e) [0x7fc140016dae]
 4: (abort()+0x17b) [0x7fc14001825b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
 6: (()+0xb31b6) [0x7fc141af11b6]
 7: (()+0xb31e3) [0x7fc141af11e3]
 8: (()+0xb32de) [0x7fc141af12de]
 9: ceph-mon() [0x5ecb9f]
 10: (Paxos::get_stashed(ceph::buffer::list)+0x1ed) [0x49e28d]
 11: (Paxos::init()+0x109) [0x49e609]
 12: (Monitor::init()+0x36a) [0x485a4a]
 13: (main()+0x1289) [0x46d909]
 14: (__libc_start_main()+0xed) [0x7fc14000364d]
 15: ceph-mon() [0x46fa09]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

   -55 2012-11-19 20:38:51.596694 7fc13fdc6780  5 asok(0x213d000) 
register_command perfcounters_dump hook 0x2131050
   -55 2012-11-19 20:38:51.596720 7fc13fdc6780  5 asok(0x213d000) 
register_command 1 hook 0x2131050
   -54 2012-11-19 20:38:51.596725 7fc13fdc6780  5 asok(0x213d000) 
register_command perf dump hook 0x2131050
   -53 2012-11-19 20:38:51.596735 7fc13fdc6780  5 asok(0x213d000) 
register_command perfcounters_schema hook 0x2131050
   -52 2012-11-19 20:38:51.596740 7fc13fdc6780  5 asok(0x213d000) 
register_command 2 hook 0x2131050
   -51 2012-11-19 20:38:51.596745 7fc13fdc6780  5 asok(0x213d000) 
register_command perf schema hook 0x2131050
   -50 2012-11-19 

Cannot Start Ceph Mon

2012-11-19 Thread Dave Humphreys (Datatone)
(Apologies if this is seen to be a repeat posting: I think that the last 
attempt fell into the void).

I can't start my ceph monitor. The log is below.

Though this shows version 0.54, the problem arose whilst using 0.52. Something 
may have become corrupted when the disk space ran out due to an immense ceph 
mon log.

Is there anything I can do to recover the situation?

Regards,
David


bash-4.1# cat /var/log/ceph/mon.vault01.log 
2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 
(commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 21
2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
magic = 21 bytes
2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 75
2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
feature_set = 75 bytes
2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 205
2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
monmap/latest = 205 bytes
2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 
10.0.1.1:6789/0
2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr 
is 10.0.1.1:6789/0 need_addr=0
2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 
4d7d8d20-338c-4bdc-9918-9bcf04f9a832
2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 37
2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
cluster_uuid = 37 bytes
2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 
check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 75
2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
feature_set = 75 bytes
2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features 
compat={},rocompat={},incompat={1=initial feature set (~v.18)}
2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl 
joined
2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 
has_ever_joined = 1
2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int 
pgmap/last_committed = 13
2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int 
pgmap/first_committed = 132833
2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at 
off 0 of 239840
2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0  :/0 
pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl 
pgmap/latest = 239840 bytes
--- begin dump of recent events ---2012-11-19 20:38:51.600509 7fc13fdc6780 -1 
*** Caught signal (Aborted) **
 in thread 7fc13fdc6780

 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
 1: ceph-mon() [0x53adf8]
 2: (()+0xfe90) [0x7fc141830e90]
 3: (gsignal()+0x3e) [0x7fc140016dae]
 4: (abort()+0x17b) [0x7fc14001825b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
 6: (()+0xb31b6) [0x7fc141af11b6]
 7: (()+0xb31e3) [0x7fc141af11e3]
 8: (()+0xb32de) [0x7fc141af12de]
 9: ceph-mon() [0x5ecb9f]
 10: (Paxos::get_stashed(ceph::buffer::list)+0x1ed) [0x49e28d]
 11: (Paxos::init()+0x109) [0x49e609]
 12: (Monitor::init()+0x36a) [0x485a4a]
 13: (main()+0x1289) [0x46d909]
 14: (__libc_start_main()+0xed) [0x7fc14000364d]
 15: ceph-mon() [0x46fa09]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

   -55 2012-11-19 20:38:51.596694 7fc13fdc6780  5 asok(0x213d000) 
register_command perfcounters_dump hook 0x2131050
   -55 2012-11-19 20:38:51.596720 7fc13fdc6780  5 asok(0x213d000) 
register_command 1 hook 0x2131050
   -54 2012-11-19 20:38:51.596725 7fc13fdc6780  5 asok(0x213d000) 
register_command perf dump hook 0x2131050
   -53 2012-11-19 20:38:51.596735 7fc13fdc6780  5 asok(0x213d000) 
register_command perfcounters_schema hook 0x2131050
   -52 2012-11-19 20:38:51.596740 7fc13fdc6780  5 asok(0x213d000) 
register_command 2 hook 0x2131050
   -51 2012-11-19 20:38:51.596745 7fc13fdc6780  5 asok(0x213d000) 
register_command perf schema hook 0x2131050
   -50