Can't start ceph mon
I have a problem in which I can't start my ceph monitor. The log is shown below. The log shows version 0.54. I was running 0.52 when the problem arose, and I moved to the latest in case the newer version fixed the problem. The original failure happened a week or so ago, and could have been as a result of running out of disk space when the ceph monitor log became huge. What should I do to recover the situation? David 2012-11-19 20:38:51.598468 7fc13fdc6780 0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012 2012-11-19 20:38:51.598482 7fc13fdc6780 1 store(/ceph/mon.vault01) mount 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes 2012-11-19 20:38:51.598809 7fc13fdc6780 1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0 2012-11-19 20:38:51.598818 7fc13fdc6780 1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0 2012-11-19 20:38:51.599498 7fc13fdc6780 1 -- 10.0.1.1:6789/0 messenger.start 2012-11-19 20:38:51.599508 7fc13fdc6780 1 accepter.accepter.start 2012-11-19 20:38:51.599610 7fc13fdc6780 1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832 2012-11-19 20:38:51.599674 7fc13cdbe700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14 2012-11-19 20:38:51.599678 7fc141eff700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes 2012-11-19 20:38:51.599718 7fc13ccbd700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832' 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)} 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 13 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840 2012-11-19 20:38:51.599928 7fc13cbbc700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes --- begin dump of recent events ---2012-11-19 20:38:51.600509 7fc13fdc6780 -1 *** Caught signal (Aborted) ** in thread 7fc13fdc6780 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150) 1: ceph-mon() [0x53adf8] 2: (()+0xfe90) [0x7fc141830e90] 3: (gsignal()+0x3e) [0x7fc140016dae] 4: (abort()+0x17b) [0x7fc14001825b] 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d] 6: (()+0xb31b6) [0x7fc141af11b6] 7: (()+0xb31e3) [0x7fc141af11e3] 8: (()+0xb32de) [0x7fc141af12de] 9: ceph-mon() [0x5ecb9f] 10: (Paxos::get_stashed(ceph::buffer::list)+0x1ed) [0x49e28d] 11: (Paxos::init()+0x109) [0x49e609] 12: (Monitor::init()+0x36a) [0x485a4a] 13: (main()+0x1289) [0x46d909] 14: (__libc_start_main()+0xed) [0x7fc14000364d] 15: ceph-mon() [0x46fa09] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. -55 2012-11-19 20:38:51.596694 7fc13fdc6780 5 asok(0x213d000) register_command perfcounters_dump hook 0x2131050 -55 2012-11-19 20:38:51.596720 7fc13fdc6780 5 asok(0x213d000) register_command 1 hook 0x2131050 -54 2012-11-19 20:38:51.596725 7fc13fdc6780 5 asok(0x213d000) register_command perf dump hook 0x2131050 -53 2012-11-19 20:38:51.596735 7fc13fdc6780 5 asok(0x213d000) register_command perfcounters_schema hook 0x2131050 -52 2012-11-19 20:38:51.596740 7fc13fdc6780 5 asok(0x213d000) register_command 2 hook 0x2131050 -51 2012-11-19 20:38:51.596745 7fc13fdc6780 5 asok(0x213d000) register_command perf schema hook 0x2131050 -50 2012-11-19
Cannot Start Ceph Mon
(Apologies if this is seen to be a repeat posting: I think that the last attempt fell into the void). I can't start my ceph monitor. The log is below. Though this shows version 0.54, the problem arose whilst using 0.52. Something may have become corrupted when the disk space ran out due to an immense ceph mon log. Is there anything I can do to recover the situation? Regards, David bash-4.1# cat /var/log/ceph/mon.vault01.log 2012-11-19 20:38:51.598468 7fc13fdc6780 0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012 2012-11-19 20:38:51.598482 7fc13fdc6780 1 store(/ceph/mon.vault01) mount 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes 2012-11-19 20:38:51.598809 7fc13fdc6780 1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0 2012-11-19 20:38:51.598818 7fc13fdc6780 1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0 2012-11-19 20:38:51.599498 7fc13fdc6780 1 -- 10.0.1.1:6789/0 messenger.start 2012-11-19 20:38:51.599508 7fc13fdc6780 1 accepter.accepter.start 2012-11-19 20:38:51.599610 7fc13fdc6780 1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832 2012-11-19 20:38:51.599674 7fc13cdbe700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14 2012-11-19 20:38:51.599678 7fc141eff700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes 2012-11-19 20:38:51.599718 7fc13ccbd700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832' 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)} 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 13 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840 2012-11-19 20:38:51.599928 7fc13cbbc700 1 -- 10.0.1.1:6789/0 :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes --- begin dump of recent events ---2012-11-19 20:38:51.600509 7fc13fdc6780 -1 *** Caught signal (Aborted) ** in thread 7fc13fdc6780 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150) 1: ceph-mon() [0x53adf8] 2: (()+0xfe90) [0x7fc141830e90] 3: (gsignal()+0x3e) [0x7fc140016dae] 4: (abort()+0x17b) [0x7fc14001825b] 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d] 6: (()+0xb31b6) [0x7fc141af11b6] 7: (()+0xb31e3) [0x7fc141af11e3] 8: (()+0xb32de) [0x7fc141af12de] 9: ceph-mon() [0x5ecb9f] 10: (Paxos::get_stashed(ceph::buffer::list)+0x1ed) [0x49e28d] 11: (Paxos::init()+0x109) [0x49e609] 12: (Monitor::init()+0x36a) [0x485a4a] 13: (main()+0x1289) [0x46d909] 14: (__libc_start_main()+0xed) [0x7fc14000364d] 15: ceph-mon() [0x46fa09] NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this. -55 2012-11-19 20:38:51.596694 7fc13fdc6780 5 asok(0x213d000) register_command perfcounters_dump hook 0x2131050 -55 2012-11-19 20:38:51.596720 7fc13fdc6780 5 asok(0x213d000) register_command 1 hook 0x2131050 -54 2012-11-19 20:38:51.596725 7fc13fdc6780 5 asok(0x213d000) register_command perf dump hook 0x2131050 -53 2012-11-19 20:38:51.596735 7fc13fdc6780 5 asok(0x213d000) register_command perfcounters_schema hook 0x2131050 -52 2012-11-19 20:38:51.596740 7fc13fdc6780 5 asok(0x213d000) register_command 2 hook 0x2131050 -51 2012-11-19 20:38:51.596745 7fc13fdc6780 5 asok(0x213d000) register_command perf schema hook 0x2131050 -50