Re: [ceph-users] Reply: One of three monitors can not be started

张皓宇 Wed, 01 Apr 2015 22:58:12 -0700

I checked the cluster state, and it has recovered to HEALTH_OK. I don't know why.


Yesterday at 09:02 I started mon.computer06, but it could not start; the
log is in attachment 0902.

At 16:38 I started mon.computer06 again, and it was also stuck, with these
processes:
/usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
/usr/sbin/ceph-create-keys -i computer06

But this morning it was simply OK. The log is in attachment 1638. Can anyone
explain that?




To: g...@gregs42.com
From: zhanghaoyu1...@hotmail.com
Subject: Reply: [ceph-users] One of three monitors can not be started
Date: Thu, 2 Apr 2015 07:53:19 +0800







It does not respond.



From: Gregory Farnum
Sent: 2015/4/2 1:01
To: 张皓宇
Subject: Re: [ceph-users] One of three monitors can not be started





On Tue, Mar 31, 2015 at 10:25 PM, 张皓宇 <zhanghaoyu1...@hotmail.com> wrote:

> There is an asok on computer06.
> I tried to start mon.computer06, but about two hours later
> mon.computer06 still had not started.
> There are some different processes on computer06, and I don't know how to
> handle them:
> root      7812     1  0 11:39 pts/4    00:00:00 python /usr/sbin/ceph-create-keys -i computer06

That's a thing that runs on every monitor invocation to make sure
necessary keys are in place; it's just stuck because the monitor isn't
working.

> root     11025     1 12 09:02 pts/4    00:32:13 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

That's the monitor.



> root     35692  7812  0 12:59 pts/4    00:00:00 python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status

This is an attempt of yours to invoke mon_status on the admin socket.
So you're saying the admin socket is there but it's not responding to
queries?
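
One way to check whether the admin socket answers at all, without the query hanging indefinitely, is to run the same mon_status command under a timeout. A minimal Python sketch: the helper names and the 10-second timeout are illustrative assumptions, not part of Ceph, and the socket path is the one from the process listing above.

```python
import subprocess


def admin_socket_cmd(asok_path, command="mon_status"):
    """Build the argv for querying a Ceph admin socket (hypothetical helper)."""
    return ["ceph", "--admin-daemon", asok_path, command]


def query_admin_socket(asok_path, command="mon_status", timeout=10):
    """Return the command's stdout, or None if it times out or fails.

    The timeout value is an assumption; pick one that suits your cluster.
    """
    try:
        result = subprocess.run(
            admin_socket_cmd(asok_path, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out (socket exists but nothing answers) or ceph CLI missing.
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    out = query_admin_socket("/var/run/ceph/ceph-mon.computer06.asok")
    print("no response from admin socket" if out is None else out)
```

A None result here only says the query did not complete in time or failed; it does not say why the monitor is stuck.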



>
> I got the quorum_status from another running monitor:
> { "election_epoch": 508,
>   "quorum": [
>         0,
>         1],
>   "quorum_names": [
>         "computer05",
>         "computer04"],
>   "quorum_leader_name": "computer04",
>   "monmap": { "epoch": 4,
>       "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
>       "modified": "2014-07-26 09:52:02.411967",
>       "created": "0.000000",
>       "mons": [
>             { "rank": 0,
>               "name": "computer04",
>               "addr": "192.168.1.60:6789\/0"},
>             { "rank": 1,
>               "name": "computer05",
>               "addr": "192.168.1.65:6789\/0"},
>             { "rank": 2,
>               "name": "computer06",
>               "addr": "192.168.1.66:6789\/0"}]}}

And that indicates mon.computer04 and mon.computer05 are working and
in a quorum together to make progress.
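
To read that output mechanically: the "quorum" array lists monitor ranks, so any monitor in the monmap whose rank is missing from it is out of quorum. A small Python sketch, assuming the JSON layout of the quorum_status output quoted above; the function name is hypothetical and the sample is trimmed from that output.

```python
import json


def mons_out_of_quorum(quorum_status):
    """Return names of monmap members whose rank is not in the quorum list."""
    in_quorum = set(quorum_status["quorum"])
    return [
        mon["name"]
        for mon in quorum_status["monmap"]["mons"]
        if mon["rank"] not in in_quorum
    ]


# Trimmed copy of the quorum_status output from the thread.
sample = json.loads("""
{ "election_epoch": 508,
  "quorum": [0, 1],
  "quorum_names": ["computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "mons": [
            { "rank": 0, "name": "computer04", "addr": "192.168.1.60:6789/0"},
            { "rank": 1, "name": "computer05", "addr": "192.168.1.65:6789/0"},
            { "rank": 2, "name": "computer06", "addr": "192.168.1.66:6789/0"}]}}
""")

print(mons_out_of_quorum(sample))  # ['computer06']
```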

You said that computer05 got compacted, but that computer06 broke?
Given that computer04 is doing fine, it may not be related. If you
gather a log from mon.computer06 trying to start up (with "debug mon =
20" in the config file to dump a lot of output) somebody may be able
to help you.
-Greg

>
>> Date: Tue, 31 Mar 2015 12:30:22 -0700
>> Subject: Re: [ceph-users] One of three monitors can not be started
>> From: g...@gregs42.com
>> To: zhanghaoyu1...@hotmail.com
>> CC: ceph-users@lists.ceph.com
>>
>> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 <zhanghaoyu1...@hotmail.com> wrote:
>> > Who can help me?
>> >
>> > One monitor in my ceph cluster can not be started.
>> > Before that, I added '[mon] mon_compact_on_start = true' to
>> > /etc/ceph/ceph.conf on the three monitor hosts. Then I ran 'ceph tell
>> > mon.computer05 compact' on computer05, which has a monitor on it.
>> > When store.db of computer05 shrank from 108G to 1G, mon.computer06
>> > stopped, and it has not been able to start since then.
>> >
>> > If I start mon.computer06, it stops in this state:
>> > # /etc/init.d/ceph start mon.computer06
>> > === mon.computer06 ===
>> > Starting Ceph mon.computer06 on computer06...
>> >
>> > The process info is like this:
>> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start mon.computer06
>> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768; /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> >
>> > Log on computer06 is like this:
>> > 2015-03-30 20:46:54.152956 7fc5379d07a0 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
>> > ...
>> > 2015-03-30 20:46:54.759791 7fc5379d07a0 1 mon.computer06@-1(probing) e4 preinit clean up potentially inconsistent store state
>>
>> So I haven't looked at this code in a while, but I think the monitor
>> is trying to validate that it's consistent with the others. You
>> probably want to dig around the monitor admin sockets and see what
>> state each monitor is in, plus its perception of the others.
>>
>> In this case, I think maybe mon.computer06 is trying to examine its
>> whole store, but 100GB is a lot (way too much, in fact), so this can
>> take a loooong time.
>>
>> >
>> > Sorry, my English is not good.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >

0902
Description: Binary data

1638
Description: Binary data

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
