Just to emphasize that I don't think it's clock skew, here is the NTP state
of all three monitors:

# ansible ceph_mons -m command -a "ntpq -p" -kK
SSH password:
sudo password [defaults to SSH password]:
ceph0 | success | rc=0 >>
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*controller-10g  198.60.73.8      2 u   43   64  377    0.236    0.057   0.097

ceph1 | success | rc=0 >>
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*controller-10g  198.60.73.8      2 u   39   64  377    0.273    0.035   0.064

ceph2 | success | rc=0 >>
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*controller-10g  198.60.73.8      2 u   30   64  377    0.201   -0.063   0.063

I think they are well in sync: all three track the same stratum-2 server, and
every offset is under 0.1 ms.
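To put a number on that, here is a minimal sketch (not part of the original
post) that pulls the offset column out of the selected-peer lines above and
compares the spread with Ceph's allowed monitor clock drift. It assumes ntpq
reports offsets in milliseconds and that mon_clock_drift_allowed is at an
assumed default of 0.05 s; NTPQ_LINES, offset_ms and max_skew_ms are
illustrative names only.

```python
"""Sketch: estimate worst-case clock skew between monitors from `ntpq -p` output.

Assumptions (not from the original post): ntpq reports offsets in milliseconds,
and Ceph's mon_clock_drift_allowed is left at an assumed default of 0.05 s.
"""

# Selected-peer lines ("*" prefix) copied from the ntpq -p output above.
NTPQ_LINES = {
    "ceph0": "*controller-10g  198.60.73.8      2 u   43   64  377    0.236    0.057   0.097",
    "ceph1": "*controller-10g  198.60.73.8      2 u   39   64  377    0.273    0.035   0.064",
    "ceph2": "*controller-10g  198.60.73.8      2 u   30   64  377    0.201   -0.063   0.063",
}

DRIFT_ALLOWED_MS = 50.0  # assumed default mon_clock_drift_allowed (0.05 s)


def offset_ms(line: str) -> float:
    """Return the offset column (9th whitespace-separated field), in ms."""
    # Columns: remote refid st t when poll reach delay offset jitter
    fields = line.split()
    return float(fields[8])


def max_skew_ms(lines: dict[str, str]) -> float:
    """All hosts track the same server, so pairwise skew ~ spread of offsets."""
    offsets = [offset_ms(line) for line in lines.values()]
    return max(offsets) - min(offsets)


if __name__ == "__main__":
    skew = max_skew_ms(NTPQ_LINES)
    print(f"worst-case skew between monitors: {skew:.3f} ms")
    print("within allowed drift" if skew < DRIFT_ALLOWED_MS else "CLOCK SKEW")
```

Because every monitor syncs to the same server, the difference between the
largest and smallest offset (about 0.12 ms here) approximates the worst
pairwise skew, which is more than two orders of magnitude below the assumed
50 ms threshold.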

 - Travis


On Tue, Mar 25, 2014 at 11:09 AM, Travis Rhoden <trho...@gmail.com> wrote:

> Hello,
>
> I just deployed a new Emperor cluster using ceph-deploy 1.4.  Everything went
> very smoothly until I rebooted all the nodes.  After the reboot, the monitors
> no longer form a quorum.
>
> I followed the troubleshooting steps here:
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
>
> Specifically, I'm in the state described in this section:
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#most-common-monitor-issues
>
> All of the monitors are in the "electing" state.  The docs say this is most
> likely clock skew, but I do have all nodes synced with NTP.  I've confirmed
> this multiple times.  I've also confirmed the monitors can reach each other
> (by telnetting to IP:PORT, and I can see established connections via
> netstat).
>
> I'm baffled.
>
> Here is a sample quorum_status output:
>
> root@ceph0:~# ceph daemon mon.ceph0 quorum_status
> { "election_epoch": 31,
>   "quorum": [],
>   "quorum_names": [],
>   "quorum_leader_name": "",
>   "monmap": { "epoch": 2,
>       "fsid": "XXX", (redacted)
>       "modified": "2014-03-24 14:35:22.332646",
>       "created": "0.000000",
>       "mons": [
>             { "rank": 0,
>               "name": "ceph0",
>               "addr": "10.10.30.0:6789\/0"},
>             { "rank": 1,
>               "name": "ceph1",
>               "addr": "10.10.30.1:6789\/0"},
>             { "rank": 2,
>               "name": "ceph2",
>               "addr": "10.10.30.2:6789\/0"}]}}
>
> They all look identical to that.
>
> Any ideas what I can look at besides NTP?  The docs really stress that it
> should be clock skew, so I'll keep looking at that...
>
>  - Travis
>
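The quorum_status output quoted above can also be read mechanically. The
sketch below is illustrative only (summarize and SAMPLE are hypothetical
names); it assumes the usual rule that Ceph monitors need a strict majority of
the monmap to form a quorum, so 2 of the 3 monitors here. The "(redacted)"
annotation is dropped so the sample parses as JSON.

```python
"""Sketch: interpret `ceph daemon mon.<id> quorum_status` JSON.

Assumption: a quorum requires a strict majority of the monitors in the monmap.
"""
import json

# Sample from the post; raw string keeps the JSON "\/" escapes intact.
SAMPLE = r'''
{ "election_epoch": 31,
  "quorum": [],
  "quorum_names": [],
  "quorum_leader_name": "",
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "2014-03-24 14:35:22.332646",
      "created": "0.000000",
      "mons": [
            { "rank": 0, "name": "ceph0", "addr": "10.10.30.0:6789\/0"},
            { "rank": 1, "name": "ceph1", "addr": "10.10.30.1:6789\/0"},
            { "rank": 2, "name": "ceph2", "addr": "10.10.30.2:6789\/0"}]}}
'''


def summarize(status: dict) -> dict:
    """Compare the monitors currently in quorum with the majority needed."""
    mons = status["monmap"]["mons"]
    needed = len(mons) // 2 + 1  # strict majority of the monmap
    in_quorum = status["quorum_names"]
    return {
        "monitors": [m["name"] for m in mons],
        "addrs": [m["addr"] for m in mons],  # json decodes "\/" to "/"
        "needed_for_quorum": needed,
        "in_quorum": in_quorum,
        "has_quorum": len(in_quorum) >= needed,
    }


if __name__ == "__main__":
    info = summarize(json.loads(SAMPLE))
    for key, value in info.items():
        print(f"{key}: {value}")
```

With an empty quorum_names list and no leader, none of the three monitors has
joined a quorum, even though the monmap itself lists all three with the
expected addresses.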
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
