Joao,

Please see below. I think you’re totally right on:
> I suspect they may already have this monitor in their map, but either
> with a different name or a different address -- and are thus ignoring
> probes from a peer that does not match what they are expecting.

The monitor in question had previously been working (in quorum). It was removed, and I am now attempting to re-add it with a different IP address, following the public procedure:

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/

(I followed the "CHANGING A MONITOR’S IP ADDRESS (THE RIGHT WAY)" procedure.)

# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status
{
    "name": "smg01",
    "rank": 0,
    "state": "probing",
    "election_epoch": 0,
    "quorum": [],
    "outside_quorum": [
        "smg01"
    ],
    "extra_probe_peers": [
        "10.20.1.8:6789\/0",
        "10.20.10.251:6789\/0",
        "10.20.10.252:6789\/0"
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 0,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "0.000000",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "smg01",
                "addr": "10.20.10.250:6789\/0"
            },
            {
                "rank": 1,
                "name": "smon01s",
                "addr": "0.0.0.0:0\/1"
            },
            {
                "rank": 2,
                "name": "smon02s",
                "addr": "0.0.0.0:0\/2"
            },
            {
                "rank": 3,
                "name": "b02s08",
                "addr": "0.0.0.0:0\/3"
            }
        ]
    }
}

# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smon01.asok mon_status
{
    "name": "smon01",
    "rank": 1,
    "state": "peon",
    "election_epoch": 2702,
    "quorum": [
        0,
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 12,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "2015-12-09 06:23:43.665100",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "b02s08",
                "addr": "10.20.1.8:6789\/0"
            },
            {
                "rank": 1,
                "name": "smon01",
                "addr": "10.20.10.251:6789\/0"
            },
            {
                "rank": 2,
                "name": "smon02",
                "addr": "10.20.10.252:6789\/0"
            }
        ]
    }
}

# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smon02.asok mon_status
{
    "name": "smon02",
    "rank": 2,
    "state": "peon",
    "election_epoch": 2702,
    "quorum": [
        0,
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 12,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "2015-12-09 06:23:43.665100",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "b02s08",
                "addr": "10.20.1.8:6789\/0"
            },
            {
                "rank": 1,
                "name": "smon01",
                "addr": "10.20.10.251:6789\/0"
            },
            {
                "rank": 2,
                "name": "smon02",
                "addr": "10.20.10.252:6789\/0"
            }
        ]
    }
}

# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.b02s08.asok mon_status
{
    "name": "b02s08",
    "rank": 0,
    "state": "leader",
    "election_epoch": 2702,
    "quorum": [
        0,
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 12,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "2015-12-09 06:23:43.665100",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "b02s08",
                "addr": "10.20.1.8:6789\/0"
            },
            {
                "rank": 1,
                "name": "smon01",
                "addr": "10.20.10.251:6789\/0"
            },
            {
                "rank": 2,
                "name": "smon02",
                "addr": "10.20.10.252:6789\/0"
            }
        ]
    }
}

> On Dec 14, 2015, at 04:56, Joao Eduardo Luis <j...@suse.de> wrote:
>
> On 12/14/2015 12:41 AM, deeepdish wrote:
>> Perhaps I’m not understanding something..
>>
>> The “extra_probe_peers” ARE the other working monitors in quorum out of
>> the mon_host line in ceph.conf.
>>
>> In the example below 10.20.1.8 = b20s08; 10.20.10.251 = smon01s;
>> 10.20.10.252 = smon02s
>>
>> The monitor is not reaching out to the other IPs and syncing. I’m able
>> to ping all IPs in the extra_probe_peers list.
>
> Okay, so that means the other monitors are, for some reason, ignoring
> the probes from this monitor.
>
> Can you please show the result of mon_status from the monitors in the
> quorum?
>
> I suspect they may already have this monitor in their map, but either
> with a different name or a different address -- and are thus ignoring
> probes from a peer that does not match what they are expecting.
>
> -Joao
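Joao's hypothesis is that the quorum already knows this monitor under a different name or address. One way to check it is to diff the monmap the probing monitor reports against the one the quorum reports. Below is a minimal Python sketch of that check; it is not a Ceph tool. `monmap()` and `compare()` are hypothetical helper names. The sketch assumes only the mon_status layout shown above: a top-level "name", plus a "monmap" whose "mons" entries carry "name" and "addr". The two inputs are trimmed copies of the smg01 and smon01 output.

```python
import json

# Trimmed copies of the mon_status output above: only "name" and the monmap.
# json.loads() decodes Ceph's "\/" escapes back to "/".
PROBING = json.loads(r'''{"name": "smg01", "monmap": {"epoch": 0, "mons": [
    {"rank": 0, "name": "smg01", "addr": "10.20.10.250:6789\/0"},
    {"rank": 1, "name": "smon01s", "addr": "0.0.0.0:0\/1"},
    {"rank": 2, "name": "smon02s", "addr": "0.0.0.0:0\/2"},
    {"rank": 3, "name": "b02s08", "addr": "0.0.0.0:0\/3"}]}}''')

QUORUM = json.loads(r'''{"name": "smon01", "monmap": {"epoch": 12, "mons": [
    {"rank": 0, "name": "b02s08", "addr": "10.20.1.8:6789\/0"},
    {"rank": 1, "name": "smon01", "addr": "10.20.10.251:6789\/0"},
    {"rank": 2, "name": "smon02", "addr": "10.20.10.252:6789\/0"}]}}''')


def monmap(status):
    """Map monitor name -> address from one mon_status document."""
    return {m["name"]: m["addr"] for m in status["monmap"]["mons"]}


def compare(probing, quorum):
    """List name/address disagreements between a probing mon and the quorum."""
    me = probing["name"]
    local, remote = monmap(probing), monmap(quorum)
    findings = []
    # Is the probing monitor itself in the quorum's map, and at which address?
    if me not in remote:
        findings.append(
            f"{me} is not in the quorum monmap (epoch {quorum['monmap']['epoch']})")
    elif remote[me] != local.get(me):
        findings.append(
            f"quorum expects {me} at {remote[me]}, but {me} reports {local.get(me)}")
    # Names present on only one side mean the two maps disagree on membership.
    for name in sorted(set(local) - set(remote) - {me}):
        findings.append(f"{name} is in {me}'s monmap but not in the quorum's")
    for name in sorted(set(remote) - set(local)):
        findings.append(f"{name} is in the quorum's monmap but not in {me}'s")
    return findings


for line in compare(PROBING, QUORUM):
    print(line)
```

Run against the data above, the sketch should print five findings:

- smg01 is absent from the quorum's epoch-12 monmap.
- smg01's own map lists smon01s and smon02s.
- The quorum knows those two monitors as smon01 and smon02.

That looks consistent with the name mismatch Joao suspected, although the diff alone does not say which side should change. To use real data instead of the trimmed copies, save each daemon's output (e.g. `ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status > smg01.json`) and load it with `json.load()`.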
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com