Hello,
I am at my wit's end.
I made a mistake in my router configuration, and one of the three
monitors dropped out of quorum. Nothing I’ve done has allowed it
to rejoin, including reinstalling the monitor with ceph-ansible.
The connectivity issue is fixed. I’ve tested it using “nc”, and
the host can connect to both ports 3300 and 6789 on the other
monitors. But the wayward monitor continues to stay out of quorum.
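For anyone who wants to repeat the reachability test without nc, here is a rough Python sketch of the same check. The names can_connect and check_monitors are made up for illustration, and the ports are the v2/v1 ports shown in the monmap in the log. Note that a successful TCP connect only shows the port is reachable; it says nothing about whether the monitors will actually talk to each other.

```python
import socket

# Messenger ports as listed in the monmap: 3300 (v2) and 6789 (v1).
MON_PORTS = (3300, 6789)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


def check_monitors(hosts, ports=MON_PORTS, timeout=3.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}
```

Run from the out-of-quorum host, check_monitors(["10.0.1.1", "10.1.1.1"]) would return a dict keyed by (host, port), with True for every pair that accepted a connection.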
What is wrong? I see a bunch of “EBUSY” errors in the log, with
the message:
e1 handle_auth_request haven't formed initial quorum, EBUSY
How do I fix this? Any help would be greatly appreciated.
Many thanks,
-kc
With debug_mon at 1/10, I got these log snippets:
2020-10-28 15:40:05.961 7fb79253a700 4 mon.mgmt03@2(probing) e1 probe_timeout 0x564050353ec0
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 sync_reset_requester
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 unregister_cluster_logger - not registered
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 mons at {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 _set_mon_num_rank num 0 rank 0
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 timecheck_finish
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_event_cancel
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 reset_probe_timeout 0x564050347ce0 after 2 seconds
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing other monitors
2020-10-28 15:40:07.961 7fb79253a700 4 mon.mgmt03@2(probing) e1 probe_timeout 0x564050347ce0
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 sync_reset_requester
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 unregister_cluster_logger - not registered
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 mons at {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 _set_mon_num_rank num 0 rank 0
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 timecheck_finish
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_event_cancel
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 reset_probe_timeout 0x564050360660 after 2 seconds
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing other monitors
2020-10-28 15:40:09.107 7fb79253a700 -1 mon.mgmt03@2(probing) e1 get_health_metrics reporting 7 slow ops, oldest is log(1 entries from seq 1 at 2020-10-27 23:03:41.586915)
2020-10-28 15:40:09.961 7fb79253a700 4 mon.mgmt03@2(probing) e1 probe_timeout 0x564050360660
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 sync_reset_requester
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 unregister_cluster_logger - not registered
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 mons at {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 _set_mon_num_rank num 0 rank 0
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 cancel_probe_timeout (none scheduled)
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1
tim