In my case it was syncing, and syncing slowly (an hour or so?). You should see this in the log file. I wanted to report this because my store.db is only 200MB, and I guess you want your monitors up and running quickly.
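As an aside, you can also see the sync state without tailing the log: `ceph daemon mon.<id> mon_status` reports `"state": "synchronizing"` while the store is catching up. A minimal sketch, parsing an abbreviated, assumed sample of that JSON (the sample values are illustrative, not from this cluster):

```python
import json

# Abbreviated, assumed sample of what `ceph daemon mon.store03 mon_status`
# returns while a monitor is still syncing (real output has more fields).
sample = '{"name": "store03", "rank": 2, "state": "synchronizing", "quorum": []}'
mon_status = json.loads(sample)

# "leader" / "peon" mean the mon is in quorum; "probing", "synchronizing"
# and "electing" mean it is still on its way in.
in_quorum = mon_status["state"] in ("leader", "peon")
print("mon.{} state={} ({})".format(
    mon_status["name"], mon_status["state"],
    "in quorum" if in_quorum else "not yet in quorum"))
```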
I also noticed that when the 3rd monitor left the quorum, the "ceph -s" command was slow and timing out. Probably it was trying to connect to the 3rd monitor - but why, when that monitor is not in quorum?

-----Original Message-----
From: Sean Purdy [mailto:s.pu...@cv-library.co.uk]
Sent: Thursday 21 September 2017 12:02
To: Gregory Farnum
Cc: ceph-users
Subject: Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

On Wed, 20 Sep 2017, Gregory Farnum said:
> That definitely sounds like a time sync issue. Are you *sure* they
> matched each other?

NTP looked OK at the time. But see below.

> Is it reproducible on restart?

Today I did a straight reboot - and it was fine, no issues.

The issue occurs after the machine has been off for a number of hours, or has been worked on in the BIOS for a number of hours and then booted, and then perhaps waited at the disk decrypt key prompt. So I'd suspect hardware clock drift at those times. (Using Dell R720xd machines.)

Logs show a time change a few seconds after boot. After boot it's running NTP, and within that 45-minute period the NTP state looks the same as on the other nodes in the (small) cluster. How much drift is allowed between monitors?

Logs say:

Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24)
...
Sep 20 09:46:44 store03 systemd[1]: Time has been changed
Sep 20 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 0xdd6ca972.c694801d does not match aorg 0000000000.00000000 from server@172.16.0.16 xmt 0xdd6ca974.0c5c18f

So the system time was changed about 6 seconds after the disks were unlocked and boot proceeded. But there were still 45 minutes of monitor messages after that. Surely the time should have converged sooner than 45 minutes?

NTP from today, post-problem.
But ntpq at the time of the problem looked just as OK:

store01:~$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 47 ms

store02$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ sudo ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+172.16.0.16     85.91.1.164      3 u  561 1024  377    0.287    0.554   0.914
+172.16.0.18     94.125.129.7     3 u  411 1024  377    0.388   -0.331   0.139
*172.16.0.19     158.43.128.33    2 u  289 1024  377    0.282   -0.005   0.103

Sean

> On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy <s.pu...@cv-library.co.uk> wrote:
> >
> > Hi,
> >
> > Luminous 12.2.0
> >
> > Three node cluster, 18 OSD, debian stretch.
> >
> > One node is down for maintenance for several hours. When bringing
> > it back up, OSDs rejoin after 5 minutes, but health is still
> > warning. The monitor has not joined quorum after 40 minutes, and the
> > logs show a BADAUTHORIZER message every time the monitor tries to
> > connect to the leader.
> >
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> > Then after ~45 minutes the monitor *does* join quorum.
> >
> > I'm presuming this isn't normal behaviour? Or if it is, let me know
> > and I won't worry.
> >
> > All three nodes are using NTP and look OK time-wise.
> >
> > ceph-mon log:
> >
> > (.43 is leader, .45 is the rebooted node, .44 is the other live node in quorum)
> >
> > Boot:
> >
> > 2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0
> > (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> > (unknown), pid 2243
> >
> > 2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=0).handle_connect_msg accept connect_seq 3 vs existing csq=0
> > existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=0).handle_connect_msg accept we reset (peer sent cseq 3,
> > 0x5600722c0000.cseq = 0), sending RESETSESSION
> > 2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0
> > existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x5600722c0000 :-1
> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1
> > l=0).process missed message? skipped from seq 0 to 552717734
> >
> > 2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.43:6789/0 conn(0x5600720fe800 :-1
> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=49261 cs=1
> > l=0).process missed message?
> > skipped from seq 0 to 1151972199
> >
> > 2017-09-20 09:46:05.581097 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> > ...
> > [message repeats for 45 minutes]
> > ...
> > 2017-09-20 10:23:38.818767 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> > At this point, "ceph mon stat" says .45/store03 is not in quorum:
> >
> > e5: 3 mons at {store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
> > election epoch 376, leader 0 store01, quorum 0,1 store01,store02
> >
> > Then suddenly a valid connection is made and sync happens:
> >
> > 2017-09-20 10:23:43.041009 7f49e5b2f700  1 mon.store03@2(synchronizing).mds e1 Unable to load 'last_metadata'
> > 2017-09-20 10:23:43.041967 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2381 e2381: 18 total, 13 up, 14 in
> > ...
> > 2017-09-20 10:23:43.045961 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2393 e2393: 18 total, 15 up, 15 in
> > ...
> > 2017-09-20 10:23:43.049255 7f49e5b2f700  1 mon.store03@2(synchronizing).osd e2406 e2406: 18 total, 18 up, 18 in
> > ...
> > 2017-09-20 10:23:43.054828 7f49e5b2f700  0 log_channel(cluster) log [INF]
> > : mon.store03 calling new monitor election
> > 2017-09-20 10:23:43.054901 7f49e5b2f700  1 mon.store03@2(electing).elector(372)
> > init, last seen epoch 372
> >
> > Now "ceph mon stat" says:
> >
> > e5: 3 mons at {store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
> > election epoch 378, leader 0 store01, quorum 0,1,2 store01,store02,store03
> >
> > and everything's happy.
> >
> > What should I look for/fix? It's a fairly vanilla system.
> >
> > Thanks in advance,
> >
> > Sean Purdy
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
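On the drift question raised in the thread ("How much drift is allowed between monitors?"): Ceph's `mon_clock_drift_allowed` option defaults to 0.05 s (50 ms), and the live value can be queried with `ceph daemon mon.<id> config get mon_clock_drift_allowed`. A minimal sketch comparing the ntpq offsets quoted above against that default (the 50 ms limit is an assumption here; check your own cluster's setting):

```python
# ntpq -p peer offsets in milliseconds, copied from the thread above.
ntpq_peers = [
    ("+172.16.0.16", 0.554),
    ("+172.16.0.18", -0.331),
    ("*172.16.0.19", -0.005),
]

# Assumed default mon_clock_drift_allowed = 0.05 s = 50 ms; query yours with:
#   ceph daemon mon.<id> config get mon_clock_drift_allowed
DRIFT_LIMIT_MS = 50.0

for peer, offset_ms in ntpq_peers:
    status = "OK" if abs(offset_ms) < DRIFT_LIMIT_MS else "over mon drift limit"
    print(peer, offset_ms, status)
```

All three offsets are well inside the 50 ms tolerance, which is consistent with NTP looking healthy at the time of the problem.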