Hey folks. Odd problem. I have gone over many things, and I am stumped. I am tempted to just destroy and rebuild my errant vlserver, but I'd like to know what's going on and I know I am missing something :-)
I have three vlservers, one in each of my colos. Let's call them A (10.33.10.43), B (10.36.10.7) and C (10.38.10.7).

A and B sync with each other.
B and C sync with each other (or try to) but fail.
A and C do not.

They do know about each other, as udebug -long shows. However, between A and C I see this in the long form:

Server (10.38.10.7): (db 0.0)
    last vote never rcvd
    last beacon never sent
    dbcurrent=0, up=0 beaconSince=0

The same is true for C to A. The database version is completely off between A and C.

This is the complete udebug from server C, which is the one that is acting up:

Host's addresses are: 10.38.10.7
Host's 10.38.10.7 time is Sun Jan 12 23:04:13 2014
Local time is Sun Jan 12 23:04:16 2014 (time differential 3 secs)
Last yes vote for 10.38.10.7 was 2 secs ago (not sync site);
Last vote started 2 secs ago (at Sun Jan 12 23:04:14 2014)
Local db version is 1388991001.15777
I am not sync site
Lowest host 10.38.10.7 was set 2 secs ago
Sync host 0.0.0.0 was set 1389567853 secs ago
The last trans I handled was 0.46
Sync site's db version is 1388991001.15777
0 locked pages, 0 of them for write

Server (10.36.10.7): (db 0.0)
    last vote rcvd 2 secs ago (at Sun Jan 12 23:04:14 2014),
    last beacon sent 2 secs ago (at Sun Jan 12 23:04:14 2014), last vote was no
    dbcurrent=0, up=1 beaconSince=1

Server (10.33.10.43): (db 0.0)
    last vote never rcvd
    last beacon never sent
    dbcurrent=0, up=0 beaconSince=0

So it thinks it is not the sync site, but that 0.0.0.0 is. In reality, server A is the sync site.

I have run tcpdump to make absolutely certain there are packets running between the two hosts on all AFS ports, and there are. The KeyFiles are fine; I checked them with md5sum.
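For comparing the per-peer blocks across the three servers, something like the following Python could pull out the relevant flags. This is only a minimal sketch, assuming the udebug -long output has the shape shown above; the names parse_peers, unreachable_peers and SAMPLE are made up for illustration and are not part of OpenAFS.

```python
import re

# One peer block from `udebug -long`, matched loosely on whitespace so it
# works on both wrapped and flattened text. The pattern is an assumption
# based only on the output pasted in this post.
PEER_RE = re.compile(
    r"Server \((?P<ip>[\d.]+)\):\s*\(db (?P<db>[\d.]+)\)"
    r"(?P<body>.*?)"
    r"dbcurrent=(?P<dbcurrent>\d+),\s*up=(?P<up>\d+)\s+beaconSince=(?P<since>\d+)",
    re.DOTALL,
)


def parse_peers(udebug_text):
    """Return one dict per 'Server (...)' block found in the text."""
    peers = []
    for m in PEER_RE.finditer(udebug_text):
        body = m.group("body")
        peers.append({
            "ip": m.group("ip"),
            "db": m.group("db"),
            "vote_rcvd": "last vote never rcvd" not in body,
            "beacon_sent": "last beacon never sent" not in body,
            "dbcurrent": int(m.group("dbcurrent")),
            "up": int(m.group("up")),
            "beacon_since": int(m.group("since")),
        })
    return peers


def unreachable_peers(peers):
    """IPs of peers marked down with no vote received and no beacon sent."""
    return [
        p["ip"] for p in peers
        if p["up"] == 0 and not p["vote_rcvd"] and not p["beacon_sent"]
    ]


# The two peer blocks reported by server C, copied from the output above.
SAMPLE = """
Server (10.36.10.7): (db 0.0)
    last vote rcvd 2 secs ago (at Sun Jan 12 23:04:14 2014),
    last beacon sent 2 secs ago (at Sun Jan 12 23:04:14 2014), last vote was no
    dbcurrent=0, up=1 beaconSince=1

Server (10.33.10.43): (db 0.0)
    last vote never rcvd
    last beacon never sent
    dbcurrent=0, up=0 beaconSince=0
"""

if __name__ == "__main__":
    for peer in parse_peers(SAMPLE):
        print(peer)
    print("unreachable:", unreachable_peers(parse_peers(SAMPLE)))
```

Run against the output from server C, unreachable_peers(parse_peers(SAMPLE)) picks out 10.33.10.43, the one peer C has never exchanged a vote or beacon with.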
Set debug to 25 on the affected vlserver (10.38.10.7, Server C) and got this:

Sun Jan 12 23:19:24 2014 Using 10.38.10.7 as my primary address
Sun Jan 12 23:19:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:19:40 2014 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver -rxbind -d 25)
Sun Jan 12 23:19:40 2014 no vote from 10.36.10.7
@(#) OpenAFS 1.6.1-1+ubuntu0.2-debian built 2013-07-24 12 23:19:40 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:19:44 2014 recovery running in state 0
Sun Jan 12 23:19:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:19:55 2014 no vote from 10.36.10.7
Sun Jan 12 23:19:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:19:55 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:20:10 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:10 2014 no vote from 10.36.10.7
Sun Jan 12 23:20:10 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:10 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:20:25 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:25 2014 no vote from 10.36.10.7
Sun Jan 12 23:20:25 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:25 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:20:40 2014 ubik:server 10.33.10.43 still down
Sun Jan 12 23:20:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:40 2014 no vote from 10.36.10.7
Sun Jan 12 23:20:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:40 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:20:40 2014 Ubik: vote 'yes' for 10.38.10.7 (NOT in quorum)
Sun Jan 12 23:20:44 2014 recovery running in state 0
Sun Jan 12 23:20:44 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:48 2014 recovery running in state 0
Sun Jan 12 23:20:48 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:52 2014 recovery running in state 0
Sun Jan 12 23:20:52 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:55 2014 no vote from 10.36.10.7
Sun Jan 12 23:20:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:20:55 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:20:56 2014 recovery running in state 0
Sun Jan 12 23:20:56 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:00 2014 recovery running in state 0
Sun Jan 12 23:21:00 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:04 2014 recovery running in state 0
Sun Jan 12 23:21:04 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 GetVolumeByID 536870913 (2) 10.38.10.83 noauth
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 GetVolumeByID 536870913 (2) 10.38.10.83 noauth
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:07 2014 allbetter checking
Sun Jan 12 23:21:07 2014 allbetter: returning 1
Sun Jan 12 23:21:08 2014 recovery running in state 0
Sun Jan 12 23:21:08 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:10 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:10 2014 no vote from 10.36.10.7
Sun Jan 12 23:21:10 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:10 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:21:12 2014 recovery running in state 0
Sun Jan 12 23:21:25 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:25 2014 no vote from 10.36.10.7
Sun Jan 12 23:21:25 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:25 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:21:29 2014 allbetter checking
Sun Jan 12 23:21:29 2014 allbetter: returning 1
Sun Jan 12 23:21:29 2014 GetVolumeByName <snipped>
Sun Jan 12 23:21:29 2014 allbetter checking
Sun Jan 12 23:21:29 2014 allbetter: returning 1
Sun Jan 12 23:21:29 2014 allbetter checking
Sun Jan 12 23:21:29 2014 allbetter: returning 1
Sun Jan 12 23:21:29 2014 allbetter checking
Sun Jan 12 23:21:29 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 GetVolumeByName <snipped>
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 GetVolumeByName <snipped>
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 GetVolumeByName <snipped>
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:30 2014 allbetter checking
Sun Jan 12 23:21:30 2014 allbetter: returning 1
Sun Jan 12 23:21:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:40 2014 no vote from 10.36.10.7
Sun Jan 12 23:21:40 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:40 2014 Received beacon type 0 from host 10.38.10.7
Sun Jan 12 23:21:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:55 2014 no vote from 10.36.10.7
Sun Jan 12 23:21:55 2014 beacon: amSyncSite is 0
Sun Jan 12 23:21:55 2014 Received beacon type 0 from host 10.38.10.7

Any help is appreciated! I have already upgraded and rebooted all three servers, lowest to highest, to no avail. I also attempted moving the data files out of the way and starting the affected vlserver, but the same symptoms remain.

--
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510