My RW server went bump in the night last night.  After a reboot, everything
came back up as normal, but any attempt to access either /afs/icequake.net or
/afs/.icequake.net fails with "connection timed out".

I have restarted all fileservers and all clients.  The only thing of note:
after a client is restarted, the first request to /afs pauses for a few
seconds before returning the timeout error, and subsequent requests return the
timeout immediately.  Running fs checkservers and fs checkvolumes (fs
checks/checkv) had no effect, except that the first request afterwards paused
again.

Needless to say, this is baffling.  I see nothing interesting in the logs or
udebug output, but someone else may spot something.  10.0.1.230 is the ubik
master and 10.0.1.232 is the RW fileserver.


# udebug 10.0.1.230 7003
Host's addresses are: 10.0.1.230 65.38.17.159 
Host's 10.0.1.230 time is Sat Sep 17 10:19:38 2011
Local time is Sat Sep 17 10:19:38 2011 (time differential 0 secs)
Last yes vote for 10.0.1.230 was 6 secs ago (sync site); 
Last vote started 6 secs ago (at Sat Sep 17 10:19:32 2011)
Local db version is -1438751922.1777336322
I am sync site until 53 secs from now (at Sat Sep 17 10:20:31 2011) (3 servers)
Recovery state 1f
Sync site's db version is -1438751922.1777336322
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         1145824 secs ago (at Sun Sep  4 04:02:34 2011)

Server (10.0.1.233 65.38.17.160): (db -1438751922.1777336322)
    last vote rcvd 7 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 6 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (10.0.1.232 65.38.17.158): (db -1438751922.1777336322)
    last vote rcvd 7 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 6 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

# udebug 10.0.1.230 7002
Host's addresses are: 10.0.1.230 65.38.17.159 
Host's 10.0.1.230 time is Sat Sep 17 10:19:37 2011
Local time is Sat Sep 17 10:19:39 2011 (time differential 2 secs)
Last yes vote for 10.0.1.230 was 7 secs ago (sync site); 
Last vote started 7 secs ago (at Sat Sep 17 10:19:32 2011)
Local db version is 1313883291.5
I am sync site until 50 secs from now (at Sat Sep 17 10:20:29 2011) (3 servers)
Recovery state 1f
Sync site's db version is 1313883291.5
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         2389486 secs ago (at Sat Aug 20 18:34:53 2011)

Server (10.0.1.233 65.38.17.160): (db 1313883291.5)
    last vote rcvd 8 secs ago (at Sat Sep 17 10:19:31 2011),
    last beacon sent 7 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (10.0.1.232 65.38.17.158): (db 1313883291.5)
    last vote rcvd 10 secs ago (at Sat Sep 17 10:19:29 2011),
    last beacon sent 7 secs ago (at Sat Sep 17 10:19:32 2011), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
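
To make the "nothing interesting" reading of the udebug output checkable, here is a minimal Python sketch that pulls a few indicators out of the text: the sync site's db version, the recovery state, and each remote server's db version and up/dbcurrent flags. The helper names (summarize_udebug, looks_healthy) are hypothetical, the sample is an abridged copy of the port 7003 output above, and the health rule (recovery state 1f, all servers up and on the sync site's db version) is my assumption of what a healthy quorum looks like, not an official check.

```python
import re

# Abridged copy of the port 7003 udebug output shown above.
SAMPLE = """\
Local db version is -1438751922.1777336322
I am sync site until 53 secs from now (at Sat Sep 17 10:20:31 2011) (3 servers)
Recovery state 1f
Sync site's db version is -1438751922.1777336322
Server (10.0.1.233 65.38.17.160): (db -1438751922.1777336322)
    dbcurrent=1, up=1 beaconSince=1
Server (10.0.1.232 65.38.17.158): (db -1438751922.1777336322)
    dbcurrent=1, up=1 beaconSince=1
"""


def summarize_udebug(text):
    """Hypothetical helper: extract a few health indicators from udebug text."""
    sync_db = re.search(r"Sync site's db version is (\S+)", text)
    recovery = re.search(r"Recovery state (\w+)", text)
    # Each remote server is a "Server (...)" header followed by status lines.
    servers = re.findall(
        r"Server \((\S+)[^)]*\): \(db ([^)\s]+)\)\s+.*?dbcurrent=(\d), up=(\d)",
        text,
        flags=re.S,
    )
    return {
        "sync_db": sync_db.group(1) if sync_db else None,
        "recovery_state": recovery.group(1) if recovery else None,
        "servers": [
            {"addr": addr, "db": db, "dbcurrent": cur == "1", "up": up == "1"}
            for addr, db, cur, up in servers
        ],
    }


def looks_healthy(summary):
    """Assumed rule: state 1f and every server up and on the sync site's db."""
    return summary["recovery_state"] == "1f" and all(
        s["up"] and s["dbcurrent"] and s["db"] == summary["sync_db"]
        for s in summary["servers"]
    )


summary = summarize_udebug(SAMPLE)
print(summary["recovery_state"], looks_healthy(summary))
```

By this rule both databases above pass, which fits the impression that the ubik side is not obviously at fault.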


# cat FileLog
Sat Sep 17 10:04:45 2011 File server starting (/usr/lib/openafs/dafileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11)
Sat Sep 17 10:04:45 2011 afs_krb_get_lrealm failed, using icequake.net.
Sat Sep 17 10:04:46 2011 VLRU: starting scanner with the following configuration parameters:
Sat Sep 17 10:04:46 2011 VLRU:  offlining volumes after minimum of 86400 seconds of inactivity
Sat Sep 17 10:04:46 2011 VLRU:  running VLRU soft detach pass every 120 seconds
Sat Sep 17 10:04:46 2011 VLRU:  taking up to 8 volumes offline per pass
Sat Sep 17 10:04:46 2011 VLRU:  scanning generation 0 for inactive volumes every 10800 seconds
Sat Sep 17 10:04:46 2011 VLRU:  scanning for promotion/demotion between generations 0 and 1 every 172800 seconds
Sat Sep 17 10:04:46 2011 VLRU:  scanning for promotion/demotion between generations 1 and 2 every 345600 seconds
Sat Sep 17 10:04:46 2011 Set thread id 3 for FSYNC_sync
Sat Sep 17 10:04:46 2011 VInitVolumePackage: beginning parallel fileserver startup
Sat Sep 17 10:04:46 2011 VInitVolumePackage: using 1 threads to pre-attach volumes on 1 partitions
Sat Sep 17 10:04:46 2011 Scanning partitions on thread 1 of 1
Sat Sep 17 10:04:46 2011 Partition /vicepa: pre-attaching volumes
Sat Sep 17 10:04:46 2011 Partition scan thread 1 of 1 ended
Sat Sep 17 10:04:46 2011 fs_stateRestore: commencing fileserver state restore
Sat Sep 17 10:04:46 2011 fs_stateRestore: host table restored
Sat Sep 17 10:04:46 2011 fs_stateRestore: FileEntry and CallBack tables restored
Sat Sep 17 10:04:46 2011 fs_stateRestore: host table indices remapped
Sat Sep 17 10:04:46 2011 fs_stateRestore: FileEntry and CallBack indices remapped
Sat Sep 17 10:04:46 2011 fs_stateRestore: restore phase complete
Sat Sep 17 10:04:46 2011 fs_stateRestore: beginning state verification phase
Sat Sep 17 10:04:46 2011 h_stateVerifyUuidHash: warning: uuid hash entry points to different host struct (1, 0)
Sat Sep 17 10:04:46 2011 fs_stateRestore: fileserver state verification complete
Sat Sep 17 10:04:46 2011 fs_stateRestore: restore was successful
Sat Sep 17 10:04:46 2011 Set thread id 0000007E for 'FiveMinuteCheckLWP'
Sat Sep 17 10:04:46 2011 Getting FileServer name...
Sat Sep 17 10:04:46 2011 Set thread id 00000081 for 'HostCheckLWP'
Sat Sep 17 10:04:46 2011 FileServer host name is 'valhalla'
Sat Sep 17 10:04:46 2011 Getting FileServer address...
Sat Sep 17 10:04:46 2011 Set thread id 00000083 for 'FsyncCheckLWP'
Sat Sep 17 10:04:46 2011 FileServer valhalla has address 10.0.1.232 (0xe801000a or 0xa0001e8 in host byte order)
Sat Sep 17 10:04:46 2011 File Server started Sat Sep 17 10:04:46 2011
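
The FileLog prints the fileserver address two ways (0xe801000a and 0xa0001e8), which can look like two different addresses at a glance. The short Python sketch below shows that both values come from the same four bytes of 10.0.1.232 read in opposite byte orders. That the server is a little-endian host is my assumption, based only on the values in the log.

```python
import socket
import struct

addr = "10.0.1.232"

# inet_aton returns the 4 address bytes in network (big-endian) order.
packed = socket.inet_aton(addr)

# Same bytes, interpreted as a 32-bit integer in each byte order.
as_big_endian = struct.unpack(">I", packed)[0]
as_little_endian = struct.unpack("<I", packed)[0]

print(hex(as_little_endian))  # 0xe801000a
print(hex(as_big_endian))     # 0xa0001e8
```

So the log line is consistent with a single address, 10.0.1.232, and does not by itself point to an address mismatch.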



-- 
Ryan C. Underwood, <neme...@icequake.net>
