Shortly after the weekly scheduled fileserver restart, things blew up in a big way. My RW root.cell was inaccessible in the end. No kernel messages indicating filesystem or disk problems underneath. I force-fscked the vice partition (ext4) and no problems were found. Any speculation on what in the world happened here? This is running -pre3 with the patch from Andrew mentioned earlier.
Sun Mar 20 04:00:29 2011 File server starting (/usr/lib/openafs/dafileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11) Sun Mar 20 04:00:29 2011 afs_krb_get_lrealm failed, using icequake.net. Sun Mar 20 04:00:30 2011 VLRU: starting scanner with the following configuration parameters: Sun Mar 20 04:00:30 2011 VLRU: offlining volumes after minimum of 86400 seconds of inactivity Sun Mar 20 04:00:30 2011 VLRU: running VLRU soft detach pass every 120 seconds Sun Mar 20 04:00:30 2011 VLRU: taking up to 8 volumes offline per pass Sun Mar 20 04:00:30 2011 VLRU: scanning generation 0 for inactive volumes every 10800 seconds Sun Mar 20 04:00:30 2011 VLRU: scanning for promotion/demotion between generations 0 and 1 every 172800 seconds Sun Mar 20 04:00:30 2011 VLRU: scanning for promotion/demotion between generations 1 and 2 every 345600 seconds Sun Mar 20 04:00:30 2011 Set thread id 3 for FSYNC_sync Sun Mar 20 04:00:30 2011 VInitVolumePackage: beginning parallel fileserver startup Sun Mar 20 04:00:30 2011 VInitVolumePackage: using 1 threads to pre-attach volumes on 1 partitions Sun Mar 20 04:00:30 2011 Scanning partitions on thread 1 of 1 Sun Mar 20 04:00:30 2011 Partition /vicepa: pre-attaching volumes Sun Mar 20 04:00:30 2011 fs_stateRestore: commencing fileserver state restore Sun Mar 20 04:00:30 2011 Partition scan thread 1 of 1 ended Sun Mar 20 04:00:30 2011 fs_stateRestore: host table restored Sun Mar 20 04:00:30 2011 fs_stateRestore: FileEntry and CallBack tables restored Sun Mar 20 04:00:30 2011 fs_stateRestore: host table indices remapped Sun Mar 20 04:00:30 2011 fs_stateRestore: FileEntry and CallBack indices remapped Sun Mar 20 04:00:30 2011 fs_stateRestore: restore phase complete Sun Mar 20 04:00:30 2011 fs_stateRestore: beginning state verification phase Sun Mar 20 04:00:30 2011 fs_stateRestore: fileserver state verification complete Sun Mar 20 04:00:30 2011 fs_stateRestore: restore was successful Sun Mar 20 04:00:30 2011 Getting FileServer name... Sun Mar 20 04:00:30 2011 Set thread id 0000007E for 'FiveMinuteCheckLWP' Sun Mar 20 04:00:30 2011 FileServer host name is 'valhalla' Sun Mar 20 04:00:30 2011 Set thread id 00000082 for 'FsyncCheckLWP' Sun Mar 20 04:00:30 2011 Set thread id 0000007F for 'HostCheckLWP' Sun Mar 20 04:00:30 2011 Getting FileServer address... Sun Mar 20 04:00:30 2011 FileServer valhalla has address 10.0.1.232 (0xe801000a or 0xa0001e8 in host byte order) Sun Mar 20 04:00:30 2011 File Server started Sun Mar 20 04:00:30 2011 Sun Mar 20 04:02:02 2011 Scheduling salvage for volume 536870915 on part /vicepa over SALVSYNC Sun Mar 20 04:02:02 2011 VVGC_scan_partition: finished scanning /vicepa: 147 volumes in 74 groups Sun Mar 20 04:02:03 2011 fssync: breaking all call backs for volume 536870915 Sun Mar 20 04:02:05 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:02:05 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:04:01 2011 VAttachVolume: Error reading diskDataHandle header for vol 536870916; error=101 Sun Mar 20 04:04:01 2011 Scheduling salvage for volume 536870916 on part /vicepa over SALVSYNC Sun Mar 20 04:04:01 2011 warning: volume 536870916 recursively checked out by programType id 4 Sun Mar 20 04:04:13 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:04:13 2011 warning: volume 536871274 recursively checked out by programType id 4 Sun Mar 20 04:06:02 2011 fssync: breaking all call backs for volume 536870916 Sun Mar 20 04:10:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:12:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:14:03 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:14:14 2011 CopyOnWrite failed: volume 536871273 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:14:14 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:14:14 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:14:14 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:16:02 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:18:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:20:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:20:04 2011 CopyOnWrite failed: volume 536871273 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:20:04 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:20:04 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:20:04 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:20:07 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:24:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:26:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:26:03 2011 CopyOnWrite failed: volume 536871273 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:26:03 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:26:03 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:26:03 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:26:08 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:30:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:32:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:32:54 2011 CopyOnWrite failed: volume 536871273 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:32:54 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:32:54 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:32:54 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:34:02 2011 fssync: breaking all call backs for volume 536870916 Sun Mar 20 04:34:10 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:35:51 2011 CopyOnWrite failed: volume 536870915 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:35:51 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:35:51 2011 Scheduling salvage for volume 536870915 on part /vicepa over SALVSYNC Sun Mar 20 04:35:51 2011 fssync: breaking all call backs for volume 536870915 Sun Mar 20 04:36:01 2011 VAttachVolume: Error reading diskDataHandle header for vol 536870916; error=101 Sun Mar 20 04:36:01 2011 Scheduling salvage for volume 536870916 on part /vicepa over SALVSYNC Sun Mar 20 04:36:09 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:38:02 2011 fssync: breaking all call backs for volume 536870916 Sun Mar 20 04:38:12 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:38:13 2011 CopyOnWrite failed: volume 536871273 in partition /vicepa (tried reading 8192, read 0, wrote 0, errno 0) volume needs salvage Sun Mar 20 04:38:13 2011 CopyOnWrite failed: requesting salvage Sun Mar 20 04:38:13 2011 Scheduling salvage for volume 536871273 on part /vicepa over SALVSYNC Sun Mar 20 04:38:13 2011 fssync: breaking all call backs for volume 536871273 Sun Mar 20 04:38:18 2011 Scheduling salvage for volume 536871274 on part /vicepa over SALVSYNC Sun Mar 20 04:42:02 2011 fssync: breaking all call backs for volume 536871274 Sun Mar 20 04:44:02 2011 fssync: breaking all call backs for volume 536870916 [..] Sun Mar 20 05:46:16 2011 VRequestSalvage: volume 536871273 online salvaged too many times; forced offline. Sun Mar 20 05:48:02 2011 VRequestSalvage: volume 536871273 online salvaged too many times; forced offline. Sun Mar 20 05:48:02 2011 FSYNC_backgroundSalvage: unable to request salvage for volume 536871273 Sun Mar 20 05:48:05 2011 VRequestSalvage: volume 536871273 online salvaged too many times; forced offline. Sun Mar 20 05:48:05 2011 FSYNC_backgroundSalvage: unable to request salvage for volume 536871273 Sun Mar 20 05:48:08 2011 VRequestSalvage: volume 536871273 online salvaged too many times; forced offline. Sun Mar 20 05:48:08 2011 FSYNC_backgroundSalvage: unable to request salvage for volume 536871273 [..] Sun Mar 20 16:32:17 2011 VRequestSalvage: volume 536871273 online salvaged too many times; forced offline. Sun Mar 20 16:32:17 2011 FSYNC_backgroundSalvage: unable to request salvage for volume 536871273 Sun Mar 20 16:32:17 2011 Shutting down file server at Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 Vice was last started at Sun Mar 20 04:00:30 2011 Sun Mar 20 16:32:17 2011 Volume hash summary: 2048 buckets Sun Mar 20 16:32:17 2011 chain length : min=0, max=1, avg=0, total=147 Sun Mar 20 16:32:17 2011 looks : min=0, max=19189, avg=95, total=195480 Sun Mar 20 16:32:17 2011 gets : min=0, max=19190, avg=95, total=195627 Sun Mar 20 16:32:17 2011 reorders : min=0, max=0, avg=0, total=0 Sun Mar 20 16:32:17 2011 Partition /vicepa has 147 online volumes Sun Mar 20 16:32:17 2011 VLRU State Dump: Sun Mar 20 16:32:17 2011 VLRU_QUEUE_NEW: Sun Mar 20 16:32:17 2011 536871132, 536870916, 536871131, 536870913, 536871279, Sun Mar 20 16:32:17 2011 536871218, 536871107, 536870952, 536871119, 536871098, Sun Mar 20 16:32:17 2011 536871101, 536871143, 536871110, 536871134, 536871152, Sun Mar 20 16:32:17 2011 536871164, 536871062, 536871155, 536871113, 536871065, Sun Mar 20 16:32:17 2011 536871122, 536871149, 536871140, 536871162, 536870970, Sun Mar 20 16:32:17 2011 536871183, 536870967, 536870966, 536870957, 536871212, Sun Mar 20 16:32:17 2011 536870951, 536871182, 536870960, 536870969, 536870954, Sun Mar 20 16:32:17 2011 536871161, 536870981, 0, 0, 0 Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 VLRU_QUEUE_MID: Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 VLRU_QUEUE_OLD: Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 VLRU_QUEUE_CANDIDATE: Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 VLRU_QUEUE_HELD: Sun Mar 20 16:32:17 2011 Sun Mar 20 16:32:17 2011 Large vnode cache, 600 entries, 38 allocs, 38392 gets (2434 reads), 171 writes Sun Mar 20 16:32:17 2011 Small vnode cache,600 entries, 9 allocs, 47958 gets (22008 reads), 1339 writes Sun Mar 20 16:32:17 2011 Volume header cache, 600 entries, 78165 gets, 442 replacements Sun Mar 20 16:32:17 2011 Partition /vicepa: 519115596 available 1K blocks (minfree=27780708), 216780036 free blocks Sun Mar 20 16:32:17 2011 With 120 directory buffers; 1296 reads resulted in 13 read I/Os Sun Mar 20 16:32:17 2011 Total Client entries = 1083, blocks = 15; Host entries = 9, blocks = 1 Sun Mar 20 16:32:17 2011 There are 1083 connections, process size 161040 Sun Mar 20 16:32:17 2011 There are 8 workstations, 4 are active (req in < 15 mins), 0 marked "down" Sun Mar 20 16:32:17 2011 CopyOnWrite: calls 194 off0 80 size0 80 maxsize 0xf00000 Sun Mar 20 16:32:17 2011 VShutdown: shutting down on-line volumes on 1 partition... Sun Mar 20 16:32:17 2011 VShutdown: partition /vicepa has 37 volumes with attached headers Sun Mar 20 16:32:17 2011 VShutdown: beginning parallel fileserver shutdown Sun Mar 20 16:32:17 2011 VShutdown: using 128 threads to offline volumes on 1 partition Sun Mar 20 16:32:17 2011 SYNC_getCom: error receiving command Sun Mar 20 16:32:17 2011 FSYNC_com: read failed; dropping connection (cnt=156067) Sun Mar 20 16:32:17 2011 VShutdown: pass 0 completed using the 1 thread per partition algorithm Sun Mar 20 16:32:17 2011 VShutdown: starting passes 1 through 3 using finely-granular mp-fast algorithm Sun Mar 20 16:32:17 2011 ShutdownController: schedule version=1, vol_remaining=37, pass=1 Sun Mar 20 16:32:17 2011 ShutdownController: n_threads_complete=0, n_parts_done_pass=0 Sun Mar 20 16:32:17 2011 ShutdownController: part[0] : (len=37, thread_target=128, done_pass=0, pass_head=097C2170) Sun Mar 20 16:32:17 2011 SYNC_getCom: error receiving command Sun Mar 20 16:32:17 2011 FSYNC_com: read failed; dropping connection (cnt=156068) Sun Mar 20 16:32:17 2011 VShutdown: pass 1 completed using 128 threads on 1 partitions Sun Mar 20 16:32:17 2011 VShutdown: pass 2 completed using 128 threads on 1 partitions Sun Mar 20 16:32:17 2011 VShutdown: done shutting down volumes on partition /vicepa. Sun Mar 20 16:32:17 2011 VShutdown: pass 3 completed using 128 threads on 1 partitions Sun Mar 20 16:32:18 2011 VShutdown: /vicepa stats : (pass[0]=110, pass[1]=1, pass[2]=36, pass[3]=0) Sun Mar 20 16:32:18 2011 VShutdown: shutdown finished using 128 threads Sun Mar 20 16:32:18 2011 VShutdown: complete. Sun Mar 20 16:32:18 2011 fs_stateSave: commencing fileserver state dump Sun Mar 20 16:32:18 2011 fs_stateSave: performing internal consistency checks before proceeding with state dump Sun Mar 20 16:32:18 2011 h_stateVerifyUuidHash: warning: uuid hash entry points to different host struct (7, 5) Sun Mar 20 16:32:18 2011 h_stateVerifyUuidHash: warning: uuid hash entry points to different host struct (6, 5) Sun Mar 20 16:32:18 2011 fs_stateSave: proceeding with dump Sun Mar 20 16:32:18 2011 h_stateSave: hostCount=9 Sun Mar 20 16:32:18 2011 fs_stateSave: fileserver state dump completed successfully Sun Mar 20 16:32:18 2011 File server has terminated normally at Sun Mar 20 16:32:18 2011 _______________________________________________ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info