Derrick Brashear writes: ] [...] ] John Boyland writes: ] <...> ] > ** Cache entry @ 0xa26da3f0 for 1.536875155.530.1418 [cs.uwm.edu] ] > locks: (reader_waiting, write_locked(pid:17732 at:250), 2 waiters) ] > 26532 bytes DV 197 refcnt 3 ] > callback 00000000 expires 1270781645 ] > 1 opens 0 writers ] > normal file ] > states (0x0) ] ] ok, that's the rdwr vnode op, so that makes some sense. ] ] > ** Cache entry @ 0xa27651d0 for 1.536875155.608.1458 [cs.uwm.edu] ] > locks: (upgrade_waiting, write_locked(pid:17679 at:66), 12 waiters) ] > 19094744 bytes DV 109 refcnt 13 ] > callback 00000000 expires 1270782798 ] > 0 opens 0 writers ] > normal file ] > states (0x0) ] ] and 66 is GetDCache
] > The person who generated this process says they were running aprocess ] > that was writing debugging output redirected into a file and there was a ] > loop so the file got very large. ] ] hm. did it really hang, or just get very slow? I was told that at first things got slow, but then it stopped altogether. They then moved to a different machine and read the file from there. Meanwhile, on the bad machine, ANY afs file action hangs. ] > ** Cache entry @ 0xa2710018 for 1.536875155.610.1499 [cs.uwm.edu] ] > locks: (none_waiting, write_locked(pid:17889 at:250)) ] > 23240 bytes DV 1 refcnt 1 ] > callback 00000000 expires 1270782798 ] > 1 opens 0 writers ] > normal file ] > states (0x0) ] > ** Cache entry @ 0xa26bf000 for 1.536873892.1.1 [cs.uwm.edu] ] > locks: (writer_waiting, 7 read_locks(pid:18421), 41 waiters) ] > 2048 bytes DV 8 refcnt 49 ] > callback 00000000 expires 1270774475 ] > 0 opens 0 writers ] > volume root ] > states (0x4), read-only This last one (with 41 waiters) is /afs: good% /usr/afsws/bin/fs getfid /afs File /afs (536873892.1.1) contained in volume 536873892 (on the bad machine, I can't touch /usr/afsws of course since it is a link into /afs) cmdebug now shows 71 waiters -- of course it increases whenever someone touches afs and joins the tarbaby. ] Almost certainly a dcache lock (which cmdebug doesn't show, alas) ] involved here and causing this. ] ] I suppose the other thing which may help: can you collect fstrace for this? bad# ./fstrace setset cm -active [ during this time I touch /afs/not-here and freeze ] bad# ./fstrace dump cm AFS Trace Dump - Date: Fri Apr 9 16:50:14 2010 Found 1 logs. Contents of log cmfx: AFS Trace Dump - Completed bad# ./fstrace setset cm -inactive John Boyland _______________________________________________ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info