Andrew Deason wrote, On 3/17/2010 2:02 PM:
On Wed, 17 Mar 2010 13:42:10 -0400
"John W. Sopko Jr."<so...@cs.unc.edu>  wrote:

Is there anything in FileLog around this time, possibly about
failing to break callbacks on a certain host?

No clues, one of the first places I looked.

If this fileserver is not under any load, turning up the debug level can
tell you if it's the fileserver that's being slow and (probably) why.
Sending the fileserver process a TSTP signal 4 times will get you a ton
of debug information in FileLog. Looking around for SAFS_RemoveDir or
SAFS_MakeDir should tell you what's going on. Send it a HUP to stop
spewing debug info.

Here is the strace command and the output of the FileLog trace, thanks for
your help. I did a "rmdir try" and the strace command complained
"No such device", if I "fs flushvolume" on the client it shows the
directory got removed else it shows it still there.

Output from the "strace -tt rmdir try" command on the client host IP 152.2.140.200:

14:25:39.851544 rmdir("try")            = -1 ENODEV (No such device)
14:25:57.908574 write(2, "rmdir: ", 7rmdir: )  = 7
14:25:57.908721 write(2, "try", 3try)      = 3
14:25:57.908849 open("/usr/share/locale/locale.alias", O_RDONLY) = 3
14:25:57.908955 fstat(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
14:25:57.909092 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b394ab7a000
14:25:57.909208 read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
14:25:57.909369 read(3, "", 4096)       = 0
14:25:57.909479 close(3)                = 0
14:25:57.909588 munmap(0x2b394ab7a000, 4096) = 0
14:25:57.909711 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 14:25:57.909827 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
14:25:57.909928 write(2, ": No such device", 16: No such device) = 16
14:25:57.910061 write(2, "\n", 1)       = 1

Here is the file server log "kill TSTP" 4 times during the same time,
the client IP is 152.2.140.200

Wed Mar 17 14:25:39 2010 [12] SAFS_RemoveDir try, Did = 536884167.59.4927, Host 152.2.140.200:7001, Id 3903 Wed Mar 17 14:25:39 2010 [12] BCB: BreakCallBack(all but 152.2.140.200:7001, (536884167,59,4927))
Wed Mar 17 14:25:40 2010 [14] SAFS_GetCapabilties
Wed Mar 17 14:25:40 2010 [14] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:40 2010 [10] SAFS_FetchStatus, Fid = 1769560786.8714.9111, Host 152.2.131.239:7001, Id 32766
Wed Mar 17 14:25:40 2010 [10] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:40 2010 [8] SAFS_GetCapabilties
Wed Mar 17 14:25:40 2010 [8] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:43 2010 [6] SAFS_GetCapabilties
Wed Mar 17 14:25:43 2010 [6] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:46 2010 [9] SAFS_GetCapabilties
Wed Mar 17 14:25:46 2010 [9] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:46 2010 [11] SAFS_GetCapabilties
Wed Mar 17 14:25:46 2010 [11] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:47 2010 [7] SAFS_ExtendLock Fid = 536878631.42038.185826, Host 152.2.128.40:7001, Id 1269
Wed Mar 17 14:25:47 2010 [7] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:47 2010 [12] Starting multibreakcall back on all addr for host 152.2.140.115 Wed Mar 17 14:25:48 2010 [4] SAFS_ExtendLock Fid = 536874899.6362.427923, Host 152.2.140.129:7001, Id 1291
Wed Mar 17 14:25:48 2010 [4] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:48 2010 [3] SAFS_ExtendLock Fid = 536874899.6550.427914, Host 152.2.140.129:7001, Id 1291
Wed Mar 17 14:25:48 2010 [3] SAFS_ExtendLock returns 0
Wed Mar 17 14:25:51 2010 [13] SAFS_GetCapabilties
Wed Mar 17 14:25:51 2010 [13] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:51 2010 [14] SAFS_GetCapabilties
Wed Mar 17 14:25:51 2010 [14] FindClient: authenticating connection: authClass=0
Wed Mar 17 14:25:51 2010 [10] SAFS_GetTime returns 1268850351, 919888
Wed Mar 17 14:25:54 2010 [12] BCB: Failed on file 536884167.59.4927, Host 152.2.140.115:7001 is down
Wed Mar 17 14:25:54 2010 [12] SAFS_RemoveDir    returns 0
Wed Mar 17 14:25:57 2010 [8] SAFS_FetchStatus, Fid = 1769568877.1.1, Host 152.2.132.230:7001, Id 32766
Wed Mar 17 14:25:57 2010 [8] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:57 2010 [6] SAFS_FetchStatus, Fid = 1769568877.9.10, Host 152.2.132.230:7001, Id 32766
Wed Mar 17 14:25:57 2010 [6] SAFS_FetchStatus returns 0
Wed Mar 17 14:25:57 2010 [9] SAFS_RemoveDir try, Did = 536884167.59.4927, Host 152.2.140.200:7001, Id 3903
Wed Mar 17 14:25:57 2010 [9] SAFS_RemoveDir     returns 2


--
John W. Sopko Jr.               University of North Carolina
email: sopko AT cs.unc.edu      Computer Science Dept., CB 3175
Phone: 919-962-1844             Fred Brooks Building; Room 140
Fax:   919-962-1799             Chapel Hill, NC 27599-3175
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

Reply via email to