I've had occasional problems where I get a very quick "connection timed out"
when trying to write files.  Here are the symptoms:

$ tar xvf /tmp/cs854.mail.tar 2>&1 | more
./
./1
./.mh_sequences
./2
./3
./4
./5
...
./295
tar: ./295: Cannot close: Connection timed out
./296
tar: ./296: Cannot open: File exists
./297
tar: ./297: Cannot open: Connection timed out
./298
tar: ./298: Cannot open: Connection timed out
... etc more of the same
./606
tar: ./606: Cannot open: Connection timed out
./607
tar: ./607: Cannot open: Connection timed out
tar: .: Cannot stat: Connection timed out
tar: Exiting with failure status due to previous errors

(The "File exists" error is partly because I tried touching 296 and 297
to see whether that would avoid the "Connection timed out" message.)

Several interesting aspects:
(1) The scenario is utterly repeatable.  It always dies on 295.
(2) The "connection timed out" happens without any delay --
    the timeout must be a fraction of a second.
(3) The FileLog for the fileserver shows nothing that looks relevant:

...
Fri May 17 10:42:30 2013 CB: RCallBackConnectBack (host.c) failed for host 24.209.180.128:7001
Fri May 17 10:45:29 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55018) failed -1
Fri May 17 10:49:30 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55040) failed -1
Fri May 17 10:53:31 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55009) failed -1
Fri May 17 10:57:32 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55011) failed -1
Fri May 17 11:01:33 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55063) failed -1

The server and client are both on 129.xx.xx.xx.

The server is running (I think) SELinux.
The client is running Ubuntu:

$ uname -a
Linux pabst 3.0.0-14-generic #23somerville3-Ubuntu SMP Mon Dec 12 09:20:18 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ rxdebug localhost -port 7001 -version
Trying 127.0.0.1 (port 7001):
AFS version:  OpenAFS 1.6.0-1-debian built  2013-03-21 
$ rxdebug <fileserver> -port 7000 -version
Trying 129.xx.xx.xx (port 7000):
AFS version:  OpenAFS 1.6.1 built 2013-03-01 (114....@fnal.gov)

(Since I composed the main part of this email, I have
discovered the same repeatable problem when trying to clone
a git repository.  In both cases, then, the problem happens when
the file server is being asked to write a large number of files
in quick succession.)
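
A minimal sketch of a standalone reproducer in Python, in case it helps
to take tar and git out of the picture.  It assumes (this is not
confirmed) that the trigger is simply many small files being created in
quick succession.  The name write_many, the default count, and the
payload size are made up for illustration.  It reports which step failed
(open, write, or close), the errno name, and how long that one file
took, which would show whether the error really is ETIMEDOUT arriving
in a fraction of a second.

```python
import errno
import os
import sys
import time


def write_many(directory, count=1000, payload=b"x" * 512):
    """Create `count` small files in `directory`.

    Returns None if every file was written, otherwise a tuple
    (file index, failing stage, errno name, seconds spent on that file)
    describing the first failure.
    """
    for i in range(1, count + 1):
        path = os.path.join(directory, str(i))
        stage = "open"
        fd = None
        start = time.monotonic()
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            stage = "write"
            os.write(fd, payload)
            stage = "close"
            os.close(fd)  # deferred write errors can surface here
            fd = None
        except OSError as exc:
            elapsed = time.monotonic() - start
            if fd is not None and stage != "close":
                try:
                    os.close(fd)  # best-effort cleanup after a failed write
                except OSError:
                    pass
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return (i, stage, name, elapsed)
    return None


if __name__ == "__main__":
    # Hypothetical usage: pass the target directory as the only argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    failure = write_many(target)
    if failure is None:
        print("all files written")
    else:
        index, stage, name, elapsed = failure
        print(f"file {index}: {stage} failed with {name} after {elapsed:.3f}s")
```

Run it with an empty directory in AFS as the only argument.  If it
stops at roughly the same file number every time, that would suggest
the file count rather than the file contents is what matters.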

Given that this is a repeatable problem
and assuming it's a known bug, I'd be happy to run some diagnostics.
If you send me email, you may get some strange messages.
Despite messages about mail being rejected, I
still get it, just not at the most useful place.

Best regards,
John Boyland
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
