Hi again!
On Mon, 22 Aug 2005, Roland Kuhn wrote:
Hi Jeffrey!
On Mon, 22 Aug 2005, Jeffrey Altman wrote:
Roland Kuhn wrote:
Hi folks!
On Sun, 21 Aug 2005, Derrick J Brashear wrote:
it needs to include the first error packet, e.g. the window where it
loses contact, to be useful
Okay, it happened again, and I have a full trace:
http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs-fail-trace.cap
http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs-fail-trace-end.cap
The latter contains only the last 81 frames and begins a few frames
before the request which fails. The former is 10MB in size. If you need
more history, I also have the last 1GB of the connection available.
192.168.18.2 is the server, 192.168.18.39 the client. The access is
typically for big files.
Ciao,
Roland
The Abort code is RXKADEXPIRED (19270409L). Would you verify that you
still have a valid token and that your system clocks are in sync?
The clocks are perfectly synchronized, and I'm pretty sure that the batch jobs
have valid tokens; otherwise I would see other failures as well. Also,
wouldn't it be very nasty to effectively disable a complete client because
one connection has no valid token?
The other thing is: it is the _client_ which sends the first ABORT in
response to a challenge....
I've also captured the 'self-healing' of the client state, although I
can't make anything of it myself. The full trace is at
http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs.cap
It seems that 118 minutes after the failure the client makes a get-time
call which succeeds, and then everything is happy again.
Ciao,
Roland
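
For anyone reading the traces: the abort code quoted earlier in the thread
(19270409) can be turned back into its symbolic name with a small lookup.
The following is only a sketch. The base value and the names other than
RXKADEXPIRED are assumptions written down from memory of the rxkad error
table, and rxkad_error_name is a made-up helper, not part of OpenAFS.

```python
# Sketch: map an rxkad abort code seen in a packet trace to its name.
# Assumption: the rxkad error table starts at 19270400 and its entries are
# in the order below. Only RXKADEXPIRED = 19270409 is confirmed by the thread.
RXKAD_BASE = 19270400

RXKAD_ERRORS = [
    "RXKADINCONSISTENCY",   # base + 0
    "RXKADPACKETSHORT",     # base + 1
    "RXKADLEVELFAIL",       # base + 2
    "RXKADTICKETLEN",       # base + 3
    "RXKADOUTOFSEQUENCE",   # base + 4
    "RXKADNOAUTH",          # base + 5
    "RXKADBADKEY",          # base + 6
    "RXKADBADTICKET",       # base + 7
    "RXKADUNKNOWNKEY",      # base + 8
    "RXKADEXPIRED",         # base + 9, the code discussed in this thread
    "RXKADSEALEDINCON",     # base + 10
    "RXKADDATALEN",         # base + 11
    "RXKADILLEGALLEVEL",    # base + 12
]


def rxkad_error_name(code):
    """Return the symbolic name for an rxkad abort code, or None if the
    code falls outside the (assumed) rxkad table."""
    offset = code - RXKAD_BASE
    if 0 <= offset < len(RXKAD_ERRORS):
        return RXKAD_ERRORS[offset]
    return None


if __name__ == "__main__":
    print(rxkad_error_name(19270409))
```

Codes outside the table return None, so abort codes from other Rx services
in the same capture are not mislabelled.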