Hi Jeffrey!

On Mon, 22 Aug 2005, Jeffrey Altman wrote:

Roland Kuhn wrote:

The Abort code is RXKADEXPIRED (19270409L).   Would you verify that you
still have a valid token and that your system clocks are in sync?
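For anyone who wants to sanity-check the number: assuming the abort code follows the usual com_err packing (low 8 bits are the index into an error table, the remaining bits spell the table name at six bits per character), it can be unpacked with a few lines of Python. This is only a sketch under that assumption. The name `decode_com_err` is made up for illustration and is not an OpenAFS function.

```python
# Character set assumed for com_err table names; a 6-bit value of 0 means "no character".
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"

def decode_com_err(code):
    """Split a com_err-style code into (table_name, offset). Hypothetical helper."""
    offset = code & 0xFF          # low 8 bits: index within the error table
    base = code >> 8              # remaining bits: packed table name
    name = []
    for shift in (24, 18, 12, 6, 0):
        idx = (base >> shift) & 0x3F
        if idx:
            name.append(CHARSET[idx - 1])
    return "".join(name), offset

print(decode_com_err(19270409))   # prints ('RXK', 9)
```

Under that packing, the code belongs to a table named RXK and is entry 9 in it.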

The clocks are perfectly synchronized and I'm pretty sure that the
batch jobs have valid tokens, otherwise I would see other failures as
well. Also, wouldn't it be very nasty to effectively disable a
complete client because one connection has no valid token?

The other thing is: it is the _client_ which sends the first ABORT in
response to a challenge....

I've also captured the 'self-healing' of the client state, although I'm
not able to make anything of it myself. The full trace is at

http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs.cap

It seems that 118 minutes after the failure the client makes a get-time
call which succeeds, and then everything is happy again.

Ciao,
                    Roland

I simply interpret that to mean that after 118 minutes the client
finally dumps the token and starts to make unencrypted file server
requests.

But how would that explain that even other users with completely unrelated tokens cannot access files on that fileserver from the failed client?

What I am seeing here is that the rx library is detecting that the
token is expired.   It sends an abort to the server which simply
marks the client's connection in an error state.  Each subsequent
request from the client on that connection is responded to with the
expired token abort code.

Now the question is what is the client doing with the RXKADEXPIRED
error when it receives it from the server.   The answer appears to
be "not much".   It looks to me as if the client is simply issuing
a warning to the user that the tokens are expired.   It does not
actually remove the tokens or reset the connection.
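The behaviour described above can be pictured with a toy model. The Python below is purely illustrative: `ToyConnection` and its methods are hypothetical names and not rx code. It shows only the shape of the problem: once a connection is marked with an abort code, every later call on it gets the same code back until something resets it.

```python
RXKADEXPIRED = 19270409  # abort code quoted earlier in this thread

class ToyConnection:
    """Hypothetical stand-in for one client/server connection (not the rx API)."""

    def __init__(self):
        self.error = None  # sticky abort code, if any

    def abort(self, code):
        # Mark the connection as being in an error state.
        self.error = code

    def call(self, request):
        # While the error is set, every request is answered with the same abort.
        if self.error is not None:
            return ("ABORT", self.error)
        return ("OK", request)

    def reset(self):
        # What a client would have to do to recover: drop the error state.
        self.error = None

conn = ToyConnection()
first = conn.call("fetch-status")            # succeeds
conn.abort(RXKADEXPIRED)                     # expired token detected
later = [conn.call(r) for r in ("fetch-data", "store-data")]  # both aborted
conn.reset()
after_reset = conn.call("fetch-data")        # succeeds again
```

In this picture, a client that only logs a warning and never calls anything like `reset()` keeps receiving the abort on that connection.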

I've had syslog entries like 'kernel: afs: Tokens for user of AFS id -1 for cell <mycell> have expired', but not from the client which actually failed. That one logged 'kernel: afs: failed to store file (110)', where 110 translates into 'connection timed out', right?
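A quick way to check what a numeric errno such as 110 means on a given machine is to ask the Python standard library. The helper name `describe_errno` is made up for this sketch. The mapping is platform-dependent, so it should be run on the same kind of system as the failing client.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)

print(describe_errno(110))
```

On Linux, 110 is ETIMEDOUT, whose message is "Connection timed out".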

Ciao,
                                        Roland
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
