On Fri, 12 Apr 2013 20:59:27 +0100 Simon Wilkinson <[email protected]> wrote:
> Various things can cause a client and server to have differing views
> on the available call channels. When the client attempts to use a call
> channel that the server thinks is in use, the server responds with a
> BUSY packet. Originally, the client would just ignore this. It would
> then look like the server wasn't responding, and the client would keep
> retrying on that channel until either the call timed out, or the
> channel on the server was freed.

Can't the race below still happen with this old behavior? Say you retry
the call 6 times before timing out (pulling numbers out of the air; I
don't remember how many it typically takes). The first 5 result in a
BUSY response. On the 6th, the server receives the packet and the call
channel is clear, but before it gets an ACK to the client, the client
times out the call. And the same thing you describe happens; the server
processes the request but the client thinks it failed.

> The race is as follows:
>
> Client                             Server
>
> Sends 1st pkt of call to server
>                                    Receives 1st packet, but channel busy
>                                    sets error on old call
>                                    sends BUSY packet to client
> RTT expires
> resends 1st pkt
>                                    Old call terminates
> Receives BUSY packet
> sets call busy flag

[sorry if my client messes up the formatting, hopefully you get where
I'm referencing]

I'm not looking at the code at the moment, but don't we get the serial
of the offending packet in the BUSY we receive? Therefore, we should be
able to ignore a BUSY packet if it does not reference the most recent
serial we've sent.

> The question is whether just adding more cases where we invalidate the
> cache is the right approach, or whether we should reconsider the BUSY
> behaviour.

If what I described in the first paragraph indeed applies, it seems like
we'd need to invalidate cache items on any network error (not just idle
dead, or busy, or whatever).
Even a normal DEAD error for the 1st packet of an RPC doesn't guarantee
that the server never received anything; imagine the case where, say,
all of the server's ACKs get dropped.

I'm not really checking myself here and looking rather quickly, so I may
be remembering stuff entirely incorrectly. But if any of all that makes
any sense, eliminating such errors is impossible and we just need to
discard cache data for all uncertain cases.

--
Andrew Deason
[email protected]
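
To make the two suggestions above concrete, here is a rough Python sketch rather than actual rx code. Everything named in it (`Call`, `Outcome`, `on_busy`, `must_invalidate_cache`) is hypothetical and exists only for illustration. It also assumes that a BUSY packet really does carry the serial of the packet that provoked it, which the message above leaves unverified.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical ways a call can end, from the client's point of view."""
    COMPLETED = auto()   # a reply arrived, so the client knows the result
    BUSY = auto()        # the client gave up after BUSY responses
    DEAD = auto()        # the call timed out with no usable response
    IDLE_DEAD = auto()   # the call timed out while idle


@dataclass
class Call:
    """Hypothetical client-side call state; not the real rx call structure."""
    last_serial_sent: int = 0
    busy: bool = False

    def send_first_packet(self, serial: int) -> None:
        # Record the serial of the most recent (re)transmission.
        self.last_serial_sent = serial

    def on_busy(self, busy_serial: int) -> bool:
        # Assumption: busy_serial is the serial of the packet that provoked
        # the BUSY. A BUSY that refers to an older transmission is stale,
        # so it is ignored and the busy flag stays unset.
        if busy_serial != self.last_serial_sent:
            return False
        self.busy = True
        return True


def must_invalidate_cache(outcome: Outcome) -> bool:
    # Conservative policy: unless the call completed, the server may or may
    # not have processed the request, so cached data cannot be trusted.
    return outcome is not Outcome.COMPLETED
```

In the race quoted above, the client sends the first packet, resends it under a new serial when the RTT expires, and only then receives the BUSY for the original transmission. With this check that late BUSY no longer sets the busy flag. The second function treats every outcome other than a completed call as uncertain, which is the "discard cache data for all uncertain cases" position.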
