On 30 Nov 2011, at 18:58, Andrew Deason wrote:
> On Wed, 30 Nov 2011 18:48:47 +0000
> Simon Wilkinson <[email protected]> wrote:
>
>> The idle dead code isn't in any shipping versions of 1.4. Current 1.4
>> clients won't get RX_CALL_TIMEOUT, or RX_CALL_DEAD.
>
> I'm not sure if we're talking about completely different things or what.
> The afs_BlackListOnce code exists in (shipping) 1.4 and, I mean, it
> certainly gets _called_. If I insert a sleep(10000) into the FetchStatus
> handler, the client will give an error (or failover to another site,
> etc); it won't just hang forever on the request.

Okay, so this is all a bit convoluted (isn't everything with RX!). There are
two ways in which an idle dead timeout can be caused...

The relevant code from rxi_CheckCall() is:

    /* see if we have a non-activity timeout */
    if (call->startWait && idleDeadTime
        && ((call->startWait + idleDeadTime) < now) &&
        (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = RX_CALL_TIMEOUT;
            goto mtuout;
        }
    }
    if (call->lastSendData && idleDeadTime && (conn->idleDeadErr != 0)
        && ((call->lastSendData + idleDeadTime) < now)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = conn->idleDeadErr;
            goto mtuout;
        }
    }

The first block is in 1.4.x, and is enabled there - it returns CALL_TIMEOUT,
which is handled by BlackListOnce. The second block is only enabled on 1.6 and
master, where it is configured to return CALL_DEAD.
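
To make the ordering of the two checks concrete, here is a rough standalone
sketch of the same logic. Everything prefixed sk_/SK_ is a made-up stand-in,
not the real RX definitions, and I've assumed idleDeadTime lives on the
connection; the numeric error values are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the RX definitions; values and layouts are
 * illustrative and are not taken from rx.h. */
enum { SK_STATE_ACTIVE = 1 };
enum { SK_FLAG_READER_WAIT = 0x1 };
enum { SK_ERR_NONE = 0, SK_ERR_DEAD = -1, SK_ERR_TIMEOUT = -3 };

struct sk_call {
    uint32_t startWait;     /* when the reader began waiting; 0 if it isn't */
    uint32_t lastSendData;  /* when data was last sent; 0 if never */
    int state;
    int flags;
};

struct sk_conn {
    uint32_t idleDeadTime;  /* seconds; 0 disables both checks (assumption) */
    int idleDeadErr;        /* 0 leaves the second check disabled */
};

/* Returns the error the call would be aborted with, or SK_ERR_NONE. */
static int sk_idle_check(const struct sk_call *call,
                         const struct sk_conn *conn, uint32_t now)
{
    if (conn->idleDeadTime == 0 || call->state != SK_STATE_ACTIVE)
        return SK_ERR_NONE;

    /* First block: the application is blocked reading and has waited
     * longer than the idle dead time. */
    if (call->startWait
        && (call->flags & SK_FLAG_READER_WAIT)
        && call->startWait + conn->idleDeadTime < now)
        return SK_ERR_TIMEOUT;

    /* Second block: nothing has been sent for longer than the idle dead
     * time, and the connection has an idle dead error configured. */
    if (call->lastSendData && conn->idleDeadErr != 0
        && call->lastSendData + conn->idleDeadTime < now)
        return conn->idleDeadErr;

    return SK_ERR_NONE;
}
```

With idleDeadErr left at 0 (the 1.4.x situation) only a waiting reader can
trip the check; with it set, a stalled writer trips the second block instead.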

The first block only fires on clients which have turned the call around, and
are now attempting to read from the fileserver. This is actually really fragile
- what RX_CALL_READER_WAIT actually means is that the application thread has
managed to push all of its packets into the RX layer, and is now blocked in
rx_Read(). In the current implementation this just means that the number of
transmitted packets left unacknowledged is less than 2x the current window
size. For pretty much every AFS-3 RPC other than StoreData, it's meaningless -
we'll enter READER_WAIT immediately. What it does mean is that this block is
very unlikely to fire for StoreData, as for most chunk sizes we'll be writing
more packets than can be held in the buffer.
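
As a back-of-the-envelope illustration of that 2x window rule (not the real
RX code), the sketch below asks whether a whole request can be queued before
any acknowledgement arrives. The function names are hypothetical, and the
payload size and window used in the example numbers are assumptions picked
for the arithmetic, not values taken from RX.

```c
#include <stddef.h>

/* Number of packets needed to carry `bytes` at `payload` bytes per packet
 * (rounded up). */
static size_t sk_packets_for(size_t bytes, size_t payload)
{
    return (bytes + payload - 1) / payload;
}

/* Nonzero if the whole request stays under the 2x window limit, so the
 * sender never blocks and can turn the call around straight away.
 * Worst case assumed: nothing is acknowledged while we are sending. */
static int sk_enters_reader_wait_immediately(size_t request_bytes,
                                             size_t payload, size_t window)
{
    return sk_packets_for(request_bytes, payload) < 2 * window;
}
```

Assuming a 1400 byte payload and a window of 32, a 200 byte request is one
packet and goes straight to READER_WAIT, while a 1 MiB store needs 749
packets against a limit of 64 and so blocks in the write path instead.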

So, with StoreData we'll hit the second block. If the other end isn't reading
packets out of RX (because it's blocked on I/O, for example), we won't be able
to send any packets, and we'll trigger the timeout.

It's this behaviour, coupled with the lack of error handling for CALL_DEAD, and
the fact that we don't try to flush a full cache except when we're writing to
it, that was the root cause of the original bug report.

However, all of this has exposed some real problems with the idle dead code as
it currently stands. I believe that some of them are the root cause of some
long-standing bug reports.

1) If you have an RPC with a small number of arguments (say CreateFile) the
client will end up in READER_WAIT as soon as it has transmitted the first
packet. If that CreateFile requires a callback break which takes longer than
the idle dead timeout, then the client will time out the call with
CALL_TIMEOUT. In the meantime, the server will complete the callback break, and
create the file. afs_Analyze will receive CALL_TIMEOUT and retry the operation;
the server will see that the file already exists, and return EEXIST. So, we
have an operation that has actually succeeded returning an error.

2) In cases where a fileserver is taking a long time to break callbacks, the
client can end up giving up due to idle dead timeouts, even if the server would
later be able to handle its request. In 1.4, we'll retry, and (possibly)
succeed; in 1.6 we'll tend to hit the second case first and so fail. However,
just retrying has penalties ...

3) Idle dead is a big cause of call busy problems. It breaks the client and the
server's view of which call slots are empty. Take the example of a client that
has slots 2, 3 and 4 busy with long-running store operations. Slot 1 hits an
idle dead timeout, and the client must retry. So, it starts a new call to the
server, but the only slot that's available is slot 1. It starts a call on that
slot, but the server still considers slot 1 to be in use, so the new call is
bounced back with CALL_BUSY.
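
The slot mismatch in (3) can be shown with a toy model. This is only a sketch:
the sk_ names are invented, the error values are placeholders, and it assumes
the server is still working on the old call when the client gives up. Slots
are indexed from 0 here, so "slot 1" in the text is index 0.

```c
enum { SK_SLOTS = 4 };
enum { SK_OK = 0, SK_BUSY = -1 };   /* placeholder values, not RX's */

/* Each side keeps its own idea of which call slots are in use. */
struct sk_peer {
    int busy[SK_SLOTS];
};

/* Client: give up on a call locally after an idle dead timeout. In this
 * model the server's view is left untouched (assumption). */
static void sk_client_idle_timeout(struct sk_peer *client, int slot)
{
    client->busy[slot] = 0;
}

/* Client: pick the lowest slot it believes is free, or -1 if none. */
static int sk_client_pick_slot(const struct sk_peer *client)
{
    int i;

    for (i = 0; i < SK_SLOTS; i++) {
        if (!client->busy[i])
            return i;
    }
    return -1;
}

/* Server: accept a new call only on a slot it believes is idle. */
static int sk_server_accept(struct sk_peer *server, int slot)
{
    if (server->busy[slot])
        return SK_BUSY;
    server->busy[slot] = 1;
    return SK_OK;
}
```

Starting with all four slots busy on both sides, timing out index 0 on the
client makes that the only slot it can pick, and the server refuses the retry
until its own copy of the old call has finished.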

Cheers,
Simon.