Re: [OpenAFS-devel] RX retransmit timeout value being overestimated? (Poor performance over WAN)

Simon Wilkinson Fri, 27 Apr 2012 14:40:03 -0700

On 27 Apr 2012, at 19:11, Peter Wells wrote:

> I’m trying to work out why the fileserver keeps pausing like this.


I'm not sure how much of OpenAFS's file transport protocol you're familiar 
with, so apologies if some of this retreads old ground.

OpenAFS uses a UDP-based RPC mechanism called RX. RX provides a reliable 
connection layer on top of UDP by implementing its own acknowledgment and 
congestion control scheme. Originally this was pretty much unique, but over the 
years RX has converged more and more on a TCP-style mechanism for congestion 
control.

Unfortunately, OpenAFS releases up until 1.6 were stuck in a neverworld between 
RX's old burst based transmission algorithm, and a TCP-style mechanism for flow 
control. The behaviour that you are seeing is a product of a number of 
unfortunate issues in the 1.4.x RX stack.
 
> The last packet before the pauses is an ACK from the client with a mixture of 
> 32 +ve and –ve acknowledgements… then silence between the server and client 
> for 1.2 seconds…

RX has what we term 'hard' and 'soft' ACKs. A hard ACK moves the congestion 
control window forwards. A soft ACK is roughly analogous to a TCP SACK: it 
says that a packet has been received, but that earlier packets are still 
missing and so we cannot move the window forwards. In the 1.4 release, the 
maximum window size is 32 packets, which is why you are stalling with 32 
pending acknowledgments.
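
To make the stall concrete, here is a toy model of the sender's window. It is 
only a sketch and not the RX code: the class name, its fields and the shape of 
the ACK (a "first missing" sequence number plus a list of soft-ACKed packets) 
are assumptions made for illustration. The only figure taken from above is the 
32-packet window.

```python
WINDOW = 32  # maximum window size in packets (the 1.4 figure quoted above)


class ToySender:
    """Hypothetical sender: tracks only what is needed to show the stall."""

    def __init__(self):
        self.base = 0        # lowest sequence number not yet hard-ACKed
        self.next_seq = 0    # next new packet to transmit
        self.soft = set()    # packets soft-ACKed beyond a gap

    def can_send(self):
        # New data may only go out while it fits inside the window.
        return self.next_seq < self.base + WINDOW

    def send_all(self):
        """Transmit until the window is full; return the sequence numbers sent."""
        sent = []
        while self.can_send():
            sent.append(self.next_seq)
            self.next_seq += 1
        return sent

    def on_ack(self, first_missing, soft_acked):
        # Hard ACK: everything below first_missing arrived in order,
        # so the window slides forwards.
        self.base = max(self.base, first_missing)
        # Soft ACKs record receipt but do not move the window.
        self.soft = {s for s in self.soft | set(soft_acked) if s >= self.base}


sender = ToySender()
sender.send_all()                 # packets 0..31 go out
sender.on_ack(0, range(1, 32))    # packet 0 lost, 1..31 soft-ACKed
print(sender.can_send())          # False: window full, nothing new can be sent
```

In this model a single missing packet at the front of the window is enough to 
stop all new transmission, however many later packets have been soft-ACKed, 
until that packet is retransmitted and hard-ACKed.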

There is a bug in 1.4 which means that we don't immediately start 
retransmitting when it becomes obvious that packets have been missed (TCP will 
retransmit if more than 2 packets have been received subsequent to a missing 
packet). So, we have to wait until the packets time out. A timeout is a hard 
error: it forces the connection back into slow start (which drops the window 
size), and so you'll see transmission rates slowly ramp back up from here.
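
For comparison, the TCP-style rule mentioned above can be sketched as follows. 
This is an illustration of the general idea rather than code from RX or from 
any TCP stack: the function name and the representation of an ACK as a list of 
booleans are assumptions, and the threshold of 3 is simply "more than 2".

```python
DUP_THRESHOLD = 3  # "more than 2" later packets received


def fast_retransmit_candidates(acks, threshold=DUP_THRESHOLD):
    """Return the indices of missing packets that should be resent at once.

    acks[i] is True if packet i of the window was acknowledged and False
    if it was not. A missing packet qualifies once at least `threshold`
    packets sent after it have been acknowledged.
    """
    candidates = []
    received_after = 0
    # Walk backwards so that received_after counts the acknowledged
    # packets that follow the current position.
    for i in range(len(acks) - 1, -1, -1):
        if acks[i]:
            received_after += 1
        elif received_after >= threshold:
            candidates.append(i)
    candidates.reverse()
    return candidates


# Packet 1 is missing with four later packets received: resend it now.
# Packet 5 is missing with only one later packet received: keep waiting.
print(fast_retransmit_candidates([True, False, True, True, True, False, True]))
# prints [1]
```

A sender applying a rule like this resends the missing packet as soon as the 
ACK arrives, rather than sitting idle until the retransmit timer fires.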

> As the rate picks up, the client will NACK a data packet, and then subsequent 
> ACK packets grow in length (in terms of the number of ACKS) until they reach 
> 32, at which time there is another long pause. 

What's interesting about this trace is how regular your stalls are. I can't 
easily explain this regularity, other than that it looks like the connection is 
regularly dropping particular packet types.

>    Average rtt is 0.104, with 17838 samples
>    Minimum rtt is 0.000, maximum is 2.147
>  
> That’s a pretty large maximum rtt and I was wondering if this was somehow 
> skewing the calculation of the retransmit timeout value, somehow causing the 
> fileserver to snooze before suddenly realising it should be retransmitting 
> packets. 

RTT calculation in 1.4 is very, very broken, as it feeds far too many samples 
into the RTT algorithm. However, the effect here shouldn't be to inflate the 
RTT number itself, just to remove the smoothing factor.
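
To show what the smoothing factor normally buys you, here is a sketch of the 
standard TCP estimator from RFC 6298. It is not the 1.4 RX code, and it leaves 
out RFC 6298's clock granularity term and its one-second minimum RTO. The 
function names are mine, and the two sample values are borrowed from the 
statistics you quoted purely as example inputs.

```python
ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4   # gain for the RTT variance (RFC 6298)


def update_rtt(srtt, rttvar, sample):
    """Fold one RTT sample (in seconds) into the smoothed estimate."""
    if srtt is None:
        # First measurement: initialise directly from the sample.
        return sample, sample / 2
    # The variance is updated first, using the old smoothed RTT.
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar


def rto(srtt, rttvar):
    """Retransmit timeout, without the granularity term or the 1 s floor."""
    return srtt + 4 * rttvar


# One outlier moves the smoothed RTT only an eighth of the way towards it.
srtt, rttvar = update_rtt(None, None, 0.104)
once, _ = update_rtt(srtt, rttvar, 2.147)

# Feeding the same outlier 32 times in a row all but erases the smoothing.
many, var = srtt, rttvar
for _ in range(32):
    many, var = update_rtt(many, var, 2.147)

print(round(once, 3), round(many, 3))
```

With an estimator of this shape, a measurement that is counted once is heavily 
damped, while the same measurement counted many times drags the estimate 
almost all the way to the latest value. That is one way in which oversampling 
can cancel out the smoothing without inflating the samples themselves.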

> Any thoughts you have will be much appreciated.  The AFS versions are as 
> follows in case it helps:

I would be very interested in seeing how 1.6.1 performs with this network 
configuration. It is unlikely that any work is going to get done in fixing the 
1.4.x transport, but if you can reproduce these issues with 1.6, I'd really 
like to look at some packet traces and work out what's going on.

Cheers,

Simon.

