On 09/15/2012 06:06 AM, Simon Wilkinson wrote:
My best guess for what may have been attempted in the code: when the
calling application defines an error via rx_SetMsgsizeRetryErr, we kill
the call immediately with an error (e.g. RX_MSGSIZE). Otherwise, we try
to force the 1400-byte packet through, and lower packet sizes back to
the discovered MTU as soon as we can. If fragments can't get through,
the call dies with a network error.

Hi Andrew,

I think your understanding of the PMTU code is roughly correct. It pretty much 
matches what I worked out last time I looked at this.

One critical thing that I think your overview misses is that we have two 
different types of MTU discovery. In my notes, I've taken to calling these low 
and high MTU.

High MTU is where we attempt to discover if the MTU of the link is larger than 
the RX packet size. Code to do this has been in the tree for a while - Derrick 
reworked this as part of the YFS grant work, but I don't think he ever got 
something that worked. High MTU discovery uses ICMP errors, the DF flag, and 
works in approximately the same way as TCP PMTU discovery, with the exception 
(as you note) that we can't resize existing RX packets.

When I looked at this last, my intention was to use high MTU discovery as a 
means of safely enabling jumbograms. Rather than using jumbograms to go over 
the known MTU (which causes fragmentation, and all of the problems that 
jumbograms are known for), you'd use jumbograms to combine RX packets to just 
below the discovered MTU. Doing this avoids all of the problems of jumbograms, 
and means that we don't have to get into creating oversize RX packets, which 
has its own pitfalls.
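
As a rough illustration of the packing idea above, here is a minimal sketch that counts how many full-size RX packets could be combined into one jumbogram without exceeding a discovered path MTU. The function name and all of the size constants (IP+UDP overhead, RX header, jumbo buffer and jumbo header sizes) are assumptions chosen for illustration, not values taken from this thread; check them against the RX headers before relying on them.

```python
# Illustrative sizes in bytes. These are assumptions for the sketch,
# not values quoted from the RX source.
IP_UDP_OVERHEAD = 28   # assumed IPv4 (20) + UDP (8) headers
RX_HEADER = 28         # assumed RX wire header size
JUMBO_BUFFER = 1412    # assumed payload of one full-size RX packet
JUMBO_HEADER = 4       # assumed per-packet header inside a jumbogram


def packets_that_fit(path_mtu: int) -> int:
    """Return how many full-size RX packets fit in one datagram
    that stays at or below path_mtu (0 if not even one fits)."""
    budget = path_mtu - IP_UDP_OVERHEAD - RX_HEADER
    if budget < JUMBO_BUFFER:
        return 0  # this is the "low MTU" situation, not a jumbogram case
    remaining = budget - JUMBO_BUFFER
    # Every additional packet costs its payload plus a jumbo header.
    return 1 + remaining // (JUMBO_BUFFER + JUMBO_HEADER)


if __name__ == "__main__":
    for mtu in (1400, 1500, 9000):
        print(mtu, packets_that_fit(mtu))
```

Under these assumed sizes a standard Ethernet MTU leaves room for only one packet, so combining packets would only pay off on links where high MTU discovery finds something larger.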

Low MTU is where the MTU of the link is smaller than the RX packet size. This 
is the case that Derrick discovered at the conference at UIUC and wrote code to 
work around. Low MTU detection doesn't use the traditional path MTU discovery 
code, but instead uses padded RX ping packets. If we don't get a response to a 
ping packet of a certain size, then we resend the ping with a lower size. When 
we eventually get a response, that's the MTU of the link. This is the code that 
uses rx_SetMsgsizeRetryErr - if that's registered, and we aren't making 
progress because of MTU, then the call will be failed with that error, and the 
application can retry, and thus get a smaller packet size.
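
The probing loop described above can be sketched roughly as follows. This is a simulation, not the RX code: `ping_ok` is a hypothetical callback standing in for "send a padded ping of this size and wait for a response", and the starting size, floor and step are made-up values for illustration.

```python
from typing import Callable, Optional


def discover_low_mtu(ping_ok: Callable[[int], bool],
                     start: int = 1444,
                     floor: int = 576,
                     step: int = 128) -> Optional[int]:
    """Try padded pings of decreasing size until one is answered.

    Returns the first size that got a response, or None if nothing
    down to `floor` got through. All defaults are hypothetical.
    """
    size = start
    while size >= floor:
        if ping_ok(size):
            return size      # first size the path let through
        size -= step         # no response: retry with a smaller ping
    return None


if __name__ == "__main__":
    # Simulated path that silently drops anything over 1300 bytes.
    print(discover_low_mtu(lambda size: size <= 1300))
```

Note that with a coarse step the result is only a lower bound on what the path can carry (the simulated 1300-byte path above yields 1188), so a real implementation would probably want a finer step or a second refining pass.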

To my mind, keeping the two of these separate makes sense at present. There are 
a lot of questions around support for setting the DF flag, and getting the ICMP 
errors delivered to the RX stack, especially when that stack is in userspace. 
The low MTU detection should work everywhere. Last time I looked, low MTU had 
some issues - in particular, it was using hard ACKs to determine whether a call 
was making progress, when actually the presence of soft ACKs is sufficient (you 
don't care that the packet has reached the application, just that it has been 
successfully received by the network stack).
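
To make the hard ACK versus soft ACK distinction concrete, here is a small sketch of a progress check. The `AckInfo` type and its fields are hypothetical simplifications for illustration and do not mirror the actual RX ACK packet layout: `first_packet` stands for the hard-ACK boundary, and `soft_acked` for sequence numbers the peer's stack has received but not yet handed to the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AckInfo:
    # Hypothetical, simplified view of one received ACK.
    first_packet: int                      # all seqs below this are hard-ACKed
    soft_acked: frozenset = frozenset()    # seqs received by the peer's stack


def made_progress(prev: AckInfo, cur: AckInfo, count_soft: bool = True) -> bool:
    """Return True if `cur` shows data getting through since `prev`."""
    if cur.first_packet > prev.first_packet:
        return True          # hard-ACK boundary advanced
    if count_soft:
        # Any newly soft-ACKed packet proves the path carried it.
        return any(seq not in prev.soft_acked for seq in cur.soft_acked)
    return False


if __name__ == "__main__":
    before = AckInfo(first_packet=10)
    after = AckInfo(first_packet=10, soft_acked=frozenset({10, 11}))
    print(made_progress(before, after))                    # soft ACKs counted
    print(made_progress(before, after, count_soft=False))  # hard ACKs only
```

With `count_soft=False` the check ignores packets that reached the peer but have not been consumed yet, which is the behaviour described as a problem above: a slow reader could then be mistaken for an MTU blackhole.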

It would be good to keep discussing this. Like most of RX, this code is all a 
bit tangled, and I think discussing overall design intent is a great way to 
make sure that the patches do what we all expect them to!

Is this already documented somewhere outside of the source code? Should this be in the wiki?

Jason
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
