On Tue, Sep 25, 2012 at 2:16 PM, Andrew Deason <[email protected]> wrote:
> On Sat, 15 Sep 2012 11:06:37 +0100
> Simon Wilkinson <[email protected]> wrote:
>
>> High MTU is where we attempt to discover if the MTU of the link is
>> larger than the RX packet size. Code to do this has been in the tree
>> for a while - Derrick reworked this as part of the YFS grant work, but
>> I don't think ever got something that worked. High MTU discovery uses
>> ICMP errors, the DF flag, and works in approximately the same way as
>> TCP PMTU discovery, with the exception (as you note) that we can't
>> resize existing RX packets.
>
> Here you are talking about enabling the Linux IP_MTU_DISCOVER
> functionality, and the ICMP error queue stuff, correct?

No. This is code which pads packets to discover when they stop being passed.

> Maybe what you
> describe was the intent of this, but that's certainly not all it does;
> this method does detect when the pmtu decreases and we get an icmp
> response saying what our next frag limit is. I don't see how this ever
> increases the peer mtu.
>
> Or are you also counting RX_ACK_MTU,lastPacketSize,lastPingSize,etc
> here?

Yes.

>> Low MTU is where the MTU of the link is smaller than the RX packet
>> size. This is the case that Derrick discovered at the conference at
>> UIUC and wrote code to work around. Low MTU detection doesn't use the
>> traditional path MTU discovery code, but instead uses padded RX ping
>> packets. If we don't get a response to a ping packet of a certain
>> size, then we resend the ping with a lower size. When we eventually
>> get a response, that's the MTU of the link. This is the code that uses
>> rx_SetMsgsizeRetryErr - if that's registered, and we aren't making
>> progress because of MTU, then the call will be failed with that error,
>> and the application can retry, and thus get a smaller packet size.
>
> So, this sounds like either RX_ACK_MTU, lastPacketSize, lastPingSize,
> etc, or it sounds like the 'mtuout' label in rxi_CheckCall. One of
> those, yes?

Well, the lastPacketSize/lastPingSize code is related to both low and
high. The mtuout case is low.

>> To my mind, keeping the two of these separate makes sense at present.
>> There are a lot of questions around support for setting the DF flag,
>> and getting the ICMP errors delivered to the RX stack, especially when
>> that stack is in userspace.
>
> For now, I'm only worried about receiving ICMP errors on Linux, since
> that's the only platform I'm aware of that allows us to receive such
> errors without receiving nearly all ICMP errors for the whole box. And
> for Linux, this isn't difficult for userspace operations or anything, as
> it is a normal unprivileged operation.
> (Maybe there are other methods on
> other platforms for doing this, but I haven't looked into it.)

I have a Solaris streams module to do it, but it's ugly.

> I think my immediate concern is what to do about lastPacketSize raising
> the MTU after we have 'forced' a packet through via fragmentation that
> is higher than the actual MTU; this appears to be my only issue
> preventing the ICMP/IP_MTU_DISCOVER-based pmtu from working. Ideally I
> would want to just not set lastPacketSize/etc for a packet that is going
> out that is fragmented, but I don't think we have a way to determine
> that under the current model.
>
> What we could possibly do for Linux is to have two sockets open, one
> which is set to always set DF, and one to never set DF, and we could
> choose ourselves (Linux doesn't let you set this per-call; we'd have to
> setsockopt every time we want to switch... I think other platforms may
> let us set this per-call). What we could then do is always send the MTU
> pings with DF set, and everything else with DF not set, and only adjust
> MTU based on those MTU pings.

That sounds like a reasonable approach, though I suspect this is
portable beyond just Linux, and the more places we can have it, the
better.

> Basing MTU decisions on both (MTU-specific pings and actual data
> packets) seems error prone, since the data packets we want to try and
> push through by all means, but the MTU ones should fail if any hop
> doesn't like the packet size.
>
> I'm somewhat thinking aloud here now; does this make sense?
>
> --
> Andrew Deason
> [email protected]
>
> _______________________________________________
> OpenAFS-devel mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-devel

--
Derrick
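
A minimal sketch of the two-socket idea discussed above, written in Python for brevity (RX itself is C). It assumes Linux, where IP_MTU_DISCOVER is a per-socket option; the numeric constants are the Linux values from <linux/in.h>. PeerMtu, pick_socket, on_ack and on_frag_needed are hypothetical names used only for illustration, not rx_peer or anything else in the tree. The point is only that the MTU estimate grows from acknowledged DF pings alone, never from data packets that may have been fragmented on the way.

```python
import socket

# Linux values from <linux/in.h>, defined here because not every Python
# build exports them from the socket module.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DONT = 0  # never set DF; the kernel may fragment
IP_PMTUDISC_DO = 2    # always set DF; oversized sends fail with EMSGSIZE


def open_udp_socket(set_df):
    """Open a UDP socket that always (or never) sets DF. Linux only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    mode = IP_PMTUDISC_DO if set_df else IP_PMTUDISC_DONT
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)
    return sock


class PeerMtu:
    """Hypothetical per-peer MTU state (an assumption, not RX's rx_peer)."""

    def __init__(self, initial_mtu):
        self.mtu = initial_mtu

    def pick_socket(self, is_mtu_ping, df_sock, frag_sock):
        # MTU pings should fail if any hop dislikes the size, so they go
        # out with DF set. Data packets should get through by any means,
        # so they go out on the socket that allows fragmentation.
        return df_sock if is_mtu_ping else frag_sock

    def on_ack(self, was_mtu_ping, size):
        # Only an acknowledged DF ping shows the path carries this size
        # unfragmented; an acked data packet may have been fragmented,
        # so it must not raise the estimate.
        if was_mtu_ping and size > self.mtu:
            self.mtu = size

    def on_frag_needed(self, next_hop_mtu):
        # An ICMP "fragmentation needed" report can only lower the estimate.
        if next_hop_mtu < self.mtu:
            self.mtu = next_hop_mtu
```

With this split, a large data packet that was forced through via fragmentation leaves the estimate alone, while a DF ping of the same size would either be acknowledged (raising it) or be dropped with an ICMP error (lowering it).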
