[See my conclusion at the end if you want to skip ahead.]
After the feedback this afternoon I read RFC 4821, which is about
discovering the path MTU by probing rather than depending on ICMP
messages.
This approach has the advantage that it can work just as well over a
layer 2 path with an unknown MTU as over a layer 3 path where
traditional path MTU discovery doesn't work because ICMP "too big"
messages aren't sent or received.
This means that it would be possible for hosts implementing this
mechanism to simply set the maximum MTU search size to their local
non-standard MTU and everything will work.
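To make the probing idea concrete, here is a minimal sketch of one possible search strategy: a binary search between a size assumed to work and the configured maximum search size (for example a host's local non-standard MTU). This is not necessarily the strategy RFC 4821 recommends, just one simple choice. `probe` is a hypothetical callback that sends a probe of the given size and reports whether it got through, and success is assumed to be monotonic in size.

```python
from typing import Callable


def search_mtu(probe: Callable[[int], bool], low: int = 1280, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumptions (illustrative only): `low` is known to work (1280 is the
    IPv6 minimum MTU), `high` is the local maximum search size, and if a
    size gets through then every smaller size does too.
    """
    best = low  # largest size confirmed (or assumed) to work so far
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid      # mid fits: search the upper half
            low = mid + 1
        else:
            high = mid - 1  # mid too big: search the lower half
    return best
```

On a path that only carries 1500-byte packets, `search_mtu(lambda size: size <= 1500)` returns 1500 after 13 probes, 8 of which are oversized packets that never arrive.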
In my approach, I wanted to avoid, as much as possible, sending
oversized packets that don't make it through the layer 2 network,
because of the problems this may cause. In the degenerate case, a
10/100/1000 Mbps host sends a burst of oversized packets in a short
time because a number of TCP sessions are searching for the MTU at
the same time, and this happens on an old 10 Mbps network where it
triggers some kind of error state with further negative impact.
(Hubs/switches that disconnect ports with too many errors, that kind
of thing.)
Another difference is that in my draft, routers can announce a TCP
MSS value but the MTU discovery overrides this information on the
local subnet, which makes it easy to stick to 1500 byte (or smaller)
packets across the net but use larger packets locally. Routers can
also announce a maximum allowed MTU so it's easy to make sure that
hosts don't send packets larger than a certain size administratively
if desired. Obviously it's also possible to simply announce the
largest possible MSS so large packets can be used across the internet
(I think this may have been unclear this afternoon).
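The precedence described above can be summarized as a small selection function. This is a hypothetical sketch, not the draft's actual logic: it assumes IPv6 and TCP without options (60 bytes of headers) and a 1500-byte fallback when nothing else is known, and all names are illustrative.

```python
from typing import Optional

HEADERS = 40 + 20   # IPv6 header + TCP header without options, in bytes
DEFAULT_MTU = 1500  # assumed fallback when nothing else is known


def tcp_mss(on_link: bool, link_mtu: int,
            neighbor_mtu: Optional[int] = None,
            router_mss: Optional[int] = None,
            admin_max_mtu: Optional[int] = None) -> int:
    """Pick the TCP MSS for a destination (hypothetical precedence rules)."""
    if on_link and neighbor_mtu is not None:
        mtu = neighbor_mtu            # discovery on the local subnet overrides the router MSS
    elif router_mss is not None:
        mtu = router_mss + HEADERS    # router-announced MSS applies otherwise
    else:
        mtu = DEFAULT_MTU
    mtu = min(mtu, link_mtu)          # never exceed our own interface MTU
    if admin_max_mtu is not None:
        mtu = min(mtu, admin_max_mtu) # the administrative ceiling caps everything
    return mtu - HEADERS
```

Announcing a very large MSS, for example `tcp_mss(False, 9000, router_mss=65475)`, leaves only the link and administrative limits in effect and yields 8940, which corresponds to allowing large packets across the internet.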
Last but not least, the RFC 4821 mechanism must be implemented per
transport protocol, while the mechanism in my draft works at the IP
layer so it doesn't introduce new logic in transports.
The idea of having switches send "too big" messages isn't very
attractive for three reasons:
1. ICMP "too big" signalling already isn't very robust at the IP layer with traditional PMTUD
2. A node could send packets that are so large that the switch can't
receive them so it's not possible to send an ICMP message back either
3. Nodes would need to know whether the switches support this before
they can send larger packets, which is more or less the same reason
why most subnets aren't configured for jumboframes today
First reaction to issues with neighbor discovery over tunnels:
tunnels have problems with MTUs in general and PMTUD in particular.
There are many opportunities for problems, but I think in practice
the mechanism I proposed wouldn't lead to much additional trouble
because the MTU for the tunnel interface is generally low enough that
the mechanism isn't used anyway, and probing + neighbor
unreachability detection (for IPv6) will make sure it's possible to
avoid problems and/or recover from them.
Multiple paths with multiple MTUs: you can't have loops in your
ethernet topology, so the only way to do this is with 802.3ad link
aggregation. As far as I can tell, at least some Cisco equipment
makes sure bundled links all use the same MTU. Not sure if 802.3ad
says anything about this. Also, switches don't send packets belonging
to the same session over different links in a bundle to avoid packet
reordering. So if an MTU failure occurs, it will be consistent.
Because the actual traffic and the ICMP probe message aren't
necessarily the same "session" it's possible that MTU probes and data
traffic see different MTUs. Neighbor unreachability detection will
have to detect the problem so the neighbor MTU is reset.
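Both points depend on the bundle choosing one member link per flow. The sketch below uses CRC32 over a 5-tuple as an assumed hash; real switches use vendor-specific inputs (MAC addresses, IP addresses, or L4 ports), so this is only illustrative. It shows why a given flow always sees the same member link, and why an ICMPv6 probe (protocol 58) and a TCP flow (protocol 6) between the same two hosts can still land on different members.

```python
import zlib


def pick_member(src: str, dst: str, proto: int, sport: int, dport: int, n_links: int) -> int:
    """Choose a bundle member for a flow (hypothetical hash, for illustration).

    The same 5-tuple always maps to the same member, so packets within a
    flow are never reordered and an MTU failure for that flow is consistent.
    """
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_links


# A probe and a data flow between the same hosts are different "sessions":
probe_link = pick_member("2001:db8::1", "2001:db8::2", 58, 0, 0, 2)
data_link = pick_member("2001:db8::1", "2001:db8::2", 6, 40000, 80, 2)
```

If `probe_link` accepts 9000-byte frames and `data_link` only 1500, the probe succeeds while large data packets are dropped; that is the case where neighbor unreachability detection has to reset the neighbor MTU.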
About the concern that 9000 bytes is not enough: all MTU fields in
the draft are 32 bits wide. :-)
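For scale: an unsigned 32-bit field holds values up to 2**32 - 1 = 4,294,967,295, far beyond 9000 bytes and even beyond the 65,535-byte limit of IPv6's 16-bit Payload Length field. The sketch below assumes network byte order for the field; the function name is hypothetical and not from the draft.

```python
import struct


def encode_mtu(mtu: int) -> bytes:
    """Pack an MTU into an unsigned 32-bit, network-order field (assumed layout).

    Raises struct.error for values that don't fit in 32 bits.
    """
    return struct.pack("!I", mtu)
```

`encode_mtu(9000)` yields the four bytes `00 00 23 28`, while `encode_mtu(2**32)` raises `struct.error`.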
Someone made me aware of this:
http://grouper.ieee.org/groups/802/3/frame_study/index.html
This effort doesn't increase the payload size of ethernet packets,
though.
My conclusion:
Wide scale implementation of RFC 4821 makes the MTU probing packets
unnecessary, but these and the other options and messages can still be
useful for several reasons:
- skip probing steps because neighbor MTU is known immediately
- allow administrators to limit MTU sizes
- use different MTUs for different link speeds for jitter/delay
control and interaction with nodes/switches with limited capabilities
- allow unmodified transports to use larger packets
I'm thinking that it's probably possible and desirable to make all
messages and options optional, with the exception of something that
allows administrators to limit the MTU subnet-wide with one setting.
But maybe cases can be made for completely removing some messages or
options because it's unlikely they'll be implemented or provide many
benefits if implemented. However, please note that although the
number of new options and messages may seem a bit high, the way in
which they work is actually very straightforward with very simple
decision making logic and only a single new timer introduced.
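To illustrate how little state this needs, here is a hypothetical per-neighbor MTU cache with a single expiry timer per entry. It is a sketch under assumptions, not the draft's logic: learned entries are capped by an optional subnet-wide administrative maximum, expire after `lifetime` seconds, and are reset immediately when neighbor unreachability detection fails. A cached entry is what lets a host skip probing steps, because the neighbor MTU is known immediately.

```python
import time
from typing import Dict, Optional, Tuple


class NeighborMtuCache:
    """Hypothetical neighbor MTU cache: one expiry timer per entry."""

    def __init__(self, default_mtu: int = 1500, lifetime: float = 600.0, clock=time.monotonic):
        self.default_mtu = default_mtu
        self.lifetime = lifetime
        self.clock = clock
        self.entries: Dict[str, Tuple[int, float]] = {}  # neighbor -> (mtu, expiry time)

    def learn(self, neighbor: str, mtu: int, admin_max: Optional[int] = None) -> None:
        """Record an MTU learned from an announcement or a successful probe."""
        if admin_max is not None:
            mtu = min(mtu, admin_max)  # administrative subnet-wide limit
        self.entries[neighbor] = (mtu, self.clock() + self.lifetime)

    def mtu_for(self, neighbor: str) -> int:
        """Return the cached MTU, falling back to the default if unknown or expired."""
        entry = self.entries.get(neighbor)
        if entry is None or self.clock() >= entry[1]:
            self.entries.pop(neighbor, None)
            return self.default_mtu
        return entry[0]

    def unreachable(self, neighbor: str) -> None:
        """Neighbor unreachability detection failed: forget the larger MTU."""
        self.entries.pop(neighbor, None)
```

Because every failure path falls back to the default MTU, an entry that turns out to be wrong costs at most one detection interval rather than a persistent black hole.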
The goal is to allow the use of larger packets between supporting
nodes on a subnet. Whatever gets that done without breaking any old
stuff that's reasonably still in use is fine by me.
_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area