>> The hole is that there may be an L2 device in the middle which has a
>> lower MTU than the end hosts. The neighbor MTU is an upper bound,
>> but it's not guaranteed to work -- you need to probe to see what
>> really works.
>
> (Layer 1 devices can also impose MTU limits.)
>
> It's important to keep this simple, unless the IEEE guys also want to
> play and add mechanisms to exchange MTU information between switches.
Before two hosts can transmit packets on Ethernet they have to perform neighbour solicitation to find the remote end's hardware address anyway. When you send the neighbour solicitation you could pad the packet out with multiple "MTU padding" options to make the packet the size of the MTU you want to use. The remote end receives the message and adds its own padding[1] to the reply. If the original host receives a message from the remote host with MTU padding options, then it knows the remote host can receive a message at least that big and uses that as the remote host's MTU. If the original host does not receive a reply, it updates its neighbour cache to say that the supported MTU is the MTU announced by the RAs; if no MTU is announced, it uses a configured minimum MTU for that link and retransmits the query without any "MTU padding" options.

This supports multiple MTUs on the same link. It consumes more bandwidth (the padding) during neighbour solicitation, which is a "rare" event (only once per host pair, not per connection or, worse, per packet). Neighbour solicitation messages should go to multicast addresses that are likely to be used by at most a few nodes, so the extra bandwidth used by the padding isn't excessive. This doesn't introduce more round trips, although step-down could be implemented with more round trips if desired.

The scheme is "admin proof": admins shouldn't need to do any configuration for a host to start using large MTUs. If the network is poorly designed, with an "invisible" low-MTU link, the hosts will fall back correctly. It should also recover from changes in MTU, since new neighbour discoveries will discover the new link MTU. It's backwards compatible (nodes that don't understand this protocol MUST silently ignore the option according to RFC 2461), and it should be reasonably robust: if a network cannot support[2] large MTUs then the protocol will fall back to the minimum size.
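As a rough sketch, generating the padding for the solicitation could look like the following. The "MTU padding" option type value and the helper function are hypothetical (no such option is standardized), but the TLV layout follows the RFC 2461 option format, where the one-octet Length field counts units of 8 octets:

```python
# Hypothetical sketch of building "MTU padding" options for a
# neighbour solicitation. The option type value is made up (not an
# IANA-assigned ND option type). Per RFC 2461, each option is a TLV
# whose one-octet Length field counts 8-octet units, so one option
# can carry at most 255 * 8 = 2040 bytes.

OPT_TYPE_MTU_PADDING = 200   # placeholder, not a real option type
MAX_OPTION_LEN = 255 * 8     # largest single ND option, in bytes

def padding_options(current_size: int, target_mtu: int) -> bytes:
    """Return enough padding options to grow a neighbour solicitation
    from current_size bytes up to (roughly) target_mtu bytes."""
    out = bytearray()
    remaining = target_mtu - current_size
    while remaining >= 8:                       # options are 8-octet multiples
        chunk = min(remaining - remaining % 8, MAX_OPTION_LEN)
        out.append(OPT_TYPE_MTU_PADDING)        # Type
        out.append(chunk // 8)                  # Length, in 8-octet units
        out += bytes(chunk - 2)                 # zero fill after Type/Length
        remaining -= chunk
    return bytes(out)
```

With a 2040-byte option maximum, padding a small solicitation out to a 9k jumbogram takes five options rather than dozens.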
Problems with this protocol include that it can only detect the existence of a "jumbogram path": if it doesn't find one at the "preferred MTU", it falls back to the lowest common denominator and doesn't detect intermediate sizes. It "wastes" bandwidth by padding neighbour solicitation/advertisement packets, and it assumes that MTU paths are symmetric. The option Length field is a single octet counted in units of 8 octets, so one option can be at most 2040 bytes, and several options are required to pad a neighbour solicitation/advertisement to 9k. That could be avoided by changing the neighbour discovery protocol to interpret a Length of 0 as introducing another (longer) length field (RFC 2461 explicitly says you must drop packets that have a zero Length); this change would have much more impact however.

If this sounds insane, in my defense it's 3am and it sounded like a good idea at the time :)

Perry

----
[1]: A host may, for instance, support receiving 9k MTU packets but still prefer them to be under 4k, so a host can ask to be sent 4k MTU packets.
[2]: Cannot support because of MTU issues, or due to high loss rates etc.

--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------