>> The hole is that there may be an L2 device in the middle which has a
>> lower MTU than the end hosts.  The neighbor MTU is an upper bound,
>> but it's not guaranteed to work -- you need to probe to see what
>> really works.
> 
> 
> (Layer 1 devices can also impose MTU limits.)
> 
> It's important to keep this simple, unless the IEEE guys also want to 
> play and add mechanisms to exchange MTU information between switches.

Before two hosts can transmit packets on Ethernet they have to undergo
neighbour solicitation to find the remote end's hardware address anyway.

When you send the neighbour solicitation you could pad the packet out
with multiple "MTU padding" options to make the packet the size of the
MTU you want to use.
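
A minimal sketch of how that padding might be generated is below, in
Python.  Note the "MTU padding" option type number is made up here --
no such type is assigned in RFC 2461:

    import struct

    # Hypothetical ND option type for "MTU padding" -- illustrative
    # only, not assigned anywhere.
    OPT_MTU_PADDING = 0xFE

    # ND options are TLVs: 1-byte type, 1-byte length counted in
    # units of 8 octets, so one option carries at most 255 * 8 bytes.
    MAX_OPT_LEN = 255 * 8

    def padding_options(target_size, base_size):
        """Build enough MTU-padding options to grow a neighbour
        solicitation from base_size bytes up to target_size bytes."""
        opts = b""
        remaining = target_size - base_size
        while remaining > 0:
            # Option length must be a non-zero multiple of 8 octets;
            # round up at the tail, a small overshoot is harmless.
            size = min(MAX_OPT_LEN, max(8, (remaining + 7) // 8 * 8))
            opts += struct.pack("!BB", OPT_MTU_PADDING, size // 8)
            opts += b"\x00" * (size - 2)
            remaining -= size
        return opts

    # Pad a ~72-byte solicitation out to a 9000-byte probe.
    probe = padding_options(9000, 72)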

The remote end receives the message and adds its own padding[1] to the
reply.

If the original host receives a message from the remote host with MTU
padding options, then it knows the remote host can receive a message at
least that big and uses that as the remote host's MTU.

If the original host does not receive a reply, it updates its neighbour
cache to say that the supported MTU is the MTU announced by the RAs (if
no MTU is announced, it uses a configured minimum MTU for that link)
and retransmits the query without any "MTU padding" options.
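
Roughly, the resolution step could look like the sketch below;
send_ns() and the reply object are assumed primitives, and the
retransmit timers of RFC 2461 are elided:

    IPV6_MIN_LINK_MTU = 1280  # configured floor for the link

    def resolve_neighbour_mtu(dst, preferred_mtu, ra_mtu=None):
        # First probe: a solicitation padded out to the MTU we would
        # like to use.  send_ns() returns the advertisement, or None
        # on timeout.
        reply = send_ns(dst, pad_to=preferred_mtu)
        if reply is not None and reply.padded_size:
            # The peer both received and echoed a padded message, so
            # packets at least that large survive the path in both
            # directions (assuming a symmetric path MTU).  The peer
            # may pad less than we did, to ask for smaller packets[1].
            return min(preferred_mtu, reply.padded_size)

        # No padded reply: fall back to the RA-announced MTU, or to
        # the configured minimum, and retry without padding options.
        fallback = ra_mtu if ra_mtu is not None else IPV6_MIN_LINK_MTU
        if send_ns(dst, pad_to=None) is not None:
            return fallback
        raise TimeoutError("neighbour unreachable")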

This supports multiple MTUs on the same link.  It consumes more
bandwidth (the padding) during neighbour solicitation, but that is a
"rare" event (once per host pair, not per connection or, worse, per
packet).  Neighbour solicitation messages should go to multicast
addresses that are likely to be used by at most a few nodes, so the
extra bandwidth used by the padding isn't excessive.  This doesn't
introduce more round trips, although step-down could be implemented
with more round trips if desired.  The scheme is "admin proof": admins
shouldn't need to do any configuration for a host to start using large
MTUs.  If the network is poorly designed, with an "invisible" low-MTU
link, then the hosts will fall back correctly.  It should recover from
changes in MTU, since new neighbour discoveries will discover the new
link MTU.  It's backwards compatible (nodes that don't understand this
protocol MUST silently ignore the option according to RFC 2461).  The
test should be reasonably robust: if a network cannot support[2] large
MTUs then the protocol will fall back to the minimum size.
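
As a quick illustration of why few nodes see each solicitation
(standard addressing-architecture behaviour, nothing specific to this
proposal), the solicited-node group is derived from the low 24 bits of
the target address, so it is usually joined by just one host:

    import ipaddress

    def solicited_node_mcast(target):
        # ff02::1:ff00:0/104 plus the low 24 bits of the target
        # address; collisions between hosts are possible but rare.
        low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
        base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
        return ipaddress.IPv6Address(base | low24)

    print(solicited_node_mcast("fe80::210:5aff:feaa:20a2"))
    # -> ff02::1:ffaa:20a2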

Problems with this protocol: it can only detect whether a "jumbogram
path" exists at the "preferred MTU"; if it doesn't find one, it falls
back to the lowest common denominator and doesn't detect intermediate
sizes.  It "wastes" bandwidth by padding neighbour
solicitation/advertisement packets.  It also assumes that MTU paths
are symmetric.

A single option can be at most 2040 bytes long (the length field is 8
bits, counted in units of 8 octets), so several options are required to
pad a neighbour solicitation/advertisement to 9k.  This could be
resolved by changing the neighbour discovery protocol to use another
(longer) length field when the length field is 0, but RFC 2461
explicitly says you must drop packets that have a zero-length option,
so that change would have much more impact.
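
The arithmetic, for concreteness:

    import math

    # One option tops out at 255 * 8 = 2040 bytes, so a 9000-byte
    # probe needs at least this many padding options:
    print(math.ceil(9000 / 2040))  # -> 5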

If this sounds insane, in my defense it's 3am and it sounded like a
good idea at the time :)

Perry

----
[1]: A host may, for instance, support receiving 9k MTU packets but
still prefer them to be under 4k, so a host can ask to be sent 4k MTU
packets.

[2]: Cannot support because of MTU issues, or due to high loss rates etc.

