Re: jumbo frame of GbE and IPv6 -- A proposal

Iljitsch van Beijnum Sun, 24 Jul 2005 04:45:00 -0700

On 23-jul-2005, at 17:12, Perry Lorier wrote:

Before two hosts can transmit packets on Ethernet they have to undergo
neighbour solicitation to find the remote end's hardware address anyway.


Right.

When you send the neighbour solicitation you could pad the packet out
with multiple "MTU padding" options to make the packet the size of the
MTU you want to use.

I don't think we want to do this in the neighbor _solicitation_ because there is a good chance that the neighbor doesn't support the larger MTU so the packet is lost.

If the original host does not receive a reply, it updates its neighbour
cache to say that the MTU supported is the MTU announced by the RA's, if
no MTU is announced then it uses a configured minimum MTU for that link
and retransmits the query without any "MTU padding" options.

This is problematic for two reasons: when the packet gets lost because either the receiver or the layer 2 network doesn't support a sufficiently large MTU/MRU, there is a timeout, which wastes time, and if the receiver does in fact support jumboframes but of a smaller size than the sender supports, this isn't detected.

If this sounds insane, in my defense it's 3am and sounded like a good
idea at the time :)


:-)

I think our requirements are:

1. do not impede 1500-byte operation
2. discover and utilize jumboframe capability where possible
3. discover and utilize (close to) the maximum MTU

4. recover from sudden MTU reductions fast enough for TCP and similar to survive

First of all, we need hosts to find out that their correspondents support a larger MTU/MRU. This can easily be done in an ND option.

Since we're not going to get cooperation from switches, let alone hubs, it's important that we send test packets to see whether the jumboframes actually make it to the other side. I think using ND for this isn't a good idea: in its current form, it doesn't support the required packet size, and when the padded packet doesn't make it, there are recovery complexities and delays.

So it makes sense to come up with a new protocol for this. An interesting notion here is that this protocol doesn't have to be IPv6-specific. However, in IPv6 we have neighbor unreachability detection which we can use to find MTU reductions fast enough to fall back to 1500 bytes before bad things happen. In IPv4 or pure Ethernet, we don't have that, and we also don't have neighbor discovery to exchange per-host MRU/MTU information.

If we use IPv6 for this, I think a new ICMP type makes sense. Whenever two systems (hosts or routers) on a link perform neighbor discovery, they can trigger the MTU verification immediately afterward, and if jumboframe support is confirmed by receiving the larger packets, the MTU for the neighbor can be updated. If the larger packets don't make it to the neighbor there is no complexity and no delay: communication was already underway at 1500 bytes and continues without the need for further action.
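
A minimal sketch of that fallback-free verification step, in Python. The names Neighbor, verify_jumbo and probe_delivered are made up for illustration; probe_delivered stands in for sending the proposed ICMP test packet and learning whether it arrived, which is not specified here.

```python
from dataclasses import dataclass

BASE_MTU = 1500  # size that is already known to work on the link


@dataclass
class Neighbor:
    advertised_mru: int   # learned from the (hypothetical) ND option
    mtu: int = BASE_MTU   # size actually used when sending to this neighbor


def verify_jumbo(neighbor, local_mtu, probe_delivered):
    """Raise the per-neighbor MTU only after a test packet got through.

    probe_delivered(size) is an assumed callback that returns True when
    a test packet of that size was confirmed by the neighbor.
    """
    candidate = min(local_mtu, neighbor.advertised_mru)
    if candidate > BASE_MTU and probe_delivered(candidate):
        neighbor.mtu = candidate
    # On failure nothing changes: traffic simply continues at BASE_MTU.
    return neighbor.mtu
```

The point of doing it this way is that the failure path is a no-op: the neighbor entry starts at 1500 and is only ever raised.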

However, this doesn't accommodate finding out jumboframe support at reduced sizes very well. For this, I think we should use an additional exchange, but this one should probably happen over multicast. Hosts/routers could take turns in a distributed search for the largest supported framesize. I think it's important that all jumbo-capable systems take part in this in order to deal with unusual topologies. For instance, consider a network with three switches: one supports 9000, another 8000 and the two are connected through a third switch that only supports 3000 bytes:



  A                C
  |                |
+-+--+  +----+  +--+-+
|9000+--+3000+--+8000|
+-+--+  +----+  +--+-+
  |                |
  B                D

Suppose all hosts support 9216 byte jumboframes.
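
To make the example concrete, here is a small Python model of the diagram. The switch names S1, S2, S3 and the helper functions are my own labels, and it assumes a packet gets through only if it fits every switch on the path as well as the 9216 byte host limit.

```python
# Switch limits and host attachment as drawn in the diagram above.
SWITCH_LIMIT = {"S1": 9000, "S2": 3000, "S3": 8000}
CHAIN = ["S1", "S2", "S3"]                      # left to right
HOST_SWITCH = {"A": "S1", "B": "S1", "C": "S3", "D": "S3"}
HOST_LIMIT = 9216


def path_limit(src, dst):
    """Largest packet that fits every switch between src and dst."""
    i, j = sorted((CHAIN.index(HOST_SWITCH[src]),
                   CHAIN.index(HOST_SWITCH[dst])))
    return min([HOST_LIMIT] + [SWITCH_LIMIT[s] for s in CHAIN[i:j + 1]])


def receivers(src, size):
    """Hosts that would receive a multicast packet of this size from src."""
    return sorted(h for h in HOST_SWITCH
                  if h != src and size <= path_limit(src, h))
```

With these assumed limits, receivers("A", 5596) gives only B, while receivers("C", 2524) gives A, B and D.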

I think the most efficient way to handle this is to do two concurrent searches: one for the maximum packet size that can be used to at least one correspondent, and one for the minimum jumboframe size that is supported by all jumboframe supporting systems.

So first A sends out an announcement that it's going to send a 9216 byte and a 5596 (1500 + 4096) byte packet, and then sends the packets. Nobody receives the first packet, but everyone knows A sent it because of the preceding announcement, and B receives the second packet.

Then B would (for instance) send out its 9216 byte packet along with a 1500 + 2048 = 3548 byte packet, and also indicate the largest size that worked (5596) and the smallest size that didn't work (9216). A receives the 3548 byte packet but not the 9216 byte one.

C is next and sends out 9216 and (1500 + 1024 = ) 2524 byte packets, along with the information that no jumboframe size has worked so far. A, B and D all receive the 2524 byte packet.

D then sends out 9216 and (1500 + 1536 = ) 3036 byte packets with information that it received 2524 but not 3548. C receives the 3036 byte packet.

It's now A's turn again. A knows that the size that everyone can receive is between 2524 and 3036 and the size that at least one correspondent can receive is between 5596 and 9216. So it sends out 2780 and 7406 byte packets.


And so on.
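
The sizes A picks are just the midpoints of the two ranges, so both searches are ordinary binary searches. A rough Python sketch; the function names are invented for illustration, and probe_ok stands in for "the test packet of this size arrived", which the real exchange would learn from the other systems' reports.

```python
def midpoint(works, fails):
    """Next size to try, halfway between the largest size known to
    work and the smallest size known to fail."""
    return (works + fails) // 2


def search(works, fails, probe_ok):
    """Narrow the range one probe at a time until it is 1 byte wide.

    probe_ok(size) is an assumed callback reporting whether a test
    packet of that size got through. Returns the largest working size.
    """
    while fails - works > 1:
        size = midpoint(works, fails)
        if probe_ok(size):
            works = size
        else:
            fails = size
    return works


# What A knows at the start of its second turn in the example above.
everyone = (2524, 3036)       # size that all systems can receive
at_least_one = (5596, 9216)   # size that at least one can receive
next_probes = (midpoint(*everyone), midpoint(*at_least_one))
```

Here next_probes comes out as (2780, 7406), matching the packets A sends. A real implementation would presumably stop at a coarser granularity than one byte.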

After a few rounds like this, each system knows the maximum jumboframe size it can send/receive (so it can adjust its announcements in the ND option), and the minimum jumboframe size that everyone supports. It's probably doable to generalize this into any given number of levels, but I doubt that more than 3 is worth the trouble, and maybe even having two levels isn't. On the other hand, if some hosts support 9000 but the majority support 8192 it may be a good idea to forget 9000 and just do 8192.


This may sound horribly complex, but it really isn't.  :-)

The biggest challenge is probably making the different systems talk in turn, but that can probably be done by having a timer that depends on the difference in MAC address between the last system to transmit and the prospective next one.
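
One possible way to derive such a timer, sketched in Python. This is only an assumption about how the difference could be turned into a delay: the wrap-around distance and the max_delay parameter are my own choices, not part of any specification.

```python
MAC_SPACE = 1 << 48  # number of possible 48-bit MAC addresses


def mac_to_int(mac):
    """Convert a MAC address such as '00:11:22:33:44:55' to an integer."""
    return int(mac.replace(":", ""), 16)


def turn_delay(last_sender, candidate, max_delay=1.0):
    """Seconds the candidate waits before taking the next turn.

    The delay grows with the forward (wrap-around) distance from the
    last sender's MAC address to the candidate's, so the next address
    "after" the last sender fires first. The last sender itself gets
    the full max_delay.
    """
    distance = (mac_to_int(candidate) - mac_to_int(last_sender)) % MAC_SPACE
    if distance == 0:
        distance = MAC_SPACE
    return max_delay * distance / MAC_SPACE
```

A system that hears someone else transmit before its own timer expires would cancel the timer and recompute it against the new last sender. Interfaces from the same vendor have addresses that are close together, so a linear scaling like this gives them nearly identical delays; a real design would need to spread those out.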

Extra credit: monitor spanning tree events for quick adaptation to changing layer 2 topologies.

Alternatively, we could add an RA option that administrators can use to tell hosts the jumboframe size the layer 2 network supports. (The RA option doesn't say anything about the capabilities of the _router_.) Then the whole multicast taking turns discovery isn't necessary, and a quick one-to-one verification before jumboframes are used is sufficient.


--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
