Thus spake "Iljitsch van Beijnum" <[EMAIL PROTECTED]>
On 22-jul-2005, at 20:20, Stephen Sprunk wrote:
Thinking about this a bit more, this could probably be fairly easy to
achieve by creating a "onlink-MRU" or "interface-MRU" option for ND
Neighbour Advertisements.
If there aren't any big holes in what I'm suggesting, I'm willing to
spend some time co-authoring an Internet draft on this.
I'm sure we can find a nice restaurant or café in Paris to discuss the
matter further. :-)
I'm not able to attend, so I'd appreciate it if someone would give my
comments a bit of airtime even if none of the attendees agree with me.
The hole is that there may be an L2 device in the middle which has a
lower MTU than the end hosts. The neighbor MTU is an upper bound, but
it's not guaranteed to work -- you need to probe to see what really
works.
(Layer 1 devices can also impose MTU limits.)
True; read the above L2 as L1/L2.
It's important to keep this simple, unless the IEEE guys also want
to play and add mechanisms to exchange MTU information between
switches.
Given their past stance on jumbos, I don't see that happening.
Just like PMTUD, you need to periodically probe and adjust to changing
network conditions, including detecting "black holes". Fred Baker
suggested the host send both minimum MTU (576 for v4 and 1280 for v6)
and maximum MTU frames in a given burst and track what gets through.
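
To make the burst idea concrete, here is a rough sketch in Python. It is
only an illustration under stated assumptions: `send` is a hypothetical
callable that returns True when a frame of the given size is confirmed
received by the neighbor, and `probe_burst` is a made-up name, not anything
from an existing draft or API.

```python
from typing import Callable

# Minimum sizes as quoted in the thread (576 for v4, 1280 for v6).
MIN_MTU = {4: 576, 6: 1280}


def probe_burst(send: Callable[[int], bool], max_mtu: int, ip_version: int = 6) -> int:
    """Send one minimum-size and one maximum-size frame and report what works.

    `send(size)` is a hypothetical hook: True means the neighbor confirmed
    receipt of a frame of `size` bytes.
    Returns max_mtu if the large frame got through, the minimum MTU if only
    the small frame did (a "black hole" for large frames), and 0 if neither
    did (the neighbor is unreachable, which is not an MTU problem).
    """
    small = MIN_MTU[ip_version]
    small_ok = send(small)
    large_ok = send(max_mtu)
    if large_ok:
        return max_mtu
    if small_ok:
        return small
    return 0
```

A caller would repeat this periodically, since the usable size can change
when the path through the L1/L2 devices changes.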
I'm not really comfortable with this... It makes more sense to me to have
a router or two, or maybe one or two non-router hosts, send out "MTU
announcements", and other hosts only announce the non-standard MTU in
neighbor advertisements when they have recently heard one of those
announcements. When the MTU suddenly decreases,
the announcements are no longer heard, hosts put 1500 in their
neighbor advertisements and neighbor unreachability detection does
the rest.
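
The announcement scheme above could be sketched roughly as follows. This is
an assumption-laden illustration, not a proposal for the actual mechanism:
the class name, the `hold_time` parameter, and its 30-second default are all
invented here, and timestamps are passed in explicitly to keep the sketch
simple.

```python
class MtuAdvertiser:
    """Decides which MTU a host puts in its neighbor advertisements.

    Hypothetical sketch: the host advertises its jumbo MTU only while it
    has recently heard an "MTU announcement"; otherwise it falls back to
    the standard value (1500, assuming Ethernet).
    """

    def __init__(self, jumbo_mtu: int, hold_time: float = 30.0, default_mtu: int = 1500):
        self.jumbo_mtu = jumbo_mtu
        self.hold_time = hold_time      # assumed lifetime of an announcement, in seconds
        self.default_mtu = default_mtu
        self.last_heard = None          # time of the most recent announcement, if any

    def heard_announcement(self, now: float) -> None:
        """Record that an MTU announcement arrived at time `now`."""
        self.last_heard = now

    def advertised_mtu(self, now: float) -> int:
        """Return the MTU to place in a neighbor advertisement sent at `now`."""
        if self.last_heard is not None and now - self.last_heard <= self.hold_time:
            return self.jumbo_mtu
        return self.default_mtu
```

If the announcements are large frames themselves, a drop in the usable MTU
silences them, the hold time expires, and the host goes back to advertising
1500.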
I'm okay with hosts dropping to 1500 (assuming Ethernet) for neighbors that
can't receive jumbos. It'd be desirable for them to find a higher value
that works but which is still less than both hosts' MTUs. Trying to ratchet
the MTU back up after it's been lowered is probably more trouble than it's
worth.
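
One way to look for "a higher value that works" is a one-time binary search
between 1500 and the smaller of the two hosts' MTUs. The sketch below is an
illustration only. It assumes a hypothetical `probe(size)` callable that
returns True when a frame of that size reaches the neighbor, and it assumes
that if a given size works then every smaller size works too. Neither the
function name nor the search strategy comes from any draft.

```python
from typing import Callable


def find_link_mtu(probe: Callable[[int], bool], local_mtu: int,
                  neighbor_mtu: int, floor: int = 1500) -> int:
    """Find the largest frame size that works between two neighbors.

    `probe` is a hypothetical hook; `floor` is assumed to always work
    (1500, assuming Ethernet). The result never exceeds either host's MTU.
    """
    ceiling = min(local_mtu, neighbor_mtu)
    if ceiling <= floor:
        return ceiling              # no jumbo range to search
    if probe(ceiling):
        return ceiling              # the advertised upper bound works as-is
    lo, hi = floor, ceiling - 1     # lo is known good, hi is the largest candidate
    while lo < hi:
        mid = (lo + hi + 1) // 2    # round up so the loop always makes progress
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, with host MTUs of 9000 and 8000 and a path that passes at most
3000 bytes, the search settles on 3000 after about a dozen probes. Per the
point above, this would run once after a failure rather than repeatedly
trying to ratchet the value back up.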
The fact that ethernet is supposed to have a tree topology makes
things slightly simpler.
Some 802 networks are not trees, and there are non-802 networks out there.
I'd like to have a single jumbo spec for all L1/L2 types; it doesn't seem to
require much tap-dancing around the specific numbers to generalize it to all
media types, though obviously Ethernet is the most common and most in need
of help.
The most perverse scenario I can envision is a network where one host
has an MTU of 9k, another has 8k, one network path has 10k, another path
has 3k, and the path varies every few minutes (and isn't necessarily
symmetric).
Real-life ethernet isn't supposed to be like that...
I've been traumatized by some of the networks I've seen in the wild. The
scenario I gave was deliberately perverse, but it's not very far from the
worst I've encountered -- and I guarantee such things will occur if we
standardize jumbos. I'm betting that's why the IEEE refused to tackle it.
For those who think there isn't a real problem here: with 1500-byte
packets it takes a little over 800 packets per second to saturate a
10 Mbps ethernet link. At GE speeds that's over 80000 packets per second.
It is very hard to achieve decent performance when you have to stop what
you're doing 80000 times per second...
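
The arithmetic behind those figures can be checked with a few lines of
Python. This is a back-of-the-envelope sketch: the function name is made up,
and the 38 bytes of per-frame overhead (14 header, 4 FCS, 8 preamble, 12
inter-frame gap) assume untagged Ethernet.

```python
def packets_per_second(link_bps: float, mtu_bytes: int, overhead_bytes: int = 38) -> float:
    """Full-size packets per second needed to saturate a link.

    overhead_bytes assumes untagged Ethernet: 14-byte header, 4-byte FCS,
    8-byte preamble and 12-byte inter-frame gap.
    """
    bits_per_frame = (mtu_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame


for name, bps in (("10 Mbps", 10_000_000), ("GE", 1_000_000_000)):
    for mtu in (1500, 9000):
        print(f"{name:8s} MTU {mtu}: {packets_per_second(bps, mtu):9.0f} pps")
```

At 1500 bytes this gives about 813 pps on 10 Mbps and about 81,000 pps on
GE; at 9000 bytes the GE figure drops to roughly 13,800 pps, about a sixth
as many interrupts if the NIC raises one per packet.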
Modern hosts can do it, but it'd be nice to reduce the CPU load due to NIC
interrupts if possible.
There is also the environment to consider because the amount of
power switches use is strongly related to the number of packets that
flow through the switch. So increasing the MTU from (for instance)
1500 to 9000 bytes means it only takes 3 packets to transfer 18000
bytes (2 data, 1 ack), while it takes (best case) 13 packets at 1500
bytes (12 data, 1 ack) but usually 18 (12 data, 6 acks, at one ack per
two segments). That saves a LOT of power.
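
The packet counts above can be reproduced with a small Python sketch. Like
the example itself, it ignores header bytes and treats the MTU as payload;
the function name and the `segments_per_ack` parameter are illustrative
assumptions.

```python
import math


def packets_for_transfer(total_bytes: int, mtu: int, segments_per_ack: int):
    """Return (data packets, acks, total) for a one-way bulk transfer.

    Header overhead is ignored, matching the rough figures in the text.
    segments_per_ack is how many data packets one ack covers.
    """
    data = math.ceil(total_bytes / mtu)
    acks = math.ceil(data / segments_per_ack)
    return data, acks, data + acks


print(packets_for_transfer(18000, 9000, 2))    # jumbo frames, one ack per two segments
print(packets_for_transfer(18000, 1500, 12))   # best case: a single ack for all 12
print(packets_for_transfer(18000, 1500, 2))    # usual case: one ack per two segments
```

The three calls give totals of 3, 13 and 18 packets respectively.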
The power consideration didn't even occur to me; I was thinking of CPU load
on end hosts, per-packet overhead on the wire, and pps limits on network
gear.
S
Stephen Sprunk      "Those people who think they know everything
CCIE #3723           are a great annoyance to those of us who do."
K5SSS                                             --Isaac Asimov
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------