Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Douglas Otis



On Apr 13, 2007, at 4:55 PM, Fred Baker wrote:

The biggest value in real practice is IMHO that the end systems  
deal with a lower interrupt rate when moving the same amount of  
data. That said, some who are asking about larger MTUs are asking  
for values so large that CRC schemes lose their value in error  
detection, and they find themselves looking at higher layer FEC  
technologies to make up for the issue. Given that there is an  
equipment cost related to larger MTUs, I believe that there is such  
a thing as an MTU that is impractical.


1500 byte MTUs in fact work. I'm all for 9K MTUs, and would  
recommend them. I don't see the point of 65K MTUs.


Keep in mind that a 9KB MTU still reduces the Ethernet CRC  
effectiveness by a fair amount.  CRC32c, as adopted by SCTP and  
iSCSI, has a larger Hamming distance, restoring the detection rates  
for jumbo packets.
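As an aside, CRC-32C (the Castagnoli polynomial that SCTP and iSCSI adopted) is small enough to sketch directly. The bitwise form below is a slow-but-verifiable illustration checked against the standard "123456789" test vector, not production code:

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
    init and final XOR of 0xFFFFFFFF -- the parameters used by SCTP
    and iSCSI.  Slow, but easy to verify."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C
assert crc32c(b"123456789") == 0xE3069283
```

The Castagnoli polynomial was chosen for its better Hamming distance at large block lengths, per Koopman's analysis cited elsewhere in this thread.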


-Doug



Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Iljitsch van Beijnum


On 14-apr-2007, at 19:22, Douglas Otis wrote:

1500 byte MTUs in fact work. I'm all for 9K MTUs, and would  
recommend them. I don't see the point of 65K MTUs.


Keep in mind that a 9KB MTU still reduces the Ethernet CRC  
effectiveness by a fair amount.


In the article "Error Characteristics of FDDI" by Raj Jain (see  
http://citeseer.ist.psu.edu/341988.html ), Table VII says:


Hamming Distance of FCS Polynomial

Hamming   Max Frame Size
Weight    (Octets)
3         11454
4           375
5            37

Of course a 9000-byte packet has 6 times the number of bits in it,  
so the chance of having a number of bit errors in the packet that  
exceeds the Hamming distance is ~6 times greater.


I can't find bit error rate specs for various types of ethernet real  
quick, but if you assume 10^-9, that means roughly 1 in 10,000  
11454-byte packets has one bit error, so around 1 in 10^12 has three  
bit errors and has a _chance_ of defeating the CRC32. The naive  
assumption that only 1 in 2^32 of those packets with 3 flipped bits  
will have a valid CRC32 is probably incorrect, but the CRC should  
still catch most of those packets, for a fairly large value of most.


For 1500 byte packets the fraction of packets with three bits flipped  
would be around 1 : 10^15; correcting for the larger number of  
packets per given amount of data, that's a difference of about 1 :  
100. That seems like a lot, but getting better quality fiber easily  
compensates for this. Expressed differently, the average amount of  
data transmitted before you see one packet with three flipped bits is  
around 10 petabytes for 11454-byte packets and some 1.3 exabytes for  
1500-byte packets. For the large packets that would be one packet  
every three years at 1 Gbps, for the small ones one packet every 380  
years.
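The back-of-the-envelope numbers above can be sanity-checked with a small binomial calculation. The 10^-9 BER is the same assumption as above, and the independent-bit-error model is itself an idealization; the absolute volumes come out somewhat larger than the round numbers in the text, but the roughly size-squared scaling between the two frame sizes holds:

```python
from math import comb

BER = 1e-9  # assumed bit error rate, as above

def p_k_errors(bits: int, k: int, ber: float = BER) -> float:
    """Probability of exactly k flipped bits in a frame of `bits` bits,
    assuming independent bit errors at rate `ber`."""
    return comb(bits, k) * ber**k * (1 - ber)**(bits - k)

for octets in (1500, 11454):
    bits = octets * 8
    p3 = p_k_errors(bits, 3)        # 3 flips can slip past an HD-3 CRC
    print(f"{octets:>6}-byte frames: P(3 bit errors) = {p3:.1e}, "
          f"~{octets / p3:.1e} bytes sent per 3-bit-error frame")
```

Per byte of data carried, the 11454-byte frames see roughly (11454/1500)^2, about 58, times as many three-bit-error events as the 1500-byte frames, the same order as the ~1:100 figure above.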


Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Bill Stewart


One of my customers comments that he doesn't care about jumbograms of
9K or 4K - what he really wants is to be sure the networks support
MTUs of at least 1600-1700 bytes, so that various combinations of
IPSEC, UDP-padding, PPPoE, etc. don't break the real 1500-byte packets
underneath.
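The 1600-1700 figure is easy to motivate by stacking typical encapsulation overheads on a full 1500-byte packet. The per-layer byte counts below are illustrative guesses (real ESP overhead depends on cipher, padding and options), not figures from the mail:

```python
# Illustrative overheads, in bytes, added on top of a 1500-byte IP packet.
overheads = {
    "PPPoE": 8,                               # PPPoE header (6) + PPP protocol (2)
    "ESP tunnel mode": 20 + 8 + 16 + 2 + 12,  # outer IP + ESP header + IV
                                              # + pad/next-header + ICV (cipher-dependent)
    "NAT-T UDP": 8,                           # UDP encapsulation of ESP through NAT
}

required_mtu = 1500 + sum(overheads.values())
print(f"MTU needed to carry full 1500-byte packets: {required_mtu}")
```

That lands in the low 1570s; the customer's 1600-1700 ask adds headroom for heavier ciphers, GRE, or stacked VLAN tags.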


Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Randy Bush

 One of my customers comments that he doesn't care about jumbograms of
 9K or 4K - what he really wants is to be sure the networks support
 MTUs of at least 1600-1700 bytes, so that various combinations of
 IPSEC, UDP-padding, PPPoE, etc. don't break the real 1500-byte packets
 underneath.

nice to have smart customers!


Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Stephen Sprunk


Thus spake Bill Stewart [EMAIL PROTECTED]

One of my customers comments that he doesn't care about
jumbograms of 9K or 4K - what he really wants is to be sure the
networks support MTUs of at least 1600-1700 bytes, so that
various combinations of IPSEC, UDP-padding, PPPoE, etc.
don't break the real 1500-byte packets underneath.


This is a more realistic case, and support for baby jumbos of 2kB to 3kB 
is almost universal even on mid-range networking gear.  However, the 
problems of getting it deployed are mostly the same, except one can take the 
end nodes out of the picture in the simplest case.


OTOH, if we had a viable solution to the variable-MTU mess in the first 
place, you could just upgrade every network to the largest MTU possible and 
hosts would figure out what the PMTU was and nobody would be sending 
1500-byte packets; they'd be either something like 1400 bytes or 9000 bytes, 
depending on whether the path included segments that hadn't been upgraded 
yet...


S

Stephen Sprunk  Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do.
K5SSS --Isaac Asimov 





Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Joe Maimon




Simon Leinen wrote:




* Current Path MTU Discovery doesn't work reliably.

  Please, let's wait for these more robust PMTUD mechanisms to be
  universally deployed before trying to increase the Internet MTU.


I think this is the proper summary of where we are at: trying to restore 
one of the original design goals of IPv4 -- reliable internetworking of 
networks with different MTU sizes.


But the waiting game doesn't work; act local and think global.



* IP assumes a consistent MTU within a logical subnet.

  This seems to be a pretty fundamental assumption, and Iljitsch's
  original mail suggests that we fix this. 


This is an implementation detail, since local IP nodes have no 
conception of remote IP nodes' subnet details.




Re: Thoughts on increasing MTUs on the internet

2007-04-14 Thread Douglas Otis


On Apr 14, 2007, at 1:10 PM, Iljitsch van Beijnum wrote:

On 14-apr-2007, at 19:22, Douglas Otis wrote:


1500 byte MTUs in fact work. I'm all for 9K MTUs, and would  
recommend them. I don't see the point of 65K MTUs.


Keep in mind that a 9KB MTU still reduces the Ethernet CRC  
effectiveness by a fair amount.


I can't find bit error rate specs for various types of ethernet  
real quick, but if you assume 10^-9, that means roughly 1 in 10,000  
11454-byte packets has one bit error, so around 1 in 10^12 has three  
bit errors and has a _chance_ of defeating the CRC32.  The naive  
assumption that only 1 in 2^32 of those packets with 3 flipped bits  
will have a valid CRC32 is probably incorrect, but the CRC should  
still catch most of those packets, for a fairly large value of most.


http://www.ietf.org/rfc/rfc3385.txt
http://citeseer.ist.psu.edu/koopman02bit.html


For 1500 byte packets the fraction of packets with three bits  
flipped would be around 1 : 10^15, correcting for the larger number  
of packets per given amount of data, that's a difference of about  
1 : 100.




Quoting from When The CRC and TCP Checksum Disagree by Jonathan  
Stone and Craig Partridge:


http://citeseer.ist.psu.edu/cache/papers/cs/21401/http:zSzzSzsigcomm.it.uu.sezSzconfzSzpaperzSzsigcomm2000-9-1.pdf/stone00when.pdf


Traces of Internet packets from the past two years show that between  
1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even  
on links where link-level CRCs should catch all but 1 in 4 billion  
errors.  For certain situations, the rate of checksum failures can be  
even higher: in one hour-long test we observed a checksum failure of  
1 packet in 400.  We investigate why so many errors are observed,  
when link-level CRCs should catch nearly all of them.


We have collected nearly 500,000 packets which failed the TCP or UDP  
or IP checksum. This dataset shows the Internet has a wide variety of  
error sources which can not be detected by link-level checks.  We  
describe analysis tools that have identified nearly 100 different  
error patterns. Categorizing packet errors, we can infer likely  
causes which explain roughly half the observed errors. The causes  
span the entire spectrum of a network stack, from memory errors to  
bugs in TCP.


After an analysis we conclude that the checksum will fail to detect  
errors for roughly 1 in 16 million to 10 billion packets. From our  
analysis of the cause of errors, we propose simple changes to several  
protocols which will decrease the rate of undetected error. Even so,  
the highly non-random distribution of errors strongly suggests some  
applications should employ application-level checksums or equivalents.


Hardware weaknesses within DSLAMs or various memory arrays, such as a  
weak driver on some internal interface, can generate high levels of  
multi-bit errors not detected by TCP checksums.  When affecting the  
same bit within an interface, more than 1 out of 100 may go undetected.
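One structural reason so many of these errors get through: the TCP/IP checksum is a 16-bit ones-complement sum, so any error that merely swaps two aligned 16-bit words (a plausible symptom of a weak internal interface) leaves the checksum unchanged. A minimal sketch with an RFC 1071-style sum and toy data:

```python
def internet_checksum(data: bytes) -> int:
    """Ones-complement sum over 16-bit words, RFC 1071 style."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

good = b"\xde\xad\xbe\xef\xca\xfe"
# Swap the first two 16-bit words: a multi-byte error that the
# ones-complement sum cannot see (a CRC almost certainly would catch it).
swapped = good[2:4] + good[0:2] + good[4:6]

assert good != swapped
assert internet_checksum(good) == internet_checksum(swapped)
```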



That seems like a lot, but getting better quality fiber easily  
compensates for this. Expressed differently, the average amount of  
data transmitted where you see one packet with three flipped bits  
is around 10 petabytes for 11454 byte packets and some 1.3 exabytes  
for 1500 byte packets. For the large packets that would be one  
packet in three years at 1 Gbps, for the small ones one packet in  
380 years.


Consider that the CRC is not always carried with the packet between  
interfaces.


-Doug



RE: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Neil J. McRae


Saku Ytti wrote:

 IXP peeps, why are you not offering high MTU VLAN option?
 From my point of view, this is biggest reason why we today
 generally don't have higher end-to-end MTU.
 I know that some IXPs do, eg. NetNOD but generally it's
 not offered even though many users would opt to use it.

Larger MTU sizes were something I did some work on back in the FDDI days, and 
the benefits are significant - more than just CPU improvements. Throughput and 
server performance increased substantially also. But FDDI and the like didn't 
come cheap, so there was little interest at the time. At the LINX a few providers 
did run larger MTUs during those FDDI days. We did some testing with SRP/DPT in 
Stockholm and London as well, and again it worked well, but again not cheap. (We 
were looking at this for storage and exchange of cached content at the time.)

Unfortunately I think the time where IXPs could make a difference might be past 
- and to make this happen there needs to be more demand from the members of 
those exchanges. It's not just a case of turning on a VLAN either; the impact 
on the main fabric needs to be understood. Also, at least here in Europe, many of 
the circuits into exchanges are Ethernet based, and I suspect many circuits 
into exchanges would require a lot of work to support jumbos. And then again 
lots of circuits into customer premises are Ethernet based now too, some on GFP 
based SDH systems, some ATM and other whacko technologies with dubious support 
for jumbo or larger frames.

Then there is the actual interface card support for large amounts of jumbos, 
which in my experience is questionable - based on a limited amount of testing 
though. Come back POS, all is forgiven!

Regards,
Neil



RE: Thoughts on increasing MTUs on the internet

2007-04-13 Thread michael.dillon


 No, I doubt it will change.  The CRC algorithm used in Ethernet is 
 already strained by the 1500-byte-plus payload size.  802.3 won't 
 extend to any larger size without running a significant risk of the 
 CRC algorithm failing.

I believe this has already been debunked.
 
  From a practical side, the cost of developing, qualifying, and selling 
 new chipsets to handle jumbo packets would jack up the cost of inside 
 equipment.  What is the payback?  How much money do you save going to 
 jumbo packets?

I believe that the change is intended to apply to routers and the
ethernet switches that interconnect them in PoPs and NAPs and exchange
points. Therefore the cost of a small chipset modification is likely to
be negligible in the grand scheme of things.

As for numbers, it is not dollar figures that I want to see. I would
like the people who have jumbo packets inside their end-user networks to
run some MTU discovery and publish a full MTU matrix on all paths on the
Internet. That way we can all see where there is end-to-end support for
large MTUs and people who want to make buying decisions on this basis
will have something other than vendor assurances to show that a network
supports jumbograms. 

--Michael Dillon


Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Valdis . Kletnieks
On Fri, 13 Apr 2007 08:22:49 +0300, Saku Ytti said:
 
 On (2007-04-12 20:00 -0700), Stephen Satchell wrote:
  
  From a practical side, the cost of developing, qualifying, and selling 
  new chipsets to handle jumbo packets would jack up the cost of inside 
  equipment.  What is the payback?  How much money do you save going to 
  jumbo packets?
 
 It's rather hard to find ethernet gear operators could imagine using in
 peering or core that do not support +9k MTU's.

Note that the number of routers in the core is probably vastly outweighed
by the number of border and edge routers.  There's a *lot* of old eBay routers
out there - and until you get a clean path all the way back to the source
system, you won't *see* any 9K packets.

What's the business case for upgrading an older edge router to support 9K
MTU, when the only source of packets coming in is a network of Windows
boxes (both servers and end systems in offices) run by somebody who wouldn't
believe an Ethernet has anything other than a 1500 MTU if you stapled the
spec sheet to their forehead?

For that matter, what releases of Windows support setting a 9K MTU?  That's
probably the *real* uptake limiter.




RE: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Leigh Porter


I don't think it matters that everything can use jumbograms or that every 
single device on the Internet supports them. Heck, I still know networks with 
kit that does not support VLSM!

What would be good is if when a jumbogram capable path on the Internet exists, 
jumbograms can be used.

This way it does not matter that some box somewhere does not support anything 
greater than a 1500 byte MTU; anything with such a box in the path will simply 
not support a jumbogram. How do you find out? Just send a jumbogram across the 
path and see what happens.. ;-)

--
Leigh Porter
UK Broadband





RE: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Mikael Abrahamsson


On Fri, 13 Apr 2007, Leigh Porter wrote:

What would be good is if when a jumbogram capable path on the Internet 
exists, jumbograms can be used.


Yes, and it would be good if PMTUD worked, and ECN, oh and large 
UDP-packets for DNS, and BCP38, and... and... and.


The internet is a very diverse and complicated beast, and if end systems 
can properly detect the PMTU by probing for it, it might work. 
Requiring the core and distribution to change isn't going to happen 
overnight, so end systems first. Make sure they can properly detect the 
PMTU using nothing more than "is this packet size getting through" (i.e. 
no ICMP-need-to-frag) or the like; then we might see partial adoption of 
larger MTUs in some parts, and if this becomes a major customer 
requirement it might spread.


--
Mikael Abrahamssonemail: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Steve Meuse

On 4/13/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:



For that matter, what releases of Windows support setting a 9K MTU?  
That's probably the *real* uptake limiter.




Most, if not all.  I have an XP box that has a GigE with 9k MTU.


--

-Steve


Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Adrian Chadd

On Fri, Apr 13, 2007, Steve Meuse wrote:
 On 4/13/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 
 
 For that matter, what releases of Windows support setting a 9K MTU?  
 That's probably the *real* uptake limiter.
 
 Most, if not all.  I have an XP box that has a GigE with 9k MTU.

Lucky you. The definition of large frames varies entirely depending
upon the driver. I came up against this when a client nicely asked about
jumbo frames on his shiny new Cisco 3560 switch - and none of his
computers could agree on anything greater than 4k. And, to make things
worse, a few of the drivers wanted to enforce specific values rather
than accept any value between 1500 and an upper limit - making the whole
feat impossible.

Yay for non-clear specifications. The skeptic in me says "ain't going
to happen." The believer in me says "Ah, that'd be cool, wouldn't it?"
The realist in me says probably best to mandate that kind of stuff
with the next revision of the ipv6-internet with the first few bits
set to 010 instead of 001. :)

The real uptake limiter is the disagreement on implementation.
Some of you have to remember how this whole internet thing started
and grew (I've only read about the collaboration in books.)



Adrian



Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Stephen Sprunk


Thus spake Mikael Abrahamsson [EMAIL PROTECTED]

The internet is a very diverse and complicated beast and if end
systems can properly detect PMTU by doing discovery of this, it
might work.  ... Make sure they can properly detect PMTU by
use of nothing more than is this packet size getting thru (ie
no ICMP-NEED-TO-FRAG) or alike, then we might see partial
adoption of larger MTU in some parts and if this becomes a
major customer requirement then it might spread.


PMTU Black Hole Detection works well in my experience, but unfortunately MS 
doesn't turn it on by default, which is where all of the L2VPN with <1500 
MTU issues come from; turn BHD on and the problems just go away...  (And, as 
others have noted, there are better PMTUD algorithms that are designed to work 
_with_ black holes, but IME they're not really needed)


Still, we have a (mostly) working solution for wide-area use; what's missing 
is the critical step in getting varying MTUs working on a single subnet. 
All the solutions so far have required setting a higher, but still fixed, 
MTU for every device and that isn't realistic on the edge except in tightly 
controlled environments like HPC or internal datacenters.


Perry Lorier's solution is rather clever; perhaps we don't even need a 
protocol sanctioned by the IEEE or IETF?


S

Stephen Sprunk  Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do.
K5SSS --Isaac Asimov 





RE: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Lasher, Donn
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Stephen Sprunk
Sent: Friday, April 13, 2007 10:32 AM
To: Mikael Abrahamsson
Cc: North American Noise and Off-topic Gripes
Subject: Re: Thoughts on increasing MTUs on the internet

PMTU Black Hole Detection works well in my experience, but unfortunately MS
doesn't turn it on by default, which is where all of the L2VPN with <1500
MTU issues come from; turn BHD on and the problems just go away...  (And, as
others have noted, there's better PMTUD algorithms that are designed to work
_with_ black holes, but IME they're not really needed)

I wish I'd had your experience. PMTUD _can_ work well, but on the internet as
a whole, far too many ignorant paranoid admins block PMTUD, mostly by
accident, causing all sorts of unpleasantness. Clearing DF only takes you so
far. Unless both ends are aware, and respond appropriately to the squeeze
in the middle, you're back to square one.

Unless some other method of MTU discovery were implemented, depending
on something like PMTU discovery may fail just as dramatically for larger
packets as it does for 1500-byte packets now.







Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Stephen Sprunk


Thus spake Lasher, Donn [EMAIL PROTECTED]

PMTU Black Hole Detection works well in my experience, but unfortunately
MS doesn't turn it on by default, which is where all of the L2VPN with <1500
MTU issues come from; turn BHD on and the problems just go away...  (And,
as others have noted, there's better PMTUD algorithms that are designed to
work _with_ black holes, but IME they're not really needed)


I wish I'd had your experience. PMTU _can_ work well, but on the internet
as a whole, far too many ignorant paranoid admins block PMTU, mostly by
accident, causing all sorts of unpleasantness.


You can't block PMTUD per se, just the ICMP messages that dumber 
implementations rely on.  And, as I noted, MS's implementation is dumb by 
default, which leads to the problems we're all familiar with.  PMTU Black 
Hole Detection is appropriately named; one registry change* and a reboot is 
all you need to solve the problem.  Of course, that's non-trivial to 
implement when there's hundreds of millions of boxes with the wrong 
setting...



Clearing DF only takes you so far. Unless both ends are aware, and respond
appropriately to the squeeze in the middle, you're back to square one.


Smarter implementations still set DF.  The difference is that when they get 
neither an ACK nor an ICMP, they try progressively smaller sizes until they 
do get a response of some kind.  They make a note of what works and continue 
on with that, with the occasional larger probe in case the problem was 
transient.
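That probe-and-back-off behaviour is essentially what RFC 4821 packetization-layer PMTUD formalizes: search over packet sizes using only got-through feedback, no ICMP. A schematic sketch, where `send_probe` is a stand-in for transmitting a DF-set packet and waiting for an acknowledgment, not a real API:

```python
def discover_pmtu(send_probe, low: int = 1280, high: int = 9000) -> int:
    """Binary-search the largest packet size that gets through,
    using only delivered/not-delivered feedback (no ICMP needed).
    `send_probe(size) -> bool` is a placeholder for a DF-set probe."""
    if not send_probe(low):
        raise RuntimeError("path cannot carry even the minimum probe size")
    while low < high:
        mid = (low + high + 1) // 2
        if send_probe(mid):
            low = mid           # probe of this size got through
        else:
            high = mid - 1      # silently dropped: assume it was too big
    return low

# Simulated path with a 4464-byte bottleneck somewhere in the middle:
assert discover_pmtu(lambda size: size <= 4464) == 4464
```

A real implementation also re-probes occasionally with a larger size, as described above, in case the limitation was transient.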


In fact, one could consider Lorier's mtud to be roughly the same idea; 
it's only needed because the stack's own PMTUD code is typically bypassed 
for on-subnet destinations and/or not as smart as it should be.


S

* HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\
Parameters\EnablePMTUBHDetect=1

Stephen Sprunk  Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do.
K5SSS --Isaac Asimov 





Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Simon Leinen

Ah, large MTUs.  Like many other academic backbones, we implemented
large (9192 bytes) MTUs on our backbone and 9000 bytes on some hosts.
See [1] for an illustration.  Here are *my* current thoughts on
increasing the Internet MTU beyond its current value, 1500.  (On the
topic, see also [2] - a wiki page which is actually served on a
9000-byte MTU server :-)

Benefits of >1500-byte MTUs:

Several benefits of moving to larger MTUs, say in the 9000-byte range,
were cited.  I don't find them too convincing anymore.

1. Fewer packets reduce work for routers and hosts.

   Routers:
 
   Most backbones seem to size their routers to sustain (near-)
   line-rate traffic even with small (64-byte) packets.  That's a good
   thing, because if networks were dimensioned to just work at average
   packet sizes, they would be pretty easy to DoS by sending floods of
   small packets.  So I don't see how raising the MTU helps much
   unless you also raise the minimum packet size - which might be
   interesting, but I haven't heard anybody suggest that.

   This should be true for routers and middleboxes in general,
   although there are certainly many places (especially firewalls)
   where pps limitations ARE an issue.  But again, raising the MTU
   doesn't help if you're worried about the worst case.  And I would
   like to see examples where it would help significantly even in the
   normal case.  In our network it certainly doesn't - we have Mpps to
   spare.
 
   Hosts:
 
   For hosts, filling high-speed links at 1500-byte MTU has often been
   difficult at certain times (with Fast Ethernet in the nineties,
   GigE 4-5 years ago, 10GE today), due to the high rate of
   interrupts/context switches and internal bus crossings.
   Fortunately tricks like polling-instead-of-interrupts (Saku Ytti
   mentioned this), Interrupt Coalescence and Large-Send Offload have
   become commonplace these days.  These give most of the end-system
   performance benefits of large packets without requiring any support
   from the network.

2. Fewer bytes (saved header overhead) free up bandwidth.

   TCP segments over Ethernet with a 1500-byte MTU are only 94.2%
   efficient, while with a 9000-byte MTU they would be 99.?% efficient.
   While an improvement would certainly be nice, 94% already seems
   good enough to me.  (I'm ignoring the byte savings due to fewer
   ACKs.  On the other hand not all packets will be able to grow
   sixfold - some transfers are small.)

3. TCP runs faster.

   This boils down to two aspects (besides the effects of (1) and (2)):

   a) TCP reaches its cruising speed faster.

  Especially with LFNs (Long Fat Networks, i.e. paths with a large
  bandwidth*RTT product), it can take quite a long time until TCP
  slow-start has increased the window so that the maximum
  achievable rate is reached.  Since the window increase happens
  in units of MSS (~MTU), TCPs with larger packets reach this
  point proportionally faster.

  This is significant, but there are alternative proposals to
  solve this issue of slow ramp-up, for example HighSpeed TCP [3].

   b) You get a larger share of a congested link.

  I think this is true when a TCP-with-large-packets shares a
  congested link with TCPs-with-small-packets, and the packet loss
  probability isn't proportional to the size of the packet.  In
  fact the large-packet connection can get a MUCH larger share
  (sixfold for 9K vs. 1500) if the loss probability is the same
  for everybody (which it often will be, approximately).  Some
  people consider this a fairness issue, others think it's a good
  incentive for people to upgrade their MTUs.
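Both the efficiency figures in (2) and the share argument in 3(b) reduce to a few lines of arithmetic. The sketch below assumes plain TCP/IPv4 over Ethernet with no TCP options (which is why the small-MTU number lands slightly above the 94.2% quoted above) and uses the Mathis et al. relation, rate ~ MSS / (RTT * sqrt(p)), for the share ratio:

```python
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, interframe gap
IP_TCP_HDRS = 20 + 20                # IPv4 + TCP, no options

def wire_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bits that are TCP payload."""
    return (mtu - IP_TCP_HDRS) / (mtu + ETH_OVERHEAD)

print(f"1500-byte MTU: {wire_efficiency(1500):.1%}")   # just under 95%
print(f"9000-byte MTU: {wire_efficiency(9000):.1%}")   # just over 99%

# 3(b): with equal per-packet loss probability p, steady-state TCP
# throughput scales as MSS / (RTT * sqrt(p)), so at the same RTT and p
# the large-packet flow's share is proportional to its MSS:
share_ratio = (9000 - IP_TCP_HDRS) / (1500 - IP_TCP_HDRS)
print(f"large-packet throughput advantage: ~{share_ratio:.1f}x")
```

The roughly sixfold advantage matches the claim in 3(b); it only holds while loss probability is per-packet rather than per-bit.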

About the issues:

* Current Path MTU Discovery doesn't work reliably.

  Path MTU Discovery as specified in RFC 1191/1981 relies on ICMP
  messages to discover when a smaller MTU has to be used.  When these
  ICMP messages fail to arrive (or be sent), the sender will happily
  continue to send too-large packets into the blackhole.  This problem
  is very real.  As an experiment, try configuring an MTU > 1500 on a
  backbone link which has Ethernet-connected customers behind it.
  I bet that you'll receive LOUD complaints before long.

  Some other people mention that Path MTU Discovery has been refined
  with blackhole detection methods in some systems.  This is widely
  implemented, but not enabled by default (although it probably could
  be enabled with a Service Pack).

  Note that a new Path MTU Discovery proposal was just published as
  RFC 4821 [4].  This is also supposed to solve the problem of relying
  on ICMP messages.

  Please, let's wait for these more robust PMTUD mechanisms to be
  universally deployed before trying to increase the Internet MTU.

* IP assumes a consistent MTU within a logical subnet.

  This seems to be a pretty fundamental assumption, and Iljitsch's
  original mail suggests that we fix this.  Umm, ok, I hope we don't
  miss anything important that makes use of 

Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Fred Baker


I agree with many of your thoughts. This is essentially the same  
discussion we had upgrading from the 576 byte common MTU of the  
ARPANET to the 1500 byte MTU of Ethernet-based networks. Larger MTUs  
are a good thing, but are not a panacea. The biggest value in real  
practice is IMHO that the end systems deal with a lower interrupt  
rate when moving the same amount of data. That said, some who are  
asking about larger MTUs are asking for values so large that CRC  
schemes lose their value in error detection, and they find themselves  
looking at higher layer FEC technologies to make up for the issue.  
Given that there is an equipment cost related to larger MTUs, I  
believe that there is such a thing as an MTU that is impractical.


1500 byte MTUs in fact work. I'm all for 9K MTUs, and would recommend  
them. I don't see the point of 65K MTUs.


On Apr 14, 2007, at 7:39 AM, Simon Leinen wrote:



Ah, large MTUs.  Like many other academic backbones, we implemented
large (9192 bytes) MTUs on our backbone and 9000 bytes on some hosts.
See [1] for an illustration.  Here are *my* current thoughts on
increasing the Internet MTU beyond its current value, 1500.  (On the
topic, see also [2] - a wiki page which is actually served on a
9000-byte MTU server :-)

Benefits of 1500-byte MTUs:

Several benefits of moving to larger MTUs, say in the 9000-byte range,
were cited.  I don't find them too convincing anymore.

1. Fewer packets reduce work for routers and hosts.

   Routers:

   Most backbones seem to size their routers to sustain (near-)
   line-rate traffic even with small (64-byte) packets.  That's a good
   thing, because if networks were dimensioned to just work at average
   packet sizes, they would be pretty easy to DoS by sending floods of
   small packets.  So I don't see how raising the MTU helps much
   unless you also raise the minimum packet size - which might be
   interesting, but I haven't heard anybody suggest that.

   This should be true for routers and middleboxes in general,
   although there are certainly many places (especially firewalls)
   where pps limitations ARE an issue.  But again, raising the MTU
   doesn't help if you're worried about the worst case.  And I would
   like to see examples where it would help significantly even in the
   normal case.  In our network it certainly doesn't - we have Mpps to
   spare.

   Hosts:

   For hosts, filling high-speed links at 1500-byte MTU has often been
   difficult at certain times (with Fast Ethernet in the nineties,
   GigE 4-5 years ago, 10GE today), due to the high rate of
   interrupts/context switches and internal bus crossings.
   Fortunately tricks like polling-instead-of-interrupts (Saku Ytti
   mentioned this), Interrupt Coalescence and Large-Send Offload have
   become commonplace these days.  These give most of the end-system
   performance benefits of large packets without requiring any support
   from the network.

2. Fewer bytes (saved header overhead) free up bandwidth.

   TCP segments over Ethernet with 1500 byte MTU is only 94.2%
   efficient, while with 9000 byte MTU it would be 99.?% efficient.
   While an improvement would certainly be nice, 94% already seems
   good enough to me.  (I'm ignoring the byte savings due to fewer
   ACKs.  On the other hand not all packets will be able to grow
   sixfold - some transfers are small.)

3. TCP runs faster.

   This boils down to two aspects (besides the effects of (1) and  
(2)):


   a) TCP reaches its cruising speed faster.

  Especially with LFNs (Long Fat Networks, i.e. paths with a large
  bandwidth*RTT product), it can take quite a long time until TCP
  slow-start has increased the window so that the maximum
  achievable rate is reached.  Since the window increase happens
  in units of MSS (~MTU), TCPs with larger packets reach this
  point proportionally faster.

  This is significant, but there are alternative proposals to
  solve this issue of slow ramp-up, for example HighSpeed TCP [3].
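The ramp-up effect can be illustrated with a toy calculation (assumed numbers: an initial window of one segment that doubles each RTT until it covers the bandwidth*RTT product; real stacks differ in the details):

```python
# Rough model of TCP slow start for a 1 Gb/s path with 100 ms RTT.
import math

def rtts_to_fill(bandwidth_bps, rtt_s, mss_bytes):
    bdp_bytes = bandwidth_bps / 8 * rtt_s   # bandwidth*delay product
    segments = bdp_bytes / mss_bytes        # window needed, in segments
    return math.ceil(math.log2(segments))   # doublings from cwnd = 1 MSS

for mss in (1460, 8960):                    # 1500 vs 9000 byte MTU
    print(f"MSS {mss}: ~{rtts_to_fill(1e9, 0.1, mss)} RTTs")
# MSS 1460: ~14 RTTs
# MSS 8960: ~11 RTTs
```

The sixfold MSS saves about log2(6) ≈ 2.6 doubling rounds, regardless of the path's speed.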

   b) You get a larger share of a congested link.

  I think this is true when a TCP-with-large-packets shares a
  congested link with TCPs-with-small-packets, and the packet loss
  probability isn't proportional to the size of the packet.  In
  fact the large-packet connection can get a MUCH larger share
  (sixfold for 9K vs. 1500) if the loss probability is the same
  for everybody (which it often will be, approximately).  Some
  people consider this a fairness issue; others think it's a good
  incentive for people to upgrade their MTUs.
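The sixfold share follows from the well-known Mathis et al. steady-state approximation, rate ≈ (MSS/RTT)·(C/√p): with equal loss probability and RTT, throughput scales linearly with MSS. A quick check (the loss rate and RTT below are arbitrary illustrative values):

```python
import math

def mathis_rate(mss_bytes, rtt_s, loss_prob):
    """Steady-state TCP throughput estimate (Mathis et al. approximation),
    with C = sqrt(3/2) ~ 1.22 for periodic loss."""
    C = math.sqrt(3 / 2)
    return mss_bytes / rtt_s * C / math.sqrt(loss_prob)

r_small = mathis_rate(1460, 0.05, 1e-4)   # 1500-byte MTU flow
r_large = mathis_rate(8960, 0.05, 1e-4)   # 9000-byte MTU flow
print(f"large/small throughput ratio: {r_large / r_small:.2f}")
# → large/small throughput ratio: 6.14
```

The RTT and loss terms cancel in the ratio, so the advantage is exactly MSS_large/MSS_small whenever both flows see the same loss probability.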

About the issues:

* Current Path MTU Discovery doesn't work reliably.

  Path MTU Discovery as specified in RFC 1191/1981 relies on ICMP
  messages to discover when a smaller MTU has to be used.  When these
  ICMP messages fail to arrive (or be sent), the sender will happily
  continue to send too-large packets into the blackhole.  This problem
  is very real.  As an 

Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Joe Greco

   As long as only a small minority of hosts supports 9000-byte MTUs,
   there is no incentive for anyone important to start supporting them.
   A public server supporting 9000-byte MTUs will be frustrated when it
   tries to use them.  The overhead (from attempted large packets that
   don't make it) and potential trouble will just not be worth it.
   This is a little similar to IPv6.
 
 So I don't see large MTUs coming to the Internet at large soon.  They
 probably make sense in special cases, maybe for land-speed records
 and dumb high-speed video equipment, or for server-to-server stuff
 such as USENET news.

It is *certainly* helpful for USENET news.

So perhaps it is time to chuck the whole thing out and start over.  There
seem to be enough projects out there (cleanslate.stanford.edu, etc) that
are looking at just that topic...  maybe it is time for a new network
design with IPv6, flexible MTU's, etc.

The existing MTU 1500 situation made sense on ten megabit ethernet, of
course, and at the time, the overall design of the Internet, and the
capabilities of the underlying network hardware were such that it 
wasn't that reasonable or practical to consider trying to make it
negotiable.

There is no valid technical reason for that situation with modern
hardware.  The reasons people argue against larger MTU all appear to
have to do with hysterical raisins.

1500 was okay at 10 megabits.  That could imply 15000 for 100 megabits,
and 150000 for 1 gigabit.  There probably isn't a huge number of
applications for such large MTU's, and certainly universal support is
not likely to happen, but we have to realize that the speeds of networks
will continue to increase, and in five years we'll probably be running
terabit networks everywhere.  I could picture 150K MTU's being useful
at those speeds.

The goal shouldn't really be to simply allow for some fixed higher MTU.
If any of these redesign the Internet programs succeed, we should be
very certain that MTU flexibility is a core feature.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Pierfrancesco Caci

Iljitsch van Beijnum [EMAIL PROTECTED] writes:

 Dear NANOGers,

 It irks me that today, the effective MTU of the internet is 1500
 bytes, while more and more equipment can handle bigger packets.

 What do you guys think about a mechanism that allows hosts and
 routers on a subnet to automatically discover the MTU they can use
 towards other systems on the same subnet, so that:

 1. It's no longer necessary to limit the subnet MTU to that of the
 least capable system

 2. It's no longer necessary to manage 1500 byte+ MTUs manually

 Any additional issues that such a mechanism would have to address?

wouldn't that work only if the switch in the middle of your neat
office lan is a real switch (i.e. not flooding oversize packets to
hosts that can't handle them, possibly crashing their NIC drivers) and
it's itself capable of larger MTUs?

Pf


-- 


---
 Pierfrancesco Caci | Network  System Administrator - INOC-DBA: 6762*PFC
 [EMAIL PROTECTED] | Telecom Italia Sparkle - http://etabeta.noc.seabone.net/
Linux clarabella 2.6.12-10-686-smp #1 SMP Fri Sep 15 16:47:57 UTC 2006 i686 
GNU/Linux



Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Iljitsch van Beijnum


On 12-apr-2007, at 12:02, Pierfrancesco Caci wrote:


wouldn't that work only if the switch in the middle of your neat
office lan is a real switch (i.e. not flooding oversize packets to
hosts that can't handle them, possibly crashing their NIC drivers) and
it's itself capable of larger MTUs?


Well, yes, being compatible with stuff that doesn't support larger  
packets pretty much goes without saying. I don't think there is any  
need to worry about crashing drivers; packets that are longer than  
they should be are a common error condition that drivers are supposed  
to handle without incident. (They often keep a giant count.)


A more common problem would be two hosts that support jumboframes  
with a switch in the middle that doesn't. So it's necessary to test  
for this and avoid excessive numbers of large packets when something  
in the middle doesn't support them.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

On (2007-04-12 11:20 +0200), Iljitsch van Beijnum wrote:
 
 What do you guys think about a mechanism that allows hosts and  
 routers on a subnet to automatically discover the MTU they can use  
 towards other systems on the same subnet, so that:
 1. It's no longer necessary to limit the subnet MTU to that of the  
 least capable system
 
 2. It's no longer necessary to manage 1500 byte+ MTUs manually

To me this sounds like adding complexity for a rather small pay-off.
And then we'd have to ask the IXP people: would they enable this
feature if it was available? If so, why don't they offer a high-MTU
VLAN today?
And in the end, the pay-off of a larger MTU is quite small; perhaps
some interrupts are saved, but I'm not sure how relevant that is
with poll()-based NIC drivers. Of course a bigger pay-off
would be that users could use tunneling and still offer 1500
to the LAN.

IXP peeps, why are you not offering a high-MTU VLAN option?
From my point of view, this is the biggest reason why we
generally don't have a higher end-to-end MTU today.
I know that some IXPs do, e.g. NetNOD, but generally it's
not offered even though many users would opt to use it.

Thanks,
-- 
  ++ytti


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Stephen Wilcox

On Thu, Apr 12, 2007 at 01:03:45PM +0200, Iljitsch van Beijnum wrote:
 
 On 12-apr-2007, at 12:02, Pierfrancesco Caci wrote:
 
 wouldn't that work only if the switch in the middle of your neat
 office lan is a real switch (i.e. not flooding oversize packets to
 hosts that can't handle them, possibly crashing their NIC drivers) and
 it's itself capable of larger MTUs?
 
 Well, yes, being compatible with stuff that doesn't support larger  
 packets pretty much goes without saying. I don't think there is any  
 need to worry about crashing drivers, packets that are longer than  
 they should are a common error condition that drivers are supposed to  
 handle without incident. (They often keep a giant count.)
 
 A more common problem would be two hosts that support jumboframes  
 with a switch in the middle that doesn't. So it's necessary to test  
 for this and avoid excessive numbers or large packets when something  
 in the middle doesn't support them.

the internet is broken.. too many firewalls dropping icmp, too many hard-coded 
systems that work for 'default' but don't actually allow for alternative 
parameters that should work according to the RFCs

if you can fix all that then it might work

alternatively if you can redesign path mtu discovery that might work too..

Martin Levy suggested this to me only two weeks ago; he had an idea of sending 
two packets initially - one 'default' and one at the higher MTU. If the 
higher one gets dropped somewhere you can quickly spot it and revert to 
'default' behaviour.

I think his explanation was more complicated but it was an interesting idea
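That dual-probe idea can be sketched as a simple decision procedure (a hypothetical toy model, not Martin Levy's actual proposal; `send_probe` stands in for real probing, e.g. a UDP echo with the DF bit set):

```python
# Toy model of the two-probe idea: send a default-size probe and a
# jumbo probe per peer; use the jumbo MTU only if its probe succeeds.
DEFAULT_MTU = 1500

def negotiate_mtu(send_probe, jumbo_mtu=9000):
    """Return the MTU to use toward one peer.

    send_probe(size) -> bool reports whether a probe of that size got a
    response.  Falls back to DEFAULT_MTU whenever the jumbo probe (or
    even the default probe) fails, matching 'revert to default'.
    """
    if not send_probe(DEFAULT_MTU):
        return DEFAULT_MTU      # path is broken anyway; keep the default
    return jumbo_mtu if send_probe(jumbo_mtu) else DEFAULT_MTU

# A path whose middlebox silently drops anything over 1500 bytes:
probe = lambda size: size <= 1500
print(negotiate_mtu(probe))  # → 1500
```

A real implementation would also have to re-probe periodically, since the path (or the switch in the middle) can change under you.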

Steve




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Mikael Abrahamsson


On Thu, 12 Apr 2007, Saku Ytti wrote:


IXP peeps, why are you not offering high MTU VLAN option?


Netnod in Sweden offers an MTU 4470 option.

Otoh it's not so easy operationally since, for instance, Juniper and Cisco 
calculate MTU differently.


But I don't really see it as beneficial to try to raise the end-system MTU 
above the standard ethernet MTU: if you think it's operationally troublesome 
with PMTUD now, imagine when everybody is running a different MTU.


The biggest benefit would be if the transport networks that people run PPPoE 
and other tunneled traffic over allowed for whatever MTU is needed to carry 
unfragmented 1500-byte tunneled packets, so we could assure that all hosts 
on the internet actually have a 1500 IP MTU transparently.
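The transport MTU such a network would need is just the inner MTU plus the encapsulation overhead. The per-tunnel overheads below are common textbook values, not figures from this thread (IPsec in particular varies with cipher and mode):

```python
# Transport MTU needed so a tunneled 1500-byte IP packet never fragments.
# Overheads are assumptions: PPPoE 8 bytes, GRE 24 (outer IP + GRE),
# 6in4 20 (outer IPv4), basic ESP tunnel mode roughly 60.
OVERHEAD = {"PPPoE": 8, "GRE": 24, "6in4": 20, "IPsec ESP (approx)": 60}

INNER_MTU = 1500
for name, extra in OVERHEAD.items():
    print(f"{name:20s} needs transport MTU >= {INNER_MTU + extra}")
```

This is why 1508-byte ("baby jumbo") support in access networks is enough to hand PPPoE subscribers a clean 1500-byte IP MTU.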


--
Mikael Abrahamssonemail: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Niels Bakker


* [EMAIL PROTECTED] (Mikael Abrahamsson) [Thu 12 Apr 2007, 14:07 CEST]:

On Thu, 12 Apr 2007, Saku Ytti wrote:

IXP peeps, why are you not offering high MTU VLAN option?
Biggest benefit would be if the transport network people run PPPoE and 
other tunneled traffic over, would allow for whatever MTU needed to 
carry unfragmented 1500 byte tunneled packets, so we could assure that 
all hosts on the internet actually have 1500 IP MTU transparently.


How much traffic from DSLAM to service provider is currently being 
exchanged across IXPs?


(My money's on: close enough to zero not to really matter.)


-- Niels.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Steven M. Bellovin

On Thu, 12 Apr 2007 11:20:18 +0200
Iljitsch van Beijnum [EMAIL PROTECTED] wrote:

 
 Dear NANOGers,
 
 It irks me that today, the effective MTU of the internet is 1500
 bytes, while more and more equipment can handle bigger packets.
 
 What do you guys think about a mechanism that allows hosts and
 routers on a subnet to automatically discover the MTU they can use
 towards other systems on the same subnet, so that:
 
 1. It's no longer necessary to limit the subnet MTU to that of the
 least capable system
 
 2. It's no longer necessary to manage 1500 byte+ MTUs manually
 
 Any additional issues that such a mechanism would have to address?
 

Last I heard, the IEEE won't go along, and they're the ones who
standardize 802.3.

A few years ago, the IETF was considering various jumbogram options.
As best I recall, that was the official response from the relevant
IEEE folks: no. They're concerned with backward compatibility.  

Perhaps that has changed (and I certainly don't remember who sent that
note).  


--Steve Bellovin, http://www.cs.columbia.edu/~smb


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Gian Constantine
I agree. The throughput gains are small. You're talking about a  
difference between a 4% header overhead versus a 1% header overhead  
(for TCP).


One could argue a decreased pps impact on intermediate systems, but  
when factoring in the existing packet size distribution on the  
Internet and the perceived adjustment seen by a migration to 4470 MTU  
support, the gains remain small.


Development costs and the OpEx costs of implementation and support  
will, likely, always outweigh the gains.


Gian Anthony Constantine


On Apr 12, 2007, at 7:50 AM, Saku Ytti wrote:



On (2007-04-12 11:20 +0200), Iljitsch van Beijnum wrote:


What do you guys think about a mechanism that allows hosts and
routers on a subnet to automatically discover the MTU they can use
towards other systems on the same subnet, so that:
1. It's no longer necessary to limit the subnet MTU to that of the
least capable system

2. It's no longer necessary to manage 1500 byte+ MTUs manually


To me this sounds adding complexity for rather small pay-off. And
then we'd have to ask IXP people, would the enable this feature
if it was available? If so, why don't they offer high MTU VLAN
today?
And in the end, pay-off of larger MTU is quite small, perhaps
some interrupts are saved but not sure how relevant that is
in poll() based NIC drivers. Of course bigger pay-off
would be that users could use tunneling and still offer 1500
to LAN.

IXP peeps, why are you not offering high MTU VLAN option?
From my point of view, this is biggest reason why we today
generally don't have higher end-to-end MTU.
I know that some IXPs do, eg. NetNOD but generally it's
not offered even though many users would opt to use it.

Thanks,
--
  ++ytti




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Florian Weimer

* Steven M. Bellovin:

 A few years ago, the IETF was considering various jumbogram options.
 As best I recall, that was the official response from the relevant
 IEEE folks: no. They're concerned with backward compatibility.  

Gigabit ethernet has already broken backwards compatibility and is
essentially point-to-point, so the old compatibility concerns no
longer apply.  Jumbo frame opt-in could even be controlled with a
protocol above layer 2.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Iljitsch van Beijnum


On 12-apr-2007, at 16:04, Gian Constantine wrote:

I agree. The throughput gains are small. You're talking about a  
difference between a 4% header overhead versus a 1% header overhead  
(for TCP).


6% including ethernet overhead and assuming the very common TCP  
timestamp option.


One could argue a decreased pps impact on intermediate systems, but  
when factoring in the existing packet size distribution on the  
Internet and the perceived adjustment seen by a migration to 4470  
MTU support, the gains remain small.


Average packet size on the internet has been fairly constant at  
around 500 bytes for the past 10 years or so, from my vantage point.  
You only need to make 7% of all packets 9000 bytes to double that.  
This means that you can have twice the amount of data transferred for  
the same amount of per-packet work. If you're at 100% of your CPU or  
TCAM capacity today, that is a huge win. On the other hand, if you  
need to buy equipment that can do line rate at 64 bytes per packet,  
it doesn't matter much.
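The arithmetic behind the "7% of packets" claim is easy to check (a sketch assuming the remaining packets keep averaging 500 bytes):

```python
# What fraction of packets must become 9000 bytes to double a
# 500-byte average?  new_avg = (1 - f) * avg + f * jumbo, solved
# for new_avg = 1000.
avg, jumbo, target = 500, 9000, 1000
f = (target - avg) / (jumbo - avg)
print(f"fraction needed: {f:.1%}")  # → fraction needed: 5.9%
```

So roughly 6-7% of packets going jumbo doubles the average size, and thus the bytes moved per unit of per-packet work.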


There are other benefits too, though. For instance, TCP can go much  
faster with bigger packets. Additional tunnel/VPN overhead isn't as bad.


Development costs and the OpEx costs of implementation and support  
will, likely, always outweigh the gains.


Gains will go up as networks get faster and faster, implementation  
should approach zero over time and support shouldn't be an issue if  
it works fully automatically.


Others mentioned ICMP filtering and PMTUD problems. Filtering  
shouldn't be an issue for a mechanism that is local to a subnet, and  
even if it is, there's still no problem if the mechanism takes the  
opposite approach to PMTUD. With PMTUD, the assumption is that large  
works, and extra messages result in a smaller packet size. By  
exchanging large messages that indicate the capability to exchange  
large messages, form and function align: if an indication that large  
messages are possible isn't received, large packets aren't used and  
there are no problems.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Steven M. Bellovin

On Thu, 12 Apr 2007 16:12:43 +0200
Florian Weimer [EMAIL PROTECTED] wrote:

 * Steven M. Bellovin:
 
  A few years ago, the IETF was considering various jumbogram options.
  As best I recall, that was the official response from the relevant
  IEEE folks: no. They're concerned with backward compatibility.  
 
 Gigabit ethernet has already broken backwards compatibility and is
 essentially point-to-point, so the old compatibility concerns no
 longer apply.  Jumbo frame opt-in could even be controlled with a
 protocol above layer 2.
 
I'm neither attacking nor defending the idea; I'm merely reporting.

I'll also note that the IETF is very unlikely to challenge IEEE on
this.  There's an informal agreement on who owns which standards.  The
IETF resents attempts at modifications to its standards by other
standards bodies; by the same token, it tries to avoid doing that to
others.


--Steve Bellovin, http://www.cs.columbia.edu/~smb


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Iljitsch van Beijnum


On 12-apr-2007, at 15:26, Steven M. Bellovin wrote:


Last I heard, the IEEE won't go along, and they're the ones who
standardize 802.3.


I knew there was a reason we use ethernet II rather than IEEE 802.3  
for IP.  :-)



A few years ago, the IETF was considering various jumbogram options.
As best I recall, that was the official response from the relevant
IEEE folks: no. They're concerned with backward compatibility.


Obviously keeping the same maximum packet size when moving from 10 to  
100 to 1000 to 10000 Mbps is suboptimal. However, if the newer  
standards were to mandate a larger maximum packet size, a station  
connected to a 10/100/1000 switch at 1000 Mbps would be able to send  
packets that a 10 Mbps station wouldn't be able to receive. (And the  
802.3 length field starts clashing with ethernet II type codes.)


However, to a large degree this ship has sailed because many vendors  
implement jumboframes. If we can fix the interoperability issue at  
layer 3 for IP that the IEEE can't fix at layer 2 for 802.3, then I  
don't see how anyone could have a problem with that. Also, such a  
mechanism would obviously be layer 2 agnostic, so in theory, it  
doesn't step on the IEEE's turf at all.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Keegan . Holley
I think it's a great idea operationally, less work for the routers and 
more efficient use of bandwidth.   It would also be useful to devise some 
way to at least partially reassemble fragmented frames at links capable of 
large MTU's.  Since most PC's are on a subnet with a MTU of 1500 (or 1519) 
packets would still be limited to 1500B or fragmented before they reach 
the higher speed links.  The problem with bringing this to fruition in the 
internet is going to be cost and effort.  The ATT's and Verizons of the 
world are going to see this as a major upgrade without much benefit or 
profit.  The Cisco's and Junipers are going to say the same thing when 
they have to write this into their code plus interoperability with other 
vendors implementations of it.





Iljitsch van Beijnum [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
04/12/2007 05:20 AM

To
NANOG list [EMAIL PROTECTED]
cc

Subject
Thoughts on increasing MTUs on the internet







Dear NANOGers,

It irks me that today, the effective MTU of the internet is 1500 
bytes, while more and more equipment can handle bigger packets.

What do you guys think about a mechanism that allows hosts and 
routers on a subnet to automatically discover the MTU they can use 
towards other systems on the same subnet, so that:

1. It's no longer necessary to limit the subnet MTU to that of the 
least capable system

2. It's no longer necessary to manage 1500 byte+ MTUs manually

Any additional issues that such a mechanism would have to address?





Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Florian Weimer

* Steven M. Bellovin:

 On Thu, 12 Apr 2007 16:12:43 +0200
 Florian Weimer [EMAIL PROTECTED] wrote:

 * Steven M. Bellovin:
 
  A few years ago, the IETF was considering various jumbogram options.
  As best I recall, that was the official response from the relevant
  IEEE folks: no. They're concerned with backward compatibility.  
 
 Gigabit ethernet has already broken backwards compatibility and is
 essentially point-to-point, so the old compatibility concerns no
 longer apply.  Jumbo frame opt-in could even be controlled with a
 protocol above layer 2.

 I'm neither attacking nor defending the idea; I'm merely reporting.

I just wanted to point out that the main reason why this couldn't be
done without breaking backwards compatibility is gone (shared physical
medium with unknown and unforeseeable receiver capabilities).

 I'll also note that the IETF is very unlikely to challenge IEEE on
 this.

It's certainly unwise to do so before PMTUD works without ICMP
support. 8-)


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Warren Kumari


On Apr 12, 2007, at 10:04 AM, Gian Constantine wrote:

I agree. The throughput gains are small. You're talking about a  
difference between a 4% header overhead versus a 1% header overhead  
(for TCP).


One of the benefits of larger MTU is that, during the additive  
increase phase, or after recovering from congestion, you reach full  
speed sooner --  it does also mean that if you do reach congestion,  
you throw away more data, and, because of the length of flows, are  
probably more likely to cause congestion...





One could argue a decreased pps impact on intermediate systems, but  
when factoring in the existing packet size distribution on the  
Internet and the perceived adjustment seen by a migration to 4470  
MTU support, the gains remain small.




Development costs and the OpEx costs of implementation and support  
will, likely, always outweigh the gains.


Gian Anthony Constantine


On Apr 12, 2007, at 7:50 AM, Saku Ytti wrote:



On (2007-04-12 11:20 +0200), Iljitsch van Beijnum wrote:


What do you guys think about a mechanism that allows hosts and
routers on a subnet to automatically discover the MTU they can use
towards other systems on the same subnet, so that:
1. It's no longer necessary to limit the subnet MTU to that of the
least capable system

2. It's no longer necessary to manage 1500 byte+ MTUs manually


To me this sounds adding complexity for rather small pay-off. And
then we'd have to ask IXP people, would the enable this feature
if it was available? If so, why don't they offer high MTU VLAN
today?
And in the end, pay-off of larger MTU is quite small, perhaps
some interrupts are saved but not sure how relevant that is
in poll() based NIC drivers. Of course bigger pay-off
would be that users could use tunneling and still offer 1500
to LAN.

IXP peeps, why are you not offering high MTU VLAN option?
From my point of view, this is biggest reason why we today
generally don't have higher end-to-end MTU.
I know that some IXPs do, eg. NetNOD but generally it's
not offered even though many users would opt to use it.

Thanks,
--
  ++ytti




--
Some people are like Slinkies..Not really good for anything but  
they still bring a smile to your face when you push them down the  
stairs.






Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

On (2007-04-12 16:28 +0200), Iljitsch van Beijnum wrote:
 
 On 12-apr-2007, at 16:04, Gian Constantine wrote:
 
 I agree. The throughput gains are small. You're talking about a  
 difference between a 4% header overhead versus a 1% header overhead  
 (for TCP).
 
 6% including ethernet overhead and assuming the very common TCP  
 timestamp option.

Out of curiosity how is this calculated?
[EMAIL PROTECTED] ~]% echo 1450/(1+7+6+6+2+1500+4+12)*100|bc -l
94.27828348504551365400
[EMAIL PROTECTED] ~]% echo 8950/(1+7+6+6+2+9000+4+12)*100|bc -l
99.02633325957070148200
[EMAIL PROTECTED] ~]% 

I calculated less than 5% from 1500 to 9000, with ethernet and
adding TCP timestamp. What did I miss?

Or compared without tcp timestamp and 1500 to 4470.
[EMAIL PROTECTED] ~]% echo 1460/(1+7+6+6+2+1500+4+12)*100|bc -l
94.92847854356306892000
[EMAIL PROTECTED] ~]% echo 4410/(1+7+6+6+2+4470+4+12)*100|bc -l
97.82608695652173913000

Less than 3%.

However, I don't think it's relevant if it's 1% or 10%, bigger
benefit would be to give 1500 end-to-end, even with eg. ipsec
to the office.

-- 
  ++ytti


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

 Or compared without tcp timestamp and 1500 to 4470.
 [EMAIL PROTECTED] ~]% echo 1460/(1+7+6+6+2+1500+4+12)*100|bc -l
 94.92847854356306892000
 [EMAIL PROTECTED] ~]% echo 4410/(1+7+6+6+2+4470+4+12)*100|bc -l
 97.82608695652173913000

Apparently 70-40 is too hard for me.

[EMAIL PROTECTED] ~]% echo 4430/(1+7+6+6+2+4470+4+12)*100|bc -l
98.26974267968056787900

so ~3.3%

-- 
  ++ytti


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Gian Constantine
I did a rough, top-of-the-head calculation with ~60 bytes of header  
(ETH, IP, TCP) against 1500 and 4470 (a mistake, on my part, not to  
use 9216).


I still think the cost outweighs the gain, though there are some  
reasonable arguments for the increase.


Gian Anthony Constantine


On Apr 12, 2007, at 12:07 PM, Saku Ytti wrote:



On (2007-04-12 16:28 +0200), Iljitsch van Beijnum wrote:


On 12-apr-2007, at 16:04, Gian Constantine wrote:


I agree. The throughput gains are small. You're talking about a
difference between a 4% header overhead versus a 1% header overhead
(for TCP).


6% including ethernet overhead and assuming the very common TCP
timestamp option.


Out of curiosity how is this calculated?
[EMAIL PROTECTED] ~]% echo 1450/(1+7+6+6+2+1500+4+12)*100|bc -l
94.27828348504551365400
[EMAIL PROTECTED] ~]% echo 8950/(1+7+6+6+2+9000+4+12)*100|bc -l
99.02633325957070148200
[EMAIL PROTECTED] ~]%

I calculated less than 5% from 1500 to 9000, with ethernet and
adding TCP timestamp. What did I miss?

Or compared without tcp timestamp and 1500 to 4470.
[EMAIL PROTECTED] ~]% echo 1460/(1+7+6+6+2+1500+4+12)*100|bc -l
94.92847854356306892000
[EMAIL PROTECTED] ~]% echo 4410/(1+7+6+6+2+4470+4+12)*100|bc -l
97.82608695652173913000

Less than 3%.

However, I don't think it's relevant if it's 1% or 10%, bigger
benefit would be to give 1500 end-to-end, even with eg. ipsec
to the office.

--
  ++ytti




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Stephen Wilcox

On Thu, Apr 12, 2007 at 11:34:43AM -0400, [EMAIL PROTECTED] wrote:
 
I think it's a great idea operationally, less work for the routers and more
efficient use of bandwidth.   It would also be useful to devise some way to
at least partially reassemble fragmented frames at links capable of large
MTU's.  

I think you underestimate the memory and CPU required on large links to be able 
to buffer the data that would allow reassembly by an intermediate router.

Since most PC's are on a subnet with a MTU of 1500 (or 1519) packets
would still be limited to 1500B or fragmented before they reach the higher
speed links.  The problem with bringing this to fruition in the internet is
going to be cost and effort.  The ATT's and Verizons of the world are going
to see this as a major upgrade without much benefit or profit.  The Cisco's
and Junipers are going to say the same thing when they have to write this
into their code plus interoperability with other vendors implementations of
it.

I dont think any of the above will throw out any particular objection.. I think 
your problem is in figuring out a way to implement this globally and not break 
stuff which relies so heavily upon 1500 bytes much of which does not even cater 
for the possibility another MTU might be possible.

Steve


 
Iljitsch van Beijnum [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
 
04/12/2007 05:20 AM
 
To
 
NANOG list [EMAIL PROTECTED]
 
cc
 
   Subject
 
Thoughts on increasing MTUs on the internet
 
Dear NANOGers,
It irks me that today, the effective MTU of the internet is 1500
bytes, while more and more equipment can handle bigger packets.
What do you guys think about a mechanism that allows hosts and
routers on a subnet to automatically discover the MTU they can use
towards other systems on the same subnet, so that:
1. It's no longer necessary to limit the subnet MTU to that of the
least capable system
2. It's no longer necessary to manage 1500 byte+ MTUs manually
Any additional issues that such a mechanism would have to address?


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Iljitsch van Beijnum


On 12-apr-2007, at 18:07, Saku Ytti wrote:


I agree. The throughput gains are small. You're talking about a
difference between a 4% header overhead versus a 1% header overhead
(for TCP).



6% including ethernet overhead and assuming the very common TCP
timestamp option.



Out of curiosity how is this calculated?


8 bytes preamble
14 bytes ethernet II header
20 bytes IP header
20 bytes TCP header
12 bytes timestamp option
4 bytes FCS/CRC
12 bytes equivalent inter frame gap

90 bytes total overhead, 52 deducted from the ethernet payload, 38  
added to it.


90 / (1500 - 52 = 1448) * 100 = 6.21

90 / (9000 - 52 = 8948) * 100 = 1

Also note that the real overhead is much bigger because for every two  
full size TCP packets an ACK is sent so that adds 90 bytes per 2 data  
packets, or increases the overhead to 9% / 1.5%.
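The breakdown above can be checked directly (the one-ACK-per-two-segments figure assumes delayed ACKs, and the ACK is taken as header-only, so it carries the same 90 bytes of overhead):

```python
# Reproduce the overhead figures: 90 bytes of total overhead per data
# packet (38 bytes of Ethernet framing outside the MTU, 52 bytes of
# IP/TCP/timestamp headers inside it), plus one header-only ACK per
# two full-size data segments.
OVERHEAD = 90   # bytes of overhead per full-size data packet
ACK_WIRE = 90   # a header-only ACK, sent once per two data packets

def overhead_pct(mtu, with_acks=False):
    payload = mtu - 52
    extra = OVERHEAD + (ACK_WIRE / 2 if with_acks else 0)
    return extra / payload * 100

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {overhead_pct(mtu):.1f}% per packet, "
          f"{overhead_pct(mtu, with_acks=True):.1f}% with ACKs")
# MTU 1500: 6.2% per packet, 9.3% with ACKs
# MTU 9000: 1.0% per packet, 1.5% with ACKs
```

Since the 70-byte ACK frame already exceeds Ethernet's 64-byte minimum, no padding correction is needed here.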


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

On (2007-04-12 19:51 +0200), Iljitsch van Beijnum wrote:
 
 8 bytes preamble
 14 bytes ethernet II header
 20 bytes IP header
 20 bytes TCP header
 12 bytes timestamp option
 4 bytes FCS/CRC
 12 bytes equivalent inter frame gap
 
 90 bytes total overhead, 52 deducted from the ethernet payload, 38  
 added to it.
 
 90 / (1500 - 52 = 1448) * 100 = 6.21
 
 90 / (9000 - 52 = 8948) * 100 = 1
 
 Also note that the real overhead is much bigger because for every two  
 full size TCP packets an ACK is sent so that adds 90 bytes per 2 data  
 packets, or increases the overhead to 9% / 1.5%.

Aren't you double penalizing? Should it be:
[EMAIL PROTECTED] ~]% echo 90 / (1500+38) * 100|bc -l 
5.85175552665799739900

Or other way to say it:
[EMAIL PROTECTED] ~]% echo 100-(1448/(1+7+6+6+2+1500+4+12)*100)|bc -l
5.8517555266579974


-- 
  ++ytti


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Joe Loiacono
Large MTUs enable significant throughput gains for large data 
transfers over long round-trip times (RTTs).  The original question 
had to do with local subnet to local subnet, where the difference 
would not be noticeable.  But for users transferring large data sets 
over long distances (e.g. LHC experimental data from CERN in France 
to universities in the US), large MTUs can make a big difference.

For an excellent and detailed (though becoming dated) examination of this 
see:

"Raising the Internet MTU", Matt Mathis et al.

http://www.psc.edu/~mathis/MTU/

Joe




Stephen Wilcox [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
04/12/2007 01:45 PM

To
[EMAIL PROTECTED]
cc
NANOG list [EMAIL PROTECTED]
Subject
Re: Thoughts on increasing MTUs on the internet







On Thu, Apr 12, 2007 at 11:34:43AM -0400, [EMAIL PROTECTED] wrote:
 
I think it's a great idea operationally, less work for the routers 
and more
efficient use of bandwidth.   It would also be useful to devise some 
way to
at least partially reassemble fragmented frames at links capable of 
large
MTU's. 

I think you underestimate the memory and cpu required on large links to be 
able to buffer the data that would allow a reassembly by an intermediate 
router

Since most PC's are on a subnet with a MTU of 1500 (or 1519) packets
would still be limited to 1500B or fragmented before they reach the 
higher
speed links.  The problem with bringing this to fruition in the 
internet is
going to be cost and effort.  The ATT's and Verizons of the world are 
going
to see this as a major upgrade without much benefit or profit.  The 
Cisco's
and Junipers are going to say the same thing when they have to write 
this
into their code plus interoperability with other vendors 
implementations of
it.

I don't think any of the above will raise any particular objection.. I 
think your problem is in figuring out a way to implement this globally 
without breaking the large amount of software that relies so heavily upon 
1500 bytes, much of which does not even allow for the possibility that 
another MTU might exist.

Steve


 
Iljitsch van Beijnum [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
 
04/12/2007 05:20 AM
 
 To
 
NANOG list [EMAIL PROTECTED]
 
 cc
 
 Subject
 
Thoughts on increasing MTUs on the internet
 
Dear NANOGers,
It irks me that today, the effective MTU of the internet is 1500
bytes, while more and more equipment can handle bigger packets.
What do you guys think about a mechanism that allows hosts and
routers on a subnet to automatically discover the MTU they can use
towards other systems on the same subnet, so that:
1. It's no longer necessary to limit the subnet MTU to that of the
least capable system
2. It's no longer necessary to manage 1500 byte+ MTUs manually
Any additional issues that such a mechanism would have to address?



Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Randy Bush

 A few years ago, the IETF was considering various jumbogram options.
 As best I recall, that was the official response from the relevant
 IEEE folks: no. They're concerned with backward compatibility.

worse.  they felt that the ether checksum is good at 1500 and not
so good at 4k etc.  they *really* did not want to do jumbo.  i
worked that doc.

randy



RE: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Buhrmaster, Gary


 Last I heard, the IEEE won't go along, and they're the ones who
 standardize 802.3.
 
 A few years ago, the IETF was considering various jumbogram options.
 As best I recall, that was the official response from the relevant
 IEEE folks: no. They're concerned with backward compatibility.  

As I remember it, the IEEE did not say no (that is not the
style of such standards bodies).  Instead, they said something
along the lines of We will consider any proposal that does
not break (existing) standards/implementations.  And, to the
best of my knowledge, the smart people of the world have not
yet made a proposal that meets the requirements (and I believe
more than a few have tried to think the issues through).

There is absolutely nothing to prevent one from implementing
jumbos (if you can even agree how large that should be).
It just seems that whatever one implements will likely not
be an IEEE standard (unless one is smarter than the last
set of smart people).

Gary


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Randy Bush

 A few years ago, the IETF was considering various jumbogram options.
 As best I recall, that was the official response from the relevant
 IEEE folks: no. They're concerned with backward compatibility.  
 
 As I remember it, the IEEE did not say no

i was in the middle of this one.  they said no.  the checksum becomes
much weaker at 4k and 9k.  and ether does have errors.

randy


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Iljitsch van Beijnum


On 12-apr-2007, at 20:15, Randy Bush wrote:


A few years ago, the IETF was considering various jumbogram options.
As best I recall, that was the official response from the relevant
IEEE folks: no. They're concerned with backward compatibility.



worse.  they felt that the ether checksum is good at 1500 and not
so good at 4k etc.  they *really* did not want to do jumbo.  i
worked that doc.


It looks to me that the checksum issue is highly exaggerated or even  
completely wrong (as in the 1500 / 4k claim above). From 
http://www.aarnet.edu.au/engineering/networkdesign/mtu/size.html :


---
The ethernet packet also contains a Frame Check Sequence, which is a  
32-bit CRC of the frame. The weakening of this frame check with  
greater frame sizes is explored in R. Jain's Error Characteristics  
of Fiber Distributed Data Interface (FDDI), which appeared in IEEE  
Transactions on Communications, August 1990. Table VII shows a table  
of Hamming Distance versus frame size. Unfortunately, the CRC for  
frames greater than 11445 bytes only has a minimum Hamming Distance  
of 3. The implication is that the CRC will only detect one-bit and  
two-bit errors (and not non-burst 3-bit or 4-bit errors). The CRC for  
frames between 375 and 11543 bytes has a minimum Hamming Distance of 4,  
implying that all 1-bit, 2-bit and 3-bit errors are detected and most  
non-burst 4-bit errors are detected.


The paper has two implications. Firstly, the power of ethernet's  
Frame Check Sequence is the major limitation on increasing the  
ethernet MTU beyond 11444 bytes. Secondly, frame sizes under 11445  
bytes are as well protected by ethernet's Frame Check Sequence as  
frame sizes under 1518 bytes.


---



Is the FCS supposed to provide guaranteed protection against a  
certain number of bit errors per packet? I don't believe that's the  
case. With random bit errors, there's still only a risk of not  
detecting an error on the order of 1 : 2^32, regardless of the length  
of the packet. But even if *any* effective weakening of the FCS caused  
by an increased packet size is considered unacceptable, it's still  
possible to do 11543-byte packets without changing the FCS algorithm.
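That 1 : 2^32 intuition is easy to check empirically. As a sketch (Python's zlib.crc32 uses the same polynomial as the Ethernet FCS; the frame contents and sample counts here are purely illustrative), at 9000 bytes every sampled 1- and 2-bit error is still caught, which is exactly what a minimum Hamming distance of 3 at that frame size guarantees:

```python
import random
import zlib

def crc_detects(frame: bytes, bit_positions) -> bool:
    """Flip the given bits and report whether CRC-32 notices the damage."""
    damaged = bytearray(frame)
    for pos in bit_positions:
        damaged[pos // 8] ^= 1 << (pos % 8)
    return zlib.crc32(bytes(damaged)) != zlib.crc32(frame)

random.seed(1)
frame = random.randbytes(9000)          # one jumbo-sized frame
nbits = len(frame) * 8

# At this size the FCS Hamming distance is 3, so every 1- and 2-bit
# error must be detected; sampling confirms it.
for _ in range(5000):
    assert crc_detects(frame, [random.randrange(nbits)])
    assert crc_detects(frame, random.sample(range(nbits), 2))
print("all sampled 1- and 2-bit errors detected")
```

Errors of higher weight are where the residual 2^-32 miss probability lives, independent of frame length.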




Also, I don't see a fundamental problem in changing the FCS for a new  
802.3 standard, as switches can strip off a 64-bit FCS and add a  
32-bit one as required.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Randy Bush

 It looks to me that the checksum issue is highly exaggerated or even
 completely wrong (as in the 1500 / 4k claim above). From
 http://www.aarnet.edu.au/engineering/networkdesign/mtu/size.html :

glad you have an opinion.  take it to the ieee.

randy


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Randy Bush

mark does not have posting privs and has asked me to post the
following for him:

---

To: Gian Constantine [EMAIL PROTECTED]
From: Mark Allman [EMAIL PROTECTED]
cc: NANOG list [EMAIL PROTECTED]
Subject: Re: Thoughts on increasing MTUs on the internet 
Date: Thu, 12 Apr 2007 11:47:35 -0400

Folks-

 I agree. The throughput gains are small. You're talking about a
 difference between a 4% header overhead versus a 1% header overhead
 (for TCP).

This does not begin to reflect the gain.  Check out the model of TCP
performance given in:

  M. Mathis, J. Semke, J. Mahdavi, T. Ott, The Macroscopic Behavior of
  the TCP Congestion Avoidance Algorithm, Computer Communication Review,
  volume 27, number 3, July 1997.
  (number 35 at http://www.psc.edu/~mathis/papers/index.html)

The key point is that performance is directly proportional to packet
size.  So, an increase in the packet size is much more than a simple
lowering of the overhead.

In addition, the newly published RFC 4821 offers a different way to do
PMTUD without relying on ICMP feedback (essentially by trying different
packet sizes and trying to infer things from whether they get dropped).

A good general reference to the subject of bigger MTUs is Matt Mathis'
page on the subject:

  http://www.psc.edu/~mathis/MTU/

allman

-- 
Mark Allman -- ICIR/ICSI -- http://www.icir.org/mallman/



Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Mikael Abrahamsson


On Thu, 12 Apr 2007, Joe Loiacono wrote:


Large MTUs enable significant throughput performance enhancements for
large data transfers over long round-trip times (RTTs.) The original


This is solved by increasing the TCP window size; it doesn't depend very 
much on MTU.


A larger MTU is better for devices that do per-packet interrupting, as 
most end systems probably do. It doesn't increase long-RTT transfer 
performance per se (unless you have high packet loss, because you'll 
slow-start more efficiently).
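The window-size point can be made concrete with the bandwidth-delay product; a minimal sketch (the 1 Gbit/s and 100 ms figures are just an example path):

```python
def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """TCP window needed to keep a path full: bandwidth * RTT, in bytes."""
    return bandwidth_bps * rtt_s / 8

# Example: a 1 Gbit/s transcontinental path with 100 ms RTT
w = required_window_bytes(1e9, 0.100)
print(f"window needed: {w / 1e6:.1f} MB")   # 12.5 MB, regardless of MTU
```

If the window is smaller than this, throughput is window-limited no matter what the MTU is.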


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Joe Loiacono
[EMAIL PROTECTED] wrote on 04/12/2007 04:05:43 PM:

 
 On Thu, 12 Apr 2007, Joe Loiacono wrote:
 
  Large MTUs enable significant throughput performance enhancements for
  large data transfers over long round-trip times (RTTs.) The original
 
 This is solved by increasing TCP window size, it doesn't depend very 
much 
 on MTU.

Window size is of course critical, but it turns out that MTU also impacts 
rates (as much as 33%, see below):

           MSS      0.7
    Rate = ---  *  -------
           RTT     sqrt(P)

MSS = Maximum Segment Size
RTT = Round Trip Time
P   = packet loss

Mathis, et al. have 'verified the model through both simulation and live 
Internet measurements.'
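Plugging the model in for two MSS values makes the linear dependence on packet size visible; a small sketch (the RTT and loss values are illustrative, and the MSS figures assume 40 bytes of TCP/IP headers):

```python
from math import sqrt

def mathis_rate_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. steady-state TCP rate: (MSS/RTT) * 0.7/sqrt(P)."""
    return (mss_bytes * 8 / rtt_s) * 0.7 / sqrt(loss)

rtt, p = 0.070, 1e-6        # 70 ms RTT, 1e-6 packet loss (illustrative)
for mss in (1460, 8960):    # MSS for 1500- vs 9000-byte MTUs
    print(f"MSS {mss}: {mathis_rate_bps(mss, rtt, p) / 1e6:.0f} Mbit/s")
```

With the same RTT and loss rate, the 9000-byte MTU comes out roughly six times faster, i.e. exactly the MSS ratio.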

Also (http://www.aarnet.edu.au/engineering/networkdesign/mtu/why.html): 

This is shown to be the case in Anand and Hartner's TCP/IP Network Stack 
Performance in Linux Kernel 2.4 and 2.5 in Proceedings of the Ottawa 
Linux Symposium, 2002. Their experience was that a machine using a 1500 
byte MTU could only reach 750Mbps whereas the same machine configured with 
9000 byte MTUs handsomely reached 1Gbps.

AARnet - Australia's Academic and Research Network

 
 Larger MTU is better for devices that for instance do per-packet 
 interrupting, like most endsystems probably do. It doesn't increase 
 long-RTT transfer performance per se (unless you have high packetloss 
 because you'll slow-start more efficiently).
 
 -- 
 Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Mikael Abrahamsson


On Thu, 12 Apr 2007, Joe Loiacono wrote:


Window size is of course critical, but it turns out that MTU also impacts
rates (as much as 33%, see below):

           MSS      0.7
    Rate = ---  *  -------
           RTT     sqrt(P)

MSS = Maximum Segment Size
RTT = Round Trip Time
P   = packet loss


So am I to understand that with 0 packetloss I get infinite rate? And TCP 
window size doesn't affect the rate?


I am quite confused by this statement. Yes, under congestion larger MSS is 
better, but without congestion I don't see where it would differ apart 
from the interrupt load I mentioned earlier?


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread David W. Hankins
Hopefully I'll be forgiven for geeking out over DHCP on nanog-l twice
in the same week.

On Thu, Apr 12, 2007 at 11:20:18AM +0200, Iljitsch van Beijnum wrote:
 1. It's no longer necessary to limit the subnet MTU to that of the  
 least capable system

I dunno for that.

 2. It's no longer necessary to manage 1500 byte+ MTUs manually

But for this, there has been (for a long time now) a DHCPv4 option
to give a client its MTU for the interface being configured (#26,
RFC2132).
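For what it's worth, in ISC dhcpd that option is spelled interface-mtu; a minimal sketch of handing a 9000-byte MTU to one subnet (addresses illustrative):

```
subnet 192.0.2.0 netmask 255.255.255.0 {
    range 192.0.2.100 192.0.2.200;
    option interface-mtu 9000;    # RFC 2132 option 26
}
```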

The thing is, not very many (if any) clients actually request it.
Possibly because of problem #1 (if you change your MTU, and no one
else does, you're hosed).

So, if you solve for the first problem in isolation, you can
easily just use DHCP to solve the second with virtually no work
and probably only (heh) client software updates.


I could also note that your first problem plagues DHCP software
today...it's further complicated...let's just say it sucks, and
bad.

If one were to solve that problem for DHCP speakers, you could
probably put a siphon somewhere in the process.

But it's an even harder problem to solve.

-- 
David W. HankinsIf you don't do it right the first time,
Software Engineer   you'll just have to do it again.
Internet Systems Consortium, Inc.   -- Jack T. Hankins




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Joe Loiacono
I believe the formula applies when the TCP window size is held constant 
(and maybe as large as is necessary for the bandwidth-delay product). 
Obviously P going to zero is a problem; there are practical limitations. 
But bit error rate is usually not zero over long distances. 

The formula is not mine, it's not new, and there is empirical evidence to 
support it. Check out the links for more (and better :-) info.

Joe

[EMAIL PROTECTED] wrote on 04/12/2007 04:48:09 PM:

 
 On Thu, 12 Apr 2007, Joe Loiacono wrote:
 
  Window size is of course critical, but it turns out that MTU also 
impacts
  rates (as much as 33%, see below):
 
            MSS      0.7
     Rate = ---  *  -------
            RTT     sqrt(P)
 
  MSS = Maximum Segment Size
  RTT = Round Trip Time
  P   = packet loss
 
 So am I to understand that with 0 packetloss I get infinite rate? And 
TCP 
 window size doesn't affect the rate?
 
 I am quite confused by this statement. Yes, under congestion larger MSS 
is 
 better, but without congestion I don't see where it would differ apart 
 from the interrupt load I mentioned earlier?
 
 -- 
 Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Daniel Senie


At 05:28 PM 4/12/2007, David W. Hankins wrote:


Hopefully I'll be forgiven for geeking out over DHCP on nanog-l twice
in the same week.

On Thu, Apr 12, 2007 at 11:20:18AM +0200, Iljitsch van Beijnum wrote:
 1. It's no longer necessary to limit the subnet MTU to that of the
 least capable system

I dunno for that.


Indeed. I do hope the vocal advocates for general use of larger MTU 
sizes on Ethernet have had in their careers the opportunity to enjoy 
the fun that ensues with LAN technologies where multiple MTUs are 
supported, namely token ring and FDDI. Debugging networks where MTU 
and MRU mismatches occur can be interesting, to say the least.


It's not just a matter of receiving stations noticing there are packets 
coming in that are too big. Depending on the design of the interface 
chips, the packet may not be received at all, and no indication sent 
to the driver. The result can be endless re-sending of information, 
doomed to failure.


OSPF has a way to negotiate MTU over LAN segments to deal with 
exactly this situation. I uncovered the problem debugging a largish 
OSPF network that would run for weeks or months, then fail to 
converge. Multi-access media benefits from predictable MTU/MRU sizes. 
Ethernet was well served by the fixed size.


I have no issue with allowing for a larger MTU size, but disagree 
with attempts to reduce everyone on the link to the lowest common 
denominator UNLESS that negotiation is repeated periodically (with 
MTU sizes able to both increase and decrease). If systems negotiate 
a particular size among all players on a LAN, and a new station is 
introduced, the decision process for what to do must be understood.


An alternative is to limit everyone to 1500 byte MTUs unless or until 
adjacent stations negotiate a larger window size. At the LAN level, 
this could be handled in ARP or similar, but the real desire would be 
to find a way to negotiate endpoint-to-endpoint at the IP layer. 
Don't even get into IP multicast...




 2. It's no longer necessary to manage 1500 byte+ MTUs manually

But for this, there has been (for a long time now) a DHCPv4 option
to give a client its MTU for the interface being configured (#26,
RFC2132).

The thing is, not very many (if any) clients actually request it.
Possibly because of problem #1 (if you change your MTU, and no one
else does, you're hosed).


Trying to do this via DHCP is, IMO, doomed to failure. The systems 
most likely to be in need of larger MTUs are likely servers, and 
probably not on DHCP-assigned addresses.




So, if you solve for the first problem in isolation, you can
easily just use DHCP to solve the second with virtually no work
and probably only (heh) client software updates.


I could also note that your first problem plagues DHCP software
today...it's further complicated...let's just say it sucks, and
bad.

If one were to solve that problem for DHCP speakers, you could
probably put a siphon somewhere in the process.

But it's an even harder problem to solve.


DHCP has enough issues and problems today, I think we're in agreement 
that heaping more on it might not be prudent.


Dan



Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread David W. Hankins
On Thu, Apr 12, 2007 at 05:58:07PM -0400, Daniel Senie wrote:
  2. It's no longer necessary to manage 1500 byte+ MTUs manually
 
 But for this, there has been (for a long time now) a DHCPv4 option
 to give a client its MTU for the interface being configured (#26,
 RFC2132).
 
 Trying to do this via DHCP is, IMO, doomed to failure. The systems 
 most likely to be in need of larger MTUs are likely servers, and 
 probably not on DHCP-assigned addresses.

If you're bothering to statically configure a system with a fixed
address (such as with a server), why can you not also statically
configure it with an MTU?
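On a Linux server that static configuration is a one-liner (iproute2 syntax; the interface name is illustrative, and making it persistent across reboots is distribution-specific):

```
# set a 9000-byte MTU on one interface
ip link set dev eth0 mtu 9000
# verify
ip link show dev eth0
```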

-- 
David W. HankinsIf you don't do it right the first time,
Software Engineer   you'll just have to do it again.
Internet Systems Consortium, Inc.   -- Jack T. Hankins




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Daniel Senie


At 06:09 PM 4/12/2007, David W. Hankins wrote:


On Thu, Apr 12, 2007 at 05:58:07PM -0400, Daniel Senie wrote:
  2. It's no longer necessary to manage 1500 byte+ MTUs manually
 
 But for this, there has been (for a long time now) a DHCPv4 option
 to give a client its MTU for the interface being configured (#26,
 RFC2132).

 Trying to do this via DHCP is, IMO, doomed to failure. The systems
 most likely to be in need of larger MTUs are likely servers, and
 probably not on DHCP-assigned addresses.

If you're bothering to statically configure a system with a fixed
address (such as with a server), why can you not also statically
configure it with an MTU?


Neither addresses interoperability on a multi-access medium where a 
new station could be introduced, and can result in the same MTU/MRU 
mismatch problems that were seen on token ring and FDDI. The problem 
is you might open a conversation (whatever the protocol), then get 
into trouble when larger data packets follow smaller initial 
conversation opening packets.


Or you can work with the same assumptions people use today: all 
stations on a particular network segment must use the same MTU size, 
whether that's the standard Ethernet size, or a larger size, and a 
warning sign hanging from the switch, saying use MTU size of  or 
suffer the consequences. 



Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread David W. Hankins
On Thu, Apr 12, 2007 at 06:18:56PM -0400, Daniel Senie wrote:
 Neither addresses interoperability on a multi-access medium where a 
 new station could be introduced, and can result in the same MTU/MRU 
 mismatch problems that were seen on token ring and FDDI.

Solving Iljitsch's #1 is a separate problem, and you can solve
them in isolation.  If you chose to do so, #2 is already solved
for all hosts where dynamic configuration is desirable.

-- 
David W. HankinsIf you don't do it right the first time,
Software Engineer   you'll just have to do it again.
Internet Systems Consortium, Inc.   -- Jack T. Hankins




Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Will Hargrave

Saku Ytti wrote:

 IXP peeps, why are you not offering high MTU VLAN option?
 From my point of view, this is biggest reason why we today
 generally don't have higher end-to-end MTU.
 I know that some IXPs do, eg. NetNOD but generally it's
 not offered even though many users would opt to use it.

At LONAP a jumbo frames peering vlan is on the 'to investigate' list. I
am not sure if there is that much interest though. Another vlan, another
SVI, another peering session...

The fabric itself is enabled to 9216 bytes; we have several members
exchanging L2TP DSL traffic at higher MTUs but this is currently done
over private (i.e. member addressed) vlans.

There are some other possible IX applications... MPLS springs to mind as
another network technology which requires at least baby giants; what
would providers use this for? Handoff of multiprovider l2/l3 VPNs?

The other technology which sees people deploying jumbos out there is
storage. Selling storage as well as transit over the IX? It could happen :-)

-- 
Will Hargrave [EMAIL PROTECTED]
Technical Director
LONAP Ltd







Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Perry Lorier


Iljitsch van Beijnum wrote:


Dear NANOGers,

It irks me that today, the effective MTU of the internet is 1500 bytes, 
while more and more equipment can handle bigger packets.


What do you guys think about a mechanism that allows hosts and routers 
on a subnet to automatically discover the MTU they can use towards other 
systems on the same subnet, so that:


1. It's no longer necessary to limit the subnet MTU to that of the least 
capable system


2. It's no longer necessary to manage 1500 byte+ MTUs manually

Any additional issues that such a mechanism would have to address?


I have a half-completed prototype mtud that runs under Linux.  It 
sets the interface MTU to 9k, but sets the route for the subnet down to 
1500.  It then watches the ARP table for new entries.  As a new MAC 
is added, it sends a 9k UDP datagram to that host and listens for an 
ICMP port unreachable reply (like traceroute does).  If the error 
arrives, it assumes that host can receive packets that large, and adds a 
host route with the larger MTU to that host.  It steps up the MTUs from 
1500 to 16k, trying to rapidly increase the MTU without having to wait 
for annoying timeouts.  If anything goes wrong somewhere along the way 
(a host is firewalled or whatever), then it won't receive the ICMP reply, 
and won't raise the MTU.
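The probing core of such a daemon can be sketched in a few lines of Python (a hypothetical helper, not Perry's actual mtud; the ARP watching and host-route insertion are omitted). A connected UDP socket surfaces the ICMP port unreachable as ECONNREFUSED, so seeing that error means the peer received a datagram of the probe size:

```python
import socket

def probe(host: str, port: int, size: int, timeout: float = 1.0) -> bool:
    """Send a `size`-byte UDP datagram and report whether an ICMP port
    unreachable came back, i.e. the far host received the datagram."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    s.connect((host, port))      # connected socket -> ICMP errors surface
    try:
        s.send(b"\x00" * size)
        s.recv(1)                # no listener there: expect ECONNREFUSED
    except ConnectionRefusedError:
        return True              # ICMP arrived: this size got through
    except OSError:
        return False             # timeout or EMSGSIZE: size/host no good
    finally:
        s.close()
    return False                 # something actually answered; ignore
```

In the scheme described above one would call e.g. probe(neighbor_ip, 33434, 8972) (9000 bytes minus 20 of IP and 8 of UDP) before installing a 9000-byte host route, stepping the size up or down from there.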


The idea is that you can run this on routers/servers on a network that 
has 9k MTUs but where not all the hosts are assured to be 9k capable, and 
it will correctly detect the available MTU between servers or routers, 
while still being able to correctly talk to machines that are still 
stuck with 1500-byte MTUs etc.


In other interesting data points in this field, a while ago we had 
reason to do some throughput tests under Linux varying the MTU, using 
e1000s, and ended up with this pretty graph:


http://wand.net.nz/~perry/mtu.png

we never had the time to investigate exactly what was going on, but 
interestingly at 8k MTUs (which is presumably what NFS would use), 
performance is exceptionally poor compared to 9k and 1500-byte MTUs. 
Our (untested) hypothesis is that the Linux kernel driver isn't smart 
about how it allocates its buffers.





Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Stephen Satchell


Steven M. Bellovin wrote:

On Thu, 12 Apr 2007 11:20:18 +0200
Iljitsch van Beijnum [EMAIL PROTECTED] wrote:


Dear NANOGers,

It irks me that today, the effective MTU of the internet is 1500
bytes, while more and more equipment can handle bigger packets.

What do you guys think about a mechanism that allows hosts and
routers on a subnet to automatically discover the MTU they can use
towards other systems on the same subnet, so that:

1. It's no longer necessary to limit the subnet MTU to that of the
least capable system

2. It's no longer necessary to manage 1500 byte+ MTUs manually

Any additional issues that such a mechanism would have to address?



Last I heard, the IEEE won't go along, and they're the ones who
standardize 802.3.

A few years ago, the IETF was considering various jumbogram options.
As best I recall, that was the official response from the relevant
IEEE folks: no. They're concerned with backward compatibility.  


Perhaps that has changed (and I certainly don't remember who sent that
note).  


No, I doubt it will change.  The CRC algorithm used in Ethernet is 
already strained by the 1500-byte-plus payload size.  802.3 won't extend 
to any larger size without running a significant risk of the CRC 
algorithm failing.


From a practical side, the cost of developing, qualifying, and selling 
new chipsets to handle jumbo packets would jack up the cost of inside 
equipment.  What is the payback?  How much money do you save going to 
jumbo packets?


Show me the numbers.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

On (2007-04-13 00:17 +0100), Will Hargrave wrote:
 
 At LONAP a jumbo frames peering vlan is on the 'to investigate' list. I
 am not sure if there is that much interest though. Another vlan, another
 SVI, another peering session...

Why another? For neighbours that are willing to peer over e.g. a 9k-MTU
VLAN, peer with them only over that VLAN; I don't see much point in
peering over both VLANs.
What I remember discussing with unnamed European IXP staff was that
they were worried about losing 'frame too big' counters, since
the switch environment would then accept bigger frames even
on the 1500-MTU VLAN. And if a member misconfigures the small-MTU VLAN
and calls the IXP complaining that the IXP is dropping their frames (due 
to sending over 1500 bytes), IXP staff can't quickly diagnose the
problem from interface counters. I argued that it's mostly irrelevant,
since IXP staff can ping the suspected customer from the IXP's
small-MTU VLAN and confirm the problem if the router replies
to a ping of over 1500 bytes. But then again, I have
zero operational experience running an IXP and it's easy for me to
oversimplify the issue.

 The fabric itself is enabled to 9216 bytes; we have several members
 exchanging L2TP DSL traffic at higher MTUs but this is currently done
 over private (i.e. member addressed) vlans.

This I believe to be the biggest gain: tunneling, e.g. the ability to run
IPsec site-to-site while providing the full 1500 bytes to the LAN.

 There are some other possible IX applications... MPLS springs to mind as
 another network technology which requires at least baby giants; what
 would providers use this for? Handoff of multiprovider l2/l3 VPNs?
 
 The other technology which sees people deploying jumbos out there is
 storage. Selling storage as well as transit over the IX? It could happen :-)
 
 -- 
 Will Hargrave [EMAIL PROTECTED]
 Technical Director
 LONAP Ltd
 
 
 
 
 

-- 
  ++ytti


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Saku Ytti

On (2007-04-12 20:00 -0700), Stephen Satchell wrote:
 
 From a practical side, the cost of developing, qualifying, and selling 
 new chipsets to handle jumbo packets would jack up the cost of inside 
 equipment.  What is the payback?  How much money do you save going to 
 jumbo packets?

It's rather hard to find Ethernet gear that operators could imagine using
in peering or in the core that does not support 9k+ MTUs.

-- 
  ++ytti