[Masq] Some good news on the Linux MTU issue..

David A. Ranch Sun, 28 Mar 1999 02:02:04 -0500

This comes from the IPSEC (SWAN) list but it sounds like
they are hitting our MTU bug too.  Since the SWAN group is
VERY technical, maybe this will finally be fixed!

As it stands, please see the bottom email here and PLEASE
try this patch.  If you would, please let me know if it
DOES or DOESN'T work for you!  Also, read the email after
that.

Thanks!

--David




Our initial operational experience with IPSEC is that inserting it
into transmission paths provokes operational Path MTU Discovery
problems that were not previously apparent.  I'm forwarding a pretty
succinct note from John Denker of AT&T that describes the problems.

Besides circumventing this problem in Linux IPSEC, it should be
brought up in a few more general fora.  I'm amazed that the original
Path MTU Discovery RFC (1191) never considered the failure mode that
happens if ICMP messages don't get back to the sender.  (The fix, to
terminate MTU discovery after a few unsuccessful retransmissions,
would have been simple had it been thought of.)
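
A minimal sketch of that parenthetical fix, in C.  Everything named
here is an assumption made for illustration (the struct, the function
names, and the limit of 3 timeouts standing in for "a few"); it is not
taken from RFC 1191 or from any real TCP stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; names are illustrative only. */
struct pmtud_state {
    bool df_set;    /* still sending with Don't Fragment set? */
    int  timeouts;  /* consecutive retransmission timeouts */
};

/* Assumed value for "a few unsuccessful retransmissions". */
enum { PMTUD_MAX_TIMEOUTS = 3 };

/* Call each time a full-sized segment times out. */
void pmtud_on_timeout(struct pmtud_state *s)
{
    assert(s != NULL);
    s->timeouts++;
    if (s->df_set && s->timeouts >= PMTUD_MAX_TIMEOUTS)
        s->df_set = false;  /* stop discovery; let routers fragment */
}

/* Call when the peer acknowledges new data. */
void pmtud_on_ack(struct pmtud_state *s)
{
    assert(s != NULL);
    s->timeouts = 0;
}
```

The only point is that a sender which keeps timing out on full-sized
DF packets eventually stops insisting on DF, so that routers can
fragment instead of silently dropping.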

I'm surprised that the Linux kernel (2.0) is not sending ICMP
"fragmentation required and DF set" responses.  I hope this is fixed
in 2.1; RFC 1191 requires it.  I've cc'd Alan Cox (Linux networking
maintainer) and Keith Owens (who has posted several clear notes to
linux-kernel about some aspects of the issue).

I would have been shocked, shocked! had there not been an RFC or
Internet-Draft about Path MTU Discovery failures.  But indeed,
the IETF "TCP Implementation" working group is working on it:

    http://www.ietf.org/internet-drafts/draft-ietf-tcpimpl-pmtud-00.txt

I've cc'd the author of this draft, Kevin Lahey, on this message.  The
draft should definitely add mentions of the interaction of Path MTU
problems, IP tunnelling, and IPSEC, including problems getting ICMP
messages out of tunnels, and MTU's that are reduced by the size of
tunnel and IPSEC headers.  It should also cross-reference the Path MTU
and Tunnel MTU discussions in RFC 2003 (IP-in-IP).  It sounds like we
need some cross-fertilization between the TCPIMPL and IPSEC working
groups, and with other groups using IP-in-IP encapsulation (RFC's 2003
and 1853) such as MOBILEIP.
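
The MTU reduction mentioned above is simple arithmetic, sketched here
in C.  Plain IP-in-IP (RFC 2003) adds one outer IPv4 header, 20 bytes
when it carries no options.  The IPSEC overhead depends on the
transforms in use, so it is left as a parameter; the function names
are made up for this sketch.

```c
#include <assert.h>

enum { IPV4_HEADER_MIN = 20 };  /* IPv4 header without options */
enum { TCP_HEADER_MIN  = 20 };  /* TCP header without options */

/*
 * MTU seen by packets entering a tunnel: the link MTU minus the outer
 * IPv4 header and any further encapsulation bytes (for example IPSEC
 * headers, whose size depends on configuration).
 */
int tunnel_mtu(int link_mtu, int extra_overhead)
{
    assert(link_mtu > 0 && extra_overhead >= 0);
    return link_mtu - IPV4_HEADER_MIN - extra_overhead;
}

/* Largest TCP MSS that fits in one packet of the given MTU,
 * assuming no IP or TCP options. */
int mss_for_mtu(int mtu)
{
    return mtu - IPV4_HEADER_MIN - TCP_HEADER_MIN;
}
```

On a 1500-byte Ethernet, tunnel_mtu(1500, 0) is 1480 and the matching
MSS is 1440, so a peer that negotiated an MSS of 1460 sends packets
20 bytes too large for the tunnel.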

        John

Date: Wed, 24 Mar 1999 12:10:19 -0500
To: [EMAIL PROTECTED]
From: John Denker <[EMAIL PROTECTED]>
Subject: linux-ipsec: cornered: MTU and fragmentation bugs

Hi --

At the risk of being forever banished from the hacker community, and having
my wizardly pointy hat confiscated, let me say this:  The MTU/fragmentation
bug is *not* microsoft's fault!  Eeeck!  

Here's the deal:

0) Path-MTU discovery is a good thing.  Typically this is done by initially
sending large packets with the DF bit set, and seeing if they get through. 

1) The microsoft TCP clients negotiate for a large initial MSS.  This is
perfectly legal, and should result in efficiency if other players do their
part.  This is necessarily done with no knowledge of the actual path-MTU.

2) This makes it likely that packets will be sent that exceed the MTU of
some router along the path -- especially when there is encapsulation going
on at some point, such as the ipsec tunnel.

3) The RFCs say that when an oversized packet (with the DF bit set)
arrives, a router MAY return an ICMP destination-unreachable message
explaining that fragmentation is needed and suggesting a new packet size.
In practice, path-MTU discovery without these frag-needed messages is
somewhat inefficient.

4) Heretofore linux has not generated these frag-needed messages.  I
consider this a weakness in linux.  I have a patch for this, as mentioned
in previous notes.

5) What's worse, there are some firewalls (the Firewall-One brand in
particular, and quite likely others) that in their usual configuration do
not pass these ICMP frag-needed datagrams.  I consider this a weakness in
the firewalls.  This is a pain in the neck to fix.

6) What's *much* worse is that practically all the web servers in the world
improperly assume that the routers MUST return a frag-needed message.  As
much as you might enjoy bashing microsoft, their web site is the only one
I've been able to discover that is both efficient and robust... efficient
in that it starts out by sending large packets, and robust in that it will
(even in the absence of frag-needed messages) back off if they don't get
through.
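
Points 2 through 5 come down to one decision at each router.  A rough
C sketch, with hypothetical names, of what a router following RFC 1191
is expected to do with each packet:

```c
#include <assert.h>
#include <stdbool.h>

enum fwd_action {
    FWD_SEND,       /* fits: forward unchanged */
    FWD_FRAGMENT,   /* too big, DF clear: fragment and forward */
    FWD_DROP_ICMP   /* too big, DF set: drop, report frag-needed */
};

/* Decide what to do with a packet of packet_len bytes headed for a
 * link whose MTU is next_hop_mtu. */
enum fwd_action forward_decision(int packet_len, bool df, int next_hop_mtu)
{
    assert(packet_len > 0 && next_hop_mtu > 0);
    if (packet_len <= next_hop_mtu)
        return FWD_SEND;
    return df ? FWD_DROP_ICMP : FWD_FRAGMENT;
}
```

The trouble described above is the third case when the frag-needed
message is never generated (point 4) or never delivered (point 5):
the packet is dropped and the sender is told nothing.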

Here is a partial list of servers I've checked:
www.ibm.com                 inefficient: always requests a small MSS
www.snap.com                inefficient: always requests a small MSS
www.toad.com                inefficient: never sets DF, always sends small packets
www.sandelman.ottawa.on.ca  inefficient: never sets DF, always sends small packets
www.hotbot.com              chokes
www.aol.com                 chokes
www.netscape.com            chokes
www.altavista.com           chokes
www.yahoo.com               chokes
www.clinet.fi               chokes
www.sgi.com                 chokes
www.intel.com               chokes
www.compaq.com              chokes
www.psi.net                 chokes
www.cygnus.com              chokes
www.quintillion.com         chokes
www.research.att.com        chokes

Except for the first four, these servers are grossly noncompliant with the
RFCs.
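
The labels in the list can be read as a small decision table.  The C
sketch below is one interpretation, not the procedure actually used
to test these servers; the names and the three yes/no observations
are assumptions.  It reads "chokes" as: sends large packets with DF
set and does not back off when no frag-needed message arrives.

```c
#include <assert.h>
#include <stdbool.h>

enum server_class {
    SRV_INEFFICIENT,  /* only ever sends small packets */
    SRV_CHOKES,       /* large DF packets, no recovery without ICMP */
    SRV_ROBUST        /* large packets, and still gets data through */
};

/* Classify a server from three observed behaviours (all hypothetical
 * inputs for this sketch). */
enum server_class classify_server(bool sends_large, bool sets_df,
                                  bool backs_off_without_icmp)
{
    if (!sends_large)
        return SRV_INEFFICIENT;
    if (sets_df && !backs_off_without_icmp)
        return SRV_CHOKES;
    return SRV_ROBUST;
}
```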

========

Solution #1 (ideal):  Fix linux and fix the firewalls so that ICMP
frag-needed messages are returned to servers who depend on them.  This
results in maximum efficiency.

Solution #2 (for users who can't easily fix their firewalls): ipsec must
(at least optionally) support a "virtually-enormous tunnel" mode.  In that
mode, as I have previously discussed, when a packet arrives that is too big
to be transported in a single envelope, it should be fragmented
(*regardless* of whether the DF bit was set) and transported in multiple
envelopes.  If the DF bit was set on the raw packet (and perhaps not
otherwise) the packet should be reassembled by the other security gateway
before being sent on its way.  This behavior, while perhaps very slightly
inefficient, is much more robust in the face of all those real-world
ill-behaved web servers.
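
How many envelopes such a tunnel needs for one oversized packet
follows from the usual IPv4 fragmentation rule: every fragment except
the last must carry a multiple of 8 payload bytes.  A C sketch under
that rule; the function name and parameters are illustrative, and
"mtu" here means the largest inner packet one envelope can carry.

```c
#include <assert.h>

/*
 * Number of fragments needed for an IPv4 packet of total_len bytes
 * (including its hdr_len-byte header) when each fragment, header
 * included, may be at most mtu bytes.  The DF bit is deliberately
 * not consulted.
 */
int fragment_count(int total_len, int hdr_len, int mtu)
{
    assert(hdr_len >= 20 && total_len >= hdr_len);
    assert(mtu >= hdr_len + 8);

    if (total_len <= mtu)
        return 1;                       /* fits in one envelope */

    int payload  = total_len - hdr_len;
    int last_max = mtu - hdr_len;       /* final fragment: any size that fits */
    int per_frag = (last_max / 8) * 8;  /* earlier fragments: multiple of 8 */

    /* One final fragment plus enough full ones for the rest. */
    return 1 + (payload - last_max + per_frag - 1) / per_frag;
}
```

For example, a 1500-byte packet entering a tunnel that can carry 1480
bytes per envelope needs two fragments.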

A tunnel with virtually-infinite MTU doesn't offend me in the least.  I
consider it consistent with the fact that the tunnel shows up as virtually
a single hop, no matter how many real-ethernet hops are used to transport
the envelopes.

IMHO this solution #2 is a required feature, necessary for version 1.00.
It should be at least a compile-time option.  Making it a run-time option
would be even nicer, with (I would think) hardly any extra work.

Cheers --- jsd

[John Gilmore here again:]

John Denker, when you say a Web server "chokes", you appear to mean that:

        *  It sends big packets with "DF".
        *  It doesn't recover if it never sees ICMP frag-neededs.

This is, in fact, compatible with the current RFC's.  The problem is
a protocol design bug, not an implementation bug.  See 
draft-ietf-tcpimpl-pmtud-00.txt for more details.

RFC 1191 does require routers to return ICMP messages when they can't
fragment a datagram, though it doesn't use those all-important capital
letters in exactly the right place (it says "is required to" rather
than "MUST" in section 4), and I haven't found an RFC that
specifically says MUST about this.  The RFC 1191 requirement was added
shortly after the "Gateway Requirements" RFC 1009 collected all the
little requirements into one place.  I don't think a newer collection
of router requirements has ever been issued, so it's easy for router
mfrs to miss this one.

Here are a few explanations of the general Path MTU problem.  I've
cc'd these folks too, so they can add this info to their explanations.

    http://www.indy.net/~gswallow/rtfm/mtu/index.html    
    http://www.euronet.nl/~gco_fvee/win95netbugs/faq-c.html#c10
    http://www.uwsg.indiana.edu/hypermail/linux/net/9701.1/0097.html        
    http://ftp.std.com/obi/Networking/rfc/rfc1435.txt


--

X-Authentication-Warning: lohi.clinet.fi: majordom set sender to
[EMAIL PROTECTED] using -f
Date: Thu, 25 Mar 1999 12:49:05 -0500 (EST)
From: Henry Spencer <[EMAIL PROTECTED]>
To: Alan Cox <[EMAIL PROTECTED]>
cc: Linux IPsec <[EMAIL PROTECTED]>
Subject: Re: linux-ipsec: cornered: MTU and fragmentation bugs 
Sender: [EMAIL PROTECTED]

Enclosed is the patch John Denker suggested for fixing the apparent
problem with Linux not sending frag-needed ICMPs.  Your comments would
be welcome.

                                                          Henry Spencer
                                                       [EMAIL PROTECTED]
                                                     ([EMAIL PROTECTED])

*** ip_fragment.c.old   Tue Mar 23 17:22:41 1999
--- ip_fragment.c       Tue Mar 23 17:24:34 1999
***************
*** 685,692 ****
--- 685,693 ----
        if (iph->frag_off & htons(IP_DF))
        {
                ip_statistics.IpFragFails++;
                NETDEBUG(printk("ip_queue_xmit: frag needed\n"));
+               icmp_send(skb,ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED,htons(dev->mtu), dev); /* jsd */
                return;
        }
  
        /*
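
For reference, the message that icmp_send() call is meant to produce
has the layout given in RFC 1191: type 3 (destination unreachable),
code 4 (fragmentation needed and DF set), and the next-hop MTU in the
low-order 16 bits of the previously unused word.  The standalone C
sketch below fills in and reads back just that 8-byte header; the
function names are hypothetical, this is not kernel code, and the
checksum is left at zero.

```c
#include <assert.h>
#include <stdint.h>

enum { ICMP_TYPE_DEST_UNREACH = 3, ICMP_CODE_FRAG_NEEDED = 4 };

/* Fill in the 8-byte ICMP header of a frag-needed message. */
void build_frag_needed(uint8_t hdr[8], uint16_t next_hop_mtu)
{
    hdr[0] = ICMP_TYPE_DEST_UNREACH;
    hdr[1] = ICMP_CODE_FRAG_NEEDED;
    hdr[2] = 0; hdr[3] = 0;   /* checksum: not computed in this sketch */
    hdr[4] = 0; hdr[5] = 0;   /* unused, must be zero */
    hdr[6] = (uint8_t)(next_hop_mtu >> 8);    /* network byte order */
    hdr[7] = (uint8_t)(next_hop_mtu & 0xff);
}

/* Return the advertised next-hop MTU, or 0 if hdr is not frag-needed. */
uint16_t parse_frag_needed(const uint8_t hdr[8])
{
    if (hdr[0] != ICMP_TYPE_DEST_UNREACH || hdr[1] != ICMP_CODE_FRAG_NEEDED)
        return 0;
    return (uint16_t)((hdr[6] << 8) | hdr[7]);
}
```

A real message also carries the IP header and first 8 data bytes of
the dropped packet after this header, and the checksum covers all of
it.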


--

Finally, here is a response from Alan Cox.  Though Alan is a kernel
god, he seems to think that there isn't an MTU problem.

X-Authentication-Warning: lohi.clinet.fi: majordom set sender to
[EMAIL PROTECTED] using -f
From: [EMAIL PROTECTED] (Alan Cox)
Subject: Re: linux-ipsec: cornered: MTU and fragmentation bugs
To: [EMAIL PROTECTED] (Henry Spencer)
Date: Thu, 25 Mar 1999 23:01:12 +0000 (GMT)
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]

> Enclosed is the patch John Denker suggested for fixing the apparent
> problem with Linux not sending frag-needed ICMPs.  Your comments would
> be welcome.

It causes two to be sent, not one, and I can't see any cases in the standard
kernel where this path can be taken - it's a bug if this path is taken.

Alan


.----------------------------------------------------------------------------.
|  David A. Ranch - Linux/Networking/PC hardware         [EMAIL PROTECTED]  |
!----                                                                    ----!
`----- For more detailed info, see http://www.ecst.csuchico.edu/~dranch -----'


_______________________________________________
Masq maillist  -  [EMAIL PROTECTED]
http://tiffany.indyramp.com/mailman/listinfo/masq
Admin requests can be handled by web (above) or [EMAIL PROTECTED]