Re: [Int-area] Please review: draft-savola-mtufrag-network-tunneling-04.txt

Pekka Savola Mon, 29 Aug 2005 03:02:14 -0700

On Wed, 24 Aug 2005, Joe Touch wrote:

From Pekka Savola <[EMAIL PROTECTED]>:

...

I've had some clarifying discussion with Joe off-list, but now
getting back on so if others have opinions, those can be heard.


(this is where I thought we were waiting)... - was there other input?

No -- folks, please read and comment so this wouldn't be just dialogue between the two of us.

As for the earlier note on the list, IMHO RFC 2003 section 5.1
touches only very cursorily on the subject (some of the
respective text is in my draft in sections 3.1 and 3.2); this
offers much more extensive discussion.


I disagree; actually, I believe that 2003 is both complete and
sufficient on the subject, so I'm not sure what the purpose of an
additional draft discussing the same issue in other contexts is.

I think we just have to agree to disagree. Over half a dozen people have stated that they find the draft very useful, and at least two ADs have been willing to sponsor it. There have also been discussions on various lists, particularly from the ops perspective (e.g., http://www.merit.edu/mail.archives/nanog/2004-10/threads.html#00125).

I don't think "is required" is correct or I don't understand what you
refer to with "required".  RFC 2003 says,

   Thus, the encapsulator SHOULD normally do Path MTU Discovery,
   requiring it to send all datagrams into the tunnel with the "Don't
   Fragment" bit set in the outer IP header.


It says:
     Identification, Flags, Fragment Offset

        These three fields are set as specified in [10].  However, if
        the "Don't Fragment" bit is set in the inner IP header, it MUST
        be set in the outer IP header; if the "Don't Fragment" bit is
        not set in the inner IP header, it MAY be set in the outer IP
        header, as described in Section 5.1.

There's a MUST there  ;-)

OK, I didn't notice it because I didn't see its relevance here; more below.

SHOULD is not a MUST (earlier, the RFC does mandate implementation
of PMTUD, but as said, doesn't mandate its usage). Similarly,
draft-ietf-mech-v2 and its predecessors allow the implementors
and/or users to choose non-DF approach as well, if they judge it
appropriate.


2003 would override draft-ietf-mech-v2 esp. regarding v4 in v4 tunnels;
I know of no effort to override 2003 yet.


The text in the draft says,

   When desiring to avoid fragmentation, IPv4 allows two options: copy
   the DF bit from the inner packets to the encapsulating header, or
   always set the DF bit.  The latter is better especially in controlled
   environments, because it forces PMTUD to converge immediately.

Both of these fulfill the MUST you quote: if it's copied (when DF is set), it's also set for the outer header; when it's forced, it's always set.
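
As a rough illustration of that claim, the sketch below checks each outer-DF policy against the quoted rule (inner DF set implies outer DF set). The policy names and helper functions are hypothetical, introduced only for this example; they do not come from RFC 2003 or the draft.

```python
from enum import Enum


class OuterDfPolicy(Enum):
    """Hypothetical names for how an encapsulator picks the outer DF bit."""
    COPY = "copy"            # copy the inner DF bit to the outer header
    ALWAYS_SET = "set"       # always set DF in the outer header
    ALWAYS_CLEAR = "clear"   # always clear DF in the outer header


def outer_df(inner_df: bool, policy: OuterDfPolicy) -> bool:
    """Return the outer DF bit chosen by the given policy."""
    if policy is OuterDfPolicy.COPY:
        return inner_df
    if policy is OuterDfPolicy.ALWAYS_SET:
        return True
    return False


def satisfies_quoted_must(inner_df: bool, outer: bool) -> bool:
    """Quoted rule: if DF is set in the inner header, it MUST be set outside."""
    return outer or not inner_df


def policy_is_compliant(policy: OuterDfPolicy) -> bool:
    """Check the rule for both possible values of the inner DF bit."""
    return all(
        satisfies_quoted_must(inner, outer_df(inner, policy))
        for inner in (False, True)
    )


for policy in OuterDfPolicy:
    print(policy.name, policy_is_compliant(policy))
```

Under these assumptions, copying and always setting both pass the check, while always clearing fails it for packets whose inner DF bit is set.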

So, I don't see a problem here. Could you be more specific about what you would like to see changed?


Would the following be better?

   When desiring to avoid fragmentation, there are two options with IPv4: copy
   the DF bit from the inner packets to the encapsulating header, or
   always set the DF bit.  The latter is better especially in controlled
   environments, because it forces PMTUD to converge immediately.

(just rewording out the "allows")

Later sec 3.4 says "but there are certainly uses for it". IMO,
that's endorsement, and I disagree. It doesn't "work around"
PMTUD issues, but rather defeats PMTUD. If there is indeed a
justifiable reason to consider this option, it should be
explained in this document, as it is key to whether it should
be considered anything other than broken (or are you also
allowing stacks that miscalculate the IP checksum, just because
some do? :-)


I think the most compelling case has been described in section 3.1
-- when having tunnels (especially IPsec tunnels, but any tunnel is
the same) over which a large amount of traffic is being
transported, where PMTUD is too unreliable or unfeasible as noted
in section 3.2.


This is just making a decision at the tunnel that you prefer efficiency
to PMTUD support. 2003 doesn't say it's OK to make that decision, esp.
since it's not detectable from the endpoints. I don't think this is a
compelling reason, as a result.

That may or may not be a compelling reason. Many operators are doing so regardless, and vendors are supporting it. I don't think there is anything to be gained (but rather lost) by having the IETF be in denial about this. This document is precisely not a BCP because I want to document the *operational* issues (and solutions, even if not optimal ones).

However, having disclaimers, e.g., noting clearly that the behaviour is non-compliant and may break things, is fine with me, and I certainly want to make such a point myself.

So, could you check if there's text which should be added or reworded to make this clearer? Suggestions would be welcome.

In this case, reassembly becomes a problem (requires basically
infinite buffers, problems when a fragment goes missing, v4 ID
space is insufficient so it wraps over and causes data
corruption/misassociation, etc. -- see section 3.1).


Reassembly doesn't require infinite buffers; the fragments need be held
only for a network MSL, and the IP ID numbers are required not to wrap
during that period, so there's no wrap problem if everybody stays to spec.

That's a lot of assumptions. 60 seconds (mentioned in RFC1122, sect 3.2.2) on 10 Gbit/s interface is basically infinite buffers. Even if the time was 1 second, it would basically mean infinite buffers.
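
To put rough numbers on "basically infinite", here is a back-of-the-envelope sketch. It assumes the worst case where the link runs at full rate and every byte received is an unassembled fragment held for the whole timeout; the function name is made up for this example.

```python
def reassembly_buffer_bytes(link_bits_per_s: float, hold_seconds: float) -> float:
    """Worst-case bytes buffered if all traffic is fragments held for hold_seconds."""
    return link_bits_per_s * hold_seconds / 8


LINK_RATE = 10e9  # 10 Gbit/s, the figure used in the text

for hold in (60, 1):
    gigabytes = reassembly_buffer_bytes(LINK_RATE, hold) / 1e9
    print(f"{hold:>2} s hold time -> {gigabytes} GB of buffer")
```

That works out to 75 GB for a 60 second hold time and 1.25 GB for 1 second, per 10 Gbit/s interface.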

Further, if you look at http://www.watersprings.org/pub/id/draft-mathis-frag-harmful-00.txt, the IP ID numbers wrap many times during that period.
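
A similar rough calculation for the ID wrap: the IPv4 Identification field is 16 bits, so it repeats after 65536 packets. The sketch below assumes a fully loaded link, 1500-byte packets, and all traffic sharing one source/destination pair (as between two tunnel endpoints); the packet size and function names are assumptions for illustration only.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 Identification field


def id_wraps(link_bits_per_s: float, packet_bytes: int, seconds: float) -> float:
    """Number of times the ID space is exhausted within the given period."""
    packets = link_bits_per_s * seconds / (packet_bytes * 8)
    return packets / ID_SPACE


def seconds_per_wrap(link_bits_per_s: float, packet_bytes: int) -> float:
    """Time taken to send ID_SPACE packets at full link rate."""
    return ID_SPACE * packet_bytes * 8 / link_bits_per_s


print(id_wraps(10e9, 1500, 60))       # wraps within a 60 second hold time
print(seconds_per_wrap(10e9, 1500))   # seconds until the first wrap
```

With these assumptions the ID space wraps roughly 760 times in 60 seconds, i.e. about once every 0.08 seconds.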

It's equally possible to just limit reassembly buffers and toss old
fragments earlier than a network MSL; it just increases the error rate
of the 'link' provided by the tunnel.

I have doubts whether this would work or not, but that's probably not relevant to this draft in any case. I'm not aware of hardware implementations which perform this, though it would be interesting to hear if anyone knows of any.

So, it just isn't possible to do a [v4] high-bandwidth tunnel with
fragmentation/reassembly; I guess the point is whether the reasons
why you can't use PMTUD are convincing enough (e.g., would have to
be done to millions of sources, passive monitoring, etc.)


Yes, it is not possible to do high bandwidth frag/reassembly, period,
regardless of whether it's a tunnel or not. We may need to address this,
or not - but breaking PMTUD is a bad way to do so, since it would
increase fragmentation. The whole point of PMTUD is to discover the MTU
size to avoid inefficient (esp. for high bandwidth paths) sizes;
deciding to violate 2003's requirements for DF copying just to support
high bandwidth frag/reassy (which won't work anyway) is very illogical.

If you'll look at the details in the draft, the frag/reasm does work -- after the PMTUD violation -- because the *recipient* (a host!) performs reassembly and because the stream is thus small enough not to cause IP ID wrapping in that (src, dst, ID) context. Therefore the reasm is no longer too "high speed". So, I think the approach "works" to some definition of "works" -- it certainly seems to solve problems some operators have been having.

Breaking PMTUD is of course unfortunate especially for v6, but maybe more or less a consequence of the IETF being in denial about the problem.

In particular, clearing the DF bit may disable downstream path
discovery, BUT the discussion implies it should be done only
for already-fragmented packets. A packet which is fragmented
with the DF bit set is an error and should be dropped, since it
was marked "don't fragment" and has been fragmented, as noted in
RFC791 (see page 8).

I.e., this example is nonsensical; if there is a valid one, it
would be useful to present it.


No, the discussion tries to point out that if the implementation
clears the inner DF bit (and consequently causes fragmentation for
big packets at the same node), even if further downstream the
packets would get fragmented _again_, that further fragmentation
doesn't make matters any worse.


A packet with the DF bit set with MF also set is an error as per RFC791.
The discussion implies that this can exist - or are you talking about
clearing only the outer DF?

I'm not 100% certain as this was text submitted to me by an earlier reviewer. But I'm not sure if I understand -- what is the scenario where you would have both DF and MF? I don't see one -- if you clear the DF bit, by definition you don't get any packets with MF from the path before clearing the DF bit; and if you have cleared the DF bit, you may get MF bit on some packets after that point, but no longer DF bit.
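
For reference, the DF/MF combination under discussion can be read straight off the 16-bit flags/fragment-offset word of the IPv4 header (RFC 791). The sketch below only decodes the bits and reports whether DF appears on a fragment; it takes no position on whether such a packet must be dropped. The function and key names are hypothetical.

```python
DF_MASK = 0x4000      # "Don't Fragment" flag
MF_MASK = 0x2000      # "More Fragments" flag
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, in units of 8 bytes


def decode_frag_field(field: int) -> dict:
    """Decode the flags/fragment-offset word taken from an IPv4 header."""
    df = bool(field & DF_MASK)
    mf = bool(field & MF_MASK)
    offset = field & OFFSET_MASK
    # A packet is a fragment if more fragments follow or its offset is non-zero.
    is_fragment = mf or offset != 0
    return {
        "df": df,
        "mf": mf,
        "offset": offset,
        "is_fragment": is_fragment,
        "df_on_fragment": df and is_fragment,
    }


print(decode_frag_field(0x4000))  # DF only: an unfragmented packet
print(decode_frag_field(0x2000))  # MF only: first fragment, DF clear
print(decode_frag_field(0x6000))  # DF and MF together: the disputed case
```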



--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area
