On Wed, 24 Aug 2005, Joe Touch wrote:
From Pekka Savola <[EMAIL PROTECTED]>:
...
I've had some clarifying discussion with Joe off-list, but now
getting back on so if others have opinions, those can be heard.
(this is where I thought we were waiting)... - was there other input?
No -- folks, please read and comment so that this isn't just a
dialogue between the two of us.
As for the earlier note on the list, IMHO RFC 2003 section 5.1
touches only very cursorily on the subject (some of the
respective text is in my draft in sections 3.1 and 3.2); this
offers much more extensive discussion.
I disagree; actually, I believe that 2003 is both complete and
sufficient on the subject, so I'm not sure what the purpose of an
additional draft discussing the same issue in other contexts is.
I think we just have to agree to disagree. Over half a dozen people
have stated that they find the draft very useful, and at least two ADs
have been willing to sponsor it. There has also been discussion on
various lists, particularly from the ops perspective (e.g.,
http://www.merit.edu/mail.archives/nanog/2004-10/threads.html#00125).
I don't think "is required" is correct or I don't understand what you
refer to with "required". RFC 2003 says,
Thus, the encapsulator SHOULD normally do Path MTU Discovery,
requiring it to send all datagrams into the tunnel with the "Don't
Fragment" bit set in the outer IP header.
It says:
Identification, Flags, Fragment Offset
These three fields are set as specified in [10]. However, if
the "Don't Fragment" bit is set in the inner IP header, it MUST
be set in the outer IP header; if the "Don't Fragment" bit is
not set in the inner IP header, it MAY be set in the outer IP
header, as described in Section 5.1.
There's a MUST there ;-)
OK, I didn't notice it because I didn't see its relevance here; more
below.
SHOULD is not a MUST (earlier, the RFC does mandate implementation
of PMTUD, but as said, doesn't mandate its usage). Similarly,
draft-ietf-mech-v2 and its predecessors allow the implementors
and/or users to choose non-DF approach as well, if they judge it
appropriate.
2003 would override draft-ietf-mech-v2 esp. regarding v4 in v4 tunnels;
I know of no effort to override 2003 yet.
The text in the draft says,
When desiring to avoid fragmentation, IPv4 allows two options: copy
the DF bit from the inner packets to the encapsulating header, or
always set the DF bit. The latter is better especially in controlled
environments, because it forces PMTUD to converge immediately.
Both of these fulfill the MUST you quote: if it's copied (when DF is
set), it's also set for the outer header; when it's forced, it's
always set.
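
As a rough illustration of the point above (a sketch only; the function and policy names are made up for this example and appear in neither RFC 2003 nor the draft), the MUST can be written as a predicate and checked against each way of choosing the outer DF bit:

```python
# Hypothetical names throughout; this only models the RFC 2003 rule quoted above.

def outer_df(inner_df: bool, policy: str) -> bool:
    """Return the outer-header DF bit chosen by an encapsulator policy."""
    if policy == "copy":      # copy DF from the inner header
        return inner_df
    if policy == "always":    # always set DF in the outer header
        return True
    if policy == "clear":     # never set DF in the outer header
        return False
    raise ValueError(f"unknown policy: {policy}")


def satisfies_rfc2003_must(inner_df: bool, outer: bool) -> bool:
    """If DF is set in the inner header, it MUST be set in the outer header."""
    return outer or not inner_df


def policy_is_compliant(policy: str) -> bool:
    """Check the MUST for both possible values of the inner DF bit."""
    return all(
        satisfies_rfc2003_must(inner, outer_df(inner, policy))
        for inner in (False, True)
    )


if __name__ == "__main__":
    for name in ("copy", "always", "clear"):
        print(name, policy_is_compliant(name))
```

Under this model, "copy" and "always" pass for both inner DF values, while "clear" fails whenever the inner DF bit is set.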
So, I don't see a problem here. Could you be more specific about what
you would like to see changed?
Would the following be better?
When desiring to avoid fragmentation, there are two options with IPv4: copy
the DF bit from the inner packets to the encapsulating header, or
always set the DF bit. The latter is better especially in controlled
environments, because it forces PMTUD to converge immediately.
(just rewording out the "allows")
Later sec 3.4 says "but there are certainly uses for it". IMO,
that's endorsement, and I disagree. It doesn't "work around"
PMTUD issues, but rather defeats PMTUD. If there is indeed a
justifiable reason to consider this option, it should be
explained in this document, as it is key to whether it should
be considered anything other than broken (or are you also
allowing stacks that miscalculate the IP checksum, just because
some do? :-)
I think the most compelling case has been described in section 3.1
-- when having tunnels (especially IPsec tunnels, but any tunnel is
the same) over which a large amount of traffic is being
transported, where PMTUD is too unreliable or infeasible as noted
in section 3.2.
This is just making a decision at the tunnel that you prefer efficiency
to PMTUD support. 2003 doesn't say it's OK to make that decision, esp.
since it's not detectable from the endpoints. I don't think this is a
compelling reason, as a result.
That may or may not be a compelling reason. Many operators are doing
so regardless, and vendors are supporting it. I don't think there is
anything to be gained (but rather lost) by having the IETF be in
denial about this. This document is precisely not a BCP because I
want to document the *operational* issues (and solutions, even if not
optimal ones).
However, having disclaimers, e.g., noting clearly that the behaviour is
non-compliant and may break things, is fine with me, and I certainly want
to make such a point myself.
So, could you check if there's text which should be added or reworded
to make this clearer? Suggestions would be welcome.
In this case, reassembly becomes a problem (requires basically
infinite buffers, problems when a fragment goes missing, v4 ID
space is insufficient so it wraps over and causes data
corruption/misassociation, etc. -- see section 3.1).
Reassembly doesn't require infinite buffers; the fragments need be held
only for a network MSL, and the IP ID numbers are required not to wrap
during that period, so there's no wrap problem if everybody stays to spec.
That's a lot of assumptions. 60 seconds (mentioned in RFC1122, sect
3.2.2) on 10 Gbit/s interface is basically infinite buffers. Even if
the time was 1 second, it would basically mean infinite buffers.
Further, if you look at
http://www.watersprings.org/pub/id/draft-mathis-frag-harmful-00.txt,
the IP ID numbers wrap many times during that period.
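
To put rough numbers on this (a back-of-the-envelope sketch; the 1500-byte packet size and the assumption that all traffic shares one 16-bit IP ID space are mine, not figures taken from the draft):

```python
# Assumptions: a fully loaded 10 Gbit/s link, 1500-byte packets, and a single
# 16-bit IPv4 Identification space shared by all of the traffic.

LINK_RATE_BPS = 10e9      # 10 Gbit/s
IP_ID_SPACE = 2 ** 16     # the IPv4 Identification field is 16 bits wide


def buffer_bytes(hold_seconds: float, rate_bps: float = LINK_RATE_BPS) -> float:
    """Bytes arriving at line rate while fragments are held for hold_seconds."""
    return rate_bps / 8 * hold_seconds


def id_wrap_seconds(packet_bytes: int, rate_bps: float = LINK_RATE_BPS) -> float:
    """Time for the IP ID counter to wrap if every packet uses a new ID."""
    packets_per_second = rate_bps / 8 / packet_bytes
    return IP_ID_SPACE / packets_per_second


if __name__ == "__main__":
    print(f"60 s of traffic: {buffer_bytes(60) / 1e9:.2f} GB")
    print(f" 1 s of traffic: {buffer_bytes(1) / 1e9:.2f} GB")
    wrap = id_wrap_seconds(1500)
    print(f"ID wrap time:    {wrap * 1e3:.1f} ms")
    print(f"wraps in 60 s:   {60 / wrap:.0f}")
```

With these assumptions, a 60-second hold time corresponds to about 75 GB of traffic and a 1-second hold time to about 1.25 GB, and the ID space wraps roughly every 79 ms, i.e. several hundred times within 60 seconds. Smaller packets would make the wrap interval shorter still.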
It's equally possible to just limit reassembly buffers and toss old
fragments earlier than a network MSL; it just increases the error rate
of the 'link' provided by the tunnel.
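
For concreteness, the "limit buffers and toss old fragments early" idea could look roughly like the following. This is a software sketch only, with made-up names and simplifications (byte offsets rather than 8-byte units, no overlap handling); it says nothing about whether this is feasible in hardware at high speed.

```python
from collections import OrderedDict


class BoundedReassembler:
    """Hold at most max_datagrams partial datagrams, each for at most timeout seconds."""

    def __init__(self, max_datagrams: int = 4, timeout: float = 1.0):
        self.max_datagrams = max_datagrams
        self.timeout = timeout
        self.pending = OrderedDict()  # key -> partial datagram state, oldest first
        self.dropped = 0              # partial datagrams discarded early

    def _evict(self, now: float) -> None:
        # Discard partial datagrams that have been held longer than the timeout.
        expired = [k for k, e in self.pending.items()
                   if now - e["first_seen"] > self.timeout]
        for key in expired:
            del self.pending[key]
            self.dropped += 1
        # Discard the oldest partial datagrams when over the buffer limit.
        while len(self.pending) > self.max_datagrams:
            self.pending.popitem(last=False)
            self.dropped += 1

    def add_fragment(self, key, offset: int, data: bytes,
                     more_fragments: bool, now: float):
        """Store one fragment; return the full payload once it is complete."""
        self._evict(now)
        entry = self.pending.setdefault(
            key, {"first_seen": now, "frags": {}, "total": None})
        entry["frags"][offset] = data
        if not more_fragments:
            entry["total"] = offset + len(data)
        self._evict(now)
        if entry["total"] is None:
            return None
        payload = bytearray()
        for off in sorted(entry["frags"]):
            if off != len(payload):   # a fragment is still missing
                return None
            payload += entry["frags"][off]
        if len(payload) != entry["total"]:
            return None
        del self.pending[key]
        return bytes(payload)
```

Here `key` stands for whatever identifies a datagram, e.g. a (source, destination, protocol, ID) tuple. Every increment of `dropped` is a datagram lost to the shortened hold time or the buffer limit, which is the increased error rate of the 'link' mentioned above.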
I have doubts whether this would work or not but that's probably not
relevant to this draft in any case. I'm not aware of hardware
implementations which perform this, though it would be interesting to
hear if anyone knows any.
So, it just isn't possible to do a [v4] high-bandwidth tunnel with
fragmentation/reassembly; I guess the point is whether the reasons
why you can't use PMTUD are convincing enough (e.g., would have to
be done to millions of sources, passive monitoring, etc.)
Yes, it is not possible to do high bandwidth frag/reassembly, period,
regardless of whether it's a tunnel or not. We may need to address this,
or not - but breaking PMTUD is a bad way to do so, since it would
increase fragmentation. The whole point of PMTUD is to discover the MTU
size to avoid inefficient (esp. for high bandwidth paths) sizes;
deciding to violate 2003's requirements for DF copying just to support
high bandwidth frag/reassy (which won't work anyway) is very illogical.
If you'll look at the details in the draft, the frag/reasm does work
-- after the PMTUD violation -- because the *recipient* (a host!)
performs reassembly and because the stream is thus small enough not to
cause IP ID wrapping in that (src, dst, ID) context. Therefore the
reasm is no longer too "high speed". So, I think the approach "works"
for some definition of "works" -- it certainly seems to solve problems
some operators have been having.
Breaking PMTUD is of course unfortunate especially for v6, but maybe
more or less a consequence of the IETF being in denial about the
problem.
In particular, clearing the DF bit may disable downstream path
discovery, BUT the discussion implies it should be done only
for already-fragmented packets. A packet which is fragmented
with the DF bit set is an error and should be dropped, since it
was marked "don't fragment" and yet has been fragmented, as noted in
RFC791 (see page 8).
I.e., this example is nonsensical; if there is a valid one, it
would be useful to present it.
No, the discussion tries to point out that if the implementation
clears the inner DF bit (and consequently causes fragmentation for
big packets at the same node), even if further downstream the
packets would get fragmented _again_, that further fragmentation
doesn't make matters any worse.
A packet with the DF bit set with MF also set is an error as per RFC791.
The discussion implies that this can exist - or are you talking about
clearing only the outer DF?
I'm not 100% certain as this was text submitted to me by an earlier
reviewer. But I'm not sure if I understand -- what is the scenario
where you would have both DF and MF? I don't see one -- if you clear
the DF bit, by definition you don't get any packets with MF from the
path before clearing the DF bit; and if you have cleared the DF bit,
you may get MF bit on some packets after that point, but no longer DF
bit.
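
For anyone who wants to check what is actually on the wire, the combination under discussion can be spotted from the IPv4 header alone. A minimal sketch, using the flag layout from RFC 791; the helper names are mine, and whether a DF+MF packet should be treated as an error is the open question above, which this code does not decide:

```python
import struct

# Flags and Fragment Offset share the 16-bit word at byte offset 6 of the
# IPv4 header (RFC 791): reserved bit, DF, MF, then a 13-bit offset.
DF = 0x4000
MF = 0x2000
OFFSET_MASK = 0x1FFF


def parse_frag_word(header: bytes) -> dict:
    """Extract DF, MF and the fragment offset (in bytes) from an IPv4 header."""
    (word,) = struct.unpack_from("!H", header, 6)
    return {
        "df": bool(word & DF),
        "mf": bool(word & MF),
        "offset": (word & OFFSET_MASK) * 8,  # the field counts 8-byte units
    }


def is_fragment(fields: dict) -> bool:
    """A packet is a fragment if MF is set or its offset is non-zero."""
    return fields["mf"] or fields["offset"] > 0


def is_df_fragment(fields: dict) -> bool:
    """True for the disputed case: a fragment that also carries DF."""
    return fields["df"] and is_fragment(fields)
```

Only bytes 6 and 7 of the header are read, so this works on any buffer that holds at least the first 8 bytes of an IPv4 header.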
--
Pekka Savola "You each name yourselves king, yet the
Netcore Oy kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area