Re: [Int-area] Please review: draft-savola-mtufrag-network-tunneling-04.txt

Pekka Savola Thu, 08 Sep 2005 23:34:09 -0700

On Tue, 30 Aug 2005, Joe Touch wrote:

I think we just have to agree to disagree.  Over half a dozen people
have stated that they find the draft very useful, and at least two ADs
have been willing to sponsor it.  There has also been discussion on
various lists, particularly from the ops perspective (e.g.,
http://www.merit.edu/mail.archives/nanog/2004-10/threads.html#00125).


A quick scan of that list didn't note anyone who saw the relationship to
RFC2003. Since the I-D didn't cite that, the (IMO) definitive current
work on the topic, it's not clear what the positive input on this draft
really means (maybe that they should read RFC2003, i.e.).

If it makes you happier, I could try querying the folks on whether
they think the draft is useful after taking a look at RFC 2003 -- if
you provide the list of section numbers of RFC 2003 I should refer
them to read in particular.

I'm pretty sure I already know the answer, but if that helps you drop
the argument, I can try it.

The text in the draft says,

   When desiring to avoid fragmentation, IPv4 allows two options: copy
   the DF bit from the inner packets to the encapsulating header, or
   always set the DF bit.  The latter is better especially in controlled
   environments, because it forces PMTUD to converge immediately.

Both of these fulfill the MUST you quote: if it's copied (when DF is
set), it's also set for the outer header; when it's forced, it's always
set.
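
To spell that out, here is a minimal sketch (not from the draft) of the
two alternatives and the check that the outer DF bit is set whenever the
inner one is.  The names DfPolicy, outer_df and satisfies_must are made
up for illustration only.

```python
from enum import Enum


class DfPolicy(Enum):
    """Hypothetical names for the two alternatives described in the draft."""
    COPY = "copy"    # copy the inner DF bit to the encapsulating header
    FORCE = "force"  # always set DF in the encapsulating header


def outer_df(inner_df: bool, policy: DfPolicy) -> bool:
    """Return the DF bit to place in the outer (encapsulating) header."""
    if policy is DfPolicy.FORCE:
        return True
    return inner_df


def satisfies_must(inner_df: bool, outer: bool) -> bool:
    """The requirement under discussion: inner DF set implies outer DF set."""
    return outer or not inner_df


# Both policies pass the check for both values of the inner DF bit.
for policy in DfPolicy:
    for inner in (False, True):
        assert satisfies_must(inner, outer_df(inner, policy))
```

Clearing the outer DF bit while the inner one is set is the only
combination that fails this check.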

So, I don't see a problem here.  Could you be more specific about what
you would like to see changed?


s/allows two options/requires one of two alternatives [RFC2003]/


Ok, changed.

This is just making a decision at the tunnel that you prefer efficiency
to PMTUD support. 2003 doesn't say it's OK to make that decision, esp.
since it's not detectable from the endpoints. I don't think this is a
compelling reason, as a result.


That may or may not be a compelling reason.  Many operators are doing so
regardless, and vendors are supporting it.  I don't think there is
anything to be gained (but rather lost) by having the IETF be in denial
about this.  This document is precisely not a BCP because I want to
document the *operational* issues (and solutions, even if not optimal
ones).

However, having disclaimers, e.g., noting clearly that the behaviour is
non-compliant and may break things, is fine with me, and I certainly
want to make such a point myself.

So, could you check if there's text which should be added or reworded to
make this clearer?  Suggestions would be welcome.


Text that describes the things it WILL break. Why are vendors supporting
it? Is it just further denial by others that RFC2003 exists or has
precedence here? Or is there real utility?

I already described one example of the utility: avoiding high-speed
reassembly at a router, which simply does not work.  I could
query for others from vendors, but I don't think those use cases
belong in the draft, at least not in a verbose manner.  Including the
use cases would seem to make an even more compelling case for causing
more violations in those cases than just being silent.  Further, it
would cause ratholing discussions on whether folks see those use cases
as really required or not, even though the specifics (and discussion
about them) are not the point, just that the operators ARE using them
and probably for good reasons.

We already know - as described in the new PMTUD doc, as well as in other
places - that this practice has issues. Why do we need another document
that says so? Or if we do, why should it be an RFC (if the purpose is to
educate or discuss, a tech report or magazine paper would suffice).

PMTUD docs are describing new PMTUD mechanisms, not focusing on the
narrow problem and solutions which are used *today*.  This draft is
much more focused and shorter, hence more likely to be read.

I think the RFC series is most appropriate here because, from my
perspective, it causes the most dissemination (and maybe follow-up
activity) among the IETF and operators.  But that decision is
irrelevant to this discussion.

Reassembly doesn't require infinite buffers; the fragments need be held
only for a network MSL, and the IP ID numbers are required not to wrap
during that period, so there's no wrap problem if everybody stays to
spec.


That's a lot of assumptions.  60 seconds (mentioned in RFC1122, sect
3.2.2) on 10 Gbit/s interface is basically infinite buffers.  Even if
the time was 1 second, it would basically mean infinite buffers.
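
As a back-of-the-envelope sketch of that claim, assuming the worst case
where the whole link rate consists of fragments that must be held for
the full reassembly timeout (the function name is made up for this
illustration):

```python
def reassembly_buffer_bytes(link_bits_per_s: float, hold_seconds: float) -> float:
    """Worst-case bytes held if every bit on the link is a fragment
    waiting out the reassembly timeout (an assumption, not a measurement)."""
    return link_bits_per_s * hold_seconds / 8


ten_gbit = 10e9  # 10 Gbit/s interface

hold_60s = reassembly_buffer_bytes(ten_gbit, 60)  # 75e9 bytes, i.e. 75 GB
hold_1s = reassembly_buffer_bytes(ten_gbit, 1)    # 1.25e9 bytes, i.e. 1.25 GB
```

Real traffic will not be all fragments, so these are upper bounds; the
point is only the order of magnitude.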


That presumes that the entire bandwidth is used between two hosts, and
it does mean that fragmentation should be avoided in those cases.

Sure.  In this case, we're talking about high-speed encap/decap (in
hardware) between routers.  While the whole bandwidth is probably not
used up there, it could be -- and the problem is sufficiently serious
at much lower rates as well.

It's
even OK to drop packets that are fragments altogether if they hit a
tunnel that won't carry them.

If we assume MTU=1500, would it be OK to drop all the packets with
size > 1472 (or something thereabouts) which have the DF bit set?
That's what we're talking about here (and the infeasibility of
signalling back "Packet too Big" to all those sources sending >1472 (or
whatever) packets).  While RFC791 says it's OK, it certainly isn't in
practise.
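
A rough sketch of the spec-compliant behaviour at the tunnel ingress,
for illustration only.  The 28-byte overhead is an assumption chosen so
that the limit comes out at the 1472 used above; the function name and
the action strings are likewise made up.

```python
def ingress_action(packet_len: int, df_set: bool,
                   path_mtu: int = 1500, encap_overhead: int = 28) -> str:
    """Decide what a compliant tunnel ingress does with one packet.

    encap_overhead=28 is an assumed value (1500 - 1472); real tunnels
    differ depending on the encapsulation used.
    """
    limit = path_mtu - encap_overhead
    if packet_len <= limit:
        return "forward"
    if df_set:
        # Cannot fragment: drop and signal "Packet too Big" to the source.
        return "drop+icmp-too-big"
    return "fragment"


assert ingress_action(1472, df_set=True) == "forward"
assert ingress_action(1473, df_set=True) == "drop+icmp-too-big"
assert ingress_action(1473, df_set=False) == "fragment"
```

The middle case is the one in question: every DF packet above the limit
is dropped, and each of its senders has to be told.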

It's still NOT OK to clear the DF bit in the outer header when it's set
in the inner, though.

Sure, sure -- but that's still being done widely out there.  What
would you prefer?  Be hush-hush about it?  I want to bring the problems
out in the open, with appropriate disclaimers of course.

Further, if you look at
http://www.watersprings.org/pub/id/draft-mathis-frag-harmful-00.txt, the
IP ID numbers wrap many times during that period.


Since that doc is an expired ID, I presume you know why I won't address
its contents. When IP ID numbers wrap, that is a violation of RFC791 and
has known problems. There are ways to overcome it, esp. using aliases of
multiple IP addresses per physical interface, to avoid such wrap. But
ignoring the problem or accepting violating implementations is not the
appropriate solution.

I have no control over why the authors didn't continue the work; I
certainly provided some suggestions for enhancements myself (note: if
the authors are listening, I could consider picking up the draft if
you'd like).  But that aside..

Requiring hundreds or thousands of IP aliases to overcome IP ID
wrapping is not a [real] solution.  Could you please state a better
one if one exists?
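
To put rough numbers on that, a sketch assuming a 16-bit IP ID field,
1500-byte packets, all traffic flowing between one (source,
destination, protocol) triple such as a pair of tunnel endpoints, and
aliases added on the sending side only.  The function names are made up
for this illustration.

```python
import math

ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def packets_per_second(link_bits_per_s: float, packet_bytes: int) -> float:
    """Packet rate on a fully loaded link with fixed-size packets."""
    return link_bits_per_s / (packet_bytes * 8)


def seconds_to_wrap(pps: float) -> float:
    """Time until the ID space is exhausted for one (src, dst, proto)."""
    return ID_SPACE / pps


def aliases_needed(pps: float, lifetime_s: float) -> int:
    """Source addresses needed so that no ID repeats within lifetime_s."""
    return math.ceil(pps * lifetime_s / ID_SPACE)


pps = packets_per_second(10e9, 1500)   # about 833,333 packets/s
wrap = seconds_to_wrap(pps)            # under a tenth of a second
aliases = aliases_needed(pps, 60)      # 763 with a 60 second lifetime
```

Lower rates and shorter lifetimes scale this down linearly, but it stays
well above one address for any fast link.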


Wrt. the non-compliance, are you referring to this:

    The choice of the Identifier for a datagram is based on the need to
    provide a way to uniquely identify the fragments of a particular
    datagram.  The protocol module assembling fragments judges fragments
    to belong to the same datagram if they have the same source,
    destination, protocol, and Identifier.  **Thus, the sender must choose
    the Identifier to be unique for this source, destination pair and
    protocol for the time the datagram (or any fragment of it) could be
    alive in the internet.**

    It seems then that a sending protocol module needs to keep a table
    of Identifiers, one entry for each destination it has communicated
    with in the last maximum packet lifetime for the internet.

(esp last sentence of 1st paragraph and 2nd para.)

I don't see any real solutions here.  I don't think it's acceptable to
refuse to send any more packets to destination X until an ID slot is
freed (a buffering problem, a denial of service issue), nor is it
acceptable to switch source IP addresses, because the peer would likely
no longer recognize the session once the IPs changed.

But no matter what, that is not really relevant for this draft.  The
practice exists for (probably) valid operational reasons.  We can't
deny that.

Again my question would be, what changes would you like to see in the
draft to make this clearer?

If you'll look at the details in the draft, the frag/reasm does work --
after the PMTUD violation -- because the *recipient* (a host!) performs
reassembly and because the stream is thus small enough not to cause IP
ID wrapping in that (src, dst, ID) context.  Therefore the reasm is no
longer too "high speed". So, I think the approach "works" to some
definition of "works" -- it certainly seems to solve problems some
operators have been having.


"For some definition of works" isn't what the IETF is about, IMO.

I think that's a direct consequence of the IETF not providing a better
solution, so the operators and vendors have to make do with what's out
there -- but that's not the point here.

Breaking PMTUD is of course unfortunate especially for v6, but maybe
more or less a consequence of the IETF being in denial about the problem.


The IETF may be in denial about the problem, but addressing violations
of spec as anything other than such is worse - it doesn't move us
forward to change the spec in useful ways (increasing the IP ID space,
dealing with fragmentation more usefully, etc.).

I'm open to text suggestions on how to make the spec violations (or
whatever) clearer, and possibly add "call for action" statements if
you'd like.  Any other text suggestions, as long as they include a
description of this operationally-used technique, would probably be
fine as well.

A packet with the DF bit set with MF also set is an error as per RFC791.
The discussion implies that this can exist - or are you talking about
clearing only the outer DF?


I'm not 100% certain as this was text submitted to me by an earlier
reviewer.  But I'm not sure if I understand -- what is the scenario
where you would have both DF and MF?  I don't see one -- if you clear
the DF bit, by definition you don't get any packets with MF from the
path before clearing the DF bit; and if you have cleared the DF bit, you
may get MF bit on some packets after that point, but no longer DF bit.
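
The scenario can be sketched as a toy model (not real packet handling):
a DF packet has its DF bit cleared at the tunnel and is then fragmented
downstream.  The Pkt class and both helper functions are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Pkt:
    payload_len: int
    df: bool = False
    mf: bool = False
    offset: int = 0  # byte offset of this piece within the original payload


def clear_df(pkt: Pkt) -> Pkt:
    """Model the (non-compliant) tunnel behaviour: clear DF, keep the rest."""
    return Pkt(pkt.payload_len, False, pkt.mf, pkt.offset)


def fragment(pkt: Pkt, max_payload: int) -> list:
    """Split a non-DF packet; every piece but the last gets MF set."""
    if pkt.df:
        raise ValueError("DF is set: must not fragment")
    if max_payload < 8:
        raise ValueError("max_payload must be at least 8 bytes")
    chunk = max_payload - (max_payload % 8)  # offsets are in 8-byte units
    frags, off = [], 0
    while off < pkt.payload_len:
        size = min(chunk, pkt.payload_len - off)
        last = off + size >= pkt.payload_len
        frags.append(Pkt(size, False, (not last) or pkt.mf, pkt.offset + off))
        off += size
    return frags


original = Pkt(payload_len=3000, df=True)
pieces = fragment(clear_df(original), max_payload=1480)

# Before the clearing point: DF without MF.  After it: MF without DF.
assert not any(p.df and p.mf for p in [original] + pieces)
```

In this model no packet ever carries both DF and MF, on either side of
the point where DF is cleared.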


The text isn't clear on that. The example you've given would be useful
to add.

OK.  I tried to see how to add that in the text, but couldn't figure
out a way to incorporate it so that it would fit in nicely.  Could you
propose which exact clarification wording (and where) would be helpful?


--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings

_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area
