Re: [Int-area] Please review: draft-savola-mtufrag-network-tunneling-04.txt

Joe Touch Tue, 30 Aug 2005 15:20:02 -0700


Pekka Savola wrote:
...
>>>> As for the earlier note on the list, IMHO RFC 2003 section 5.1
>>>> touches only very cursorily on the subject (some of the
>>>> respective text is in my draft in sections 3.1 and 3.2); this
>>>> offers much more extensive discussion.
>>
>> I disagree; actually, I believe that 2003 is both complete and
>> sufficient on the subject, so I'm not sure what the purpose of an
>> additional draft discussing the same issue in other contexts is.
> 
> I think we just have to agree to disagree.  Over half a dozen people
> have stated that they find the draft very useful, and at least two ADs
> have been willing to sponsor it.  There have also been discussions on
> various lists, particularly from the ops perspective (e.g.,
> http://www.merit.edu/mail.archives/nanog/2004-10/threads.html#00125).


A quick scan of that thread didn't turn up anyone who noted the
relationship to RFC2003. Since the I-D didn't cite it, and it is (IMO)
the definitive current work on the topic, it's not clear what the
positive input on this draft really means (maybe just that they should
read RFC2003).

>>>> I don't think "is required" is correct or I don't understand what you
>>>> refer to with "required".  RFC 2003 says,
>>>>
>>>>    Thus, the encapsulator SHOULD normally do Path MTU Discovery,
>>>>    requiring it to send all datagrams into the tunnel with the "Don't
>>>>    Fragment" bit set in the outer IP header.
>>
>> It says:
>>      Identification, Flags, Fragment Offset
>>
>>         These three fields are set as specified in [10].  However, if
>>         the "Don't Fragment" bit is set in the inner IP header, it MUST
>>         be set in the outer IP header; if the "Don't Fragment" bit is
>>         not set in the inner IP header, it MAY be set in the outer IP
>>         header, as described in Section 5.1.
>>
>> There's a MUST there  ;-)
> 
> OK, I didn't notice it because I didn't see its relevance here; more below.
> 
>>>> SHOULD is not a MUST (earlier, the RFC does mandate implementation
>>>> of PMTUD, but as said, doesn't mandate its usage). Similarly,
>>>> draft-ietf-mech-v2 and its predecessors allow the implementors
>>>> and/or users to choose non-DF approach as well, if they judge it
>>>> appropriate.
>>
>> 2003 would override draft-ietf-mech-v2 esp. regarding v4 in v4 tunnels;
>> I know of no effort to override 2003 yet.
> 
> The text in the draft says,
> 
>    When desiring to avoid fragmentation, IPv4 allows two options: copy
>    the DF bit from the inner packets to the encapsulating header, or
>    always set the DF bit.  The latter is better especially in controlled
>    environments, because it forces PMTUD to converge immediately.
> 
> Both of these fulfill the MUST you quote: if it's copied (when DF is
> set), it's also set for the outer header; when it's forced, it's always
> set.
> 
> So, I don't see a problem here.  Could you be more specific about what you
> would like to see changed?

s/allows two options/requires one of two alternatives [RFC2003]/

> Would the following be better?
> 
>    When desiring to avoid fragmentation, there are two options with
>    IPv4: copy the DF bit from the inner packets to the encapsulating
>    header, or always set the DF bit.  The latter is better especially
>    in controlled environments, because it forces PMTUD to converge
>    immediately.
> 
> (just rewording out the "allows")

IMO, something stronger is required, as are more references to RFC2003
on this point.

>>>>>> Later sec 3.4 says "but there are certainly uses for it". IMO,
>>>>>> that's endorsement, and I disagree. It doesn't "work around"
>>>>>> PMTUD issues, but rather defeats PMTUD. If there is indeed a
>>>>>> justifiable reason to consider this option, it should be
>>>>>> explained in this document, as it is key to whether it should
>>>>>> be considered anything other than broken (or are you also
>>>>>> allowing stacks that miscalculate the IP checksum, just because
>>>>>> some do? :-)
>>>>
>>>>
>>>> I think the most compelling case has been described in section 3.1
>>>> -- when having tunnels (especially IPsec tunnels, but any tunnel is
>>>> the same) over which a large amount of traffic is being
>>>> transported, where PMTUD is too unreliable or unfeasible as noted
>>>> in section 3.2.
>>
>>
>> This is just making a decision at the tunnel that you prefer efficiency
>> to PMTUD support. 2003 doesn't say it's OK to make that decision, esp.
>> since it's not detectable from the endpoints. I don't think this is a
>> compelling reason, as a result.
> 
> That may or may not be a compelling reason.  Many operators are doing so
> regardless, and vendors are supporting it.  I don't think there is
> anything to be gained (but rather lost) by having the IETF be in denial
> about this.  This document is precisely not a BCP because I want to
> document the *operational* issues (and solutions, even if not optimal
> ones).
> 
> However, having disclaimers e.g., noting clearly that the behaviour is
> noncompliant and may break things is fine with me, and I certainly want
> to make such a point myself.
> 
> So, could you check if there's text which should be added or reworded to
> make this clearer?  Suggestions would be welcome.

Text that describes the things it WILL break. Why are vendors supporting
it? Is it just further denial by others that RFC2003 exists or has
precedence here? Or is there real utility?

We already know - as described in the new PMTUD doc, as well as in other
places - that this practice has issues. Why do we need another document
that says so? Or if we do, why should it be an RFC? (If the purpose is to
educate or discuss, a tech report or magazine paper would suffice.)

>>>> In this case, reassembly becomes a problem (requires basically
>>>> infinite buffers, problems when a fragment goes missing, v4 ID
>>>> space is insufficient so it wraps over and causes data
>>>> corruption/misassociation, etc. -- see section 3.1).
>>
>> Reassembly doesn't require infinite buffers; the fragments need be held
>> only for a network MSL, and the IP ID numbers are required not to wrap
>> during that period, so there's no wrap problem if everybody stays to
>> spec.
> 
> That's a lot of assumptions.  60 seconds (mentioned in RFC1122, sect
> 3.2.2) on 10 Gbit/s interface is basically infinite buffers.  Even if
> the time was 1 second, it would basically mean infinite buffers.

That presumes that the entire bandwidth is used between two hosts, and
it does mean that fragmentation should be avoided in those cases. It's
even OK to drop packets that are fragments altogether if they hit a
tunnel that won't carry them.
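
For scale, the arithmetic behind the "basically infinite buffers" remark
can be sketched as below. This is a worst-case illustration only: it
assumes every byte on the link is a fragment held for the full
reassembly timeout. The function name is made up for the example; the
10 Gbit/s, 60 second and 1 second figures are the ones quoted above.

```python
def worst_case_reassembly_bytes(link_bits_per_s: float, hold_seconds: float) -> float:
    """Upper bound on buffered bytes if ALL link traffic were fragments
    held for hold_seconds before reassembly completes or times out."""
    return link_bits_per_s / 8 * hold_seconds


GBIT = 1e9

# 60 s hold time at 10 Gbit/s
print(worst_case_reassembly_bytes(10 * GBIT, 60) / 1e9, "GB")  # 75.0 GB
# 1 s hold time at 10 Gbit/s
print(worst_case_reassembly_bytes(10 * GBIT, 1) / 1e9, "GB")   # 1.25 GB
```

The bound shrinks in proportion to the share of the link that actually
carries fragments between a given pair of hosts, which is the point
being made here.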

It's still NOT OK to clear the DF bit in the outer header when it's set
in the inner, though.
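
As a minimal sketch of the RFC 2003 rule quoted earlier (MUST copy a set
DF bit, MAY set it otherwise), the outer-header decision could look like
this. The function and parameter names are hypothetical; only the bit
positions come from the IPv4 header layout.

```python
IP_DF = 0x4000  # "Don't Fragment" bit in the 16-bit flags/fragment-offset word
IP_MF = 0x2000  # "More Fragments" bit in the same word


def outer_df(inner_flags_frag: int, set_df_when_inner_clear: bool = False) -> bool:
    """Decide whether the encapsulator sets DF in the outer IPv4 header.

    inner_flags_frag: the inner header's 16-bit flags/fragment-offset word.
    set_df_when_inner_clear: local policy for the MAY case (hypothetical knob).
    """
    if inner_flags_frag & IP_DF:
        return True  # MUST: inner DF set, so outer DF is set
    return set_df_when_inner_clear  # MAY: inner DF clear, policy decides


print(outer_df(IP_DF))        # True
print(outer_df(0))            # False
print(outer_df(0, True))      # True
```

Note that no input makes the function return False when the inner DF
bit is set; that is the case under dispute in this thread.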

> Further, if you look at
> http://www.watersprings.org/pub/id/draft-mathis-frag-harmful-00.txt, the
> IP ID numbers wrap many times during that period.

Since that doc is an expired I-D, I presume you know why I won't address
its contents. When IP ID numbers wrap, that is a violation of RFC791 and
has known problems. There are ways to overcome it, esp. using aliases of
multiple IP addresses per physical interface, to avoid such wrap. But
ignoring the problem or accepting violating implementations is not the
appropriate solution.
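
A rough sketch of the wrap arithmetic, under loudly stated assumptions:
every packet consumes a fresh ID, all packets are the same size, and
each (source, destination) address pair gets its own independent 16-bit
ID space, which is how the address-alias suggestion above is modelled
here. Function names and the 1500-byte packet size are illustrative.

```python
ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def seconds_until_wrap(bits_per_s: float, packet_bytes: int, address_pairs: int = 1) -> float:
    """Time for the ID counter(s) to wrap at a given line rate."""
    packets_per_s = bits_per_s / (packet_bytes * 8)
    return ID_SPACE * address_pairs / packets_per_s


def max_rate_without_wrap(hold_seconds: float, packet_bytes: int, address_pairs: int = 1) -> float:
    """Highest rate (bits/s) at which IDs do not repeat within hold_seconds."""
    return ID_SPACE * address_pairs / hold_seconds * packet_bytes * 8


# 10 Gbit/s of 1500-byte packets between one address pair: well under 1 s
print(seconds_until_wrap(10e9, 1500))
# Rate that keeps IDs unique for 60 s, one address pair vs. 16 aliased pairs
print(max_rate_without_wrap(60, 1500) / 1e6, "Mbit/s")
print(max_rate_without_wrap(60, 1500, address_pairs=16) / 1e6, "Mbit/s")
```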

>> It's equally possible to just limit reassembly buffers and toss old
>> fragments earlier than a network MSL; it just increases the error rate
>> of the 'link' provided by the tunnel.
> 
> I have doubts whether this would work or not but that's probably not
> relevant to this draft in any case.  I'm not aware of hardware
> implementations which perform this, though it would be interesting to
> hear if anyone knows any.
> 
>>>> So, it just isn't possible to do a [v4] high-bandwidth tunnel with
>>>> fragmentation/reassembly; I guess the point is whether the reasons
>>>> why you can't use PMTUD are convincing enough (e.g., would have to
>>>> be done to millions of sources, passive monitoring, etc.)
>>
>>
>> Yes, it is not possible to do high bandwidth frag/reassembly, period,
>> regardless of whether it's a tunnel or not. We may need to address this,
>> or not - but breaking PMTUD is a bad way to do so, since it would
>> increase fragmentation. The whole point of PMTUD is to discover the MTU
>> size to avoid inefficient (esp. for high bandwidth paths) sizes;
>> deciding to violate 2003's requirements for DF copying just to support
>> high bandwidth frag/reassy (which won't work anyway) is very illogical.
> 
> If you'll look at the details in the draft, the frag/reasm does work --
> after the PMTUD violation -- because the *recipient* (a host!) performs
> reassembly and because the stream is thus small enough not to cause IP
> ID wrapping in that (src, dst, ID) context.  Therefore the reasm is no
> longer too "high speed". So, I think the approach "works" to some
> definition of "works" -- it certainly seems to solve problems some
> operators have been having.

"For some definition of works" isn't what the IETF is about, IMO.

> Breaking PMTUD is of course unfortunate especially for v6, but maybe
> more or less a consequence of the IETF being in denial about the problem.

The IETF may be in denial about the problem, but addressing violations
of spec as anything other than such is worse - it doesn't move us
forward to change the spec in useful ways (increasing the IP ID space,
dealing with fragmentation more usefully, etc.).

>>>>>> In particular, clearing the DF bit may disable downstream path
>>>>>> discovery, BUT the discussion implies it should be done only
>>>>>> for already-fragmented packets. A packet which is fragmented
>>>>>> with the DF bit set is an error and should be dropped, since it
>>>>>> was marked don't-fragment and has been fragmented, as noted in
>>>>>> RFC791 (see page 8).
>>>>>>
>>>>>> I.e., this example is nonsensical; if there is a valid one, it
>>>>>> would be useful to present it.
>>>>
>>>>
>>>> No, the discussion tries to point out that if the implementation
>>>> clears the inner DF bit (and consequently causes fragmentation for
>>>> big packets at the same node), even if further downstream the
>>>> packets would get fragmented _again_, that further fragmentation
>>>> doesn't make matters any worse.
>>
>> A packet with the DF bit set with MF also set is an error as per RFC791.
>> The discussion implies that this can exist - or are you talking about
>> clearing only the outer DF?
> 
> I'm not 100% certain as this was text submitted to me by an earlier
> reviewer.  But I'm not sure if I understand -- what is the scenario
> where you would have both DF and MF?  I don't see one -- if you clear
> the DF bit, by definition you don't get any packets with MF from the
> path before clearing the DF bit; and if you have cleared the DF bit, you
> may get MF bit on some packets after that point, but no longer DF bit.

The text isn't clear on that. The example you've given would be useful
to add.

Joe
