Iljitsch van Beijnum wrote:
> On 2-aug-2007, at 15:30, Joe Touch wrote:
...
>> tunnel endpoints MUST either:
>
>> 1. set outer DF=0 and allow fragmentation (including at the
>> tunnel source
>
> So far so good...
>
>> 2. set outer DF=1 when their payload fits,
>
> ...but this makes no sense at all. The whole point of PMTUD is to _find_
> _out_ whether stuff fits. You can't know that in advance.
Fits in what you currently think the path MTU is (I should have been
more specific).
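To make the two rules concrete, here's a rough Python sketch of the choice at the tunnel ingress; the names and the plain 20-byte outer IPv4 header are my assumptions for illustration, not anything from a spec:

    # Sketch only: OUTER_HDR and the helper names are assumptions.
    OUTER_HDR = 20   # plain outer IPv4 header, no options (assumed)

    def encapsulate(inner: bytes, tunnel_pmtu: int) -> list[tuple[int, bytes]]:
        """Return (outer_df, outer_payload) pairs to emit on the tunnel."""
        if len(inner) + OUTER_HDR <= tunnel_pmtu:
            # Rule 2: fits what we currently think the path MTU is ->
            # one outer packet with DF=1, so outer PMTUD keeps working.
            return [(1, inner)]
        # Rule 1: outer DF=0, and fragment here at the tunnel source so each
        # outer packet (payload chunk + outer header) fits the tunnel path MTU.
        chunk = ((tunnel_pmtu - OUTER_HDR) // 8) * 8   # offsets come in 8-byte units
        return [(0, inner[i:i + chunk]) for i in range(0, len(inner), chunk)]

The point of rule 2 is just that DF=1 on the outer packet lets too-bigs come back to the tunnel source so its estimate can shrink.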
>> Receipt of a too-big at the tunnel source should not be expected to be
>> translated to be sent to the original packet's source;
>
> Not for IPv4. For IPv6, that would be a valid choice but handling this
> in the same way as IPv4 would also be fine.
It's a valid choice, but should not be EXPECTED.
>> The primary benefit of receiving such
>> messages is for subsequent packets; the tunnel source would decrease its
>> MTU, and then **other** packets from that source (or any other source)
>> would correct the actions above (#1 would make smaller fragments, #2
>> would generate ICMPs back to the source).
>
> Right. Note that TCP tends to send out two packets at a time, so with
> this in effect the first packet will trigger PMTUD in the tunnel, but by
> then, the second packet is also on its way, so both packets will be lost
> and TCP will probably stall for some time. Then when the third packet
> comes, the sending host finally sees the too big.
Yes. Transport protocols will react poorly to this - but only once. Once the
tunnel source has learned the smaller MTU, subsequent connections shouldn't
hit the problem.
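I.e., roughly this at the tunnel source (again a sketch with made-up names; the only state that changes is the tunnel's own MTU estimate):

    class TunnelSource:
        """Sketch: a too-big only updates our own estimate for later packets."""
        def __init__(self, initial_pmtu: int = 1500):
            self.tunnel_pmtu = initial_pmtu

        def on_icmp_too_big(self, reported_mtu: int) -> None:
            # Remember the smaller MTU for *subsequent* packets; the packet
            # that triggered the error is simply lost, and transports recover.
            if 68 <= reported_mtu < self.tunnel_pmtu:
                self.tunnel_pmtu = reported_mtu
            # Deliberately: no ICMP is synthesized toward the inner source.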
>> These rules apply equally to IPv4 and IPv6; in neither case should
>> tunnels fragment the encapsulated packet, IMO.
>
> Why not?
>
> Fragmentation needs to happen in certain cases with IPv4. The only
> choice is who is going to reassemble.
I like treating IPv4 and IPv6 similarly. Tunnels should not put undue
burden on endpoints. Since a tunnel destination MUST exist (to
decapsulate), it ought to be saddled with the work of reassembly, rather
than dropping it on the endpoint.
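As a sketch of why that keeps the burden at the tunnel destination (toy code, no timers or overlap handling, all names made up): reassembly of the outer fragments has to finish before decapsulation, so the inner endpoint never sees a fragment at all.

    from collections import defaultdict

    class TunnelEgress:
        def __init__(self):
            self.frags = defaultdict(dict)   # outer packet id -> {offset: data}
            self.total = {}                  # outer packet id -> total length, once known

        def on_outer_fragment(self, pkt_id, offset, more_fragments, data):
            self.frags[pkt_id][offset] = data
            if not more_fragments:
                self.total[pkt_id] = offset + len(data)
            need = self.total.get(pkt_id)
            have = sum(len(d) for d in self.frags[pkt_id].values())
            if need is None or have < need:
                return None                  # still waiting; the work stays here
            inner = b"".join(d for _, d in sorted(self.frags[pkt_id].items()))
            del self.frags[pkt_id], self.total[pkt_id]
            return inner                     # decapsulated payload, forwarded whole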
>>> Some choices and the extra headers they allow for:
>
>>> 1492: PPPoE
>>> 1480: PPPoE / IPv4
>>> 1476: PPPoE / IPv4 / IPv4 + GRE
>>> 1472: PPPoE + IPv4 / IPv4 + GRE
>>> 1460: PPPoE + IPv4 / IPv4 + GRE / 2 x IPv4 / IPv6
>>> 1452: PPPoE + 2 x IPv4 / 2 x IPv4 + GRE / PPPoE + IPv6
>
>> There are many other cases - notably IPsec tunnels, which consume even
>> more bytes. Tunnel endpoints may employ header compression which may
>> somewhat compensate for size inflation too. IMO, it's not useful to
>> guess these sizes or expected layerings, as the use of layered VPNs and
>> overlays is likely to increase over time.
>
> I'm aware that there is a race going on to see who can be the first to
> implement 1500 bytes of overhead per packet. Obviously whatever maximum
> packet size above 68 bytes a sender of a packet chooses, there will be
> some configuration that can't carry packets of that size. And since
> datagram based applications can't arbitrarily reduce their packet size,
> there will always be _some_ fragmentation. (Or black holes if people
> prevent fragmentation from working properly.) Reducing packet sizes a
> few percent for applications / transports that require a one time packet
> size choice seems like a good idea to avoid triggering these issues
> unnecessarily.
Sure. Let's pick one we won't have to move too often, though. That means
leaving room for a few layers of possible IPsec, e.g., 1300 or 1200. That's
still close enough to 1500 to be efficient.
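For reference, the arithmetic behind the numbers quoted above and behind 1300/1200 (the fixed header sizes are the usual ones; the ESP figure is a rough guess, since its overhead varies with cipher, padding and mode):

    HDR = {"pppoe": 8, "ipv4": 20, "gre": 4, "ipv6": 40, "esp_tunnel": 60}  # esp approximate

    def inner_mtu(link_mtu: int, *layers: str) -> int:
        return link_mtu - sum(HDR[l] for l in layers)

    print(inner_mtu(1500, "pppoe"))                       # 1492
    print(inner_mtu(1500, "ipv4", "gre"))                 # 1476
    print(inner_mtu(1500, "pppoe", "ipv4", "gre"))        # 1468
    print(inner_mtu(1500, "esp_tunnel", "esp_tunnel", "pppoe"))  # 1372: two IPsec layers already cost ~130 bytes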
Joe
--
----------------------------------------------------------------------
Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
Postel Center Director & Research Assoc. Prof., USC/ISI
