Bob, thanks for the insightful comments.  Response inline...


Bob Briscoe wrote:
Matt, John,

1/ During the lifetime of this draft its scope has become restricted to solely describing the problem. That's good (and it's a good solid draft), but the abstract or intro needs to explicitly say what it is deliberately not setting out to say (not describing partial solutions, not proposing solutions).

2/ Given the new restricted scope, the title is wrong - it's snappy, but not appropriate for this draft any more. It should be "Field Wrapping Problems with IP Fragmentation" or some such.

"IPv4 Fragmentation Considered Very Harmful" attributes blame squarely on the fragmentation protocol (as opposed to re-assembly implementations - see later). This title effectively says "You SHOULD NOT fragment with a 16b ID field", which is beyond what the text dares to say, and it's beyond what an informational draft should say.

This is basically what the IESG review said and we agree. The text in the abstract is not clear. The intent is not to say that IPv4 fragmentation should never be used, but that it is not safe for use under some conditions. Basically, we feel that for some (esp. low-rate) applications such as DNSSEC it's probably fine, but for other purposes, especially bulk transport, it's not suitable. Note, however, that even low-rate fragmenting applications can be exposed to problems if a high-rate non-fragmenting application is running at the same time.

I think the text of the document is adequately clear on this, but we will be rewording the abstract.

About the title: it was meant to indicate that this document describes an additional problem (which is in fact very harmful when it occurs), and is an extension of, not a substitute for, the landmark Kent/Mogul "Fragmentation considered harmful" paper. It seems to have caused sufficient confusion and misinterpretation that it is probably worth reconsidering the title.


Even if this was a BCP rather than informational, I would say it would be wrong to deprecate 16b fragmentation anyway. There are good reasons why fragmentation is useful (e.g. in tunnels), so we need to try really hard to find robust ways to do it before writing it off as deprecated. Saying it's very harmful should only be a last resort if we /prove/ it cannot ever be done robustly.

As we know, one way to fragment with improved robustness is to use more bits for the ID field. But if 16b fragmentation is a problem now, 32b fragmentation will be a problem in the future (not so distant future as Matt pointed out on int-area <http://osdir.com/ml/[email protected]/msg00545.html>). IP is sufficiently pivotal at the neck of the hour glass that anything we say about it should endure for decades. I would contend that the IETF shouldn't condone putting off a problem to a later date when we know it will return. The present title leads us towards that sort of solution, implying "32b ID fields aren't [ever] very harmful" - on a draft that isn't even meant to be discussing solutions.

Given hierarchical layering is here to stay (and always has been), it would be more fruitful to admit that we need to be able to do fragmentation robustly and so we cannot avoid choosing an ID field width that will
- either not be wide enough at some future time
- or will be overly wasteful today.

In this vein, it would be useful to focus everyone on designing better re-assembly /implementations/ around a 16b fragmentation /protocol/ (see a possible idea below). There is no proof yet that we have reached the end of our innovation potential on this.

A sketch idea for a more robust re-assembly implementation:
On receipt of each fragment, within the re-assembly implementation increase the precision of the ID field by adding a "received timestamp" of sufficient precision. Then on a first pass, match fragments only if the fragment IDs match AND the timestamps are within a certain narrow range of each other. Otherwise hold the fragment and, as a last resort later, widen the timestamp range that will cause a match - perhaps when the fragment is about to be expired from the buffer (...rest of implementation left as an exercise for the reader).
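Purely to illustrate the first-pass matching rule sketched above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class name, the 50 ms window, the use of byte offsets rather than 8-byte units, and the `key` tuple of (source, destination, protocol, ID). The "last resort" widening of the timestamp range and buffer expiry are deliberately left out.

```python
class TimestampedReassembler:
    """Hypothetical sketch: group fragments by ID key AND arrival time."""

    def __init__(self, window=0.05):
        self.window = window  # seconds; assumed value, not from the draft
        self.groups = {}      # key -> list of partially reassembled datagrams

    def add(self, key, offset, data, more_fragments, now):
        """Store one fragment; return the payload once a datagram completes."""
        candidates = self.groups.setdefault(key, [])
        group = None
        for g in candidates:
            # First pass: same ID key and arrival times close together.
            if abs(now - g["last_seen"]) <= self.window and offset not in g["frags"]:
                group = g
                break
        if group is None:
            # No group is close enough in time: treat as a new datagram,
            # even though an older group with the same ID is still buffered.
            group = {"last_seen": now, "frags": {}, "total": None}
            candidates.append(group)
        group["frags"][offset] = data
        group["last_seen"] = now
        if not more_fragments:
            group["total"] = offset + len(data)
        payload = self._try_complete(group)
        if payload is not None:
            candidates.remove(group)
        return payload

    @staticmethod
    def _try_complete(group):
        if group["total"] is None:
            return None  # last fragment not seen yet
        out = b""
        for off in sorted(group["frags"]):
            if off != len(out):
                return None  # hole in the data
            out += group["frags"][off]
        return out if len(out) == group["total"] else None
```

With this rule, a stale first fragment that is still buffered when the ID wraps is not spliced onto fragments of the newer datagram, because their arrival times are too far apart. How to choose the window, and when to widen it, remain open.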

In summary, a 16bit fragment ID field should be innocent until proven guilty. As long as the culprit might be /implementations/, the title shouldn't presume the IPv4 fragmentation /protocol/ is guilty.

Originally we had thoughts of trying to prescribe an implementation fix. There are certainly a lot of possible approaches. Linux has just recently implemented a scheme where fragments time out not by wall clock time, but by the number of intermediate packets received, effectively having a tunable parameter on the amount of reordering tolerated. This seems to work pretty well.
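As a rough sketch of that idea (not the kernel's code): expire a reassembly queue once more than some number of packets have arrived since it was started. The class name and the `max_dist` parameter are hypothetical; as far as I know the Linux tunable along these lines is the `net.ipv4.ipfrag_max_dist` sysctl, and it counts packets per source rather than with the single global counter assumed here.

```python
class CountExpiringQueue:
    """Hypothetical sketch: expire fragment queues by packet count, not time."""

    def __init__(self, max_dist=64):
        self.max_dist = max_dist  # tolerated reordering, in packets (assumed)
        self.packets_seen = 0     # simplification: one global counter
        self.pending = {}         # key -> {"started_at": count, "frags": {...}}

    def packet_arrived(self):
        """Count any received packet and drop queues that fell too far behind."""
        self.packets_seen += 1
        stale = [k for k, q in self.pending.items()
                 if self.packets_seen - q["started_at"] > self.max_dist]
        for k in stale:
            del self.pending[k]
        return stale

    def fragment_arrived(self, key, offset, data):
        """A fragment also counts as a packet; stale queues are dropped first."""
        self.packet_arrived()
        q = self.pending.setdefault(
            key, {"started_at": self.packets_seen, "frags": {}})
        q["frags"][offset] = data
```

The appeal is that the expiry scales with the sending rate: at high rates the queue is dropped long before the ID can wrap, while at low rates a slow fragment still gets its chance.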

However, there really is a protocol problem that prevented us from trying to prescribe a solution. The sender is responsible for not reusing IPIDs before the fragments time out, but the receiver is responsible for timing out fragments. There is no way for the sender to know if a receiver has timed out the old fragments yet.
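To put rough numbers on the sender's side of this, the time for the ID field to wrap is just the size of the ID space divided by the packet rate. The function names below are hypothetical, and the 30-second receiver timeout is an assumed value for illustration only, since the sender cannot actually know it.

```python
def ipid_wrap_seconds(rate_bps, packet_bytes, id_bits=16):
    """Seconds until the ID field wraps when sending back-to-back packets."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    return (2 ** id_bits) / packets_per_second


def wraps_before_timeout(rate_bps, packet_bytes, timeout_s=30.0, id_bits=16):
    """True if IDs are reused while a receiver may still hold old fragments.

    timeout_s is an assumption; real receivers choose their own value.
    """
    return ipid_wrap_seconds(rate_bps, packet_bytes, id_bits) < timeout_s
```

For example, `ipid_wrap_seconds(1e9, 1500)` comes to about 0.79 seconds for 1500-byte packets at 1 Gb/s, far shorter than any plausible reassembly timeout, whereas at 1 Mb/s the same calculation gives roughly 13 minutes.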

As an operations person, I might be able to deploy fragmentation with some confidence if I can ensure that all receivers I'm using are running recent Linux kernels with the workaround enabled. But as an application developer, there's no way for me to be sure I won't ever run into protocol trouble if I rely on fragmentation.


3/ The draft should say something about how the problem gets worse if the sender uses a pseudo-random number generator for the IPid field (as recent versions of OpenBSD and some versions of FreeBSD do). Then there is no longer a deterministic wrapping problem, but there is /always/ some small probability of a clash within the max packet lifetime. A good ref for this is:

S. Bellovin, ``A Technique for Counting NATted Hosts,'' Proceedings of the Second Internet Measurement Workshop, November 2002. http://www.cs.columbia.edu/~smb/papers/fnat.pdf

Yes, we can add this.
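As a rough illustration of the effect Bob describes, assume each ID is drawn uniformly and independently from the 16-bit space (a simplification; real generators may avoid recently used values). Then the chance that a given datagram's ID is reused by at least one of the next n datagrams to the same destination is 1 - (1 - 1/65536)^n. The function name is hypothetical.

```python
def id_clash_probability(n_packets, id_bits=16):
    """Probability that a given ID recurs among the next n_packets IDs.

    Assumes uniform, independent random IDs, which is an idealisation.
    """
    id_space = 2 ** id_bits
    return 1.0 - (1.0 - 1.0 / id_space) ** n_packets
```

So with only 1000 packets sent within the maximum packet lifetime, `id_clash_probability(1000)` is already about 1.5%, where a sequential counter would give no reuse at all until 65536 packets had been sent.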

Thanks,
  -John

_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area
