Hi Bill,

Thanks for your summary, which is correct in many respects - pretty good going if you only read the list message, rather than the page itself:

http://www.firstpr.com.au/ip/ivip/pmtud-frag/

> 1. Have the ITR maintain an "uncertainty zone" for sizes of packets
> that can be sent to a given ETR. The uncertainty zone is bounded by a
> size previously determined to be smaller than or equal to the actual
> PMTU (LPME) and a size previously determined to be larger than the
> actual PMTU (UPME).

Yes.

> 2. The ITR encapsulates and transmits packets smaller than LPME
> normally.

Yes, except the ITR should probably send a few such packets with RPD2 (BTW, if anyone can think of a better acronym ...) to explore the possibility that the Real PMTU is now lower than LPME or higher than UPME. These need to be rate limited. Most of the time, there will be no such change from month to month, but sometimes there will be.

> It rejects packets larger than UPME immediately with a too-big
> message.

Yes, except that it occasionally uses one as an exploratory probe to detect whether the Real PMTU has risen above UPME. If the packet is not delivered, then it sends a PTB to the SH as you describe, with an MTU value equal to UPME. (If it gets a PTB from the tunnel, the MTU in that PTB is used to set an upper limit on UPME.)

The only packets which are always rejected with a PTB are those which, once encapsulated, would exceed the MTU of the interface the ITR uses to send packets to this ETR.

> 3. If the packet size is in the uncertainty zone, encapsulate it with
> RPD2 instead of the normal encapsulation and hold the original packet
> until the ETR responds. This encapsulation consists of two packets:
> one in the uncertainty zone and one smaller than LPME.

Actually, the small one will not only be smaller than LPME, it will be way smaller than some figure like 1200 bytes, which we assume can be sent from any ITR to any ETR without PMTU problems.

> If successfully transmitted, the ETR will reassemble the two packets into
> one before passing them on.

Yes - if the ETR receives the big Packet B and at least one small Packet A.

This is true except for the just-mentioned occasional exploratory probe packets which are longer than UPME or shorter than LPME.

> 4. The ETR is required to respond to the ITR with information about
> all communications associated with RPD2, in addition to delivering the
> packets. By comparing the ETR's response to the RPD2 messages with the
> RPD2 messages it sent, the ITR can narrow the uncertainty zone until
> LPME and UPME meet.
>
> Please correct any part of that I misunderstood.

There are a few other points.

1 - Packet B, the large one, is sent with its outer header's source address set to the ITR's address. This is true in all instances of RPD2, including Ivip. In Ivip, the Packet As are sent with their outer source address being that of the SH.

2 - Therefore if Packet B gets to a router in the ITR --> ETR tunnel with an outgoing MTU which is too small for it, the ITR will receive a Packet Too Big message. (Except if Packet B or the PTB packet is dropped for some random reason, or if the PTB is blocked by a filter. A BCP will say: Don't put your ITRs and ETRs behind such filters.)

3 - When the ITR gets a PTB from the tunnel, or is told by the ETR that Packet B didn't arrive within a reasonable, but short, time-frame (maybe try twice), it sends a PTB back to the Sending Host (SH) - so the SH will try again, with a smaller packet, and no data should be lost to the application.

4 - If the ITR simply gets nothing back from the ETR, it might try again. I am not sure what the ITR would do then, but I don't think it should be adjusting down its UPME variable, or sending PTBs to the SH, just because it can't get a report of any kind from the ETR. This is probably a temporary glitch. If it is permanent, then there's no point in sending a PTB anyway, since the data will never get to this ETR, at least via this ITR.

Also, the ITR always* learns something truthful when it uses RPD2 to send a packet with a length within the Zone of Uncertainty.

* This is not counting extreme cases where two attempts at sending the sets of packets do not result in the ITR receiving a report from the ETR - but that would be a case of at least temporarily very poor reachability between the two, so we can't expect anything better.

> Two questions, one note:
>
> Question #1: How does the ITR determine that its old PMTU estimate has
> been invalidated, either because of a route change or because
> individual packets are being transmitted along multiple channels each
> with a different PMTU?

There needs to be some low rate of exploratory probing using RPD2 sending of some packets shorter than LPME and longer than UPME.

> If I understand you, packets are not transmitted with RPD2 unless the
> ITR believes the size falls in the uncertainty zone,

Yes, except for the occasional exploratory shorter and longer packets.

> and not transmitted with the ITR's source IP address regardless,

The long Packet B of RPD2 is always sent with the outer header's source address being that of the ITR.

> so the ITR has no real hope of seeing normal too-big complaints.
> So how does it ever decide that its estimated PMTU is no longer
> valid?

Ivip's ordinary encapsulation of traffic packets (IP-in-IP) has the outer header set to the SH's address. So the ITR gets no PTB from them, and a properly implemented RFC 1191 SH would not recognise the PTB either. A SH which was looking out for this kind of PTB could detect it, but I haven't explored this and am determined not to make any part of Ivip dependent on host changes - other perhaps than a souped up traceroute program.

Occasional shorter and longer exploratory probe packets, with direct reports from the ETR will detect changes in the Real PMTU outside LPME to UPME - but not as fast as if the normally encapsulated traffic packets had the ITR's address as their source *and* the ITR could store enough state to securely validate PTB messages they cause.

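As a rough illustration of the size-based decisions discussed so far, here is a minimal Python sketch. It is not taken from the Ivip page. The names (`PathEstimate`, `classify`, `Action`), the `explore` flag (standing in for whatever rate limiter decides that a packet should be used as an exploratory probe) and the treatment of every size as the length the packet would have once encapsulated are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SEND_NORMAL = "ordinary encapsulation"
    SEND_RPD2 = "send as RPD2 probe (Packet B plus small Packet As)"
    REJECT_PTB = "reject and send PTB to the SH"


@dataclass
class PathEstimate:
    """Hypothetical per-ETR state held by the ITR (sizes in bytes)."""
    lpme: int    # largest size believed to fit (<= Real PMTU)
    upme: int    # smallest size believed NOT to fit (> Real PMTU)
    if_mtu: int  # MTU of the ITR's own outgoing interface


def classify(est: PathEstimate, size: int, explore: bool = False) -> Action:
    """Decide how to handle a packet of encapsulated length `size`.

    `explore` is assumed to come from a rate limiter that occasionally
    selects a packet outside the zone as an exploratory probe.
    """
    # Packets exceeding the ITR's own interface MTU are always rejected.
    if size > est.if_mtu:
        return Action.REJECT_PTB
    # At or below LPME: normal, or an occasional probe for a PMTU drop.
    if size <= est.lpme:
        return Action.SEND_RPD2 if explore else Action.SEND_NORMAL
    # At or above UPME: PTB, or an occasional probe for a PMTU rise.
    if size >= est.upme:
        return Action.SEND_RPD2 if explore else Action.REJECT_PTB
    # Inside the Zone of Uncertainty: always probe with RPD2.
    return Action.SEND_RPD2
```

For instance, with `PathEstimate(lpme=1400, upme=1480, if_mtu=1500)`, a 1450 byte packet falls inside the zone and is sent with RPD2, while a 1480 byte packet is rejected with a PTB unless the rate limiter has picked it as an exploratory probe.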
A non-Ivip ITR, or some other device using this IPTM - RPD2 procedure probably could use the ordinary encapsulation to detect the Real PMTU getting shorter than it currently assumes. The trick would be to only cache the information for a handful of the longest packets. There's no point in caching stuff for the shorter ones while longer ones are being sent, close to or at the limit set by LPME.

Relying on securely checked PTBs is a pretty good way of finding out that the Real PMTU has got shorter than LPME. Using one or more non-arrivals of the long probe packet at the ETR is not quite as reliable, since this could occasionally occur due to bad luck with packet loss.

It would be bad to lower LPME in a spurious way, due just to non-arrival of the probe packet (rather than the gutsier way of getting a real PTB). This would result in the ITR sending a PTB to the SH with a lower than needed MTU value. The SH would then be bound to use that value to limit its packet size for the next ten minutes. This is bad, but not disastrous - it is just a loss of efficiency, rather than a loss of data or of connectivity.

Relying on a report from the ETR that a long packet did arrive OK is the best way of detecting that the Real PMTU is higher than UPME. The mere absence of PTBs is not as reliable, since they could be dropped randomly (or the probe packet dropped randomly before it hit the PMTU limiting router) - or perhaps the PTBs could be blocked by ICMP filters which violate the BCP recommendation.

IPTM - RPD2 can do its job reliably without PTBs from the tunnel, but if they are there, that is better. The ITR has to be able to get the PTBs it generates to SH, but if it can't do that, then we are sunk anyway.

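The different kinds of evidence discussed in this message could drive the two estimates roughly as in the following Python sketch. Everything here is an assumption made for illustration: the names (`Zone`, `Outcome`, `update`), the use of encapsulated lengths for all sizes, and the strict reading in which a PTB carrying MTU value m means that m + 1 bytes is known not to fit (the text only says that the PTB's MTU sets an upper limit on UPME). `NOT_ARRIVED` stands for the conclusion reached after any retries, not a single lost packet.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    ARRIVED = auto()      # ETR reports that the long Packet B arrived
    NOT_ARRIVED = auto()  # ETR reports Packet A(s) only, after retries
    PTB = auto()          # validated PTB received from a tunnel router
    NO_REPORT = auto()    # nothing at all came back from the ETR


@dataclass
class Zone:
    lpme: int  # largest size believed to fit
    upme: int  # smallest size believed not to fit


def update(zone: Zone, size: int, outcome: Outcome,
           ptb_mtu: Optional[int] = None) -> Zone:
    """Adjust LPME/UPME after sending a probe of length `size`."""
    if outcome is Outcome.ARRIVED:
        # A packet this long got through, so LPME can rise to it.
        zone.lpme = max(zone.lpme, size)
        # An exploratory probe at or above UPME that arrives shows
        # that the Real PMTU has risen.
        if size >= zone.upme:
            zone.upme = size + 1
    elif outcome is Outcome.PTB:
        if ptb_mtu is None:
            raise ValueError("a PTB outcome needs the MTU it reported")
        # Assumed strict reading: ptb_mtu may fit, ptb_mtu + 1 cannot.
        zone.upme = min(zone.upme, ptb_mtu + 1)
        zone.lpme = min(zone.lpme, ptb_mtu)
    elif outcome is Outcome.NOT_ARRIVED:
        # Weaker evidence than a PTB, but treated here as "too big".
        zone.upme = min(zone.upme, size)
        if size <= zone.lpme:
            zone.lpme = size - 1
    # NO_REPORT: probably a temporary glitch, so change nothing.
    return zone
```

Starting from `Zone(lpme=1200, upme=1500)`, a reported arrival of a 1400 byte probe raises LPME to 1400, a reported non-arrival of a 1480 byte probe lowers UPME to 1480, and a missing report leaves both values untouched.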
The sections:

  Discovering changes in Real PMTU
  An alternative to the RPD2 approach of splitting the traffic packet

discuss the various approaches, with and without Ivip's "outer source = SH" approach, including some promising possibilities of ITRs only caching some packets, and alternatives to RPD2's approach of splitting the traffic packet.

> Question #2: nearly every ITR->ETR map will trigger the use of RPD2 as
> two associated end sites begin transmitting data.

This is quite different from the debate about "pure pull" (LISP-ALT and TRRP, though I now think neither is quite so pure) ITRs frequently delaying initial packets.

Firstly, RPD2 is only used for packets longer than 1200 bytes. This means that almost all session establishments will not be encumbered by RPD2, since I figure very few protocols start up with such long initial packets. Many kinds of traffic will never require packets longer than 1200 or whatever bytes, including DNS and almost all HTTP traffic in the client -> server direction. I figure SMTP and many other protocols only have big packets going in one direction for each session.

Secondly, the burden of RPD2 is primarily due to involving the ITR's and the ETR's central CPU. There is also the burden of sending extra packets, but the probe Packet B is the same length as an ordinarily encapsulated packet, and the 2 or maybe 3 short Packet As are likely to be 100 bytes or less each.

There is no significant extra delay. Assuming Packet B and at least one of the first two Packet As get to the ETR, the traffic packet is delivered. This need not take more than a fraction of a millisecond longer on high-speed links, unless the central CPU does not have the capacity to attend to this promptly.

These delays would be far shorter than the delay of looking up mapping in the ALT or TRRP global query server system, or using their initial packet delivery systems to get the packet to the ETR before the ITR has the mapping.

Also, these RPD2 packets do not involve data loss to the application. Sometimes, they require a resend with a smaller packet - but that is when the only way of delivering the original packet would be via some fragmentation or other splitting mechanism, since the packet, once encapsulated, was in fact too big for the tunnel PMTU.

> Given the complexity, you're looking at a general-purpose CPU on
> both ends to handle this. What sort of impact does that have
> on the system capacity?

I can't say for sure. I can't think of a simpler approach, and this PMTUD stuff really does need to be solved. There may well be some gotchas, but the way it looks now is far better and cleaner than I thought would be possible a few days ago.

Since October I have assumed we would need synthetic probe packets and that it would be necessary to break up some packets into smaller chunks to deliver them in spite of PMTU limitations.

In this scheme, no traffic-carrying probe packet goes to waste. It is either delivered and the ITR learns about the Real PMTU, or it is not delivered, and the ITR also learns - with no application data loss. Then the RFC 1191 SH automatically cooks up a shorter packet, which is just what is needed for the ITR to find out more about the Real PMTU.

> Note #1: in your document, you describe the ETR returning multiple
> packets to the ITR for each received RPD2 packet, until the ITR
> acknowledges receipt. This potentially resurrects our old friend, the
> smurf amplifier.

This is definitely a gotcha. This IPTM - RRG stuff didn't exist two days ago, so it is amenable to change. Maybe limit the retries to a single retry, or at most to two. That only gives an amplification factor of two or three. The report packets would be pretty short and, if generated by an ETR in response to bogus Packet As, would be ignored by most devices, including any ITR.

Perhaps a way to discourage attackers from using this aspect of the ETR's functionality would be to require the Packet As to be as long as the total length of the two or three ETR -> ITR report packets. But that just adds overhead to the entire protocol.

Cheers

  - Robin
