Robin,

Your proposal needs to talk about the setting of DF in the outer IPv4
header after encapsulation. Based on my 5+ years of studying this, if it
sets DF=1, it's busted.

IMHO, SEAL is well beyond the research phase now and pretty deep into
engineering solution space. It is written in the form of a functional
specification from which a programmer can actually produce running code.
Therefore, I think it is ready for experimentation on a wider scale.

Thanks - Fred
[EMAIL PROTECTED]

>-----Original Message-----
>From: Robin Whittle [mailto:[EMAIL PROTECTED]
>Sent: Monday, April 21, 2008 1:14 AM
>To: Routing Research Group
>Cc: William Herrin
>Subject: Re: [RRG] Path MTU Discovery: a new approach
>
>Hi Bill,
>
>Thanks for your summary, which is correct in many respects - pretty
>good going if you only read the list message, rather than the page
>itself:
>
>  http://www.firstpr.com.au/ip/ivip/pmtud-frag/
>
>> 1. Have the ITR maintain an "uncertainty zone" for sizes of packets
>> that can be sent to a given ETR. The uncertainty zone is bounded by a
>> size previously determined to be smaller than or equal to the actual
>> PMTU (LPME) and a size previously determined to be larger than the
>> actual PMTU (UPME).
>
>Yes.
>
>
>> 2. The ITR encapsulates and transmits packets smaller than LPME
>> normally.
>
>Yes, except the ITR should probably send a few such packets with
>RPD2 (BTW, if anyone can think of a better acronym ...) to explore
>the possibility that the Real PMTU is now lower than LPME or higher
>than UPME. These need to be rate limited. Most of the time, there
>will be no such change from month-to-month, but sometimes
>there will be.
>
>
>> It rejects packets larger than UPME immediately with a too-big
>> message.
>
>Yes, except occasionally, when it uses one as an exploratory
>probe to detect if the Real PMTU has risen above UPME. If the
>packet is not delivered, then it sends a PTB to the SH as you
>describe, with MTU value equal to UPME. (If it gets a PTB from the
>tunnel, the MTU in that PTB is used to set an upper limit on UPME.)
>
>The only packets which are always rejected with a PTB are those
>which, once encapsulated, would exceed the MTU of the interface the
>ITR uses to send packets to this ETR.
>
>
>> 3. If the packet size is in the uncertainty zone, encapsulate it with
>> RPD2 instead of the normal encapsulation and hold the original packet
>> until the ETR responds. This encapsulation consists of two packets:
>> one in the uncertainty zone and one smaller than LPME.
>
>Actually, the small one will not only be smaller than LPME, it will
>be way smaller than some figure like 1200 bytes, which we assume can
>be sent from any ITR to any ETR without PMTU problems.
>
>> If successfully transmitted, the ETR will reassemble the two packets
>> into one before passing them on.
>
>Yes - if the ETR receives the big Packet B and at least one small
>Packet A.
>
>This is true except for the just mentioned occasional exploratory
>probe packets of length longer than UPME or shorter than LPME.
>
>
>> 4. The ETR is required to respond to the ITR with information about
>> all communications associated with RPD2, in addition to delivering the
>> packets. By comparing the ETR's response to the RPD2 messages with the
>> RPD2 messages it sent, the ITR can narrow the uncertainty zone until
>> LPME and UPME meet.
>>
>> Please correct any part of that I misunderstood.
>
>There are a few other points.
>
>1 - Packet B, the large one, is sent with its outer header's source
>    address set to the ITR's address. This is true in all instances
>    of RPD2, including Ivip. In Ivip, the Packet As are sent with
>    their outer source address being that of the SH.
>
>2 - Therefore if Packet B gets to a router in the ITR --> ETR tunnel
>    with an outgoing MTU which is too small for it, the ITR will
>    receive a Packet Too Big message. (Except if the Packet B or
>    the PTB packet are dropped for some random reason, or if the PTB
>    is blocked by a filter. A BCP will say: Don't put your ITRs and
>    ETRs behind such filters.)
>
>3 - When the ITR gets a PTB from the tunnel, or is told by the ETR
>    that the Packet B didn't arrive in a reasonable, but short,
>    time-frame (maybe try twice), it sends a PTB back to the
>    Sending Host (SH) - so the SH will try again, with a smaller
>    packet, and no data should be lost to the application.
>
>4 - If the ITR simply gets nothing back from the ETR, it might try
>    again. I am not sure what the ITR would do then, but I don't think
>    it should be adjusting down its UPME variable, or sending PTBs to
>    the SH, just because it can't get a report of any kind from the
>    ETR. This is probably a temporary glitch. If it is permanent,
>    then there's no point in sending a PTB anyway, since the data
>    will never get to this ETR, at least via this ITR.
>
>Also, the ITR always* learns something truthful when it uses RPD2 to
>send a packet with a length within the Zone of Uncertainty.
>
>* This is not counting extreme cases where two attempts at sending
>  the sets of packets do not result in the ITR receiving a report
>  from the ETR - but that would be a case of at least temporarily
>  very poor reachability between the two, so we can't expect
>  anything better.
>
>
>> Two questions, one note:
>>
>> Question #1: How does the ITR determine that its old PMTU estimate has
>> been invalidated, either because of a route change or because
>> individual packets are being transmitted along multiple channels each
>> with a different PMTU?
>
>There needs to be some low rate of exploratory probing, using RPD2
>to send some packets shorter than LPME and longer than UPME.
>
>
>> If I understand you, packets are not transmitted with RPD2 unless the
>> ITR believes the size falls in the uncertainty zone,
>
>Yes, except for the occasional exploratory shorter and longer packets.
>
>> and not transmitted with the ITR's source IP address regardless,
>
>The long Packet B of RPD2 is always sent with the outer header's
>source address being that of the ITR.
>
>> so the ITR has no real hope of seeing normal too-big complaints.
>> So how does it ever decide that its estimated PMTU is no longer
>> valid?
>
>Ivip's ordinary encapsulation of traffic packets (IP-in-IP) has the
>outer header's source address set to the SH's address. So the ITR
>gets no PTB from them, and a properly implemented RFC 1191 SH would
>not recognise the PTB either.
>
>A SH which was looking out for this kind of PTB could detect it, but
>I haven't explored this and am determined not to make any part of
>Ivip dependent on host changes - other perhaps than a souped up
>traceroute program.
>
>Occasional shorter and longer exploratory probe packets, with direct
>reports from the ETR, will detect changes in the Real PMTU outside
>LPME to UPME - but not as fast as if the normally encapsulated
>traffic packets had the ITR's address as their source *and* the ITR
>could store enough state to securely validate PTB messages they cause.
>
>A non-Ivip ITR, or some other device using this IPTM - RPD2
>procedure, probably could use the ordinary encapsulation to detect
>the Real PMTU getting shorter than it currently assumes. The trick
>would be to only cache the information for a handful of the longest
>packets. There's no point in caching stuff for the shorter ones
>while longer ones are being sent, close to or at the limit set by
>LPME.
>
>Relying on securely checked PTBs is a pretty good way of finding out
>that the Real PMTU has got shorter than LPME. Using one or more
>non-arrivals of the long probe packet at the ETR is not quite as
>reliable, since this could occasionally occur due to bad luck with
>packet loss. It would be bad to lower LPME in a spurious way, due
>just to non-arrival of the probe packet (rather than the gutsier way
>of getting a real PTB). This would result in the ITR sending a PTB
>to the SH with a lower than needed MTU value. The SH would then be
>bound to use that value to limit its packet size for the next ten
>minutes.
>This is bad, but not disastrous - it is just a loss of
>efficiency, rather than a loss of data or of connectivity.
>
>Relying on a report from the ETR that a long packet did arrive OK is
>the best way of detecting that the Real PMTU is higher than UPME.
>The mere absence of PTBs is not as reliable, since they could be
>dropped randomly (or the probe packet dropped randomly before it hit
>the PMTU limiting router) - or perhaps the PTBs could be blocked by
>ICMP filters which violate the BCP recommendation.
>
>IPTM - RPD2 can do its job reliably without PTBs from the tunnel,
>but if they are there, that is better. The ITR has to be able to
>get the PTBs it generates to SH, but if it can't do that, then we
>are sunk anyway.
>
>The sections:
>
>  Discovering changes in Real PMTU
>
>  An alternative to the RPD2 approach of splitting the traffic
>  packet
>
>discuss the various approaches, with and without Ivip's "outer
>source = SH" approach, including some promising possibilities of
>ITRs only caching some packets, and alternatives to RPD2's approach
>of splitting the traffic packet.
>
>
>> Question #2: nearly every ITR->ETR map will trigger the use of RPD2 as
>> two associated end sites begin transmitting data.
>
>This is quite different from the debate about "pure pull" (LISP-ALT
>and TRRP, though I now think neither is quite so pure) ITRs
>frequently delaying initial packets.
>
>Firstly, RPD2 is only used for packets longer than 1200 bytes. This
>means that almost all session establishments will not be encumbered
>by RPD2, since I figure very few protocols start up with such long
>initial packets. Many kinds of traffic will never require packets
>longer than 1200 or whatever bytes, including DNS and almost all
>HTTP traffic in the client -> server direction. I figure SMTP and
>many other protocols only have big packets going in one direction
>for each session.
>
>Secondly, the burden of RPD2 is primarily due to involving the ITR's
>and the ETR's central CPU.
>There is also the burden of sending
>extra packets, but the probe Packet B is the same length as an
>ordinarily encapsulated packet, and the 2 or maybe 3 short Packet
>As are likely to be 100 bytes or less each.
>
>There is no significant extra delay. Assuming the Packet B and at
>least one of the first two Packet As get to the ETR, the traffic
>packet is delivered. This need not take more than a fraction of a
>millisecond longer on high-speed links, unless the central CPU does
>not have the capacity to attend to this promptly. These delays
>would be far shorter than the delay of looking up mapping in the ALT
>or TRRP global query server system, or using their initial packet
>delivery systems to get the packet to the ETR before the ITR has the
>mapping.
>
>Also, these RPD2 packets do not involve data loss to the
>application. Sometimes, they require a resend with a smaller packet
>- but that is when the only way of delivering the original packet
>would be via some fragmentation or other splitting mechanism, since
>the packet, once encapsulated, was in fact too big for the tunnel PMTU.
>
>
>> Given the complexity, you're looking at a general-purpose CPU on
>> both ends to handle this. What sort of impact does that have
>> on the system capacity?
>
>I can't say for sure. I can't think of a simpler approach, and this
>PMTUD stuff really does need to be solved. There may well be some
>gotchas, but the way it looks now is far better and cleaner than I
>thought would be possible a few days ago. Since October I have
>assumed we would need synthetic probe packets and that it would be
>necessary to break up some packets into smaller chunks to deliver
>them in spite of PMTU limitations.
>
>In this scheme, no traffic carrying probe packet goes to waste. It
>is either delivered and the ITR learns about the Real PMTU, or it is
>not delivered, and the ITR also learns - with no application data
>loss.
>Then the RFC 1191 SH automatically cooks up a shorter packet,
>which is just what is needed for the ITR to find out more about the
>Real PMTU.
>
>
>> Note #1: in your document, you describe the ETR returning multiple
>> packets to the ITR for each received RPD2 packet, until the ITR
>> acknowledges receipt. This potentially resurrects our old friend, the
>> smurf amplifier.
>
>This is definitely a gotcha. This IPTM - RRG stuff didn't exist two
>days ago, so it is amenable to change. Maybe limit the retries to a
>single retry, or at most to two. That only gives an amplification
>factor of two or three.
>
>The report packets would be pretty short, and if generated by an ETR
>in response to bogus Packet As, would be ignored by most devices,
>including any ITR.
>
>Perhaps a way to discourage attackers from using this aspect of the
>ETR's functionality would be to ensure that the Packet As needed to
>be as long as the total length of the two or three ETR -> ITR report
>packets. But that just adds overhead to the entire protocol.
>
> Cheers
>
>   - Robin
>
>
>--
>to unsubscribe send a message to [EMAIL PROTECTED] with the
>word 'unsubscribe' in a single line as the message text body.
>archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg
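
The ITR behaviour discussed in the quoted message (normal encapsulation up to LPME, RPD2 inside the uncertainty zone, PTB above UPME) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the Ivip page or from SEAL: the names UncertaintyZone, Action, classify, on_delivered, on_ptb and on_not_delivered are hypothetical, the zone is assumed to satisfy LPME <= Real PMTU <= UPME, all sizes are encapsulated sizes in bytes, and the rate-limited exploratory probes outside the zone are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SEND_NORMAL = "encapsulate normally"
    SEND_RPD2 = "send with RPD2 and await the ETR report"
    REJECT_PTB = "drop and send Packet Too Big to the SH"


@dataclass
class UncertaintyZone:
    """Hypothetical per-ETR state held by an ITR (all sizes in bytes)."""
    lpme: int           # largest encapsulated size known to fit the tunnel
    upme: int           # assumed upper bound on the Real PMTU
    interface_mtu: int  # MTU of the ITR interface facing this ETR

    def classify(self, size: int) -> Action:
        """Decide what to do with a packet of this encapsulated size."""
        if size > self.interface_mtu or size > self.upme:
            return Action.REJECT_PTB
        if size <= self.lpme:
            return Action.SEND_NORMAL
        return Action.SEND_RPD2

    def on_delivered(self, size: int) -> None:
        """The ETR reported that a Packet B of this size arrived."""
        self.lpme = max(self.lpme, size)
        self.upme = max(self.upme, self.lpme)

    def on_ptb(self, mtu: int) -> None:
        """A validated PTB came back from a router in the tunnel."""
        self.upme = min(self.upme, mtu)
        self.lpme = min(self.lpme, self.upme)

    def on_not_delivered(self, size: int) -> None:
        """The ETR reported (after any retries) that Packet B never arrived."""
        self.upme = min(self.upme, size - 1)
        self.lpme = min(self.lpme, self.upme)


zone = UncertaintyZone(lpme=1200, upme=1500, interface_mtu=1500)
print(zone.classify(1400))   # 1400 lies inside the zone, so it is probed
zone.on_ptb(1450)            # tunnel PTB caps UPME
zone.on_delivered(1450)      # ETR report raises LPME
print(zone.lpme, zone.upme)
```

Note that the sketch deliberately has no handler for the case where the ITR hears nothing at all from the ETR: as point 4 of the quoted message says, that case should not move UPME or trigger a PTB to the SH.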
