Short version: Discussion of SEAL and in particular of the differing approaches Fred and I have about PMTUD. Fred seems to support continued use of IPv4 DF=0 "fragmentable in the network" packets while I think it should be deprecated. (It is banned in IPv6.)
Fred seems to support RFC 4821 PMTUD, at least for packets longer than 1500 bytes. I am opposed to RFC 4821 - because I think it is over-defensively trying to cope with problems which should be fixed, rather than tolerated and therefore encouraged. ETEs (ETRs) can't be expected to try to detect packets arriving supposedly from legitimate ITEs (ITRs) but which are sent by an attacker.

Hi Fred,

I plan to write an "IRON: SEAL summary V2" based on what I learnt from your two recent on-list messages and one off-list message. Here is my response to your first on-list message and some elements of your off-list message.

From your off-list message, I understand:

* The longest IPv6 prefix length IRON/RANGER is intended to support is /56. This sounds OK to me, but at present I plan Ivip to work in integer units of /64. Maybe that can be scaled back to /56 nearer the time of deployment.

* I will assume that the route redirect messages currently specified in RANGER (native ICMP IPv4 and IPv6 redirects) will be replaced by SEAL messages. This will allow the inclusion of a caching time.

* I understand that the North Island IRON router will first of all send the traffic packet to the Seattle router, based on its VP prefix 43.0.0.0 /16 in its FIB. The Seattle router somehow gets the packet to the correct IRON router in the South Island - and sends a redirect to the North Island IRON router. That redirect causes the North Island IRON router to install a more-specific prefix in its FIB, for 43.0.56.76 /30, with the path leading to the correct South Island router. So subsequent traffic packets addressed to this prefix will be tunneled direct to the correct South Island router - this is RANGER's Route Optimization process.

* I suggested, without any real thought, a 10 minute time by which an IRON router would purge its FIB of any more-specific route for a particular EUN (End User Network) EID prefix if there was no traffic for it. You suggest a 2 minute STALETIME - which seems fine to me.
  So a router which receives a SEAL redirect would maintain it in its FIB for the caching time, as long as packets keep arriving for it at intervals of less than STALETIME - and would purge it after STALETIME if no packets arrive in that time.

* I only have a rough idea of how the EUN router creates "bubbles" with RANGER's (IPv6's?) RA (Router Advertisement) messages, as a means of the one or more IRON routers of its ISPs securely registering themselves as the correct destination for packets whose destination address matches 43.0.56.76 /30. To what extent does this involve adding things to IPv6 - and to what extent is it practical, and with what additions, for IPv4?

* I think we have very different goals regarding support for DF=0 packets, and for dealing with problems in the network such as tunnels which don't support RFC 1191 / RFC 1981 PMTUD and filters which drop PTB packets. I think you support creating protocols which can cope with this stuff, including potentially very long DF=0 packets. I think these filtering and bad tunnel arrangements need to be fixed - and that DF=0 should be deprecated.

>>> SEAL explicitly turns off PMTUD and uses its own tunnel
>>> endpoint-to-endpoint MTU determination, so in the normal
>>> case it does not expect to receive any ICMP PTBs from
>>> routers within the tunnel.
>>
>> My understanding is that this is only true for IPv4, because the SEAL
>> ITE (Ingress Tunnel Endpoint) sends packets with DF=0 to the ETE
>> (Egress Tunnel Endpoint). For IPv6, the ITE can get PTBs from
>> routers in the tunnel, since no packets are fragmentable. So I think
>> it would not be true to state that the SEAL ITE "turns off" the
>> traditional IPv6 RFC 1981 PMTUD mechanism when it tunnels packets to
>> the ETE.
>
> Yes, that's right. My mind has been so locked into the
> IPv4 case that I forget that IPv6 does not allow
> fragmentation in the network. So, you are right that
> IPv6 as the outer protocol requires RFC1981 PMTUD
> feedback from the network.
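Going back to the STALETIME point above: here is a minimal sketch, in Python, of that purge-or-refresh logic. All the names (RedirectCache, the prefix and next-hop strings) are invented for illustration, and the 120 second value is just Fred's suggested 2 minute STALETIME - nothing here comes from the SEAL or RANGER drafts themselves.

```python
import time

STALETIME = 120  # seconds; the 2 minute value suggested by Fred

class RedirectCache:
    """More-specific routes learned from SEAL redirects, aged by last use."""

    def __init__(self):
        self._routes = {}  # prefix -> (next_hop, time of last matching packet)

    def install(self, prefix, next_hop, now=None):
        self._routes[prefix] = (next_hop, now if now is not None else time.time())

    def lookup(self, prefix, now=None):
        """Return the next hop for a traffic packet, refreshing the entry's
        timer; purge the entry if no packet has matched it for STALETIME."""
        now = now if now is not None else time.time()
        entry = self._routes.get(prefix)
        if entry is None:
            return None
        next_hop, last_used = entry
        if now - last_used > STALETIME:
            del self._routes[prefix]            # no traffic for STALETIME: purge
            return None
        self._routes[prefix] = (next_hop, now)  # traffic refreshes the timer
        return next_hop
```

Each arriving packet that matches the more-specific route refreshes its timer, so the route survives indefinitely under steady traffic but disappears two minutes after the last packet.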
I understand from what follows that you intend to rewrite SEAL not to use the IPv6 Fragment Header, but to use an explicit SEAL header by which the ITE can request that the ETE acknowledge receipt of the packet. This would mean that even if no PTBs were arriving at the ITE due to an MTU limit in a router in the ITE -> ETE path, the ITE could try various packet lengths until it found a length short enough to avoid the lowest MTU limit.

>>> SEAL *can* enable PMTUD for certain "expendable" packets,
>>
>> I don't recall what these would be.
>
> Out-of-band probes, e.g.

OK - I guess such as those I just mentioned.

>> Is there a mechanism for SEAL, in IPv4, to send these "expendable"
>> packets with DF=1?
>
> Yes; just set DF=1 in the outer IPv4 header and send it.

OK.

>>>>> In some environments, it may be necessary to insert a
>>>>> mid-layer UDP header in order to give ECMP/LAG routers
>>>>> a handle to support multipath traffic flow separation.
>>>>
>>>> http://en.wikipedia.org/wiki/Equal-cost_multi-path_routing
>>>>
>>>> http://www.force10networks.com/CSPortal20/TechTips/0065_HowDoIConfigureLoadBalancing.aspx
>>>>
>>>> As far as I know, these techniques are not something to consider with
>>>> the RANGER CES, or with LISP or Ivip. If the routers can handle
>>>> ordinary traffic packets they can handle encapsulated packets too. I
>>>> haven't read about these techniques in detail. I guess that within
>>>> RANGER, beyond its use as a CES scalable routing solution, you may
>>>> want to support ECMP and LAG.
>>>
>>> There has been a great deal of talk about taking care
>>> of ECMP/LAG routers within the network that only
>>> recognize common-case protocols (i.e., TCP and UDP),
>>> which is why LISP has locked into using UDP encaps.
>>
>> Do you expect this to be the case for IRON? If so, then I guess that
>> SEAL in IRON must always use this UDP header before the SEAL header -
>> since no ITE could know for sure whether ECMP/LAG is in use on the
>> path to the ETE.
>
> Yes, I guess so.

OK, but ...

>> If anyone can point me to good references on ECMP used with LAG, I
>> would really appreciate it. I am keen to read more about this.

Here you seem to agree with my suggestion that the "mid-layer" headers go after the IPv4/6 header and before the SEAL header. However, I see from Figures 1 and 2 in:

  http://tools.ietf.org/html/draft-templin-intarea-seal-08

that these "mid-layer" headers are after the SEAL header. As far as I know, placing a UDP header after the SEAL header would make it invisible to ECMP/LAG routers. So I think that if the packets have to be UDP packets to keep these ECMP/LAG routers happy, the SEAL header and all that follows must be part of the UDP payload.

Assuming you did this (and I am not suggesting you need to, since I haven't yet read up on ECMP/LAG), and if you wanted to send IPv4 DF=1 packets in the tunnels, and if the MTU-limiting router sent back a PTB with only the IPv4 header and the next 8 bytes (the UDP header), then the ITE could still authenticate the PTB by caching the 16 bit UDP checksum of each packet it sends. This is because the UDP checksum is affected by the full 32 bit value of the SEAL_ID in the SEAL header, and the most significant 16 bits of the SEAL_ID are in the returned IPv4 header's Identification field anyway. This might be handy for the IPv4 probing packets you mentioned above.

Regarding my suggestions about a timer-like algorithm by which the ITE could decide which range of SEAL_IDs it had "recently" sent to an ETE - for the purposes of authenticating messages from that ETE, or authenticating PTBs from routers in the ITE -> ETE path - you wrote:

> OK, that sounds good on the ITE side but what about the
> ETE side? If the ETE is going to be tracking the SEAL_ID
> for this ITE, can't it similarly keep a sliding window
> based on the packets received within the last ~3sec?
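To make the ITE-side bookkeeping concrete, here is a sketch in Python of the sliding window of recently sent packets, each remembered as its (IPv4 Identification, UDP checksum) pair, against which a truncated PTB quoting only the IP header plus 8 bytes could be checked. The class name, the fixed-size window standing in for "packets sent in the last ~3 sec", and the field values are all my assumptions - none of this is from the SEAL draft.

```python
from collections import OrderedDict

class PtbAuthenticator:
    """Sketch: validate truncated IPv4 PTBs against recently sent packets.

    For each tunneled packet the ITE remembers the pair (IPv4
    Identification, UDP checksum).  The Identification field carries the
    top 16 bits of the 32-bit SEAL_ID, and the UDP checksum is influenced
    by the full SEAL_ID, so together the two 16-bit values let the ITE
    check a PTB that quotes only the outer IP header + 8 bytes."""

    def __init__(self, window=1024):
        self._sent = OrderedDict()  # (ident, udp_csum) -> True, oldest first
        self._window = window       # stands in for "sent in the last ~3 sec"

    def record_sent(self, ident, udp_csum):
        self._sent[(ident, udp_csum)] = True
        while len(self._sent) > self._window:
            self._sent.popitem(last=False)  # forget the oldest packet

    def ptb_is_plausible(self, quoted_ident, quoted_udp_csum):
        """True if the quoted fields match some recently sent packet."""
        return (quoted_ident, quoted_udp_csum) in self._sent
```

A forged PTB would have to guess both 16 bit values of a packet sent within the window, which is the point of the authentication argument above.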
I don't recall any prior mention of the ETE attempting to decide whether a packet apparently from an ITE was really from that ITE or not. Maybe you could try this for ITE <-> ETE communications, but I think it may be impossible, for reasons similar or identical to the arguments below.

Here is an argument about why it would be pointless or worse for the ETE, in an IRON/RANGER setting, to try to use SEAL_ID to decide whether or not to accept tunneled packets containing traffic packets.

1 - Why it can't defend against an attacker.

I assume the attacker's purpose is to get bogus packets to the Destination Host (DH). There's no need for an attacker to try to spoof a packet arriving from an active ITE - one which has recently tunneled traffic packets to this ETE. If the attacker could get the ETE to accept such packets, then yes, the ETE would dutifully forward them towards the DH, and the DH would get the bogus packet. However, the attacker doesn't need to do this in order to achieve his or her goal. They can send a packet, tunneled just as if it were tunneled by some ITE which the ETE had not recently received packets from. To the ETE, this would be a "new" ITE, and the SEAL_ID would appear to have been set to some random value by this ITE. The attacker could keep up this flow of packets and the ETE would keep accepting them.

There's no point in making ETEs respond to such packets by sending a packet to the supposed ITE, and then ignoring the tunneled packets if that ITE doesn't confirm it sent them. This would lead to extra network traffic, and the attacker could simply spoof the traffic packets so that they appeared to come from a legitimate ITE.

2 - Why it would be worse than useless.

For the ETE to use SEAL_ID to accept or reject packets would create an easy DoS vulnerability. Suppose the attacker wants to clobber the ability of ITE X to tunnel packets to ETE Y, before ITE X has done so.
The attacker crafts a single packet which appears to ETE Y to have been sent by ITE X, with some random SEAL_ID value. Then the ETE would use this value and reject any packets genuinely coming from ITE X, because the real ITE X would have chosen a different random starting point for its SEAL_ID.

Just as there is no way of fully preventing attackers from sending packets to any host now, with any source address they choose, nor is there any way of preventing such problems with a CES system. Furthermore, since in a CES system EUNs (End User Networks) using "edge" addresses could be connected to any ISP, if these ISPs are going to support these EUNs, then they need to allow the forwarding of all packets from these EUNs which have source addresses matching any "edge" address. At present, ISPs can do their bit to stop spoofing by dropping packets with source addresses not matching the prefix of the network they came from - but that can't be applied to packets with "edge" addresses in a CES system, unless the ISP is prepared to be extremely fussy and watch the mapping system to see which "edge" prefixes are currently being mapped to an ETR which serves the particular EUN.

>> My view is that for IPv4, RFC 1191 PMTUD is an excellent system -
>> except that the PTB message should be made to follow the RFC 1981
>> requirement of sending back as much of the original packet as would
>> not make the PTB exceed 576 octets:
>
> How can a system with a going-in strategy of *throwing
> away good data* be "excellent"?

Because it is unreasonable of hosts or networks to emit some kinds of packets - specifically, any packet which is too long for the path to the DH, where the host or network expects the rest of the network to fuss about chopping the packet into fragments, then to carry those fragments, and then for the DH to have to reassemble those fragments - which is a complex task. The Post Office has maximum packet sizes, and so does the IPv6 Internet.
I think DF=0 packets were always a mistake. I guess they made sense in the early days of very dumb hosts, but I think it is a host responsibility to alter its behaviour so as to send packets which are the right size for the network to deliver in a single piece. If the network dutifully fragments DF=0 packets and attempts to deliver them - as it does in IPv4 - then this allows, and therefore encourages, hosts to send packets which require this inefficient, unfair and unreliable form of handling.

Likewise, if the network attempted to deliver packets which were too big - by sending a PTB and then fragmenting them so the final packet could be reassembled at the DH - this would allow, and therefore encourage, SHs to keep sending such packets. It would also involve the DH getting some of the first packet again in the second, since the SH couldn't be sure that the longer packet was successfully reassembled at the DH from its fragments. The too-big packet should be thrown away - except for sending enough of it back to the SH to enable the SH to authenticate the PTB.

Also, since there can be multiple levels of tunnel, I think the PTB should contain a few hundred bytes of the packet. Then the SH of an inner tunnel - which is a router in the path of an outer tunnel - can construct a PTB to its own sending host (the ingress router of a still further outer tunnel) which contains enough information not just for authenticating the PTB, but to allow the ingress router of the outermost tunnel to construct a PTB with sufficient length for the real SH to authenticate. Without this, ingress tunnel routers would need to cache substantial parts of the packets they send in order to be able to generate a valid PTB. I guess many IPv4 tunnels don't do this, which is part of the reason for the lousy PMTUD situation today.
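The size budget behind "a few hundred bytes" can be illustrated with a little arithmetic. Each ingress that translates a PTB outward strips one layer of tunnel headers from the quoted data, so the innermost router's PTB must quote the SH's required amount plus the sum of all the tunnel overheads. The function name and the header sizes in the example are assumptions for illustration, not values from any spec.

```python
def required_ptb_quote(sh_quote, tunnel_header_sizes):
    """Bytes of the offending packet the innermost router's PTB must
    quote so that, after each ingress strips its own tunnel headers
    while translating the PTB outward, the original sending host still
    receives `sh_quote` bytes of its own packet."""
    return sh_quote + sum(tunnel_header_sizes)

# Two nested tunnels, each assumed to add outer IPv4 (20) + UDP (8) +
# SEAL (8) = 36 bytes: to leave the SH a 128-byte quote, the innermost
# PTB must quote 128 + 36 + 36 = 200 bytes of the packet it dropped.
```

With several levels of tunneling and a generous per-host quote, this total easily reaches a few hundred bytes, which is the point of the argument above.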
I think that DF=0 packets should be deprecated, and that the RFC 1191 PTB message should be revised to return the same ~540 bytes of the packet which are returned in an IPv6 RFC 1981 PTB.

I am opposed to what I think you are attempting with SEAL, in at least some circumstances - fragmenting or segmenting packets which are too long, without an attempt to tell the sending host to send a suitably shortened packet. DF=0 packets do not allow the SH to be told this - so I think they should be deprecated. They were considered unworthy of inclusion in IPv6 in the mid-1990s. I think they should have been deprecated in IPv4 from that time.

>> If the RFC 1191 designers had correctly anticipated the need for one
>> or more levels of tunneling to support their PMTUD system, then I
>> think they would have altered the PTB requirements to be as long as
>> those for IPv6's RFC 1981. Then we probably would have tunnels today
>> which properly support RFC 1191 PMTUD.
>
> But, if any one of those tunnels uses IPsec encryption
> or the like there is no opportunity for performing the
> necessary translation function. So if there were a
> decent segmentation and reassembly capability it seems
> like IPsec implementations would be wise to use it.

If an IPSEC ingress tunnel router can't decipher the initial part of the packet returned in a PTB, then it needs to cache a copy of the initial part of each packet it receives as input to the tunnel, for the purpose of constructing a PTB which would be recognised by a SH. That cached portion also needs to be long enough that, if this tunnel is nested within other tunnels, by the time this router's PTB contents have been passed back up the line to the other ingress tunnel routers, the outermost one still has enough of the original traffic packet to generate a valid PTB for the SH. Without this, the IPSEC tunnel is not supporting RFC 1191 / RFC 1981 PMTUD.
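A sketch of that caching idea: the ingress keeps the head of each inner packet it tunnels, keyed by something it can recover from a PTB arriving from inside the tunnel. Everything here is my assumption - the class name, the choice of the outer IPv4 Identification as the key, the 540-byte head, and the capacity - none of it is specified for IPsec or SEAL.

```python
from collections import OrderedDict

HEAD_BYTES = 540  # roughly the RFC 1981-style quote the SH will need

class IngressHeadCache:
    """Sketch: an IPsec-style ingress keeps the head of each inner
    packet it tunnels so it can synthesize a PTB for the sending host
    when it cannot decipher the encrypted data quoted in a PTB from
    inside the tunnel.  Keyed here (an assumption) by the outer IPv4
    Identification the ingress used when encapsulating."""

    def __init__(self, capacity=4096):
        self._heads = OrderedDict()  # outer ident -> inner packet head
        self._capacity = capacity

    def remember(self, outer_ident, inner_packet):
        self._heads[outer_ident] = bytes(inner_packet[:HEAD_BYTES])
        while len(self._heads) > self._capacity:
            self._heads.popitem(last=False)  # drop the oldest head

    def build_ptb_quote(self, outer_ident):
        """Return the cached inner-packet head to quote in a PTB back to
        the original sending host, or None if it has been evicted."""
        return self._heads.get(outer_ident)
```

The cost of this is exactly the "caching substantial parts of the packets they send" mentioned earlier, which is why a longer quoted portion in PTBs would let tunnels avoid it.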
I understand such support is mandatory, so how could a self-respecting tunnel, IPSEC or not, fail to support it?

The world turns to rot if RFC 1191 and RFC 1981 PMTUD are not supported. People have to write messy things like RFC 4821 in an effort to get around the mess caused by such tunnels, or by the filtering of PTBs. I think we should not be so defensive about stuff happening in the network that we allow it and adapt to cope with it, when the practices are fundamentally inefficient and do not support the best way of doing things.

>> Also, I think that DF=0 packets should be deprecated - unless perhaps
>> they are shorter than some constant such as 1200 bytes or so. I
>> think it would be bad to expect ITRs and ETRs and the whole CES
>> system to work over paths with MTUs below this. People shouldn't use
>> such short PMTU links in the DFZ, and shouldn't place their ITRs or
>> ETRs anywhere where there are such short PMTU links between them and
>> the DFZ.
>
> DF=0 has two benefits - it can allow good data to
> get through in cases where DF=1 would have dropped
> the data, and it can allow MTU indication through
> to the ETE which can report back to the ITE.
>
>> My view is that for IPv6, RFC 1981 is an excellent system.
>
> How can a system that places blind faith in the network
> be "excellent"?

People pay to use services with tunnels. If the tunnels screw up the only efficient, practical approach to PMTUD (RFC 1191 / RFC 1981), then people shouldn't use such tunnels or pay for any service which uses them. If they do, then it's all downhill from there - with people fixing packet lengths to avoid trouble, and busying themselves updating stacks and applications to no longer rely on the perfectly good RFC 1191 / RFC 1981 PMTUD approach (assuming RFC 1191 had mandated returning ~540 bytes of the packet).
>> From your research (msg05910), it seems that the current state of
>> PMTUD in IPv4 is a shambles - with some networks blocking PTBs, some
>> tunnels (or combinations of tunnels) not generating PTBs, and with
>> some hosts ignoring PTBs, or not responding properly to them. Also,
>> some hosts send DF=0 packets of 1470 bytes (Google at least).
>>
>> As far as I know, everything generally works because many hosts are
>> configured not to send packets long enough to run into PMTU problems.
>
> Agree.
>
>> From the current basis, there's no way we can generally adopt
>> jumboframe paths in the DFZ as they appear.
>
> Also agree.
>
>> Nor is there a way of introducing a tunneling-based CES architecture
>> which relies for its PMTUD on PTBs. My IPTM approach and I think
>> your SEAL approach should be able to cope without relying on PTBs
>> from within the tunnel (but see my forthcoming message). But what if
>> the ITRs (ITEs) can correctly sense the PMTU to the ETRs (ETEs) but
>> are unable to alter the sending host's packet lengths?
>>
>> This could be due to:
>>
>> 1   A PTB sent by the ITR is dropped by some filtering system
>>     before it can get to the SH. This seems more likely if
>>     the ITR is outside the ISP or end-user network where the
>>     SH is located than within it.
>>
>>     If people filter PTBs from entering their system, or use an
>>     ISP which does the same, this is their own fault.
>>
>>     The trouble is, they get away with it now, because the packets
>>     their hosts send are generally short enough not to run into MTU
>>     problems. Unfortunately, such networks will perceive the
>>     difficulties resulting from their choices as being caused by
>>     sending packets to a host with an SPI ("edge") address in the
>>     CES architecture - and may not think it is their own filtering
>>     which is causing the trouble.
>>
>> 2   The SH ignoring or responding incorrectly to the PTB.
>>
>>     As above - they get away with it now, and would perceive the
>>     problem as being caused by the destination network which
>>     is using the CES system's "edge" space.
>
> Cases 1 and 2 are a problem of the end site and not of
> the ITE. If the ITE as an edge router of the site is
> sending PTBs and the source host is either not
> getting them or not responding correctly, then the end
> site has to find the problems and fix them.

I agree.

>> 3   The SH sends DF=0 packets which are too long, after
>>     encapsulation, for some, many or all paths to ETRs.
>>
>>     Again, as above, they get away with it now - but would blame
>>     the CES system, or rather the destination network which they
>>     may not know has adopted the "edge" space provided by the
>>     CES system.
>>
>>     So does a CES system have to fragment every such packet?
>>     It seems so.
>
> The CES needs to select a "safe" size for performing inner
> fragmentation while not choosing one so excessively small
> as to invoke inner fragmentation very often.

Then you are always making things less efficient than they could be, and ruling out the use of jumboframe paths in the DFZ until such time as every path is jumboframe compatible - which may be never.

I am opposed to the CES scheme continually fragmenting packets if we can possibly avoid it. Maybe we have to for DF=0 packets which are 1470 bytes, when the CES scheme can only get a little less than this into each tunnel packet. But this would be BAD, considering that Google sends out a lot of 1470 byte DF=0 packets. I imagine Google could be talked into lowering this, or better still into using DF=1.

>> I think that to implement defensive, complex protocols such as RFC
>> 4821 would be to accept and allow all these bad practices, and would
>> forever doom us to having to do extra work, and suffer extra
>> flakiness, just because of these bad practices.
>>
>> RFC 4821 will always be a slower and less accurate method of
>> determining the PMTU to a given host than RFC 1191 or RFC 1981. It
>> would also be subject to choosing a lower-than-proper value if there
>> was an outage for a while and it interpreted this as a PMTU
>> limitation.
>
> My belief is that SEAL used correctly has a chance
> to establish a minimum "Internet cell size" of 1500.

I can't see how you could do this, since there will always be 1500 byte MTU limits in the DFZ, in ISPs and in other networks for years to come, and there will at times be tunneling, such as with PPPoE in DSL services.

> Then, if end systems adopt the strategy of "use
> classic PMTUD for packets no larger than 1500 and
> use RFC4821 or equivalent for packets larger than
> 1500" then we would have a path to an MTU-clean
> Internet that can scale to any future packet sizes.

I think this would be very messy. Some hosts would be putting out 9 kbyte packets and ignoring PTBs, just to see if they could get them to the other host - and trying several times to make sure any failure was due to a genuine MTU problem and not to random packet loss.

  - Robin

_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg