Short version: Debating various aspects of LISP with Noel. I made two minor changes to the <=500 word version of my LISP critique, which I will send in a separate message.
Hi Noel,

Thanks for your message, in which you wrote:

> Hi, one observation, and a few comments on your specific text.
>
> The observation is that you seem to focus on the ALT, and indeed open with
> detailed discussion of the ALT's problems.

Yes - because there's no sign of a better mapping system for LISP than ALT.

LISP began as a framework, with conceptual placeholders for mapping systems. NERD was one such system. (BTW, since Eliot's msg05274 November announcement he has updated the ID twice, I guess preparing it to be an experimental RFC.)

I think CONS was the first mapping system to be supported by multiple LISP people. It was a global query-server system, with new network elements and protocols.

ALT replaced CONS in December 2007 as the mapping system the main LISP team focused on. It too is a global query-server system, but its global network is made from router and tunnel functions which already exist. The LISP test network uses ALT, and there is a LISP-WG devoted to developing ALT as an experimental protocol.

I wrote about my concerns with ALT's "aggressive aggregation" structure, involving long paths, nearly two years ago - 2008-01-28:

  http://www.ietf.org/mail-archive/web/rrg/current/msg01171.html

and have continued to raise this, along with the contradiction between the "aggressive aggregation" requirement (which is in the ID) and the need to avoid single points of failure. The most recent version of this critique was on the LISP list on 2009-12-07:

  ALT structure, robustness and the long-path problem
  http://www.ietf.org/mail-archive/web/lisp/current/msg01801.html

There was no proposal for any change to ALT which would solve these problems, and one LISP developer suggested (msg01846) that "it would be great to see someone working on testing a non-alt mapping system."

As far as I know, no-one in the main LISP team has developed a better mapping system or suggested that one should or will be developed, so LISP needs to be evaluated in terms of ALT.
> LISP currently has a service interface to the mapping system specified
> (the Map-Server/Map-Resolver interface), which hides the details of the
> mapping system from the rest of the system (e.g. the xTRs). This interface
> is intended, in part, to allow replacement of the mapping system with a
> better one. There is at least one proposed replacement currently circulating
> in the LISP community.

Please mention what this is, with references.

> In light of that, you might want to move the ALT discussion to the
> end, and clearly separate it from the discussion of LISP as a
> whole.

Since I don't yet know what the alternative is, my critique is of LISP with ALT.

>> ALT is a mapping distribution system with globally distributed query
>> servers: ETRs and Map Servers.
>
> Also Map-Resolvers, which are the places ITRs go to ask for mappings.

OK. The first sentence of V2 of my <=500 word critique is now:

  LISP-ALT distributes mapping to ITRs via (optional, local,
  potentially-caching) Map Resolvers and globally distributed query
  servers: ETRs and optional Map Servers.

>> ITRs drop the packet(s) they have no mapping for.
>
> 'currently drop'; obviously, it's easy enough to change this behaviour, one
> xTR at a time, if it proves problematic in service.

I agree "currently drop" is more informative, since I recall the ID allows both dropping and buffering while awaiting the Map-Reply. The test network runs by dropping the packets. I understand this is because dropping is preferable to holding onto packets for a potentially long time - by the time they were tunneled, they would likely be more disruptive than helpful to the sending host's attempt to communicate with the destination host.

I am describing ALT as it is currently judged, by the LISP team, to function best. I agree with this judgement, and I don't think that buffering would help.
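The drop-on-miss behaviour described above can be sketched as follows. This is an illustrative toy model, not real xTR code; all class and method names here are hypothetical.

```python
# Toy sketch of an ITR's forwarding decision on a map-cache miss,
# as described above: drop the packet and issue a Map-Request,
# rather than buffer the packet until the Map-Reply arrives.

class ITR:
    def __init__(self):
        self.map_cache = {}     # EID-prefix -> RLOC
        self.pending = set()    # EID-prefixes with an outstanding Map-Request

    def forward(self, packet):
        eid = packet["dst_eid"]
        rloc = self.map_cache.get(eid)
        if rloc is not None:
            return ("encapsulate", rloc)    # tunnel to the ETR's RLOC
        if eid not in self.pending:
            self.pending.add(eid)
            self.send_map_request(eid)      # ask via the Map-Resolver
        return ("drop", None)               # current behaviour: drop, don't buffer

    def send_map_request(self, eid):
        pass  # stub: would hand the query to the local Map-Resolver

    def map_reply(self, eid, rloc):
        self.map_cache[eid] = rloc
        self.pending.discard(eid)

itr = ITR()
print(itr.forward({"dst_eid": "eid-A"}))   # no mapping yet, so the packet is dropped
itr.map_reply("eid-A", "rloc-1")
print(itr.forward({"dst_eid": "eid-A"}))   # mapping now cached, so it is encapsulated
```

The point of the sketch is only that the first packet (or packets) to a new EID never reach the destination - the sending host must retransmit after the mapping arrives.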
So if I were to contemplate buffering in my critique, I would have to add a sentence about how this is probably a worse option than dropping. Without the 500-word limit, I would do this.

>> These "initial packet delays" reduce performance and so create a major
>> barrier to voluntary adoption on a wide enough basis to solve the routing
>> scaling problem.
>
> This is an assertion, not empirical data.

Sure. The question of how widely LISP-ALT could be voluntarily adopted is something which could only be answered empirically in the future. Since we have no time machines, you can't expect "empirical data" on this!

I know there is a view that the delays which can easily be foreseen in LISP-ALT are not significant enough to prevent it being adopted widely enough to solve the routing scaling problem. I and some other people disagree with this viewpoint.

It's a fact that delaying any packet reduces performance. I assert that in almost any imaginable communication, with the possible exception of NTP, if the initial packet in a new session is delayed by a few tens of milliseconds - which is the most it would usually take for APT or Ivip ITRs to get mapping from their local full-database query server (Default Mapper or QSD) - this is insignificant. If it were 100ms or so, I would say it *might* be significant in some instances.

With a global query-server network, the delay in getting mapping will frequently be longer than this, since the answer has to come from the other side of the planet, which typically takes 350 to 400ms. That is significant, I believe. With ALT, the time could be longer, because the path taken up and down the ALT hierarchy is quite likely not to follow the shortest (in BGP terms) path to the destination query server.

In any global query-server system, and especially with ALT's paths, which are likely to be longer still, there is an increased chance of the query packet or the response being dropped.
In ALT, the response comes back via the Internet, so the ALT long-path problem only exacerbates the chance of the query being lost.

So I assert that the delay times which ALT will often impose on the "initial packet" are significant. But "initial packet delay" is an oversimplification. The initial one or more packets are *dropped* by the ITR. After the mapping arrives at the ITR - which could take the better part of a second, maybe a second and a half, or maybe many seconds if a packet is lost and the ITR times out and retries - the ITR has to wait for the sending host to send another packet.

Maybe the sending host has given up on this destination and is trying an alternative destination host, in another EID prefix which the ITR has no mapping for. (A caching Map Resolver may help, but there would still be many instances where it does not have the mapping in its cache.) So the whole process would begin again. If the ALT system takes a second to get the mapping and the sending host doesn't resend to the same address (out of multiple options) within a second, then the sending host will go through all its possible destinations without reaching any of them. This is unlikely, but it is not impossible.

I assert that any global query-server system for mapping lookups will involve a significant performance degradation - sufficient to affect the experience of users. I also assert that even if the measured impact on end-users is minimal, the perception of this reduced performance will significantly reduce the chance of widespread voluntary adoption, to a degree which threatens the ability of the system to solve the routing scaling problem.

In short, if LISP-ALT is perceived to be a second-class service in terms of something like responsiveness, then there's very little chance of the great majority of end-user networks (who want portability, multihoming and TE) of all sizes adopting it.
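The back-of-envelope arithmetic behind the delay argument above can be made explicit. The local-query and cross-planet figures are the ones quoted in the text; the ALT path-stretch factor and the Map-Request retry timeout are illustrative assumptions, not measured values.

```python
# Rough first-contact delay arithmetic for the scenarios discussed above.
# local_query_ms and global_rtt_ms come from the text; alt_stretch and
# retry_timeout_ms are assumed, illustrative values.

local_query_ms   = 30      # APT / Ivip: local full-database query server
global_rtt_ms    = 400     # answer from the other side of the planet
alt_stretch      = 1.5     # ASSUMED: ALT path longer than the shortest BGP path
retry_timeout_ms = 2000    # ASSUMED: Map-Request retransmission timeout

best_case_alt_ms = global_rtt_ms * alt_stretch           # no loss
one_loss_alt_ms  = retry_timeout_ms + best_case_alt_ms   # one lost query

print(f"local query server:  ~{local_query_ms} ms")
print(f"ALT, no loss:        ~{best_case_alt_ms:.0f} ms")
print(f"ALT, one lost query: ~{one_loss_alt_ms:.0f} ms")
```

On top of whichever of these applies, the sending host's own retransmission timer adds further delay, since the initial packet was dropped and must be resent once the mapping is in place.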
We really need most of these networks to adopt it voluntarily - otherwise there will be too many who are still using unscalable PI space, or not getting portability etc., for it to solve the routing scaling problem.

>> No solution has been proposed for these problems
>
> As mentioned, there is a proposal circulating in the LISP community for an
> alternative mapping system which fixes many of the problems you mention (and
> one I think you don't, the concentration of query traffic at the top-level
> ALT nodes).

I don't recall any solution being proposed in the LISP list discussion which arose from my 2009-12-07 message. AFAIK, there's no mention of a fix for these problems in the documents listed in the RRG summary for LISP:

  http://www.ietf.org/mail-archive/web/rrg/current/msg05503.html

>> with UDP and variable-length LISP headers in all traffic packets.
>
> Ah, no - user-data packets have a fixed-length LISP header (currently 64
> bits, IIRC).

OK - thanks for this correction. It is 64 bits for both IPv4 and IPv6:

  http://tools.ietf.org/html/draft-ietf-lisp-05#section-5.1
  http://tools.ietf.org/html/draft-ietf-lisp-05#section-5.2

The new version is:

  with UDP and 64-bit LISP headers in all traffic packets.

>> the MN cannot be behind NAT.
>
> This is incorrect. A mechanism is of course needed to ascertain the
> 'external' address of the ETR, and a possible one has been coded and
> field-tested, but an ETR can be behind a NAT (and IIRC there are current
> test deployments of this).

You don't provide any references. I assert the ETR can't be behind NAT in any system which is practical enough for widespread voluntary adoption.

The ETR needs to be reachable from multiple ITRs, and any self-respecting NAT box is well within its rights not to translate packets which arrive from some other ITR, even if, by some means, the ETR was able to send something first to one ITR to get the NAT box to accept packets from that first ITR. How could the ITRs send packets to the NAT box?
The ETR's address is being NATed, so the ETR would have to figure out the NAT box's global unicast address and put that in the mapping. Then an ITR would send an encapsulated packet to the LISP UDP port on the NAT box. How is the NAT box going to know what to do with it? It would have to have been cajoled into this by the ETR sending out a UDP packet to some other host (such as a STUN server) and receiving a UDP packet back from that host. But why would the NAT box translate packets to the ETR if they arrived from some address other than this host?

The ETR can't very well send out packets to every ITR which needs to send it packets, for obvious reasons of performance and delay. How could the ETR find out an ITR needs to send it packets unless the ITR can reach it through the NAT box?

What if there were two or more LISP-MNs behind the one NAT box? ITRs only tunnel to a single UDP port, so there's no way you could have two MNs, each being its own ETR, behind a single NAT box. Also, LISP-MNs need to be on RLOC space.

With TTR Mobility, none of these problems occur. The MN can be behind one or more layers of NAT, since it tunnels to a (typically) nearby TTR, which acts as its ETR and also sends outgoing packets. The MN can be on SPI (EID) space. It can be behind a NAT box which is on SPI space. It can be on address space which itself is SPI space of another MN. For instance, an aircraft could have a NAT box which is on SPI space (the aircraft's NAT box is the MN and tunnels to its own TTR on the ground) and an MN could be working fine, with its own micronets of SPI space, behind that NAT box. A further MN could be operating from one of those SPI addresses!

>> which LISP cannot achieve.
>
> Also an assertion.

The full sentence is:

  Mapping changes must be sent instantly to all relevant ITRs every
  time the MN gets a new address - which LISP cannot achieve.
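The NAT reachability argument above can be illustrated with a toy model of an address-restricted NAT: it only forwards inbound UDP from peers the inside host has already sent to, so a STUN-style exchange opens the path to the STUN server but not to the many ITRs that need to reach the ETR. This is a hypothetical sketch of one common NAT filtering behaviour, not a claim about every NAT implementation.

```python
# Toy model of address-restricted NAT filtering, as argued above:
# inbound packets are forwarded only if the inside host (the ETR)
# previously sent a packet to that peer's address.

class AddressRestrictedNAT:
    def __init__(self):
        self.allowed_peers = set()

    def outbound(self, dst):
        # An outgoing packet creates a filtering binding for that peer.
        self.allowed_peers.add(dst)

    def inbound(self, src):
        # Forward only packets from peers the inside host has contacted.
        return src in self.allowed_peers

nat = AddressRestrictedNAT()
nat.outbound("stun-server")        # ETR learns its external address via STUN
print(nat.inbound("stun-server"))  # reply path to the STUN server is open
print(nat.inbound("itr-1"))        # an ITR's encapsulated packet is refused
print(nat.inbound("itr-2"))        # and so is every other ITR's
```

The ETR could open the path to one ITR by sending to it first, but it cannot do this for every ITR in the world, and it cannot learn that a new ITR needs to reach it in the first place.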
There is an implicit assumption that mobility requires no breaks in connectivity as long as the MN has at least one address which is working. Every time a LISP-MN gets a new RLOC address, if connectivity is to be continued (and even a fraction of a second's loss is a problem if the MN is doing VoIP) then all the ITRs in the world need to suddenly change their mapping to the new RLOC address. Actually, to avoid gaps of up to ~200ms, due to the time it takes packets to get to the ETR from ITRs at various distances, the ITRs need to change their tunneling a fraction of a second before the MN gets its new address.

No mapping system for LISP is capable of changing ITR tunneling behavior in real time. Even Ivip would take between 0.2 and a few seconds.

But with the TTR mobility system, there's no need to change mapping when the MN gets a new address, no matter where it is. The mapping change is only needed - and not particularly urgently - when the MN selects a new TTR, which it would normally only do if it moves a distance such as 100km or more. LISP could work with the TTR mobility architecture, but Ivip would be better, because the MN's tunnels to the previous TTR could be dropped much sooner than with LISP.

I stand by my assertion. If there are arguments against it, they can be mentioned in the "rebuttal" and in further discussions.

  - Robin

_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg