Hi Noel,

I am glad you have written another critique of LISP. I think it confirms some things I wrote, and that it adds some things I didn't have space for.
There's no way either of us would be happy with only the other's critique, so I think they should both be in the RRG report. There's no physical shortage of space, since the LISP summary was only 206 words.

You wrote of my critique:

   too much focus on minor, passing problems, and not enough
   attention to unavoidable architectural limitations

but that is exactly how I would describe yours!

I will respond to the full 875-word version:

> LISP is an architectural enhancement to the Internet; it provides an
> identification/location separation scheme

Whoa . . . we disagree already! I argue here:

http://tools.ietf.org/html/draft-whittle-ivip-arch-03#section-3.7

that "Locator / Identifier Separation" refers to Core-Edge Elimination schemes and not to LISP. If you are ready for further aggravation, take a look at this unedited attempt to establish this beyond doubt:

http://www.firstpr.com.au/ip/ivip/loc-id-sep-vs-ces/

but this is just a question of the name, not a criticism of the LISP architecture(s).

> LISP is an architectural enhancement to the Internet; it provides an
> identification/location separation scheme which is intended to meet the goals
> of both i) practical short-term deployability and ii) long-term
> growth/flexibility.

OK.

> It is based on encapsulation, and is intended for deployment at the edges,
> initially between sites and the core (to minimize required changes to
> deployed base), although deployment at hosts (in particular, mobile hosts) is
> also planned.

OK. However, there are no plans to allow an ITR function in hosts, as Ivip does.

> It currently consists of devices which wrap and unwrap user traffic (ITRs and
> ETRs), devices which interface to a mapping system (Map-Resolvers and
> Map-Servers), and a prototype mapping system, ALT, which re-uses existing
> technology (BGP and tunnels) to allow mappings to be distributed.

OK - so you are not discussing NERD or any other mapping system at present. ALT is indeed a prototype.
I believe no-one should consider it scalable to very large numbers of end-user networks. So why do the LISP folks bother developing it? It's been 3 years since the first LISP ID:

http://tools.ietf.org/html/draft-farinacci-lisp-00

and ALT dates from November 2007. The LISP-WG is supposed to develop ALT and the base LISP specification as experimental RFCs by March 2010 - and you acknowledge below that ALT will be replaced by a DNS-based mapping system which you believe will be superior.

> It initially uses existing namespaces (IPv4 and IPv6) for both identity and
> location; existing namespaces are chosen to reduce the initial deployment
> difficulty, and both IP versions are supported to maximize the applicability
> of LISP.

OK - I agree, except I don't know what you mean by "initially". We disagreed about this previously, and I tried to document the discussion and my arguments here:

http://www.firstpr.com.au/ip/ivip/namespace/

LISP for IPv4 always uses the one namespace for its host addresses and ETR addresses (sometimes considered Identifiers and Locators) - the namespace by which IPv4 global unicast addresses are interpreted. Likewise for IPv6.

Hosts and all ordinary routers make no distinction between the subset of addresses which are EID addresses (used to identify the subset of hosts which are on LISP-mapped addresses in the end-user networks which adopt LISP) and the remainder of the global unicast addresses, which I refer to as "conventional global unicast addresses". Within LISP these are known as RLOC (Routing Locator) addresses, despite the fact that only a few of them are used by ETRs, and that many hosts (not in LISP-using networks) still use them as their identifiers as well.

Only ITRs treat the EID subset of the global unicast addresses differently - when a packet arrives with such an address in its destination field.

> Any list of LISP concerns is somewhat evanescent, as constant changes are
> being made based on lessons learned in actual deployment.
To some extent this is true - but what lessons have been learnt?

The whole of LISP is based on the idea that we can't, or shouldn't, get mapping to ITRs in real time. If we could, there would be no need for multiple ETR addresses, or for each ITR to try to figure out, on its own, which of various ETRs can be used to reach the end-user network.

LISP also assumes that the outer header's source address must be that of the ITR, which leads to ETRs having to replicate any source address filtering which ISP BRs apply to packets arriving from the DFZ.

The LISP folks did adopt PTRs around November 2007, after I suggested the same concept on June 15 - but until then, they either ignored or criticised the idea. In the same message:

http://www.ietf.org/mail-archive/web/ram/current/msg01518.html

I described TTR Mobility. There's no evidence they learnt anything from this, because the LISP-MN ID has the MN being its own ETR, which is full of problems.

I agree that the team learns from problems they find in the test network - but I see no evidence they learn from others' arguments that some of their fundamental architectural choices were wrong, dooming them to adding more and more complexity in an effort to make LISP work.

My critique was not of any details of LISP which have changed since ALT's inception in late 2007 - though LISP-MN dates from July 2009.

> In particular,
> potential problems for which there are local, incremental fixes (i.e. no need
> for global coordination, such as protocol changes) are being by-passed until
> operational experience shows that they actually need to be handled.

I don't clearly understand this, but "operational experience" with a test network will not reveal the scaling problems which ALT faces.

I am not suggesting that ALT and its test network shouldn't be developed.
I am pointing out that there are no fixes for the problems which will prevent ALT from being a good enough mapping system for a really large-scale deployment. So I think no-one should claim that LISP is the best solution to the scalable routing problem until a complete system, including its mapping system, is proposed which doesn't have such obvious problems.

> A good example is the handling of packets which arrive at a LISP device which
> does not yet have an identity->location mapping for the destination; such
> packets are currently discarded. If this proves to have a significant
> performance impact (predictive opinions differ), it is easy to change this so
> that such packets are buffered, waiting for a mapping to be returned. The
> LISP team in fact has a moderately lengthy list of such items (roughly a
> dozen or so), but since they are not significant they are not covered here.

Sure. There are arguments for and against this. I suggest buffering for half a second to a second or so, while watching for any further packets the same host sends to the same destination host, and buffering each newer one in place of the old one. This way, if the mapping arrives fast enough, the original packet, or the most recent of the resent ones, will be tunneled. Otherwise, the ITR sits there with the mapping and waits for the sending host to retry.

There is lots of fiddly stuff in LISP because there is so much work for the ITR to do in choosing which of multiple ETRs to send to - and, of course, the difficulty of waiting an unknown amount of time for mapping to arrive, and perhaps having to send out a replacement request if nothing arrives in a few seconds.

All this would disappear if they could get mapping to the ITRs which need it in real time. I wrote a way of doing this in July 2007, and I have just written an improved version:

http://tools.ietf.org/html/draft-whittle-ivip-fpr-00

but the LISP project continues as if this is impossible or undesirable, without ever saying why.
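The buffering scheme suggested above is simple enough to sketch. This is a minimal Python illustration, not anything from a LISP or Ivip specification: the `PendingBuffer` class, its 0.5 second hold time and the addresses are all hypothetical. It keeps at most one packet per (source, destination) pair, lets a resent packet replace the one already held, and on arrival of the mapping releases only packets young enough to be worth tunneling.

```python
import time
from dataclasses import dataclass, field

@dataclass
class PendingBuffer:
    """Hypothetical ITR buffer for packets awaiting mapping: at most one packet
    per (source, destination) pair, the newest replacing the older one."""
    hold_s: float = 0.5                          # assumed hold time
    pending: dict = field(default_factory=dict)  # (src, dst) -> (packet, arrival time)

    def enqueue(self, src, dst, packet, now=None):
        now = time.monotonic() if now is None else now
        # A resent packet from the same host to the same destination
        # replaces the one already held.
        self.pending[(src, dst)] = (packet, now)

    def on_mapping(self, dst, now=None):
        """Mapping for dst has arrived: remove every packet held for dst and
        return those still within the hold time, ready to be tunneled."""
        now = time.monotonic() if now is None else now
        ready = []
        for key in [k for k in self.pending if k[1] == dst]:
            packet, arrived = self.pending.pop(key)
            if now - arrived <= self.hold_s:
                ready.append(packet)
        return ready

buf = PendingBuffer()
buf.enqueue("10.1.1.1", "192.0.2.10", "SYN", now=0.0)
buf.enqueue("10.1.1.1", "192.0.2.10", "SYN (resent)", now=0.3)
print(buf.on_mapping("192.0.2.10", now=0.6))  # ['SYN (resent)']
```

Packets which have waited longer than the hold time are simply dropped when the mapping arrives; the sending host's own retransmission then goes out using the now-cached mapping. A real ITR would also need to expire entries for which no mapping ever arrives.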
> The protocols also have a great deal of flexibility built in, to allow
> incremental changes guided by experience and changing circumstances.

Translation: The protocols are overly complex because the architecture requires difficult or impossible things of its ITRs - and because the protocols have been changed from time to time to solve problems which had not been foreseen, and to accommodate the requests of people outside the main LISP team who had different views on how things should be done.

> A good
> example is the user-data headers, which constitute a low data-rate channel
> piggy-backed on existing traffic between the ITRs and ETRs. The fields can be
> shared between a number of uses - some as yet undefined, so that additional
> low-data rate control functions can be added as their need becomes obvious.

Ivip is superior in that it doesn't need these things. It uses plain IP-in-IP encapsulation, whereas LISP data packets need the outer IP header, the UDP header and the 8-byte LISP header.

> This critique will therefore focus on i) fundamental architectural
> limitations, and ii) potential problems where amelioration will require
> co-ordinated change; they are listed in rough order of significance.

OK!

> - LISP's most serious challenges are due to the fact that it is effectively a
> new packet-switching layer, with all the challenges (neighbour liveness
> detection, etc) that such layers bring - but with a much larger fan-out than
> is typical in packet-switching systems, since any ITR might communicate with
> any ETR.

Yes - this is a problem with any Core-Edge Separation scheme. There could be large numbers of ITRs tunneling to a single ETR, for one or many destination networks.

Having the ITRs figure out, from previously sent mapping options, which of multiple ETRs to tunnel to is a very difficult business, because an ITR has no direct way of knowing through which of the ETRs the destination network is reachable.
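A back-of-the-envelope sketch of that fan-out trade-off, assuming each ITR checks ETR liveness on its own by periodic probing. The ITR count, probe intervals and three-miss threshold are all hypothetical numbers chosen only to show the shape of the trade-off:

```python
MISSED_LIMIT = 3      # hypothetical: unanswered probes before declaring an ETR down
NUM_ITRS = 10_000     # hypothetical number of ITRs currently sending to one ETR

def etr_probe_load(num_itrs: int, interval_s: float) -> float:
    """Probes per second arriving at one ETR when every ITR probes it
    independently at a fixed interval."""
    return num_itrs / interval_s

def detection_time_s(interval_s: float, missed_limit: int) -> float:
    """Approximate time for an ITR to notice a dead ETR: missed_limit probe
    intervals (ignoring the timeout on the final probe)."""
    return interval_s * missed_limit

for interval in (1, 10, 60):
    load = etr_probe_load(NUM_ITRS, interval)
    print(f"probe every {interval:>2} s: {load:>6,.0f} probes/s at the ETR, "
          f"~{detection_time_s(interval, MISSED_LIMIT):.0f} s to detect failure")
```

Cutting the detection time tenfold multiplies the probe traffic at the ETR tenfold, and that traffic also grows linearly with the number of ITRs sending to it.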
There could be trouble between the ITR and the ETR - and some kinds of trouble can only be reliably detected by the ETR repeatedly failing to respond to some kind of request from the ITR. Other kinds of failure involve the link from the ETR to the destination network. But how is the ETR to tell the ITR about this - every ITR which needs to know - especially when the ETR could be handling traffic for large numbers of such networks, each with its own state of being reachable or not?

Also, when a Core-Edge Separation architecture is supporting a mobile host, this is not a multihoming service restoration situation - so the tunneling behavior of all the ITRs needs to be guided by a completely different mechanism than giving each one a list of ETRs and expecting each one to figure out which ones can be used to reach the destination network.

> There are three goals which are often in conflict: minimizing overhead,
> minimizing complexity, and maximizing performance. Mechanisms which meet one
> (e.g. performance) often fail another (e.g. overhead), due to the fan-out
> issues. Clever engineering (e.g. the use of the piggy-backed control channel)
> can handle many of these.

Any amount of clever engineering in the service of a poor architectural choice will always result in a lousy outcome, probably with lots more seemingly impressive engineering mechanisms and a poorer result.

A reasonable definition of good architecture is a series of high-level design choices which produce the best outcomes with the least effort - especially effort involving complexity, software, hardware and/or the sending of packets.

> Some of this is also under the control of users; if
> they want higher performance, and are willing to pay the overhead costs, they
> can change configuration to do so.
>
> - One important example of this is caching of mappings; this improves the
> performance, but introduces the problem of detecting, and replacing, outdated
> mappings.
> This is a very lengthy topic, which cannot be covered here in any
> detail.

The end-user network can set the caching time of its map replies to a low value. Assuming ITRs respect this (and some of them might be configured to ignore short caching times), the end-user network is then placing an unreasonable burden on all ITRs which are sending packets to its EID prefixes, and on the entire ALT structure between those ITRs and the end-user network's ETR or Map Server.

The end-user network doesn't pay any cost for this. The costs are borne by other parties. This is very similar to the problem we are trying to avoid - thousands or millions of uppity end-user networks advertising PI space in the DFZ, and especially them chopping and changing how they advertise it.

> - Although LISP does provide significant tools for multi-homing,
> load-sharing, optimal-entry-selection, etc, these currently depend on correct
> configuration; response to failures is also limited. It may be possible to
> ameliorate this problem with automated configuration, although this has not
> yet been examined.

After taking the correct turn at the first fork in the road - choosing Core-Edge Separation - LISP-ALT and LISP-NERD both took wrong turns at the next junction, where APT and Ivip correctly chose local full-database query servers.

I think the only way of saving LISP is to forget everything after that first correct turn, and to follow APT and Ivip in having full-database local query servers.

The next fork in the road involves choosing either slow or real-time mapping to the query servers and the ITRs which need it. Slow leads to many of the same troubles LISP is having now, due to the need for ITRs to figure out the reachability of end-user networks through ETRs, which they currently have no way of doing directly - and which large numbers of ITRs will never be able to do efficiently, since each one works in isolation. Slow also means more complex mapping information.
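To show the kind of per-destination logic that slow mapping pushes into every ITR, here is a simplified Python sketch. It is loosely modelled on the per-locator priority and weight values carried in LISP mapping data, but the `Locator` type, the probe-miss counter and its threshold of three are hypothetical, and real ITRs would differ in many details:

```python
import random
from dataclasses import dataclass

MISSED_LIMIT = 3  # hypothetical: consecutive unanswered probes => treat ETR as down

@dataclass
class Locator:
    etr: str          # ETR address
    priority: int     # lower value is preferred
    weight: int       # share of traffic among locators of equal priority
    missed: int = 0   # consecutive probes without a reply

def record_probe(loc: Locator, replied: bool) -> None:
    loc.missed = 0 if replied else loc.missed + 1

def choose_etr(locators, rng=random):
    """Weighted choice among live locators of the best (lowest) priority."""
    live = [loc for loc in locators if loc.missed < MISSED_LIMIT]
    if not live:
        return None
    best = min(loc.priority for loc in live)
    tier = [loc for loc in live if loc.priority == best]
    weights = [loc.weight for loc in tier]
    if sum(weights) <= 0:
        return rng.choice(tier).etr  # no usable weights: pick uniformly
    return rng.choices(tier, weights=weights)[0].etr

locs = [Locator("203.0.113.1", priority=1, weight=50),
        Locator("203.0.113.2", priority=1, weight=50),
        Locator("198.51.100.1", priority=2, weight=100)]
for _ in range(MISSED_LIMIT):          # both preferred ETRs stop answering
    record_probe(locs[0], replied=False)
    record_probe(locs[1], replied=False)
print(choose_etr(locs))  # 198.51.100.1
```

Even when every probe is answered, this tells the ITR only that the ETR is alive, not that the ETR can still reach the destination network - and every ITR sending to these ETRs runs the same guesswork independently.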
Real-time mapping means you only need to send a single ETR address, and you can make your own choices (or pay someone else to make them for you) about reachability, inbound TE, mobility or whatever it is you really want to do. Read all about it:

http://tools.ietf.org/html/draft-whittle-ivip-arch

> - LISP cannot easily test reachability of ultimate destinations (e.g. behind
> an ETR), only other LISP devices. It therefore is inevitably (and
> unavoidably) dependent on the correct functioning of any network
> infrastructure on the other side of a LISP device.

Yes - unless you backtrack and follow the path to real-time mapping to all ITRs which need it, via full-database local query servers, LISP will be doomed to more and more effort and complexity, trying to build more and more functionality into ITRs and ETRs so they can work together to reliably perform multihoming service restoration.

> - LISP is currently working through NAT boxes, but only in limited
> configurations. In particular, due to the use of fixed UDP ports, it is not
> possible to support more than one ETR behind a NAT box.

OK - thanks for confirming part of my critique of LISP-MN.

> (Although since
> multiple ETRs behind a single NAT box would present a single point of
> failure, it is not clear that this is a problem.)

I don't think that's a problem. The NAT box and the multiple MNs behind it are not part of the CES architecture. With the TTR Mobility architecture you can have as many MNs as you like behind a NAT box. Each one has one or more two-way tunnels to one or more TTRs, which are typically, but not necessarily, nearby. The LISP team could have read and copied TTR Mobility from my 2007-07-15 message.
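A toy Python model of the fixed-port limitation, assuming static inbound port forwarding (the `StaticNat` class and its addresses are hypothetical, and real NATs and LISP's NAT-traversal work are more involved). ITRs send LISP data packets to UDP destination port 4341, so an ETR behind a NAT is reached at the NAT's public address on that port, and one public port can be forwarded to only one inside host:

```python
LISP_DATA_PORT = 4341  # UDP destination port for LISP data packets

class StaticNat:
    """Toy NAT with static inbound port forwarding (hypothetical, for
    illustration): each public UDP port maps to exactly one inside host."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.forwards = {}  # public port -> inside address

    def add_forward(self, port: int, inside_ip: str) -> bool:
        """Forward `port` to `inside_ip`; fails if the port is already taken."""
        if port in self.forwards:
            return False
        self.forwards[port] = inside_ip
        return True

    def inbound(self, dst_port: int):
        """Inside host which receives a packet sent to public_ip:dst_port."""
        return self.forwards.get(dst_port)

nat = StaticNat("198.51.100.20")
print(nat.add_forward(LISP_DATA_PORT, "10.0.0.2"))  # True: first ETR
print(nat.add_forward(LISP_DATA_PORT, "10.0.0.3"))  # False: port already taken
print(nat.inbound(LISP_DATA_PORT))                  # 10.0.0.2
```

A second ETR (for instance a second MN acting as its own ETR) has nowhere to receive its traffic, whereas MNs which originate their own two-way tunnels to TTRs need no inbound forwarding at all.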
> - Namespace provider lock-in might result, due to the need to be able to look
> up identification (EIDs), unless some mechanism can be worked out to allow
> multiple competing providers to provide resolutions for any given segment of
> the identification name-space (perhaps as part of a new mapping system).

There's no such thing as a "namespace provider". There are no new namespaces in LISP - or in APT, Ivip or TRRP. There is no separate "identification name-space". EID addresses are in a set of BGP-advertised prefixes, and this set of prefixes (MABs - Mapped Address Blocks - in Ivip) is a subset of the global unicast address range. All such addresses are interpreted according to the same namespace.

I think you are referring to administrative arrangements: which organisation might run a Map Server covering the EID space of multiple end-user networks, and/or which organisation runs the part of the ALT network which covers a particular DFZ-advertised prefix containing EID prefixes of multiple end-user networks.

You need to have some organisation paying for PTRs to advertise these prefixes - but that organisation needs to be paid by the end-user networks which benefit. Ivip proposes arrangements for all this, and as far as I know, LISP doesn't:

http://psg.com/lists/rrg/2008/msg01158.html

> - Systems which query for mappings will inevitably have some performance
> impact. Even when all potential delay causes which can be handled are dealt
> with (e.g. the packet drop example at the top), attempts to communicate with
> a 'new' site will occasionally result in some unavoidable delay. Since the
> delay in those few cases will be of the same order as DNS delays, which are
> currently acceptable, this is probably not a significant issue.

I and others believe it is a significant problem - and we argue that the delays could be longer than with DNS, due to ALT's long-path problem and the greater chance of a query packet being lost in that system.
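A rough Python model of why path length matters for lookup delay, assuming independent loss at each hop and a fixed retry timeout. The hop counts, round-trip times, per-hop loss rate and 1 second timeout are all hypothetical, not measurements of ALT or DNS:

```python
def path_loss(hops: int, per_hop_loss: float) -> float:
    """Chance that a query or its reply is lost, assuming independent loss
    at each of `hops` hops in each direction."""
    return 1 - (1 - per_hop_loss) ** (2 * hops)

def expected_delay(rtt_s: float, timeout_s: float, p_loss: float) -> float:
    """Mean time to get a reply when each attempt is lost with probability
    p_loss and a lost attempt costs timeout_s before the retry.
    The number of lost attempts before success is geometric, mean p/(1-p)."""
    return rtt_s + timeout_s * p_loss / (1 - p_loss)

# Purely illustrative numbers.
for name, hops, rtt in (("short path", 4, 0.05), ("long path", 16, 0.25)):
    p = path_loss(hops, per_hop_loss=0.001)
    print(f"{name}: loss {p:.1%}, mean delay {expected_delay(rtt, 1.0, p) * 1000:.0f} ms")
```

A longer path hurts twice: the round trip itself is longer, and every extra hop is another chance to lose the query, which then costs a whole timeout.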
However, DNS lookups may involve queries and responses with multiple servers, which likewise adds up to longer paths.

> - The ALT mapping system has some potential performance and scaling issues
> (e.g. concentration of request load at the top-level nodes), although an
> interface is built into the system to allow replacement of the mapping
> system. Since a superior mapping system based on DNS is already in design,
> this is not felt to be a serious issue.

OK - so you admit that ALT is inferior to DNS lookup.

An advantage of ALT - beyond the fact that it could easily be constructed from existing building-blocks - is that it would be possible, in principle, to send the initial data packet along it, so it is actually delivered to the ETR without the ITR having to know the mapping yet. But this raises problems with the load of these long packets (and subsequently sent ones, before the ITR gets the mapping), which means the ALT network needs to be more capable. DNS doesn't offer such a facility.

DNS has always been an obvious choice for looking up mapping. Why wasn't it adopted in mid-2007? Why didn't the LISP team say anything positive about Bill Herrin's TRRP proposal, which also used a DNS-like mapping system?

A DNS-based mapping system is also a global query server system, and will also be slow and unreliable compared to using local full-database query servers. DNS lookup only provides non-real-time mapping, so you will still have to pre-load long mapping information into ITRs and give them the complex functionality they need to figure out, as best they can, the reachability of the destination network via various ETRs.

- Robin

_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg