Re: [rrg] Alternative LISP critique

Robin Whittle Thu, 21 Jan 2010 23:31:16 -0800

Hi Noel,

I am glad you have written another critique of LISP.  I think it
confirms some things I wrote, and that it adds some things I didn't
have space for.


There's no way either of us would be happy with only the other's
critique - so I think they should both be in the RRG report.  There's
not a physical shortage of space, since the LISP summary was only 206
words.

You wrote of my critique:

  too much focus on minor, passing problems, and not enough
  attention to unavoidable architectural limitations

but that is exactly how I would describe yours!



I will respond to the full 875 word version:

> LISP is an architectural enhancement to the Internet; it provides an
> identification/location separation scheme 

Whoah . . . we disagree already!

I argue here:

  http://tools.ietf.org/html/draft-whittle-ivip-arch-03#section-3.7

that "Locator / Identifier Separation" refers to Core-Edge
Elimination schemes and not to LISP.  If you are ready for further
aggravation, take a look at this unedited attempt to establish this
beyond doubt:

  http://www.firstpr.com.au/ip/ivip/loc-id-sep-vs-ces/

but this is just a question of the name, not a criticism of the LISP
architecture(s).


> LISP is an architectural enhancement to the Internet; it provides an
> identification/location separation scheme which is intended to meet the goals
> of both i) practical short-term deployability and ii) long-term
> growth/flexibility.

OK.


> It is based on encapsulation, and is intended for deployment at the edges,
> initially between sites and the core (to minimize required changes to
> deployed base), although deployment at hosts (in particular, mobile hosts) is
> also planned.

OK.  However there are no plans to allow an ITR function in hosts, as
Ivip allows.


> It currently consists of devices which wrap and unwrap user traffic (ITRs and
> ETRs), devices which interface to a mapping system (Map-Resolvers and
> Map-Servers), and a prototype mapping system, ALT, which re-uses existing
> technology (BGP and tunnels) to allow mappings to be distributed.

OK - so you are not discussing NERD or any other mapping system at
present.

ALT is indeed a prototype.  I believe no-one should consider it
scalable to very large numbers of end-user networks.  So why do the
LISP folks bother developing it?  It's been 3 years since the first
LISP ID:

  http://tools.ietf.org/html/draft-farinacci-lisp-00

and ALT dates from November 2007.  The LISP-WG is supposed to develop
ALT and the base LISP specification as experimental RFCs by March
2010 - and you acknowledge below that ALT will be replaced by a DNS
mapping system which you believe will be superior.



> It initially uses existing namespaces (IPv4 and IPv6) for both identity and
> location; existing namespaces are chosen to reduce the initial deployment
> difficulty, and both IP versions are supported to maximize the applicability
> of LISP.

OK - I agree, except I don't know what you mean by "initially".
Previously we disagreed and I tried to document the discussion and my
arguments here:

  http://www.firstpr.com.au/ip/ivip/namespace/

LISP for IPv4 always uses the one namespace for its host addresses
and ETR addresses (sometimes considered Identifiers and Locators) -
the namespace by which IPv4 global unicast addresses are interpreted.
Likewise for IPv6.

Hosts and all ordinary routers make no distinction between the subset
of addresses which are EID addresses (used to identify the subset of
hosts which are on LISP-mapped addresses in the end-user networks
which adopt LISP) and the remainder of the global unicast addresses,
which I refer to as "conventional global unicast addresses".  These
are known as RLOC addresses within LISP (Routing Locator addresses)
despite the fact that only a few of them are used by ETRs and that
there remain many hosts (not in LISP-using networks) which use them
as their identifier as well.

Only ITRs treat the EID subset of the global unicast addresses
differently - if a packet arrives with such an address in the
destination field.
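
As a rough illustration of that last point, here is a minimal Python
sketch of the only place where EID addresses are treated specially: the
ITR's check on the destination address.  The prefixes and the function
names is_eid() and itr_handle() are hypothetical, chosen for this
example only (the prefixes are documentation ranges, not real EID
space).

```python
import ipaddress

# Hypothetical EID prefixes (documentation ranges, not real allocations).
EID_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_eid(dest: str) -> bool:
    """True if dest falls inside one of the EID prefixes."""
    addr = ipaddress.ip_address(dest)
    return any(addr in net for net in EID_PREFIXES)


def itr_handle(dest: str) -> str:
    """Only an ITR makes this distinction; ordinary routers just forward."""
    return "encapsulate" if is_eid(dest) else "forward"


print(itr_handle("192.0.2.7"))     # inside an EID prefix
print(itr_handle("203.0.113.1"))   # conventional global unicast address
```

Both kinds of address are parsed and compared in exactly the same way,
which is the sense in which there is only one namespace.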


> Any list of LISP concerns is somewhat evanescent, as constant changes are
> being made based on lessons learned in actual deployment. 

To some extent this is true - but what lessons have been learnt?  The
whole of LISP is based on the idea that we can't, or shouldn't, get
mapping in real-time to ITRs.  If we can do this, then there would be
no need for multiple ETR addresses or to have the ITRs trying to
figure out, individually, which of various ETRs can be used to reach
the end-user network.

LISP also assumes you must have the outer header's source address be
that of the ITR, which leads to ETRs having to replicate any source
address filtering which ISP BRs apply on packets arriving from the DFZ.

The LISP folks did adopt PTRs around November 2007, after I suggested
the same concept on June 15 - but until then, they either ignored or
criticised the idea.

In the same message:

  http://www.ietf.org/mail-archive/web/ram/current/msg01518.html

I described TTR Mobility.  There's no evidence they learnt anything
from this, because the LISP-MN ID has the MN being its own ETR, which
is full of problems.

I agree that the team learns from problems they find in the test
network - but I see no evidence they learn from the arguments of
others that some of their fundamental architectural choices were
wrong, dooming them to adding more and more complexity in an effort
to make LISP work.

My critique was not of any details about LISP which have changed
since ALT's inception in late 2007 - though LISP-MN is from July 2009.


> In particular,
> potential problems for which there are local, incremental fixes (i.e. no need
> for global coordination, such as protocol changes) are being by-passed until
> operational experience shows that they actually need to be handled.

I don't clearly understand this, but the "operational experience"
with a test network will not give rise to the scaling problems which
ALT faces.  I am not suggesting that ALT and its test network
shouldn't be developed.  I am pointing out that there are no fixes
for the problems which will prevent ALT being a good enough mapping
system for a really large scale development.  So I think there should
be no claims that LISP is the best solution to the scalable routing
problem until a complete system, with mapping system, is proposed
which doesn't have such obvious problems.


> A good example is the handling of packets which arrive at a LISP device which
> does not yet have an identity->location mapping for the destination; such
> packets are currently discarded. If this proves to have a significant
> performance impact (predictive opinions differ), it is easy to change this so
> that such packets are buffered, waiting for a mapping to be returned. The
> LISP team in fact has a moderately lengthy list of such items (roughly a
> dozen or so), but since they are not significant they are not covered here.

Sure.  There are arguments for and against this.  I suggest buffering
the packet for half a second to a second or so, while watching for any
further packets the same host sends to the same destination host, and
replacing the buffered packet with each newer one.  This way, if the
mapping arrives fast enough, then the original packet or the most
recent of the resent ones will be tunneled.  Otherwise, the ITR sits
there with the mapping and waits for the sending host to retry.
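
The buffering idea above can be sketched roughly as follows.  This is
only an illustration under stated assumptions: the class name
MappingWaitBuffer, the one-second hold time and the use of a single
destination host as the mapping key are all invented for the example
(a real ITR would match on the mapped prefix).

```python
from dataclasses import dataclass

BUFFER_SECONDS = 1.0  # assumed hold time: "half a second to a second or so"


@dataclass
class Held:
    packet: bytes
    arrived: float


class MappingWaitBuffer:
    """Holds at most one packet per (source host, destination host) pair."""

    def __init__(self, hold: float = BUFFER_SECONDS):
        self.hold = hold
        self.held = {}

    def packet_arrived(self, src: str, dst: str, packet: bytes, now: float):
        # A newer packet between the same two hosts replaces the older one.
        self.held[(src, dst)] = Held(packet, now)

    def mapping_arrived(self, dst: str, now: float):
        """Return buffered packets for dst that are still fresh enough to
        tunnel.  Stale ones are dropped; the sending host will retry."""
        ready = []
        for key in [k for k in self.held if k[1] == dst]:
            held = self.held.pop(key)
            if now - held.arrived <= self.hold:
                ready.append(held.packet)
        return ready


buf = MappingWaitBuffer()
buf.packet_arrived("hostA", "hostD", b"first try", now=0.0)
buf.packet_arrived("hostA", "hostD", b"retry", now=0.4)
print(buf.mapping_arrived("hostD", now=0.9))
```

Times are passed in explicitly so the behaviour can be followed without
a real clock.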

There is lots of fiddly stuff in LISP because there is so much work
for the ITR to do in choosing which of multiple ETRs to send to, and
of course the difficulty of waiting an unknown amount of time for
mapping to arrive - and perhaps having to send out a replacement
request if nothing arrives in a few seconds.

All this would disappear if they could get mapping in real-time to
the ITRs which need it.  I wrote a way of doing it in July 2007.  I
just wrote an improved version:

  http://tools.ietf.org/html/draft-whittle-ivip-fpr-00

but the LISP project continues as if this is impossible or
undesirable, without ever saying why.


> The protocols also have a great deal of flexibility built in, to allow
> incremental changes guided by experience and changing circumstances. 

Translation:  The protocols are overly complex because the
architecture requires difficult or impossible things of its ITRs -
and because the protocols have been changed from time-to-time to
solve problems which had not been foreseen, and to accommodate the
requests of people other than the main LISP team who had different
views on how things should be done.


> A good
> example is the user-data headers, which constitute a low data-rate channel
> piggy-backed on existing traffic between the ITRs and ETRs. The fields can be
> shared between a number of uses - some as yet undefined, so that additional
> low-data rate control functions can be added as their need becomes obvious.

Ivip is superior in that it doesn't need these things.  So it uses
IP-in-IP encapsulation, whereas LISP data packets need the outer IP
header, the UDP header and the 8 byte LISP header.
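
To put numbers on that difference, here is a small worked calculation.
It assumes the usual minimum header sizes (20 bytes for IPv4 without
options, 40 bytes for IPv6, 8 bytes for UDP) plus the 8 byte LISP
header mentioned above; the function names are just for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header
UDP_HEADER = 8    # bytes
LISP_HEADER = 8   # bytes, as stated in the text


def ip_in_ip_overhead(outer_ip_header: int) -> int:
    """IP-in-IP adds only the outer IP header."""
    return outer_ip_header


def lisp_overhead(outer_ip_header: int) -> int:
    """LISP data packets add outer IP + UDP + LISP headers."""
    return outer_ip_header + UDP_HEADER + LISP_HEADER


for name, outer in (("IPv4", IPV4_HEADER), ("IPv6", IPV6_HEADER)):
    print(name, "IP-in-IP:", ip_in_ip_overhead(outer),
          "LISP:", lisp_overhead(outer))
```

Under these assumptions the LISP encapsulation costs 16 bytes more per
packet than IP-in-IP, for either IP version.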


> This critique will therefore focus on i) fundamental architectural
> limitations, and ii) potential problems where amelioration will require
> co-ordinated change; they are listed in rough order of significance.

OK!


> - LISP's most serious challenges are due to the fact that it is effectively a
> new packet-switching layer, with all the challenges (neighbour liveness
> detection, etc) that such layers bring - but with a much larger fan-out than
> is typical in packet-switching systems, since any ITR might communicate with
> any ETR.

Yes - this is a problem with any Core-Edge Separation scheme.  There
could be large numbers of ITRs tunneling to a single ETR, for one or
many destination networks.

Having the ITRs figure out, from previously sent mapping options,
which of multiple ETRs to tunnel to is a very difficult business,
because the ITR doesn't have a direct way of knowing which of the
ETRs the destination network is reachable by.

There could be trouble between the ITR and the ETR - and some kinds
of trouble can only be reliably found by the repeated failure of the
ETR to respond to some kind of request from the ITR.  Other kinds
of failure involve the link from the ETR to the destination network.
But how is the ETR to tell the ITR about this - every ITR which
needs to know - especially when the ETR could be handling traffic for
large numbers of such networks, each with a different state of being
reachable or not?

Also, when a Core-Edge Separation architecture is supporting a mobile
host, this is not a multihoming service restoration situation - so
the tunneling behavior of all the ITRs needs to be guided by a
completely different mechanism than by giving each one a list of ETRs
and expecting each one to figure out which ones can be used to reach
the destination network.


> There are three goals which are often in conflict: minimizing overhead,
> minimizing complexity, and maximizing performance. Mechanisms which meet one
> (e.g. performance) often fail another (e.g. overhead), due to the fan-out
> issues. Clever engineering (e.g. the use of the piggy-backed control channel)
> can handle many of these. 

Any amount of clever engineering in service of a poor architectural
choice will always result in a lousy outcome, probably with lots more
seemingly impressive engineering mechanisms and a poorer result.

A reasonable definition of good architecture is a series of
high-level design choices which produce the best outcomes with the
least effort, including especially effort involving complexity,
software, hardware and/or the sending of packets.


> Some of this is also under the control of users; if
> they want higher performance, and are willing to pay the overhead costs, they
> can change configuration to do so.
> 
> - One important example of this is caching of mappings; this improves the
> performance, but introduces the problem of detecting, and replacing, outdated
> mappings. This is a very lengthy topic, which cannot be covered here in any
> detail.

The end-user network can set the caching time of their map replies to
a low value.  Assuming ITRs respect this (and some of them might be
configured to ignore short caching times) then this means the
end-user network is placing an unreasonable burden on all ITRs which
are sending packets to their EID prefixes, and on the entire ALT
structure between those ITRs and the end-user network's ETR or Map
Server.

The end-user network doesn't pay any cost for this.  The costs are
borne by other parties.  This is very similar to the problem we are
trying to avoid - thousands or millions of uppity end-user networks
advertising PI space in the DFZ, and especially them chopping and
changing how they advertise it.
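
A back-of-envelope sketch of how that burden scales.  It assumes, for
illustration only, that every ITR with ongoing traffic to the network
sends one fresh map request each time its cached mapping expires; the
function name and the numbers are made up.

```python
def refresh_queries_per_second(active_itrs: int, ttl_seconds: float) -> float:
    """Steady-state map requests per second reaching the mapping system,
    assuming one re-request per ITR per cache lifetime."""
    return active_itrs / ttl_seconds


itrs = 3600  # hypothetical number of ITRs with ongoing traffic
print(refresh_queries_per_second(itrs, ttl_seconds=3600))  # one-hour caching
print(refresh_queries_per_second(itrs, ttl_seconds=60))    # one-minute caching
```

Under this assumption the query load grows in inverse proportion to
the caching time, and none of it is carried by the network which chose
the short caching time.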


> - Although LISP does provide significant tools for multi-homing,
> load-sharing, optimal-entry-selection, etc, these currently depend on correct
> configuration; response to failures is also limited. It may be possible to
> ameliorate this problem with automated configuration, although this has not
> yet been examined.

After taking the correct turn at the first fork in the road -
choosing Core-Edge Separation - LISP-ALT and LISP-NERD both took
wrong turns at the next junction - where APT and Ivip correctly chose
local full-database query servers.

I think the only way of saving LISP is to forget everything after
that first correct turn, and to follow APT and Ivip in having
full-database local query servers.

The next fork in the road involves choosing either slow or real-time
mapping to the query servers and the ITRs which need it.

Slow leads to a lot of the same troubles LISP is having now, due to
the need for ITRs to figure out reachability of end-user networks
through ETRs, which they currently have no way of doing directly -
and which large numbers of ITRs will never have a way of doing
efficiently since they are all working in isolation.  Slow means more
complex mapping information too.

Real time mapping means you only need to send a single ETR address,
and you can make your own choices (or pay someone else to make them
for you) about reachability, inbound TE, mobility or whatever it is
you really want to do.  Read all about it:

  http://tools.ietf.org/html/draft-whittle-ivip-arch
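
The contrast between the two kinds of mapping can be sketched like
this.  It is a simplification, not either system's actual data format:
the Rloc class, its fields and both function names are assumptions for
the example, and LISP's per-locator weights are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rloc:
    address: str
    priority: int    # lower value is preferred
    reachable: bool  # the ITR's own guess, made in isolation


def choose_etr_slow_mapping(rlocs: List[Rloc]) -> Optional[str]:
    """With slow mapping the ITR gets a list and must decide for itself
    which ETR currently works."""
    usable = [r for r in rlocs if r.reachable]
    if not usable:
        return None
    return min(usable, key=lambda r: r.priority).address


def choose_etr_realtime_mapping(etr_address: str) -> str:
    """With real-time mapping the mapping is a single ETR address,
    already chosen by the end-user network or whoever it appoints."""
    return etr_address


rlocs = [Rloc("203.0.113.1", 1, reachable=False),
         Rloc("203.0.113.2", 2, reachable=True)]
print(choose_etr_slow_mapping(rlocs))
print(choose_etr_realtime_mapping("203.0.113.2"))
```

All the difficulty in the first function is hidden in how each ITR
comes to set "reachable", which is exactly the work real-time mapping
removes from the ITR.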


> - LISP cannot easily test reachability of ultimate destinations (e.g. behind
> an ETR), only other LISP devices. It therefore is inevitably (and
> unavoidably) dependent on the correct functioning of any network
> infrastructure on the other side of a LISP device.

Yes - unless you backtrack and follow the path to real-time mapping
to all ITRs which need it, via full-database local query servers -
LISP will be doomed to more and more effort and complexity trying to
build more and more functionality into ITRs and ETRs so they can work
together to reliably perform multihoming service restoration.


> - LISP is currently working through NAT boxes, but only in limited
> configurations. In particular, due to the use of fixed UDP ports, it is not
> possible to support more than one ETR behind a NAT box. 

OK - thanks for confirming part of my critique of LISP-MN.


> (Although since
> multiple ETRs behind a single NAT box would present a single point of
> failure, it is not clear that this is a problem.)

I don't think that's a problem.  The NAT box and the multiple MNs
behind it are not part of the CES architecture.  With the TTR Mobility
architecture you can have as many MNs as you like behind a NAT box.
Each one has one or more two-way tunnels to one or more TTRs, which
are typically, but not necessarily, nearby.
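
The fixed-port limitation quoted above can be illustrated with a toy
port-forwarding table.  The NatBox class is hypothetical and models
only static inbound forwarding; 4341 is the UDP port assigned to LISP
data packets.

```python
LISP_DATA_PORT = 4341  # UDP port for LISP-encapsulated data packets


class NatBox:
    """Toy NAT: one external address, static inbound port forwards."""

    def __init__(self):
        self.forward = {}  # external UDP port -> internal host

    def add_forward(self, port: int, internal_host: str) -> None:
        if port in self.forward:
            raise ValueError(
                f"UDP port {port} already forwarded to {self.forward[port]}")
        self.forward[port] = internal_host


nat = NatBox()
nat.add_forward(LISP_DATA_PORT, "10.0.0.2")      # first ETR: fine
try:
    nat.add_forward(LISP_DATA_PORT, "10.0.0.3")  # second ETR: no port left
except ValueError as err:
    print(err)
```

Because every ITR sends to the same well-known port on the NAT box's
one public address, only one internal ETR can receive those packets.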

The LISP team could have read and copied TTR Mobility from my
2007-07-15 message.


> - Namespace provider lock-in might result, due to the need to be able to look
> up identification (EIDs), unless some mechanism can be worked out to allow
> multiple competing providers to provide resolutions for any given segment of
> the identification name-space (perhaps as part of a new mapping system).

There's no such thing as a "namespace provider".  There are no new
namespaces in LISP - or APT, or Ivip or TRRP.  There is no separate
"identification name-space".

EID addresses are in a bunch of BGP advertised prefixes and this set
of prefixes (MABs - Mapped Address Blocks - in Ivip) is a subset of
the global unicast address range.  All such addresses are interpreted
according to the same namespace.


I think you are referring to administrative arrangements for which
organisation might run a Map Server covering the EID space of
multiple end-user networks - and/or which organisation runs the part
of the ALT network which covers a particular DFZ advertised prefix
which contains EID prefixes of multiple end-user networks.

You need to have some organisation paying for PTRs to advertise these
prefixes - but they need to get paid by the end-user networks who
benefit.  Ivip proposes arrangements for all this, and as far as I
know, LISP doesn't.

  http://psg.com/lists/rrg/2008/msg01158.html


> - Systems which query for mappings will inevitably have some performance
> impact. Even when all potential delay causes which can be handled are dealt
> with (e.g. the packet drop example at the top), attempts to communicate with
> a 'new' site will occasionally result in some unavoidable delay. Since the
> delay in those few cases will be of the same order as DNS delays, which are
> currently acceptable, this is probably not a significant issue.

I and others believe it is a significant problem - and we argue that
the delays could be longer than with DNS, due to ALT's long-path
problem and the greater chance of a query packet being lost in that
system.  However DNS lookups may involve queries and responses with
multiple servers, which likewise adds up to longer paths.


> - The ALT mapping system has some potential performance and scaling issues
> (e.g. concentration of request load at the top-level nodes), although an
> interface is built into the system to allow replacement of the mapping
> system. Since a superior mapping system based on DNS is already in design,
> this is not felt to be a serious issue.

OK - so you admit that ALT is inferior to DNS lookup.

An advantage of ALT - beyond the fact it could be easily constructed
from existing building-blocks - is that it would be possible, in
principle, to send the initial data packet along it, so it is
actually delivered to the ETR without the ITR having to know the
mapping yet.  But this raises a problem with the load of these long
packets (and subsequently sent ones, before the ITR gets the mapping)
which means the ALT network needs to be more capable.  DNS doesn't
offer such a facility.

DNS has always been an obvious choice of looking up mapping.

Why wasn't this adopted in mid 2007?  Why didn't the LISP team say
anything positive about Bill Herrin's TRRP proposal, which also used
a DNS-like mapping system?

A DNS-based mapping system is also a global query server system, so it
will also be slow and unreliable compared to using local full-database
query servers.  DNS lookup only provides non-real-time mapping, so you
will still have to pre-load long mapping information into ITRs and give
them the complex functionality they need to figure out, as best they
can, the reachability of the destination network via various ETRs.


  - Robin
