Short version: Debating various aspects of LISP with Noel. I made two minor changes to the <=500 word version of my LISP critique, which I will send in a separate message.
Hi Noel,

Thanks for your message, in which you wrote:

> Hi, one observation, and a few comments on your specific text.
>
> The observation is that you seem to focus on the ALT, and indeed open with
> detailed discussion of the ALT's problems.

Yes - because there's no sign of a better mapping system for LISP than ALT.

LISP began as a framework, with conceptual placeholders for mapping systems. NERD was one such system. (BTW, since Eliot's msg05274 November announcement he has updated the ID twice, I guess preparing it to be an experimental RFC.)

I think CONS was the first mapping system to be supported by multiple LISP people. It was a global query-server system, with new network elements and protocols.

ALT replaced CONS in December 2007 as the mapping system the main LISP team focused on. It too is a global query-server system, but its global network is made from router and tunnel functions which already exist. The LISP test network uses ALT, and there is a LISP-WG devoted to developing ALT as an experimental protocol.

I wrote about my concerns with ALT's "aggressive aggregation" structure, involving long paths, nearly two years ago - 2008-01-28:

  http://www.ietf.org/mail-archive/web/rrg/current/msg01171.html

and have continued to raise this, along with the contradiction between the "aggressive aggregation" requirement (which is in the ID) and the need to avoid single points of failure. The most recent version of this critique was on the LISP list on 2009-12-07:

  ALT structure, robustness and the long-path problem
  http://www.ietf.org/mail-archive/web/lisp/current/msg01801.html

There was no proposal for any change to ALT which would solve these problems, and one LISP developer suggested (msg01846) that "it would be great to see someone working on testing a non-alt mapping system."

As far as I know, no-one in the main LISP team has developed a better mapping system or suggested that one should or will be developed, so LISP needs to be evaluated in terms of ALT.
> LISP currently has a service interface to the mapping system specified
> (the Map-Server/Map-Resolver interface), which hides the details of the
> mapping system from the rest of the system (e.g. the xTRs). This interface
> is intended, in part, to allow replacement of the mapping system with a
> better one. There is at least one proposed replacement currently circulating
> in the LISP community.

Please mention what this is, with references.

> In light of that, you might want to move the ALT discussion to the
> end, and clearly separate it from the discussion of LISP as a
> whole.

Since I don't yet know what the alternative is, my critique is of LISP with ALT.

>> ALT is a mapping distribution system with globally distributed query
>> servers: ETRs and Map Servers.
>
> Also Map-Resolvers, which are the places ITRs go to ask for mappings.

OK. The first sentence of V2 of my <=500 word critique is now:

  LISP-ALT distributes mapping to ITRs via (optional, local,
  potentially-caching) Map Resolvers and globally distributed query
  servers: ETRs and optional Map Servers.

>> ITRs drop the packet(s) they have no mapping for.
>
> 'currently drop'; obviously, it's easy enough to change this behaviour, one
> xTR at a time, if it proves problematic in service.

I agree "currently drop" is more informative, since I recall the ID allows both dropping and buffering while awaiting the Map-Reply. The test network runs by dropping the packets. I understand this is because dropping is preferable to holding onto packets for a potentially long time - by the time they were tunneled, they would likely be more disruptive than helpful to the sending host's attempt to communicate with the destination host.

I am describing ALT as it is currently judged, by the LISP team, to function best. I agree with this judgement, and I don't think that buffering would help.
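The drop-on-miss behaviour described above can be sketched as follows. This is an illustrative toy model, not real xTR code; all class and method names here are hypothetical.

```python
# Toy sketch of an ITR's forwarding decision on a map-cache miss,
# as described above: drop the packet and issue a Map-Request,
# rather than buffer the packet until the Map-Reply arrives.

class ITR:
    def __init__(self):
        self.map_cache = {}     # EID-prefix -> RLOC
        self.pending = set()    # EID-prefixes with an outstanding Map-Request

    def forward(self, packet):
        eid = packet["dst_eid"]
        rloc = self.map_cache.get(eid)
        if rloc is not None:
            return ("encapsulate", rloc)    # tunnel to the ETR's RLOC
        if eid not in self.pending:
            self.pending.add(eid)
            self.send_map_request(eid)      # ask via the Map-Resolver
        return ("drop", None)               # current behaviour: drop, don't buffer

    def send_map_request(self, eid):
        pass  # stub: would hand the query to the local Map-Resolver

    def map_reply(self, eid, rloc):
        self.map_cache[eid] = rloc
        self.pending.discard(eid)

itr = ITR()
print(itr.forward({"dst_eid": "eid-A"}))   # no mapping yet, so the packet is dropped
itr.map_reply("eid-A", "rloc-1")
print(itr.forward({"dst_eid": "eid-A"}))   # mapping now cached, so it is encapsulated
```

The point of the sketch is only that the first packet (or packets) to a new EID never reach the destination - the sending host must retransmit after the mapping arrives.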
So if I were to contemplate buffering in my critique, I would have to add a sentence about how this is probably a worse option than dropping. Without the 500-word limit, I would do this.

>> These "initial packet delays" reduce performance and so create a major
>> barrier to voluntary adoption on a wide enough basis to solve the routing
>> scaling problem.
>
> This is an assertion, not empirical data.

Sure. The question of how widely LISP-ALT could be voluntarily adopted is something which could only be answered empirically in the future. Since we have no time machines, you can't expect "empirical data" on this!

I know there is a view that the delays which can easily be foreseen in LISP-ALT are not significant enough to prevent it being adopted widely enough to solve the routing scaling problem. I and some other people disagree with this viewpoint.

It's a fact that delaying any packet reduces performance. I assert that in almost any imaginable communication, with the possible exception of NTP, if the initial packet in a new session is delayed by a few tens of milliseconds - which is the most it would usually take for APT or Ivip ITRs to get mapping from their local full-database query server (Default Mapper or QSD) - this is insignificant. If it were 100ms or so, I would say it *might* be significant in some instances.

With a global query-server network, the delay in getting mapping will frequently be longer than this, since the answer has to come from the other side of the planet, which typically takes 350 to 400ms. That is significant, I believe. With ALT, the time could be longer, because the path taken up and down the ALT hierarchy is quite likely not to follow the shortest (in BGP terms) path to the destination query server.

In any global query-server system, and especially with ALT's paths, which are likely to be longer still, there is an increased chance of the query packet or the response being dropped.
In ALT, the response comes back via the Internet, so the ALT long-path problem only exacerbates the chance of the query being lost.

So I assert that the delay times which ALT will often impose on the "initial packet" are significant. But "initial packet delay" is an oversimplification. The initial one or more packets are *dropped* by the ITR. After the mapping arrives at the ITR - which could take the better part of a second, maybe a second and a half, or maybe many seconds if a packet is lost and the ITR times out and retries - the ITR has to wait for the sending host to send another packet.

Maybe the sending host has given up on this destination and is trying an alternative destination host, in another EID prefix which the ITR has no mapping for. (A caching Map Resolver may help, but there would still be many instances where it does not have the mapping in its cache.) So the whole process would begin again. If the ALT system takes a second to get the mapping and the sending host doesn't resend to the same address (out of multiple options) within a second, then the sending host will go through all its possible destinations without reaching any of them. This is unlikely, but it is not impossible.

I assert that any global query-server system for mapping lookups will involve a significant performance degradation - sufficient to affect the experience of users. I also assert that even if the measured impact on end-users is minimal, the perception of this reduced performance will significantly reduce the chance of widespread voluntary adoption, to a degree which threatens the ability of the system to solve the routing scaling problem.

In short, if LISP-ALT is perceived to be a second-class service in terms of something like responsiveness, then there's very little chance of the great majority of end-user networks (who want portability, multihoming and TE) of all sizes adopting it.
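The back-of-envelope arithmetic behind the delay argument above can be made explicit. The local-query and cross-planet figures are the ones quoted in the text; the ALT path-stretch factor and the Map-Request retry timeout are illustrative assumptions, not measured values.

```python
# Rough first-contact delay arithmetic for the scenarios discussed above.
# local_query_ms and global_rtt_ms come from the text; alt_stretch and
# retry_timeout_ms are assumed, illustrative values.

local_query_ms   = 30      # APT / Ivip: local full-database query server
global_rtt_ms    = 400     # answer from the other side of the planet
alt_stretch      = 1.5     # ASSUMED: ALT path longer than the shortest BGP path
retry_timeout_ms = 2000    # ASSUMED: Map-Request retransmission timeout

best_case_alt_ms = global_rtt_ms * alt_stretch           # no loss
one_loss_alt_ms  = retry_timeout_ms + best_case_alt_ms   # one lost query

print(f"local query server:  ~{local_query_ms} ms")
print(f"ALT, no loss:        ~{best_case_alt_ms:.0f} ms")
print(f"ALT, one lost query: ~{one_loss_alt_ms:.0f} ms")
```

On top of whichever of these applies, the sending host's own retransmission timer adds further delay, since the initial packet was dropped and must be resent once the mapping is in place.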
We really need most of these networks to adopt it voluntarily - otherwise there will be too many who are still using unscalable PI space, or not getting portability etc., for it to solve the routing scaling problem.

>> No solution has been proposed for these problems
>
> As mentioned, there is a proposal circulating in the LISP community for an
> alternative mapping system which fixes many of the problems you mention (and
> one I think you don't, the concentration of query traffic at the top-level
> ALT nodes).

I don't recall any solution being proposed in the LISP list discussion which arose from my 2009-12-07 message. AFAIK, there's no mention of a fix for these problems in the documents listed in the RRG summary for LISP:

  http://www.ietf.org/mail-archive/web/rrg/current/msg05503.html

>> with UDP and variable-length LISP headers in all traffic packets.
>
> Ah, no - user-data packets have a fixed-length LISP header (currently 64
> bits, IIRC).

OK - thanks for this correction. It is 64 bits for both IPv4 and IPv6:

  http://tools.ietf.org/html/draft-ietf-lisp-05#section-5.1
  http://tools.ietf.org/html/draft-ietf-lisp-05#section-5.2

The new version is:

  with UDP and 64-bit LISP headers in all traffic packets.

>> the MN cannot be behind NAT.
>
> This is incorrect. A mechanism is of course needed to ascertain the
> 'external' address of the ETR, and a possible one has been coded and
> field-tested, but an ETR can be behind a NAT (and IIRC there are current
> test deployments of this).

You don't provide any references. I assert the ETR can't be behind NAT in any system which is practical enough for widespread voluntary adoption.

The ETR needs to be reachable from multiple ITRs, and any self-respecting NAT box is well within its rights not to translate packets which arrive from some other ITR, even if, by some means, the ETR was able to send something first to one ITR to get the NAT box to accept packets from that first ITR. How could the ITRs send packets to the NAT box?
The ETR's address is being NATed, so the ETR would have to figure out the NAT box's global unicast address and put that in the mapping. Then an ITR would send an encapsulated packet to the LISP UDP port on the NAT box. How is the NAT box going to know what to do with it? It would have to have been cajoled into this by the ETR sending out a UDP packet to some other host (such as a STUN server) and receiving a UDP packet back from that host. But why would the NAT box translate packets to the ETR if they arrived from some address other than this host?

The ETR can't very well send out packets to every ITR which needs to send it packets, for obvious reasons of performance and delay. How could the ETR find out an ITR needs to send it packets unless the ITR can reach it through the NAT box?

What if there were two or more LISP-MNs behind the one NAT box? ITRs only tunnel to a single UDP port, so there's no way you could have two MNs, each being its own ETR, behind a single NAT box. Also, LISP-MNs need to be on RLOC space.

With TTR Mobility, none of these problems occur. The MN can be behind one or more layers of NAT, since it tunnels to a (typically) nearby TTR, which acts as its ETR and also sends outgoing packets. The MN can be on SPI (EID) space. It can be behind a NAT box which is on SPI space. It can be on address space which itself is SPI space of another MN. For instance, an aircraft could have a NAT box which is on SPI space (the aircraft's NAT box is the MN and tunnels to its own TTR on the ground) and an MN could be working fine, with its own micronets of SPI space, behind that NAT box. A further MN could be operating from one of those SPI addresses!

>> which LISP cannot achieve.
>
> Also an assertion.

The full sentence is:

  Mapping changes must be sent instantly to all relevant ITRs every
  time the MN gets a new address - which LISP cannot achieve.
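The NAT reachability argument above can be illustrated with a toy model of an address-restricted NAT: it only forwards inbound UDP from peers the inside host has already sent to, so a STUN-style exchange opens the path to the STUN server but not to the many ITRs that need to reach the ETR. This is a hypothetical sketch of one common NAT filtering behaviour, not a claim about every NAT implementation.

```python
# Toy model of address-restricted NAT filtering, as argued above:
# inbound packets are forwarded only if the inside host (the ETR)
# previously sent a packet to that peer's address.

class AddressRestrictedNAT:
    def __init__(self):
        self.allowed_peers = set()

    def outbound(self, dst):
        # An outgoing packet creates a filtering binding for that peer.
        self.allowed_peers.add(dst)

    def inbound(self, src):
        # Forward only packets from peers the inside host has contacted.
        return src in self.allowed_peers

nat = AddressRestrictedNAT()
nat.outbound("stun-server")        # ETR learns its external address via STUN
print(nat.inbound("stun-server"))  # reply path to the STUN server is open
print(nat.inbound("itr-1"))        # an ITR's encapsulated packet is refused
print(nat.inbound("itr-2"))        # and so is every other ITR's
```

The ETR could open the path to one ITR by sending to it first, but it cannot do this for every ITR in the world, and it cannot learn that a new ITR needs to reach it in the first place.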
There is an implicit assumption that mobility requires no breaks in connectivity as long as the MN has at least one address which is working. Every time a LISP-MN gets a new RLOC address, if connectivity is to be continued (and even a fraction of a second's loss is a problem if the MN is doing VoIP) then all the ITRs in the world need to suddenly change their mapping to the new RLOC address. Actually, to avoid gaps of up to ~200ms, due to the time it takes packets to get to the ETR from ITRs at various distances, the ITRs need to change their tunneling a fraction of a second before the MN gets its new address.

No mapping system for LISP is capable of changing ITR tunneling behavior in real time. Even Ivip would take between 0.2 and a few seconds.

But with the TTR mobility system, there's no need to change mapping when the MN gets a new address, no matter where it is. The mapping change is only needed - and not particularly urgently - when the MN selects a new TTR, which it would normally only do if it moves a distance such as 100km or more. LISP could work with the TTR mobility architecture, but Ivip would be better, because the MN's tunnels to the previous TTR could be dropped much sooner than with LISP.

I stand by my assertion. If there are arguments against it, they can be mentioned in the "rebuttal" and in further discussions.

  - Robin

_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg