Here is my current understanding of Fred Templin's IRON Core-Edge
Separation scalable routing proposal.  Its proper name (msg05979) is
"IRON-RANGER", but I am using "IRON" for short.

The proposal was called RANGER, but RANGER is an over-arching system
capable of many things, and there is a new ID, IRON, which explains
how RANGER, SEAL and VET are used for scalable routing:

  http://tools.ietf.org/html/draft-templin-iron-00

My understanding is incomplete, so what follows includes questions
and suggestions.

At the end, I have a draft critique.  I am relying on Fred to review
all this, suggest corrections etc.  Then I hope to be able to
finalise the critique.

I think IRON has some interesting characteristics, including being
able to handle packets without the "initial packet delays" (actually
"initial packets being dropped, and then later ones being tunneled")
of LISP-ALT.  IRON also operates without a mapping system in the
usual sense of the word.  There is a two-stage arrangement by which
initial packets get to the destination network, which is replaced by
a direct path after that.

I don't think IRON would be as good as Ivip, but I suggest that
anyone interested in Core-Edge Separation architectures would find it
intriguing.

  - Robin



The reference documents are, in order of importance:

Discussions between Fred and me recently.  Generally the later ones
are more relevant, but the one marked ** is where Fred gave the best
initial account of IRON.

  RANGER and SEAL critique
  http://www.ietf.org/mail-archive/web/rrg/current/msg05796.html RW
  http://www.ietf.org/mail-archive/web/rrg/current/msg05803.html   FT
  http://www.ietf.org/mail-archive/web/rrg/current/msg05806.html RW
  http://www.ietf.org/mail-archive/web/rrg/current/msg05807.html   FT
  http://www.ietf.org/mail-archive/web/rrg/current/msg05810.html RW
**http://www.ietf.org/mail-archive/web/rrg/current/msg05815.html   FT
  http://www.ietf.org/mail-archive/web/rrg/current/msg05817.html RW
  http://www.ietf.org/mail-archive/web/rrg/current/msg05889.html RW
  http://www.ietf.org/mail-archive/web/rrg/current/msg05937.html   FT

I haven't yet replied to Fred's last message, but we have been
communicating off-list too.  He has since written the IRON ID, so the
following explanation is really a response to that ID and the last
message above.

  See also the RFC-to-be from:
  http://tools.ietf.org/html/draft-templin-ranger-09

  http://tools.ietf.org/html/draft-russert-rangers-01
  http://tools.ietf.org/html/draft-templin-intarea-vet-06

Regarding SEAL tunneling with PMTUD, see draft-templin-intarea-seal-08,
my recent message, and whatever Fred writes in reply:

  Re: [rrg] IRON: SEAL summary V2
  http://www.ietf.org/mail-archive/web/rrg/current/msg05982.html

The IRON ID and most of RANGER use IPv6 examples.  I will use IPv4,
in part because I want to know how it would work with IPv4.



Virtual Prefixes (VPs)
----------------------

IRON uses a subset of the global unicast space called "edge" space -
the remainder is "core" space.  Please see

  CES & CEE are completely different (graphs)
  http://www.ietf.org/mail-archive/web/rrg/current/msg05865.html

for a general description of how CES architectures achieve scalable
routing.

"Edge" space in IRON is made of multiple Virtual Prefixes (VPs), each
of which is handled by one, or perhaps several IRON routers.  For a
given VP, the one (or more) such routers is (are) known as the VP
router(s).

Not all IRON routers handle VPs, and a single IRON router could
handle multiple VPs.  For simplicity, in most of the following
discussion, a single VP router is assumed for each VP.  In previous
discussions, this was the router in Seattle.

It is not clear to me how IRON would be introduced so that each End
User Network (EUN) using edge address space could have the benefits -
portability, multihoming and inbound TE (and supposedly mobility,
though I don't know how) - for all incoming packets, when not all
ISPs and other networks (PI EUNs connecting straight to the DFZ) had
adopted IRON.  So below I assume 100% adoption of IRON by all ISPs
and any other networks connecting directly to the DFZ.

The sum total of all these VPs constitutes "edge" space - and all of
it can be divided very finely into individual prefixes for EUNs which
use this space.  It is not clear what the limits are for IPv4, but I
guess within IPv4 it would be divisible to prefixes as long as /32
(single IPv4 address).  For IPv6, the longest prefix IRON would
handle is /56 (Fred mentioned this off-list).

As far as I know, according to the IDs and discussions so far, these
VPs of "edge" space are neither advertised by DFZ routers directly,
nor are they covered by any prefixes advertised in the DFZ.

With Ivip, the MAB (Mapped Address Block) prefixes are advertised in
the DFZ by DITRs, and in LISP, the same things (which have no name)
are advertised in the DFZ by PTRs.  However, if these VPs were
advertised by one or ideally more IRON routers in the DFZ, then this
would enable all packets, including those sent from non-upgraded
networks, to be handled through the IRON system - so all adopters of
the IRON "edge" ("EID") space would then get the benefits of
portability and multihoming for all incoming traffic.

The VPs referred to here are not necessarily isolated.

For instance, there could be four VPs on contiguous prefixes:

   33.44.0.0 / 16
   33.45.0.0 / 16
   33.46.0.0 / 16
   33.47.0.0 / 16

which might each be handled within the IRON system by separate IRON
routers.  To advertise this in the DFZ, an IRON router would only
advertise a single prefix:

   33.44.0.0 / 14

Therefore, the VPs could be more numerous than the number of prefixes
to be advertised in the DFZ, if this were adopted.
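
A tiny sketch of that aggregation arithmetic (Python's standard
ipaddress module, nothing IRON-specific):

  # Sketch only: checking that the four contiguous /16 VPs above
  # aggregate into a single /14 covering prefix that an IRON router
  # could advertise in the DFZ.
  import ipaddress

  vps = [ipaddress.ip_network(p) for p in
         ("33.44.0.0/16", "33.45.0.0/16", "33.46.0.0/16", "33.47.0.0/16")]

  for covering in ipaddress.collapse_addresses(vps):
      print(covering)          # prints: 33.44.0.0/14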

Generally speaking, the more VPs there are, the less each VP router
needs to do in terms of handling packets, having more-specific
routes in its FIB etc.

The more VPs there are, generally the greater the number of prefixes
in the RIBs and FIBs of all IRON routers.  My guess is that there
should be no more than a hundred thousand or so - which is presumably
something BGP can handle.

However, in the IRON ID, Fred may have implied a very much lower
number of VPs, because he mentions (page 5) IPv6 /8 prefixes.  Even
if the whole of IPv6's address space was used for global unicast
address space, this would imply no more than 256 VPs.  I would have
thought that 10,000 to 100,000 or 200,000 VPs would be a better way
of spreading the load over multiple VP IRON routers.

In (msg05937) Fred mentions: "BAA::/16" as an example of a VP - so
this anticipates there being many more than a hundred of them.

As long as "edge" space could be covered by some lower number of
prefixes for advertising in the DFZ (such as 50,000) I think this
would be fine.  However, the IRON proposal as I understand it does
not anticipate advertising covering prefixes for "edge" space in the DFZ.


IRON routers
------------

IRON routers are not necessarily DFZ routers.  However, they are
probably located topologically close to DFZ routers, near the borders
of ISP networks and of other large networks such as PI-using
corporations and universities etc. who may have their own DFZ
routers, or who connect to the DFZ via one or more ISPs.

In principle it would be possible to implement an IRON router in a
DFZ router, but the intention of IRON is that the IRON routers are
not DFZ routers.

IRON routers connect to the internal routing systems of ISP networks,
and of EUNs which advertise their own PI space in the DFZ - including
those who do so directly, with their own DFZ routers and without
using an ISP.

IRON routers do not participate in the DFZ control plane.  They have
their own BGP implementations, which are linked in sessions with
other IRON routers to form the IRON BGP control plane (my term).

The IRON BGP control plane is completely separate from the DFZ
control plane.

As far as I know, every IRON router must advertise the complete set
of VPs (that is, the totality of the IRON-managed "edge" space) in
the local routing systems of whatever ISP, corporation, university
etc. they are located in.  As noted above, the number of prefixes
advertised locally would probably be a fraction of the number of VPs,
since I assume that many VPs could be aggregated into shorter prefixes.

Fred wrote in (msg05937):

> I was actually thinking that the IRON routers would only advertise
> "default" into the local routing system, but they could just as
> well advertise 42.0.0.0 /16 if they wanted to.

I think the IRON routers must advertise only the prefixes which cover
all the "edge" space.  They couldn't advertise the default route,
since this always leads "towards the rest of the Internet" until we
get to a router which has no such default - a DFZ router - because
some prefixes of "the rest of the Internet" have best paths out one
interface and other prefixes have best paths out one or more other
interfaces.

IRON routers need a peer connection to one or more internal or
Border (typically DFZ) Routers by which they can advertise the
VP prefixes.  They also need an IP address by which they can send and
receive packets from other IRON routers - potentially from any other
IRON router in the world.

They do not need a connection to any DFZ router.  As far as I
understand IRON, there is no provision for them advertising the VP
prefixes in the DFZ - however, as noted above, if some of the IRON
routers did this, they would be acting like DITRs or PTRs.

IRON routers discover (in Fred's description) other nearby IRON
routers, such as those in nearby ISPs, corporate networks etc.  I am
unclear about multiple IRON routers in a single ISP, corporation etc.
linking to each other.

I guess that IRON routers could best be implemented, initially at
least, as software in a server - though in the future these functions
could be added to routers from the major vendors.

Fred describes IRON routers discovering nearby routers via PRLs
(Possible Router Lists) which are part of RANGER, or via some
DNS-based methods.  I am interested in understanding IRON with as
little as possible of RANGER, since I find RANGER very open-ended,
complex and hard to understand.  To me, it would be acceptable if
each IRON router was manually configured with the IP addresses of a
handful of "nearby" IRON routers.

IRON routers set up their BGP sessions over VET/SEAL tunnels, using
the internal "VET interface" construct.  I don't clearly understand
VET, but I view it as some kind of software construct by which
packets can be sent to remote devices - in this case other IRON
routers, via SEAL tunnels, which are in themselves unidirectional,
but which can be used in both directions to make a two-way link.

>From the point of view of the IRON router, every other IRON router in
the world is a "single hop" away, via VET - because the VET
"interface" tunnels packets going outwards and receives tunnel
packets coming in, for all IRON routers, just as if they were all
directly connected (from BGP's point of view) to the non-physical VET
"interface".

So if an IRON router A has an IP address of another IRON router B, it
can send it packets out the VET interface, and receive them from B as
well.

There is no need to establish a SEAL tunnel before sending any
packets using such a tunnel.  When an IRON router A with address
22.33.44.55 sends a packet to an IRON router B, with address
66.77.88.99, it does so via its internal VET interface which uses
SEAL to tunnel the packets, using the outer header destination
address 66.77.88.99.  This is then forwarded out of the IRON router,
into the local routing system, where it is (typically) forwarded to a
DFZ router, various other DFZ routers and eventually (perhaps through
some internal routers of the network in which B is located) to B
using its 66.77.88.99 IP address.
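
My mental model of what the VET interface is doing here is roughly
the following.  This is purely illustrative - the class and field
names are my own invention, not anything from the VET or SEAL specs,
and only the 22.33.44.55 and 66.77.88.99 addresses come from the
example above:

  # Illustrative only: the inner packet, destined to an "edge" address,
  # is wrapped in an outer header whose destination is the remote IRON
  # router's own address, so the DFZ forwards it like any other packet.
  from dataclasses import dataclass

  @dataclass
  class Packet:
      src: str
      dst: str
      payload: object

  def vet_seal_encapsulate(inner, local_iron, remote_iron):
      # Outer header: this IRON router (A) to the chosen IRON router (B).
      return Packet(src=local_iron, dst=remote_iron, payload=inner)

  inner = Packet("192.0.2.1", "43.1.2.3", b"...")    # host to an "edge" address (invented)
  outer = vet_seal_encapsulate(inner, "22.33.44.55", "66.77.88.99")
  print(outer.dst)     # 66.77.88.99 - forwarded across the DFZ to IRON router B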

This tunnel behaves like a physical link, since via the VET
interface, a packet can be sent from A to B which is not addressed to
B - traffic packets can be sent just like they could be put out of a
point-to-point link from one router to another.  However, the
"link" is a tunnel, typically across the DFZ, with SEAL's PMTUD
mechanisms.

The BGP sessions are made over these tunnels using the VET interface.

According to the ID, these BGP sessions should be with IRON routers
nearby.  However, I think that if there is only one VP router for
each VP, it doesn't matter what the structure of the IRON BGP links
is.

Multiple IRON routers "owning" a VP are possible - I think the
word "selected" means one or more such routers handing a single VP.
Then, I think it would be important (but not absolutely essential)
for each IRON router to know the IP address of the nearest one of the
multiple VP routers for each VP.  This would only be possible if each
IRON router generally had BGP sessions with the IRON routers of
"nearby" ASNs other than its own - and if the global system of IRON
routers had each one using the ASN of the network it was operating
within.  "Nearby" means close according to the physical links between
DFZ routers.  Only then would BGP's natural path selection mechanisms
provide a given IRON router A with the IP address of the genuinely
closest of multiple VP routers which were all advertising the same VP
in the IRON BGP control plane.


The New Zealand - Seattle example continued
-------------------------------------------

To continue the example from the previous discussion, a sending host
(SH) in the North Island of New Zealand sends a packet to an edge
address of a multihomed IRON-edge-address-using EUN of a tour company
 in the Fox Glacier township.  The tour company's EID prefix is
43.0.56.76 /30 and the packet is addressed to 43.0.56.78.
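
As a quick check of the addressing arithmetic in this example (again,
just the standard ipaddress module):

  # The /30 covers 43.0.56.76 through 43.0.56.79, so the destination
  # address 43.0.56.78 falls inside the tour company's EID prefix.
  import ipaddress

  eid_prefix = ipaddress.ip_network("43.0.56.76/30")
  print(list(eid_prefix.hosts()))                           # 43.0.56.77, 43.0.56.78
  print(ipaddress.ip_address("43.0.56.78") in eid_prefix)   # True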

The tour company has this space multihomed via some kind of router at
its site which connects to two ISPs in the South Island, ISP-4 and
ISP-5.  The ISP-4 link is via a fixed IP address DSL link with the
address 33.22.22.33.  There's probably only a single fibre or cable
going to this remote and marvellous part of New Zealand.  (Every
establishment has its own generators because trees regularly fall
down and bring down the power line, causing blackouts on a very
frequent basis.)

Let's imagine that ISP-5 has a 3G data network there and the tour
company also has a suitable modem, with a fixed IP address service
for this, on 55.66.66.55.  Or perhaps there is an expensive, slow,
high-latency geosynchronous satellite service.  Normally, the tour
company prefers data to come in via the DSL line.

Somehow, in ISP-4 there is an IRON router D which can forward packets
for the 43.0.56.76 /30 prefix to the tour-company's router via the
DSL service.  Likewise ISP-5 has an IRON router E which can forward
packets addressed to this prefix to the 3G modem.

In this example, one of the thousands of VPs is 43.0.0.0 /16 - and
this covers the EID prefix of the tour company.  In this example,
only one IRON router advertises this VP in the IRON BGP control plane
- a router B in Seattle.

There must be some direct or indirect commercial relationship between
the tour company and the ISP - or whatever kind of company it is -
which runs the Seattle router.  The Seattle router "owns" this VP,
which means its owners pay for its upkeep and connectivity - which
means they must be paid directly or indirectly to do this by
potentially thousands of companies such as the Fox Glacier tour
company.  Maybe this is a branch office of a glacier tour company in
Washington state - and they rented a larger set of "edge" space from
the Seattle ISP, which was renting space in 43.0.0.0 /16 to thousands
of EUNs.  These EUNs could be anywhere in the world.  They do not need
to be connected to the Seattle ISP to be able to use this "edge"
space, which is managed by the IRON system.


The packet from the North Island host is forwarded in the network of
the Auckland ISP towards its IRON router A.  (Maybe it has more than
one, but this will do.)  This is because A is advertising to the
local routing system all the prefixes which cover IRON's "edge"
address space, including a prefix such as 43.0.0.0 /16 or 43.0.0.0
/14 which covers 43.0.56.78.

The IRON router A may have BGP neighbours in the North and South
Island, and perhaps a neighbour in Australia, Fiji or Los Angeles.

The IRON routers form a globally connected system - all via VET/SEAL
tunnels - to create their own BGP control plane.   By this means, the
IRON router finds the best path for packets matching the VP 43.0.0.0
/16 - and this best path is towards the IP address of the Seattle
router - which is IRON router B.

Generally, each IRON router has, in its RIB and FIB, a minimum set of
things:

  1 - The best paths for all the VPs.

  2 - Best path for prefixes which cover the IP addresses of its
      IRON BGP control plane neighbours.

They may also have additional routes in their FIB alone, for two
reasons, which are explained below.  One is a complete set of
"more-specifics" in the VP router(s) FIB (not RIB) - all the EUN
prefixes in that VP, of which there could be tens of thousands.  The
other is individual such prefixes installed temporarily in the FIBs
of IRON routers near the sending host, as a result of receiving a
SEAL redirect message from a VP router it just tunneled a packet to.
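
Here is a hypothetical sketch of the lookup behaviour I am assuming -
ordinary longest-prefix matching, in which a more-specific entry
(whether a registered EUN prefix in a VP router, or a temporarily
installed redirect entry) wins over the covering VP entry.  The
66.77.88.99 address is reused from the earlier A/B example; the
address for router D is invented:

  # Sketch only: longest-prefix match over an IRON router's FIB, where
  # the values are next-hop IRON router addresses.
  import ipaddress

  fib = {
      "43.0.0.0/16":   "66.77.88.99",   # VP - best path towards the Seattle router B
      "43.0.56.76/30": "203.0.113.7",   # more-specific (hypothetical address for D)
  }

  def lookup(dst):
      addr = ipaddress.ip_address(dst)
      matches = [(ipaddress.ip_network(p), nh) for p, nh in fib.items()
                 if addr in ipaddress.ip_network(p)]
      return max(matches, key=lambda m: m[0].prefixlen)[1]   # longest prefix wins

  print(lookup("43.0.56.78"))   # 203.0.113.7 - tunnel directly to D
  print(lookup("43.0.99.1"))    # 66.77.88.99 - tunnel to the VP router B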

By means which are not at all clear to me, the Seattle router B has
securely installed in its FIB (but not RIB) a prefix for 43.0.56.76
/30 with a best path leading to the IP address of the IRON router D.
 I am not sure how multihoming service restoration works in IRON,
which I think must be a crucial function of this "registration" process.

  See in msg05980 the mention of "bubbles".  Fred described in an
  off-list message how the Fox Glacier township router could
  propagate its prefix upwards in the routing system by means
  of Router Advertisements.  I don't really understand these, and
  as far as I know they are part of IPv6 only.  Hopefully he will
  explain this better, especially for IPv4.

The FIB of the Seattle router has an additional set of prefixes - a
complete set of prefixes such as just described for all the other
"edge"-using EUNs whose "edge" space is within the 43.0.0.0 /16
prefix.  This could be thousands or tens of thousands of prefixes,
since many EUNs will be fine with a single IPv4 address at each of
their sites.  In principle, this /16 could have 2^16 separate EID
prefixes - so this is a substantial addition to the FIB of the
Seattle router.  In this example, so far, the Seattle router B is the
only IRON router to be the VP router for this 43.0.0.0 /16 prefix.

The IRON router A in Auckland finds that the packet matches the
43.0.0.0 /16 or 43.0.0.0 /14 prefix in its FIB, and that the best BGP
path for packets matching this prefix ends in an IP address which is
one of the IRON routers - since this best-path came via one of its
IRON BGP neighbours.

Through the magic of VET (which means I assume this can be done, but
I don't exactly understand how) the A router tunnels the traffic
packet to the Seattle router.  This means the encapsulated packet has
the Seattle router's address as its outer destination address - and
the A router forwards it to the local routing system, where it is
forwarded towards a DFZ router, and so forwarded to the Seattle IRON
router B, just like any other packet.

The continually active tunnels between IRON BGP control plane
neighbours primarily carry BGP messages.  These tunnels could also
carry a traffic packet, tunnelled as just described.  Then the tunnel
would already have been established, so SEAL would have state for it
at both ends and would have figured out the PMTU in both directions.

If we assume that the Auckland router A had never sent a packet to
the Seattle router B, then this packet marks the beginning of a
one-way tunnel from A to B, so the A router's SEAL tunneling software
would instantiate new variables for the SEAL state for router B.
This includes choosing a random 32 bit value for the first SEAL_ID
value.  Subsequent packets will use values one more than the last.
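
A minimal sketch of the per-destination SEAL state as I read it - a
randomly chosen initial 32 bit SEAL_ID, incremented for each later
packet tunneled to the same router (the variable names are my own):

  # Sketch of per-tunnel SEAL state: random 32 bit initial SEAL_ID,
  # incremented (mod 2**32) for each subsequent packet to that router.
  import secrets

  tunnels = {}          # keyed by the remote IRON router's address

  def next_seal_id(remote):
      if remote not in tunnels:                  # first packet to this router
          tunnels[remote] = secrets.randbits(32)
      seal_id = tunnels[remote]
      tunnels[remote] = (seal_id + 1) % 2**32
      return seal_id

  print(next_seal_id("66.77.88.99"))   # first packet from A to B: random value
  print(next_seal_id("66.77.88.99"))   # next packet: previous value + 1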

When the packet arrives at the Seattle router, it is decapsulated and
emerges from the VET interface, to be handled by the FIB.

It is possible that a packet sent to the Seattle router is addressed
to a host in an EUN directly connected to that Seattle router.  In
this example, as would usually be the case, it is not.

The Seattle router's FIB has a more-specific prefix which matches
this destination address - the prefix 43.0.56.76 /30 which has a best
path to IRON router D in the South Island - the one which has the DSL
link to the tour company.

The Seattle router now tunnels the packet to router D.  This is
described on page 6 of the IRON ID.

Translating the sentence:

   'C' then forwards the packet to an IRON router 'D' which
   connects the RANGER network where 'E' currently resides.

to represent the current example:

   The Seattle router 'B' then forwards the packet to an IRON router
   'D' which connects the ISP-4 network where the tour company
   currently prefers its packets to be delivered.

However, "forward" in this sentence is not, as far as I know,
ordinary forwarding in the DFZ.  The previous reference to "forward"
was "forwards the packet via VET automatic tunneling" - so I think
the second usage also implies VET automatic tunneling:

   IRON router 'B' then consults its FIB and discovers a VP that
   covers the 'E' prefix, then forwards the packet via VET automatic
   tunneling to an IRON router 'C' that owns the VP.

translated:

   Auckland ISP IRON router 'A' then consults its FIB and discovers a
   VP 43.0.0.0 /16 that covers the destination address 43.0.56.78,
   then forwards the packet via VET automatic tunneling to an IRON
   router 'B' in Seattle that owns the VP.


So I think the Seattle router B uses VET tunneling to "forward" the
packet to the IRON router D in the South Island - which will deliver
it to the tour company's DSL service.

The most obvious problem with this is that the packet had to traverse
the Pacific Ocean and the Equator back and forth to get from the
North to the South Island.

This is where the RANGER "route optimization" comes into play.

But how does the Seattle router B get the packet to router D in the
ISP-4 of the South Island?

I thought that B would use VET tunneling to D.

However, what Fred told me about router discovery made me think that
perhaps the tour company router, via D, does some kind of "bubble
blowing" process by which B winds up with a FIB entry for the
43.0.56.76 /30 prefix, with a best path which leads, via intermediate
routers, to D.

I don't know how this would work for IPv6, much less IPv4 - or how it
would scale considering there will be millions of EUN "edge"
prefixes, like the one used by the tour company in the South Island.



Route optimization
------------------

The B router in Seattle will send back a SEAL message, via a SEAL
tunnel from B to A, to the A router in the North Island.  This tells
the A router that for any packets addressed to the 43.0.56.76 /30
prefix, it should no longer forward them on the path to the B router
in Seattle, but should forward them directly to the IRON router D in
the South Island.

This is, in effect, a route redirect message.  It would also come
with a caching time.

This results in the installation of a "more-specific" prefix in the
FIB of the A router in the North Island.  This has precedence over
the 43.0.0.0 /16 or 43.0.0.0 /14 prefix which all IRON routers have.

As best I understand Fred's plans, the A router will have a locally
configured STALETIME, such as 120 seconds.  I understand that if no
traffic packets use this new "more-specific" FIB entry within any 120
second period, then it will be deleted.

I understand that the A router also caches a SEAL_ID with this - the
SEAL_ID which came with the redirect message, which itself was copied
from the initial traffic packet which A sent to B.  So this SEAL_ID,
which A generated, enabled A to authenticate the SEAL redirect message.

I think it could also be used to authenticate a second redirect
message from B, but as far as I know, B would not send such a
message, at least in respect of the initial traffic packet.
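
Pulling the last few paragraphs together, here is a hypothetical
sketch (my reading, not anything from the IDs) of what the A router
would keep: a redirect is accepted only if its SEAL_ID matches one A
recently used towards B, and the resulting more-specific entry is
forgotten if unused for STALETIME seconds.  (The redirect's own
caching time is omitted for brevity.)

  # Hypothetical sketch of router A's redirect handling.
  import time

  STALETIME = 120       # seconds, locally configured

  pending = {}          # SEAL_ID -> prefix of a packet recently tunneled to B
  redirects = {}        # prefix  -> (next-hop IRON router, time last used)

  def accept_redirect(seal_id, prefix, next_hop):
      if pending.get(seal_id) != prefix:         # authenticate against our own SEAL_ID
          return False
      redirects[prefix] = (next_hop, time.time())
      return True

  def lookup_redirect(prefix):
      if prefix not in redirects:
          return None
      next_hop, last_used = redirects[prefix]
      if time.time() - last_used > STALETIME:    # unused for too long: forget it
          del redirects[prefix]
          return None
      redirects[prefix] = (next_hop, time.time())   # refresh on use
      return next_hop

  pending[0x1A2B3C4D] = "43.0.56.76/30"          # A tunneled a packet to B with this SEAL_ID
  print(accept_redirect(0x1A2B3C4D, "43.0.56.76/30", "203.0.113.7"))   # True
  print(lookup_redirect("43.0.56.76/30"))        # 203.0.113.7 - tunnel straight to D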

Now, as long as traffic packets keep arriving for this prefix less
than 120 seconds after each other, and as long as the redirect's
cache time has not expired - and as long as nothing else happens -
the A router in the North Island will tunnel packets to the D router
in the South Island, and all will be well.

If the D router becomes unreachable, or if it cannot reach the router
in the tour company (say the prodigious rainfall and stiff winds
bring down another tree and pull down a fibre cable line which the
DSL service depends upon), then the A router will delete this
more-specific entry and its cached SEAL_ID.  This would only occur if
the D router sent a destination unreachable message to the A router,
or if the D router was somehow unreachable - but that would require
some other router to send a destination unreachable, I think, since I
understood that all IRON routers are presumed to be reachable via the
VET interface.

The next time a packet arrives at the A router, with a destination
address matching the 43.0.56.76 /30 prefix, the A router will once
again tunnel the packet to the B router in Seattle and the process
will begin again.

However, by now - by some means I don't fully understand - the B
router in Seattle knows that the packet should be tunneled to the E
router in the South Island, which uses a 3G link or whatever to the
tour company's network.  So that is where the packet is sent by B,
and the A router gets a redirect to the E router, rather than the D
router.

Somehow:

  1 - The VP router (B in Seattle) already knew about both D and E as
      being IRON routers which could accept packets addressed to the
      43.0.56.76 /30 prefix.

  2 - The VP router initially knew that both D and E were reachable,
      and that they could reach the tour company's router.

  3 - The VP router knew that the D router was preferred over the E
      router.  (I don't know if this is possible via Router
      Advertisements.)

  4 - After the outage, the VP router was told that D could not be
      used any more, so it altered the path in its FIB for the more
      specific route 43.0.56.76 /30 to point to the E router instead.
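
To make concrete what items 1 to 4 above seem to require, here is a
purely hypothetical sketch of the registration state the VP router B
would need to keep for each EUN prefix - none of this is actually
specified in the IRON ID as far as I can tell:

  # Purely hypothetical: per-EUN-prefix state in the VP router - the
  # delivering IRON routers, their preference order, and a liveness flag
  # updated by whatever "bubble"/registration mechanism is used.
  registrations = {
      "43.0.56.76/30": [
          {"router": "D", "preference": 1, "alive": True},   # DSL path, preferred
          {"router": "E", "preference": 2, "alive": True},   # 3G path, backup
      ],
  }

  def best_delivering_router(prefix):
      candidates = [r for r in registrations.get(prefix, []) if r["alive"]]
      if not candidates:
          return None                                        # no way to deliver
      return min(candidates, key=lambda r: r["preference"])["router"]

  print(best_delivering_router("43.0.56.76/30"))             # D
  registrations["43.0.56.76/30"][0]["alive"] = False         # outage reported for D
  print(best_delivering_router("43.0.56.76/30"))             # E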


Let's say the outage happened a minute after the first packet, and
by some means the VP router in Seattle found out about it 10 seconds
later.  Could the VP router send a second redirect to the A router?
I guess it could, but as far as I know, this is not part of IRON.

The caching time in the redirect is to avoid the A router from
sending packets for too long according to the redirect, when it
should periodically forget the redirect and let the next packet(s) go
to the VP router in Seattle, and await any redirect which results.

The STALETIME value is to reduce unwanted clutter in the A router's
FIB when these entries are not actually being used.


Multiple VP routers
-------------------

I understand there can be multiple routers such as the one in Seattle
which advertise the 43.0.0.0 /16 Virtual Prefix in the IRON BGP
control plane.

This would have three advantages at least:

  1 - The load for this prefix would be spread over more than one
      VP router.

  2 - There would be natural failure recovery - if the Seattle
      router was down, whatever IRON routers had a path to it for
      this prefix would adapt by choosing a path to another VP
      router advertising the same prefix.

  3 - Generally, subject to conditions discussed below, the A router
      would find the closest of multiple VP routers - so reducing
      total path lengths and delays for the first packet or packets.

      There could be a flurry of packets sent from A to B before
      B's redirect gets to A - especially if one or more of the
      redirect packets are lost.  So the B router in Seattle
      would need to get all those packets to the correct D or E
      router.


However, now the D and E routers need to communicate their
"ownership" of 43.0.56.76 /30 to multiple VP routers all over the
world.  Likewise their lack of ability to handle packets for this
prefix if there is an outage.

These VP routers could be anywhere in the world.

So how does the proposed "blowing bubbles" method (I think based on
IPv6 or RANGER Router Advertisements) scale properly?  Does it happen
over the IRON BGP control plane only - or is it somehow a process
which happens outside this?  The EUN router in the tour company
office is not part of this control plane.

I understand that this process is a continual one - the D and E
routers need to keep doing it, based on some caching time in the VP
routers, I guess.

There are going to be millions of these EUN prefixes, and for each
one, if it is multihomed to two ISPs, there are going to be two IRON
routers "blowing bubbles" in a manner which will continually reach
one or more IRON VP routers anywhere in the world.

The selection of the "closest" VP router depends on the tunneled BGP
neighbour links between all IRON routers generally following the
"nearby" rule, based on the underlying physical topology over which
DFZ BGP routers conduct their sessions.

If there was only a single VP router for a given prefix of "edge"
space such as 43.0.0.0 /16, then it doesn't matter how the IRON
routers are connected.  It would be fine for a New Zealand IRON
router to tunnel to IRON routers in Moscow, London and South Africa.



What if?
--------

The above structure is interesting and unique.  TIDR had all the DFZ
BRs (not transit DFZ routers) communicating via a second BGP
instance - so it doesn't really solve one of the crucial parts of the
routing scaling problem: reducing load on the DFZ control plane.  But
IRON involves new routers, in similar places to DFZ routers,
communicating in a way which does not burden the DFZ control plane at
all.

IRON uses a data-driven method of gaining "mapping" while also
delivering the initial packet - without excessive delay and without a
fancy new network such as the ALT network.

The "map reply" is the SEAL redirect message.

Why not forget about most of these IRON routers and simply have the
VP router advertise its prefix in the DFZ?  Because then it can't
send redirects to the routers closer to the sending host, since those
routers are just ordinary routers which are not ready to accept such
things, and because the VP router wouldn't know their IP addresses.

Why not have large numbers of VP routers?  This depends on how the D
and E routers, and most or all other IRON routers handling millions
of EUN "edge" prefixes, communicate their aliveness and IP address to
the multiple VP routers.

If there were a hundred VP routers, then maybe there wouldn't need to
be any redirects - since one of them would be close enough to the
path between the A router and either D or E for the system to work
fine.  This degenerates into LISP with hundreds or tens of thousands
of PTRs - and no other ITRs.  (Or Ivip with all DITRs, where every
DITR advertises all the "edge" space, as MABs in the DFZ).  In both
cases, the dominant problem would then be getting the "mapping" to
these tens of thousands of routers, for the millions of EUN "edge"
prefixes in a scalable, secure, fashion fast enough for multihoming
service restoration controlled by the IRON routers which deliver
packets to the EUNs.



Draft Critique
--------------

I hope Fred will be able to comment on this - after he does, I will
revise it and then hopefully move on to other proposals.

This is about 750 words.  I can try chopping it down to 500 once I
hear from Fred.   I will be making an ID of the full versions of all
critiques which do not make it into the RRG Report, so a non-chopped
down version can be in that ID.




IRON-RANGER (hereafter "IRON") uses principles from RANGER, VET and
SEAL to construct a Core-Edge Separation scalable routing solution.
Separate IRON networks would be used for IPv4 and IPv6, but perhaps
they could be combined in some way if this was desired.

IRON does not have a mapping system such as that of LISP or Ivip.

A single global network of IRON routers communicates over tunnels,
each router using its own BGP instance, to form the IRON BGP control
plane.  This is unrelated to the DFZ's BGP control plane.  While each
IRON router advertises all "edge" prefixes in the routing system of
the network it is based in (an ISP, large corporation, university
etc.), the current IDs do not call for it to advertise
any such prefixes in the DFZ.  Therefore, as currently described,
IRON could only support packets sent by all hosts if it was adopted
by all such networks.  However, IRON could easily be adapted to do
this by having multiple widely-dispersed IRON routers advertise the
complete set of "edge" prefixes in the DFZ.

Each IRON router processes packets addressed to "edge" addresses by
forwarding them to a particular IRON router which, inside the IRON
BGP control plane, advertises a particular Virtual Prefix.  There may
be one or more of these VP routers for a given prefix, and the number
of VP prefixes for the entire "edge" subset of the global unicast
address space would be limited, in part, by the ability of the IRON
BGP control plane to handle this number of prefixes.

IRON routers peer with topologically nearby IRON routers to be their
BGP neighbours.  When the traffic packet arrives at the VP router, it
is forwarded (via a tunnel again?) to the IRON router which can
deliver the packet to the destination network.

The VP router also sends a SEAL redirect message to the first IRON
router and thereafter, that first IRON router tunnels the packets
directly to the IRON router which connects to the destination
end-user network.

The VP router's FIB has more-specific routes for each end-user
network prefix which is covered by this VP.

There are unresolved scaling questions regarding:

  1 - The ability of the initial IRON router to handle in its
      FIB the temporarily installed more-specific routes due
      to the redirect messages it receives from VP routers.

  2 - Likewise, questions of FIB and/or route processor ability
      to handle the churn in these, since they will typically
      last for seconds or minutes, before having to be withdrawn
      and perhaps replaced after a further redirect.

  3 - The number of VP routers - more than one would be necessary
      for robustness.

  4 - The ability of the VP routers to discern which of the multiple
      advertising IRON routers had the highest priority for use
      in a multihoming scenario when both were advertising the one
      end-user network "edge" prefix.

  5 - The scaling problems inherent in these IRON routers advertising
      their collectively millions of end-user "edge" prefixes all
      over the IRON network, since the one or more VP routers could
      be located anywhere with respect to these advertising IRON
      routers.

  6 - The speed with which VP routers can learn of outages detected
      by the IRON routers which are capable of delivering packets to
      the end-user networks.


IRON is not yet described in sufficient detail for these questions to
be answered.  It is not clear how, or if, it would implement load
sharing or other forms of inbound TE.  Nor is it clear what approach
to mobility the system would adopt, or how this would scale to
billions of mobile devices.

There is no current description of the business relationships between
the various users and operators of routers - so it is difficult to
envisage business arrangements in which costs are generally borne by
those who benefit, without unfair burdens being placed on any
participants.  Nor is there a description of how IRON could be
introduced so as to provide portability, multihoming etc. for all
packets received by an adopting network, before all networks have
their own IRON routers.

IRON is a novel CES architecture in an early stage of its design
process.  It can be decentralised in every respect, and uses
data-driven "redirect" messages as a form of mapping distribution.
However, it is not yet clear how the VP routers learn the mapping for
the end-user prefixes in their VP.  If this can be done in a secure,
fast and scalable fashion - then IRON may be worth considering as a
scalable routing system, at least for providing portability and
multihoming to non-mobile end-user networks.
