Here is a rough description of how I plan to make Ivip's essentially
real-time mapping distribution system more distributed than the way
it is described in the current IDs.

This explanation shows how the new arrangement enables the provision
of SPI space ("edge" space for end-user networks) before there is a
global fast-push mapping system.  This includes the use of SPI space
for TTR Mobility.

It will take me a while to update the Ivip IDs, since I still have a
lot of work to do reading the proposals.  I hope to finish working on
RANGER soon - it has been difficult but most interesting.  Then I
plan to look at hIPv4 and again at Name Based Sockets, before turning
to GLI-Split.


The role previously performed by RUAS (Root Update Authorisation
Server) companies is now performed by MAB (Mapped Address Block)
Operating Companies.  They may receive mapping updates directly from
end-user networks, or they may have one or more levels of UAS (Update
Authorization Server) companies between them and the end-user
networks, just as the RUAS companies do in the current IDs.

In the current IDs, multiple RUASes send mapping update packets to a
set of 8 or so "Level 0 Replicators", which flood each other with the
information so they all get the same payload of information at least
once, even if the RUAS only sent it to one of them.  Sets of
Replicators in Level 1, Level 2 etc. then fan out packets with the
same payload of information.  Instead of this, there is now a
different arrangement of Replicators.

I recently introduced a system of "Missing Packet Servers".  I think
these will still be needed, but I won't discuss them further below.

The new design involves a global mesh of Replicators, meshed in all
sorts of ways without any particular level-based or tree-like
structure.  This is driven at multiple points by packets from servers
of multiple MAB Operating Companies.  In the previous arrangement,
the 8 or so Level 0 Replicators were the narrowest part of the
system.  In the new arrangement, there is no such narrow point.

The following description should make sense to those who have read:

  http://tools.ietf.org/html/draft-whittle-ivip-fpr-00

and are broadly familiar with Ivip after reading:

  http://tools.ietf.org/html/draft-whittle-ivip-arch-03
  http://tools.ietf.org/html/draft-whittle-ivip-db-fast-push-03

The following examples use IPv4 addresses but the new arrangement
applies for IPv6 too.

A MAB is a DFZ-advertised prefix, such as 12.34.0.0 /16, in which all
the address space is now "edge" space.  This is the subset of the
address space managed by the Ivip system.  All such space is known as
SPI (Scalable Provider Independent) space.

Ivip is a Core-Edge Separation architecture.  "Edge" space is a
subset of the global unicast address range which is suitable for
End-User Networks (EUNs) to use for portability, multihoming (with
inbound TE) and (with the TTR Mobility architecture) global mobility.

Initially there would be just one MAB.  With wide adoption there
could be tens of thousands of MABs.  In principle, a single MAB might
be used as a single "micronet" of SPI space, but in general each MAB
will be split into hundreds to tens of thousands of separately mapped
micronets.  A micronet is a contiguous integer number of IPv4
addresses, or IPv6 /64 prefixes, which is mapped to a single ETR address.

Each EUN (End User Network) has a subset of a MAB (usually - in
principle it could have the whole MAB) called a User Address Block
(UAB), also a contiguous integer number of IPv4 addresses or IPv6
/64s.  Each EUN can split up its one or more UABs into micronets of
any size in these units.

The most common mapping change is to change the ETR address to which
a micronet is mapped.  Other mapping changes involve splitting and
joining micronets.
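
To make the data model concrete, here is a rough Python sketch of the
MAB / UAB / micronet relationship, and of the most common mapping
change.  The class and field names are mine, for illustration only -
they are not from the IDs.

  from dataclasses import dataclass, field
  from ipaddress import IPv4Address

  @dataclass
  class Micronet:
      start: IPv4Address      # first address of the micronet
      length: int             # contiguous IPv4 addresses (or IPv6 /64s)
      etr: IPv4Address        # ETR address this micronet is mapped to

  @dataclass
  class UserAddressBlock:
      start: IPv4Address      # first address of the UAB, inside one MAB
      length: int             # contiguous addresses rented by the EUN
      micronets: list = field(default_factory=list)

  def change_mapping(micronet: Micronet, new_etr: IPv4Address) -> None:
      """The most common mapping change: point the micronet at another ETR."""
      micronet.etr = new_etr

  # Example: a UAB of 12 addresses split into two micronets, one remapped.
  uab = UserAddressBlock(IPv4Address("12.34.50.10"), 12)
  uab.micronets.append(Micronet(IPv4Address("12.34.50.10"), 4,
                                IPv4Address("203.0.113.1")))
  uab.micronets.append(Micronet(IPv4Address("12.34.50.14"), 8,
                                IPv4Address("203.0.113.1")))
  change_mapping(uab.micronets[1], IPv4Address("198.51.100.7"))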


Initially, we start with the IPv4 Internet as it is today - no "edge"
space.   Then a company "M001" sets up shop, with a prefix to use as
the very first MAB.  Later it will have multiple MABs and there will
be multiple such companies.  Each company which operates one or more
MABs is called a MABOC - Mapped Address Block Operating Company.  So
"M001" is our name for the first MABOC.

Ultimately there may be hundreds of MABOCs and tens of thousands of
MABs.

Generally, the larger the MABs, the fewer will be needed - and the
fewer MABs there are, the less load will be placed on the DFZ control
plane and on the DFZ routers' FIBs, since each MAB is just like any
other prefix (from an ISP or for PI space): it creates work for the
RIB of each DFZ router (BGP conversations about this prefix with all
neighbours) and it requires a route in each router's FIB.


Here is how M001 goes into business renting SPI space to EUNs,
perhaps without any need for IETF standards and with no need yet for
any fast-push real-time mapping distribution system.

M001 sets up servers at sites widely distributed around the world -
for instance at 12 IXes.  At each site several functions will be
performed.  Perhaps each function could run on a separate server - or
perhaps all could be combined into one server.

At each site M001 runs a DITR - Default ITR in the DFZ.  The DITR is
on a stub connection to a peering point and functions as a BGP router
advertising, initially, the one MAB.  Later it will advertise all the
MABs which M001 runs.  The MABOC will need to pay for this
connectivity, since it is accepting and sending packets, but not
providing transit or peering.

For simplicity I will assume that M001 "owns" this MAB and rents out
space on it to EUNs.  However, perhaps one or more MABs are "owned"
by some other organisation, which contracts M001 to handle the
mapping of these MABs and the provision of DITRs for them.

If there aren't already IETF RFCs on Ivip ITR and ETR protocols,
then M001 will develop its own, and provide source code for the ETR
function to its EUN customers.

In all this discussion, Ivip uses encapsulation for tunneling packets
from ITRs to ETRs.  All ITRs and ETRs should be written to be able to
switch to Modified Header Forwarding (MHF) once this becomes
possible.  MHF eliminates the encapsulation overhead and some complex
PMTUD functions which ITRs and ETRs must perform due to
encapsulation.  However, MHF is only possible once all DFZ and some
other routers are upgraded.  In the long term, all will be capable of
this, without any significant cost - so all ITRs and ETRs should be
capable of switching to MHF at some time in the future.  The initial
ITR and ETR implementations wouldn't need to do this - but once Ivip
became widely used, ITR and ETR code should have these capabilities
built in.

An end-user network customer EUN0001 rents some space, such as from
12.34.50.10 to 12.34.50.21, from M001.  This is a 12-IPv4-address
UAB.  EUN0001 can use it as a single micronet, or split it into as
many as 12 single IPv4 address micronets.

EUN0001 needs one or more ETRs, which would be implemented in
servers - assuming conventional routers don't yet have the
capability.  EUN0001 can use each of its micronets via any ISP in the
world, provided the ISP gives it a stable IP address for the ETR to
run from.

For instance, if EUN0001 wants to have an office in Hong Kong, with a
multihomed single IPv4 address (12.34.50.14) of SPI space, it gets
two fixed IP address internet services into the building (such as a
DSL link from one company and a fibre link from another) and connects
each to its ETR box.  The ETR software performs an ETR function for
each link, and all the office's traffic goes in and out of this box.

For this to work well, M001 needs to have a DITR not too far from
Hong Kong.  Otherwise, packets sent from hosts in Hong Kong would
need to travel some distance to the nearest (in BGP terms) DITR,
where they would be tunneled to either the DSL ETR address or the
fibre ETR address, depending on the mapping EUN0001 provides to M001.

Let's say M001 has a DITR (and other functions to be described below)
at the sites: Beijing, Hong Kong, Tokyo, Singapore, Sydney, Los
Angeles, New York, Sao Paulo, London, Düsseldorf, Moscow and Mumbai.

All the DITRs at these sites advertise 12.34.0.0 /16 in the DFZ.  So
each DITR only needs in its FIB the micronet start and end addresses
and the ETR address to which each micronet is mapped - for all the
micronets in this MAB.    Later, when there are other MABs run by
M001, it will have the mapping for these too.  When other MABOCs are
operating, they will have their own DITRs, and M001's DITRs will only
handle packets addressed to any of M001's MABs.
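
Here is a rough sketch of the lookup just described: for one MAB, the
DITR's mapping table needs to map a destination address to the ETR
address of the micronet covering it, or to nothing if the address is
currently unmapped.  The data structure and interface are mine,
purely to illustrate the idea - a real DITR FIB would be far more
efficient.

  import bisect
  from ipaddress import IPv4Address

  class MabMappingTable:
      def __init__(self):
          self.starts = []     # sorted micronet start addresses (as ints)
          self.entries = []    # (start, end_exclusive, etr), same order

      def add(self, start: str, length: int, etr: str) -> None:
          s = int(IPv4Address(start))
          i = bisect.bisect_left(self.starts, s)
          self.starts.insert(i, s)
          self.entries.insert(i, (s, s + length, IPv4Address(etr)))

      def lookup(self, dest: str):
          """Return the ETR address to tunnel to, or None if unmapped."""
          d = int(IPv4Address(dest))
          i = bisect.bisect_right(self.starts, d) - 1
          if i >= 0:
              s, e, etr = self.entries[i]
              if s <= d < e:
                  return etr
          return None

  table = MabMappingTable()
  table.add("12.34.50.14", 1, "203.0.113.1")   # single-address micronet
  print(table.lookup("12.34.50.14"))           # 203.0.113.1 - tunnel there
  print(table.lookup("12.34.50.99"))           # None - not currently mapped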

So a DITR is different from the general purpose ITRs which will come
later.  ITRs will be in ISP networks and will advertise the MABs of
all MABOCs.

M001, by whatever means it chooses, accepts real-time mapping from
its customers such as EUN0001 - and by one means or another transmits
it in real-time (a second or so) to all its DITRs.  It could do this
via simple encrypted tunnels and its own mapping change data formats.

Later, at each site there would be a Replicator, to fan out mapping
change packets to Replicators at some or all of the other DITR sites
of this MABOC.  This would be a partly or fully meshed flooding
arrangement between the Replicators at these sites, so as long as at
least one of them gets the mapping change payload, and is connected
to one of the others by at least one functioning tunnel, then the
rest of them will also get this payload of mapping changes.
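
The flooding rule between these Replicators is simple: forward each
payload to all peers the first time it is seen, and drop duplicates.
Here is a rough sketch; the payload identifiers and site names are
illustrative assumptions of mine, not formats from the IDs.

  class Replicator:
      def __init__(self, name):
          self.name = name
          self.peers = []      # other Replicators this one floods to
          self.seen = set()    # payload ids already handled (dupes dropped)

      def receive(self, payload_id, payload):
          if payload_id in self.seen:
              return           # already flooded this payload
          self.seen.add(payload_id)
          self.deliver_locally(payload)
          for peer in self.peers:
              peer.receive(payload_id, payload)   # really one packet per peer

      def deliver_locally(self, payload):
          print(f"{self.name}: applying mapping changes {payload}")

  # Three sites, fully meshed: a payload injected at any one site
  # reaches every site exactly once.
  sites = [Replicator(n) for n in ("HongKong", "London", "NewYork")]
  for r in sites:
      r.peers = [p for p in sites if p is not r]
  sites[0].receive(1, {"12.34.50.14/32": "198.51.100.7"})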

In the longer term, it would be desirable for M001 to use private
network links to send mapping changes to its 12 or more DITR sites,
to avoid problems with DoS packets overloading the server at each
site which receives the mapping changes from M001's central servers.
Partial cross-linking of these Replicators via private network links
would make the whole set of Replicators across the 12 sites a robust
system, with no single point of failure, by which all sites would
quickly and robustly get the mapping information.

There are various ways of ensuring these sites get the same
information, and some challenges if one or more sites are completely
disconnected, even briefly.  There may be a need for "missing payload
servers" to cope with this.

However it is done, it should be a solvable problem for M001 to
reliably get its mapping changes in real-time to servers at these 12
sites.

Initially, this is all there is to the Ivip system.  A single MABOC
with a single MAB, renting out the space to potentially thousands of
end-user networks such as EUN0001.

Each DITR could be a caching ITR querying a local query server, with
the query server receiving mapping changes from the Replicator or
whatever method M001 uses to get mapping to all sites.  Alternatively
the DITR could be non-caching - it would have its FIB already loaded
with all the mapping for the one or more MABs it advertises.
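
As a rough illustration of the caching approach, here is a sketch of
a caching ITR in front of a local query server.  The query interface,
the cache time and the push_update() call are my own assumptions for
this example, not the formats defined in the IDs.

  import time

  class CachingITR:
      def __init__(self, query_server, cache_time=600):
          self.qs = query_server
          self.cache_time = cache_time
          self.cache = {}                  # destination -> (etr, expiry)

      def etr_for(self, dest):
          entry = self.cache.get(dest)
          if entry and entry[1] > time.time():
              return entry[0]              # cache hit: tunnel straight away
          etr = self.qs.query(dest)        # miss: ask the local query server
          self.cache[dest] = (etr, time.time() + self.cache_time)
          return etr

      def push_update(self, dest, new_etr):
          """Called by the query server when cached mapping changes."""
          if dest in self.cache:
              self.cache[dest] = (new_etr, time.time() + self.cache_time)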

The number of micronets a single MABOC handles may be suitable for a
single FIB in a server-based DITR.  If not, then two or more physical
servers can be used, each advertising a different subset of M001's
MABs.  This will spread the traffic load over multiple such DITRs at
the one site, and also reduce the number of micronets each one's FIB
needs to handle.  Whatever scaling problems there are with traffic
volumes at a single site can be solved by adding more servers and
splitting up the MAB address range between them.
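
For instance, spreading the load could be as simple as giving each
server-based DITR at the site a different subset of M001's MABs to
advertise (or, with a single large MAB, different sub-ranges of it).
The round-robin assignment below is just an illustrative choice of
mine.

  # Hypothetical MABs and server names, for illustration only.
  mabs = ["12.34.0.0/16", "45.67.0.0/16", "89.10.0.0/16", "89.11.0.0/16"]
  servers = ["ditr-a", "ditr-b"]

  assignment = {s: [] for s in servers}
  for i, mab in enumerate(mabs):
      assignment[servers[i % len(servers)]].append(mab)

  for server, prefixes in assignment.items():
      # Each server advertises, and loads mapping for, only its own subset.
      print(f"{server}: advertise {prefixes}")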

As business improves, the other way of expanding the capacity of the
system is to establish more such sites - which will also reduce the
total path length between the sending host and the ETR.

Ideally, there would be a standardized protocol by which each EUN -
or a Multihoming Monitoring Company (MMC) the EUN hires to control
the mapping of some or all of its micronets - can send mapping
changes to all the MABOCs.

Ideally, there would be an established protocol for ITRs (and so
DITRs) and ETRs, so the one ETR can be used to receive tunnels from
the DITRs of all MABOCs, and later from ITRs operated by ISPs and
other networks other than the MABOCs.

Now suppose the system has grown: 20 or so MABOCs, in total running
200 MABs (say /22 to /16, but in principle from /24 to /8), covering
2 million IPv4 addresses.  Let's say 100,000 end-user
networks (EUNs) are using this SPI space, as 300,000 micronets - some
as small as a single IPv4 address.  Some or many of these may be
using TTR Mobility - the market is not just for non-mobile
portability and multihoming.

This would be solving the routing scaling problem in a big way.
300,000 PI prefixes, if advertised in the DFZ, would double today's
number of prefixes - and with the smallest prefixes (/24) this would
be 77 million IPv4 addresses.

Instead, we have 100,000 EUNs with 300,000 portable, multihomable
(and potentially TTR Mobile) micronets of space, with the burden of
only 200 prefixes in the DFZ, and using 75 million fewer IPv4 addresses.
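
The arithmetic behind these figures, for anyone who wants to check it
(a /24 holds 256 addresses):

  micronets = 300_000
  addresses_if_pi = micronets * 256       # 76,800,000 - about 77 million if
                                          # each micronet needed its own /24
  spi_addresses = 2_000_000               # the 200 MABs cover ~2 million
  print(addresses_if_pi - spi_addresses)  # 74,800,000 - ~75 million fewer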

Each MABOC doesn't necessarily need to have DITRs all over the world.
 If a given MABOC had customers who only used their micronets in
Europe, it would only need DITRs in Europe.  DITRs need to be capable
of handling the peaks in traffic, and in order to not add appreciably
 to the total path length, they need to be roughly on-path between
the sending host and the ETR.

Perhaps some MABOCs would offer a service with DITRs only in Europe,
or only in North America.  This might suit some customers, and it
would presumably be cheaper to use such a MABOC than to use one which
had DITR support all over the world.  Also, a MABOC might have one or
more of its MABs only supported with DITRs in certain areas - and be
able to rent SPI space in this MAB at a cheaper rate to those who
found this restriction acceptable.

EUNs will not only rent SPI space from MABOCs, they will pay per
mapping change, and pay for the load on DITRs due to packets
addressed to their micronets.

By the time the Ivip system grows to this size, there will be
pressure from ISPs to get ITRs in their own networks.  For instance,
ISP01 may wish to provide a better service for its customers by
running one or more ITRs inside its own networks, so packets would
reliably be tunneled to the correct ETRs, rather than relying on the
DITRs
outside the ISP's network.  This could distinguish ISP01 in marketing
terms and in reality from its competitors which were not so hip to Ivip.

Another reason an ISP would want its own ITR is as follows.  Suppose
ISP01 has one or, more likely, hundreds of its customers using SPI
space.  Maybe the ISP runs ETRs which multiple customers share.
Maybe the customers run their own ETRs on fixed IP address PA
services - so the ISP wouldn't necessarily know of this usage except
by seeing lots of tunneled packets going to that customer's IP address.

ISP01 will have some of its customers sending packets to these
customers who are using SPI space.  Without one or more internal
ITRs, those packets will go outside the ISP's network to the nearest
DITR and then come back, tunneled to the ETR address inside the ISP's
network.  Such packets cost money - since the upstream link is one of
the ISP's greatest expenses.

ISP01 could solve this by asking some or all of the MABOCs to put an
ITR inside its networks.  But that could mean 20 different servers or
routers - so would be costly and messy.

What ISP01 wants is an ITR which handles all the MABs.

Here is how ISP01 and ISPs all over the world would do it - with the
help of the MABOCs.

The MABOCs will be keen to have ISPs install their own ITRs, since
this will handle some of the traffic sent to the MABOC's customers'
micronets without burdening the MABOC's DITRs.  Probably most of the
packets which ISP01's ITR handles will be tunneled out of ISP01's
network, which doesn't save the ISP any money.  But those which are
to be tunneled to ETRs in the network will never need to go out and
come back again.

The DITRs each MABOC runs will probably be simple software devices,
or suitably capable routers from Cisco, Juniper etc. - and they
will have in their FIBs the full set of ETR addresses for all the
micronets of all the MABs this MABOC runs.  However, maybe these
DITRs will be caching ITRs and get their mapping from a local query
server at each site the MABOC runs around the world.  That query
server need not be a full database query server (QSD) as described
below.  It only needs to know the mapping of micronets in MABs run by
this MABOC.

One likely variation on the above is that one or more companies could
establish sites all around the world, and provide DITR services for
any MABOC who preferred this arrangement to running their own DITR
sites.  There are obvious economies of scale here, and so it is quite
possible that DITR functions for multiple MABOCs may be performed by
the one ITR, which would then most likely be a caching ITR getting
mapping from a local query server which contains either all the
mapping of all MABOCs (as described below for ISPs) or just the
mapping of the MABOCs which this company is providing DITR services for.

In either case, the following depends on each MABOC having a bunch of
widely distributed sites which simultaneously get the mapping changes
for that MABOC's MABs.  One or more servers at each site will be able
to act like a Replicator, sending out streams of packets with mapping
changes to ISPs nearby.  If the site is run by a single MABOC, the
payloads of these packets will contain only the mapping changes of
that MABOC.  If it is the site of a company working for multiple
MABOCs, the server will output the mapping changes of all those
MABOCs.  It doesn't matter if this site is sometimes dead, or if
packets are sometimes not sent.  As long as most of them are
sending, all will be well.  At a pinch, in theory, as long as just
one of these sites in the whole world is sending the updates, all
will be well.

ISP01 sets up one or more caching ITRs and ideally two full database
query servers (QSDs).  The ITRs get their mapping from the QSDs,
perhaps by one or more levels of caching query servers (QSCs).  As
explained in the Ivip IDs, the QSD can update the mapping in ITRs for
micronets which have just had their mapping changed - for all ITRs
which were sent mapping for this micronet within some caching time.

So all ITRs currently handling packets for any micronet will have
their tunneling changed to the new ETR in a second or a few seconds.
 (If there is some glitch in connectivity to a QSD, it may take a few
more seconds to get the updates via missing payload servers - so
occasionally, some ITRs might be delayed by 5 seconds or so in
changing their tunneling.)
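
Here is a rough sketch of the QSD side of this: remember which ITRs
were sent mapping for each micronet and when, and push the new ETR
address to those whose cached mapping has not yet expired.  The
interfaces (including the push_update() call, matching the caching
ITR sketch earlier) are illustrative assumptions of mine.

  import time

  class QSD:
      def __init__(self, cache_time=600):
          self.cache_time = cache_time
          self.mapping = {}        # micronet -> etr address
          self.recent = {}         # micronet -> {itr: time last answered}

      def answer_query(self, itr, micronet):
          self.recent.setdefault(micronet, {})[itr] = time.time()
          return self.mapping.get(micronet)

      def apply_change(self, micronet, new_etr):
          self.mapping[micronet] = new_etr
          cutoff = time.time() - self.cache_time
          for itr, when in list(self.recent.get(micronet, {}).items()):
              if when >= cutoff:
                  itr.push_update(micronet, new_etr)   # retunnel in seconds
              else:
                  del self.recent[micronet][itr]       # cache entry expired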

The next section explores different ways that ISP01 and other ISPs
can reliably get mapping for all MABs to their QSDs.

By this time, there definitely need to be IETF-standardised
protocols and data formats for the Replicator and QSD system and
their flooding system of packets with DTLS-protected payloads, as
described in:

   http://tools.ietf.org/html/draft-whittle-ivip-fpr-00

In the future, version 01 of that ID will include details of this
new distributed approach.

One approach would be for ISP01 to contact each of the 20 or so
MABOCs and provide the IP addresses or FQDNs of their QSD01A and
QSD01B full database query servers, asking each MABOC to send at
least two streams of mapping update packets to these QSDs.  Ideally,
these streams would come from geographically and topologically
diverse sites of each MABOC to provide redundancy.

This could work, but it is administratively cumbersome and it would
require each MABOC site to send out a lot of packets once lots of
ISPs request this.

The first enhancement of this approach is for ISP01 and its
neighbouring ISPs to all set up Replicators close to their DFZ
routers, with cross-linking between them so they all flood each other
with the same information.

If there were 5 ISPs, each with two Replicators, they could form a
fully-meshed set so each Replicator drove packets to all the other 9
Replicators.  This would be highly robust.  The set of 10 Replicators
could get three or four feeds of mapping from all MABOCs, and as long
as just one packet with a particular payload arrived at one of the
Replicators, within a few milliseconds they would all have the same
payload.

The Replicators of each ISP would also send feeds of whatever they
receive to the one or more QSDs each ISP runs.

So now five ISPs get a robust feed of mapping information, without
each needing multiple feeds from MABOC Replicators at the MABOC's
DITR sites.

This is simple to extend.  If this fully, or partially, meshed set of
10 Replicators for 5 ISPs has a few bidirectional feeds to and from
Replicators of other ISPs, and if this pattern continues, then all
the Replicators of all ISPs in a region, a country or the world could
be linked into a single partly meshed flooding system for all the
mapping information of all the MABOCs.  Such a large-scale
arrangement would have potential problems, because a single malicious
operator could add packets with extra payloads which would flood all
around the world.  So a more likely arrangement is groups of ISPs who
all trust each other setting up a partially meshed set of
Replicators, and receiving multiple feeds to this system at various
points, so even if the system is broken into two sections by an
outage, both halves will still get the full feed.

Replicators can receive feeds from Replicators all over the world,
so, with the required permission, it would be no problem to get a
feed from a distant Replicator, in addition to ones nearby.

Although I have described a Replicator giving a feed to another
Replicator, or to a QSD, the DTLS link over which this occurs is
established by the recipient, and can only be set up with the
credentials which the sending Replicator allows.  So there is no
possibility of a recipient receiving payloads from uninvited sources.
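
Stripped of the DTLS details in the fpr ID, the authorisation logic
looks something like the sketch below: the recipient initiates the
session, and the sending Replicator only accepts it if the recipient
presents credentials the Replicator has issued.  The classes and the
username/password handling are simplified stand-ins of my own.

  class SendingReplicator:
      def __init__(self):
          self.allowed = {}        # username -> password issued to recipients
          self.sessions = []       # open feeds to authorised recipients

      def authorise(self, username, password):
          self.allowed[username] = password

      def accept_session(self, username, password, recipient):
          # The recipient establishes the session; the sender only checks
          # that the offered credentials are ones it issued.
          if self.allowed.get(username) != password:
              return False         # uninvited source: no feed is set up
          self.sessions.append(recipient)
          return True

      def flood(self, payload):
          for recipient in self.sessions:
              recipient.receive(payload)

  class Recipient:                 # e.g. another Replicator or a QSD
      def receive(self, payload):
          print("got mapping payload:", payload)

  rep = SendingReplicator()
  rep.authorise("isp01-qsd-a", "s3cret")
  print(rep.accept_session("isp01-qsd-a", "s3cret", Recipient()))  # True
  print(rep.accept_session("mallory", "guess", Recipient()))       # False
  rep.flood({"12.34.50.14/32": "198.51.100.7"})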

Technically, this is the guts of the new highly distributed fast-push
mapping system for Ivip.  There is no distinct tree-like structure of
unidirectional replication of payloads.  A relatively free-form
cross-linking of multiple ISPs' Replicators will work just fine.

Since the MABOCs already have sites around the world, with secure
(probably private network) links getting mapping to them in
real-time, it is straightforward to send this to nearby ISPs.

The payloads are only accepted by the QSDs and used to update the
mapping after being authenticated with the public key of the MABOC
which generated them.  So it will not be possible to inject bogus
mapping information into QSDs.
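
For illustration, here is what that check could look like, using
Ed25519 signatures via the Python "cryptography" package as a
stand-in for whatever signature scheme is eventually specified.

  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
  from cryptography.exceptions import InvalidSignature

  mab_oc_key = Ed25519PrivateKey.generate()        # held only by the MABOC
  mab_oc_pub = mab_oc_key.public_key()             # known to every QSD

  payload = b'{"12.34.50.14/32": "198.51.100.7"}'  # a mapping-change payload
  signature = mab_oc_key.sign(payload)             # attached by the MABOC

  def qsd_accept(payload: bytes, signature: bytes) -> bool:
      try:
          mab_oc_pub.verify(signature, payload)
          return True                              # apply the mapping change
      except InvalidSignature:
          return False                             # bogus payload: discard it

  print(qsd_accept(payload, signature))            # True
  print(qsd_accept(payload + b"x", signature))     # False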

The time between the end-user sending the mapping command to the
MABOC (or via one or more UASes the MABOC uses) and the mapping
arriving at QSDs all over the world could probably be less than a
second.  I can get a packet from Melbourne to pretty much any host in
the world in about 200ms, so the fast-push mapping system could, in
principle, work very quickly.

There probably needs to be a system of lost-packet servers, as
described in draft-whittle-ivip-fpr-00 to deal with situations where
whole meshed sets of Replicators have been temporarily unreachable,
so none of them got packets with particular payloads.  Also, as
described in that ID, the MABOCs (rather than the RUASes in the
current version) would have servers by which QSDs could download
snapshots of the mapping of each MAB.  They would do this during
boot-up, and to resynch if there were too many lost payloads due to a
major disruption in connectivity.

Here are some administrative elaborations.  Ivip can't stop one
person's activity being a burden on others - but its technical
structure is intended to facilitate commercial arrangements so that
burdens are paid for by those who benefit from them.

One elaboration is to help ISPs get feeds of mapping data.  Rather
than asking 20 individual MABOCs for feeds, each ISP should be able
to ask a single consortium or mapping coordination company which
represents all MABOCs and coordinates how their Replicators accept
requests from ISP's Replicators.  Likewise, the consortium would
coordinate the ISP's Replicators, giving them the FQDNs of the
Replicators from which feeds would be sent, and usernames and
passwords to establish the DTLS sessions with those Replicators.

If some MABOCs didn't like this consortium, coordinating company or
whatever, they could form another.  If there were a handful of such
coordinating companies, then this would still be better than each ISP
having to negotiate separately with 20 - or 200+ - different MABOCs.

What if some MABOCs sent a very high number of updates?  ISPs might
be reluctant to have their QSDs labouring away updating their
database so frequently.  So perhaps the MABOC companies might need to
pay the ISPs according to the number of updates they send.  It would
be very much in the interests of the MABOCs to have the ISPs take
their updates and run ITRs covering their MABs - since this improves
the service for the MABOC's EUN customers, and reduces the load on
their DITRs.

A small ISP which wasn't sending much traffic to a MABOC's EUNs
wouldn't have much cause to expect payment from a MABOC for accepting
its mapping changes, but a big ISP might.  The MABOC decides how it
charges its EUN customers for each mapping change, so there would be
a perfectly good basis for market-based mechanisms balancing out
these payments.  If a MABOC found it was costing more to get all ISPs
to accept its large volume of updates than it was receiving from its
EUN customers for making these changes, then it must be charging too
little per update.  Increasing the fee will reduce the volume of
updates and/or provide more funding for paying the ISPs to accept them.

Mapping changes due to multihoming service restoration will be
infrequent and highly insensitive to cost pressures.  Mapping changes
due to TTR mobility will be infrequent, since they would typically
only be made when the MN moves more than 1000km.

Mapping changes for dynamic inbound TE - steering streams of incoming
traffic dynamically between different ISP links - could be an
extremely valuable business for end-user networks, if it enabled them
to run their links at higher average levels than usual, while
generally avoiding congestion.  There could be a huge demand for this
kind of mapping change, depending on how low the price per change
was.  This could easily be the most common class of mapping change -
and the MABOCs would compete to make their price per change low
enough, while still getting enough from these dynamic TE-using EUNs
to pay ISPs whatever they need to accept this increased number of
mapping changes.


Rather than money flowing from MABOCs to multiple ISPs, it is
possible that the mapping coordination companies could be the conduit
for these payments.  Then the ISP would only deal with one or a few
such coordinating companies.  The coordinating companies would charge
the MABOCs for ensuring their mapping updates were accepted by all
ISPs.  I think this new distributed mapping arrangement provides a
good technical basis for a flexible and commercially viable food-chain.


The above description shows how Ivip services could begin with few
technical standards and one or a few companies operating alone, and
then grow to a globally coordinated system, which is nonetheless
highly decentralised in both technical and commercial senses.

One or more companies could be providing TTR mobility services:

  draft-whittle-ivip-fpr-00

giving globally mobile SPI IPv4 addresses or IPv6 /64 prefixes, using
the above systems, including from the very start.

The TTR company could be a MABOC, or it could be separate, and send
mapping change commands to the MABOC which runs the micronet the
mobile customer wants to use on the mobile device.

Commercial services for Ivip-style portability, multihoming and
inbound TE could be started on a relatively modest basis, before
there were any Replicators, ITRs in ISPs etc.  The TTR Mobility
extensions require more complex software in the TTRs themselves and
in the MNs, but would probably be highly valued by a much greater
number of end-users - potentially hundreds of millions, and
ultimately billions.

  - Robin      http://www.firstpr.com.au/ip/ivip/
