Short version: Detailed discussion, with examples, of anycast ETRs
anycast hosts and Bill's current real-world project
which is arguably anycast.
Ivip (and I guess other core-edge separation
schemes) handle anycast hosts just fine - no extra
complications.
Anycast ETRs are also possible with Ivip, but they
would typically require their own MAB prefix and
so would not involve any reduction in burden on the
DFZ compared to doing ordinary anycast hosts as is
done today.
I can't imagine how a core-edge elimination scheme
would do anycast hosts. But that is a question for
those who design such things. I think ILNP is the
only such design which is intended to help with
routing scalability (this is not a goal of HIP) -
and I don't understand ILNP clearly enough to try
to think about it doing something like anycast.
Hi Bill,
You wrote, in part:
> Anycast, on the other hand, is precisely the same as unicast in both
> the forwarding and data planes...
I agree. As I wrote:
http://www.ietf.org/mail-archive/web/rrg/current/msg04884.html
Routers can't tell the difference between anycast and unicast. Only
someone with complete knowledge of the network can determine which is
which. If there are two routers advertising the host's prefix, that
could be either. If each such router sends its packet to the one
destination host, it is unicast. If each sends its packets to a
separate destination host, it is anycast.
> And as I mentioned, it's relatively trivial to get it to look
> precisely the same in each of the two major solution strategies
> as well.
I think this is probably not true of Strategy B (core-edge
elimination, in which new host stack and perhaps application
functions do all the work, and end-user networks never have their own
stable address space).
Most of this message considers anycast in a core-edge separation
scheme (Strategy A, no host changes - such as LISP, APT, Ivip and
TRRP).
I think core-edge separation schemes naturally support globally
anycast hosts without any difficulties. They may also be able to
support anycast ETRs - but that is a somewhat separate question, since
you could have anycast ETRs and a single, unicast host. Combining
anycast ETRs and anycast hosts may also be possible.
Basically, core-edge separation schemes seem to support anycast hosts
naturally and without any extra complication. I can't easily imagine
how a core-edge elimination scheme could do so without a lot of extra
complexity. This is one more reason for me preferring core-edge
separation over elimination. Other reasons include:
1 - Separation is possible to introduce voluntarily, since it
involves no changes to host stacks or applications. It seems
very challenging or impossible to introduce core-edge
elimination on a voluntary basis due to the need for stack and
probably application changes and the consequent lack of
support in the new system for hosts without these upgrades.
2 - I think core-edge separation schemes support the best division
of labour between hosts and the routing system: the host gets
a stable IP address and doesn't have to worry about network-
based things like multihoming, outages, reconfigurations,
portability, TE, mobility etc. (Supporters of core-edge
elimination schemes prefer the network to do less and the
hosts to do more.)
Perhaps a stronger statement is true:
Within a core-edge separation scheme, it will be just as
impossible for a router or the routing system in general
to tell the difference between anycast and unicast as it
is without the scheme.
If so, then I think there is no extra effort required on the part of
host anycast operations or in altering the core-edge separation
scheme to make host anycast indistinguishable from host unicast.
However . . . "host anycast" with a core-edge separation scheme does
not necessarily mean "ETR anycast". The two could go together, but
without "ETR anycast" a core-edge separation scheme with "host
anycast" alone doesn't look much like the common conventional use of
"host anycast".
With ordinary BGP (and/or an IGP I guess, or the two combined?), just
looking at the routing system, we can't distinguish between unicast
with two separate routers advertising the same address (for
simplicity assumed here to be advertising the same prefix) and anycast:
R2------R5--\
/ \ | \
SH---->R1 R4---R6 DH Unicast with R5 and R7 advertising
\ / | / the same prefix which includes the
R3------R7--/ one host.
Fig 1 \___________/ << Scope of the routing system
R2------R5--->DH1
/ \ | Anycast: same address used by two
SH---->R1 R4---R6 hosts, each host connected to a
\ / | separate router. Each router
R3------R7--->DH2 advertises the same prefix.
Fig 2 \___________/ << Scope of the routing system
Neither any individual router nor the routing system as a whole can
tell whether this is host anycast or not - and host anycast works just
fine, within its well-known limits of being unable to support
session-based communications unless the destination hosts have a link
and share session state.
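The forwarding-plane argument above can be sketched concretely. This is a toy Python model (invented prefixes, nothing scheme-specific) showing that a router's longest-prefix-match lookup is identical whether the prefix behind it leads to one host (Fig 1) or two hosts on the same address (Fig 2):

```python
import ipaddress

# A toy FIB: prefix -> next hop. This is all a router ever sees.
# Whether 203.0.113.0/24 leads to one host (unicast, Fig 1) or to
# two hosts sharing an address behind R5 and R7 (anycast, Fig 2)
# is simply not represented here.
fib = {
    ipaddress.ip_network("203.0.113.0/24"): "R5",   # learned via R2
    ipaddress.ip_network("0.0.0.0/0"): "R4",        # default route
}

def next_hop(dst):
    """Longest-prefix match: the only lookup a router performs."""
    addr = ipaddress.ip_address(dst)
    matches = [n for n in fib if addr in n]
    best = max(matches, key=lambda n: n.prefixlen)
    return fib[best]

print(next_hop("203.0.113.7"))   # "R5" - identical for Figs 1 and 2
```

The lookup result is the same either way, which is the whole point: anycast/unicast status lives outside the routing system.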
As I wrote in the earlier message, thinking of Ivip for the moment,
where ETRs are typically in the ISP (ignoring mobility for this
discussion - combining anycast and mobility makes no practical sense
to me), we could have anycast ETRs and/or anycast hosts.
In the following, for brevity, I will assume just the BGP routing
system, but there will also be IGP routers which are not shown.
ITR ETR
R2------R5
/ \ | \ Single ETR with dual border routers
SH-I1->R1 R4---R6 E1->DH advertising its address.
\ / | / Single destination host = unicast.
R3------R7
\___________/ << Scope of the BGP routing system
\_/ \_/ << Scope of the core-edge separation
scheme
\_____________________/ << Scope of the now extended
interdomain routing system
Fig 3
In Fig 3 we have a destination host DH which is part of an SPI-using
(Ivip's Scalable PI space, of individually mapped micronets) end-user
network. That network is not shown, but is in the same location as DH.
This host's network's ETR E1 is single-homed to an ISP which has two
border routers R5 and R7, both of which advertise the ISP's prefix
which encompasses E1's address. Many other end-user networks can use
this ETR, and many other ETRs and PI customers can use this prefix,
so Ivip is being used here in a way which greatly reduces the burden
on the BGP control plane compared to each end-user network having its
own BGP advertised prefix.
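The scaling idea just described can be sketched roughly as follows (hypothetical prefixes and ETR addresses; this is my illustration, not Ivip's actual data structures): one MAB appears in BGP as a single prefix, while many individually mapped micronets share it in the mapping system:

```python
import ipaddress

# One MAB (Mapped Address Block) advertised in BGP as a single prefix.
MAB = ipaddress.ip_network("198.51.100.0/24")

# Within it, many end-user micronets, each individually mapped to one
# ETR address. Only the MAB appears in the DFZ; the micronet-to-ETR
# detail lives in the mapping system, not in BGP.
mapping = {
    ipaddress.ip_network("198.51.100.0/30"): "192.0.2.5",  # via E1
    ipaddress.ip_network("198.51.100.4/31"): "192.0.2.5",  # also via E1
    ipaddress.ip_network("198.51.100.6/31"): "192.0.2.9",  # another ETR
}

def etr_for(dst):
    """What an ITR needs to know: which ETR to tunnel this packet to."""
    addr = ipaddress.ip_address(dst)
    if addr not in MAB:
        return None              # not SPI space - forward natively
    for micronet, etr in mapping.items():
        if addr in micronet:
            return etr           # ITR tunnels the packet to this ETR
    return None

# Three micronets, one BGP prefix: the DFZ burden does not grow as
# more end-user networks join the MAB.
print(etr_for("198.51.100.2"))   # 192.0.2.5
```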
Fig 4 shows a multihomed example of the same arrangement:
ITR ETRs
R2------R5
/ \ | \ Single ETR in each ISP. Each ISP
SH-I1->R1 R4---R6 E1 has dual border routers advertising
\ / | / \ the ETR address.
R3------R7 \
| \
R9------R10 DH Single destination host = unicast.
/ \ \ /
R12--R13 E2--/
\ / /
R14-----R11
Fig 4
ISP1 has ETR E1 whose address is advertised by R5 and R7.
ISP2 has ETR E2 whose address is advertised by R10 and R11.
This is a typical core-edge separation approach to a multihomed
end-user network. With LISP, the ETRs would be at the destination
network. With Ivip, they could be, but would probably be in the ISP
networks, where each ETR would also be serving multiple other
end-user networks, via its one IP address. These ETR devices may
also be ITRs. (For TTR mobility, E1 and E2 are Translating Tunnel
Routers - and they may be near, or perhaps within, the access
networks where DH has a care-of address.)
The next diagram depicts the theoretical possibility of "anycast"
ETRs. This is extending the usage of "anycast" somewhat, since it
normally only refers to destination hosts. However, from the point
of view of the packet being tunneled from the ITR to the ETR, the ETR
is the destination host of the tunneled packet. If there are two
ETRs on the same address, with separate routers advertising this
address, then it is fair to think of the ETRs themselves being "anycast".
Note regarding Modified Header Forwarding:
The above is fine for encapsulation. I haven't considered
"anycast ETRs" with Modified Header Forwarding. I think it
is possible, but haven't considered the details or what
benefits there might be.
Whether or not there are one or more destination hosts is a separate
issue.
ITR ETRs
R2---- R5--->E1
/ \ | \ "Anycast" ETRs
SH-I1->R1 R4---R6 DH Single host (functionally unicast)
\ / | /
R3-----R7--->E2
Fig 5
This should be possible with Ivip's non-encapsulation approaches:
Modified Header Forwarding (MHF) across the DFZ. With the two
separate MHF approaches for IPv4 and IPv6, there is no communication
whatsoever from the ETR to the ITR. The ITR neither knows nor cares
whether there are 2 separate "anycast" ETRs or not. As far as host
communications are concerned, the situation is clearly unicast.
Maybe there would be reasons for doing this.
With encapsulation, it would be trickier, since the Ivip ITRs do
sometimes need to engage in two-way communication with ETRs to manage
PMTUD - Path MTU Discovery. I will assume for the moment this could
be done. (The communications are brief and the system would probably
cope with a router switching its forwarding so the ITR found itself
dealing with E2 instead of E1.)
Changes in the routing system, such as R1 choosing R3 instead of R2
as its next hop for the ETR's prefix, may still result in a differing
PMTU between I1 and whichever ETR receives the packet. But that
would have occurred in the Fig 4 and Fig 5 scenarios anyway. I will
consider this anycast ETR scenario when developing Ivip's PMTUD system.
No matter what the ETR arrangement, assuming it works, the above is
still unicast from the point of view of the host. As with Fig 1 and
2, the BGP routing system can't tell the difference between host
unicast and host anycast.
If encapsulation is used, for PMTUD handling, the core-edge
separation scheme may need to have special mechanisms to cope with
anycast ETRs which presumably don't communicate directly with each
other.
In the above diagram, there is only one ETR address for the micronet,
or in LISP terms, only one RLOC address for the EID. Multihoming is
not being attempted, but this could still be used for portability,
including for the TTR mobility model.
So in this case, there is no multihoming failure detection decision
to make. With both Ivip and LISP, the ITRs always tunnel packets to
the one ETR address. If one ETR becomes unusable, then packets will
be lost as long as its router still advertises that prefix. This is
the risk taken by whoever chose to have anycast ETRs. However, if
both ETRs are working fine and can connect to DH, then if R5 dies, or
becomes unreachable to the rest of the Net, then BGP will adapt and
cause all tunneled packets from all ITRs to go to R7 and therefore
E2, with no significant disruption in host-to-host connectivity.
There could be significant differences between how Ivip and LISP
would handle multihoming failures if there were anycast ETRs at one
or both ETR addresses. Fig 6 depicts two ISPs each with two anycast
ETRs, each with its own border router advertising the ETR address to
the DFZ. There is still a single destination host. I am not sure
why anyone would do this, but there may be a reason.
ITRs ETRs
R2------R5-->E1
/ \ | \ Each ISP has anycast ETRs.
SH-I1->R1 R4---R6 \
\ / | \
R3------R7-->E2---\
| DH Single destination host = unicast.
R9------R10->E3---/
/ \ /
SH2-I2-R12--R13 /
\ / /
R14-----R11->E4
Fig 6
This also shows a second sending host SH2 with its ITR I2.
With LISP, APT or I guess TRRP, each ITR figures out for itself which
of the two ETR addresses in the mapping it will use for tunneling
packets with the EID address of DH.
With LISP, I will assume the E1/E2 RLOC address is preferred in the
mapping, when the ITR can reach both E1/E2 and E3/E4. For example,
the initial path taken through the routing system by the encapsulated
packets is:
I1 R1 R2 R5 E1
I2 R12 R9 R3 R7 E2
(I2's packets may well have gone via R4, R6, R5 and then to E1, but I
want an example in which I2's packets go to a different ETR than E1,
even though they are anycast on the same address.)
If R7 dies, I2 will still find the RLOC address of E1/E2 reachable,
since the BGP routers will quickly readjust themselves to send the
packets:
I2 R12 R9 R3 R4 R6 R5 E1
That is fine, since E1 is just as good as E2 (other than any problems
with potential session state in ITR<->ETR communications - maybe LISP
has no such stateful ITR<->ETR communications). However, going back
to the initial state, if E2 dies, or can't reach DH and in either
case R7 still advertises its address, then there will be trouble
which LISP's I2 ITR will recover from:
I2 will find (by some means, I am not sure what) that E2 is
unreachable. Since R7 is still advertising the prefix which
encompasses the E1/E2 ETR address, there's no way I2's packets to
this address will get to E1.
So I2 will figure out that the ETR (it assumes one ETR) at the
address of E1/E2 is dead and will therefore use the other RLOC
address instead - that of E3/E4. Now, the two paths used by packets
will be:
I1 R1 R2 R5 E1
I2 R12 R9 R10 E3
The free-wheel'in LISP ITR will have figured out its own arrangements
for coping with a partial outage in the reachability of what, to the
LISP system, is a "single" ETR.
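The ITR-side fallback logic described above might be sketched like this (a toy model of the behaviour, not LISP's actual protocol machinery; RLOC addresses invented):

```python
# The mapping for DH's EID lists two RLOC addresses. The ITR tunnels
# to the first RLOC it currently believes is reachable, and falls back
# to the next one when its reachability tests fail.
rlocs_for_dh = ["192.0.2.5", "192.0.2.9"]   # E1/E2 address, E3/E4 address

def choose_rloc(rlocs, reachable):
    """Pick the first RLOC the ITR currently believes is reachable.

    'reachable' holds the ITR's (locally determined) view; RLOCs it
    has no verdict on yet are assumed reachable.
    """
    for r in rlocs:
        if reachable.get(r, True):
            return r
    return None   # all RLOCs down: packets are dropped

# Initially both RLOC addresses look fine: tunnel to E1/E2.
state = {}
print(choose_rloc(rlocs_for_dh, state))     # 192.0.2.5

# E2 dies but R7 keeps advertising its address: I2 eventually marks
# the E1/E2 RLOC unreachable and switches to E3/E4.
state["192.0.2.5"] = False
print(choose_rloc(rlocs_for_dh, state))     # 192.0.2.9
```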
The situation with Ivip might be less happy, since Ivip ITRs don't do
any reachability testing (except, as a by-product of PMTUD
management, which is only needed for encapsulation). With the above
example, both ITRs would initially be tunneling packets to a single
ETR address: E1/E2. Ivip ITRs have only a single ETR address to use.
It is up to the end-user network to send a mapping change if it wants
the ITRs to tunnel packets to another ETR address.
With the above failure of E2 on its own, with R7 continuing to
advertise E2's address, I2 would not be able to get packets to DH at all.
The only way the Ivip system could respond to the outage properly
would be for the mapping to be changed to the E3/E4 address instead.
(With Ivip, an end-user network would typically pay a probing company
to constantly probe reachability and to issue mapping changes as
necessary to restore connectivity in the event of one ETR being
unusable - by sending a mapping change with the address of the other ETR.)
Therefore, if an end-user network chose to have anycast ETRs, it had
better implement some fancy scheme for testing that every single one
of them is reachable from wherever ITRs might be - and that each
such anycast ETR is able to get packets to the destination network.
Depending on how topologically separated E1 is from E2, and likewise
E3 from E4, and depending on how many probing servers were used, and
where they were topologically located, it may or may not be possible
to achieve this.
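The probing arrangement just described could be sketched as follows. The decision rule is my guess at the minimum needed for anycast ETRs: every ETR behind the current mapped address must be reachable, or the mapping is switched to the alternative address (names and addresses invented):

```python
def all_reachable(etrs, probe):
    """True only if every anycast ETR behind an address answers probes."""
    return all(probe(e) for e in etrs)

def decide_mapping(current, alternative, etrs_behind_current, probe):
    """The monitoring service's decision: keep or change the mapping.

    With anycast ETRs, one dead ETR behind a still-advertised prefix
    blackholes some ITRs' packets, so even a partial failure forces a
    mapping change.
    """
    if all_reachable(etrs_behind_current, probe):
        return current       # leave the mapping alone
    return alternative       # issue an Ivip mapping change

# E2 (one of the anycast ETRs on the current address) stops responding:
down = {"E2"}
probe = lambda e: e not in down
print(decide_mapping("192.0.2.5", "192.0.2.9", ["E1", "E2"], probe))
# -> 192.0.2.9  (mapping changed even though E1 alone still works)
```

This also illustrates why the probing needs vantage points matched to the ETRs' topological areas: a single probing server behind R1 might only ever reach E1 and never notice E2's failure.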
I am not sure why anycast ETRs would be an advantage. Ivip could
probably support them for MHF, and probably for encapsulation too. I
doubt if LISP-ALT could support them, because (AFAIK) there is so
much ITR-ETR communication and because ETRs are the authoritative
source of mapping and so have to communicate with the ALT network,
probably via Map Servers. Assuming Ivip could support anycast ETRs,
special care would be needed in probing the connectivity of all such
ETRs.
The question of whether ETRs are anycast is in some ways separate
from whether the destination hosts are anycast. Figs 3 to 6 all show
a single unicast destination host DH; in Figs 5 and 6 the ETRs are anycast.
It is easy to have a core-edge separation scheme with unicast ETRs and
*locally* anycast destination hosts. Here is an adaptation of Fig 4
showing three destination hosts, all with the same IP address, in the
same end-user network:
ITR ETRs
R2------R5
/ \ | \ Single ETR in each ISP. Each ISP
SH-I1->R1 R4---R6 E1 has dual border routers advertising
\ / | / \ the ETR address.
R3------R7 \ DH1
| \ /
R9------R10 [network]--DH2
/ \ \ / \
R12--R13 E2--/ DH3
\ / /
R14-----R11 Three destination hosts = locally
Fig 7 anycast.
This local use of anycast clearly has nothing to do with the BGP
routing system or the core-edge separation system, which would
support it perfectly.
What about anycast ETRs each with their own destination hosts, which
are therefore also host anycast?
ITRs ETRs
R2------R5-->E1->DH1
/ \ | Each ISP has anycast ETRs.
SH-I1->R1 R4---R6
\ / |
R3------R7-->E2->DH2
|
R9------R10->E3->DH3
/ \
SH2-I2-R12--R13 Four destination hosts, each one
\ / anycast in a global sense.
R14-----R11->E4->DH4
Fig 8
E1 and E2 are on the same (RLOC) address, which is in a prefix
advertised by both R5 and R7. Likewise, E3 and E4 are both on
another (RLOC) address, which is advertised by both R10 and R11.
Note: Just because this could be made to work doesn't
mean it is a good idea or that this is a use of Ivip
etc. which helps with scalable routing.
Fig 8 could in principle help with routing scalability
since the ETRs E1/E2 are on one address in a prefix of
one ISP and the E3/E4 ETRs are on another address of
another ISP. However, it is only a scalable use of Ivip
if these same prefixes can be used by many other end-user
networks. Presumably R5 and R7 are geographically at
very different locations, otherwise there would be little
point in having separate ETRs. But why would an ISP do
this? Maybe they would. Maybe other customers want a
similar arrangement. But if they don't, then this
represents an irregular use of a prefix, and probably the
only beneficiary is going to be the one network with the
above ETRs. That makes it one or two BGP advertised prefixes
for one end-user network, which achieves no benefits
in routing scaling.
See also Figs 9 and 10 for a similar example. It might
be best not to use Ivip or any other core-edge separation
technique.
I think this would have similar potential problems to the scenario
depicted in Fig 6: LISP-ALT is probably not going to work, due to the
difficulties of supporting the complex communications of ETRs with
ITRs and the ALT network. If it did, the free-wheel'in ITRs may be
able to cope on their own with some outages which would be trickier
to detect and cope with if the system were Ivip.
This is genuinely host anycast from the point of view of the hosts.
However, the fact that it is host anycast is only visible to the hosts.
As far as I can see, in all these scenarios, whether the destination
host is unicast, locally anycast (Fig 7) or globally anycast (Fig 8),
the BGP system and the core-edge separation system both work fine and
dandy. They can't tell anything about the unicast/anycast status of
the destination hosts *because* the core-edge separation scheme has
nothing at all to do with hosts.
I would be surprised if the same would be true of a core-edge
elimination scheme. There, the hosts have much more complex stack and
perhaps application software. I can't imagine how this would work
with two physical hosts having the same IP address, within the one
local network (Fig 7) or in separate topological places as in Fig 8.
So I think this is another reason to prefer core-edge separation
schemes over core-edge elimination schemes. The former continues to
allow for the use of anycast hosts, and the latter either doesn't
allow it, or would involve some additional complications to make it,
or something similar to it, work. However, perhaps in a core-edge
elimination scheme, all the things done by anycast at present are no
longer needed. (I can't imagine how, but I don't clearly understand
any core-edge elimination schemes.)
I don't support core-edge elimination schemes (AKA Strategy B in
Bill's taxonomy) and will leave someone else to debate how they would
work with, or replace, anycast hosts.
>> What are the requirements for Anycast in a new architecture?
>
> Well, that's the crux of the discussion, isn't it? Anycast and unicast
> are identical in every respect in the current architecture. How do we
> bound the unforeseeable consequences if that isn't true in the new
> one?
Hopefully the above treatise will be a starting point for foreseeing
any differences and at least some of the consequences.
> I'm getting ready to introduce an anycast route into the table in
> order to implement a "continuing operations" system with three sites
> half a world apart. "Continuing operations" is a fancy phrase for
> "disaster recovery," something that became really popular almost 8
> years ago. Once a packet hits any of the always-running sites, a VPN
> takes it back to the site with the servers flagged "best" for that
> particular address, so holistically its unicast but from the
> perspective of the Internet core its anycast from three distinct
> locations.
If you had a single server at your single data centre, then this
would be an ordinary unicast server with three separate "border
routers" in your global VPN-linked network.
If you had three separate servers somehow operating on the same
address in your data centre, this would be locally anycast hosts with
three separate routers advertising the same prefix. I am not sure
why anyone would do this, or how they would work except for stateless
communications, but it would be possible. Presumably there would be
three routers in the local network, one for each host, and each one
would stop advertising the address as soon as its host died. That
would give automatic failover.
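That failover rule - each local router advertises the shared prefix only while its own host passes a health check - can be sketched as follows (router and host names invented):

```python
def advertised_by(routers, host_alive):
    """Which routers currently advertise the shared anycast prefix.

    Each router advertises only while its own host's health check
    passes; a dead host's router simply withdraws the route, which is
    the automatic failover described above.
    """
    return [r for r, host in routers.items() if host_alive[host]]

routers = {"Ra": "DH1", "Rb": "DH2", "Rc": "DH3"}
alive = {"DH1": True, "DH2": False, "DH3": True}   # DH2 has died
print(sorted(advertised_by(routers, alive)))       # ['Ra', 'Rc']
```

The routing system then steers all traffic to the surviving routers without any of them knowing the address is anycast.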
I understand you have three separate hosts, in the one data centre,
with the same IP address (at least as far as their communications
with the outside world are concerned - they would have other
addresses so you could talk to them separately via TCP etc.). You
have some system for directing packets to the three servers based on
which VPN they arrived from. You presumably also take the emitted
packets from each server and send them out on the matching VPN.
Assuming there is a direct 1:1 relationship between VPN and server,
then I think you have a genuine host anycast arrangement. However,
the servers are co-located and could, in principle, share session
state. If they did this, then you could use them for session-based
communications.
Anycast servers are not inherently incapable of being used for
session-based communications. It is just that they are usually
assumed to be physically separate and therefore incapable of sharing
the state of each session.
I think you could implement your project with Ivip. You want a
single IP address for your three servers and you want packets sent to
this address to be tunneled to your data centre via physically
separate routers at geographically and topologically distant sites
S1, S2 and S3. These routers connect to your data centre by
presumably robust VPN links.
I think the motivation for this must be one or the other or both of:
1 - You don't trust the interdomain routing system (BGP + Ivip
or whatever) to get packets from sending hosts to your
data centre.
AND you would prefer to rely on the interdomain routing
system to get the packets to the generally closer sites
S1, S2 and S3, together with the costs, reliability etc.
inherent in the VPN links to your central data centre.
This may be motivated by concerns over path lengths, delays,
packet loss rates etc. It may also be motivated by wanting
to reduce the ability of certain parties to observe the passage
of these packets between those hosts and your servers.
2 - You want to use the approximate geographical / topological
information implied about the sending host due to its
packet arriving at one of your S1, S2 or S3 sites.
So you presumably choose to do this rather than use a
geographically / topologically sensitive DNS system to
give out different IP addresses to hosts in different
areas of the Net, as is done by Akamai.
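For contrast, the Akamai-style DNS alternative mentioned above might look like this sketch (regions and addresses are invented for illustration; real geo-DNS systems are considerably more involved):

```python
# A geo-aware resolver hands out a different unicast address per
# client region, instead of relying on anycast routing to pick the
# nearest site.
REGION_ADDRS = {"us": "192.0.2.1", "eu": "192.0.2.2", "ap": "192.0.2.3"}

def resolve(client_region):
    """Return the server address assumed 'closest' to the client."""
    return REGION_ADDRS.get(client_region, REGION_ADDRS["us"])

print(resolve("eu"))   # 192.0.2.2
print(resolve("xx"))   # 192.0.2.1 - unknown regions fall back to a default
```

The trade-off is that DNS selection depends on the resolver's location and TTL behaviour, whereas the anycast arrangement uses the DFZ's actual forwarding state to choose the site.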
I guess the way you are achieving this now is like:
S1
R2------R5---\
/ \ | \
SH---->R1 R4---R6 \
\ / | \ /-DH1
R3------R7------[Data center]--DH2
| S2 / \-DH3
| /
R9------R10---/
/ \ / S3
SH2----R12--R13-/ 3 destination hosts, each one
\ / anycast in a global sense since they
R14 have the same address.
Fig 9
All three routers R5, R7 and R10 advertise the same prefix. Since you
can't use this prefix anywhere else, unless you are using it also for
some other purpose, you have one prefix of your own for this
particular project. This is either your own PI prefix, or some
prefix you have convinced a single ISP to advertise (and they have
routers at all three sites) or you have somehow convinced separate
ISPs to advertise the one prefix which is not yours. In the former
case you have portability and in the latter two you do not.
If R5, R7 and R10 are yours and you multihome them like this, then it
should work pretty well, provided each router stops advertising the
prefix the moment the VPN link dies and if your data centre and
servers cope OK with one region's hosts suddenly connecting via one
of the other routers and therefore using a different VPN. To survive
that, if your servers use session-based protocols, they need to be
able to share the session state.
To achieve point 1 with Ivip, you need to have an ETR at each site
S1, S2 and S3. These need to have the same IP address, since the
destination hosts have the one IP address and therefore all ITRs will
use the same mapping. So these will be three anycast ETRs.
(Side-note on LISP: If you were doing this with LISP, but LISP
couldn't support anycast ETRs, you could make it happen by
having each ETR on its own RLOC address and then have all three
addresses in the mapping. Each ETR would have to somehow drop
packets with source addresses which it decided were indicative
of the source packet not being in its area. Then, each ITR would
discover that only one of the ETRs was reachable. How could the
ETRs do this? They would need to look at the outer source
address of the packets, which shows the ITR address, and do some
fancy algorithm to figure out whether the ITR was in its
preferred area or not. This cannot be discovered from the BGP
system by any conventional means. So this is much messier and
probably less effective than using anycast ETRs with Ivip, in
which case the BGP system does all the work of sending packets
from ITRs in a given "area" as defined by the DFZ's current
forwarding behaviour.)
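The messy ETR-side filtering described in this side-note might be sketched as follows (the "area" prefixes are invented; as the note says, there is no conventional way to derive them from the BGP system):

```python
import ipaddress

# Each ETR accepts only tunneled packets whose outer source address
# (the ITR's address) falls within prefixes the operator has guessed
# belong to "its" area, and drops the rest so that far-away ITRs
# eventually mark this RLOC unreachable and try another.
MY_AREA = [ipaddress.ip_network("198.51.100.0/24")]

def accept(outer_src):
    """Decide whether a tunneled packet's ITR is in this ETR's area."""
    itr = ipaddress.ip_address(outer_src)
    return any(itr in p for p in MY_AREA)

print(accept("198.51.100.10"))   # True  - ITR in this ETR's area
print(accept("203.0.113.10"))    # False - drop; let the ITR mark us dead
```

This makes concrete why the approach is messier than anycast ETRs: the area boundaries must be guessed and maintained by hand, whereas anycast lets the DFZ's forwarding behaviour define the areas automatically.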
ITRs ETRs
S1
R2------R5-->E1 3 Anycast ETRs - all the same address.
/ \ | \
SH-I1->R1 R4---R6 \
\ / | \ /-DH1
R3------R7-->E2-[Data center]--DH2
| S2 / \-DH3
| /
R9------R10->E3
/ \ / S3
SH2-I2-R12--R13-/ 3 destination hosts, each one
\ / anycast in a global sense since they
R14 have the same address.
Fig 10
To do this, R5, R7 and R10 would all be advertising the same prefix.
The most likely arrangement is that you have this prefix only for the
use of this particular set of sites, and this set of 3 servers - and
perhaps also for other similar sets of servers.
This would mean that you are either operating your own routers
directly, or that you are getting someone else to advertise your prefix.
As long as you were the only end-user network using this arrangement,
then there is no benefit to using Ivip, at least in terms of
scalability. Your prefix would be a MAB (Mapped Address Block) and
this would be only used for your end-user network. Therefore the
scaling benefits of Ivip - many end-user networks sharing the one MAB
and therefore spreading their burden on the BGP control plane over
exactly one prefix, rather than one or more for each network - would
not apply in this case. (The same would be true of LISP, APT or TRRP.)
Unless there was some other benefit to using Ivip, I suggest you not
bother with it and continue to do exactly what you are doing now - as
in Fig 9.
In some core-edge separation schemes, there is talk of "transition
periods" and the like, as if one day, it won't be possible for an
end-user network to have its own conventional BGP-advertised prefix.
That is not the case with Ivip. So you would always be able to do
what you are currently doing. It is inherently unscalable, but
Ivip can't do exactly what you want in a way which helps with
scalability - unless there are other end-user networks such as your
own which also want to use these ETRs or ETRs in the same prefix in
the same sites S1, S2 and S3.
> Would this have any hope of working right in an architecture that
> didn't accommodate anycast? How many such uses are out there, ready to
> impede the deployment of the unwary plan?
You can do it now. You could probably do it with Ivip (if
encapsulation is used for Ivip, I would need to ensure that the PMTUD
stuff worked OK with anycast ETRs - I am not sure this is possible),
but you are probably better doing it as you do now. LISP probably
couldn't do it.
AFAIK, as long as you are the only network with anycast anything in
these three locations, then you need your own BGP advertised prefix.
You need this for the way you do it now and for Ivip - so there are
no scaling advantages to using Ivip.
How would you do this with a core-edge elimination scheme? That
would have to be answered by someone who designed such a scheme.
- Robin
_______________________________________________
rrg mailing list
[email protected]
http://www.irtf.org/mailman/listinfo/rrg