Re: IP failover/migration question.

2006-07-05 Thread Michael . Dillon

> It's actually a rather frustrating
> situation for people who aren't big enough to justify a /19 and an
> AS#, but require geographically dispersed locations answering on the
> same IP(s).

If the number of IPs you require is small, then you can
probably solve the problem with IPv4 anycasting. Several
people have built out distributed anycast networks but 
the problem is that they think IPv4 anycast is a "DNS thing".
Therefore they don't sell anycast hosting services to
people like you who need it.

Of course, if you made them more aware of market
demand, this could change.

--Michael Dillon



Re: IP failover/migration question.

2006-06-27 Thread Andy Davidson


[EMAIL PROTECTED] wrote:

> Andy Davidson wrote:
>
> > 24 hours + outage whilst stale dns disappears will never do in
> > internet retail.
>
> And yet, with 90% of the net implementing the "will never do" scenario,
> we manage to get a lot of internet retail done anyhow.  I'm obviously going
> to need a *lot* more caffeine to sort through that conundrum

Hi, Valdis --

Thanks for the email.  Unlike a number of ISPs operating in the same 
geographic region as ourselves, a loss of core systems or connectivity 
causes something more than the sweat on the forehead of the people 
responsible for SLA-credit-avoidance... it causes a 100% revenue loss 
for the period of the outage.


When you reach a certain size, in the UK at least, your business 
insurance partners will expect you to demonstrate how you have taken 
steps to avoid this entire outage. 

When the company reaches another size, you may find your continued 
employment also depends on some degree of constant availability.


This is what I mean by 'will never do'.  A lot of small firms can sit 
out an evening without trading. To us, this represents 'real pain'. 


Best wishes,
Andy


Re: IP failover/migration question.

2006-06-27 Thread infowolfe


On 6/27/06, Owen DeLong <[EMAIL PROTECTED]> wrote:


> > Uptime might not matter for small hosts that do mom and pop websites
> > or so-called "beta" blog-toys, but every time Level3 takes a dump,
> > it's my wallet that feels the pain. It's actually a rather frustrating
> > situation for people who aren't big enough to justify a /19 and an
> > AS#, but require geographically dispersed locations answering on the
> > same IP(s).
>
> I'm not sure why you think you need to be that big to get portable IP
> space.  Policy 2002-3 allows for the issuance of a /22 to any organization
> which can show a need and the ability to utilize at least 50% of a /22
> with multihoming.  An ASN can be obtained pretty easily if you intend
> to multihome.  About the only thing that might stand in the way of
> a small organization is the up front cost, but, even that is less than
> $2000.



It's entirely possible that I was mistaken with regards to /19 vs /22,
but a /22 is still way more IPs than I really need. I mean, hell, I'm
not really using my /24 currently. I don't have nearly 256 machines,
and I certainly can't (without honeypotting almost all of it) justify
1,024 IPs.

In fact, in my network infrastructure currently, I've got one
loadbalancer that sits in front of 6 machines that don't have public
ips, so there goes any thought of justification ;-) and yet, when I'm
at 4 load balancers, I'll want them in geographically dispersed
locations, with a variety of upstream providers so that I don't have
to deal with the issues surrounding single-homed networking.


Re: IP failover/migration question.

2006-06-27 Thread Owen DeLong

> Uptime might not matter for small hosts that do mom and pop websites
> or so-called "beta" blog-toys, but every time Level3 takes a dump,
> it's my wallet that feels the pain. It's actually a rather frustrating
> situation for people who aren't big enough to justify a /19 and an
> AS#, but require geographically dispersed locations answering on the
> same IP(s).

I'm not sure why you think you need to be that big to get portable IP
space.  Policy 2002-3 allows for the issuance of a /22 to any organization
which can show a need and the ability to utilize at least 50% of a /22
with multihoming.  An ASN can be obtained pretty easily if you intend
to multihome.  About the only thing that might stand in the way of
a small organization is the up front cost, but, even that is less than
$2000.
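
For scale, the address arithmetic behind that threshold can be checked with Python's standard ipaddress module. This is only an illustrative sketch: the 10.0.0.0 prefixes are placeholder private ranges picked for counting, not anything taken from the policy text.

```python
import ipaddress

# Placeholder prefixes (RFC 1918 space), used only to count addresses.
slash22 = ipaddress.ip_network("10.0.0.0/22")
slash24 = ipaddress.ip_network("10.0.0.0/24")


def addresses_needed(network, utilization):
    """Number of addresses that must be in use to reach the given utilization."""
    return int(network.num_addresses * utilization)


print(slash22.num_addresses)           # 1024
print(addresses_needed(slash22, 0.5))  # 512
print(slash24.num_addresses)           # 256
```

So the 50% bar on a /22 works out to roughly two fully used /24s.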

Owen


-- 
If it wasn't crypto-signed, it probably didn't come from me.




Re: IP failover/migration question.

2006-06-27 Thread Patrick W. Gilmore


On Jun 27, 2006, at 1:32 PM, Gregory Hicks wrote:

> > And yet, with 90% of the net implementing the "will never do" scenario,
> > we manage to get a lot of internet retail done anyhow.  I'm obviously going
> > to need a *lot* more caffeine to sort through that conundrum
>
> OR, it *could* be that the retailers know about the way 'things'
> operate and don't make many changes once they get their site up...


You don't know many retailers, do you? :-)

Besides, changes to the site wouldn't help or hurt a network fault.

--
TTFN,
patrick


Re: IP failover/migration question.

2006-06-27 Thread Gregory Hicks


> From: [EMAIL PROTECTED]
> Date: Tue, 27 Jun 2006 11:20:38 -0400
> 
> On Tue, 27 Jun 2006 14:51:30 BST, Andy Davidson said:
> > Popular web browsers running on popular desktop operating systems  
> > also display extra-long dns cache time 'bugs'.
> 
> A well known fact, which leads right into your next comment...
> 
> > 24 hours + outage whilst stale dns disappears will never do in  
> > internet retail.
> 
> And yet, with 90% of the net implementing the "will never do" scenario,
> we manage to get a lot of internet retail done anyhow.  I'm obviously going
> to need a *lot* more caffeine to sort through that conundrum 

OR, it *could* be that the retailers know about the way 'things'
operate and don't make many changes once they get their site up...

---

I am perfectly capable of learning from my mistakes.  I will surely
learn a great deal today.

"A democracy is a sheep and two wolves deciding on what to have for
lunch.  Freedom is a well armed sheep contesting the results of the
decision." - Benjamin Franklin

"The best we can hope for concerning the people at large is that they
be properly armed." --Alexander Hamilton




Re: IP failover/migration question.

2006-06-27 Thread david raistrick


On Tue, 27 Jun 2006, infowolfe wrote:


> it's my wallet that feels the pain. It's actually a rather frustrating
> situation for people who aren't big enough to justify a /19 and an
> AS#, but require geographically dispersed locations answering on the
> same IP(s)


All you need is the ASN, you don't need your own IP space.  I've happily 
announced down to /24s (provider provided and PI) without any issues, 
where needs required it, and so do many others.


...david

---
david raistrick        http://www.netmeister.org/news/learn2quote.html
[EMAIL PROTECTED]      http://www.expita.com/nomime.html



Re: IP failover/migration question.

2006-06-27 Thread Patrick W. Gilmore


On Jun 27, 2006, at 12:26 PM, Randy Bush wrote:


> > Could you imagine slashdot, amazon or google going down for 24 hours?
> > I think there would be panic in the streets.
>
> piffle
>
> there would certainly be panic inside of those organizations trying
> to fix things.  but most people have real lives.


Inside those organizations.  Inside their transit providers.  Inside  
the call centers of the big eyeball networks.  Etc.  It would be a  
Big Deal on the 'Net.


Of course, the Internet is not "real life", but neither is just about  
any other business.  That doesn't mean they aren't important.   
Although I do admit "panic in the streets" is hyperbole - but I  
recognized it as hyperbole.  (Not that YOU have ever used hyperbole  
in your posts, Randy. :-)


--
TTFN,
patrick


Re: IP failover/migration question.

2006-06-27 Thread Randy Bush

> Could you imagine slashdot, amazon or google going down for 24 hours?
> I think there would be panic in the streets.

piffle

there would certainly be panic inside of those organizations trying
to fix things.  but most people have real lives.

randy



Re: IP failover/migration question.

2006-06-27 Thread infowolfe


On 6/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> On Tue, 27 Jun 2006 14:51:30 BST, Andy Davidson said:
> > Popular web browsers running on popular desktop operating systems
> > also display extra-long dns cache time 'bugs'.
>
> A well known fact, which leads right into your next comment...
>
> > 24 hours + outage whilst stale dns disappears will never do in
> > internet retail.
>
> And yet, with 90% of the net implementing the "will never do" scenario,
> we manage to get a lot of internet retail done anyhow.  I'm obviously going
> to need a *lot* more caffeine to sort through that conundrum



Could you imagine slashdot, amazon or google going down for 24 hours?
I think there would be panic in the streets.

Uptime might not matter for small hosts that do mom and pop websites
or so-called "beta" blog-toys, but every time Level3 takes a dump,
it's my wallet that feels the pain. It's actually a rather frustrating
situation for people who aren't big enough to justify a /19 and an
AS#, but require geographically dispersed locations answering on the
same IP(s).


Re: IP failover/migration question.

2006-06-27 Thread Patrick W. Gilmore


On Jun 27, 2006, at 11:20 AM, [EMAIL PROTECTED] wrote:


> On Tue, 27 Jun 2006 14:51:30 BST, Andy Davidson said:
> > Popular web browsers running on popular desktop operating systems
> > also display extra-long dns cache time 'bugs'.
>
> A well known fact, which leads right into your next comment...
>
> > 24 hours + outage whilst stale dns disappears will never do in
> > internet retail.
>
> And yet, with 90% of the net implementing the "will never do" scenario,
> we manage to get a lot of internet retail done anyhow.  I'm obviously going
> to need a *lot* more caffeine to sort through that conundrum


We do because we don't wait for DNS to time out in broken browsers,  
we have actual multi-homing with real networks.


--
TTFN,
patrick


Re: IP failover/migration question.

2006-06-27 Thread Valdis . Kletnieks
On Tue, 27 Jun 2006 14:51:30 BST, Andy Davidson said:
> Popular web browsers running on popular desktop operating systems  
> also display extra-long dns cache time 'bugs'.

A well known fact, which leads right into your next comment...

> 24 hours + outage whilst stale dns disappears will never do in  
> internet retail.

And yet, with 90% of the net implementing the "will never do" scenario,
we manage to get a lot of internet retail done anyhow.  I'm obviously going
to need a *lot* more caffeine to sort through that conundrum 




Re: IP failover/migration question.

2006-06-27 Thread Andy Davidson



Hi, guys

Very late reply, but this is a 'hot topic' in my space..

On 12 Jun 2006, at 04:02, Randy Bush wrote:

> > I'm trying to get a more clear understanding as to what is involved in
> > terms of moving the IPs, and how fast it can potentially be done.
>
> can we presume that separate ip spaces and changing dns, i.e. maybe
> ten minutes at worst, is insufficiently fast?


Ten minutes at worst, only if everyone is behaving.  Some of the UK's  
largest (in terms of consumer customer numbers) ISPs disobey short  
dns refresh times, and will cache expired or old records for 24(+?)  
hours.
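
As a rough illustration of why "ten minutes at worst" depends on everyone behaving, the worst-case time for clients to follow a DNS change can be modelled as detection time, plus record TTL, plus however long a misbehaving cache holds the record past expiry. The figures used here (five-minute detection, five-minute TTL, 24-hour overstay) are assumptions chosen to match the numbers quoted in this thread, not measurements.

```python
def worst_case_dns_failover(detect_s, ttl_s, cache_overstay_s=0):
    """Upper bound, in seconds, before the last client sees the new record."""
    return detect_s + ttl_s + cache_overstay_s


MINUTE = 60
HOUR = 60 * MINUTE

# Resolvers that honour the TTL.
well_behaved = worst_case_dns_failover(5 * MINUTE, 5 * MINUTE)
# A cache that keeps serving the expired record for another 24 hours.
stale_cache = worst_case_dns_failover(5 * MINUTE, 5 * MINUTE, 24 * HOUR)

print(well_behaved / MINUTE)  # 10.0
print(stale_cache / HOUR)     # a little over 24
```

The overstay term dominates everything else, which is the whole complaint.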


Popular web browsers running on popular desktop operating systems  
also display extra-long dns cache time 'bugs'.



24 hours + outage whilst stale dns disappears will never do in  
internet retail.  BGP, two datacentres, both equivalent endpoints for  
customer traffic, same IP space, and an e-commerce application which  
will happily run 'active/active' is the holy grail, I think.  The  
problem isn't setting this up in IP, it's getting your commerce  
application to fit this model (a problem I have today).



Best wishes,
Andy


Re: IP failover/migration question.

2006-06-12 Thread Christopher L. Morrow


On Mon, 12 Jun 2006 [EMAIL PROTECTED] wrote:

>
> > clear understanding as to what is involved in terms of moving the IPs,
> > and how fast it can potentially be done.
>
> I don't believe there is any way to get the IPs
> moved in any kind of reasonable time frame for
> an application that needs this level of failover
> support.
>

There may be actually... if you don't have to be TOO far apart:

something like (that no one at mci/vzb seems to want to market :( as a
product)

2 external connections (isp)
2 internal connections (private network)
2 cities (washington, DC and NYC for this argument)
2 Metro-Private-Ethernet connections
2 Nokia Firewall devices (IP740 or IP530 ish)
2 catalyst switches
2 copies of equipment in 'datacenter' (one in each location)

Make the nokia's do BGP with the outside world, do state-sync across the
MPLE link, make the MPLE link look like a front-side VLAN, backside VLAN,
and state-sync VLAN (you could do this with a single MPLE connection of
course) announce all routes out NYC, if that link goes dark push routes
out DC link.

Checkpoint/Nokia says state sync on the firewalls will work if the link
has less than 10ms latency (or so... they aren't much with the hard
numbers on this since the firewalls normally sit in the same rack). you could
even (probably) make things work in NYC for NYC users and DC for DC
users... though backside state-sync in the apps might get hairy.
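
A back-of-the-envelope check on that latency budget: light in fibre covers very roughly 200 km per millisecond, so distance alone sets a floor on round-trip time. The sketch assumes that propagation figure, a hypothetical 400 km fibre route between the two cities, and that the 10ms figure is a round-trip budget. None of these numbers come from Checkpoint/Nokia.

```python
FIBRE_KM_PER_MS = 200.0  # assumed propagation speed, about two thirds of c


def min_rtt_ms(route_km):
    """Propagation-only round-trip time; real links add equipment delay."""
    return 2 * route_km / FIBRE_KM_PER_MS


def max_route_km(rtt_budget_ms):
    """Longest fibre route that still fits the round-trip budget."""
    return rtt_budget_ms * FIBRE_KM_PER_MS / 2


print(min_rtt_ms(400))   # 4.0
print(max_route_km(10))  # 1000.0
```

Under those assumptions a DC/NYC pair fits with room to spare, but switching and queueing delay eat into the margin, so measure the real link.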

-chris


Re: IP failover/migration question.

2006-06-12 Thread Michael . Dillon

> clear understanding as to what is involved in terms of moving the IPs,
> and how fast it can potentially be done.

I don't believe there is any way to get the IPs
moved in any kind of reasonable time frame for
an application that needs this level of failover
support.

If I were you I would focus my attention on
maintaining two live connections, one to each
data centre. If you can change the client software,
then they could simply open two sockets, one for
traffic and one for keepalives. If the traffic
destination datacentre fails, your backend magic
starts up the failover datacentre and the traffic
then flows over the keepalive socket.
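
A minimal sketch of the two-socket idea, assuming each link is any object with a send() method that raises OSError when its site is unreachable (a connected socket would do). The class names and the PING payload are hypothetical, and FakeLink exists only so the example runs without a network.

```python
class DualSiteClient:
    """Holds a live connection to each data centre and fails over on error."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby
        self.failed_over = False

    def keepalive(self):
        # Keep the standby path warm so failover needs no new handshake.
        self.standby.send(b"PING")

    def send(self, payload):
        try:
            self.active.send(payload)
        except OSError:
            if self.failed_over:
                raise  # both sites are gone; nothing left to switch to
            # Traffic now flows over what used to be the keepalive socket.
            self.active, self.failed_over = self.standby, True
            self.active.send(payload)


class FakeLink:
    """Stand-in for a socket, used only for this demonstration."""

    def __init__(self):
        self.up, self.sent = True, []

    def send(self, data):
        if not self.up:
            raise OSError("site down")
        self.sent.append(data)


a, b = FakeLink(), FakeLink()
client = DualSiteClient(a, b)
client.send(b"order-1")
client.keepalive()
a.up = False             # primary data centre fails
client.send(b"order-2")  # transparently resent to the standby
print(b.sent)            # [b'PING', b'order-2']
```

The sketch says nothing about the hard part, which is the backend state the standby site needs before it can accept that second payload.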

And if you can't change the clients, you can do
much the same by using two tunnels of some sort,
MPLS LSPs, multicast dual-feed, GRE tunnels. 
The Chicago Mercantile Exchange has published
a network guide that covers similar use cases.
In the case of market data, they generally run
both links with duplicate data and the client
chooses whichever packets arrive first. Since
market data applications can win or lose millions
of dollars per hour, they are the most time-sensitive
applications on the planet.
http://www.cme.com/files/NetworkingGuide.pdf
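
The "first packet wins" behaviour on a dual feed can be sketched as a small arbiter that delivers each sequence number once and drops the late copy. This assumes every packet carries a sequence number and uses made-up feed names and payloads; it is an illustration of the general idea, not the CME's actual protocol.

```python
class FeedArbiter:
    """Delivers each sequence number once, from whichever feed arrives first."""

    def __init__(self):
        self.seen = set()  # grows without bound; a real client would prune it
        self.delivered = []

    def on_packet(self, feed, seq, payload):
        if seq in self.seen:
            return False  # duplicate from the slower feed; drop it
        self.seen.add(seq)
        self.delivered.append((seq, feed, payload))
        return True


arbiter = FeedArbiter()
arrivals = [
    ("A", 1, "bid 100"), ("B", 1, "bid 100"),
    ("B", 2, "ask 101"), ("A", 2, "ask 101"),
    ("B", 3, "bid 102"),  # feed A lost packet 3 entirely
]
for feed, seq, payload in arrivals:
    arbiter.on_packet(feed, seq, payload)

print([(seq, feed) for seq, feed, _ in arbiter.delivered])
# [(1, 'A'), (2, 'B'), (3, 'B')]
```

Because both paths are always live, losing one feed costs nothing but redundancy; there is no switchover event at all.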

> When I desire to migrate hosts to the failover site, B would send a
> BGP update advertizing  that the redundant link should become
> preferred,

There is your biggest timing problem which is 
also effectively out of your control. By maintaining
two live connections over two separate paths to
two separate data centers, you have more control
over when to switch and how quickly to switch.

--Michael Dillon



Re: IP failover/migration question.

2006-06-11 Thread Christopher L. Morrow


On Sun, 11 Jun 2006, Andrew Warfield wrote:

> > I think there is some cisco magic you could do with 'dial backup'... you
> > may even be able to rig this up with an ibgp session (even if that goes
> > out over the external provider) to swing the routes.
> >
> > NOTE: this could make your site oscillate if there are connectivity issues
> > between the sites, it could get messy FAST, and it could be hard to
> > troubleshoot. Basically look before you leap :)
> >
> > This link may be of assistance:
> > http://tinyurl.com/l8zpm
>
> This link asks me for a login...

aw crap, sorry... try:

http://tinyurl.com/zh7wk

(12.0 code reference infos)

>
> > to get greed into it.. are you sure you want to be 'stuck' with a single
> > carrier? :) What if the carrier dies wouldn't you want redundant carrier
> > links as well?
>
> I'd love a multi-ISP solution.  I just assumed that anything involving
> more than a single upstream AS across the two links would leave me
> having to consider BGP convergence instead of just IGP reconfig.  I

both are bgp convergence actually, unless the routes are put from BGP ->
IGP inside the single provider, which is a little scary.

Consider that locations A and B exist. A is primary, B secondary. B's
routes don't exist in ISP's network. A explodes, the network above A has
to withdraw the routes, the network above B (it's not the same POP nor
POP router right?) has to get new routes from B then send them out.

You'll gain SOME possibly, but that probably depends on the bgp/ibgp
architecture inside the ISP in question :(

> didn't presume that that would likely be something that happened in
> seconds.  If there's a fast approach to be had here, I'd love to hear
> it.
>

get with the greed man! :)


Re: IP failover/migration question.

2006-06-11 Thread Edward B. DREGER

AW> Date: Sun, 11 Jun 2006 20:55:42 -0700
AW> From: Andrew Warfield

AW> I'd love a multi-ISP solution.  I just assumed that anything involving
AW> more than a single upstream AS across the two links would leave me
AW> having to consider BGP convergence instead of just IGP reconfig.  I
AW> didn't presume that that would likely be something that happened in
AW> seconds.  If there's a fast approach to be had here, I'd love to hear
AW> it.

(1) Internal link between locations.
(2) Same ISPs at all locations.

The closer to the source, the faster the convergence.

Be sure to test.  I've had multiple links to one provider _within the
same datacenter_ where their iBGP-fu (or whatever they had) was lacking.
Bouncing one and only eBGP session to them triggered globally-visible
flapping. :-(


Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.


Re: IP failover/migration question.

2006-06-11 Thread Andrew Warfield



> I think there is some cisco magic you could do with 'dial backup'... you
> may even be able to rig this up with an ibgp session (even if that goes
> out over the external provider) to swing the routes.
>
> NOTE: this could make your site oscillate if there are connectivity issues
> between the sites, it could get messy FAST, and it could be hard to
> troubleshoot. Basically look before you leap :)
>
> This link may be of assistance:
> http://tinyurl.com/l8zpm


This link asks me for a login...


> to get greed into it.. are you sure you want to be 'stuck' with a single
> carrier? :) What if the carrier dies wouldn't you want redundant carrier
> links as well?


I'd love a multi-ISP solution.  I just assumed that anything involving
more than a single upstream AS across the two links would leave me
having to consider BGP convergence instead of just IGP reconfig.  I
didn't presume that that would likely be something that happened in
seconds.  If there's a fast approach to be had here, I'd love to hear
it.

thanks,
a.


Re: IP failover/migration question.

2006-06-11 Thread Andrew Warfield



> > I'm trying to get a more clear understanding as to what is involved in
> > terms of moving the IPs, and how fast it can potentially be done.
>
> can we presume that separate ip spaces and changing dns, i.e. maybe
> ten minutes at worst, is insufficiently fast?


Absolutely.  We are trying to explore the (arguably insane) idea of
failing things over sufficiently fast (and state-fully) that open
connections remain completely functional.


> > I'm fairly sure that what I would like to do is to arrange what is
> > effectively dual-homing, but with two geographically distinct homes:
>
> uh, that kinda inverts what we normally mean by 'multi-homing'.
> that's usually two upstream providers for a single site.


Yep, which is what I want -- It's just that the single site is going to move. ;)

Consider a traditional (single site) dual-homed situation, where I'm
not doing any kind of balancing across the links.  In that (my
understanding of) that case, I would use a private stub AS with the
two upstream links going to the common provider AS, and advertize a
change to the link weight on the backup link when I wanted a switch to
happen.  (Or if the primary failed this would presumably happen
automatically through its link disappearing.)

In this new scheme, I want to make _everything_ redundant.  The backup
link is to a geographically distinct site, and all of the hosts in the
primary site are actively mirrored to the backup site: OS,
applications, TCP connection state and all.  So it's _kind of_ dual
homing -- two upstream links for a single (virtual) site.


...
> i am sure others can come up with more clever hacks.  beware if
> they're too clever.


I completely agree with your comments regarding clever hacks, which is
why I'm trying to draw analogy to dual-homing, a technique that's
known, trusted, and clearly not fraught with corner-cases and devilish
complexity. ;)  Seriously though, I'm trying to convince myself that
there is a reasonable approach here that is within the means of
datacenter operators and their ISPs, and would allow a switch with on
the order of seconds of reconfiguration time.


> persistent tcp connections from clients would not fare well unless
> you actually did the hacks to migrate the sessions, i.e. tcp serial
> numbers and all the rest of the tcp state.  hard to do.


Since we move the entire OS, the TCP state goes with it.  We've done
this in the past on the local link by migrating the host and sending
an unsolicited ARP reply to notify the switch that the IP has moved to
a new MAC (http://www.cl.cam.ac.uk/~akw27/papers/nsdi-migration.pdf),
I think that order-of-seconds reconfiguration should allow the same
sort of migration to work at a larger scope.
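
For readers unfamiliar with the trick, an unsolicited ARP reply is just an ordinary ARP reply frame that nobody asked for, carrying the unchanged IP and the new MAC. The sketch only builds the 42 bytes and does not send them. The MAC and IP are placeholders, the target-field convention varies between implementations, and this is not the code from the paper.

```python
import socket
import struct


def gratuitous_arp_reply(mac, ip):
    """Build (not send) an Ethernet frame carrying an unsolicited ARP reply."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet hardware, IPv4 protocol, sizes 6 and 4, opcode 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += hw + addr         # sender: the host's new MAC and its unchanged IP
    arp += broadcast + addr  # target: broadcast MAC, same IP (one common convention)
    return ethernet + arp


# Placeholder values: a locally administered MAC and a documentation-range IP.
frame = gratuitous_arp_reply("02:00:00:00:00:01", "192.0.2.10")
print(len(frame))  # 42
```

On a single LAN this is enough to update switches and neighbours; across sites the equivalent "the address has moved" message has to be a routing update instead.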


> well, you left off mention of us legislative follies and telco and
> cable greed.  but maybe you can get away with a purely technical
> question once if you promise not to do it again. :-)


Thanks!  And thanks everyone for the feedback -- incredibly helpful.
I'll try for follies and greed next time. ;)

a.


Re: IP failover/migration question.

2006-06-11 Thread Edward B. DREGER



Date: Sun, 11 Jun 2006 19:34:12 -0700 (PDT)
From: [EMAIL PROTECTED]



> [A] somewhat cleaner way to do this would be to advertize a less
> specific route from the DR location covering the more specific route
> of the primary location.  If the primary route is withdrawn, voila ..
> traffic starts moving to the less specific route automatically without
> you having to scramble at the time of the outage to inject a new
> route.


This certainly is easier if it's flexible enough.  (If one desires high
splay across several locations, this approach is lacking.)  The tough
part then becomes internal application consistency.


Eddy


Re: IP failover/migration question.

2006-06-11 Thread Christopher L. Morrow

On Sun, 11 Jun 2006, Randy Bush wrote:
>
> > I'm fairly sure that what I would like to do is to arrange what is
> > effectively dual-homing, but with two geographically distinct homes:
>
> uh, that kinda inverts what we normally mean by 'multi-homing'.
> that's usually two upstream providers for a single site.
>

This almost sounds like an anycasted version of his site... only unicast
from one location then popping up at another location if the primary dies?

> > Assuming that I have an in-service primary site A, and an emergency
> > backup site B, each with a distinct link into a common provider AS, I
> > would configure B's link as redundant into the stub AS for A -- as if
> > the link to B were the redundant link in a (traditional single-site)
> > dual-homing setup.
>
> not clear what you mean by redundant.  as the common transit
> provider will not do well with hearing the same ip space from two
> sources, this type of hack might best be accomplished by B not
> announcing the space until A goes off the air and stops announcing
> it.  [ clever folk might try to automate this, but it would make me
> nervous. ]

I think there is some cisco magic you could do with 'dial backup'... you
may even be able to rig this up with an ibgp session (even if that goes
out over the external provider) to swing the routes.

NOTE: this could make your site oscillate if there are connectivity issues
between the sites, it could get messy FAST, and it could be hard to
troubleshoot. Basically look before you leap :)

This link may be of assistance:
http://tinyurl.com/l8zpm

>
> i am sure others can come up with more clever hacks.  beware if
> they're too clever.

yes... this probably is...



> > I hope this is reasonably on-topic for the list.
>
> well, you left off mention of us legislative follies and telco and
> cable greed.  but maybe you can get away with a purely technical
> question once if you promise not to do it again. :-)

to get greed into it.. are you sure you want to be 'stuck' with a single
carrier? :) What if the carrier dies wouldn't you want redundant carrier
links as well?


Re: IP failover/migration question.

2006-06-11 Thread Edward B. DREGER

RB> Date: Sun, 11 Jun 2006 17:02:14 -1000
RB> From: Randy Bush

RB> persistent tcp connections from clients would not fare well unless
RB> you actually did the hacks to migrate the sessions, i.e. tcp serial
RB> numbers and all the rest of the tcp state.  hard to do.

Actually, the TCP goo isn't too terribly difficult [when one has kernel
source].  What's tricky is (1) handling splits, and (2) ensuring that
the app is consistent and deterministic.

One transit provider handling multiple locations shouldn't present a
problem.  Of course many things that should be, aren't.

== below responses are general, not re Randy's post ==

Note also that redundancy/propagation is at odds with RTT latency.  The
proof of this is left as an exercise for the reader.

Finally, an internal network between locations is a good thing.  (Hint:
compare internal convergence times with global ones.)


Eddy


Re: IP failover/migration question.

2006-06-11 Thread Randy Bush

> a somewhat cleaner way to do this would be to advertize a less specific
> route from the DR location covering the more specific route of the primary
> location.  If the primary route is withdrawn, voila .. traffic starts
> moving to the less specific route automatically without you having to
> scramble at the time of the outage to inject a new route.

aha!  much cleaner indeed!  and works single or multi provider.
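
The reason this works is longest-prefix matching: routers prefer the most specific covering prefix, so the less specific route only attracts traffic once the more specific one is gone. A minimal sketch with Python's ipaddress module, using placeholder private prefixes and made-up next-hop labels:

```python
import ipaddress


def best_route(table, destination):
    """Return the next hop of the most specific prefix covering destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Placeholder prefixes from private (RFC 1918) space.
table = {
    ipaddress.ip_network("10.20.0.0/24"): "primary",  # more specific
    ipaddress.ip_network("10.20.0.0/23"): "DR site",  # covering less specific
}

print(best_route(table, "10.20.0.5"))            # primary
del table[ipaddress.ip_network("10.20.0.0/24")]  # primary withdraws its route
print(best_route(table, "10.20.0.5"))            # DR site
```

The lookup itself is instant; in practice the failover time is however long the withdrawal takes to propagate.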


randy



Re: IP failover/migration question.

2006-06-11 Thread Randy Bush

> I'm trying to get a more clear understanding as to what is involved in
> terms of moving the IPs, and how fast it can potentially be done.

can we presume that separate ip spaces and changing dns, i.e. maybe
ten minutes at worst, is insufficiently fast?

> I'm fairly sure that what I would like to do is to arrange what is
> effectively dual-homing, but with two geographically distinct homes:

uh, that kinda inverts what we normally mean by 'multi-homing'.
that's usually two upstream providers for a single site.

> Assuming that I have an in-service primary site A, and an emergency
> backup site B, each with a distinct link into a common provider AS, I
> would configure B's link as redundant into the stub AS for A -- as if
> the link to B were the redundant link in a (traditional single-site)
> dual-homing setup.  

not clear what you mean by redundant.  as the common transit
provider will not do well with hearing the same ip space from two
sources, this type of hack might best be accomplished by B not
announcing the space until A goes off the air and stops announcing
it.  [ clever folk might try to automate this, but it would make me
nervous. ]
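
If one did automate it, the usual guard against flapping is hysteresis: B announces only after several consecutive failed probes of A, and withdraws only after several consecutive good ones. The sketch below is a hypothetical decision loop under those assumptions. The thresholds and the probe source are invented, and actually announcing or withdrawing the route is left to the router.

```python
class BackupAnnouncer:
    """Decides when site B should announce or withdraw the shared prefix."""

    def __init__(self, fail_threshold=3, recover_threshold=10):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.announcing = False
        self.streak = 0  # consecutive probes that contradict the current state

    def observe(self, primary_up):
        """Feed one health-probe result; return 'announce', 'withdraw' or None."""
        # Not announcing expects A up; announcing expects A down.
        contradicts = (primary_up == self.announcing)
        if not contradicts:
            self.streak = 0
            return None
        self.streak += 1
        needed = self.recover_threshold if self.announcing else self.fail_threshold
        if self.streak < needed:
            return None
        self.announcing = not self.announcing
        self.streak = 0
        return "announce" if self.announcing else "withdraw"


backup = BackupAnnouncer(fail_threshold=3, recover_threshold=2)
probes = [True, False, True, False, False, False, True, True]
actions = [backup.observe(p) for p in probes]
print(actions)
# [None, None, None, None, None, 'announce', None, 'withdraw']
```

Note what this does not solve: if the probe path between the sites fails while A is still healthy, B will announce anyway and both sites end up originating the space.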

alternatively, you might arrange for the common transit provider to
statically route the ip space to A and swap to B on a phone call.
this would be very fast, but would require a very solid (and tested
monthly if you're paranoid, which i would be) pre-arrangement with
the provider.

i am sure others can come up with more clever hacks.  beware if
they're too clever.

> Assuming that everything works okay with the virtual machine
> migration, connections would continue as they were and clients
> would be unaware of the reconfiguration.

persistent tcp connections from clients would not fare well unless
you actually did the hacks to migrate the sessions, i.e. tcp serial
numbers and all the rest of the tcp state.  hard to do.

> I hope this is reasonably on-topic for the list.

well, you left off mention of us legislative follies and telco and
cable greed.  but maybe you can get away with a purely technical
question once if you promise not to do it again. :-)

randy



Re: IP failover/migration question.

2006-06-11 Thread ennova2005-nanog

You don't say who the "clients" are - I presume this is a web-based
application, so essentially you are trying to migrate service in flight
to another set of servers within the TCP/HTTP session timeout without
the client missing a beat?

If it is another kind of client, does it also have auto reconnect/retry
logic built in for service restoral if the connection times out?

Is the session/host state worth preserving for communication between
the servers in the cluster, or between the clients and the service also?

I know of people who have been able to do this on LANs using SANs to
store shared host states and having a new VM pick up the connections,
but on an internet-wide scale you are likely looking only at a
probabilistic guarantee, assuming that your routing would always
converge in time and packets start flowing to the Disaster Recovery
(DR) site.

This is much easier if you can stick within a single AS, of course.
Others will be able to answer whether these routing changes will
attract dampening penalties if you have to pick providers in different
ASes.

Assuming all of that doesn't matter, then a somewhat cleaner way to do
this would be to advertize a less specific route from the DR location
covering the more specific route of the primary location.  If the
primary route is withdrawn, voila .. traffic starts moving to the less
specific route automatically without you having to scramble at the time
of the outage to inject a new route.

Andrew Warfield <[EMAIL PROTECTED]> wrote:

> I've got a bit of a network reconfiguration question that I'm
> wondering if anyone on NANOG might be able to provide a bit of advice
> on:
>
> I'm working on a project to provide failover of entire cluster-based
> (and so multi-host) applications to a geographically distinct backup
> site.  The general idea is that as one datacentre burns down, a live
> service may be moved over to an alternate site without any
> interruption to clients.  All of the host-state migration is done
> using virtual machines and associated magic; I'm trying to get a more
> clear understanding as to what is involved in terms of moving the IPs,
> and how fast it can potentially be done.
>
> I'm fairly sure that what I would like to do is to arrange what is
> effectively dual-homing, but with two geographically distinct homes:
> Assuming that I have an in-service primary site A, and an emergency
> backup site B, each with a distinct link into a common provider AS, I
> would configure B's link as redundant into the stub AS for A -- as if
> the link to B were the redundant link in a (traditional single-site)
> dual-homing setup.  B would additionally host its own IP range, used
> for control traffic between the two sites in normal operation.
>
> When I desire to migrate hosts to the failover site, B would send a
> BGP update advertizing that the redundant link should become
> preferred, and (hopefully) the IGP in the provider AS would seamlessly
> redirect traffic.  Assuming that everything works okay with the
> virtual machine migration, connections would continue as they were and
> clients would be unaware of the reconfiguration.
>
> Does the routing reconfiguration story here sound plausible?  Does
> anyone have any insight as to how long such a reconfiguration would
> reasonably take and/or if it is something that I might be able to
> negotiate a SLA for with a provider if I wanted to actually deploy
> this sort of redundancy as a service?  Is anyone aware of similar
> high-speed failover schemes in use on the network today?
>
> Thoughts appreciated, I hope this is reasonably on-topic for the list.
>
> best,
> a.

IP failover/migration question.

2006-06-11 Thread Andrew Warfield


I've got a bit of a network reconfiguration question that I'm
wondering if anyone on NANOG might be able to provide a bit of advice
on:

I'm working on a project to provide failover of entire cluster-based
(and so multi-host) applications to a geographically distinct backup
site.  The general idea is that as one datacentre burns down, a live
service may be moved over to an alternate site without any
interruption to clients.  All of the host-state migration is done
using virtual machines and associated magic; I'm trying to get a more
clear understanding as to what is involved in terms of moving the IPs,
and how fast it can potentially be done.

I'm fairly sure that what I would like to do is to arrange what is
effectively dual-homing, but with two geographically distinct homes:
Assuming that I have an in-service primary site A, and an emergency
backup site B, each with a distinct link into a common provider AS, I
would configure B's link as redundant into the stub AS for A -- as if
the link to B were the redundant link in a (traditional single-site)
dual-homing setup.  B would additionally host its own IP range, used
for control traffic between the two sites in normal operation.

When I desire to migrate hosts to the failover site, B would send a
BGP update advertizing  that the redundant link should become
preferred, and (hopefully) the IGP in the provider AS would seamlessly
redirect traffic.  Assuming that everything works okay with the
virtual machine migration, connections would continue as they were and
clients would be unaware of the reconfiguration.

Does the routing reconfiguration story here sound plausible?  Does
anyone have any insight as to how long such a reconfiguration would
reasonably take and/or if it is something that I might be able to
negotiate a SLA for with a provider if I wanted to actually deploy
this sort of redundancy as a service?  Is anyone aware of similar
high-speed failover schemes in use on the network today?

Thoughts appreciated, I hope this is reasonably on-topic for the list.

best,
a.