Re: anycast (Re: .ORG problems this evening)
On Thu, Sep 18, 2003 at 02:38:18PM -0400, Todd Vierling quacked:
> On Thu, 18 Sep 2003, E.B. Dreger wrote:
>
> : EBD That's why one uses a daemon with main loop including
> : EBD something like:
> : EBD
> : EBD     success = 1 ;
> : EBD     for ( i = checklist ; i->callback != NULL ; i++ )
> : EBD         success &= i->callback(foo) ;
> : EBD     if ( success )
> : EBD         send_keepalive(via_some_ipc_mechanism) ;
>
> Yes, I hope that UltraDNS implements something like this, if they have not already. It's still not a guarantee that things will get withdrawn -- or be reachable, even if working but not withdrawn -- in case of a problem. That still leaves the DNS for a gTLD at risk for a single point of failure.

The whole problem with only listing two anycast servers is that you leave yourself vulnerable to other kinds of faults. Your upstream ISP fat-fingers "ip route 64.94.110.11 null0" and accidentally blitzes the netblock from which the anycast servers are announced. A router somewhere between customers and the anycast servers stops forwarding traffic, or starts corrupting transit data, without interrupting its route processing. Packet filters get misconfigured.

(Observe how divorced route processing and packet processing are in modern routing architectures and it's pretty easy to see how this can happen. With load balancing, traffic can get routed down a non-functional path while routing takes place over the other one - BBN did that to us once, was very entertaining).

Route updates in BGP take a while to propagate. Much longer than the 15ms RTT from me to, say, a.root-servers.net. The application retry in this context can be massively faster than waiting 30+ seconds for a BGP update interval.

The availability of the DNS is now co-mingled with the success of the magic route tweak code; the resulting system is a fair bit more complex than simply running a bunch of different DNS servers. God forbid that zebra ever has bugs...

http://www.geocrawler.com/lists/3/GNU/372/0/

In contrast, talking to a few DNS servers gives you an end-to-end test of how well the service is working. You still depend on the answers being correct, but you can intuit a lot from whether or not you actually get answers, instead of sitting around twiddling your thumbs thinking, "gee, I sure wish that routing update would get sent out so I could use the 'net."

-Dave

--
work: [EMAIL PROTECTED] me: [EMAIL PROTECTED]
MIT Laboratory for Computer Science http://www.angio.net/
I do not accept unsolicited commercial email. Do not spam me.
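The application-level retry Dave describes can be sketched roughly as follows. This is only an illustration, not how any real resolver is implemented: `resolve_with_failover` and `fake_query` are hypothetical names, the `query` callable stands in for "send one DNS question with a short timeout", and the addresses are documentation prefixes, not real name servers.

```python
from typing import Callable, Iterable, Optional, Tuple


def resolve_with_failover(
    servers: Iterable[str],
    query: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try each listed server in turn; return (server, answer) for the first reply."""
    for server in servers:
        try:
            answer = query(server)
        except OSError:
            # Timeouts and unreachable hosts are OSError subclasses; move on.
            continue
        if answer is not None:
            return server, answer
    return None  # every listed server failed


def fake_query(server: str) -> Optional[str]:
    """Stand-in for a real DNS query: the first address is black-holed."""
    if server == "192.0.2.1":
        raise TimeoutError("no reply")
    return "answer from " + server


print(resolve_with_failover(["192.0.2.1", "198.51.100.1"], fake_query))
```

The point of the sketch is that the client learns a server is dead after one short timeout and moves on, without waiting for any routing change. It only helps if the listed addresses do not all lead to the same dead site.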
Re: anycast (Re: .ORG problems this evening)
On Mon, 22 Sep 2003, David G. Andersen wrote: Yes, I hope that UltraDNS implements something like this, if they have not already. It's still not a guarantee that things will get withdrawn -- or be reachable, even if working but not withdrawn -- in case of a problem. That still leaves the DNS for a gTLD at risk for a single point of failure. The whole problem with only listing two anycast servers is that you leave yourself vulnerable to other kinds of faults. Your upstream ISP fat-fingers ip route 64.94.110.11 null0 and accidentally blitzes the netblock from which the anycast servers are announced. A router somewhere between customers and the anycast servers stops forwarding traffic, or starts corrupting transit data, without interrupting its route processing. packet filters get misconfigured.. That's a good reason to make sure that you are anycasting from at least two disparate netblocks, isn't it? :-) /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ Patrick Greenwell Asking the wrong questions is the leading cause of wrong answers \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
Re: anycast (Re: .ORG problems this evening)
DGA Date: Mon, 22 Sep 2003 18:32:19 -0400
DGA From: David G. Andersen

DGA The whole problem with only listing two anycast servers is that
DGA you leave yourself vulnerable to other kinds of faults. Your
DGA upstream ISP fat-fingers ip route 64.94.110.11 null0 and
DGA accidentally blitzes the netblock from which the anycast servers
DGA are announced.

And this is peculiar to anycast?

DGA A router somewhere between customers and the
DGA anycast servers stops forwarding traffic, or starts corrupting
DGA transit data, without interrupting its route processing.

And this is peculiar to anycast?

DGA packet filters get misconfigured..

And this is peculiar to anycast?

DGA Route updates in BGP take a while to propagate. Much longer
DGA than the 15ms RTT from me to, say, a.root-server.net. The application
DGA retry in this context can be massively faster than waiting 30+ seconds
DGA for a BGP update interval.

If a location goes dark, that's a problem. With redundant machines locally anycasted and inter-location transport, it becomes a question of border router and peer reliability.

DGA The availability of the DNS is now co-mingled with the success
DGA of the magic route tweak code; the resulting system is a fair

The availability of * is co-mingled with the success of the gear advertising its prefixes. The difference between standard multihoming and anycast is that the behind-the-scenes stuff happens to be on different machines in different locations.

DGA bit more complex than simply running a bunch of different
DGA DNS servers. God forbid that zebra ever has bugs...
DGA
DGA http://www.geocrawler.com/lists/3/GNU/372/0/

You assume zebra is the only option. Sure, it has bugs. So do Vendors C, J, and R.

DGA In contrast, talking to a few DNS servers gives you an end-to-end
DGA test of how well the service is working.

So splay is bad?

Eddy
--
Brotsman Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
_
DO NOT send mail to the following addresses : [EMAIL PROTECTED] -or- [EMAIL PROTECTED] -or- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Re: anycast (Re: .ORG problems this evening)
On Mon, 22 Sep 2003, David G. Andersen wrote: With load balancing, traffic can get routed down a non-functional path while routing takes place over the other one - BBN did that to us once, was very entertaining). Ah yes, I'll always have a special place in my heart for those Localdirectors. *cough* In contrast, talking to a few DNS servers gives you an end-to-end test of how well the service is working. You still depend on the answers being correct, but you can intuit a lot from whether or not you actually get answers, instead of sitting around twiddling your thumbs thinking, gee, I sure wish that routing update would get sent out so I could use the 'net. Anycast isn't the only thing possibly stuck waiting for routing convergence... Let's not get carried away here. matto [EMAIL PROTECTED]darwin Flowers on the razor wire/I know you're here/We are few/And far between/I was thinking about her skin/Love is a many splintered thing/Don't be afraid now/Just walk on in. #include disclaim.h
Re: .ORG problems this evening
--On 18 September 2003 10:05 -0400 Todd Vierling [EMAIL PROTECTED] wrote: DNS site A goes down, but its BGP advertisements are still in effect. (Their firewall still appears to be up, but DNS requests fail.) Host site C cannot resolve ANYTHING from DNS site A, even though DNS site B is still up and running. But host site C cannot see DNS site B! What you seem to be missing is that the BGP advert goes away when the DNS requests stop working. I have written DNS/BGP code (nothing to do with UltraDNS) and I can tell you it works very well. Even if you unplug the machine from the net you can get rapid failover by tweaking a BGP timer here or there. If you are going to say "yes, but that means I don't have one of the servers up whilst routing reconverges", this is true, but (a) it happens ANYWAY, (b) as the preferred route is in general more local, the rainshadow from routing reconvergence in the event of disruption is smaller. Alex
apathy (was Re: .ORG problems this evening)
On Fri, 19 Sep 2003, Alex Bligh wrote: : DNS site A goes down, but its BGP advertisements are still in effect. : (Their firewall still appears to be up, but DNS requests fail.) Host : site C cannot resolve ANYTHING from DNS site A, even though DNS site B is : still up and running. But host site C cannot see DNS site B! : : What you seem to be missing is that the BGP advert goes away when the DNS : requests stop working. It didn't. That's the problem. I've repeatedly described how I do understand the methodology here. What's being expressed on this list is blind faith and trust in an anycast-only gTLD DNS scheme that has the possibility of routing to a single point of failure. This scheme has already failed once. (When will it fail again?) Established gTLD practice has not put trust in an anycast routing scheme where one (1) destination might serve all queries for a host. What I've tried to express is that the years-established, standard DNS redundancy failover model could and should be implemented to complement -- not replace -- this anycast model for something as critical as a Big Three gTLD. That's fine; I give up due to pervasive community apathy. When this happens again, I'll be sure to bring up the archive URL to the head of this thread. sigh -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
RE: apathy (was Re: .ORG problems this evening)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Todd Vierling Sent: Friday, September 19, 2003 11:37 AM To: [EMAIL PROTECTED] Subject: apathy (was Re: .ORG problems this evening) I've repeatedly described how I do understand the methodology here. What's being expressed on this list is blind faith and trust in an anycast-only gTLD DNS scheme that has the possibility of routing to a single point of failure. Anyone know if 64.94.110.11 is done via anycast? This scheme has already failed once. (When will it fail again?) In that case, hopefully soon ...
Re: apathy (was Re: .ORG problems this evening)
Todd Vierling wrote: On Fri, 19 Sep 2003, Alex Bligh wrote: : DNS site A goes down, but its BGP advertisements are still in effect. : (Their firewall still appears to be up, but DNS requests fail.) Host : site C cannot resolve ANYTHING from DNS site A, even though DNS site B is : still up and running. But host site C cannot see DNS site B! : : What you seem to be missing is that the BGP advert goes away when the DNS : requests stop working. It didn't. That's the problem. I've repeatedly described how I do understand the methodology here. What's being expressed on this list is blind faith and trust in an anycast-only gTLD DNS scheme that has the possibility of routing to a single point of failure. This scheme has already failed once. (When will it fail again?) Established gTLD practice has not put trust in an anycast routing scheme where one (1) destination might serve all queries for a host. What I've tried to express is that the years-established, standard DNS redundancy failover model could and should be implemented to complement -- not replace -- this anycast model for something as critical as a Big Three gTLD. That's fine; I give up due to pervasive community apathy. When this happens again, I'll be sure to bring up the archive URL to the head of this thread. sigh You started from a point of having no idea that UltraDNS used anycast, confirmed for everyone in your second email that you had no clue about how anycast worked, and migrated by your third email to being an expert on how it should work. And based on assumptions that were flawed in the very beginning, you've created a one megabyte thread and a s+n/n ratio almost unparalleled by anything I've ever seen on NANOG before. As I told you privately, I'm working on a response that tries to deal with all the misinformation you've spouted. There is so much, however, that it is taking more than the 10 minutes you took to decide you knew it all. So you can call it apathy, or anything else you want. It seems consistent with your way of jumping to conclusions based on flawed assumptions. But it's really just that other people actually take time to research issues before mouthing off. YMMV, and apparently it does. In the interim, feel free to post your operational experience and qualification with tlds and their dns. -- Rodney Joffe CenterGate Research Group, LLC. http://www.centergate.com Technology so advanced, even we don't understand it!(R)
Re: apathy (was Re: .ORG problems this evening)
On Fri, 19 Sep 2003, Rodney Joffe wrote: : You started from a point of having no idea that UltraDNS used anycast, : confirmed for everyone in your second email that you had no clue about : how anycast worked, Please stop the bellicose, holier-than-thou attitude because you feel like assuming that I don't have networking experience. It's getting tiresome. I apologize for whatever I've done to offend you. What I didn't know at first was that UltraDNS's system was based on anycast. Yes, it was my oversight, probably due to my own complacency with the gTLDs Just Working for so long. Once I was notified of that fact, my perspective on the problem changed quite a bit. I do know how anycast routing works, and that it failed miserably in this particular case. The implementation failure specifics are not my concern on this point; the simple fact is that a critical gTLD resource failed. Blindly trusting that the all-anycast implementation in use will work better in the future seemed a rather bad idea to me in the context of a gTLD. I was trying to figure out, with the help of others who have been far more gracious, what possibilities exist that could help keep the failure from happening again -- outside the scope of this particular anycast implementation. : But it's really just that other people actually take time to research : issues before mouthing off. Actually, my first few requests for corroborating information (research) received several mouthing-off responses. Much of this thread has required me to fend off rather improper personal attacks -- this one included -- from people such as yourself, while at the same time attempting to get assistance to analyze a difficult to see, corner case problem with a critical resource. I have apologized offlist to a few people whose heated remarks to me received heated messages in response, and I apologize to all on-list right now. That is not appropriate here in either direction. 
: In the interim, feel free to post your operational experience Ultimatum demands like this are just not called for, and I will not be a party to it. However, I'm happy to discuss it offlist with anyone who may be interested; there are business-vs.-personal reasons that I cannot discuss this on-list. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: apathy (was Re: .ORG problems this evening)
On Fri, Sep 19, 2003 at 01:36:41PM -0400, Todd Vierling wrote: On Fri, 19 Sep 2003, Rodney Joffe wrote: : You started from a point of having no idea that UltraDNS used anycast, : confirmed for everyone in your second email that you had no clue about : how anycast worked, Please stop the bellicose, holier-than-thou attitude because you feel like assuming that I don't have networking experience. It's getting tiresome. I apologize for whatever I've done to offend you. On behalf of the entire NANOG community, please stop pretending that you DO have a clue just because you believe people shouldn't assume you don't. Trust me when I say that it is no longer an assumption. Please also do not mistake apathy for annoyance at your continued incessant whining about anycast and UltraDNS. You don't have anything more useful to say, so please do us all a favor and just stop now. -- Richard A Steenbergen [EMAIL PROTECTED] http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Jared Mauch wrote: : ultradns uses the power of anycast to have these ips that appear : to be on close subnets in geographically diverse locations. Oh, that's brilliant. How nice of them to defeat the concept of redundancy by limiting me to only two of their servers for a gTLD. VeriSign might be doing some loathsome things lately, but at least my named has several more servers than just two to choose from. : could you provide some more technical details, other than : your postulations that they have two machines on : network-wise close subnets and that is the problem? I tracerouted to both IPs from two different locations in the USA; both took the same route before hitting !H from an ultradns.com rDNS machine. And both servers for that route were completely unresponsive from both tried locations during the outage period. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Majdi S. Abbas wrote: : I didn't have a problem with .org this evening, and I've asked : around and others don't seem to have noticed anything either. It would be : more helpful if you told us your source prefix, and which filter you're : hitting when you traceroute to tld[12].ultradns.net. 12 dellfweqab.ultradns.net (204.74.103.2) 24.811 ms !H Same machine for both tld1 and tld2, seen through XO last night and Verio this morning, from source prefix 66.56.64.0/19 (as well as two others, one on the US east coast and one in US midwest which I cannot name publicly). So as far as my machine's source address is concerned, even if the servers are anycast, there are still only two servers which reside on a single point of failure. Anycasting doesn't help me one whit if there are only two servers for my named to choose and both of the ones visible from my location are down (even though their routes are up) -- this is IMNSHO irresponsible for a gTLD operator. If anycast is the game, there should be much more than just two addresses to choose. Ideally, there should be about six, and certain servers should deliberately *not* advertise certain anycast networks, in an overlap mesh that allows one point to fail while others still respond. For instance: USA server location A advertises networks 1, 3, 5; USA server location B advertises networks 1, 3, 4; Europe server location A advertises networks 3, 4, 6; Asia server location A advertises networks 2, 5, 6; or something to that effect. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
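The overlap mesh proposed above can be checked mechanically. The following is a minimal sketch under the worst-case assumption that a client routes every network a dead site advertises to that dead site, and that the site's routes stay up. The site names and the `escape_networks` helper are made up for illustration; the network numbers are the ones from the example in the message.

```python
# Which anycast networks each location advertises (from the example above).
MESH = {
    "usa-a": {1, 3, 5},
    "usa-b": {1, 3, 4},
    "europe-a": {3, 4, 6},
    "asia-a": {2, 5, 6},
}

# The layout being criticised: every site advertises every network.
FLAT = {
    "site-a": {1, 2},
    "site-b": {1, 2},
}


def escape_networks(mesh, dead_site):
    """Networks that cannot be routed to dead_site because it never advertises them."""
    all_networks = set().union(*mesh.values())
    return all_networks - mesh[dead_site]


for site in MESH:
    print(site, "dead but still announcing ->", sorted(escape_networks(MESH, site)))
print("flat layout ->", sorted(escape_networks(FLAT, "site-a")))
```

In the mesh, every site leaves three networks that a client necessarily reaches somewhere else. In the flat layout the set is empty, which is the single point of failure being described. This says nothing about whether the dead site withdraws its routes promptly; it only models the case where it does not.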
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Stephen J. Wilcox wrote: : they have two distinct servers by IP, globally they have N x clusters. i'm sure : each instance is actually more than a single linux PeeCee Doesn't matter if it's a cluster at each location. The fact remains that there were only two IP addresses visible to my named, and both were unresponsive to my machine. As far as my machine was concerned, .ORG was down for the count, no matter how many servers, that were invisible to me, were still working. : so even if what i see as tld1 now goes into failure.. for the minute or two it : takes to go offline and reconverge on another tld1 i still see tld2 The routes I saw never went offline, as far as I could tell -- and from my location tld1 and tld2 have the *same* route and end up at the same physical connectivity location. So much for redundancy. : maybe its firewalled? I see !H too but my .org is working fine for dns resolving Yes, it is firewalled. I was pointing out that the route is the same for tld1 and tld2 for me, all the way up to the firewall. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
Todd Vierling wrote: Yes, it is firewalled. I was pointing out that the route is the same for tld1 and tld2 for me, all the way up to the firewall. Please post traceroutes from your location, as well as from the two locations in different parts of the USA (You said earlier: I tracerouted to both IPs from two different locations in the USA; both took the same route before hitting !H from an ultradns.com rDNS machine. ) Then please post the results of sho ip bgp 204.74.112.1 and sho ip bgp 204.74.113.1 from your location. Thanks -- Rodney Joffe CenterGate Research Group, LLC. http://www.centergate.com Technology so advanced, even we don't understand it!(SM)
Re: .ORG problems this evening
On Thu, 18 Sep 2003, just me wrote: : If you're still confused, have a read here: : : http://www.ultradns.com/support/managed_dns_faq.cfm : : Q. I read that your service is supposed to make use of several : servers all over the world, but you only give users two server : addresses to provide to their registrar. How do I make use of all the : other servers? I know what anycast does. See the other sister thread. The problem is that their answer is frankly *wrong*: A. The two server addresses you supply your registrar when you set up a domain on the UltraDNS system are actually 'virtual' addresses that will route to the best possible server on our network, based on a number of factors. This highly intelligent mechanism allows you to achieve full redundancy and reliability with only two name server addresses actually listed. In fact, if the registrar would allow you to do so, you could achieve the same level of reliability with only one name server address. Anycast is *NOT* a redundancy and reliability system when dealing with application-based services like DNS. Rather, anycast is a geographically biased traffic distribution system. There is a subtle but important difference here: DNS site A advertises anycast networks 1.2.3.0/24 and 1.2.4.0/24. DNS site B advertises anycast networks 1.2.3.0/24 and 1.2.4.0/24. Host site C attempts to use DNS servers from DNS sites A or B based on best anycast route selection. Host site C's router happens to pick DNS site A as best route for both 1.2.3.0/24 and 1.2.4.0/24. DNS site A goes down, but its BGP advertisements are still in effect. (Their firewall still appears to be up, but DNS requests fail.) Host site C cannot resolve ANYTHING from DNS site A, even though DNS site B is still up and running. But host site C cannot see DNS site B! Get the picture yet? -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
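The A/B/C scenario above, and the effect of withdrawing the route when DNS stops answering, can be modelled in a few lines. This is a toy model only: the `Site` class, the `cost` field (a stand-in for whatever makes host site C prefer one route), and the helper names are all assumptions for illustration, not a description of UltraDNS's network.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    cost: int                 # hypothetical route preference seen by host site C; lower wins
    dns_up: bool = True       # is the name server answering?
    advertising: bool = True  # is the site still announcing the anycast prefixes?


def query(sites):
    """Follow the best advertised route; return the site name, or None if it is dead."""
    candidates = [s for s in sites if s.advertising]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: s.cost)
    return best.name if best.dns_up else None


def resolve(prefixes):
    """A resolver succeeds if any listed name server address answers."""
    for sites in prefixes.values():
        answer = query(sites)
        if answer is not None:
            return answer
    return None


a = Site("A", cost=1)
b = Site("B", cost=5)
prefixes = {"1.2.3.0/24": [a, b], "1.2.4.0/24": [a, b]}

a.dns_up = False               # DNS dies at site A, routes still announced
stuck = resolve(prefixes)      # both addresses lead to dead site A

a.advertising = False          # a health check withdraws site A's routes
recovered = resolve(prefixes)  # traffic now reaches site B

print(stuck, recovered)
```

With the routes stuck, both name server addresses fail even though site B is healthy. Once site A stops advertising, the same two addresses reach site B. The disagreement in the thread is over how reliably that withdrawal happens in practice.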
Re: .ORG problems this evening
In a message written on Thu, Sep 18, 2003 at 10:05:15AM -0400, Todd Vierling wrote: Anycast is *NOT* a redundancy and reliability system when dealing with application-based services like DNS. Rather, anycast is a geographically I think you'll find most people on the list would disagree with you on this point. Many ISP's run anycast for customer facing DNS servers, and I'll bet if you ask the first reason why isn't because they provide faster service, or distribute load, but because the average customer only wants one or two IP's to put in his DNS config, and gets real annoyed when they don't work. So it is a redundancy and reliability thing, the customer can configure (potentially) one address, and the ISP can have 10 servers for it so if one dies all is well. Is it appropriate for a gTLD? Now that's a whole different can of worms. Personally I think they should return the two anycast addresses, and as many actual server addresses as will fit in the packet. This is the best of both worlds. When it works, geographically distributed load, redundancy at the IP layer, quick responses. When one of the failure modes is encountered (eg, stuck route) DNS has the information it needs to switch to a backup as well. Redundancy is good. Redundancy at two levels is even better, particularly when they can back each other up. Plus, in this case it costs them nothing, they just have to tweak a config. -- Leo Bicknell - [EMAIL PROTECTED] - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/ Read TMBG List - [EMAIL PROTECTED], www.tmbg.org
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Leo Bicknell wrote: : Anycast is *NOT* a redundancy and reliability system when dealing with : application-based services like DNS. Rather, anycast is a geographically : : I think you'll find most people on the list would disagree with you : on this point. Many ISP's run anycast for customer facing DNS : servers, and I'll bet if you ask the first reason why isn't because : they provide faster service, or distribute load, but because the : average customer only wants one or two IP's to put in his DNS config, : and gets real annoyed when they don't work. And guess what: neither of the two addresses supplied by UltraDNS worked last night for some sites, because their anycast configuration is not allowing DNS redundancy. It is depending on every site somehow choosing different routes for both addresses, which is not guaranteed. Anycasting only works as a redundancy scheme when you have a mesh of *partially* overlapping BGP advertisements, so that a client has a guarantee that at least one address in the mix is located elsewhere from the rest. : So it is a redundancy and reliability thing, the customer can configure : (potentially) one address, and the ISP can have 10 servers for it so if : one dies all is well. But if all such anycast addresses have the ability to point to the same physical location, there is only an illusion of redundancy, because there's no way to get an alternate access point to the zone if a site is choosing a dead route for all server addresses. It doesn't matter how many other servers at the DNS provider are still working, because some sites can choose -- and have demonstrably chosen -- a single, dead site for all available anycast NS addresses in a setup like this (UltraDNS's .ORG configuration). : Is it appropriate for a gTLD? UltraDNS's setup isn't even appropriate for a 2LD. I'm damned glad that I don't have my subdomains hosted there. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
Speaking on Deep Background, the Press Secretary whispered: : I think you'll find most people on the list would disagree with you : on this point. Many ISP's run anycast for customer facing DNS : servers, and I'll bet if you ask the first reason why isn't because : they provide faster service, or distribute load, but because the : average customer only wants one or two IP's to put in his DNS config, : and gets real annoyed when they don't work. And/or, the networking stack may accept 3,4{...}50 DNS addresses, but only really looks at the first. -- A host is a host from coast to [EMAIL PROTECTED] no one will talk to a host that's close[v].(301) 56-LINUX Unless the host (that isn't close).pob 1433 is busy, hung or dead20915-1433
Re: .ORG problems this evening
TV Date: Thu, 18 Sep 2003 10:05:15 -0400 (EDT) TV From: Todd Vierling TV DNS site A goes down, but its BGP advertisements are still in TV effect. Or are they? Eddy
Re: .ORG problems this evening
TV Date: Thu, 18 Sep 2003 11:39:17 -0400 (EDT)
TV From: Todd Vierling

TV And guess what: neither of the two addresses supplied by
TV UltraDNS worked last night for some sites, because their
TV anycast configuration is not allowing DNS redundancy. It is
TV depending on every site somehow choosing different routes for
TV both addresses, which is not guaranteed.

I don't know what UDNS does internally, but ideally anycast:

+ Has steady, unchanging EGP adverts
+ Has service-providing boxen that advert/withdraw prefixes in the IGP depending on their status
+ Includes an internal network, so that flaps are contained.

If done properly, anycast means _all_ pods must fail to create a failure condition. If done improperly, it means _any_ pod failure can create a partial failure condition -- which means the probability of failure _increases_ with the number of pods.

TV Anycasting only works as a redundancy scheme when you have a
TV mesh of *partially* overlapping BGP advertisements, so that a
TV client has a guarantee that at least one address in the mix
TV is located elsewhere from the rest.

Don't be silly. This is like claiming that multihoming only works if you spread services over different netblocks.

TV But if all such anycast addresses have the ability to point
TV to the same physical location, there is only an illusion of
TV redundancy, because there's no way to get an alternate access
TV point to the zone if a site is choosing a dead route for all
TV server addresses. It doesn't matter how many other servers

Ergo, that's why one withdraws the routes when a pod dies. Routes need to reflect what's up. Funny thing is, standard BGP has the same requirement.

You're correct that an incorrect anycast setup can cause trouble, and arguably more than unicast. However, claiming that anycast is inherently bad is really, really silly.

Eddy (no selfish interest in defending UltraDNS)
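The "all pods must fail" versus "any pod failure" contrast can be put in rough numbers. The sketch below assumes each pod fails independently with the same probability p, which is a simplification that is not claimed anywhere in the thread; the function names and the value of p are made up for illustration.

```python
def p_outage_all_pods_must_fail(p, pods):
    """Done properly: service is lost only if every pod is down at once."""
    return p ** pods


def p_outage_any_pod_fails(p, pods):
    """Done improperly: any single dead pod black-holes some clients."""
    return 1 - (1 - p) ** pods


p = 0.01  # assumed per-pod failure probability, purely illustrative
for pods in (1, 2, 4, 8):
    print(
        pods,
        p_outage_all_pods_must_fail(p, pods),
        p_outage_any_pod_fails(p, pods),
    )
```

Under these assumptions the first figure shrinks rapidly as pods are added while the second grows, which is the sense in which adding pods to a badly run anycast setup makes some failure more likely.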
Re: .ORG problems this evening
On Thu, 18 Sep 2003, E.B. Dreger wrote: : TV Date: Thu, 18 Sep 2003 10:05:15 -0400 (EDT) : TV From: Todd Vierling : : TV DNS site A goes down, but its BGP advertisements are still in : TV effect. : : Or are they? I couldn't know for sure from some sites, but traceroutes sure got there. That would imply that (at their end) the advertisements were still up. BGP has no way to know that an internal network problem occurred. If someone mistakenly tripped over a network cable that disconnected DNS clusters from a router, how would the router know to drop anycast advertisements? (Sure, you could run zebra on the cluster. But what about if the name server SEGVs? There's a lot of possible scenarios) -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
On Thu, 18 Sep 2003, E.B. Dreger wrote: : TV Anycasting only works as a redundancy scheme when you have a : TV mesh of *partially* overlapping BGP advertisements, so that a : TV client has a guarantee that at least one address in the mix : TV is located elsewhere from the rest. : : Don't be silly. This is like claiming that multihoming only : works if you spread services over different netblocks. We're talking about application (DNS) redundancy here, not transport-level (6to4 anycast RFC comes to mind) redundancy. With this in mind: : Ergo, that's why one withdraws the routes when a pod dies. : Routes need to reflect what's up. BGP doesn't know when a DNS server dies. Therein lies the fundamental problem of using anycast as an application redundancy scheme. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
TV Date: Thu, 18 Sep 2003 13:01:18 -0400 (EDT) TV From: Todd Vierling TV BGP doesn't know when a DNS server dies. Therein lies the TV fundamental problem of using anycast as an application TV redundancy scheme. But it can and should. Again, seeing if the process is running is easy; verifying correct functionality requires more work, but definitely is doable. Eddy
Re: .ORG problems this evening
TV Date: Thu, 18 Sep 2003 12:52:29 -0400 (EDT)
TV From: Todd Vierling

TV I couldn't know for sure from some sites, but traceroutes
TV sure got there. That would imply that (at their end) the
TV advertisements were still up.

Which would be an implementation flaw, not something inherently wrong with anycast.

TV (Sure, you could run zebra on the cluster. But what about if
TV the name server SEGVs? There's a lot of possible
TV scenarios)

That's why the routing daemon must be aware if the service is up or not. It requires custom or modified routing software.

Having zebra stat(2) a file that the DNS daemon periodically touches is a quick way to verify that the DNS server software is still running. Easy enough. Gross, but effective, and easy enough.

A proper implementation has the routing daemon monitor the service in question -- in this case DNS. If a series of test queries provide the correct response, all is well; if not, it's time to yank the route.

Again, perhaps there are implementation flaws... I don't know anything about UltraDNS's internal network. But these can be fixed, and do not make anycast inherently unreliable.

If one understands, thinks about, and approaches the problem, it can be solved.

Eddy
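The stat(2) heartbeat and the "run every check, then decide" loop can be sketched as below. This is an assumption-laden illustration rather than anything zebra provides: `heartbeat_is_fresh`, `should_announce`, the heartbeat path, and the 30-second threshold are all hypothetical, and a real deployment would add actual test queries against the name server as further checks.

```python
import os
import time


def heartbeat_is_fresh(path, max_age_seconds, now=None):
    """True if the file the DNS daemon touches was modified recently enough."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return False  # a missing or unreadable heartbeat counts as failure
    if now is None:
        now = time.time()
    return (now - mtime) <= max_age_seconds


def should_announce(checks):
    """Keep announcing the anycast prefix only if every health check passes."""
    return all(check() for check in checks)


# Hypothetical wiring: the path and threshold are placeholders.
checks = [lambda: heartbeat_is_fresh("/var/run/dns-heartbeat", 30)]
if should_announce(checks):
    print("keep route")
else:
    print("withdraw route")
```

A heartbeat file only shows that the daemon is alive enough to touch it. As the message says, verifying correct answers takes real test queries, which would simply be more entries in `checks`.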
Re: .ORG problems this evening
> TV> BGP doesn't know when a DNS server dies.  Therein lies the
> TV> fundamental problem of using anycast as an application
> TV> redundancy scheme.
>
> But it can and should. Again, seeing if the process is running is
> easy; verifying correct functionality requires more work, but
> definitely is doable.
>
> Eddy

Ick. you really believe that BGP can or should be augmented to
understand application liveness? BGP reaching past the router,
running a ps -augx and then performing application-specific tricks?

I guess that when all you have/understand is a hammer, everything
becomes a nail.

Wait... It's a joke! you just forgot the :)

--bill
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Todd Vierling wrote:
> On Thu, 18 Sep 2003, E.B. Dreger wrote:
>
> : TV> Date: Thu, 18 Sep 2003 10:05:15 -0400 (EDT)
> : TV> From: Todd Vierling
> :
> : TV> DNS site A goes down, but its BGP advertisements are still in
> : TV> effect.
> :
> : Or are they?
>
> I couldn't know for sure from some sites, but traceroutes sure got
> there. That would imply that (at their end) the advertisements were
> still up.
>
> BGP has no way to know that an internal network problem occurred.
> If someone mistakenly tripped over a network cable that disconnected
> DNS clusters from a router, how would the router know to drop
> anycast advertisements?
>
> (Sure, you could run zebra on the cluster. But what about if the
> name server SEGVs? There's a lot of possible scenarios)

Almost there.. just make sure your zebra IGPs are redistributing to
your BGP so that a failure such as that knocks out the BGP too.

Steve
Re: .ORG problems this evening
Todd Vierling wrote:
> BGP doesn't know when a DNS server dies.  Therein lies the
> fundamental problem of using anycast as an application redundancy
> scheme.

You ever think that maybe, just maybe, Ultra wrote some code to do
this?

Yes, it might conceivably have failed in a way that seems to have
left you and one or two others in the veritable dark, but I don't
think, at this point, using NANOG to debug the problem, no matter
where it was, is going to be very productive.

But, of course, I don't know anything about using DNS and anycast.
;-)

Bob
Re: .ORG problems this evening
E.B. Dreger wrote:
> TV> Date: Thu, 18 Sep 2003 13:01:18 -0400 (EDT)
> TV> From: Todd Vierling
>
> TV> BGP doesn't know when a DNS server dies.  Therein lies the
> TV> fundamental problem of using anycast as an application
> TV> redundancy scheme.
>
> But it can and should. Again, seeing if the process is running is
> easy; verifying correct functionality requires more work, but
> definitely is doable.

And, I might add, in the case of a highly complex anycast
application, you will need to check not only for correctness, but
for timeliness.

And, again, in the case of a highly complex app such as an anycast
DNS, you need to check several behind-the-scenes apps, such as maybe
a db, the responsiveness of your high-availability partner server,
the dns daemon, connectivity through two or more network paths,
connectivity to master update servers, BGP on whatever boxes are
providing BGP, etc. The list goes on.

But again, that's just my opinion, I could be wrong. ;-)
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Todd Vierling wrote: BGP has no way to know that an internal network problem occurred. If someone mistakenly tripped over a network cable that disconnected DNS clusters from a router, how would the router know to drop anycast advertisements? (Sure, you could run zebra on the cluster. But what about if the name server SEGVs? There's a lot of possible scenarios) I can assure you, this is a solved problem. [EMAIL PROTECTED]darwin Flowers on the razor wire/I know you're here/We are few/And far between/I was thinking about her skin/Love is a many splintered thing/Don't be afraid now/Just walk on in. #include disclaim.h
Re: .ORG problems this evening
>> BGP has no way to know that an internal network problem occurred.
>> If someone mistakenly tripped over a network cable that
>> disconnected DNS clusters from a router, how would the router know
>> to drop anycast advertisements?
>>
>> (Sure, you could run zebra on the cluster. But what about if the
>> name server SEGVs? There's a lot of possible scenarios)
>
> ALmost there.. just make sure your zebra IGPs are redistributing to
> your BGP so that a failure such as that knocks out the bgp too
>
> Steve

Sorry, no zebra. Perhaps I should run my TLD's DNS service on my
Juniper routers. Some expect/cron work should provide the needed
glue... Now if I could just get cisco to add authoritative DNS
service to IOS, right up there with the HTTP, firewall, content
caching, and load-balancing cruft they have added to their basic
routing code... I could use cisco too! (may still need some glue
tho)

In case it was not clear, I think that multi-tasking hardware might
be the wrong choice. I want my routers to route and not do apps
work. For apps, I want them to be single-app specific. DNS service
on its own hardware, NTP on its platform, HTTP outsourced to
(vendor), etc.

This has impact on the design of anycast solutions. Ultra has one
model, ISC has another, and PCH uses a third. The more generic
content crowd has its favorites. Then there are the load-balancing
vendors who cater to these folks. One size does not fit all.

--bill
anycast (Re: .ORG problems this evening)
> Date: Thu, 18 Sep 2003 13:47:01 -0400
> From: Keptin Komrade Dr. BobWrench III esq.

> And, I might add, in the case of a highly complex anycast
> application, you will need to check not only for correctness,
> but for timeliness.

In a realtime system, something that is late is considered
incorrect. A DNS response that arrives after three seconds is
unsat, and (from a RT perspective) incorrect. I should have been
more clear in my wording.

> And, again, in the case of a highly complex app such as an
> anycast DNS, you need to check several behind the scenes apps,
> such as maybe a db, the responsivness of your high avail partner
> server, the dns daemon, connectivity through two or more network
> paths, connectivity to master update servers, BGP on whatever
> boxes are providing BGP, etc, the list goes on.

Yes on all counts, except perhaps connectivity... BGP handles that.
If you mean killing the link in case of saturation, I'd argue that's
a bad idea -- that just means the large traffic quantity will go
elsewhere.

> But again, that's just my opinion, I could be wrong. ;-)

That's why one uses a daemon with main loop including something
like:

	success = 0 ;
	for ( i = checklist ; i->callback != NULL ; i++ )
		success &= i->callback(foo) ;
	if ( success )
		send_keepalive(via_some_ipc_mechanism) ;

The BGP mechanism listens for keepalives via the IPC mechanism.

Eddy
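The point that a late answer counts as a wrong answer can be made concrete with a small sketch. Everything here is hypothetical naming for illustration (struct probe_result, probe_ok, all_probes_ok); the only value taken from the thread is the three-second figure, which a caller would pass as deadline_ms = 3000. How the probe is actually sent and timed is left out.

```c
#include <stddef.h>

/* Outcome of one test query, as recorded by some (unspecified) prober. */
struct probe_result {
    int  got_answer;      /* nonzero if any response arrived            */
    int  answer_correct;  /* nonzero if it matched the expected record  */
    long elapsed_ms;      /* time from sending the query to the answer  */
};

/* A probe passes only if it is both correct and on time. */
int probe_ok(const struct probe_result *p, long deadline_ms)
{
    if (!p->got_answer || !p->answer_correct)
        return 0;
    return p->elapsed_ms <= deadline_ms;
}

/* A series of test queries passes only if every probe passes. */
int all_probes_ok(const struct probe_result *probes, size_t n,
                  long deadline_ms)
{
    size_t k;

    for (k = 0; k < n; k++)
        if (!probe_ok(&probes[k], deadline_ms))
            return 0;
    return 1;
}
```

Folding lateness into the same pass/fail result means the monitor needs no separate notion of "slow"; a node that answers correctly but too late simply stops generating keepalives.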
Re: .ORG problems this evening
> Date: Thu, 18 Sep 2003 10:29:06 -0700 (PDT)
> From: bmanning

> Ick. you really believe that BGP can or should be augmented
> to understand application liveness? BGP reaching past the

And why not? BGP deals in reachability information. Perhaps it
conventionally represents interface and link state, but there is
nothing making that the One True Way.

From the BGP scanner's perspective, it's just checking another
keepalive. What generates the keepalive for the route matters not.

Do you mean that a dead server is just as up as a live server, yet
a dead link is not as up as a live link? That's preposterous.

> router, running a ps -augx and then performing applications
> specific tricks?

No need to use gross shell scripts. Far better means of IPC exist.
Please read my previous messages.

> I guess that when all you have/understand is a hammer,
> everything becomes a nail.

If you have any specific technical complaints ("not how it's usually
done" doesn't count), I'm all ears. I'm also open to a better way;
my MUA seems to have truncated the part where you suggested one. :-)

> Wait... Its a joke! you just forgot the :)

No. It works well, as long as flaps are confined.

Eddy
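One way to picture "just checking another keepalive" on the routing side is a hold timer per anycast route. The sketch below is an assumption-laden illustration, not how any particular BGP daemon works: struct anycast_route and its fields are invented names, and the hold_down delay before re-announcing is one possible way to keep flaps confined rather than something the thread specifies.

```c
#include <time.h>

/* Hypothetical per-route state kept by the routing daemon. */
struct anycast_route {
    time_t last_keepalive;  /* when the service monitor last checked in  */
    time_t hold_time;       /* withdraw after this long without one      */
    time_t hold_down;       /* minimum wait after a withdraw before the
                               route may be announced again              */
    time_t withdrawn_at;    /* when the route was last withdrawn         */
    int    announced;       /* 1 while the prefix is being advertised    */
};

/* Called whenever a keepalive arrives over the IPC mechanism. */
void route_keepalive(struct anycast_route *r, time_t now)
{
    r->last_keepalive = now;
}

/* Called on each scan; returns 1 if the route is announced afterwards. */
int route_tick(struct anycast_route *r, time_t now)
{
    int alive = (now - r->last_keepalive) <= r->hold_time;

    if (r->announced && !alive) {
        r->announced = 0;               /* service went quiet: withdraw */
        r->withdrawn_at = now;
    } else if (!r->announced && alive &&
               (now - r->withdrawn_at) >= r->hold_down) {
        r->announced = 1;               /* healthy again and damped */
    }
    return r->announced;
}
```

Withdrawal is immediate once the hold time expires, while re-announcement is deliberately slower, so a service that bounces up and down produces at most one withdraw/announce pair per hold-down period.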
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Keptin Komrade Dr. BobWrench III esq. wrote: : And, I might add, in the case of a highly complex anycast application, : you will need to check not only for correctness, but for timeliness. All this still assumes that DNS should be trusting a single anycast location as the only point of access (a situation which is the case for UltraDNS if both records' routes go to the same place). There's a reason DNS does not trust exactly one server if multiple ones are provided: too many things can and do go wrong. What is going on right now with .ORG is that DNS is being forced to believe that BGP knows what is best for it, and it's already demonstrated that BGP did not always know best. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
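The client-side behaviour described here, trying each listed server in turn rather than trusting one, can be sketched as follows. The names first_responsive, query_fn, struct outage and simulated_query are hypothetical, and simulated_query is only a stand-in for a real DNS query with a timeout. The two addresses in the usage note are the tld1/tld2 addresses quoted elsewhere in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if the server at `ip` answered in time. */
typedef int (*query_fn)(const char *ip, void *ctx);

/* Tries each server in order; returns the index of the first one
 * that answers, or -1 if none of them does. */
int first_responsive(const char *const servers[], size_t n,
                     query_fn query, void *ctx)
{
    size_t k;

    for (k = 0; k < n; k++)
        if (query(servers[k], ctx))
            return (int)k;
    return -1;
}

/* Stand-in for a real query: a list of addresses that are "down". */
struct outage {
    const char *const *dead;
    size_t             ndead;
};

int simulated_query(const char *ip, void *ctx)
{
    const struct outage *o = ctx;
    size_t k;

    for (k = 0; k < o->ndead; k++)
        if (strcmp(ip, o->dead[k]) == 0)
            return 0;           /* this address does not answer */
    return 1;
}
```

With servers 204.74.112.1 and 204.74.113.1, an outage covering only the first address yields index 1, so the client still resolves. An outage covering both, which is what happens when both anycast prefixes route to the same dead site, yields -1 no matter how many retries the client makes.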
Re: anycast (Re: .ORG problems this evening)
EBD> Date: Thu, 18 Sep 2003 18:01:07 +0000 (GMT)
EBD> From: E.B. Dreger

EBD> That's why one uses a daemon with main loop including
EBD> something like:
EBD>
EBD> 	success = 0 ;
EBD> 	for ( i = checklist ; i->callback != NULL ; i++ )
EBD> 		success &= i->callback(foo) ;
EBD> 	if ( success )
EBD> 		send_keepalive(via_some_ipc_mechanism) ;

Eek!

	s,success = 0,success = 1,

Eddy
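With the correction applied, the loop could be fleshed out along these lines. This is a sketch under assumptions rather than Eddy's actual daemon: struct check, run_checklist and poll_once are illustrative names, the per-check ctx pointer replaces the placeholder foo, and check_flag and count_keepalive are stand-ins for real service probes and a real IPC send.

```c
#include <stddef.h>

/* One health check; returns nonzero when the component is healthy. */
typedef int (*check_fn)(void *ctx);

struct check {
    const char *name;
    check_fn    callback;   /* a NULL callback terminates the list */
    void       *ctx;
};

/* Runs every check in the list; returns 1 only if all of them pass. */
int run_checklist(const struct check *checklist)
{
    const struct check *i;
    int success = 1;        /* start at 1, per the correction above */

    for (i = checklist; i->callback != NULL; i++)
        success &= (i->callback(i->ctx) != 0);
    return success;
}

/* One pass of the main loop: send a keepalive only if all is well. */
int poll_once(const struct check *checklist,
              void (*send_keepalive)(void *arg), void *arg)
{
    int ok = run_checklist(checklist);

    if (ok)
        send_keepalive(arg);
    return ok;
}

/* Stand-ins for illustration: a check that reads an int flag, and a
 * "keepalive" that just counts how many times it was sent. */
int  check_flag(void *ctx)      { return *(int *)ctx; }
void count_keepalive(void *arg) { ++*(int *)arg; }
```

Starting from 1 and AND-ing each result means a single failed check suppresses the keepalive. Starting from 0, as in the first version, no keepalive would ever be sent.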
Re: .ORG problems this evening
On Thu, 18 Sep 2003, John Fraizer wrote: : As has been stated by others, UltraDNS, like the roots and other TLD hosts : is under nearly constant attack. Perhaps your local nodes were effected : by an attack. IE; the pipe was full but the service was still alive so the : anycast prefix wasn't retracted. Bummer. Sucks to be you. Sucks to be anyone trying to use the service whose routers pick those nodes as the only ones available. That's the fault of the implementor, not the client. The major issue here is that no *gTLD*, particularly one of the Big Three, should be subject to a SPOF -- even if it's only a regionally visible SPOF due to anycast selection. It should *always* be possible to attempt queries to more than one physical location's servers for a gTLD. Yet last night, I could not query .ORG from several different locations in the continental US, even though there were perfectly functional servers available (in the same country, no less). BGP errors happen (everyone here should be able to attest to that readily), and they did. What's to stop some other boneheaded DoS or oversight from causing this again? And again? This particular outage was in the late evening in what appeared to be the affected area from my probing, which is why people like you don't appear to care; it didn't affect you. What about when it happens in the middle of the day in your neck of the woods? : Doesn't really matter to me though. Bitch and moan all you like. : Demonstrate your lack of experience and understanding. Uh-huh. Quite a few people here know better; they also know I am surrounded by cloak/ on this list and others. If my public resume were up to date and filled in more detail, you'd know otherwise. Don't try to speak for my experience from your pedestal when you don't have the information to make that kind of baseless judgment. 
On the other hand, if you can't see the fatal flaw in a major Internet infrastructure service depending on a single point of failure, I can point you at a few books that could enlighten you. -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
> Bill, I know you know better, so let's try more facts and less FUD.
> Mmmmkay? Your above paragraph is a red herring that is analogous to
> saying all multihomed services must be run on the router itself.

yes, it does lean that way... but to expose a sigma-six blip in how
some people may think about anycasting techniques.

> Here's the deal: DNS server runs a BGP/OSPF/whatever speaker.

One model. ISC is enamored of this model. I'm not.
http://www.isc.org/tn/isc-tn-2003-1.txt

> You won't find a turnkey RPM to do it, but that doesn't mean it's
> impossible. In fact, if you slow down and read previous posts,
> you'll note some very big hints re how to build such a working
> system. If you're limited to installing out-of-the-box packages,
> you _will_ have a huge mess... but that's not my problem.

Nope, it can even be done w/ COTS technologies. Been there, Done
that. Ate the cheese as fondue.

>> This has impact on the design of anycast solutions. Ultra has one
>> model, ISC has another, and PCH uses a third. The more generic
>> content crowd has its favorites. Then there are the load-balancing
>> vendors who cater to these folks. One size does not fit all.
>
> Okay, I'll give you credit for that paragraph.

thanks. we now return you to your worst-design show & tell.
(my fav today: optical connectors!)

> Eddy

--bill
Re: .ORG problems this evening
TV> Date: Thu, 18 Sep 2003 14:22:19 -0400 (EDT)
TV> From: Todd Vierling

TV> Sucks to be anyone trying to use the service whose routers
TV> pick those nodes as the only ones available.  That's the
TV> fault of the implementor, not the client.

Yes.

TV> The major issue here is that no *gTLD*, particularly one of
TV> the Big Three, should be subject to a SPOF -- even if it's
TV> only a regionally visible SPOF

Yes.

TV> due to anycast selection.

Which would be due to a broken implementation.

Broken unicast is bad. Not all unicast is bad.
Broken anycast is bad. Not all anycast is bad.

TV> It should *always* be possible to attempt queries to more
TV> than one physical location's servers for a gTLD.

_Or_ guarantee that the physical location selected was indeed up.
Again, it smells an awful lot like plain old multihoming... if you
advertise the route, you'd better be ready to handle the traffic.
(Did someone say 7007?)

TV> BGP errors happen (everyone here should be able to attest to
TV> that readily), and they did.  What's to stop some other
TV> boneheaded DoS or oversight from causing this again?  And
TV> again?

I've had problems with unicast when a link went down, yet the
upstream continued advertising the routes. BGP stupidity happens
with unicast service, too.

Yes, anycast requires some additional thought and out-of-box
thinking. But that doesn't make it inherently unstable.

Eddy
Re: anycast (Re: .ORG problems this evening)
On Thu, 18 Sep 2003, E.B. Dreger wrote:

: EBD> That's why one uses a daemon with main loop including
: EBD> something like:
: EBD>
: EBD> 	success = 0 ;
: EBD> 	for ( i = checklist ; i->callback != NULL ; i++ )
: EBD> 		success &= i->callback(foo) ;
: EBD> 	if ( success )
: EBD> 		send_keepalive(via_some_ipc_mechanism) ;
:
: Eek!
:
: 	s,success = 0,success = 1,

Heh. I'll send you some coffee.

Yes, I hope that UltraDNS implements something like this, if they
have not already. It's still not a guarantee that things will get
withdrawn -- or be reachable, even if working but not withdrawn --
in case of a problem. That still leaves the DNS for a gTLD at risk
for a single point of failure.

Maybe I should just chalk this up to history at this point. I have
a feeling, though, that the head of this thread's archive URL will
show up as a citation some time from now when something else goes
wrong with the zone. sigh

--
-- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
> Date: Thu, 18 Sep 2003 11:36:37 -0700 (PDT)
> From: bmanning

> > Bill, I know you know better, so let's try more facts and less
> > FUD. Mmmmkay? Your above paragraph is a red herring that is
> > analogous to saying all multihomed services must be run on the
> > router itself.
>
> yes, it does lean that way... but to expose a sigma-six blip
> in how some people may think about anycasting techniques.

Regardless of the technology, one can _always_ create a stupid way
of doing things. With any luck, however, a _good_ way exists, too.

> > Here's the deal: DNS server runs a BGP/OSPF/whatever speaker.
>
> One model. ISC is enamored of this model. I'm not.
> http://www.isc.org/tn/isc-tn-2003-1.txt

Yes, one model. Skimming the ISC paper, I also have mixed feelings
about some sections. The basic principle, however, boils down to
getting traffic to the right place based on factors such as
reachability and correctness.

> Nope, it can even be done w/ COTS technologies.

Noted. I suppose some implementations may indeed be turnkey... just
that we've never seen the One True Tarball for the way we like to
do it. My fault for overgeneralizing.

Eddy
Re: .ORG problems this evening
On Thu, 18 Sep 2003, John Fraizer wrote: : Todd, you don't make the announcement for the anycast address from your : border.. You do it from within the anycast cluster as a CONDITIONAL : announcement. IE; you use a specially written BGP daemon that makes the : announcement when the service is alive and retracts it when it isn't. Um, I did in fact previously mention running BGP on the cluster -- which was referring directly to the DNS service machines -- and you even responded to that message. Yes, I do understand. (Ref: One of the things I do for a living is work on a BGP4 peer implementation written from scratch.) Doing this requires implementing keepalive handling in the service monitoring side of the world correctly. Which, obviously, *the entity in question did not*. Because of this, I can no longer trust them to get it right next time without changing their fundamental design. It's not like this is all that hard to grasp: the services for a TLD are much more critical than a 2LD or 3LD and should be given much more thought into failover handling than just anycast will do it for us. The other two of the Big Three gTLDs, and most ccTLDs, allow a client to attempt queries to geographically diverse DNS servers at any time, regardless of the BGP table's correctness, in order to allow some additional level of failover and reliability assessment by the DNS client. Some of these servers could run anycast, and I wouldn't even know it without looking deeper. What I can trivially see, though, is where geographically diverse servers are available on said TLDs, I can get a guarantee that at least two from each zone's NS group go to different places. Why is .ORG somehow different and special that I/we should trust a third party to do the whole operation solely via anycast, where said anycast has the possibility of becoming a single point of failure? -- -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: .ORG problems this evening
On Thu, Sep 18, 2003 at 02:22:19PM -0400, Todd Vierling wrote: Sucks to be anyone trying to use the service whose routers pick those nodes as the only ones available. That's the fault of the implementor, not the client. I have a sneaking suspicion that if UltraDNS's tld cluster that is apparently located in Equinix-Ashburn stopped responding to queries for two hours last night, a lot more people would have noticed. A *lot* more people. I think it's out of line to speculate on how UltraDNS has configured these clusters, particularly in terms of how reachability information is verified and propagated without any knowledge of their configuration. The major issue here is that no *gTLD*, particularly one of the Big Three, should be subject to a SPOF -- even if it's only a regionally visible SPOF due to anycast selection. It should *always* be possible to attempt queries to more than one physical location's servers for a gTLD. Yet last night, I could not query .ORG from several different locations in the continental US, even though there were perfectly functional servers available (in the same country, no less). First it was two locations, one of which you can't tell us about (Deep inside OSPF Area 51?) -- now it's several? I've tried myself from many different hosts today, and they all route to different clusters. I'm having trouble finding more than one, geographically diverse host that routes to the same cluster. BGP errors happen (everyone here should be able to attest to that readily), and they did. What's to stop some other boneheaded DoS or oversight from causing this again? And again? Are you absolutely, positively sure this cluster was responding to 0 queries, but still propagating those two /24's? This particular outage was in the late evening in what appeared to be the affected area from my probing, which is why people like you don't appear to care; it didn't affect you. What about when it happens in the middle of the day in your neck of the woods? 
The reason for this is simple -- given the query volume a tld like
.org receives, and given just how close this cluster is to so many
millions of users in the eastern US, the odds of you being the
*only* person, even amongst the few thousand on this list, to notice
a problem... are incredibly slim.

Since you won't tell us where these several hosts you tried to query
from are addressed, and you won't tell us exactly which queries you
tried, and how... it is incredibly hard to look into. This is the
equivalent of calling every fire department in the nation and
telling them that there is a fire, but refusing to tell them where
you are, or what you've witnessed.

> Uh-huh. Quite a few people here know better; they also know I am
> surrounded by cloak/ on this list and others. If my public resume
> were up to date and filled in more detail, you'd know otherwise.
> Don't try to speak for my experience from your pedestal when you
> don't have the information to make that kind of baseless judgment.
>
> On the other hand, if you can't see the fatal flaw in a major
> Internet infrastructure service depending on a single point of
> failure, I can point you at a few books that could enlighten you.

It isn't a single point of failure, but even if it were, I can
assure you that the collective experience of this list would fill
quite a few more volumes than you are capable of referring us to.

You ask that we make no assumptions as to your experience -- grant
us the same courtesy.

--msa
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Majdi S. Abbas wrote: : Sucks to be anyone trying to use the service whose routers pick those nodes : as the only ones available. That's the fault of the implementor, not the : client. : I think it's out of line to speculate on how UltraDNS has configured : these clusters, I don't care what the underlying implementation is. I care about the effect: that for at least one hour, possibly up to two last night, one of the physical locations went dead but was still considered available via BGP, while being considered the best.available path to both nets. : First it was two locations, one of which you can't tell us about : (Deep inside OSPF Area 51?) I can't provide all the exact source machines for reasons I can discuss offlist, but I'm happy to do so to a representative of UltraDNS. My home machine, though, is 66.56.93.94. : now it's several? Three to be exact that I verified last night to be unable to query DNS from either IP address: one at my home (Atlanta GA), one at my employer (Atlanta GA), and one in Chicago IL. However, here's three straw examples of both IPs going to the same place from spot checks right now (funny, my home machine actually gets two different ones at this moment): = Southern CA = traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets ... . p4-1-0-0.r00.lsanca01.us.bb.verio.net (129.250.16.80) 16.9 ms (ttl=251!) . p16-1-1-0.r21.lsanca01.us.bb.verio.net (129.250.2.10) 19.5 ms (ttl=250!) . ge-1-0.a01.lsanca02.us.ra.verio.net (129.250.29.131) 3.44 ms . 66.238.50.26.ptr.us.xo.net (66.238.50.26) 13.2 ms (ttl=248!) . dellfwisi.ultradns.net (204.74.98.2) 13.8 ms (ttl=57!) !H traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets ... . p5-1-0-0.RAR1.LA-CA.us.xo.net (65.106.5.13) 2.64 ms (ttl=250!) . p0-0-0.MAR1.LA-CA.us.xo.net (65.106.5.6) 2.73 ms (ttl=249!) . p1-0.CHR1.LA-CA.us.xo.net (207.88.81.166) 2.78 ms . 66.238.50.26.ptr.us.xo.net (66.238.50.26) 35.0 ms . 
dellfwisi.ultradns.net (204.74.98.2) 29.7 ms (ttl=57!) !H = Dallas TX = traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets ... . p16-0-0-0.r01.atlnga03.us.bb.verio.net (129.250.4.195) 25.3 ms (ttl=250!) . p16-2-0-0.r00.atlnga03.us.bb.verio.net (129.250.5.16) 25.3 ms (ttl=249!) . p16-1-0-0.r01.mclnva02.us.bb.verio.net (129.250.2.48) 40.8 ms (ttl=247!) . ge-1-0-0.a00.mclnva02.us.ra.verio.net (129.250.31.170) 40.8 ms (ttl=246!) . 168.143.247.38 (168.143.247.38) 44.1 ms (ttl=246!) . 64.124.112.141.ultradns.com (64.124.112.141) 45.0 ms (ttl=244!) . dellfwpxvn.ultradns.net (204.74.104.2) 43.7 ms (ttl=53!) !H traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets ... . sl-bb26-fw-5-1.sprintlink.net (144.232.20.147) 7.54 ms . sl-bb25-fw-15-0.sprintlink.net (144.232.11.89) 32.0 ms . sl-bb23-atl-10-0.sprintlink.net (144.232.20.60) 36.4 ms . sl-bb26-rly-14-1.sprintlink.net (144.232.20.65) 33.3 ms . sl-st21-ash-14-2.sprintlink.net (144.232.20.3) 34.8 ms . sl-xocomm-5-0.sprintlink.net (144.223.246.50) 34.2 ms . p5-0-0.RAR1.Washington-DC.us.xo.net (65.106.3.133) 35.3 ms (ttl=245!) . p6-1-0.MAR1.Washington-DC.us.xo.net (65.106.3.182) 35.7 ms (ttl=244!) . p0-0.CHR1.Washington-DC.us.xo.net (207.88.87.10) 35.7 ms . 64.124.112.141.ultradns.com (64.124.112.141) 39.7 ms (ttl=244!) . dellfwpxvn.ultradns.net (204.74.104.2) 40.0 ms (ttl=53!) !H = Chicago IL = traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets ... . gige3-2.core2.Chicago1.Level3.net (209.244.8.185) 0.796 ms . so-4-1-0.bbr1.Chicago1.level3.net (209.247.10.165) 0.905 ms (ttl=250!) . so-6-0-0.edge1.Chicago1.Level3.net (209.244.8.10) 1.01 ms (ttl=249!) . verio-level3-oc12.Chicago1.Level3.net (209.0.227.66) 0.860 ms (ttl=251!) . ge-1-2.a00.chcgil07.us.ra.verio.net (129.250.25.136) 0.967 ms (ttl=253!) . fa-2-1.a00.chcgil07.us.ce.verio.net (128.242.186.134) 1.04 ms (ttl=251!) . dellfweqch.ultradns.net (204.74.102.2) 0.881 ms (ttl=60!) 
!H traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets ... . 0.so-1-0-0.XL2.CHI13.ALTER.NET (152.63.69.182) 1.58 ms (ttl=251!) . POS7-0.BR1.CHI13.ALTER.NET (152.63.73.22) 1.29 ms . a11-0d114.IR1.Chicago2-IL.us.xo.net (206.111.2.73) 1.11 ms (ttl=251!) . p5-0-0.RAR1.Chicago-IL.us.xo.net (65.106.6.133) 1.40 ms . p4-0-0.MAR1.Chicago-IL.us.xo.net (65.106.6.142) 2.03 ms . p0-0.CHR1.Chicago-IL.us.xo.net (207.88.84.10) 1.80 ms (ttl=248!) . * . dellfweqch.ultradns.net (204.74.102.2) 1.48 ms (ttl=60!) !H === : Are you absolutely, positively sure this cluster was responding to 0 : queries, Yes. My mail server was more or less dead (it's a .org) for an hour, and I was trying frantically to get DNS to resolve with all kinds of dig requests directly to the IPs and traceroute tests until I gave up after an hour. : but still propagating those
Re: .ORG problems this evening
On Thu, Sep 18, 2003 at 12:50:28AM -0400, Todd Vierling wrote:
> tld[12].ultradns.net, the NS for .ORG, was completely unreachable
> for about an hour or two this evening, timing out on all DNS
> queries. Anyone else see similar? (The hosts are unpingable and
> untracerouteable, so I had to use DNS queries to determine when
> they were back up.)
>
> It makes me wonder how UltraDNS got a contract to manage the domain
> on all of two nameservers hosted on the same subnet, given that
> they were supposed to have deployed geographically diverse (or
> something like that) servers. But then, we know ICANN smokes the
> crack liberally at times

dare i say duh, but ...

ultradns uses the power of anycast to have these ips that appear to
be on close subnets in geographically diverse locations.

go to europe, traceroute to them, it goes to a place in europe.
go to asia, traceroute to them, it goes to a machine in asia.
in the us, it goes to one of a few geographical locations ...

could you provide some more technical details, other than your
postulations that they have two machines on network-wise close
subnets and that is the problem?

- jared

> sigh
>
> --
> -- Todd Vierling [EMAIL PROTECTED] [EMAIL PROTECTED]

--
Jared Mauch | pgp key available via finger from [EMAIL PROTECTED]
clue++; | http://puck.nether.net/~jared/ My statements are only mine.
Re: .ORG problems this evening
TV> Date: Thu, 18 Sep 2003 00:50:28 -0400 (EDT)
TV> From: Todd Vierling

TV> tld[12].ultradns.net, the NS for .ORG, was completely
TV> unreachable for about an hour or two this evening, timing out
TV> on all DNS queries.  Anyone else see similar?  (The hosts are

I don't recall having troubles this evening. Perhaps there was a DoS
or something pounding the anycast node you were hitting? With
multiple sinkholes, it's no longer all or nothing.

Anycast is good stuff, IMHO, but not impervious to flooding.

Eddy
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Todd Vierling wrote:
> It makes me wonder how UltraDNS got a contract to manage the domain
> on all of two nameservers hosted on the same subnet, given that
> they were supposed to have deployed geographically diverse (or
> something like that) servers. But then, we know ICANN smokes the
> crack liberally at times

Just because the hosts are on the same subnet and are apparently
behind the same end device for you doesn't make them
non-geographically diverse if they are really anycast pods, does it?
It really just means one anycast pod was down for a time :(

It is one of the things that anycast makes difficult though :(
Troubleshooting anycast from the outside is a bear.
Re: .ORG problems this evening
On Thu, 18 Sep 2003, Christopher L. Morrow wrote:
> On Thu, 18 Sep 2003, Todd Vierling wrote:
> > It makes me wonder how UltraDNS got a contract to manage the
> > domain on all of two nameservers hosted on the same subnet, given
> > that they were supposed to have deployed geographically diverse
> > (or something like that) servers. But then, we know ICANN smokes
> > the crack liberally at times
>
> Just because the hosts are on the same subnet and are apparently
> behind the same end device for you doesn't make them
> non-geographically diverse if they are really anycast pods, does
> it? It really just means one anycast pod was down for a time :(
>
> It is one of the things that anycast makes difficult though :(
> Troubleshooting anycast from the outside is a bear.

Oh, and 'same subnet' doesn't mean 'same ethernet'. All auth dns
servers in 198.6.1.0/24 aren't on one ethernet, though it'd sure
make MY life easier if they were :)
Re: .ORG problems this evening
CLM Date: Thu, 18 Sep 2003 05:28:05 + (GMT)
CLM From: Christopher L. Morrow

CLM Just because the hosts are on the same subnet and are
CLM apparently behind the same end device for you doesn't make
CLM them non-geographically diverse if they are really anycast
CLM pods, does it? It really just means one anycast pod was down
CLM for a time :(

Ideally, though, an anycast node should yank the route if the service in question dies. I say ideally because we still haven't had DNS properly make friends with BGP... and such flaps really shouldn't be seen, which means having a contiguous internal network and [properly] decoupling IGP from EGP...

...and suddenly I'm making many assumptions. ;-)

CLM It is one of the things that anycast makes difficult though
CLM :( Troubleshooting anycast from the outside is a bear.

It's a lot like multihoming, only with different geography. Unicast IP addresses are analogous to world-facing router interfaces.

Tip for anyone considering playing with anycast, particularly on the same ethernet segment: Bind the anycast IP addresses to your loopback interface.

Eddy
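The idea that an anycast node should yank its route when the service dies can be sketched roughly as below. This is only an illustration under stated assumptions, not anything UltraDNS or any routing daemon is known to run: `RouteAnnouncer` is a hypothetical stand-in for whatever mechanism actually talks to the routing daemon, the check callables are placeholders for real service probes, and 192.0.2.1/32 is a documentation address rather than a real anycast prefix.

```python
from typing import Callable, List, Sequence, Tuple

Check = Callable[[], bool]


def service_healthy(checks: Sequence[Check]) -> bool:
    """Return True only if every check passes; a check that raises counts as failed."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            return False
    return True


class RouteAnnouncer:
    """Hypothetical stand-in for the hook that tells the routing daemon what to do.

    It only records the requested actions; a real deployment would replace
    announce() and withdraw() with calls into its own routing software.
    """

    def __init__(self) -> None:
        self.announced = False
        self.log: List[Tuple[str, str]] = []

    def announce(self, prefix: str) -> None:
        self.announced = True
        self.log.append(("announce", prefix))

    def withdraw(self, prefix: str) -> None:
        self.announced = False
        self.log.append(("withdraw", prefix))


def reconcile(prefix: str, checks: Sequence[Check], announcer: RouteAnnouncer) -> bool:
    """Run one pass: announce the prefix while healthy, withdraw it otherwise."""
    healthy = service_healthy(checks)
    if healthy and not announcer.announced:
        announcer.announce(prefix)
    elif not healthy and announcer.announced:
        announcer.withdraw(prefix)
    return healthy


if __name__ == "__main__":
    announcer = RouteAnnouncer()
    reconcile("192.0.2.1/32", [lambda: True], announcer)   # service up: announce
    reconcile("192.0.2.1/32", [lambda: False], announcer)  # service down: withdraw
    print(announcer.log)
```

A real daemon would call `reconcile` on a timer. Even then, a withdrawal still has to propagate through BGP, so this narrows the window of a dead node attracting queries rather than closing it.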
Re: .ORG problems this evening
Todd Vierling wrote:

tld[12].ultradns.net, the NS for .ORG, was completely unreachable for about an hour or two this evening, timing out on all DNS queries. Anyone else see similar? (The hosts are unpingable and untracerouteable, so I had to use DNS queries to determine when they were back up.)

At any given moment, UltraDNS (and, I am sure, other root and TLD servers) is under attack somewhere from someone.

Additionally, the monitors that test each of the anycast nodes reported no outages. Neither did the useful monitors that Rob Thomas runs (http://www.cymru.com/DNS/gtlddns-o.html). Nor did the many helpful customers who use UltraDNS, and who run constant tests to each individual anycast node in search of an SLA event that may provide a service credit. ;-)

Perhaps you had a network problem internally?

It makes me wonder how UltraDNS got a contract to manage the domain on all of two nameservers hosted on the same subnet, given that they were supposed to have deployed geographically diverse (or something like that) servers.

Fortunately ICANN and the other decision makers were actually network clueful, and could tell that 204.74.112.1 and 204.74.113.1 are actually on different subnets ;-)

As an aside, using ping or traceroute at *any* time to see if DNS servers are working is not a great idea.

--
Rodney Joffe
Speaking on behalf of no-one other than himself.
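Probing a nameserver with an actual DNS query, rather than ping or traceroute, might look something like the sketch below. It is a minimal illustration, not anyone's real monitoring code: the function names are hypothetical, it sends a single non-recursive NS query for "org" over UDP, and it treats any well-formed response with a matching query ID as "alive" without judging whether the answer is correct.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 2, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (RFC 1035 layout). qtype 2 is NS, class 1 is IN."""
    # Header: ID, flags (standard query, recursion not requested), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, and the name ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_valid_reply(query_id: int, reply: bytes) -> bool:
    """True if the reply has a full header, the same ID, and the QR (response) bit set."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)


def dns_alive(server: str, name: str = "org", timeout: float = 3.0, port: int = 53) -> bool:
    """Send one UDP query to the server and report whether a matching response came back."""
    query_id = random.getrandbits(16)
    packet = build_query(name, query_id=query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    return is_valid_reply(query_id, reply)


if __name__ == "__main__":
    # The two addresses mentioned in the message above.
    for addr in ("204.74.112.1", "204.74.113.1"):
        print(addr, "answering" if dns_alive(addr) else "no answer")
```

Because the addresses are anycast, a probe like this only tests whichever node the prober's own routing happens to reach, which is part of why troubleshooting anycast from the outside is hard.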