Re: Help with unresolvable domain (subdomain, actually)
In message , John Wobus writes:
> >> Then the load balancer should return default records or 0.0.0.0/:: to
> >> indicate the name is good but doesn't currently have an address.
> >
> > I like that solution, actually. Even if the client doesn't recognize it
> > as a "special" address, hopefully if it tries to connect to it, the
> > packet won't make it past the first router or switch hop...
> >
> > Has anyone proposed this to the load-balancer vendors?
>
> Isn't this just a specific instance of configuring a load balancer's
> fallback address? E.g., when servers A and B are both down, give the
> address of server C. Some load balancers allow configuration of a server D
> to be used only if C is down as well. Address C or D could be configured
> to be 0.0.0.0 and configured with no test for "up-ness".
>
> (Not that I'm completely happy with 0.0.0.0 or any other address that
> local folks could conceivably have figured out some crazy use for.)

0.0.0.0 means "I don't know my address". If you see packets on the wire
with 0.0.0.0 as the source address, which you do at boot time, the machine
that sent them doesn't know its IP address yet.

> John

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
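For a client or monitoring script that wants to recognize the "name is good but has no address right now" answer proposed here, a minimal Python sketch follows. It relies only on the standard-library `ipaddress` module; the function names `is_placeholder` and `usable_addresses` are made up for this illustration and are not part of any resolver or load-balancer API.

```python
import ipaddress
from typing import Iterable, List


def is_placeholder(addr: str) -> bool:
    """True for the unspecified address (0.0.0.0 or ::)."""
    return ipaddress.ip_address(addr).is_unspecified


def usable_addresses(answers: Iterable[str]) -> List[str]:
    """Drop placeholder answers so the caller never tries to connect to them."""
    return [a for a in answers if not is_placeholder(a)]
```

An empty result from `usable_addresses` would then mean "the name resolved, but nothing is currently serving it", which a client could report as a service outage rather than as a DNS failure.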
Re: Help with unresolvable domain (subdomain, actually)
>> Then the load balancer should return default records or 0.0.0.0/:: to
>> indicate the name is good but doesn't currently have an address.
>
> I like that solution, actually. Even if the client doesn't recognize it
> as a "special" address, hopefully if it tries to connect to it, the
> packet won't make it past the first router or switch hop...
>
> Has anyone proposed this to the load-balancer vendors?

Isn't this just a specific instance of configuring a load balancer's fallback address? E.g., when servers A and B are both down, give the address of server C. Some load balancers allow configuration of a server D to be used only if C is down as well. Address C or D could be configured to be 0.0.0.0 and configured with no test for "up-ness".

(Not that I'm completely happy with 0.0.0.0 or any other address that local folks could conceivably have figured out some crazy use for.)

John
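The fallback chain John describes (A and B first, then C, then D, then an untested last resort such as 0.0.0.0) could be sketched roughly as below. This is only an illustration in Python, not how any particular load balancer is configured; `pick_answer`, the `is_up` callback and the 192.0.2.x addresses are placeholders invented for the example.

```python
from typing import Callable, List, Optional, Sequence

# Last-resort answer suggested in the thread; it is handed out without a health test.
UNSPECIFIED_V4 = "0.0.0.0"


def pick_answer(
    tiers: Sequence[Sequence[str]],
    is_up: Callable[[str], bool],
    last_resort: Optional[str] = UNSPECIFIED_V4,
) -> List[str]:
    """Return the healthy addresses of the first tier that has any.

    Tiers are tried in order (e.g. [A, B], then [C], then [D]). If every
    tier is down, the last-resort address is returned untested.
    """
    for tier in tiers:
        healthy = [addr for addr in tier if is_up(addr)]
        if healthy:
            return healthy
    return [last_resort] if last_resort is not None else []


# Example: A/B, then C, then D (documentation-range addresses).
tiers = [["192.0.2.1", "192.0.2.2"], ["192.0.2.3"], ["192.0.2.4"]]
print(pick_answer(tiers, lambda addr: addr == "192.0.2.3"))
```

With `last_resort=None` the function returns an empty list instead, which corresponds to the NODATA-style answer that the thread argues against because of negative caching.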
Re: Help with unresolvable domain (subdomain, actually)
On 3/2/2011 1:57 PM, David Sparro wrote:
> On 3/2/2011 1:20 PM, Kevin Darcy wrote:
>> I'm not saying I agree with this perspective, only that I've dealt with load-balancer vendors enough (Cisco in particular) to understand that this is where they're coming from.
>>
>> Besides, what alternative is there? If the load-balancer returns an address that it knows to not be working, then it's purposely causing the client to go into a relatively-slow connection-timeout failure mode. Is that responsible behavior?
>
> Short answer: yes.
>
> The DNS side of the load-balancer doesn't know why it got the query. Maybe I was trying to ping the endpoint, I could have been trying to make an FTP connection, or HTTPS, etc. In order for it to be consistent, it would have to be able to figure out that a SERVFAIL should be returned for the query from my gopher:// connection, but an IP should be returned for http://.

That's an implementation decision. If an implementor decides to run a bunch of disparate services under a single FQDN (as opposed to, say, www.example.com/ftp.example.com/gopher.example.com and so forth), then they'd need to come up with a reasonable way with their load-balancer keepalives to decide when the whole thing is "down" or not. If the vast majority of their traffic is web-based (typical), they may choose to call the whole thing "down" if the web part is down, and the other parts (FTP, gopher, whatever) will just have to suffer. That's the price to be paid for the convenience of having a single name for a bunch of different services -- lack of granularity.

Things would be better, of course, if clients used SRV records for accessing resources -- then a single "service" name could be differentiated by protocol. But for whatever reason client software authors have not, by and large, embraced this idea.

>> If it gives a "normal" response that is lacking answer information (NODATA, NXDOMAIN), then this response gets negatively cached, and the negative cache entry may delay clients from re-trying the resource even after it recovers. So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a strictly functional perspective, and are even more misleading and confusing with respect to the real source of the problem.
>
> SERVFAIL caching is coming to a BIND server release this year. (I listened to the BIND 9.8 features webinar this morning. I don't remember which version (9.9 or 9.10) had this attached to it on the What's Next slide.)

I think Mark has the right approach: return a "special" address (e.g. 0.0.0.0 or the IPv6 equivalent) in this situation, instead of messing with the RCODE.

- Kevin
Re: Help with unresolvable domain (subdomain, actually)
On Mar 2, 2011, at 1:21 PM, Mike Bernhardt wrote:
> What's really strange is that when we attempt a query, be it DIG or an attempt to browse tools.cisco.com, they send some sort of query back to us from/to UDP 53

Many GSLB solutions attempt to figure out what the best location to serve from is by sending a query to the server that just queried *them* -- this allows them to figure out latency and decide which cluster might be closest. I'm suspecting (although I avoid Cisco LB like the plague and so am not sure) that this is the cause.

The other possibility -- I ran tcpdump to see if I could see what the query might be, and I found that I was getting a FormErr response to my initial query, causing me to requery without DNSSEC / EDNS0 -- maybe you are actually not seeing a query from them; maybe it's a FormErr response that your FW is noting?

W

wkumari@vimes:~/src/perl/IODEF$ dig +edns=0 tools.cisco.com @128.107.227.197

; <<>> DiG 9.7.2-P3 <<>> +edns=0 tools.cisco.com @128.107.227.197
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 41568
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;tools.cisco.com. IN A

;; Query time: 75 msec
;; SERVER: 128.107.227.197#53(128.107.227.197)
;; WHEN: Wed Mar 2 14:17:38 2011
;; MSG SIZE rcvd: 33

wkumari@vimes:~/src/perl/IODEF$ dig tools.cisco.com @128.107.227.197

; <<>> DiG 9.7.2-P3 <<>> tools.cisco.com @128.107.227.197
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54960
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;tools.cisco.com. IN A

;; ANSWER SECTION:
tools.cisco.com. 20 IN A 173.37.145.8

;; Query time: 75 msec
;; SERVER: 128.107.227.197#53(128.107.227.197)
;; WHEN: Wed Mar 2 14:17:45 2011
;; MSG SIZE rcvd: 49

.
We drop it at the firewall due to some sort of "sanity check" so I can't see the contents. This is in addition to the SERVFAIL message. Although I get SERVFAIL, Kloth.net does not, even if we DIG the same server: cax01-bb14-dcz01n-gss1.cisco.com From Kloth ; <<>> DiG 9.3.2 <<>> @cax01-bb14-dcz01n-gss1.cisco.com tools.cisco.com A ; (1 server found) ;; global options: printcmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41388 ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;tools.cisco.com. IN A ;; ANSWER SECTION: tools.cisco.com.20 IN A 72.163.4.38 ;; Query time: 131 msec ;; SERVER: 173.37.144.100#53(173.37.144.100) ;; WHEN: Wed Mar 2 19:15:04 2011 ;; MSG SIZE rcvd: 49 From Us [root@ns1 ~]# dig -b 148.165.3.10 @cax01-bb14-dcz01n-gss1.cisco.com tools.cisco.com ; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @cax01-bb14-dcz01n- gss1.cisco.com tools.cisco.com ; (1 server found) ;; global options: printcmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26463 ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;tools.cisco.com. IN A ;; Query time: 45 msec ;; SERVER: 173.37.144.100#53(173.37.144.100) ;; WHEN: Wed Mar 2 10:15:31 2011 ;; MSG SIZE rcvd: 33 So I wonder if the query they make is some kind of authentication attempt? -Original Message- From: Mark Andrews [mailto:ma...@isc.org] Sent: Tuesday, March 01, 2011 3:31 PM To: Kevin Darcy Cc: bind-us...@isc.org Subject: Re: Help with unresolvable domain (subdomain, actually) In message <4d6d7268.1080...@chrysler.com>, Kevin Darcy writes: I got a trouble ticket on this too. From the looks of things, Cisco is using GSSes to load-balance this site. GSSes return SERVFAIL if all of the resources behind the load-balancer are down (which it determines via a heartbeat mechanism). So I think this is a "simple" case of a website (or cluster) going down. 
It was down earlier today, then up again, as of this writing, it is down again. DNS doesn't really have a response code of "requested resource not available", so SERVFAIL is Cisco's closest approximation. It has the drawback, however, of often making other sorts of problems appear to be DNS problems. That's just a cross that we DNS admins have to bear... - Kevin Then the load balancer should return default records or 0.0.0.0/:: to indicate the name is good but doesn't currently have a address. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Austra
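Warren's observation (FORMERR to a query carrying EDNS0, NOERROR to the same query without it) can be illustrated on the wire format. The Python sketch below builds an A query with and without an OPT pseudo-record and decides whether a response calls for a retry without EDNS. It sends nothing on the network, and the function names and the 4096-byte advertised payload size are assumptions chosen for the example. The plain query for tools.cisco.com comes out at 33 bytes (12-byte header plus 21-byte question), the same size as the "MSG SIZE rcvd: 33" responses in the thread, which is what one would expect from replies that carry only a header and the echoed question.

```python
import struct

RCODE_FORMERR = 1
TYPE_A, CLASS_IN, TYPE_OPT = 1, 1, 41


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id: int, edns: bool) -> bytes:
    """Build a recursive-desired A query, optionally with an EDNS0 OPT record."""
    flags = 0x0100  # RD bit set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 1 if edns else 0)
    question = encode_name(name) + struct.pack("!HH", TYPE_A, CLASS_IN)
    if not edns:
        return header + question
    # OPT pseudo-RR: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended RCODE/version/flags (all zero here), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, 4096, 0, 0)
    return header + question + opt


def should_retry_without_edns(response: bytes) -> bool:
    """True if the response header carries RCODE FORMERR."""
    (flags,) = struct.unpack("!H", response[2:4])
    return (flags & 0x000F) == RCODE_FORMERR


print(len(build_query("tools.cisco.com", 1, edns=False)))  # 33
print(len(build_query("tools.cisco.com", 1, edns=True)))   # 44
```

A firewall that only expects replies matching its idea of the original query may treat the FORMERR reply or the follow-up exchange as suspicious, which is one possible reading of the "sanity check" drop mentioned above.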
RE: Help with unresolvable domain (subdomain, actually)
> A few options:
> 1: once the LB knows that all back-ends are down, it can continue to answer
> with the correct A, but drop the TTL to be much shorter -- this allows
> things to recover faster.

This would work well because the actual web site wasn't down, at least not yesterday. If I substituted the IP address for the domain name, it was reachable, and links kept the domain portion of the URL in dotted-decimal format. It seems only DNS is hosed.
Re: Help with unresolvable domain (subdomain, actually)
On 3/1/2011 6:30 PM, Mark Andrews wrote: In message<4d6d7268.1080...@chrysler.com>, Kevin Darcy writes: I got a trouble ticket on this too. From the looks of things, Cisco is using GSSes to load-balance this site. GSSes return SERVFAIL if all of the resources behind the load-balancer are down (which it determines via a heartbeat mechanism). So I think this is a "simple" case of a website (or cluster) going down. It was down earlier today, then up again, as of this writing, it is down again. DNS doesn't really have a response code of "requested resource not available", so SERVFAIL is Cisco's closest approximation. It has the drawback, however, of often making other sorts of problems appear to be DNS problems. That's just a cross that we DNS admins have to bear... - Kevin Then the load balancer should return default records or 0.0.0.0/:: to indicate the name is good but doesn't currently have a address. I like that solution, actually. Even if the client doesn't recognize it as a "special" address, hopefully if it tries to connect to it, the packet won't make it past the first router or switch hop... Has anyone proposed this to the load-balancer vendors? - Kevin ___ bind-users mailing list bind-users@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-users
Re: Help with unresolvable domain (subdomain, actually)
On Mar 2, 2011, at 1:20 PM, Kevin Darcy wrote:
> On 3/2/2011 10:34 AM, David Sparro wrote:
>> On 3/1/2011 5:27 PM, Kevin Darcy wrote:
>>> See my other post. This is designed-in behavior for Cisco GSSes, since there is no "service unavailable, try again later" RCODE.
>>
>> When the question is "what is the ip address of 'foo'" an answer of "the web server is down" is nonsensical.
>
> Hmmm... matter of perspective I suppose. Load-balancer architecture sees DNS as just the externally-visible portion of a whole subsystem. The SERVFAIL, in their view, does not communicate a DNS problem _per_se_, but a problem with the whole subsystem. It's more of a "what you're trying to get to is unavailable right now" message, communicated, in their view, _through_ DNS (as a sort of conduit), not necessarily _about_ DNS. They don't see it as specifically meaning "I've got a DNS problem".

But, everyone else *will*.

> I'm not saying I agree with this perspective, only that I've dealt with load-balancer vendors enough (Cisco in particular) to understand that this is where they're coming from.
>
> Besides, what alternative is there? If the load-balancer returns an address that it knows to not be working, then it's purposely causing the client to go into a relatively-slow connection-timeout failure mode. Is that responsible behavior?
>
> If it gives a "normal" response that is lacking answer information (NODATA, NXDOMAIN), then this response gets negatively cached, and the negative cache entry may delay clients from re-trying the resource even after it recovers. So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a strictly functional perspective, and are even more misleading and confusing with respect to the real source of the problem.

A few options:

1: once the LB knows that all back-ends are down, it can continue to answer with the correct A, but drop the TTL to be much shorter -- this allows things to recover faster.

2: have the LB itself serve a 'sorry' page -- the ability to serve static content locally should be simple, but if it is not able to do so it can always return a set of 'sorry' servers optimized for this purpose.

You shouldn't be breaking both your serving *and* 'sorry' backends often enough for there to be special handling needed (and, if you are, you shouldn't make things worse by making other folk waste their time debugging your problem).

W

> - Kevin

--
I had no shoes and wept. Then I met a man who had no feet. So I said, "Hey man, got any shoes you're not using?"
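Option 1 above (keep answering with the real A records but shorten the TTL while everything is down) might look roughly like this in Python. It is a sketch under stated assumptions: `build_answer`, `Answer` and the 5-second degraded TTL are invented for the illustration, and the 20-second normal TTL is simply the value seen on tools.cisco.com in the dig output in this thread.

```python
from dataclasses import dataclass
from typing import List

NORMAL_TTL = 20    # seconds; TTL observed on tools.cisco.com in the thread
DEGRADED_TTL = 5   # hypothetical shorter TTL used while every back-end is down


@dataclass
class Answer:
    addresses: List[str]
    ttl: int


def build_answer(configured: List[str], healthy: List[str]) -> Answer:
    """Answer with healthy back-ends if any exist, else all of them with a short TTL."""
    if healthy:
        return Answer(list(healthy), NORMAL_TTL)
    # Everything is down: still return the configured addresses so the name
    # resolves, but keep caches from holding the answer for long.
    return Answer(list(configured), DEGRADED_TTL)
```

The RCODE stays NOERROR in both branches, so resolvers never see a DNS-level failure; clients connecting during the outage hit a connection error instead, which points at the service rather than at DNS.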
Re: Help with unresolvable domain (subdomain, actually)
On 3/2/2011 1:20 PM, Kevin Darcy wrote:
> I'm not saying I agree with this perspective, only that I've dealt with load-balancer vendors enough (Cisco in particular) to understand that this is where they're coming from.
>
> Besides, what alternative is there? If the load-balancer returns an address that it knows to not be working, then it's purposely causing the client to go into a relatively-slow connection-timeout failure mode. Is that responsible behavior?

Short answer: yes.

The DNS side of the load-balancer doesn't know why it got the query. Maybe I was trying to ping the endpoint, I could have been trying to make an FTP connection, or HTTPS, etc. In order for it to be consistent, it would have to be able to figure out that a SERVFAIL should be returned for the query from my gopher:// connection, but an IP should be returned for http://.

> If it gives a "normal" response that is lacking answer information (NODATA, NXDOMAIN), then this response gets negatively cached, and the negative cache entry may delay clients from re-trying the resource even after it recovers. So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a strictly functional perspective, and are even more misleading and confusing with respect to the real source of the problem.

SERVFAIL caching is coming to a BIND server release this year. (I listened to the BIND 9.8 features webinar this morning. I don't remember which version (9.9 or 9.10) had this attached to it on the What's Next slide.)

--
Dave
RE: Help with unresolvable domain (subdomain, actually)
What's really strange is that when we attempt a query, be it DIG or an attempt to browse tools.cisco.com, they send some sort of query back to us from/to UDP 53. We drop it at the firewall due to some sort of "sanity check" so I can't see the contents. This is in addition to the SERVFAIL message.

Although I get SERVFAIL, Kloth.net does not, even if we DIG the same server: cax01-bb14-dcz01n-gss1.cisco.com

From Kloth

; <<>> DiG 9.3.2 <<>> @cax01-bb14-dcz01n-gss1.cisco.com tools.cisco.com A
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41388
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;tools.cisco.com. IN A

;; ANSWER SECTION:
tools.cisco.com. 20 IN A 72.163.4.38

;; Query time: 131 msec
;; SERVER: 173.37.144.100#53(173.37.144.100)
;; WHEN: Wed Mar 2 19:15:04 2011
;; MSG SIZE rcvd: 49

From Us

[root@ns1 ~]# dig -b 148.165.3.10 @cax01-bb14-dcz01n-gss1.cisco.com tools.cisco.com

; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @cax01-bb14-dcz01n-gss1.cisco.com tools.cisco.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26463
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;tools.cisco.com. IN A

;; Query time: 45 msec
;; SERVER: 173.37.144.100#53(173.37.144.100)
;; WHEN: Wed Mar 2 10:15:31 2011
;; MSG SIZE rcvd: 33

So I wonder if the query they make is some kind of authentication attempt?

-----Original Message-----
From: Mark Andrews [mailto:ma...@isc.org]
Sent: Tuesday, March 01, 2011 3:31 PM
To: Kevin Darcy
Cc: bind-us...@isc.org
Subject: Re: Help with unresolvable domain (subdomain, actually)

In message <4d6d7268.1080...@chrysler.com>, Kevin Darcy writes:
> I got a trouble ticket on this too.
>
> From the looks of things, Cisco is using GSSes to load-balance this
> site. GSSes return SERVFAIL if all of the resources behind the
> load-balancer are down (which it determines via a heartbeat mechanism).
> So I think this is a "simple" case of a website (or cluster) going down.
> It was down earlier today, then up again, as of this writing, it is down
> again.
>
> DNS doesn't really have a response code of "requested resource not
> available", so SERVFAIL is Cisco's closest approximation. It has the
> drawback, however, of often making other sorts of problems appear to be
> DNS problems. That's just a cross that we DNS admins have to bear...
>
> - Kevin

Then the load balancer should return default records or 0.0.0.0/:: to indicate the name is good but doesn't currently have a address.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
Re: Help with unresolvable domain (subdomain, actually)
On 3/2/2011 10:34 AM, David Sparro wrote:
> On 3/1/2011 5:27 PM, Kevin Darcy wrote:
>> See my other post. This is designed-in behavior for Cisco GSSes, since there is no "service unavailable, try again later" RCODE.
>
> When the question is "what is the ip address of 'foo'" an answer of "the web server is down" is nonsensical.

Hmmm... matter of perspective I suppose. Load-balancer architecture sees DNS as just the externally-visible portion of a whole subsystem. The SERVFAIL, in their view, does not communicate a DNS problem _per_se_, but a problem with the whole subsystem. It's more of a "what you're trying to get to is unavailable right now" message, communicated, in their view, _through_ DNS (as a sort of conduit), not necessarily _about_ DNS. They don't see it as specifically meaning "I've got a DNS problem".

I'm not saying I agree with this perspective, only that I've dealt with load-balancer vendors enough (Cisco in particular) to understand that this is where they're coming from.

Besides, what alternative is there? If the load-balancer returns an address that it knows to not be working, then it's purposely causing the client to go into a relatively-slow connection-timeout failure mode. Is that responsible behavior?

If it gives a "normal" response that is lacking answer information (NODATA, NXDOMAIN), then this response gets negatively cached, and the negative cache entry may delay clients from re-trying the resource even after it recovers. So, what's left? NOTIMP? FORMERR? REFUSED? NOTAUTH? Those aren't any better than SERVFAIL from a strictly functional perspective, and are even more misleading and confusing with respect to the real source of the problem.

- Kevin
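For reference, the RCODEs Kevin lists all live in the low four bits of the second 16-bit word of the DNS header. A small Python sketch for pulling the code out of a raw response follows; `rcode_name` is a made-up helper name, and the table covers only the codes mentioned in this thread (it ignores the extended RCODE bits that EDNS adds).

```python
import struct

# Header RCODE values (4-bit field) for the codes discussed in the thread.
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
    9: "NOTAUTH",
}


def rcode_name(message: bytes) -> str:
    """Return the symbolic RCODE from the 12-byte header of a DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _query_id, flags = struct.unpack("!HH", message[:4])
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")
```

NODATA does not appear in the table because it is not an RCODE: it is a NOERROR response with an empty answer section, which is why it is negatively cached just like NXDOMAIN.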
Re: Help with unresolvable domain (subdomain, actually)
On Mar 1, 2011, at 5:27 PM, Kevin Darcy wrote:
> See my other post. This is designed-in behavior for Cisco GSSes, since there is no "service unavailable, try again later" RCODE.

Yes[0].

W

[0]: there is no "service unavailable, try again later" RCODE.

> - Kevin
>
> On 3/1/2011 4:25 PM, Mark Andrews wrote:
>> Ring Cisco and complain that their nameservers are broken for the zone.
>>
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13389
>> ;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>
>> ;; QUESTION SECTION:
>> ;tools.cisco.com. IN A
>>
>> ;; Query time: 204 msec
>> ;; SERVER: 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com)
>> ;; WHEN: Wed Mar 2 08:23:59 2011
>> ;; MSG SIZE rcvd: 33

--
There are only 10 types of people in this world -- those who understand binary arithmetic and those who don't.
Re: Help with unresolvable domain (subdomain, actually)
On 3/1/2011 5:27 PM, Kevin Darcy wrote:
> See my other post. This is designed-in behavior for Cisco GSSes, since there is no "service unavailable, try again later" RCODE.
>
> - Kevin

When the question is "what is the ip address of 'foo'" an answer of "the web server is down" is nonsensical.

--
Dave
Re: Help with unresolvable domain (subdomain, actually)
In message <4d6d7268.1080...@chrysler.com>, Kevin Darcy writes:
> I got a trouble ticket on this too.
>
> From the looks of things, Cisco is using GSSes to load-balance this
> site. GSSes return SERVFAIL if all of the resources behind the
> load-balancer are down (which it determines via a heartbeat mechanism).
> So I think this is a "simple" case of a website (or cluster) going down.
> It was down earlier today, then up again, as of this writing, it is down
> again.
>
> DNS doesn't really have a response code of "requested resource not
> available", so SERVFAIL is Cisco's closest approximation. It has the
> drawback, however, of often making other sorts of problems appear to be
> DNS problems. That's just a cross that we DNS admins have to bear...
>
> - Kevin

Then the load balancer should return default records or 0.0.0.0/:: to indicate the name is good but doesn't currently have an address.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
Re: Help with unresolvable domain (subdomain, actually)
See my other post. This is designed-in behavior for Cisco GSSes, since there is no "service unavailable, try again later" RCODE.

- Kevin

On 3/1/2011 4:25 PM, Mark Andrews wrote:
> Ring Cisco and complain that their nameservers are broken for the zone.
>
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13389
> ;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;tools.cisco.com. IN A
>
> ;; Query time: 204 msec
> ;; SERVER: 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com)
> ;; WHEN: Wed Mar 2 08:23:59 2011
> ;; MSG SIZE rcvd: 33
Re: Help with unresolvable domain (subdomain, actually)
I got a trouble ticket on this too. From the looks of things, Cisco is using GSSes to load-balance this site. GSSes return SERVFAIL if all of the resources behind the load-balancer are down (which it determines via a heartbeat mechanism). So I think this is a "simple" case of a website (or cluster) going down. It was down earlier today, then up again, as of this writing, it is down again. DNS doesn't really have a response code of "requested resource not available", so SERVFAIL is Cisco's closest approximation. It has the drawback, however, of often making other sorts of problems appear to be DNS problems. That's just a cross that we DNS admins have to bear... - Kevin On 3/1/2011 4:08 PM, Mike Bernhardt wrote: I should add that tools.cisco.com was resolvable at one time, so either Cisco's behavior has changed, or our firewall's behavior has changed. We obviously haven't upgraded our BIND version in a while (9.4.3P3), so I don't think the problem is BIND. -Original Message- From: Mike Bernhardt [mailto:bernha...@bart.gov] Sent: Tuesday, March 01, 2011 12:40 PM To: bind-users@lists.isc.org Subject: Help with unresolvable domain (subdomain, actually) For some reason, we can no longer resolve tools.cisco.com. there are several clues to the problem but I can't put them together. Here is some dig output. I know that the time stamps don't all match up below, but the results are typical: [root@ns1 ~]# dig +trace -b 148.165.3.10 tools.cisco.com ;<<>> DiG 9.4.3-P3<<>> +trace -b 148.165.3.10 tools.cisco.com ;; global options: printcmd . 90550 IN NS i.root-servers.net. . 90550 IN NS h.root-servers.net. . 90550 IN NS e.root-servers.net. . 90550 IN NS d.root-servers.net. . 90550 IN NS j.root-servers.net. . 90550 IN NS k.root-servers.net. . 90550 IN NS l.root-servers.net. . 90550 IN NS g.root-servers.net. . 90550 IN NS f.root-servers.net. . 90550 IN NS a.root-servers.net. . 90550 IN NS m.root-servers.net. . 90550 IN NS c.root-servers.net. . 90550 IN NS b.root-servers.net. 
;; Received 512 bytes from 148.165.3.10#53(148.165.3.10) in 0 ms com.172800 IN NS l.gtld-servers.net. com.172800 IN NS e.gtld-servers.net. com.172800 IN NS k.gtld-servers.net. com.172800 IN NS i.gtld-servers.net. com.172800 IN NS m.gtld-servers.net. com.172800 IN NS j.gtld-servers.net. com.172800 IN NS a.gtld-servers.net. com.172800 IN NS g.gtld-servers.net. com.172800 IN NS c.gtld-servers.net. com.172800 IN NS f.gtld-servers.net. com.172800 IN NS b.gtld-servers.net. com.172800 IN NS d.gtld-servers.net. com.172800 IN NS h.gtld-servers.net. ;; Received 505 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms cisco.com. 172800 IN NS ns1.cisco.com. cisco.com. 172800 IN NS ns2.cisco.com. ;; Received 101 bytes from 192.54.112.30#53(h.gtld-servers.net) in 154 ms tools.cisco.com.86400 IN NS rcdn9-14p-dcz05n-gss1.cisco.com. tools.cisco.com.86400 IN NS rtp5-dmz-gss1.cisco.com. tools.cisco.com.86400 IN NS sjck-dmz-gss1.cisco.com. tools.cisco.com.86400 IN NS cax01-bb14-dcz01n-gss1.cisco.com. ;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 75 ms ;; Received 33 bytes from 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com) in 47 ms Now, focusing in on rtp5-dmz-gss1.cisco.com for further analysis (just picked it out of the group): [root@ns1 ~]# dig -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com ;<<>> DiG 9.4.3-P3<<>> -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com ; (1 server found) ;; global options: printcmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5165 ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;tools.cisco.com.
Re: Help with unresolvable domain (subdomain, actually)
.com. cisco.com. 172800 IN NS ns2.cisco.com. ;; Received 101 bytes from 192.35.51.30#53(f.gtld-servers.net) in 104 ms tools.cisco.com.86400 IN NS rcdn9-14p-dcz05n-gss1.cisco.com. tools.cisco.com.86400 IN NS rtp5-dmz-gss1.cisco.com. tools.cisco.com.86400 IN NS cax01-bb14-dcz01n-gss1.cisco.com. tools.cisco.com.86400 IN NS sjck-dmz-gss1.cisco.com. ;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 27 ms tools.cisco.com.20 IN A 128.107.242.16 ;; Received 49 bytes from 64.102.246.5#53(rtp5-dmz-gss1.cisco.com) in 32 ms You might be able to reolve it now too. -- Shaoquan Lin, Computer Systems Manager School of Engineering, City College of New York Phone: (212) 650 6762 Fax: (212) 650 5768 E-mail: l...@ccny.cuny.edu - Original Message - From: "Mike Bernhardt" To: Sent: Tuesday, March 01, 2011 3:39 PM Subject: Help with unresolvable domain (subdomain, actually) For some reason, we can no longer resolve tools.cisco.com. there are several clues to the problem but I can't put them together. Here is some dig output. I know that the time stamps don't all match up below, but the results are typical: [root@ns1 ~]# dig +trace -b 148.165.3.10 tools.cisco.com ; <<>> DiG 9.4.3-P3 <<>> +trace -b 148.165.3.10 tools.cisco.com ;; global options: printcmd . 90550 IN NS i.root-servers.net. . 90550 IN NS h.root-servers.net. . 90550 IN NS e.root-servers.net. . 90550 IN NS d.root-servers.net. . 90550 IN NS j.root-servers.net. . 90550 IN NS k.root-servers.net. . 90550 IN NS l.root-servers.net. . 90550 IN NS g.root-servers.net. . 90550 IN NS f.root-servers.net. . 90550 IN NS a.root-servers.net. . 90550 IN NS m.root-servers.net. . 90550 IN NS c.root-servers.net. . 90550 IN NS b.root-servers.net. ;; Received 512 bytes from 148.165.3.10#53(148.165.3.10) in 0 ms com.172800 IN NS l.gtld-servers.net. com.172800 IN NS e.gtld-servers.net. com.172800 IN NS k.gtld-servers.net. com.172800 IN NS i.gtld-servers.net. com.172800 IN NS m.gtld-servers.net. com.172800 IN NS j.gtld-servers.net. 
com.172800 IN NS a.gtld-servers.net. com.172800 IN NS g.gtld-servers.net. com.172800 IN NS c.gtld-servers.net. com.172800 IN NS f.gtld-servers.net. com.172800 IN NS b.gtld-servers.net. com.172800 IN NS d.gtld-servers.net. com.172800 IN NS h.gtld-servers.net. ;; Received 505 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms cisco.com. 172800 IN NS ns1.cisco.com. cisco.com. 172800 IN NS ns2.cisco.com. ;; Received 101 bytes from 192.54.112.30#53(h.gtld-servers.net) in 154 ms tools.cisco.com.86400 IN NS rcdn9-14p-dcz05n-gss1.cisco.com. tools.cisco.com.86400 IN NS rtp5-dmz-gss1.cisco.com. tools.cisco.com.86400 IN NS sjck-dmz-gss1.cisco.com. tools.cisco.com.86400 IN NS cax01-bb14-dcz01n-gss1.cisco.com. ;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 75 ms ;; Received 33 bytes from 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com) in 47 ms Now, focusing in on rtp5-dmz-gss1.cisco.com for further analysis (just picked it out of the group): [root@ns1 ~]# dig -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com ; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com ; (1 server found) ;; global options: printcmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5165 ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;tools.cisco.com. IN A ;; Query time: 75 msec ;; SERVER: 64.102.246.5#53(64.102.246.5) ;; WHEN: Tue Mar 1 12:22:57 2011 ;; MSG SIZE rcvd: 33 Here is the output of tcpdump on my server, querying the same server via nslookup elsewhere: [root@ns1 ~]# tcpdump host -i bond0 64.102.246.5 -n -p -vvv tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes 12:14:53.373614 IP (tos 0x0, ttl 64
Re: Help with unresolvable domain (subdomain, actually)
Ring Cisco and complain that their nameservers are broken for the zone.

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13389
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;tools.cisco.com.           IN  A

;; Query time: 204 msec
;; SERVER: 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com)
;; WHEN: Wed Mar 2 08:23:59 2011
;; MSG SIZE rcvd: 33

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
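For anyone reading the dig header above: the status and flags both come from the second 16-bit word of the 12-byte DNS header. Below is a minimal Python sketch of decoding it. The sample bytes are rebuilt by hand from the values dig printed (id 13389, only qr set, SERVFAIL), not captured from the wire, and `parse_header` and `looks_authoritative` are hypothetical names used only for illustration. The check in `looks_authoritative` is a rule of thumb, not a formal test: a server answering for its own zone would normally set aa and return NOERROR or NXDOMAIN.

```python
import struct

# RCODE values from the low 4 bits of the flags word
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

# Single-bit flags in the same 16-bit word
FLAG_BITS = [("qr", 0x8000), ("aa", 0x0400), ("tc", 0x0200),
             ("rd", 0x0100), ("ra", 0x0080)]


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a DNS message."""
    msg_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0x000F
    return {
        "id": msg_id,
        "status": RCODES.get(rcode, str(rcode)),
        "flags": [name for name, bit in FLAG_BITS if flags & bit],
        "counts": (qd, an, ns, ar),  # QUERY, ANSWER, AUTHORITY, ADDITIONAL
    }


def looks_authoritative(info: dict) -> bool:
    """Rule of thumb (assumption): aa set and a NOERROR/NXDOMAIN status."""
    return "aa" in info["flags"] and info["status"] in ("NOERROR", "NXDOMAIN")


# Header rebuilt by hand from the dig output: id 13389, QR only, RCODE 2
sample = struct.pack("!HHHHHH", 13389, 0x8000 | 2, 1, 0, 0, 0)
info = parse_header(sample)
print(info["status"], info["flags"], info["counts"])
print("looks authoritative:", looks_authoritative(info))
```

With the values from the reply above, the sketch reports SERVFAIL, no aa flag, and zero records in every section, which is what you would expect from a server that is listed for the zone but is not actually serving it.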
RE: Help with unresolvable domain (subdomain, actually)
I should add that tools.cisco.com was resolvable at one time, so either Cisco's behavior has changed, or our firewall's behavior has changed. We obviously haven't upgraded our BIND version in a while (9.4.3P3), so I don't think the problem is BIND.

-----Original Message-----
From: Mike Bernhardt [mailto:bernha...@bart.gov]
Sent: Tuesday, March 01, 2011 12:40 PM
To: bind-users@lists.isc.org
Subject: Help with unresolvable domain (subdomain, actually)
Help with unresolvable domain (subdomain, actually)
For some reason, we can no longer resolve tools.cisco.com. There are several clues to the problem, but I can't put them together. Here is some dig output. I know that the timestamps don't all match up below, but the results are typical:

[root@ns1 ~]# dig +trace -b 148.165.3.10 tools.cisco.com

; <<>> DiG 9.4.3-P3 <<>> +trace -b 148.165.3.10 tools.cisco.com
;; global options: printcmd
.                   90550   IN  NS  i.root-servers.net.
.                   90550   IN  NS  h.root-servers.net.
.                   90550   IN  NS  e.root-servers.net.
.                   90550   IN  NS  d.root-servers.net.
.                   90550   IN  NS  j.root-servers.net.
.                   90550   IN  NS  k.root-servers.net.
.                   90550   IN  NS  l.root-servers.net.
.                   90550   IN  NS  g.root-servers.net.
.                   90550   IN  NS  f.root-servers.net.
.                   90550   IN  NS  a.root-servers.net.
.                   90550   IN  NS  m.root-servers.net.
.                   90550   IN  NS  c.root-servers.net.
.                   90550   IN  NS  b.root-servers.net.
;; Received 512 bytes from 148.165.3.10#53(148.165.3.10) in 0 ms

com.                172800  IN  NS  l.gtld-servers.net.
com.                172800  IN  NS  e.gtld-servers.net.
com.                172800  IN  NS  k.gtld-servers.net.
com.                172800  IN  NS  i.gtld-servers.net.
com.                172800  IN  NS  m.gtld-servers.net.
com.                172800  IN  NS  j.gtld-servers.net.
com.                172800  IN  NS  a.gtld-servers.net.
com.                172800  IN  NS  g.gtld-servers.net.
com.                172800  IN  NS  c.gtld-servers.net.
com.                172800  IN  NS  f.gtld-servers.net.
com.                172800  IN  NS  b.gtld-servers.net.
com.                172800  IN  NS  d.gtld-servers.net.
com.                172800  IN  NS  h.gtld-servers.net.
;; Received 505 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms

cisco.com.          172800  IN  NS  ns1.cisco.com.
cisco.com.          172800  IN  NS  ns2.cisco.com.
;; Received 101 bytes from 192.54.112.30#53(h.gtld-servers.net) in 154 ms

tools.cisco.com.    86400   IN  NS  rcdn9-14p-dcz05n-gss1.cisco.com.
tools.cisco.com.    86400   IN  NS  rtp5-dmz-gss1.cisco.com.
tools.cisco.com.    86400   IN  NS  sjck-dmz-gss1.cisco.com.
tools.cisco.com.    86400   IN  NS  cax01-bb14-dcz01n-gss1.cisco.com.
;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 75 ms

;; Received 33 bytes from 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com) in 47 ms

Now, focusing in on rtp5-dmz-gss1.cisco.com for further analysis (just picked it out of the group):

[root@ns1 ~]# dig -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com

; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5165
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;tools.cisco.com.           IN  A

;; Query time: 75 msec
;; SERVER: 64.102.246.5#53(64.102.246.5)
;; WHEN: Tue Mar 1 12:22:57 2011
;; MSG SIZE rcvd: 33

Here is the output of tcpdump on my server, querying the same server via nslookup elsewhere:

[root@ns1 ~]# tcpdump host -i bond0 64.102.246.5 -n -p -vvv
tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
12:14:53.373614 IP (tos 0x0, ttl 64, id 45237, offset 0, flags [none], proto: UDP (17), length: 61) 148.165.3.10.18673 > 64.102.246.5.domain: [bad udp cksum a78b!] 26095 A? tools.cisco.com. (33)
12:14:53.455684 IP (tos 0x0, ttl 54, id 7623, offset 0, flags [DF], proto: UDP (17), length: 61) 64.102.246.5.domain > 148.165.3.10.18673: [udp sum ok] 26095 ServFail- q: A? tools.cisco.com. 0/0/0 (33)

Lastly, I see on our firewall log that we have a Checkpoint Smart Defense log entry due to its belief that Cisco is sending us a malformed query packet, and the packet is being dropped. I don't know why they're sending the query in the first place.

Number:       2595791
Date:         1Mar2011
Time:         12:22:53
Type:         Log
Action:       Drop
Service:      domain-udp (53)
Source Port:  domain-udp
Source:       rtp5-dmz-gss1.cisco.com
Destination:  ns
Protocol:     udp
Information:  Packet info: Packet data size: 28
Attack:       Malformed Packet
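A side note on the sizes in the captures above: 33 bytes is exactly what a DNS message occupies when it holds only the header and the question for tools.cisco.com, which fits the 0/0/0 record counts tcpdump reports for the ServFail reply. The arithmetic can be checked with a minimal Python sketch. `encode_name` and `build_query` are hypothetical helper names for illustration, not part of dig, tcpdump, or BIND, and the 20-byte IPv4 header figure assumes no IP options.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id: int, qtype: int = 1, rd: bool = True) -> bytes:
    """Build a minimal DNS query: 12-byte header plus a single question."""
    flags = 0x0100 if rd else 0  # RD bit: recursion desired
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


# Same query ID as in the tcpdump capture, type A (1)
query = build_query("tools.cisco.com", query_id=26095)

dns_size = len(query)          # DNS payload only
ip_length = 20 + 8 + dns_size  # IPv4 header (no options) + UDP header + DNS

print("DNS message size:", dns_size)
print("IP datagram length:", ip_length)
```

Both numbers should agree with the capture: (33) for the DNS payload and length: 61 for the whole IP datagram. Since the reply is the same size as the query, it cannot contain anything beyond the echoed question.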