Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On Mon, Jul 11, 2011 at 04:06:42PM -0400, Bill Owens wrote: On Mon, Jul 11, 2011 at 02:11:57PM -0400, Jonathan Kamens wrote: The number of DNS queries required for each address lookup requested by a client has gone up considerably because of IPV6. The problem is being exacerbated by the fact that many DNS servers on the net don't yet support IPV6 queries. The result is that address lookups are frequently taking so long that the client gives up before getting the result.

I've seen the same thing, and poked around enough to see that the Wikipedia name servers are returning the wrong authority info for these and other queries (it isn't just AAAA - try TXT, SRV, etc.) Some digging through the archives finds this: https://lists.isc.org/pipermail/bind-users/2011-March/083109.html in which the first sentence says it all: The nameservers for wikipedia.org are broken. And this followup: https://lists.isc.org/pipermail/bind-users/2011-March/083113.html It's PowerDNS 2.9.22 that is breaking this, and it will be fixed by PowerDNS 3.0 once that's released, and we get around to deploying it. Looks like PowerDNS was in RC2 as of April 19, not released yet. . .

Updating that - according to Bert Hubert (via Twitter): Friday the 22nd is... PowerDNS Authoritative Server 3.0 release day!

Bill.

___ Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list bind-users mailing list bind-users@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-users
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
In message 4e1d3c05.7040...@kamens.us, Jonathan Kamens writes:

You seem to have a really big chip on your shoulder about people who run broken DNS servers. I don't like them any more than you do. But I learned "Be generous in what you accept and conservative in what you generate" way back when I started playing with the Internet well over two decades ago. It holds up now as well as it did back then, and there's no good reason why it shouldn't apply in this case.

Perhaps I do, but it is with good justification. There is so much garbage out there that it is hard to get answers back within the 2-4 seconds a client waits for a response. There are broken servers out there. There are misconfigured servers out there. There are broken/misconfigured firewalls out there. There are broken NAT boxes out there. There are broken DNS proxies out there. There are administrators out there that don't care. What should be a clean, straightforward request/response protocol no longer is. There are lots of workarounds built into recursive servers. It has got to the point that it's getting hard to add new workarounds without breaking old workarounds or breaking good answer processing.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
No. The fix is to correct the nameservers. They are not correctly following the DNS protocol and everything else is fallout from that.

Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

There is no bug in glibc.

In a nutshell, the getaddrinfo function in glibc sends both A and AAAA queries to the DNS server at the same time and then deals with the responses as they come in. Unfortunately, if the responses to the two queries come back in reverse order, /and/ the first one to come back is a server failure, both of which are the case when you try to resolve en.wikipedia.org immediately after restarting your DNS server so nothing is cached, the glibc code screws up and decides it didn't get back a successful response even though it did.

There is *nothing* wrong with sending both queries at once.

If you do the same lookup again, it works, because the CNAME that was sent in response to the A query is cached, so both the A and AAAA queries get back valid responses from the DNS server. And even if that weren't the case, since the CNAME is cached it gets returned first, since the server doesn't need to do a query to get it, whereas it does need to do another query to get the AAAA record (which recall isn't being cached because of the previously discussed FORMERR problem). It'll keep working until the cached records time out, at which point it'll happen again, and then be OK again until the records time out, etc.

The workaround is to put "options single-request" in /etc/resolv.conf to prevent the glibc innards from sending out both the A and AAAA queries at the same time. FYI, here's the glibc bug I filed about this: http://sourceware.org/bugzilla/show_bug.cgi?id=12994 Thank you for telling me I was full of it and making me dig deeper into this until I located the actual cause of the issue. 
:-) jik

Note your fix won't help clients that only ask for AAAA records because it is the authoritative servers that are broken, not the resolver library or the recursive server.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
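For reference, the workaround described above is a one-line addition to /etc/resolv.conf (a sketch; the single-request option is available in glibc 2.10 and later, and the nameserver address shown is an assumption for a locally running resolver):

```conf
# /etc/resolv.conf
nameserver 127.0.0.1
# Send the A and AAAA queries sequentially instead of in parallel,
# avoiding the out-of-order-response bug discussed in this thread.
options single-request
```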
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 07/13/2011 02:13 AM, Mark Andrews wrote:

No. The fix is to correct the nameservers. They are not correctly following the DNS protocol and everything else is fallout from that.

You're right that everything else is fallout from that. But that doesn't do me much good, does it? It's my system that keeps getting bogus name resolution errors. It's my RSS feed reader that keeps failing on an hourly basis when the cached records for en.wikipedia.org expire. It's all very well and good to say that the Wikipedia folks and other people with this problem should fix their nameservers -- I totally agree with that -- but it doesn't help me solve my problem /now/. I'm a real user in the real world with a real problem. Yelling at Wikipedia to fix their DNS servers may feel good, but it doesn't make my DNS work. As far as I and all the other users who are being impacted /now/ by this problem are concerned, it's just pissing into the wind.

Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

There is no bug in glibc.

To be blunt, that's bullshit. If glibc makes an A query and an AAAA query, and it gets back a valid response to the A query and an invalid response to the AAAA query, then it should ignore the invalid response to the AAAA query and return the valid A response to the user as the IP address for the host. Please note, furthermore, that as I explained in detail in my bug report and in my last message, glibc behaves differently based on the /order/ in which the two responses are returned by the DNS server. 
Since there's nothing that says a DNS server has to respond to two queries in the order in which they were received, and that would be an impossible requirement to impose in any case, since the queries and responses are sent via UDP, which doesn't guarantee order, it's perfectly clear that glibc needs to be prepared to function the same regardless of the order in which it receives the responses. What's more, there's plenty of code in the glibc files I spent hours poring over which is clearly an attempt to do exactly that. The people who wrote the code just got it wrong. Which isn't surprising, given how god-awful the code is.

This is not an either/or situation. The broken nameservers should be fixed, /and/ glibc should be fixed to properly handle the case when it sends two queries and gets back one valid response and one server error in reverse order.

In a nutshell, the getaddrinfo function in glibc sends both A and AAAA queries to the DNS server at the same time and then deals with the responses as they come in. Unfortunately, if the responses to the two queries come back in reverse order, /and/ the first one to come back is a server failure, both of which are the case when you try to resolve en.wikipedia.org immediately after restarting your DNS server so nothing is cached, the glibc code screws up and decides it didn't get back a successful response even though it did.

There is *nothing* wrong with sending both queries at once.

I didn't say there was. You really don't seem to be paying very good attention. Do you understand what the word /workaround/ means?

Note your fix won't help clients that only ask for AAAA records because it is the authoritative servers that are broken, not the resolver library or the recursive server.

I am aware of that. It is irrelevant, because it is not the problem I am trying to solve. I, and 99.99% of the users in the world, are /not/ "only ask[ing] for AAAA records." 
Nobody actually trying to use the internet for day-to-day work is doing that right now, because to say that IPv6 support is not yet ubiquitous would be a laughably momentous understatement.

You seem to have a really big chip on your shoulder about people who run broken DNS servers. I don't like them any more than you do. But I learned "Be generous in what you accept and conservative in what you generate" way back when I started playing with the Internet well over two decades ago. It holds up now as well as it did back then, and there's no good reason why it shouldn't apply in this case.

It's clear that this is a religious issue for you. I'm not here to debate religion; I'm here to get help making my DNS work, and to help other people, to whatever extent I can, make /their/ DNS work. If you continue to send religious screeds on this topic while making no effort to actually read and understand what I write, please do not expect me to respond further.

Jonathan Kamens
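The order-independence point argued above can be sketched in a few lines (a hypothetical, simplified model of a stub resolver's response handling, not the actual glibc code; the address and function names are illustrative only):

```python
# Simplified sketch: handle parallel A/AAAA lookups so that the final
# result does not depend on the order in which responses arrive.
# (Hypothetical model, not the actual glibc implementation.)

def resolve(responses):
    """Process DNS responses in arrival order; return the addresses
    from every successful response, ignoring server failures instead
    of failing the whole lookup."""
    results = {}
    for qtype, status, addrs in responses:
        results[qtype] = (status, addrs)  # keyed by query type, not arrival order
    good = [a for status, addrs in results.values()
            if status == "NOERROR" for a in addrs]
    return good or None

a_ok = ("A", "NOERROR", ["208.80.152.2"])
aaaa_bad = ("AAAA", "SERVFAIL", [])

# Same outcome whichever response arrives first:
assert resolve([a_ok, aaaa_bad]) == resolve([aaaa_bad, a_ok]) == ["208.80.152.2"]
```

Because responses are keyed by query type rather than arrival order, a SERVFAIL on the AAAA query arriving first cannot mask the valid A answer.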
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/13/2011 2:35 AM, Jonathan Kamens wrote: On 07/13/2011 02:13 AM, Mark Andrews wrote:

Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

There is no bug in glibc.

To be blunt, that's bullshit. If glibc makes an A query and an AAAA query, and it gets back a valid response to the A query and an invalid response to the AAAA query, then it should ignore the invalid response to the AAAA query and return the valid A response to the user as the IP address for the host. Please note, furthermore, that as I explained in detail in my bug report and in my last message, glibc behaves differently based on the /order/ in which the two responses are returned by the DNS server. Since there's nothing that says a DNS server has to respond to two queries in the order in which they were received, and that would be an impossible requirement to impose in any case, since the queries and responses are sent via UDP, which doesn't guarantee order, it's perfectly clear that glibc needs to be prepared to function the same regardless of the order in which it receives the responses.

I agree that the order of the A/AAAA responses shouldn't matter to the result. The whole getaddrinfo() call should fail regardless of whether the failure is seen first or the valid response is seen first. Why? Because getaddrinfo() should, if it isn't already, be using the RFC 3484 algorithm (and/or whatever the successor to RFC 3484 ends up being) to sort the addresses, and for that algorithm to work, one needs *both* the IPv4 address(es) *and* the IPv6 address(es) available, in order to compare their scopes, prefixes, etc. 
If one of the lookups fails, and this failure is presented to the RFC 3484 algorithm as NODATA for a particular address family, then the algorithm could make a bad selection of the destination address, and this can lead to other sorts of breakage, e.g. trying to use a tunneled connection where no tunnel exists. The *safe* thing for glibc to do is to promote the failure of either the A lookup or the AAAA lookup to a general lookup failure, which prompts the user/administrator to find the source of the problem and fix it. It's rarely a good idea to mask undeniable errors as if there were no error at all. It leads to unpredictable behavior and really tough troubleshooting challenges. I think glibc is erring on the side of openness and transparency here, rather than trying to cover up the fact that something is horribly wrong.

Note your fix won't help clients that only ask for AAAA records because it is the authoritative servers that are broken, not the resolver library or the recursive server.

I am aware of that. It is irrelevant, because it is not the problem I am trying to solve. I, and 99.99% of the users in the world, are /not/ "only ask[ing] for AAAA records." Nobody actually trying to use the internet for day-to-day work is doing that right now, because to say that IPv6 support is not yet ubiquitous would be a laughably momentous understatement.

What about clients in a NAT64/DNS64 environment? They could be configured as IPv6-only but normally able to access the IPv4 Internet just fine. Even with your glibc fix in place, though, they'll presumably break if the authoritative nameservers are giving garbage responses to AAAA queries (could someone with practical experience in DNS64 please confirm this?). Another possibility you're not considering is that the invoking application itself may make independent IPv4-specific and IPv6-specific getaddrinfo() lookups. Why would it do this? Why not? 
Maybe IPv6 capability is something the user has to buy a separate license for, so the IPv6 part is a slightly separate codepath, added in a later version than the base product, which is IPv4-only. When one of the getaddrinfo() calls returns address records and the other returns garbage, your fix doesn't prevent such an application from doing something unpredictable, possibly catastrophic. So it's really not a general solution to the problem.

- Kevin
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/13/2011 1:06 PM, Kevin Darcy wrote: On 7/13/2011 2:35 AM, Jonathan Kamens wrote: On 07/13/2011 02:13 AM, Mark Andrews wrote:

Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

There is no bug in glibc.

To be blunt, that's bullshit. If glibc makes an A query and an AAAA query, and it gets back a valid response to the A query and an invalid response to the AAAA query, then it should ignore the invalid response to the AAAA query and return the valid A response to the user as the IP address for the host. Please note, furthermore, that as I explained in detail in my bug report and in my last message, glibc behaves differently based on the /order/ in which the two responses are returned by the DNS server. Since there's nothing that says a DNS server has to respond to two queries in the order in which they were received, and that would be an impossible requirement to impose in any case, since the queries and responses are sent via UDP, which doesn't guarantee order, it's perfectly clear that glibc needs to be prepared to function the same regardless of the order in which it receives the responses.

I agree that the order of the A/AAAA responses shouldn't matter to the result. The whole getaddrinfo() call should fail regardless of whether the failure is seen first or the valid response is seen first. Why? Because getaddrinfo() should, if it isn't already, be using the RFC 3484 algorithm (and/or whatever the successor to RFC 3484 ends up being) to sort the addresses, and for that algorithm to work, one needs *both* the IPv4 address(es) *and* the IPv6 address(es) available, in order to compare their scopes, prefixes, etc. 
If one of the lookups fails, and this failure is presented to the RFC 3484 algorithm as NODATA for a particular address family, then the algorithm could make a bad selection of the destination address, and this can lead to other sorts of breakage, e.g. trying to use a tunneled connection where no tunnel exists. The *safe* thing for glibc to do is to promote the failure of either the A lookup or the AAAA lookup to a general lookup failure, which prompts the user/administrator to find the source of the problem and fix it. It's rarely a good idea to mask undeniable errors as if there were no error at all. It leads to unpredictable behavior and really tough troubleshooting challenges. I think glibc is erring on the side of openness and transparency here, rather than trying to cover up the fact that something is horribly wrong.

Note your fix won't help clients that only ask for AAAA records because it is the authoritative servers that are broken, not the resolver library or the recursive server.

I am aware of that. It is irrelevant, because it is not the problem I am trying to solve. I, and 99.99% of the users in the world, are /not/ "only ask[ing] for AAAA records." Nobody actually trying to use the internet for day-to-day work is doing that right now, because to say that IPv6 support is not yet ubiquitous would be a laughably momentous understatement.

What about clients in a NAT64/DNS64 environment? They could be configured as IPv6-only but normally able to access the IPv4 Internet just fine. Even with your glibc fix in place, though, they'll presumably break if the authoritative nameservers are giving garbage responses to AAAA queries (could someone with practical experience in DNS64 please confirm this?). Another possibility you're not considering is that the invoking application itself may make independent IPv4-specific and IPv6-specific getaddrinfo() lookups. Why would it do this? Why not? 
Maybe IPv6 capability is something the user has to buy a separate license for, so the IPv6 part is a slightly separate codepath, added in a later version than the base product, which is IPv4-only. When one of the getaddrinfo() calls returns address records and the other returns garbage, your fix doesn't prevent such an application from doing something unpredictable, possibly catastrophic. So it's really not a general solution to the problem.

Oh, I should also point out that this brokenness by the wikipedia/wikimedia nameservers *isn't* just specific to AAAA queries, and therefore *isn't* fixable with getaddrinfo() alone. Try doing an MX query of en.wikipedia.org. Or a PTR query. Or any of the other old (yet non-deprecated) query types (e.g. NS, TXT, HINFO). The only QTYPEs that are answered correctly are A, CNAME and (oddly enough) SOA. So they don't even have the excuse of "well, AAAA queries are kinda new, we haven't got around to handling them properly yet." This behavior has failed to conform to the standard for as long as the standard has existed; it's not recent, IPv6-specific breakage.
RE: Clients get DNS timeouts because ipv6 means more queries for each lookup
I agree that the order of the A/AAAA responses shouldn't matter to the result. The whole getaddrinfo() call should fail regardless of whether the failure is seen first or the valid response is seen first. Why? Because getaddrinfo() should, if it isn't already, be using the RFC 3484 algorithm (and/or whatever the successor to RFC 3484 ends up being) to sort the addresses, and for that algorithm to work, one needs *both* the IPv4 address(es) *and* the IPv6 address(es) available, in order to compare their scopes, prefixes, etc.

RFC 3484 tells you how to sort addresses you've got. If you've only got one address, then bang! It's already sorted for you. You don't need RFC 3484 to tell you how to sort it.

I have to say that some of the people on this list seem completely detached from what real users in the real world want their computers to do. If I am trying to connect to a site on the internet, then I want my computer to do its best to try to connect to the site. I don't want it to throw up its hands and say, "Oh, I'm sorry, one of my address lookups failed, so I'm not going to let you use the other address lookup, the one that succeeded, because some RFC somewhere could be interpreted as implying that's a bad idea, if I wanted to do so." Please, that's ridiculous.

If one of the lookups fails, and this failure is presented to the RFC 3484 algorithm as NODATA for a particular address family, then the algorithm could make a bad selection of the destination address, and this can lead to other sorts of breakage, e.g. trying to use a tunneled connection where no tunnel exists.

If the address the client gets doesn't work, then the address doesn't work. How is being unable to connect because the address turned out not to be routable different from being unable to connect because the computer refused to even try?

Another possibility you're not considering is that the invoking application itself may make independent IPv4-specific and IPv6-specific getaddrinfo() lookups. 
Why would it do this? Why not? Maybe IPv6 capability is something the user has to buy a separate license for, so the IPv6 part is a slightly separate codepath, added in a later version than the base product, which is IPv4-only. When one of the getaddrinfo() calls returns address records and the other returns garbage, your fix doesn't prevent such an application from doing something unpredictable, possibly catastrophic. So it's really not a general solution to the problem.

I have no idea what you're talking about. If the application makes independent IPv4 and IPv6 getaddrinfo() lookups, then the change I'm proposing to glibc is completely irrelevant and does not impact the existing functionality in any way. The IPv4 lookup will succeed, the IPv6 lookup will fail, and the application is then free to decide what to do.

In summary, getaddrinfo() with AF_UNSPEC has a very clear meaning - "Give me whatever addresses you can." The man page says, and I am quoting, "The value AF_UNSPEC indicates that getaddrinfo() should return socket addresses for any address family (either IPv4 or IPv6, for example) that can be used with node and service." I don't see how the language could be any more clear. To suggest that it's reasonable and correct for it to refuse to return a successfully fetched address is simply ludicrous. I hope and pray that the people who maintain the glibc code have more common sense about what users want and expect from their software.

In the meantime, it's clear that I don't belong on this mailing list, so I'm out of here.

Jonathan Kamens
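The AF_UNSPEC behavior under debate can be observed through Python's thin wrapper around the same libc getaddrinfo call (a sketch; "localhost" is used so the example does not depend on external DNS, and the exact result list depends on the host's /etc/hosts and IPv6 configuration):

```python
import socket

# Ask for addresses in any family, as the getaddrinfo(3) man page
# describes for AF_UNSPEC.  Each tuple is (family, type, proto,
# canonname, sockaddr).
infos = socket.getaddrinfo("localhost", None,
                           family=socket.AF_UNSPEC,
                           type=socket.SOCK_STREAM)

for family, _type, _proto, _canon, sockaddr in infos:
    print(family, sockaddr[0])  # AF_INET and/or AF_INET6 entries
```

On a dual-stack host this typically yields both 127.0.0.1 and ::1 entries; the thread's dispute is what the call should return when only one family's lookup succeeds.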
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/13/2011 2:39 PM, Jonathan Kamens wrote:

I agree that the order of the A/AAAA responses shouldn't matter to the result. The whole getaddrinfo() call should fail regardless of whether the failure is seen first or the valid response is seen first. Why? Because getaddrinfo() should, if it isn't already, be using the RFC 3484 algorithm (and/or whatever the successor to RFC 3484 ends up being) to sort the addresses, and for that algorithm to work, one needs *both* the IPv4 address(es) *and* the IPv6 address(es) available, in order to compare their scopes, prefixes, etc.

RFC 3484 tells you how to sort addresses you've got. If you've only got one address, then bang! It's already sorted for you. You don't need RFC 3484 to tell you how to sort it.

No, you've got one address, and one unspecified nameserver failure. Garbage in, garbage out. To say that a nameserver failure is equivalent to NODATA is not only technically incorrect, it leads to all sorts of operational problems in the real world.

I have to say that some of the people on this list seem completely detached from what real users in the real world want their computers to do.

Really? Do you think I'm an academic? Do you think I sit and write Internet Drafts and RFCs all day? No, I'm an implementor. I deal with DNS operational problems and issues all day, every workday. And I can tell you that I don't appreciate library routines making wild-ass assumptions that, in the face of some questionable behavior by a nameserver, maybe, possibly some quantity of addresses that I've acquired from that dodgy nameserver are good enough for my clients to try and connect to. No thanks. If there's a real problem I want to know about it as clearly and unambiguously as possible. I can't deal effectively with a problem if it's being masked by some library routine doing something weird behind my back.

If I am trying to connect to a site on the internet, then I want my computer to do its best to try to connect to the site. 
I don't want it to throw up its hands and say, Oh, I'm sorry, one of my address lookups failed, so I'm not going to let you use the /other/ address lookup, the one that succeeded, because some RFC somewhere could be interpreted as implying that's a bad idea, if I wanted to do so. Please, that's ridiculous. No, what's more ridiculous is if users can't get to a site SOME OF THE TIME, because someone's DNS is broken, a moronic library routine then routes the traffic some unexpected way, and a whole raft of other variables enter the picture, without anyone realizing or paying attention to the dependencies and interconnectivity that is required to keep the client working. There is a certain threshold of brokenness where the infrastructure has to throw up its hands, as you put it, and say nuh uh, not gonna happen, because to try to work around the problem based on not enough information about the topology, the environment, the dependencies, etc. you're likely to cause more harm than good by making the failure modes way more complex than necessary. If one of the lookups fails, and this failure is presented to the RFC 3484 algorithm as NODATA for a particular address family, then the algorithm could make a bad selection of the destination address, and this can lead to other sorts of breakage, e.g. trying to use a tunneled connection where no tunnel exists. If the address the client gets doesn't work, then the address doesn't work. How is being unable to connect because the address turned out to not be routable different from being unable to connect because the computer refused to even try? Because the failure modes are substantially different and it could take significant man-hours to determine that the root cause of the problem is actually DNS brokenness rather than something else in the network infrastructure (routers, switches, VPN concentrators, firewalls, IPSes, load-balancers, etc.) or in the client or server (OS, application, middleware, etc.) 
Have you ever actually troubleshot a difficult connectivity problem in a complex networking environment? Trust me, you want clear symptoms, clear failure modes. Not a bunch of components making dumb assumptions and/or trying to be helpful outside of their defined scope of functionality. That kind of help is like offering a glass of water to a drowning man. Another possibility you're not considering is that the invoking application itself may make independent IPv4-specific and IPv6-specific getaddrinfo() lookups. Why would it do this? Why not? Maybe IPv6 capability is something the user has to buy a separate license for, so the IPv6 part is a slightly separate codepath, added in a later version, than the base product, which is IPv4-only. When one of the getaddrinfo() calls returns address records and the other returns garbage, your fix doesn't prevent such an application from doing something unpredictable, possibly catastrophic.
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

In a nutshell, the getaddrinfo function in glibc sends both A and AAAA queries to the DNS server at the same time and then deals with the responses as they come in. Unfortunately, if the responses to the two queries come back in reverse order, /and/ the first one to come back is a server failure, both of which are the case when you try to resolve en.wikipedia.org immediately after restarting your DNS server so nothing is cached, the glibc code screws up and decides it didn't get back a successful response even though it did.

If you do the same lookup again, it works, because the CNAME that was sent in response to the A query is cached, so both the A and AAAA queries get back valid responses from the DNS server. And even if that weren't the case, since the CNAME is cached it gets returned first, since the server doesn't need to do a query to get it, whereas it does need to do another query to get the AAAA record (which recall isn't being cached because of the previously discussed FORMERR problem). It'll keep working until the cached records time out, at which point it'll happen again, and then be OK again until the records time out, etc.

The workaround is to put "options single-request" in /etc/resolv.conf to prevent the glibc innards from sending out both the A and AAAA queries at the same time. FYI, here's the glibc bug I filed about this: http://sourceware.org/bugzilla/show_bug.cgi?id=12994

Thank you for telling me I was full of it and making me dig deeper into this until I located the actual cause of the issue. :-)

jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
Jonathan Kamens j...@kamens.us wrote:

I said above that the problem is exacerbated by the fact that many DNS servers don't yet support IPV6 queries. This is because the AAAA queries don't get NXDOMAIN responses, which would be cached, but rather FORMERR responses, which are not cached. As a result, the scenario described above happens much more frequently because the DNS server has to redo the AAAA queries often.

Your upstream resolver is broken if it returns FORMERR responses to AAAA queries. The behaviour you describe is not normal.

Have a look at BIND's filter-aaaa-on-v4 and deny-answer-addresses options, which should allow you to prevent applications from trying to use IPv6. The latter might also quell queries for IPv6 addresses of name servers (though I haven't verified that). Also perhaps it'll help to declare all IPv6 name servers bogus --

server ::/0 { bogus yes; };

Tony.
--
f.anthony.n.finch d...@dotat.at http://dotat.at/
North Bailey: Variable becoming southeasterly 3 or 4. Slight or moderate. Fair. Good.
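Put together, the suggestions above would look roughly like this in named.conf (a sketch only: filter-aaaa-on-v4 exists in BIND 9.7+ but may require a build configured with --enable-filter-aaaa, and option names and syntax should be checked against your version's ARM):

```conf
options {
    // Withhold AAAA answers from clients that queried over IPv4,
    // so IPv4-only hosts never see (or wait for) IPv6 addresses.
    filter-aaaa-on-v4 yes;
};

// Treat every IPv6 name server address as unusable, per the
// suggestion in the message above.
server ::/0 { bogus yes; };
```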
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
Jonathan Kamens wrote: I said above that the problem is exacerbated by the fact that many DNS servers don't yet support IPV6 queries. This is because the AAAA queries don't get NXDOMAIN responses, which would be cached, but rather FORMERR responses, which are not cached. As a result, the scenario described above happens much more frequently because the DNS server has to redo the AAAA queries often.

I think the main issue here is: why is your nameserver thinking it has IPv6 connectivity? If you don't have working IPv6 connectivity, do one or both of these:

1) Disable, or at least configure, IPv6 properly on your server
2) Tell BIND not to use IPv6 transport, typically by starting named with the command line option -4. How to do that depends on your operating system / distribution / packaging system etc.

Regards
Eivind Olsen
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/11/2011 3:10 PM, Tony Finch wrote: Jonathan Kamens j...@kamens.us wrote: I said above that the problem is exacerbated by the fact that many DNS servers don't yet support IPV6 queries. This is because the AAAA queries don't get NXDOMAIN responses, which would be cached, but rather FORMERR responses, which are not cached. As a result, the scenario described above happens much more frequently because the DNS server has to redo the AAAA queries often. Your upstream resolver is broken if it returns FORMERR responses to AAAA queries. The behaviour you describe is not normal.

There are people reporting all over the net that they're getting tons of messages like this in their logs with recent BIND versions:

Jul 11 12:00:06 jik2 named[31354]: error (FORMERR) resolving 'en.wikipedia.org/AAAA/IN': 208.80.152.130#53

I've got 397 of them in my logs for just the last 24 hours. I'm aware that this means the upstream DNS server is broken; isn't that what I said, i.e., that it isn't responding properly to AAAA queries?

The problem is that I have no control over the upstream resolver. All I have control over is my own name server. I am not the only one who is going to encounter this problem. I've found several reports of it on the net with a minimal amount of searching. I think something more general has to be done than giving me advice about what to change in my named.conf. I appreciate the advice for how to fix the problem for myself, but I think it needs to be fixed for everyone.

Have a look at bind's filter-aaaa-on-v4 and deny-answer-addresses options, which should allow you to prevent applications from trying to use IPv6.

Neither of these options is documented in named.conf(5) or resolv.conf(5). Is this a problem that is specific to the Fedora 15 versions of these man pages, or is the documentation distributed with BIND out-of-date?

I tried to use the filter-aaaa-on-v4 option and I get "not configured" in my log when named starts up, and then "parsing failed", so I think my BIND must not be compiled with --enable-filter-aaaa, right? That makes it difficult to use this solution. Perhaps that's also why it isn't listed in the man page?

  jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/11/2011 3:26 PM, Eivind Olsen wrote: I think the main issue here is - why is your nameserver thinking it has IPv6 connectivity?

No, this isn't the issue. I see the FORMERR errors in syslog and the timeouts resolving host names even when I start named with -4. Named is querying for AAAA records even when it is started with -4, and it is the AAAA querying, not the IPv6 connectivity, that is the issue.

  jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 07/11/2011 07:11 PM, Jonathan Kamens wrote: The number of DNS queries required for each address lookup requested by a client has gone up considerably because of IPV6. The problem is being exacerbated by the fact that many DNS servers on the net don't yet support IPV6 queries. The result is that address lookups are frequently taking so long that the client gives up before getting the result.

Can you be more specific here? Do you mean many DNS servers don't support queries with qtype=AAAA, or many DNS servers don't support queries over IPv6/UDP or IPv6/TCP?

This is fine when the wikipedia.org nameservers are working, but let's postulate for the moment that two of them are down, unreachable, or responding slowly, which apparently happens pretty often. Then we end up doing: wikipedia.org DNS; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA; en.wikipedia.org A /times out/; en.wikipedia.org A /times out/; en.wikipedia.org A

I don't quite see how you're getting this behaviour. Every operating system that I know of recommends getaddrinfo or some similar variant for doing multiprotocol IPv4/IPv6 lookups, and as far as I'm aware, they all do something very similar, namely send the A and AAAA lookups in parallel. When I try this against a bind server, I see this makes bind perform the A/AAAA lookups in parallel too. So, at worst you should have something like:

0.0001 A query
0.0002 AAAA query
...
1.0000 A query timeout
1.0001 AAAA query timeout

...repeated X+1 times for X non-responding NS records. That is, the lookups should happen in parallel, so the time taken should not double. If your app is doing its own DNS requests and it's doing them in series, then it's broken, for exactly this reason, and should use the system resolver.

By the end of that sequence, the typical 30-second DNS request timeout has been exceeded, and the client gives up. I said above that the problem is exacerbated by the fact that many DNS servers don't yet support IPV6 queries. This is because the AAAA queries don't get NXDOMAIN responses, which would be cached, but rather FORMERR responses.

Not in my observations. As Tony has said, you seem to have a broken upstream resolver.

I'm interested to hear if other people are encountering this problem and

No, we are not seeing this problem, and we have thousands of IPv6-enabled clients making A/AAAA DNS requests constantly. It just works (tm). This is not to say -ve caching of FORMERR is a bad idea; it may well be a good idea. But I think there is more going on here than simply a failure of -ve caching.
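The timing argument above (parallel lookups should not double the wait) can be put in back-of-envelope form. This is a sketch under assumed numbers: the 5-second per-attempt timeout is illustrative, not any particular resolver's default.

```python
# Back-of-envelope model of the timing argument above: with X dead
# nameservers tried before a live one answers, parallel A/AAAA lookups
# pay each per-server timeout once (the waits overlap), while strictly
# serial lookups pay it twice (once for A, again for AAAA).

TIMEOUT = 5.0  # seconds a stub resolver waits per attempt (assumed)

def serial_lookup_time(dead_servers):
    # The A query walks the dead servers, then the AAAA query walks
    # them all over again.
    return 2 * dead_servers * TIMEOUT

def parallel_lookup_time(dead_servers):
    # A and AAAA are in flight simultaneously, so each dead server
    # costs only one timeout interval in wall-clock time.
    return dead_servers * TIMEOUT
```

With two dead servers, the serial model takes 20 seconds of wall-clock waiting versus 10 for the parallel one, which is why a serial client can blow through a 30-second budget that a parallel one survives.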
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/11/2011 4:06 PM, Bill Owens wrote: https://lists.isc.org/pipermail/bind-users/2011-March/083109.html in which the first sentence says it all: The nameservers for wikipedia.org are broken.

It's not just wikipedia.org that's broken, obviously. I see this error in my logs for 19 domains since July 3:

Even if PowerDNS is the only source of this issue, and even if the new version of PowerDNS is released tomorrow, I'm sure there will still be sites running the old version a year from now. So just relying on a PowerDNS release to fix this problem seems unwise. Users are experiencing this problem /now/ in the field, and more users will be experiencing it as BIND is upgraded in more and more places. Every single user relying on a Fedora 15 DNS server, for example, is going to see occasional unnecessary DNS timeouts when trying to resolve host names. It seems clear to me that a generally available, generally applicable fix to BIND is needed to avoid this issue, and perhaps similar issues like it.

  jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 7/11/2011 2:11 PM, Jonathan Kamens wrote: The number of DNS queries required for each address lookup requested by a client has gone up considerably because of IPV6. The problem is being exacerbated by the fact that many DNS servers on the net don't yet support IPV6 queries. The result is that address lookups are frequently taking so long that the client gives up before getting the result. The example I am seeing this with most frequently is my RSS feed reader, rss2email, trying to read a feed from en.wikipedia.org in a cron job that runs every 15 minutes. I am regularly seeing this in the output of the cron job:

W: Name or service not known [8] http://en.wikipedia.org/w/index.php?title=/[elided]/feed=atomaction=history

The wikipedia.org domain has three DNS servers. Let's assume that the root and org. nameservers are cached already when rss2email does its query. If so, then it has to do the following queries: wikipedia.org DNS; en.wikipedia.org AAAA; en.wikipedia.org A. This is fine when the wikipedia.org nameservers are working, but let's postulate for the moment that two of them are down, unreachable, or responding slowly, which apparently happens pretty often. Then we end up doing: wikipedia.org DNS; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA; en.wikipedia.org A /times out/; en.wikipedia.org A /times out/; en.wikipedia.org A. By the end of that sequence, the typical 30-second DNS request timeout has been exceeded, and the client gives up.

The math isn't working. I just ran a quick test and named (9.7.x) failed over from a non-working delegated NS to a working delegated NS in less than 30 milliseconds. How are you reaching a 30-*second* timeout threshold in only 6 queries?

In practice, it would also be quite unlikely that named would pick dead nameservers before live ones for *both* the AAAA and the A query. At the very least, once the timeouts were encountered for the AAAA query, those NSes would be penalized in terms of NS selection, so they are unlikely to be chosen *again*, ahead of the working NS, for the A query. Any en.wikipedia.org NSes which were found to be *persistently* broken would gravitate to the bottom of the selection list, and be tried approximately never.

I think maybe you need to probe deeper and find out what _else_ is going on.

- Kevin
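The penalization Kevin describes can be sketched as a smoothed-RTT server selector. This is a toy illustration of the idea, not BIND's actual algorithm; the decay factor and the timeout penalty are assumed values.

```python
# Minimal sketch of RTT-based nameserver selection: keep a smoothed RTT
# (SRTT) per server, treat a timeout as a very slow answer, and always
# prefer the server with the lowest SRTT.  A server that keeps timing
# out quickly sinks to the bottom of the selection order.

class ServerStats:
    def __init__(self):
        self.srtt = {}  # server name -> smoothed RTT in milliseconds

    def record(self, server, rtt_ms):
        # Exponentially weighted moving average (0.7/0.3 is illustrative).
        old = self.srtt.get(server, rtt_ms)
        self.srtt[server] = 0.7 * old + 0.3 * rtt_ms

    def record_timeout(self, server, penalty_ms=5000.0):
        # A timeout counts as an answer that took the full wait interval.
        self.record(server, penalty_ms)

    def pick(self, servers):
        # Untried servers default to 0 so they get a chance early on.
        return min(servers, key=lambda s: self.srtt.get(s, 0.0))
```

After one timeout against a dead server and one fast answer from a live one, the selector prefers the live server for the follow-up query, which is why the AAAA timeouts should not be paid again by the A query.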
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
I'm unclear how BIND could be modified to fix this. The querying client machines are asking BIND for AAAA records. BIND goes out to the authoritative nameservers to attempt to resolve said records. The broken nameservers (PowerDNS 2.9.22, etc.) time out or otherwise hand out bad responses (FORMERR, NXDOMAIN). What would BIND do differently to avoid this? Even if BIND was modified, why would the responsibility fall on all BIND administrators to implement this hack, as opposed to the onus being on the owners of the broken nameservers to upgrade their broken authoritative servers?

-Tim

On Mon, Jul 11, 2011 at 1:25 PM, Jonathan Kamens j...@kamens.us wrote: On 7/11/2011 4:06 PM, Bill Owens wrote: https://lists.isc.org/pipermail/bind-users/2011-March/083109.html in which the first sentence says it all: The nameservers for wikipedia.org are broken. It's not just wikipedia.org that's broken, obviously. I see this error in my logs for 19 domains since July 3: Even if PowerDNS is the only source of this issue, and even if the new version of PowerDNS is released tomorrow, I'm sure there will still be sites running the old version a year from now. So just relying on a PowerDNS release to fix this problem seems unwise. Users are experiencing this problem now in the field, and more users will be experiencing it as BIND is upgraded in more and more places. Every single user relying on a Fedora 15 DNS server, for example, is going to see occasional unnecessary DNS timeouts when trying to resolve host names. It seems clear to me that a generally available, generally applicable fix to BIND is needed to avoid this issue and perhaps similar issues like it.

jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On Jul 11, 2011, at 1:25 PM, Jonathan Kamens wrote: Even if PowerDNS is the only source of this issue, and even if the new version of PowerDNS is released tomorrow, I'm sure there will still be sites running the old version a year from now. So just relying on a PowerDNS release to fix this problem seems unwise.

OK, but this same reasoning applies to making a change to BIND: even if we had such a change available tomorrow, there will be sites running older versions of BIND a year from now, also. :-)

Users are experiencing this problem now in the field, and more users will be experiencing it as BIND is upgraded in more and more places. Every single user relying on a Fedora 15 DNS server, for example, is going to see occasional unnecessary DNS timeouts when trying to resolve host names. It seems clear to me that a generally available, generally applicable fix to BIND is needed to avoid this issue and perhaps similar issues like it.

What you probably want is a change to your local implementation of getaddrinfo() for your libc / glibc so that it prefers to issue T_A queries before it issues T_AAAA queries, and will only issue T_AAAA queries if IPv6 networking is compiled into the system. In my experience, not only does this significantly help resolver performance in the face of nameservers which break when facing AAAA queries, it is a solution which many people ignore.

http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=42405

Regards,
--
-Chuck
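The A-before-AAAA ordering Chuck suggests can be sketched as follows. This is a toy illustration, not libc code: `query` stands for some hypothetical stub-resolver call that returns addresses or raises OSError on failure, and `ipv6_enabled` stands for the "is IPv6 actually usable here" check.

```python
# Sketch of the "A before AAAA" strategy: issue the A query first, and
# only issue the AAAA query at all when IPv6 is usable.  On hosts
# without IPv6 this avoids ever hitting servers that mishandle AAAA.

def lookup(name, query, ipv6_enabled):
    """Resolve `name` via the caller-supplied `query(name, qtype)`."""
    addrs = []
    try:
        addrs.extend(query(name, "A"))       # always try IPv4 first
    except OSError:
        pass                                  # fall through; AAAA may still work
    if ipv6_enabled:
        try:
            addrs.extend(query(name, "AAAA"))  # only when IPv6 is usable
        except OSError:
            pass
    return addrs
```

The trade-off versus parallel lookups is latency: the queries are now serial, which is exactly the doubling discussed earlier in the thread, so this ordering mainly pays off on IPv4-only hosts where the AAAA query is skipped entirely.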
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On Mon, Jul 11, 2011 at 04:25:59PM -0400, Jonathan Kamens wrote: On 7/11/2011 4:06 PM, Bill Owens wrote: https://lists.isc.org/pipermail/bind-users/2011-March/083109.html in which the first sentence says it all: The nameservers for wikipedia.org are broken. It's not just wikipedia.org that's broken, obviously. I see this error in my logs for 19 domains since July 3:

I have FORMERR entries in my logs for 79 names since June 19, a total of 5185 error messages. 2247 of those are for wikipedia-related names. Spot-checking shows that the others appear to be unrelated issues; mostly bizarre-looking misconfigurations.

Even if PowerDNS is the only source of this issue, and even if the new version of PowerDNS is released tomorrow, I'm sure there will still be sites running the old version a year from now. So just relying on a PowerDNS release to fix this problem seems unwise.

A fix to the PowerDNS problem won't remove all the FORMERR messages, but a fixed version running the wikipedia-related domains would repair your original problem, and that seems like a reasonable thing to expect. More reasonable than asking BIND to ignore incorrect responses, IMO. . .

Bill.
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
In message 4e1b562b.2070...@kamens.us, Jonathan Kamens writes: On 7/11/2011 3:26 PM, Eivind Olsen wrote: I think the main issue here is - why is your nameserver thinking it has IPv6 connectivity? No, this isn't the issue. I see the FORMERR errors in syslog and the timeouts resolving host names even when I start named with -4.

-4 and -6 affect what transport is used. They have no impact on data.

Named is querying for AAAA records even when it is started with -4, and it is the querying, not the connectivity, that is the issue.

  jik
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
In message 4e1b5c57.8090...@kamens.us, Jonathan Kamens writes: On 7/11/2011 4:06 PM, Bill Owens wrote: https://lists.isc.org/pipermail/bind-users/2011-March/083109.html in which the first sentence says it all: The nameservers for wikipedia.org are broken. It's not just wikipedia.org that's broken, obviously. I see this error in my logs for 19 domains since July 3:

Well, you haven't been looking at your logs, or you upgraded to a version which logs the condition.

Even if PowerDNS is the only source of this issue, and even if the new version of PowerDNS is released tomorrow, I'm sure there will still be sites running the old version a year from now. So just relying on a PowerDNS release to fix this problem seems unwise.

Sure, but it is a minor issue overall. FORMERR is a lot better than what used to happen. Nameservers used to drop AAAA queries, so you got timeouts when all the nameservers were working, instead of only when some are working.

Users are experiencing this problem /now/ in the field, and more users will be experiencing it as BIND is upgraded in more and more places. Every single user relying on a Fedora 15 DNS server, for example, is going to see occasional unnecessary DNS timeouts when trying to resolve host names.

Well, complain to the owners of those zones. You have logs that tell you which nameservers are broken.

It seems clear to me that a generally available, generally applicable fix to BIND is needed to avoid this issue and perhaps similar issues like it.

The DNS has multiple nameservers so that when one is down you can ask another and be able to cache the answer. Here none of the nameservers are giving answers that can be cached. FORMERR, NOTIMP, REFUSED/timeout are per server, not per query tuple QNAME/QTYPE/QCLASS.

  jik

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
On 07/11/2011 11:11, Jonathan Kamens wrote: The number of DNS queries required for each address lookup requested by a client has gone up considerably because of IPV6. The problem is being exacerbated by the fact that many DNS servers on the net don't yet support IPV6 queries.

I have to disagree with your premise here. It's true that DNS software has a notoriously long deprecation cycle, but AAAA records have been around for long enough that it's highly unlikely there are enough name servers that don't handle them to make a noticeable difference. And even if you can find one, it should be upgraded for a vast array of other reasons.

The result is that address lookups are frequently taking so long that the client gives up before getting the result.

It sounds to me like you don't have IPv6 connectivity. If so, you've already been given the advice to configure your OS to avoid asking for AAAA at all, or at least to ask for A first. Heed this advice.

The example I am seeing this with most frequently is my RSS feed reader, rss2email, trying to read a feed from en.wikipedia.org in a cron job that runs every 15 minutes. I am regularly seeing this in the output of the cron job: W: Name or service not known [8] http://en.wikipedia.org/w/index.php?title=/[elided]/feed=atomaction=history The wikipedia.org domain has three DNS servers. Let's assume that the root and org. nameservers are cached already when rss2email does its query. If so, then it has to do the following queries: wikipedia.org DNS; en.wikipedia.org AAAA; en.wikipedia.org A. This is fine when the wikipedia.org nameservers are working, but let's postulate for the moment that two of them are down, unreachable, or responding slowly, which apparently happens pretty often. Then we end up doing: wikipedia.org DNS; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA /times out/; en.wikipedia.org AAAA; en.wikipedia.org A /times out/; en.wikipedia.org A /times out/; en.wikipedia.org A. By the end of that sequence, the typical 30-second DNS request timeout has been exceeded, and the client gives up.

See above. YOU need to configure your software to not ask for AAAA, or to ask for A first.

I said above that the problem is exacerbated by the fact that many DNS servers don't yet support IPV6 queries. This is because the AAAA queries don't get NXDOMAIN responses, which would be cached, but rather FORMERR responses, which are not cached. As a result, the scenario described above happens much more frequently because the DNS server has to redo the AAAA queries often.

Can you provide examples of specific name servers, on the network now, that respond this way? The authoritative name servers for wikipedia.org respond correctly (NOERROR/ANSWER=0) to AAAA queries for en.wikipedia.org. If you are seeing a FORMERR response to these queries, the problem lies somewhere in your resolution chain. Before mitigating steps in correctly functioning software are considered, there needs to be substantial evidence that there are enough really, really old name servers that behave the way you describe still on line to make the effort worthwhile.

hth,

Doug
--
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
Users are experiencing this problem now in the field, and more users will be experiencing it as BIND is upgraded in more and more places. Every single user relying on a Fedora 15 DNS server, for example, is going to see occasional unnecessary DNS timeouts when trying to resolve host names. It seems clear to me that a generally available, generally applicable fix to BIND is needed to avoid this issue and perhaps similar issues like it.

What is the fix you want? Negative caching of FORMERR responses? That won't work in the wikipedia case, since the (incorrect) SOA minimum is only 10 minutes, and your cron job runs every 15 minutes. There are millions of broken domains out there. Asking BIND to install kludges to pave over them is probably not the best way to go.

michael

PS. BTW, it would be incorrect to state that AAAA queries for non-existent records for a domain name for which other records exist (e.g. CNAME or A) should get an NXDOMAIN response. They absolutely should not. They should get an empty answer with a NOERROR RCODE. NXDOMAIN means that there are no DNS records whatsoever that have the domain name en.wikipedia.org, which is certainly not the case.
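Michael's point about response codes can be stated as a small decision function. This is a simplified sketch (it ignores CNAME chasing and wildcards): NXDOMAIN is only correct when the name owns no records of any type; a name that exists but lacks the requested type should get NOERROR with an empty answer section ("NODATA").

```python
# Decide the correct RCODE for a query, given the RR types that exist
# at the queried name.  Simplified: no CNAME chasing, no wildcards.

def expected_rcode(records_by_type, qtype):
    """records_by_type maps a name's RR types to their data, e.g.
    {"CNAME": ["text.wikimedia.org."]}; empty dict means no records."""
    if not records_by_type:
        return "NXDOMAIN"            # the name does not exist at all
    if qtype in records_by_type:
        return "NOERROR"             # records of the requested type exist
    return "NOERROR/ANSWER=0"        # name exists, type does not (NODATA)
```

For en.wikipedia.org, which owns a CNAME, an AAAA query should therefore yield NOERROR with an empty answer, never NXDOMAIN, which is exactly why FORMERR responses to such queries indicate a broken server rather than a missing name.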
Re: Clients get DNS timeouts because ipv6 means more queries for each lookup
Wikipedia have been told multiple times that their nameservers are broken: they fail to add the CNAME records to AAAA responses, as required by RFC 1034, which results in garbage answers being returned. Those garbage answers result in the FORMERR log messages. Both of the answers below should have CNAME chains in them, but only the A query has them. Now, luckily, this doesn't affect every lookup, as the CNAME records returned from the A lookup are cached, so every hour the recursive nameserver needs to go through this dance. Asking for A before AAAA just hides the problem by priming the cache.

Mark

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23606
;; flags: qr aa; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;en.wikipedia.org.			IN	A

;; ANSWER SECTION:
en.wikipedia.org.		3600	IN	CNAME	text.wikimedia.org.
text.wikimedia.org.		600	IN	CNAME	text.pmtpa.wikimedia.org.
text.pmtpa.wikimedia.org.	3600	IN	A	208.80.152.2

;; Query time: 411 msec
;; SERVER: 91.198.174.4#53(ns2.wikimedia.org)
;; WHEN: Tue Jul 12 12:02:06 2011

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23260
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;en.wikipedia.org.			IN	AAAA

;; AUTHORITY SECTION:
wikimedia.org.	86400	IN	SOA	ns0.wikimedia.org. hostmaster.wikimedia.org. 2011071119 43200 7200 1209600 600

;; Query time: 306 msec
;; SERVER: 208.80.152.142#53(ns1.wikimedia.org)
;; WHEN: Tue Jul 12 12:00:58 2011
;; MSG SIZE  rcvd: 108
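The inconsistency Mark demonstrates in the dig output can be checked mechanically. This is an illustrative sketch: records are modelled as (owner, type, rdata) tuples, and the two answer sections are the ones from the dig output above.

```python
# Flag the breakage shown in the dig output: the A response for a name
# establishes a CNAME chain, but the AAAA response for the very same
# name omits it and returns an empty answer instead.

def missing_cname_chain(a_answer, aaaa_answer):
    """True when the A answer carries CNAMEs that the AAAA answer lacks."""
    a_cnames = {r for r in a_answer if r[1] == "CNAME"}
    aaaa_cnames = {r for r in aaaa_answer if r[1] == "CNAME"}
    return bool(a_cnames) and not aaaa_cnames

# The two answer sections from Mark's dig output.
a_resp = [
    ("en.wikipedia.org.", "CNAME", "text.wikimedia.org."),
    ("text.wikimedia.org.", "CNAME", "text.pmtpa.wikimedia.org."),
    ("text.pmtpa.wikimedia.org.", "A", "208.80.152.2"),
]
aaaa_resp = []  # ANSWER: 0 -- the CNAME chain is missing entirely
```

A server answering consistently would return the same two-link CNAME chain (ending in an empty terminal answer, since no AAAA exists at the chain's target) for both query types, and the checker would report nothing.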