Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-21 Thread Bill Owens
On Mon, Jul 11, 2011 at 04:06:42PM -0400, Bill Owens wrote:
 On Mon, Jul 11, 2011 at 02:11:57PM -0400, Jonathan Kamens wrote:
  The number of DNS queries required for each address lookup requested by 
  a client has gone up considerably because of IPV6. The problem is being 
  exacerbated by the fact that many DNS servers on the net don't yet 
  support IPV6 queries. The result is that address lookups are frequently 
  taking so long that the client gives up before getting the result.
 
 I've seen the same thing, and poked around enough to see that the Wikipedia 
 name servers are returning the wrong authority info for these and other 
 queries (it isn't just AAAA - try TXT, SRV, etc.) Some digging through the 
 archives finds this:
 
 https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
  in which the first sentence says it all: "The nameservers for wikipedia.org 
 are broken."
 
 And this followup:
 https://lists.isc.org/pipermail/bind-users/2011-March/083113.html
  It's PowerDNS 2.9.22 that is breaking this, and it will be fixed by 
  PowerDNS 3.0 once that's released, and we get around to deploying it.
 
 Looks like PowerDNS was in RC2 as of April 19, not released yet. . .

Updating that - according to Bert Hubert (via Twitter): 
"Friday the 22nd is... PowerDNS Authoritative Server 3.0 release day!"

Bill.
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-14 Thread Mark Andrews

In message 4e1d3c05.7040...@kamens.us, Jonathan Kamens writes:
 You seem to have a really big chip on your shoulder about people who run
 broken DNS servers. I don't like them any more than you do. But I
 learned "Be generous in what you accept and conservative in what you
 generate" way back when I started playing with the Internet well over
 two decades ago. It holds up now as well as it did back then, and
 there's no good reason why it shouldn't apply in this case.

Perhaps I do, but it is with good justification. There is that much
garbage out there that it is hard to get answers back within the
2-4 seconds a client waits for a response.

There are broken servers out there.
There are misconfigured servers out there.
There are broken/misconfigured firewalls out there.
There are broken NAT boxes out there.
There are broken DNS proxies out there.
There are administrators out there that don't care.

What should be a clean, straightforward request / response protocol
no longer is.

There are lots of workarounds built into recursive servers.  It has got
to the point that it's getting hard to add new workarounds without
breaking old workarounds or breaking good answer processing.

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Mark Andrews

No.  The fix is to correct the nameservers.  They are not correctly
following the DNS protocol and everything else is a fall out from
that.

 Well, all the prodding from people here prompted me to investigate 
 further exactly what's going on. The problem isn't what I thought it 
 was. It appears to be a bug in glibc, and I've filed a bug report and 
 found a workaround.

There is no bug in glibc.

 In a nutshell, the getaddrinfo function in glibc sends both A and AAAA 
 queries to the DNS server at the same time and then deals with the 
 responses as they come in. Unfortunately, if the responses to the two 
 queries come back in reverse order, /and/ the first one to come back is 
 a server failure, both of which are the case when you try to resolve 
 en.wikipedia.org immediately after restarting your DNS server so nothing 
 is cached, the glibc code screws up and decides it didn't get back a 
 successful response even though it did.

There is *nothing* wrong with sending both queries at once.

 If you do the same lookup again, it works, because the CNAME that was 
 sent in response to the A query is cached, so both the A and AAAA 
 queries get back valid responses from the DNS server. And even if that 
 weren't the case, since the CNAME is cached it gets returned first, 
 since the server doesn't need to do a query to get it, whereas it does 
 need to do another query to get the AAAA record (which recall isn't 
 being cached because of the previously discussed FORMERR problem). It'll 
 keep working until the cached records time out, at which point it'll 
 happen again, and then be OK again until the records time out, etc.
 
 The workaround is to put "options single-request" in /etc/resolv.conf to 
 prevent the glibc innards from sending out both the A and AAAA queries 
 at the same time.
 
 FYI, here's the glibc bug I filed about this:
 
 http://sourceware.org/bugzilla/show_bug.cgi?id=12994
 
 Thank you for telling me I was full of it and making me dig deeper into 
 this until I located the actual cause of the issue. :-)
 
jik

Note your fix won't help clients that only ask for AAAA records
because it is the authoritative servers that are broken, not the
resolver library or the recursive server.

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Jonathan Kamens

On 07/13/2011 02:13 AM, Mark Andrews wrote:

No.  The fix is to correct the nameservers.  They are not correctly
following the DNS protocol and everything else is a fall out from
that.

You're right that everything else is fallout from that.

But that doesn't do me much good, does it? It's my system that keeps 
getting bogus name resolution errors. It's my RSS feed reader that keeps 
failing on an hourly basis when the cached records for en.wikipedia.org 
expire. It's all very well and good to say that the Wikipedia folks and 
other people with this problem should fix their nameservers -- I totally 
agree with that -- but it doesn't help me solve my problem /now/.


I'm a real user in the real world with a real problem. Yelling at 
Wikipedia to fix their DNS servers may feel good, but it doesn't make my 
DNS work. As far as I and all the other users who are being impacted 
/now/ by this problem are concerned, it's just pissing into the wind.

Well, all the prodding from people here prompted me to investigate
further exactly what's going on. The problem isn't what I thought it
was. It appears to be a bug in glibc, and I've filed a bug report and
found a workaround.

There is no bug in glibc.

To be blunt, that's bullshit.

If glibc makes an A query and an AAAA query, and it gets back a valid 
response to the A query and an invalid response to the AAAA query, then 
it should ignore the invalid response to the AAAA query and return the 
valid A response to the user as the IP address for the host.


Please note, furthermore, that as I explained in detail in my bug report 
and in my last message, glibc behaves differently based on the /order/ 
in which the two responses are returned by the DNS server. Since there's 
nothing that says a DNS server has to respond to two queries in the 
order in which they were received, and that would be an impossible 
requirement to impose in any case, since the queries and responses are 
sent via UDP which doesn't guarantee order, it's perfectly clear that 
glibc needs to be prepared to function the same regardless of the order 
in which it receives the responses.


What's more, there's plenty of code in the glibc files I spent hours 
poring over which is clearly an attempt to do exactly that. The people 
who wrote the code just got it wrong. Which isn't surprising, given how 
god-awful the code is.


This is not an either/or situation. The broken nameservers should be 
fixed, /and/ glibc should be fixed to properly handle the case of when 
it sends two queries and gets back one valid response and one server 
error in reverse order.
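
The two positions argued in this thread can be put side by side in a short sketch. This is not glibc's code: `Response`, `merge_responses`, and the `strict` flag are hypothetical names invented for illustration, and the addresses come from the documentation range. The point shown is only that the outcome can be made independent of arrival order under either policy.

```python
from typing import NamedTuple


class Response(NamedTuple):
    qtype: str        # "A" or "AAAA"
    rcode: str        # e.g. "NOERROR", "SERVFAIL", "FORMERR"
    addresses: tuple  # addresses carried by the answer, possibly empty


def merge_responses(responses, strict=False):
    """Combine A and AAAA responses without depending on arrival order.

    strict=False: return every address from a successful response.
    strict=True:  treat any failed response as a failure of the whole lookup.
    """
    # Sort by query type first, so nothing below depends on arrival order.
    ordered = sorted(responses, key=lambda r: r.qtype)
    failed = [r for r in ordered if r.rcode != "NOERROR"]
    if strict and failed:
        raise LookupError(f"{failed[0].qtype} query failed: {failed[0].rcode}")
    addresses = [a for r in ordered if r.rcode == "NOERROR" for a in r.addresses]
    if not addresses:
        raise LookupError("no usable addresses")
    return addresses


a_ok = Response("A", "NOERROR", ("192.0.2.10",))
aaaa_bad = Response("AAAA", "SERVFAIL", ())

# Lenient policy: same answer whichever response arrives first.
print(merge_responses([a_ok, aaaa_bad]))
print(merge_responses([aaaa_bad, a_ok]))
```

With `strict=True` the same two inputs raise `LookupError` in either order, which is the behaviour argued for on the other side of this thread.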

In a nutshell, the getaddrinfo function in glibc sends both A and AAAA
queries to the DNS server at the same time and then deals with the
responses as they come in. Unfortunately, if the responses to the two
queries come back in reverse order, /and/ the first one to come back is
a server failure, both of which are the case when you try to resolve
en.wikipedia.org immediately after restarting your DNS server so nothing
is cached, the glibc code screws up and decides it didn't get back a
successful response even though it did.

There is *nothing* wrong with sending both queries at once.

I didn't say there was. You really don't seem to be paying very good 
attention.


Do you understand what the word /workaround/ means?

Note your fix won't help clients that only ask for AAAA records
because it is the authoritative servers that are broken, not the
resolver library or the recursive server.

I am aware of that. It is irrelevant, because it is not the problem I am 
trying to solve. I, and 99.99% of the users in the world, are /not/ 
"only ask[ing] for AAAA records". Nobody actually trying to use the 
internet for day-to-day work is doing that right now, because to say 
that IPv6 support is not yet ubiquitous would be a laughably momentous 
understatement.


You seem to have a really big chip on your shoulder about people who run 
broken DNS servers. I don't like them any more than you do. But I 
learned "Be generous in what you accept and conservative in what you 
generate" way back when I started playing with the Internet well over 
two decades ago. It holds up now as well as it did back then, and 
there's no good reason why it shouldn't apply in this case.


It's clear that this is a religious issue for you. I'm not here to 
debate religion, I'm here to get help making my DNS work, and to help 
other people, to whatever extent I can, make /their/ DNS work. If you 
continue to send religious screeds on this topic while making no effort 
to actually read and understand what I write, please do not expect me to 
respond further.


  Jonathan Kamens




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Kevin Darcy

On 7/13/2011 2:35 AM, Jonathan Kamens wrote:

On 07/13/2011 02:13 AM, Mark Andrews wrote:

Well, all the prodding from people here prompted me to investigate
further exactly what's going on. The problem isn't what I thought it
was. It appears to be a bug in glibc, and I've filed a bug report and
found a workaround.

There is no bug in glibc.

To be blunt, that's bullshit.

If glibc makes an A query and an AAAA query, and it gets back a valid 
response to the A query and an invalid response to the AAAA query, 
then it should ignore the invalid response to the AAAA query and 
return the valid A response to the user as the IP address for the host.


Please note, furthermore, that as I explained in detail in my bug 
report and in my last message, glibc behaves differently based on the 
/order/ in which the two responses are returned by the DNS server. 
Since there's nothing that says a DNS server has to respond to two 
queries in the order in which they were received, and that would be an 
impossible requirement to impose in any case, since the queries and 
responses are sent via UDP which doesn't guarantee order, it's 
perfectly clear that glibc needs to be prepared to function the same 
regardless of the order in which it receives the responses.

I agree that the order of the A/AAAA responses shouldn't matter to the 
result. The whole getaddrinfo() call should fail regardless of whether 
the failure is seen first or the valid response is seen first. Why? 
Because getaddrinfo() should, if it isn't already, be using the RFC 3484 
algorithm (and/or whatever the successor to RFC 3484 ends up being) to 
sort the addresses, and for that algorithm to work, one needs *both* the 
IPv4 address(es) *and* the IPv6 address(es) available, in order to 
compare their scopes, prefixes, etc. If one of the lookups fails, and 
this failure is presented to the RFC 3484 algorithm as NODATA for a 
particular address family, then the algorithm could make a bad selection 
of the destination address, and this can lead to other sorts of 
breakage, e.g. trying to use a tunneled connection where no tunnel 
exists.  The *safe* thing for glibc to do is to promote the failure of 
either the A lookup or the AAAA lookup to a general lookup failure, 
which prompts the user/administrator to find the source of the problem 
and fix it.


It's rarely a good idea to mask undeniable errors as if there were no 
error at all. It leads to unpredictable behavior and really tough 
troubleshooting challenges. I think glibc is erring on the side of 
openness and transparency here, rather than trying to cover up the fact 
that something is horribly wrong.





Note your fix won't help clients that only ask for AAAA records
because it is the authoritative servers that are broken, not the
resolver library or the recursive server.
I am aware of that. It is irrelevant, because it is not the problem I 
am trying to solve. I, and 99.99% of the users in the world, are 
/not/ "only ask[ing] for AAAA records". Nobody actually trying to use 
the internet for day-to-day work is doing that right now, because to 
say that IPv6 support is not yet ubiquitous would be a laughably 
momentous understatement.

What about clients in a NAT64/DNS64 environment? They could be 
configured as IPv6-only but normally able to access the IPv4 Internet 
just fine. Even with your glibc fix in place, though, they'll 
presumably break if the authoritative nameservers are giving garbage 
responses to AAAA queries (could someone with practical experience in 
DNS64 please confirm this?).


Another possibility you're not considering is that the invoking 
application itself may make independent IPv4-specific and IPv6-specific 
getaddrinfo() lookups. Why would it do this? Why not? Maybe IPv6 
capability is something the user has to buy a separate license for, so 
the IPv6 part is a slightly separate codepath, added in a later version, 
than the base product, which is IPv4-only. When one of the getaddrinfo() 
calls returns address records and the other returns garbage, your fix 
doesn't prevent such an application from doing something unpredictable, 
possibly catastrophic. So it's really not a general solution to the problem.





- Kevin

Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Kevin Darcy

Oh, I should also point out that this brokenness by the 
wikipedia/wikimedia nameservers *isn't* just specific to AAAA queries, 
and therefore *isn't* fixable with getaddrinfo() alone. Try doing an 
MX query of en.wikipedia.org. Or a PTR query. Or any of the other old 
(yet non-deprecated) query types (e.g. NS, TXT, HINFO). The only QTYPEs 
that are answered correctly are A, CNAME and (oddly enough) SOA. So they 
don't even have the excuse of "well, AAAA queries are kinda new, we 
haven't got around to handling them properly yet". This behavior has 
failed to conform to the standard for as long as the standard has 
existed; it's not recent, IPv6-specific breakage.
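
For anyone who wants to repeat the per-QTYPE test without a DNS library, here is a minimal sketch of building a query in RFC 1035 wire format and reading the RCODE out of a reply. The helper names `build_query` and `rcode_of` and the fixed query ID are assumptions made for illustration. The sketch only builds and decodes bytes; it does not contact any server.

```python
import struct

# Numeric QTYPE and RCODE values as assigned for DNS.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
          "HINFO": 13, "MX": 15, "TXT": 16, "AAAA": 28}
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, qtype, query_id=0x1234):
    """Build a one-question DNS query with all flag bits clear (RD off)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE followed by QCLASS=1 (IN).
    question = labels + b"\x00" + struct.pack("!HH", QTYPES[qtype], 1)
    return header + question


def rcode_of(response):
    """Return the RCODE name taken from the low 4 bits of the flags field."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, str(code))


for qtype in ("A", "AAAA", "MX", "TXT"):
    packet = build_query("en.wikipedia.org", qtype)
    print(qtype, len(packet), packet.hex())
```

Sending each packet over UDP to port 53 of an authoritative server and passing the reply to `rcode_of` would show which types come back with an error; that step is left out here because it depends on the network.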


  

RE: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Jonathan Kamens
I agree that the order of the A/AAAA responses shouldn't matter to the
result. The whole getaddrinfo() call should fail regardless of whether the
failure is seen first or the valid response is seen first. Why? Because
getaddrinfo() should, if it isn't already, be using the RFC 3484 algorithm
(and/or whatever the successor to RFC 3484 ends up being) to sort the
addresses, and for that algorithm to work, one needs *both* the IPv4
address(es) *and* the IPv6 address(es) available, in order to compare their
scopes, prefixes, etc.

 

RFC 3484 tells you how to sort addresses you've got.

 

If you've only got one address, then bang! It's already sorted for you. You
don't need RFC 3484 to tell you how to sort it.

 

I have to say that some of the people on this list seem completely detached
from what real users in the real world want their computers to do.

 

If I am trying to connect to a site on the internet, then I want my computer
to do its best to try to connect to the site. I don't want it to throw up
its hands and say, "Oh, I'm sorry, one of my address lookups failed, so I'm
not going to let you use the other address lookup, the one that succeeded,
because some RFC somewhere could be interpreted as implying that's a bad
idea, if I wanted to do so." Please, that's ridiculous.

 

If one of the lookups fails, and this failure is presented to the RFC 3484
algorithm as NODATA for a particular address family, then the algorithm
could make a bad selection of the destination address, and this can lead to
other sorts of breakage, e.g. trying to use a tunneled connection where no
tunnel exists.

 

If the address the client gets doesn't work, then the address doesn't work.
How is being unable to connect because the address turned out to not be
routable different from being unable to connect because the computer refused
to even try?



Another possibility you're not considering is that the invoking application
itself may make independent IPv4-specific and IPv6-specific getaddrinfo()
lookups. Why would it do this? Why not? Maybe IPv6 capability is something
the user has to buy a separate license for, so the IPv6 part is a slightly
separate codepath, added in a later version, than the base product, which is
IPv4-only. When one of the getaddrinfo() calls returns address records and
the other returns garbage, your fix doesn't prevent such an application
from doing something unpredictable, possibly catastrophic. So it's really
not a general solution to the problem.

 

I have no idea what you're talking about. If the application makes
independent IPv4 and IPv6 getaddrinfo() lookups, then the change I'm
proposing to glibc is completely irrelevant and does not impact the existing
functionality in any way. The IPv4 lookup will succeed, the IPv6 lookup will
fail, and the application is then free to decide what to do.
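
As a rough illustration of an application doing its own per-family lookups, here is a sketch using Python's `socket.getaddrinfo`, which wraps the C library call. The function name `lookup_each_family` and the injectable `resolver` parameter are hypothetical and exist only so the example can run without a network. The fake resolver below simulates an IPv6 lookup that fails while the IPv4 lookup succeeds.

```python
import socket


def lookup_each_family(host, port, resolver=socket.getaddrinfo):
    """Query IPv4 and IPv6 separately; return (results, errors by family)."""
    results, errors = [], {}
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            results.extend(resolver(host, port, family, socket.SOCK_STREAM))
        except socket.gaierror as exc:
            # Record the failure and let the caller decide what to do.
            errors[family] = exc
    return results, errors


def fake_resolver(host, port, family, socktype):
    """Stand-in resolver: IPv4 succeeds, IPv6 fails (simulated)."""
    if family == socket.AF_INET6:
        raise socket.gaierror(-2, "simulated lookup failure")
    return [(family, socktype, 6, "", ("192.0.2.10", port))]


results, errors = lookup_each_family("example.invalid", 80, resolver=fake_resolver)
print([entry[4][0] for entry in results])
print({family.name: str(exc) for family, exc in errors.items()})
```

Whether the caller then connects to the addresses it did get or reports the recorded error is exactly the policy question being argued here; the sketch leaves that decision to the application.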

 

In summary, getaddrinfo() with AF_UNSPEC has a very clear meaning - "Give me
whatever addresses you can." The man page says, and I am quoting, "The value
AF_UNSPEC indicates that getaddrinfo() should return socket addresses for
any address family (either IPv4 or IPv6, for example) that can be used with
node and service." I don't see how the language could be any more clear. To
suggest that it's reasonable and correct for it to refuse to return a
successfully fetched address is simply ludicrous.

 

I hope and pray that people who maintain the glibc code have more common
sense about what users want and expect from their software.

 

In the meantime, it's clear that I don't belong on this mailing list, so I'm
out of here.

 

  Jonathan Kamens

 


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-13 Thread Kevin Darcy

On 7/13/2011 2:39 PM, Jonathan Kamens wrote:


I agree that the order of the A/AAAA responses shouldn't matter to the 
result. The whole getaddrinfo() call should fail regardless of whether 
the failure is seen first or the valid response is seen first. Why? 
Because getaddrinfo() should, if it isn't already, be using the RFC 
3484 algorithm (and/or whatever the successor to RFC 3484 ends up 
being) to sort the addresses, and for that algorithm to work, one 
needs *both* the IPv4 address(es) *and* the IPv6 address(es) 
available, in order to compare their scopes, prefixes, etc.


RFC 3484 tells you how to sort addresses you've got.

If you've only got one address, then bang! It's already sorted for 
you. You don't need RFC 3484 to tell you how to sort it.


No, you've got one address, and one unspecified nameserver failure. 
Garbage in, garbage out. To say that a nameserver failure is equivalent 
to NODATA is not only technically incorrect, it leads to all sorts of 
operational problems in the real world.


I have to say that some of the people on this list seem completely 
detached from what real users in the real world want their computers 
to do.


Really? Do you think I'm an academic? Do you think I sit and write 
Internet Drafts and RFCs all day? No, I'm an implementor. I deal with 
DNS operational problems and issues all day, every workday. And I can 
tell you that I don't appreciate library routines making wild-ass 
assumptions that, in the face of some questionable behavior by a 
nameserver, maybe, possibly some quantity of addresses that I've 
acquired from that dodgy nameserver are good enough for my clients to 
try and connect to. No thanks. If there's a real problem I want to know 
about it as clearly and unambiguously as possible. I can't deal 
effectively with a problem if it's being masked by some library routine 
doing something weird behind my back.


If I am trying to connect to a site on the internet, then I want my 
computer to do its best to try to connect to the site. I don't want it 
to throw up its hands and say, "Oh, I'm sorry, one of my address 
lookups failed, so I'm not going to let you use the /other/ address 
lookup, the one that succeeded, because some RFC somewhere could be 
interpreted as implying that's a bad idea, if I wanted to do so." 
Please, that's ridiculous.


No, what's more ridiculous is if users can't get to a site SOME OF THE 
TIME, because someone's DNS is broken, a moronic library routine then 
routes the traffic some unexpected way, and a whole raft of other 
variables enter the picture, without anyone realizing or paying 
attention to the dependencies and interconnectivity that are required to 
keep the client working. There is a certain threshold of brokenness 
where the infrastructure has to throw up its hands, as you put it, and 
say "nuh uh, not gonna happen", because if you try to work around the 
problem without enough information about the topology, the 
environment, the dependencies, etc., you're likely to cause more harm 
than good by making the failure modes way more complex than necessary.


If one of the lookups fails, and this failure is presented to the 
RFC 3484 algorithm as NODATA for a particular address family, then the 
algorithm could make a bad selection of the destination address, and 
this can lead to other sorts of breakage, e.g. trying to use a 
tunneled connection where no tunnel exists.


If the address the client gets doesn't work, then the address doesn't 
work. How is being unable to connect because the address turned out to 
not be routable different from being unable to connect because the 
computer refused to even try?


Because the failure modes are substantially different and it could take 
significant man-hours to determine that the root cause of the problem is 
actually DNS brokenness rather than something else in the network 
infrastructure (routers, switches, VPN concentrators, firewalls, IPSes, 
load-balancers, etc.) or in the client or server (OS, application, 
middleware, etc.)


Have you ever actually troubleshot a difficult connectivity problem in a 
complex networking environment? Trust me, you want clear symptoms, clear 
failure modes. Not a bunch of components making dumb assumptions and/or 
trying to be helpful outside of their defined scope of functionality. 
That kind of help is like offering a glass of water to a drowning man.



Another possibility you're not considering is that the invoking 
application itself may make independent IPv4-specific and 
IPv6-specific getaddrinfo() lookups. Why would it do this? Why not? 
Maybe IPv6 capability is something the user has to buy a separate 
license for, so the IPv6 part is a slightly separate codepath, added 
in a later version, than the base product, which is IPv4-only. When 
one of the getaddrinfo() calls returns address records and the other 
returns garbage, your fix doesn't prevent such an application from 
doing something unpredictable, possibly catastrophic. 

Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-12 Thread Jonathan Kamens
Well, all the prodding from people here prompted me to investigate 
further exactly what's going on. The problem isn't what I thought it 
was. It appears to be a bug in glibc, and I've filed a bug report and 
found a workaround.


In a nutshell, the getaddrinfo function in glibc sends both A and AAAA 
queries to the DNS server at the same time and then deals with the 
responses as they come in. Unfortunately, if the responses to the two 
queries come back in reverse order, /and/ the first one to come back is 
a server failure, both of which are the case when you try to resolve 
en.wikipedia.org immediately after restarting your DNS server so nothing 
is cached, the glibc code screws up and decides it didn't get back a 
successful response even though it did.


If you do the same lookup again, it works, because the CNAME that was 
sent in response to the A query is cached, so both the A and AAAA 
queries get back valid responses from the DNS server. And even if that 
weren't the case, since the CNAME is cached it gets returned first, 
since the server doesn't need to do a query to get it, whereas it does 
need to do another query to get the AAAA record (which recall isn't 
being cached because of the previously discussed FORMERR problem). It'll 
keep working until the cached records time out, at which point it'll 
happen again, and then be OK again until the records time out, etc.


The workaround is to put "options single-request" in /etc/resolv.conf to 
prevent the glibc innards from sending out both the A and AAAA queries 
at the same time.
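
To check whether a machine already has this workaround in place, something like the following sketch could be used. The helper name `has_single_request` and the sample file contents are made up for illustration. It matches the option name exactly, so the separate `single-request-reopen` option is not mistaken for it.

```python
def has_single_request(resolv_conf_text):
    """Return True if an 'options' line lists the single-request option."""
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "options" and "single-request" in fields[1:]:
            return True
    return False


sample = """\
# illustrative contents, not taken from a real system
nameserver 192.0.2.53
options single-request timeout:2
"""
print(has_single_request(sample))
```

On a real system the text would come from reading /etc/resolv.conf, for example `has_single_request(open("/etc/resolv.conf").read())`.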


FYI, here's the glibc bug I filed about this:

http://sourceware.org/bugzilla/show_bug.cgi?id=12994

Thank you for telling me I was full of it and making me dig deeper into 
this until I located the actual cause of the issue. :-)


  jik


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Tony Finch
Jonathan Kamens j...@kamens.us wrote:

 I said above that the problem is exacerbated by the fact that many DNS servers
 don't yet support IPV6 queries. This is because the AAAA queries don't get
 NXDOMAIN responses, which would be cached, but rather FORMERR responses, which
 are not cached. As a result, the scenario described above happens much more
 frequently because the DNS server has to redo the AAAA queries often.

Your upstream resolver is broken if it returns FORMERR responses to AAAA
queries. The behaviour you describe is not normal.

Have a look at bind's filter-aaaa-on-v4 and deny-answer-addresses options
which should allow you to prevent applications from trying to use IPv6. The
latter might also quell queries for IPv6 addresses of name servers (though
I haven't verified that). Also perhaps it'll help to declare all IPv6 name
servers bogus -- server ::/0 { bogus yes; };

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
North Bailey: Variable becoming southeasterly 3 or 4. Slight or moderate.
Fair. Good.


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Eivind Olsen
Jonathan Kamens wrote:

 I said above that the problem is exacerbated by the fact that many DNS
 servers don't yet support IPV6 queries. This is because the AAAA queries
 don't get NXDOMAIN responses, which would be cached, but rather FORMERR
 responses, which are not cached. As a result, the scenario described
 above happens much more frequently because the DNS server has to redo
 the AAAA queries often.

I think the main issue here is - why is your nameserver thinking it has
IPv6 connectivity?
If you don't have a working IPv6 connectivity, do one / both of these:

1) Disable or at least configure IPv6 properly on your server
2) Tell BIND to not use IPv6 transport, typically by starting named with
the command line option -4. How to do that depends on your operating
system / distribution / packaging system etc.

Regards
Eivind Olsen




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Jonathan Kamens

On 7/11/2011 3:10 PM, Tony Finch wrote:

 Jonathan Kamens j...@kamens.us wrote:

  I said above that the problem is exacerbated by the fact that many DNS servers
  don't yet support IPV6 queries. This is because the AAAA queries don't get
  NXDOMAIN responses, which would be cached, but rather FORMERR responses, which
  are not cached. As a result, the scenario described above happens much more
  frequently because the DNS server has to redo the AAAA queries often.

 Your upstream resolver is broken if it returns FORMERR responses to AAAA
 queries. The behaviour you describe is not normal.

There are people reporting all over the net that they're getting tons of 
messages like this in their logs with recent BIND versions:


Jul 11 12:00:06 jik2 named[31354]: error (FORMERR) resolving 'en.wikipedia.org/AAAA/IN': 208.80.152.130#53


I've got 397 of them in my logs for just the last 24 hours.

I'm aware that this means the upstream DNS server is broken; isn't that 
what I said, i.e., that it isn't responding properly to AAAA queries?


The problem is that I have no control over the upstream resolver. All I 
have control over is my own name server.


I am not the only one who is going to encounter this problem. I've found 
several reports of it on the net with a minimal amount of searching. I 
think something more general has to be done than giving me advice about 
what to change in my named.conf. I appreciate the advice for how to fix 
the problem for myself, but I think it needs to be fixed for everyone.


 Have a look at bind's filter-aaaa-on-v4 and deny-answer-addresses options
 which should allow you to prevent applications from trying to use IPv6.

Neither of these options is documented in named.conf(5) or 
resolv.conf(5). Is this a problem that is specific to the Fedora 15 
versions of these man pages, or is the documentation distributed with 
BIND out-of-date?


I tried to use the option and I get "is not configured" in my log when 
named starts up and then "parsing failed", so I think my BIND must not 
be compiled with --enable-filter-aaaa, right? That makes it difficult to 
use this solution. Perhaps that's also why it isn't listed in the man page?


  jik




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Jonathan Kamens

On 7/11/2011 3:26 PM, Eivind Olsen wrote:

I think the main issue here is - why is your nameserver thinking it has
IPv6 connectivity?

No, this isn't the issue.

I see the FORMERR errors in syslog and the timeouts resolving host names 
even when I start named with -4.


Named is querying for AAAA records even when it is started with -4, and 
it is the querying, not the connectivity, that is the issue.


  jik




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Bill Owens
On Mon, Jul 11, 2011 at 02:11:57PM -0400, Jonathan Kamens wrote:
 The number of DNS queries required for each address lookup requested by 
 a client has gone up considerably because of IPV6. The problem is being 
 exacerbated by the fact that many DNS servers on the net don't yet 
 support IPV6 queries. The result is that address lookups are frequently 
 taking so long that the client gives up before getting the result.

I've seen the same thing, and poked around enough to see that the Wikipedia 
name servers are returning the wrong authority info for these and other queries 
(it isn't just AAAA - try TXT, SRV, etc.) Some digging through the archives 
finds this:

https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
 in which the first sentence says it all: "The nameservers for wikipedia.org 
are broken."

And this followup:
https://lists.isc.org/pipermail/bind-users/2011-March/083113.html
 It's PowerDNS 2.9.22 that is breaking this, and it will be fixed by 
 PowerDNS 3.0 once that's released, and we get around to deploying it.

Looks like PowerDNS was in RC2 as of April 19, not released yet. . .

Bill.


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Phil Mayers

On 07/11/2011 07:11 PM, Jonathan Kamens wrote:

  The number of DNS queries required for each address lookup requested
by a client has gone up considerably because of IPV6. The problem is
being exacerbated by the fact that many DNS servers on the net don't yet
support IPV6 queries. The result is that address lookups are frequently
taking so long that the client gives up before getting the result.


Can you be more specific here? Do you mean "many DNS servers don't 
support queries with qtype=AAAA" or "many DNS servers don't support 
queries over IPv6/UDP or IPv6/TCP"?




This is fine when the wikipedia.org nameservers are working, but let's
postulate for the moment that two of them are down, unreachable, or
responding slowly, which apparently happens pretty often. Then we end up
doing:

wikipedia.org DNS
en.wikipedia.org AAAA /times out/
en.wikipedia.org AAAA /times out/
en.wikipedia.org AAAA
en.wikipedia.org A /times out/
en.wikipedia.org A /times out/
en.wikipedia.org A


I don't quite see how you're getting this behaviour.

Every operating system that I know of recommends getaddrinfo or some 
similar variant for doing multiprotocol IPv4/IPv6 lookups, and as far as 
I'm aware, they all do something very similar - namely, send the A and 
AAAA lookups in parallel. When I try this against a bind server, I see 
this makes bind perform the A/AAAA lookups in parallel too. So, at worst 
you should have something like:


0.0001 A query
0.0002 AAAA query
...
1.0000 A query timeout
1.0001 AAAA query timeout

...repeated X+1 times for X non-responding NS records.

That is, the lookups should happen in parallel, so the time taken should 
not double.


If your app is doing its own DNS requests and it's doing them in series, 
then it's broken, for exactly this reason, and should use the system 
resolver.
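
The difference between serial and parallel lookups is simple arithmetic, sketched below. The function `worst_case_wait` is a hypothetical helper, and the model assumes every non-responding server costs one full timeout and that the A and AAAA lookups hit the same dead servers. That makes it an upper bound rather than a description of what named really does.

```python
def worst_case_wait(dead_servers, timeout_s, parallel):
    """Upper bound on the wait before a live server is finally tried.

    dead_servers: number of non-responding NS records tried first
    timeout_s:    per-query timeout in seconds
    parallel:     True if A and AAAA are sent at the same time
    """
    per_qtype = dead_servers * timeout_s
    # In parallel the two query types wait side by side; in series the
    # second type only starts after the first has finished.
    return per_qtype if parallel else 2 * per_qtype


print(worst_case_wait(2, 1.0, parallel=True))   # 2.0
print(worst_case_wait(2, 1.0, parallel=False))  # 4.0
```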




By the end of that sequence, the typical 30-second DNS request
timeout has been exceeded, and the client gives up.

I said above that the problem is exacerbated by the fact that many DNS
servers don't yet support IPV6 queries. This is because the AAAA queries
don't get NXDOMAIN responses, which would be cached, but rather FORMERR


Not in my observations. As Tony has said, you seem to have a broken 
upstream resolver.



I'm interested to hear if other people are encountering this problem and


No, we are not seeing this problem, and we have thousands of 
IPv6-enabled clients making A/AAAA DNS requests constantly. It just 
works (tm).


This is not to say -ve caching of FORMERR is a bad idea; it may well be 
a good idea. But I think there is more going on here than simply a 
failure of -ve caching.



Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Jonathan Kamens

On 7/11/2011 4:06 PM, Bill Owens wrote:

 https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
  in which the first sentence says it all: "The nameservers for wikipedia.org
 are broken."

It's not just wikipedia.org that's broken, obviously. I see this error 
in my logs for 19 domains since July 3:


Even if PowerDNS is the only source of this issue, and even if the new 
version of PowerDNS is released tomorrow, I'm sure there will still be 
sites running the old version a year from now. So just relying on a 
PowerDNS release to fix this problem seems unwise.


Users are experiencing this problem /now/ in the field, and more users 
will be experiencing it as BIND is upgraded in more and more places. 
Every single user relying on a Fedora 15 DNS server, for example, is 
going to see occasional unnecessary DNS timeouts when trying to resolve 
host names.


It seems clear to me that a generally available, generally applicable 
fix to BIND is needed to avoid this issue and perhaps similar issues 
like it.


  jik




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Kevin Darcy

On 7/11/2011 2:11 PM, Jonathan Kamens wrote:
The number of DNS queries required for each address lookup requested 
by a client has gone up considerably because of IPV6. The problem is 
being exacerbated by the fact that many DNS servers on the net don't 
yet support IPV6 queries. The result is that address lookups are 
frequently taking so long that the client gives up before getting the 
result.


The example I am seeing this with most frequently is my RSS feed 
reader, rss2email, trying to read a feed from en.wikipedia.org in a 
cron job that runs every 15 minutes. I am regularly seeing this in the 
output of the cron job:


W: Name or service not known [8]

http://en.wikipedia.org/w/index.php?title=/[elided]/feed=atomaction=history

The wikipedia.org domain has three DNS servers. Let's assume that the 
root and org. nameservers are cached already when rss2email does its 
query. If so, then it has to do the following queries:


wikipedia.org DNS
en.wikipedia.org AAAA
en.wikipedia.org A

This is fine when the wikipedia.org nameservers are working, but let's 
postulate for the moment that two of them are down, unreachable, or 
responding slowly, which apparently happens pretty often. Then we end 
up doing:


wikipedia.org DNS
en.wikipedia.org AAAA /times out/
en.wikipedia.org AAAA /times out/
en.wikipedia.org AAAA
en.wikipedia.org A /times out/
en.wikipedia.org A /times out/
en.wikipedia.org A

By the end of that sequence, the typical 30-second DNS request 
timeout has been exceeded, and the client gives up.

The math isn't working. I just ran a quick test and named (9.7.x) failed 
over from a non-working delegated NS to a working delegated NS in less 
than 30 milliseconds. How are you reaching a 30-*second* timeout 
threshold in only 6 queries?


In practice, it would also be quite unlikely that named would pick 
dead nameservers before live ones for *both* the AAAA and the A query. 
At the very least, once the timeouts were encountered for the AAAA 
query, those NSes would be penalized in terms of NS selection, so they 
are unlikely to be chosen *again*, ahead of the working NS, for the A 
query. Any en.wikipedia.org NSes which were found to be *persistently* 
broken, would gravitate to the bottom of the selection list, and be 
tried approximately never.


I think maybe you need to probe deeper and find out what _else_ is going on.



- Kevin





Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Tim Maestas
I'm unclear how BIND could be modified to fix this.  The querying
client machines are asking BIND for AAAA records.  BIND goes out to
the authoritative nameservers to attempt to resolve said AAAA records.
The broken nameservers (PowerDNS 2.9.22 etc) time out or otherwise hand
out bad responses (FORMERR, NXDOMAIN).  What would BIND do differently
to avoid this?

Even if BIND was modified, why would the responsibility fall on all
BIND administrators to implement this hack as opposed to the onus
being on the owners of the broken nameservers to upgrade their broken
authoritative servers?

-Tim


On Mon, Jul 11, 2011 at 1:25 PM, Jonathan Kamens j...@kamens.us wrote:
 On 7/11/2011 4:06 PM, Bill Owens wrote:

 https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
  in which the first sentence says it all: The nameservers for wikipedia.org
 are broken.

 It's not just wikipedia.org that's broken, obviously. I see this error in my
 logs for 19 domains since July 3:

 Even if PowerDNS is the only source of this issue, and even if the new
 version of PowerDNS is released tomorrow, I'm sure there will still be sites
 running the old version a year from now. So just relying on a PowerDNS
 release to fix this problem seems unwise.

 Users are experiencing this problem now in the field, and more users will be
 experiencing it as BIND is upgraded in more and more places. Every single
 user relying on a Fedora 15 DNS server, for example, is going to see
 occasional unnecessary DNS timeouts when trying to resolve host names.

 It seems clear to me that a generally available, generally applicable fix to
 BIND is needed to avoid this issue and perhaps similar issues like it.

   jik




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Chuck Swiger
On Jul 11, 2011, at 1:25 PM, Jonathan Kamens wrote:
 Even if PowerDNS is the only source of this issue, and even if the new 
 version of PowerDNS is released tomorrow, I'm sure there will still be sites 
 running the old version a year from now. So just relying on a PowerDNS 
 release to fix this problem seems unwise.

OK, but this same reasoning applies to making a change to BIND: even if we had 
such a change available tomorrow, there will be sites running older versions of 
BIND a year from now, also.  :-)

 Users are experiencing this problem now in the field, and more users will be 
 experiencing it as BIND is upgraded in more and more places. Every single 
 user relying on a Fedora 15 DNS server, for example, is going to see 
 occasional unnecessary DNS timeouts when trying to resolve host names.
 
 It seems clear to me that a generally available, generally applicable fix to 
 BIND is needed to avoid this issue and perhaps similar issues like it.

What you probably want is a change to your local implementation of 
getaddrinfo() for your libc / glibc so that it prefers to issue T_A queries 
before it issues T_AAAA queries, and will only issue T_AAAA queries if IPv6 
networking is compiled into the system.

In my experience, not only does this significantly help resolver performance in 
the face of nameservers which break when facing IPv6 AAAA queries, it is a 
solution which many people ignore.

  http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=42405
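
Short of changing libc, an application can get a similar effect by asking getaddrinfo() for IPv4 results only. The sketch below is an application-level analogue, not the libc change suggested above. The helper name `resolve_ipv4_only` is made up, and whether AF_INET suppresses the AAAA query on the wire depends on the platform's resolver.

```python
import socket


def resolve_ipv4_only(host, port):
    """Return the IPv4 addresses for host, asking only for AF_INET results."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return [sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in infos]


# A numeric literal needs no DNS traffic, so this runs anywhere.
print(resolve_ipv4_only("127.0.0.1", 80))
```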

Regards,
-- 
-Chuck



Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Bill Owens
On Mon, Jul 11, 2011 at 04:25:59PM -0400, Jonathan Kamens wrote:
 On 7/11/2011 4:06 PM, Bill Owens wrote:
 https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
   in which the first sentence says it all: The nameservers for 
   wikipedia.org are broken.
 It's not just wikipedia.org that's broken, obviously. I see this error 
 in my logs for 19 domains since July 3:

I have FORMERR entries in my logs for 79 names since June 19, a total of 5185 
error messages. 2247 of those are for wikipedia-related names. Spot-checking 
shows that the others appear to be unrelated issues; mostly bizarre-looking 
misconfigurations. 
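
Counts like these can be pulled from syslog with a few lines of script. The sketch below assumes the log format quoted earlier in the thread (error (FORMERR) resolving 'name/type/class': server#port); the regular expression and the function name `count_formerr` are illustrative and may need adjusting for other BIND versions.

```python
import re
from collections import Counter

# Matches e.g.:
#   error (FORMERR) resolving 'en.wikipedia.org/AAAA/IN': 208.80.152.130#53
LINE_RE = re.compile(
    r"error \(FORMERR\) resolving '([^'/]+)/([^/]+)/([^']+)': (\S+)"
)


def count_formerr(lines):
    """Count FORMERR log entries per queried name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Jul 11 12:00:06 jik2 named[31354]: error (FORMERR) resolving "
    "'en.wikipedia.org/AAAA/IN': 208.80.152.130#53",
]
print(count_formerr(sample).most_common())
```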

 Even if PowerDNS is the only source of this issue, and even if the new 
 version of PowerDNS is released tomorrow, I'm sure there will still be 
 sites running the old version a year from now. So just relying on a 
 PowerDNS release to fix this problem seems unwise.

A fix to the PowerDNS problem won't remove all the FORMERR messages, but a 
fixed version running the wikipedia-related domains would repair your original 
problem, and that seems like a reasonable thing to expect. More reasonable than 
asking BIND to ignore incorrect responses, IMO. . .

Bill.


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Mark Andrews

In message 4e1b562b.2070...@kamens.us, Jonathan Kamens writes:
 
 On 7/11/2011 3:26 PM, Eivind Olsen wrote:
  I think the main issue here is - why is your nameserver thinking it has
  IPv6 connectivity?
 No, this isn't the issue.
 
 I see the FORMERR errors in syslog and the timeouts resolving host names
 even when I start named with -4.

-4 and -6 affect what transport is used.  They have no impact on data.

 Named is querying for AAAA records even when it is started with -4, and
 it is the querying, not the connectivity, that is the issue.
 
jik
 
 

Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Mark Andrews

In message 4e1b5c57.8090...@kamens.us, Jonathan Kamens writes:
 On 7/11/2011 4:06 PM, Bill Owens wrote:
  https://lists.isc.org/pipermail/bind-users/2011-March/083109.html
  in which the first sentence says it all: "The nameservers for
  wikipedia.org are broken."
 It's not just wikipedia.org that's broken, obviously. I see this error
 in my logs for 19 domains since July 3:

Well, you haven't been looking at your logs or you upgraded to a version
which logs the condition.
 
 Even if PowerDNS is the only source of this issue, and even if the new
 version of PowerDNS is released tomorrow, I'm sure there will still be
 sites running the old version a year from now. So just relying on a
 PowerDNS release to fix this problem seems unwise.

Sure, but it is a minor issue overall.  FORMERR is a lot better
than what used to happen.  Nameservers used to drop AAAA queries
so you got timeouts when all the nameservers were working instead
of when some are working.

 Users are experiencing this problem /now/ in the field, and more users
 will be experiencing it as BIND is upgraded in more and more places.
 Every single user relying on a Fedora 15 DNS server, for example, is
 going to see occasional unnecessary DNS timeouts when trying to resolve
 host names.

Well complain to the owners of those zones.  You have logs that tell you
which nameservers are broken.

 It seems clear to me that a generally available, generally applicable
 fix to BIND is needed to avoid this issue and perhaps similar issues
 like it.

The DNS has multiple nameservers so that when one is down you can
ask another and be able to cache the answer.  Here none of the
nameservers are giving answers that can be cached.  FORMERR, NOTIMP,
REFUSED/timeout are per server not per query tuple QNAME/QTYPE/QCLASS.
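
The distinction between per-server failures and per-question answers can be sketched as follows. This is a toy illustration only, not BIND's data structures; the class `ToyResolverCache` and its fields are invented for the example.

```python
PER_SERVER_FAILURES = {"FORMERR", "NOTIMP", "REFUSED", "TIMEOUT"}


class ToyResolverCache:
    """Toy cache: answers are keyed by question, failures by server."""

    def __init__(self):
        self.answers = {}           # (qname, qtype, qclass) -> answer
        self.bad_servers = set()    # servers that failed; says nothing
                                    # about the question itself

    def record(self, server, question, outcome, answer=None):
        if outcome in PER_SERVER_FAILURES:
            # The failure describes this server only, so nothing is
            # stored for the question and it must be asked again.
            self.bad_servers.add(server)
        elif outcome == "NOERROR":
            self.answers[question] = answer

    def lookup(self, question):
        return self.answers.get(question)


cache = ToyResolverCache()
q = ("en.wikipedia.org.", "AAAA", "IN")
cache.record("208.80.152.130", q, "FORMERR")
print(cache.lookup(q))  # None: nothing cacheable was learned
```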

jik
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Doug Barton
On 07/11/2011 11:11, Jonathan Kamens wrote:
 The number of DNS queries required for each address lookup requested by
 a client has gone up considerably because of IPV6. The problem is being
 exacerbated by the fact that many DNS servers on the net don't yet
 support IPV6 queries.

I have to disagree with your premise here. It's true that DNS software
has a notoriously long deprecation cycle, but AAAA records have been
around for long enough that it's highly unlikely there are enough name
servers that don't handle them to make a noticeable difference. And even
if you can find one, it should be upgraded for a vast array of other
reasons.

 The result is that address lookups are frequently
 taking so long that the client gives up before getting the result.

It sounds to me like you don't have IPv6 connectivity. If so, you've
already been given the advice to configure your OS to avoid asking for
AAAA at all, or at least to ask for A first. Heed this advice.

 The example I am seeing this with most frequently is my RSS feed reader,
 rss2email, trying to read a feed from en.wikipedia.org in a cron job
 that runs every 15 minutes. I am regularly seeing this in the output of
 the cron job:
 
 W: Name or service not known [8]
 
 http://en.wikipedia.org/w/index.php?title=/[elided]/feed=atomaction=history
 
 The wikipedia.org domain has three DNS servers. Let's assume that the
 root and org. nameservers are cached already when rss2email does its
 query. If so, then it has to do the following queries:
 
 wikipedia.org DNS
 en.wikipedia.org AAAA
 en.wikipedia.org A
 
 This is fine when the wikipedia.org nameservers are working, but let's
 postulate for the moment that two of them are down, unreachable, or
 responding slowly, which apparently happens pretty often. Then we end up
 doing:
 
 wikipedia.org DNS
 en.wikipedia.org AAAA /times out/
 en.wikipedia.org AAAA /times out/
 en.wikipedia.org AAAA
 en.wikipedia.org A /times out/
 en.wikipedia.org A /times out/
 en.wikipedia.org A
 
 By the end of that sequence, the typical 30-second DNS request
 timeout has been exceeded, and the client gives up.

See above. YOU need to configure your software to not ask for AAAA, or
to ask for A first.

 I said above that the problem is exacerbated by the fact that many DNS
 servers don't yet support IPV6 queries. This is because the AAAA queries
 don't get NXDOMAIN responses, which would be cached, but rather FORMERR
 responses, which are not cached. As a result, the scenario described
 above happens much more frequently because the DNS server has to redo
 the AAAA queries often.

Can you provide examples of specific name servers, on the network now,
that respond this way? The authoritative name servers for wikipedia.org
respond correctly (NOERROR/ANSWER=0) to AAAA queries for
en.wikipedia.org. If you are seeing a FORMERR response to these queries
the problem lies somewhere in your resolution chain.

Before mitigating steps in correctly functioning software are
considered, there needs to be substantial evidence that enough really,
really old name servers that behave the way you describe are still on
line to make the effort worthwhile.


hth,

Doug

-- 

Nothin' ever doesn't change, but nothin' changes much.
-- OK Go

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Michael Sinatra




 Users are experiencing this problem now in the field, and more users will
 be experiencing it as BIND is upgraded in more and more places. Every
 single user relying on a Fedora 15 DNS server, for example, is going to
 see occasional unnecessary DNS timeouts when trying to resolve host names.

 It seems clear to me that a generally available, generally applicable fix
 to BIND is needed to avoid this issue and perhaps similar issues like it.

What is the fix you want?  Negative caching of FORMERR responses?  That 
won't work in the wikipedia case, since the (incorrect) SOA minimum is 
only 10 minutes, and your cron job runs every 15 minutes.
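
The timing argument is easy to check with a one-line comparison. The helper below is a hypothetical sketch that assumes the negative entry was cached at the previous cron run and that nothing else queries the name in between.

```python
def served_from_negative_cache(negative_ttl_s, interval_s):
    """True if a lookup repeated every interval_s still finds the entry cached."""
    return interval_s < negative_ttl_s


# SOA minimum of 10 minutes against a cron job every 15 minutes:
print(served_from_negative_cache(600, 900))  # False: the entry has expired
```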


There are millions of broken domains out there.  Asking BIND to install 
kludges to pave over them is probably not the best way to go.


michael

PS. BTW, it would be incorrect to state that queries for non-existent AAAA 
records for a domain name for which other records exist (e.g. CNAME or A) 
should get an NXDOMAIN response.  They absolutely should not.  They should 
get an empty answer with a NOERROR RCODE.  NXDOMAIN means that there are 
no dns records whatsoever that have the domain name en.wikipedia.org, 
which is certainly not the case.
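
The NXDOMAIN versus empty-NOERROR distinction can be shown with a toy lookup. This is a deliberately simplified sketch: the zone dictionary and the function `answer_status` are invented for the example, the name `nosuch.wikipedia.org.` is hypothetical, and CNAME chasing, wildcards and empty non-terminals are ignored.

```python
def answer_status(zone, qname, qtype):
    """Return (rcode, records) for a query against a toy zone.

    zone maps owner name -> {record type -> list of record data}.
    """
    records = zone.get(qname)
    if records is None:
        # No records of any type exist at this name.
        return ("NXDOMAIN", [])
    # The name exists; a missing type gives NOERROR with an empty answer.
    return ("NOERROR", records.get(qtype, []))


zone = {"en.wikipedia.org.": {"CNAME": ["text.wikimedia.org."]}}
print(answer_status(zone, "en.wikipedia.org.", "AAAA"))
print(answer_status(zone, "nosuch.wikipedia.org.", "AAAA"))
```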




Re: Clients get DNS timeouts because ipv6 means more queries for each lookup

2011-07-11 Thread Mark Andrews

Wikipedia have been told multiple times that their nameservers are
broken, that they fail to add the CNAME records, as required by RFC
1034, which results in garbage answers being returned.  Those garbage
answers result in the FORMERR log messages.

Both of the answers below should have CNAME chains in them but only
the A query has them.

Now luckily this doesn't affect every AAAA lookup as the CNAME
records returned from the A lookup are cached, so every hour the
recursive nameserver needs to go through this dance.  Asking for A
before AAAA just hides the problem by priming the cache.

Mark

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23606
;; flags: qr aa; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;en.wikipedia.org.  IN  A

;; ANSWER SECTION:
en.wikipedia.org.         3600  IN  CNAME   text.wikimedia.org.
text.wikimedia.org.       600   IN  CNAME   text.pmtpa.wikimedia.org.
text.pmtpa.wikimedia.org. 3600  IN  A       208.80.152.2

;; Query time: 411 msec
;; SERVER: 91.198.174.4#53(ns2.wikimedia.org)
;; WHEN: Tue Jul 12 12:02:06 2011

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23260
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;en.wikipedia.org.  IN  AAAA

;; AUTHORITY SECTION:
wikimedia.org.  86400   IN  SOA ns0.wikimedia.org. hostmaster.wikimedia.org. 2011071119 43200 7200 1209600 600

;; Query time: 306 msec
;; SERVER: 208.80.152.142#53(ns1.wikimedia.org)
;; WHEN: Tue Jul 12 12:00:58 2011
;; MSG SIZE  rcvd: 108
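
The difference between the two answers above can be checked mechanically. The following is only a sketch in Python with invented helper names: it follows the CNAME chain present in an answer section and tests whether the SOA owner in the authority section covers the name the chain ends at. The record tuples are transcribed from the two responses above.

```python
def cname_chain(qname, answer):
    """Follow CNAME records in an answer section, starting at qname."""
    cnames = {owner.lower(): target.lower()
              for owner, rtype, target in answer if rtype == "CNAME"}
    chain = [qname.lower()]
    # Stop at the end of the chain, or if a name repeats (CNAME loop).
    while chain[-1] in cnames and cnames[chain[-1]] not in chain:
        chain.append(cnames[chain[-1]])
    return chain


def in_zone(name, zone):
    """True if name is the zone apex or lies below it."""
    return name == zone or name.endswith("." + zone)


a_answer = [
    ("en.wikipedia.org.", "CNAME", "text.wikimedia.org."),
    ("text.wikimedia.org.", "CNAME", "text.pmtpa.wikimedia.org."),
    ("text.pmtpa.wikimedia.org.", "A", "208.80.152.2"),
]
aaaa_answer = []  # the second response carried no CNAMEs at all
soa_owner = "wikimedia.org."

good = cname_chain("en.wikipedia.org.", a_answer)
bad = cname_chain("en.wikipedia.org.", aaaa_answer)

# With the chain, the SOA matches the final name; without it, the SOA
# belongs to a zone unrelated to the name that was asked about.
print(in_zone(good[-1], soa_owner), in_zone(bad[-1], soa_owner))
```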


In message 4e1b9222.8090...@dougbarton.us, Doug Barton writes:
 On 07/11/2011 11:11, Jonathan Kamens wrote:
  The number of DNS queries required for each address lookup requested by
  a client has gone up considerably because of IPV6. The problem is being
  exacerbated by the fact that many DNS servers on the net don't yet
  support IPV6 queries.
 
 I have to disagree with your premise here. It's true that DNS software
 has a notoriously long deprecation cycle, but AAAA records have been
 around for long enough that it's highly unlikely there are enough name
 servers that don't handle them to make a noticeable difference. And even
 if you can find one, it should be upgraded for a vast array of other
 reasons.
 
  The result is that address lookups are frequently
  taking so long that the client gives up before getting the result.
 
 It sounds to me like you don't have IPv6 connectivity. If so, you've
 already been given the advice to configure your OS to avoid asking for
 AAAA at all, or at least to ask for A first. Heed this advice.
 
  The example I am seeing this with most frequently is my RSS feed reader,
  rss2email, trying to read a feed from en.wikipedia.org in a cron job
  that runs every 15 minutes. I am regularly seeing this in the output of
  the cron job:
  
  W: Name or service not known [8]
  http://en.wikipedia.org/w/index.php?title=/[elided]/&feed=atom&action=history
  
  The wikipedia.org domain has three DNS servers. Let's assume that the
  root and org. nameservers are cached already when rss2email does its
  query. If so, then it has to do the following queries:
  
  wikipedia.org DNS
  en.wikipedia.org AAAA
  en.wikipedia.org A
  
  This is fine when the wikipedia.org nameservers are working, but let's
  postulate for the moment that two of them are down, unreachable, or
  responding slowly, which apparently happens pretty often. Then we end up
  doing:
  
  wikipedia.org DNS
  en.wikipedia.org AAAA /times out/
  en.wikipedia.org AAAA /times out/
  en.wikipedia.org AAAA
  en.wikipedia.org A /times out/
  en.wikipedia.org A /times out/
  en.wikipedia.org A
  
  By the end of that sequence, the typical 30-second DNS request
  timeout has been exceeded, and the client gives up.
 
 See above. YOU need to configure your software to not ask for AAAA, or
 to ask for A first.
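
One way a client program might follow this advice, sketched in Python: passing family=socket.AF_INET to socket.getaddrinfo restricts the result to IPv4 addresses, so the lookup has no need for AAAA records. The helper name is hypothetical, whether the underlying resolver library really skips the AAAA query is platform dependent, and this is not a description of how rss2email works.

```python
import socket


def ipv4_only_addresses(host, port=80):
    """Resolve host to IPv4 addresses only (no IPv6 results requested)."""
    infos = socket.getaddrinfo(host, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


# A numeric address needs no DNS query, so this works offline.
print(ipv4_only_addresses("127.0.0.1"))
```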
 
  I said above that the problem is exacerbated by the fact that many DNS
  servers don't yet support IPV6 queries. This is because the AAAA queries
  don't get NXDOMAIN responses, which would be cached, but rather FORMERR
  responses, which are not cached. As a result, the scenario described
  above happens much more frequently because the DNS server has to redo
  the AAAA queries often.
 
 Can you provide examples of specific name servers, on the network now,
 that respond this way? The authoritative name servers for wikipedia.org
 respond correctly (NOERROR/ANSWER=0) to AAAA queries for
 en.wikipedia.org. If you are seeing a FORMERR response to these queries
 the problem lies somewhere in your resolution chain.
 
 Before mitigating steps in correctly functioning software are
 considered, there needs to be substantial evidence that enough really,
 really old name servers that behave the way you describe are still
 online to make the effort worthwhile.
 
 
 hth,
 
 Doug