Re: resolv.conf question / timeout behaviour
On 3/31/21 10:00 AM, Tony Finch wrote:
> Because of this, if it's important for you to avoid multi-second DNS
> lookup times ... you need to design your system so that the libc
> resolver never tries to talk to a DNS server that isn't available.

I've seen various client OSs fail in really weird ways when the first DNS server in the list doesn't respond quickly enough, much less never.

> Another way is a high availability setup for your recursive servers.

+1 to something like VRRP / CARP / routing tricks to make sure that the Virtual / Service IP that clients use as the first DNS server is always available. Even if the first and second IP are on the same system for a few minutes while the other is patched.

--
Grant. . . . unix || die

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: resolv.conf question / timeout behaviour
Tom Preissler wrote:
>
> at my work place we have a three resolver setup in /etc/resolv.conf.
>
> We had sometimes, though rarely, response times for DNS like 14000ms,
> due to the fact that the *first* listed resolver is down for maintenance
> reasons.

Sadly the traditional unix stub resolver behaves REALLY BADLY if any of its servers are unavailable. It does not keep enough information about server performance and isn't really designed to be able to do that. The resolv.conf tuning options are too coarse to help in any meaningful way.

Because of this, if it's important for you to avoid multi-second DNS lookup times (and it usually is!), you need to design your system so that the libc resolver never tries to talk to a DNS server that isn't available.

As Matus Uhlar said, one way is to run a resolver daemon (e.g. BIND configured to forward to your recursive servers) on each machine. Resolver daemons are better able to keep track of which server is up, and they are less likely to be unavailable when the client software needs them since they are on the same machine. Most operating systems have resolver daemons now; it's basically only oldskool unix that needs extra setup.

Another way is a high availability setup for your recursive servers. I use keepalived (my servers are on a resilient layer 2 network that spans multiple locations); or you can use anycast if you need to do failover at layer 3.

Of course, you can do both :-)

Tony.
--
f.anthony.n.finch https://dotat.at/
Faeroes: North backing west 5 or 6, decreasing 3 or 4 for a time. Moderate or rough. Fair. Good.

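As a rough illustration of the "resolver daemon on each machine" approach described above, a forward-only BIND configuration might look something like the fragment below. This is a minimal sketch, not a tested production setup: the forwarder addresses 192.0.2.1, 192.0.2.2 and 192.0.2.3 are placeholders from the documentation address range and stand in for your own recursive servers, and the file locations depend on your distribution.

```
// named.conf fragment (sketch): local forwarding resolver on loopback.
options {
    listen-on { 127.0.0.1; };      // serve only this machine
    allow-query { localhost; };
    recursion yes;
    // Placeholder addresses: replace with your real recursive servers.
    forwarders { 192.0.2.1; 192.0.2.2; 192.0.2.3; };
    forward only;                  // never recurse directly from this host
};
```

With something like this in place, /etc/resolv.conf would list only `nameserver 127.0.0.1`, so the libc stub resolver always talks to a server on the same machine and the daemon decides which upstream to use.
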
Re: resolv.conf question / timeout behaviour
On 31.03.21 10:56, Tom Preissler via bind-users wrote:
> at my work place we have a three resolver setup in /etc/resolv.conf.

resolv.conf is not a BIND thing, it's configuration of system libraries.

> We had sometimes, though rarely, response times for DNS like 14000ms,
> due to the fact that the *first* listed resolver is down for maintenance
> reasons. The application we test this with is Oracle/TNSPing.

If this is an issue, you can run a local caching DNS server like BIND or dnsmasq. They can handle such timeouts better than most libraries.

> As a mitigation we therefore put in timeout:1, but we just recently got
> again a TNSPing response of 9000ms.
>
> I noticed in man resolv.conf this section on "timeout":
>
> timeout:n
>     Sets the amount of time the resolver will wait for a
>     response from a remote name server before retrying the
>     query via a different name server.
>     |This may not be the total time taken by any
>     |resolver API call and there is no guarantee that a
>     |single resolver API call maps to a single timeout.
>     Measured in seconds, the default is RES_TIMEOUT
>     (currently 5, see ). The value for this option is
>     silently capped to 30.
>
> I am intrigued by the above sentence marked with "|". Does anybody know
> what that means in detail, can anybody explain that please?
>
> I explained the reason for the 9000ms so that Oracle and its many
> processes all come together to resolve the DNS name and they *keep
> hitting* the first resolver - and "timeout" can't kick in due to
> parallel requests from different processes, hence the high overall
> response time.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam = (S)tupid (P)eople's (A)dvertising (M)ethod

resolv.conf question / timeout behaviour
Hi,

at my work place we have a three resolver setup in /etc/resolv.conf.

We had sometimes, though rarely, response times for DNS like 14000ms, due to the fact that the *first* listed resolver is down for maintenance reasons. The application we test this with is Oracle/TNSPing.

As a mitigation we therefore put in timeout:1, but we just recently got again a TNSPing response of 9000ms.

I noticed in man resolv.conf this section on "timeout":

timeout:n
    Sets the amount of time the resolver will wait for a
    response from a remote name server before retrying the
    query via a different name server.
    |This may not be the total time taken by any
    |resolver API call and there is no guarantee that a
    |single resolver API call maps to a single timeout.
    Measured in seconds, the default is RES_TIMEOUT
    (currently 5, see ). The value for this option is
    silently capped to 30.

I am intrigued by the above sentence marked with "|". Does anybody know what that means in detail, can anybody explain that please?

My explanation for the 9000ms is that Oracle and its many processes all come together to resolve the DNS name and they *keep hitting* the first resolver - and "timeout" can't kick in due to parallel requests from different processes, hence the high overall response time.

Kind Regards
Thomas Preissler

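One way to read the man page sentence marked with "|" is that a single resolver API call can issue several queries one after another (for example one per search-list candidate), and each of those queries starts again at the first listed name server. The sketch below models only that effect. It is a deliberately simplified model, not the actual libc algorithm: the function name and parameters are made up for illustration, and it ignores retry backoff, the attempts option and anything application-specific such as Oracle's own behaviour.

```python
def stub_lookup_delay(timeout_s: int, dead_leading_servers: int,
                      sequential_queries: int) -> int:
    """Seconds spent waiting on unresponsive servers (simplified model).

    Assumption: every query starts at the first nameserver and waits
    timeout_s on each unresponsive server before reaching a live one,
    and the API call runs sequential_queries queries back to back.
    """
    per_query_wait = timeout_s * dead_leading_servers
    return per_query_wait * sequential_queries


if __name__ == "__main__":
    # Compare the default timeout (5) with timeout:1, first server down.
    for timeout_s in (5, 1):
        for queries in (1, 3):
            delay = stub_lookup_delay(timeout_s, 1, queries)
            print(f"timeout:{timeout_s} queries={queries} -> {delay}s waiting")
```

Under this model, lowering timeout shrinks the wait per query but does not stop each new query from paying it again, which is why keeping an unavailable server out of the first slot matters more than tuning the option.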