Re: resolv.conf question / timeout behaviour

2021-03-31 Thread Grant Taylor via bind-users

On 3/31/21 10:00 AM, Tony Finch wrote:
Because of this, if it's important for you to avoid multi-second 
DNS lookup times ... you need to design your system so that the libc 
resolver never tries to talk to a DNS server that isn't available.


I've seen various client OSs fail in really weird ways when the first 
DNS server in the list doesn't respond quick enough, much less never.



Another way is a high availability setup for your recursive servers.


+1 to something like VRRP / CARP / routing tricks to make sure that the 
Virtual / Service IP that client's use as the first DNS server is always 
available.  Even if the first and second IP are on the same system for a 
few minutes while the other is patched.




--
Grant. . . .
unix || die



smime.p7s
Description: S/MIME Cryptographic Signature
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: resolv.conf question / timeout behaviour

2021-03-31 Thread Tony Finch
Tom Preissler  wrote:
>
> at my work place we have a three resolver setup in /etc/resolv.conf.
>
> We had sometimes, though rarely, response times for DNS like 14000ms,
> due to the fact that the *first* listed resolver is down for maintenance
> reasons.

Sadly the traditional unix stub resolver behaves REALLY BADLY if any of
its servers are unavailable. It does not keep enough information about
server performance and isn't really designed to be able to do that. The
resolv.conf tuning options are too coarse to help in any meaningful way.

Because of this, if it's important for you to avoid multi-second DNS
lookup times (and it usually is!), you need to design your system so that
the libc resolver never tries to talk to a DNS server that isn't
available.

As Matus Uhlar said, one way is to run a resolver daemon (e.g. BIND
configured to forward to your recursive servers) on each machine. Resolver
daemons are better able to keep track of which server is up, and they are
less likely to be unavailable when the client software needs them since
they are on the same machine. Most operating systems have resolver daemons
now; it's bascially only oldskool unix that needs extra setup.

Another way is a high availability setup for your recursive servers. I use
keepalived (my servers are on a resilient layer 2 network that spans
multiple locations); or you can use anycast if you need to do failover at
layer 3.

Of course, you can do both :-)

Tony.
-- 
f.anthony.n.finchhttps://dotat.at/
Faeroes: North backing west 5 or 6, decreasing 3 or 4 for a time.
Moderate or rough. Fair. Good.

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: resolv.conf question / timeout behaviour

2021-03-31 Thread Matus UHLAR - fantomas

On 31.03.21 10:56, Tom Preissler via bind-users wrote:

at my work place we have a three resolver setup in /etc/resolv.conf.


resolv.conf is not a BIND thing, it's configuration of system libraries. 


We had sometimes, though rarely, response times for DNS like 14000ms,
due to the fact that the *first* listed resolver is down for maintenance
reasons. The application we test this with is Oracle/TNSPing.


if this is an issue, you can run local caching DNS server like BIND or
dnsmasq. They can handle such timeouts better than most libraries.


As a mitigation we therefore put in timeout:1, but we just recently got
again a TNSPing response of 9000ms.

I noticed in man resolv.conf this section on "timeout":

 timeout:n
Sets the amount of time the resolver will wait for
a response from a remote name server before
retrying the query via a different name server.
|This may not be the total time taken by any
|resolver API call and there is no guarantee that a
|single resolver API call maps to a single timeout.
Measured in seconds, the default is RES_TIMEOUT
(currently 5, see ).  The value for this
option is silently capped to 30.

I am intrigued by the above sentence marked with "|". Does anybody
know what that means in detail, can anybody explain that please?

I explained the reason for the 9000ms so that Oracle and its many processes
all come together to resolve the DNS name and they *keep hitting* the first
resolver - and "timeout" can't kick in due to parallel requests from different
processes, hence the high overall response time.



--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Spam = (S)tupid (P)eople's (A)dvertising (M)ethod
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


resolv.conf question / timeout behaviour

2021-03-31 Thread Tom Preissler via bind-users
Hi,

at my work place we have a three resolver setup in /etc/resolv.conf.

We had sometimes, though rarely, response times for DNS like 14000ms,
due to the fact that the *first* listed resolver is down for maintenance
reasons. The application we test this with is Oracle/TNSPing.
As a mitigation we therefore put in timeout:1, but we just recently got
again a TNSPing response of 9000ms.

I noticed in man resolv.conf this section on "timeout":

  timeout:n
 Sets the amount of time the resolver will wait for
 a response from a remote name server before
 retrying the query via a different name server.
|This may not be the total time taken by any
|resolver API call and there is no guarantee that a
|single resolver API call maps to a single timeout.
 Measured in seconds, the default is RES_TIMEOUT
 (currently 5, see ).  The value for this
 option is silently capped to 30.

I am intrigued by the above sentence marked with "|". Does anybody
know what that means in detail, can anybody explain that please?

I explained the reason for the 9000ms so that Oracle and its many processes
all come together to resolve the DNS name and they *keep hitting* the first
resolver - and "timeout" can't kick in due to parallel requests from different
processes, hence the high overall response time.


Kind Regards

Thomas Preissler
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users