Re: named UDP retransmit timeouts ?

2021-07-23 Thread Tony Finch
Jason Vas Dias  wrote:
>
>  Please can anyone advise the best way to optimize named's
>  UDP timeout settings for caching-only local resolver usage
>  over a slow network link? I can't seem to find any section in the
>  Bv9ARM document specifically describing how named
>  implements UDP re-transmits. Please could someone
>  point me at the right pages or place to look, if there are any,
>  besides the source code, which I am reading now?

I remember being surprised a while back that the retry intervals
and timeouts were more hard-coded than I expected. (But, be warned! I have
not refreshed my memory.)

The rough idea is that there's a certain amount of co-design between the
libc stub resolver (which back in the day came from BIND) and the
recursive server. IIRC, the libc resolver has a query timeout of 10s and
retries three times (so the overall timeout is about half a minute), and
named's resolver has a timeout of about 3s and also retries 3 times, which
neatly fits inside libc's 10s timeout.

At least that's what my memory tells me, but it may be wrong.
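
To make that arithmetic concrete, here is a small Python sketch. The
numbers are the ones recalled above, so treat them as assumptions rather
than verified libc or BIND defaults; the names are made up for
illustration.

```python
# All figures are recalled from memory in the message above and are
# assumptions, not verified libc or BIND defaults.
STUB_TIMEOUT_S = 10    # libc stub resolver: timeout per attempt
STUB_ATTEMPTS = 3      # libc stub resolver: number of attempts
NAMED_TIMEOUT_S = 3    # named's resolver: timeout per attempt
NAMED_ATTEMPTS = 3     # named's resolver: number of attempts


def total_wait(timeout_s, attempts):
    """Worst-case wait if every attempt times out."""
    return timeout_s * attempts


stub_total = total_wait(STUB_TIMEOUT_S, STUB_ATTEMPTS)
named_total = total_wait(NAMED_TIMEOUT_S, NAMED_ATTEMPTS)

# named should give up before a single stub attempt expires.
named_fits_in_one_stub_attempt = named_total <= STUB_TIMEOUT_S

print("stub resolver gives up after", stub_total, "seconds")
print("named gives up after", named_total, "seconds")
print("named fits inside one stub attempt:", named_fits_in_one_stub_attempt)
```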

But, I think you will not be successful fixing your problems by tweaking
DNS software. One of the problems with DNS as a protocol is that its
transport layer is very simple and very stupid, so if the underlying
network has problems, the DNS isn't able to fight its way through.

>  My problem is that at home my whole internet goes through
>  one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA).
>  It seems no more than about 128 kilobyte/sec download, and less
>  upload, bandwidth is available. Whenever my browser decides to
>  download something large (like a JavaScript blob), DNS requests
>  start timing out and the browser keeps re-issuing its requests.
>  Similar nasty feedback situations occur when the GSM
>  modem's DHCP lease expires and it has to re-setup its NAT for
>  the ethernet link, so all UDP requests time out for about
>  10 seconds, building up quite a backlog.

Ugh, that sounds horrible.

I think the basic problem is that TCP is very aggressive about filling up
whatever bandwidth it thinks might be available, but the DNS is not, and
TCP's congestion control algorithms will happily overwhelm a comparatively
reticent protocol like the DNS.

You probably also have buffer bloat, which makes these problems worse.
(check out https://www.bufferbloat.net/ for LOTS of information)

I am lucky enough that I haven't needed to deal with your problems myself,
so the best I can do is give you a few hints, but no specific advice. The
main idea is to prevent your TCP flows from overwhelming your uplink,
and/or from interfering with DNS traffic. You can (with the right
know-how) do this with some stunt network configuration on your Linux
gateway.

* Use traffic classification and priority queueing to ensure that DNS
  packets can jump ahead of everything else. This probably won't be enough
  by itself because of buffer bloat.

* You can use traffic shaping to ensure that the aggregate traffic from
  your Linux box never tries to over-fill your uplink. Years and years
  ago a friend of mine did this to avoid buffer bloat in their cable
  modem.

* Configure FQ-CoDel on your Linux gateway. This is a queueing algorithm
  specifically designed to avoid buffer bloat and to make TCP back off
  before everything becomes terrible.
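
As a rough illustration of the last two points, the Python sketch below
only builds tc(8) command lines for an HTB rate limit with fq_codel
underneath; it does not run them. The interface name eth0, the 400 kbit/s
uplink figure and the 85% headroom are placeholders, not measurements
from this thread, and the function name is made up.

```python
def shaping_commands(dev, uplink_kbit, headroom_pct=85):
    """Build tc command lines (as argument lists); nothing is executed.

    dev, uplink_kbit and headroom_pct are hypothetical inputs: measure
    the real uplink and adjust before using anything like this.
    """
    # Shape slightly below the real uplink rate so that the queue builds
    # up on the Linux gateway rather than inside the modem.
    rate = f"{uplink_kbit * headroom_pct // 100}kbit"
    return [
        # Root HTB qdisc; unclassified traffic falls into class 1:10.
        ["tc", "qdisc", "replace", "dev", dev, "root",
         "handle", "1:", "htb", "default", "10"],
        # One class that caps the aggregate rate.
        ["tc", "class", "replace", "dev", dev, "parent", "1:",
         "classid", "1:10", "htb", "rate", rate, "ceil", rate],
        # fq_codel inside that class: per-flow queues plus CoDel dropping.
        ["tc", "qdisc", "replace", "dev", dev, "parent", "1:10", "fq_codel"],
    ]


for cmd in shaping_commands("eth0", 400):
    print(" ".join(cmd))
```

This shapes only traffic leaving the named interface, so the download
direction would need separate treatment. Whether fq_codel alone gives DNS
enough priority on a given link is something to test rather than assume.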

That's approximately everything I know about tackling your problem, so I
hope it points you in the right direction...

Tony.
-- 
f.anthony.n.finch  https://dotat.at/
Biscay: Cyclonic in far north, otherwise westerly or southwesterly, 4
to 6, occasionally 7 in north. Slight or moderate becoming moderate or
rough. Squally thundery showers. Good, occasionally poor.

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


named UDP retransmit timeouts ?

2021-07-23 Thread Jason Vas Dias


Good day bind experts -

 Please can anyone advise the best way to optimize named's
 UDP timeout settings for caching-only local resolver usage
 over a slow network link? I can't seem to find any section in the
 Bv9ARM document specifically describing how named
 implements UDP re-transmits. Please could someone
 point me at the right pages or place to look, if there are any,
 besides the source code, which I am reading now?

 My problem is that at home my whole internet goes through
 one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA).
 It seems no more than about 128 kilobyte/sec download, and less
 upload, bandwidth is available. Whenever my browser decides to
 download something large (like a JavaScript blob), DNS requests
 start timing out and the browser keeps re-issuing its requests.
 Similar nasty feedback situations occur when the GSM
 modem's DHCP lease expires and it has to re-setup its NAT for
 the ethernet link, so all UDP requests time out for about
 10 seconds, building up quite a backlog.

 I have tried playing around with named.conf settings:
   resolver-retry-interval 8;
   resolver-retry-time 32;
   max-retry-time 32;
 but they don't seem to help - I still get a 'DNS freeze'
 situation for about 10-30 seconds when the GSM modem
 renegotiates its DHCP lease, during a yum / dnf 'update',
 during large browser downloads or stream playing ...

 My Linux v5.12.17 (Fedora-34) x86_64 box runs named 9.6.18
 from the Fedora RPM. It hosts a Windows 10 VM, which is quite a
 chatty DNS user, and runs a hostapd instance that a local network
 of 3 Android mobile phones uses as its default data connection.
 The phones also use the laptop's DNS server, and send SIP voice
 traffic through my company's SIP server, which I maintain. So the
 Linux box does NAT for the Windows VM and for the Android mobile
 clients.

 The laptop's named instance serves authoritative zones for my
 localhost, local VMs and DMZ Android mobile phone units. ALL hosts,
 including the Windows host, use BIND named running on the Linux
 laptop gateway, which is the default route endpoint for all hosts.
 It has a 'forwarders { ... };' clause in named.conf containing my
 cellular network provider's DNS server IP addresses. These remote
 cellular DNS servers can respond very slowly at peak internet usage
 times.

 It is nice to be able to see all packets from the Android mobile
 phones with tcpdump, and to be able to receive the voice traffic
 that they send to our cloud SIP server (which I can see being
 NAT-ed) and the traffic the SIP server sends back, which gets
 NAT-ed to the Windows VM Dispatcher and audio playback GUI running
 on the laptop, which I also maintain.

 My BIND named server also implements an RBL blacklist, kindly made
 available as a hosts file (https://someonewhocares.org/hosts), which
 I convert to a Response Policy Zone file. DNSSEC is also enabled
 by default.
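
 For readers unfamiliar with the hosts-to-RPZ step, a minimal Python
 sketch of one way such a conversion could look is below. It is not the
 script actually used here; the function name and the sample input are
 made up. It relies on the RPZ convention that a record of the form
 "name CNAME ." makes the resolver answer NXDOMAIN for that name.

```python
# Names that a blocklist hosts file usually maps for the local machine
# and that should not be turned into block rules (an assumed list).
LOCAL_NAMES = {"localhost", "localhost.localdomain", "broadcasthost"}


def hosts_to_rpz(hosts_text):
    """Turn hosts-file lines into RPZ records of the form 'name CNAME .'."""
    records = []
    seen = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue                           # blank or malformed line
        address, names = fields[0], fields[1:]
        if address not in ("0.0.0.0", "127.0.0.1"):
            continue                           # not a block entry
        for name in names:
            name = name.lower().rstrip(".")
            if name in LOCAL_NAMES or name in seen:
                continue
            seen.add(name)
            records.append(f"{name} CNAME .")
    return records


sample = """\
127.0.0.1 localhost
127.0.0.1 ads.example.com   # hypothetical entry
0.0.0.0 Tracker.Example.net
"""
for record in hosts_to_rpz(sample):
    print(record)
```

 The records still need a zone header (SOA and NS) and a
 response-policy statement in named.conf before named will use them.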
 
 My named.conf has a clause:

 allow-query { localhost; 192.168.W.0/24; 192.168.M.0/24;
   192.168.V.0/24;
 };
 where W is Windows VM network, M is mobile device network,
 and V is my corporate L2TP/IPSEC VPN network, also doing NAT,
 and one 'localhost-resolver' "View" with
 match-clients { /* same as above */ } ;
 and
 recursion yes;

 This setup works great on a normal office LAN, where there
 are multiple hops to the internet available, but not on my
 slow single ethernet connection at home to the whole internet,
 through a modem that must periodically renegotiate a DHCP lease.
 When the modem renegotiates its DHCP lease every hour, I typically
 have to restart named and hostapd.

 I just want named to notice that the response times to
 the forwarders are increasing, and to increase its
 number of UDP re-transmit attempts and timeout time (time
 between attempts) accordingly, and vice versa
 (decrease them back to defaults when forwarder responsiveness
 improves).
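
 To illustrate the kind of behaviour meant, here is a minimal Python
 sketch of an adaptive timeout that uses the smoothed round-trip-time
 estimator TCP uses for its retransmission timer (RFC 6298). It is not
 how named works; the class name and the initial, floor and ceiling
 values are assumptions chosen for the example.

```python
class AdaptiveTimeout:
    """RFC 6298-style timeout estimate from observed response times."""

    def __init__(self, initial=3.0, floor=1.0, ceiling=30.0):
        self.initial = initial    # used until a response has been seen
        self.floor = floor        # never wait less than this (seconds)
        self.ceiling = ceiling    # never wait more than this (seconds)
        self.srtt = None          # smoothed round-trip time
        self.rttvar = None        # smoothed round-trip time variation

    def observe(self, rtt):
        """Feed in one measured forwarder response time, in seconds."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the variation first, using the old smoothed value.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def timeout(self):
        """Current timeout: grows with slow replies, shrinks with fast ones."""
        if self.srtt is None:
            return self.initial
        raw = self.srtt + 4 * self.rttvar
        return min(self.ceiling, max(self.floor, raw))


estimator = AdaptiveTimeout()
for rtt in (0.2, 0.3, 4.0, 5.0, 0.3, 0.2):
    estimator.observe(rtt)
    print(f"rtt={rtt:.1f}s -> timeout={estimator.timeout():.2f}s")
```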
 
 Before I start hacking the named udp.c server code, please
 could anyone advise if there are ways through configuration
 settings to adjust the named UDP re-transmit timeout & number
 of attempts strategy for slow networks?

 I can't believe there aren't any?

Thanks in advance for any informative replies,
Best Regards,
Jason Vas Dias



 

 
 
 