On 14/05/2020 13:29, Wouter Wijngaards via nsd-users wrote:

Hi Wouter,

> Yes this applies to incoming queries and to outgoing queries.  120
> seconds by default.

Thanks for the clarification. I think the default of 120s should be documented in the man page.

I'm still not clear on what the timeout applies to though. Is it to the time between individual DNS messages in a TCP connection? Or does it apply to any period of inactivity in the connection?

> A much smaller value, of 200 msec, is used when the server is nearly
> full on capacity, for incoming connections that are over the limit.
> Also when the server has updated the existing connections get a smaller
> 100 msec timeout to wait for them to complete their tcp query to NSD.
>
> That last feature since 4.2.1.  The tcp full shorter timeout is since
> 4.1.11.
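
The three timeouts described above can be sketched roughly as follows. This is only a reading of the description in this thread, not NSD's actual code: the function name, its parameters, and especially the `busy_fraction` threshold are assumptions made for illustration.

```python
# Values taken from the description in this thread.
DEFAULT_TCP_TIMEOUT_S = 120.0  # default timeout
SHORT_TCP_TIMEOUT_S = 0.2      # 200 msec when the server is busy
RELOAD_TCP_TIMEOUT_S = 0.1     # 100 msec for existing connections after an update


def effective_timeout(current_connections, max_connections,
                      reloading=False,
                      configured=DEFAULT_TCP_TIMEOUT_S,
                      busy_fraction=0.5):
    """Hypothetical model of which timeout a TCP connection would get.

    busy_fraction is an assumption: the point at which the server
    counts as busy enough to switch to the short timeout.
    """
    if reloading:
        # Old server process lingering after an update.
        return min(configured, RELOAD_TCP_TIMEOUT_S)
    if current_connections > max_connections * busy_fraction:
        # Server is above the assumed busy threshold.
        return min(configured, SHORT_TCP_TIMEOUT_S)
    return configured
```

Whether the real server applies the shorter value to existing connections or only to new ones is exactly what this sketch cannot answer.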

Now that you've explained it here, I recall that there was something about this in the release notes. However, the value of 200ms isn't documented. The release notes have:

"When tcp is more than half full, use short timeout for tcp session."

So I'm guessing that "short timeout" here is 200ms. Also, it's not clear whether the timeout is dynamic. What I mean is: is it applied to all sessions (existing and new), or only to new ones? When the number of TCP connections drops below half, is the timeout reset to 120s? And is it reset for all sessions, or just new ones?

Dropping from the default 120s to a mere 200ms when the number of TCP connections goes up is quite dramatic, and I happen to think that 200ms is too low. A client that's getting an AXFR from such an NSD server is quite likely to suffer disconnects. In fact, I have been observing exactly this behaviour on the servers we run. We have a use case where a user is doing AXFRs of some largish zones, and when the client is a bit slow, NSD drops the connection. This causes the client to retry, which, IMHO, is rather wasteful.
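
To illustrate the concern, here is a minimal sketch of a slow client running into a short timeout. It assumes the timeout is an inactivity timer that restarts on every bit of activity on the connection, which is the very thing I'm asking about, so treat it as one possible interpretation. The function name and the sample timings are made up.

```python
def first_drop_ms(activity_times_ms, idle_timeout_ms):
    """Return when an idle timer would fire, or None if it never does.

    activity_times_ms: sorted times (in ms) at which the connection
    showed activity. Assumes the timer restarts on each activity.
    """
    last = activity_times_ms[0]
    for t in activity_times_ms[1:]:
        if t - last > idle_timeout_ms:
            # The gap exceeded the timeout: the server drops the connection.
            return last + idle_timeout_ms
        last = t
    return None


# A client that pauses for 350 ms in the middle of a transfer.
slow_client = [0, 50, 400, 450]

dropped_at_short = first_drop_ms(slow_client, 200)       # 200 ms timeout
dropped_at_default = first_drop_ms(slow_client, 120000)  # 120 s timeout
```

Under these assumptions the same client is cut off with the 200ms timeout but completes without trouble with the 120s default.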

The other feature of shortening the timeout to 100ms is also not so obvious. The release notes have:

"Fix #14, tcp connections have 1/10 to be active and have to work
every second, and then they get time to complete during a reload,
this is a process that lingers with the old version during a version update."

The 1/10 there is not very readable. I think that 100ms would be much clearer. And I also don't understand what you mean by "and have to work every second". Could you please explain that?

In my opinion, such details should not be buried in the release notes document. The release notes are useful when comparing one version to another. All these features of how the server dynamically adjusts its behaviour should be in the operations manual or at least the nsd.conf man page.

Imagine a new user of NSD, who is trying to configure and tune the server, and sets "tcp-timeout" to some value, and still observes different behaviour when running the server. This leads to confusion. And it's not reasonable to expect the user to read the entire set of release notes trying to find such undocumented features.

Regards,
Anand Buddhdev
RIPE NCC