Re: [dnsdist] backend drops metrics for TCP

2023-10-02 Thread Remi Gacogne via dnsdist

Hi Christoph,

On 13/09/2023 07:30, Christoph via dnsdist wrote:

> I've switched back to using UDP.
> Is there an easy way to log only the queries that time out (2s), and
> not any others, so I can investigate some examples further?


I don't think we have a way to log only these, unfortunately :-/ If you 
have the dnsdist console set up, you can use grepq('1000ms') to look at 
all queries that took more than 1 second, which is usually indicative of 
a problem, or even grepq('2000ms'), as dnsdist records timeouts with a 
very high response time.
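A minimal console sketch of this, assuming the console has already been enabled in the configuration with `controlSocket()` and `setKey()`. The optional second argument to `grepq`, which limits how many of the most recent matching entries are printed, is an addition not mentioned in this thread:

```lua
-- Run these from the dnsdist console (dnsdist -c), not from the config file.

-- Queries whose response took longer than 1 second:
grepq("1000ms")

-- Same idea with a 2-second threshold, printing only the last 20 matches:
grepq("2000ms", 20)
```

Since nothing is logged permanently, this only covers what is still held in dnsdist's in-memory ring buffers at the time the command is run.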


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/



OpenPGP_signature.asc
Description: OpenPGP digital signature
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] backend drops metrics for TCP

2023-09-12 Thread Christoph via dnsdist
> This counter will indeed always be 0 for TCP backends; it is only
> incremented when we give up waiting on a UDP response.


Thanks for confirming.

> The default timeout for TCP backends is set at 30s, while for UDP
> responses it is 2s. So it is very possible that dnsdist no longer
> considers the response a timeout but the application now does. You might
> try to tune 'tcpRecvTimeout' on `newServer`. Note that this suggests
> that the backend is slow to answer, so tuning dnsdist might not help at
> all, and investigating why the backend struggles with these queries might
> be needed.


I've switched back to using UDP.
Is there an easy way to log only the queries that time out (2s), and
not any others, so I can investigate some examples further?


https://dnsdist.org/rules-actions.html?highlight=addaction#ERCodeRule
https://dnsdist.org/reference/constants.html#dnsrcode
The only RCode with "time" in its name is DNSRCode.BADTIME, which is a
TSIG signature-time error rather than a query timeout.

Yes, I'm also investigating the increased timeout rate on the backend
Recursor side, and I'm in contact with Otto about it. So far, disabling
aggressive NSEC caching has been the most significant workaround for that
problem.


> Do you enable out-of-order processing,
> via 'maxInFlight' on `newServer`?


yes (1k)

> If so, are you sure that the backend
> actually supports it?


A while back you pointed out a problem in our Recursor config; since
then, Recursor should work with the maxInFlight setting.

best regards,
Christoph


Re: [dnsdist] backend drops metrics for TCP

2023-09-12 Thread Remi Gacogne via dnsdist

Hello!

On 11/09/2023 22:34, Christoph via dnsdist wrote:

> When playing around with things to reduce the drop rate, I noticed
> that TCP-based backends always show 0 drops in the showServers() output
> and in these metrics:
>
> dnsdist_server_drops
> dnsdist_downstream_timeouts
>
> Is that always the case, i.e. does this counter have no meaning for
> TCP-based backends, or can it be non-zero for TCP backends as well?


This counter will indeed always be 0 for TCP backends; it is only
incremented when we give up waiting on a UDP response.



> dnsdist's CPU usage doubled after switching to TCP via tcpOnly=true,
> and the DNS timeout rate, as measured by the application generating the
> queries (running on the same host as dnsdist), actually increased after
> switching dnsdist from UDP to TCP. So switching to TCP eliminated the
> drops problem as measured by dnsdist, but it made things worse for the
> application.


The default timeout for TCP backends is set at 30s, while for UDP
responses it is 2s. So it is very possible that dnsdist no longer
considers the response a timeout but the application now does. You might
try to tune 'tcpRecvTimeout' on `newServer`. Note that this suggests
that the backend is slow to answer, so tuning dnsdist might not help at
all, and investigating why the backend struggles with these queries might
be needed.
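To make the asymmetry concrete, a minimal configuration sketch, assuming the backend address from this thread. The global `setUDPTimeout()` is not mentioned in this thread and is shown only for comparison; the 2-second TCP value is an illustrative choice that mirrors the UDP default, not a recommendation:

```lua
-- UDP: how long dnsdist waits for a backend response before giving up
-- (and counting a drop). Global setting, in seconds; 2 is the default.
setUDPTimeout(2)

-- TCP: per-backend receive timeout, in seconds (default 30).
newServer({
  address = "127.0.0.1:54",
  tcpOnly = true,       -- forward all queries to this backend over TCP
  tcpRecvTimeout = 2,   -- illustrative value only, matching the UDP default
})
```

Lowering the TCP timeout only makes dnsdist give up sooner; it does not make a slow backend answer faster.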



> All of these values are also at 0:
>
> dnsdist_server_tcpdiedsendingquery{address="127.0.0.1:54"} 0
> dnsdist_server_tcpdiedreadingresponse{address="127.0.0.1:54"} 0
> dnsdist_server_tcpgaveup{address="127.0.0.1:54"} 0
> dnsdist_server_tcpreadtimeouts{address="127.0.0.1:54"} 0
> dnsdist_server_tcpwritetimeouts{address="127.0.0.1:54"} 0
> dnsdist_server_tcpconnecttimeouts{address="127.0.0.1:54"} 0


These are indeed the ones that would indicate a problem between dnsdist 
and a TCP backend, as seen by dnsdist.



> sockets=NUM in newServer() only applies to UDP, and
> dnsdist_server_tcpcurrentconnections{address="127.0.0.1:54"} 10
> suggests it uses only 10 TCP sockets. How can this be configured?
> sockets was set to 32, so this implicit change when switching from UDP
> to TCP might also have an effect here.


dnsdist will create as many outgoing TCP connections as needed by 
default, unless instructed otherwise via 'maxConcurrentTCPConnections' 
on `newServer`. So from dnsdist's point of view there was no need for 
more TCP connections, apparently. Do you enable out-of-order processing, 
via 'maxInFlight' on `newServer`? If so, are you sure that the backend 
actually supports it?
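A sketch of the two `newServer` options discussed here, assuming the backend address from this thread. The value 32 for `maxConcurrentTCPConnections` is hypothetical (it simply mirrors the earlier `sockets` setting), and `maxInFlight` should only be set if the backend really handles out-of-order processing:

```lua
newServer({
  address = "127.0.0.1:54",
  tcpOnly = true,
  -- Enables out-of-order processing (0, the default, disables it).
  -- 1000 is the value used in this thread; the backend must support it.
  maxInFlight = 1000,
  -- Caps the number of outgoing TCP connections to this backend.
  -- Hypothetical value; without it dnsdist opens as many as it needs.
  maxConcurrentTCPConnections = 32,
})
```

Setting a cap only limits connections; it will not make dnsdist open more than it considers necessary.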


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





[dnsdist] backend drops metrics for TCP

2023-09-11 Thread Christoph via dnsdist

Hello!

When playing around with things to reduce the drop rate, I noticed
that TCP-based backends always show 0 drops in the showServers() output
and in these metrics:

dnsdist_server_drops
dnsdist_downstream_timeouts

Is that always the case, i.e. does this counter have no meaning for
TCP-based backends, or can it be non-zero for TCP backends as well?


dnsdist's CPU usage doubled after switching to TCP via tcpOnly=true,
and the DNS timeout rate, as measured by the application generating the
queries (running on the same host as dnsdist), actually increased after
switching dnsdist from UDP to TCP. So switching to TCP eliminated the
drops problem as measured by dnsdist, but it made things worse for the
application.


All of these values are also at 0:

dnsdist_server_tcpdiedsendingquery{address="127.0.0.1:54"} 0
dnsdist_server_tcpdiedreadingresponse{address="127.0.0.1:54"} 0
dnsdist_server_tcpgaveup{address="127.0.0.1:54"} 0
dnsdist_server_tcpreadtimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpwritetimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpconnecttimeouts{address="127.0.0.1:54"} 0

dnsdist_server_latency and
dnsdist_server_tcplatency
are at the same level after switching to TCP for this backend.

sockets=NUM in newServer() only applies to UDP, and
dnsdist_server_tcpcurrentconnections{address="127.0.0.1:54"} 10
suggests it uses only 10 TCP sockets. How can this be configured?
sockets was set to 32, so this implicit change when switching from UDP
to TCP might also have an effect here.


best regards,
Christoph