Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade
Remi Gacogne via dnsdist wrote:
>> thanks for the pointer, really looking forward to the dnsdist version that has this solved.
> Sure, I expect to release 1.9.2 including this fix in the next couple of weeks.

thanks!

> Note that this metric (doh_http_version_queries) is incremented after doing some sanity checks but before actually parsing the DNS query, so unfortunately we cannot be sure these are valid DoH queries. At this point they could be bots. Can you check doh_version_status_responses for httpversion=1 and status=200 instead?

Thanks for pointing that out. In our case these two graphs overlap very closely, maybe because only requests using the correct hostname in the SNI actually reach dnsdist in the first place.

>> So the practical solution to use dnsdist 1.9.0 with nghttp2 and still support HTTP/1.1 clients is to use a webserver like nginx in front of dnsdist?
> Yes, a reverse proxy like nginx or HAProxy might be the best option to keep HTTP/1.1 support at this point.

It turns out nginx does not speak HTTP/2 to upstream servers, but according to its documentation HAProxy does.

> I'm afraid we are currently not increasing any counter in this exact case, I'll see what I can do about it.

Thanks, appreciated.

> You are correct, but in practice I am yet to see a DoH client using HTTP/1.1 in production.

It would be interesting to know how much non-HTTP/2 traffic large DoH service providers see in practice; maybe I'll reach out on the dns-operations mailing list.

> I just don't want to increase the code complexity and attack surface just to reply to crawlers.

Yes, that makes sense :)

best regards,
Christoph

___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist
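As a worked example of the comparison discussed above, the following Python sketch turns per-version request rates (for example the values of a `sum by (version) (irate(...))` query) into a percentage of non-HTTP/2 traffic. The function name and the label values "1" and "2" are assumptions for illustration. As Remi notes, doh_http_version_queries also counts requests that never become valid DNS queries, so feeding in the httpversion/status=200 series from doh_version_status_responses gives a more meaningful figure.

```python
from typing import Dict


def non_http2_share(rates_by_version: Dict[str, float]) -> float:
    """Percentage of DoH requests that did not use HTTP/2.

    rates_by_version maps an HTTP version label to a per-second rate.
    The label values "1" and "2" are an assumption for this sketch.
    """
    total = sum(rates_by_version.values())
    if total == 0:
        return 0.0  # no traffic at all: report 0% instead of dividing by zero
    non_h2 = sum(rate for version, rate in rates_by_version.items() if version != "2")
    return 100.0 * non_h2 / total


# Hypothetical rates: 6 req/s over HTTP/1.x, 94 req/s over HTTP/2
print(non_http2_share({"1": 6.0, "2": 94.0}))  # 6.0
```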
Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade
Otto Moerbeek wrote:
> This might be related: https://github.com/PowerDNS/pdns/issues/13850, not backported yet

thanks for the pointer, really looking forward to the dnsdist version that has this solved.

Remi wrote:
> In addition to the issue mentioned by Otto, it might also be that the monitoring does not support HTTP/2.

yes, that appears to be the case: UptimeRobot does not support HTTP/2 and was affected, while our blackbox_exporter appears to support HTTP/2 and was not affected.

> The new nghttp2 provider for incoming DNS over HTTPS does not support HTTP/1.1. In 1.9.x it's still possible to switch back to the legacy h2o provider but note that it will likely go away in the next major version of DNSdist. In our testing the lack of HTTP/1.1 support was not an issue for actual DNS over HTTPS clients, with most of HTTP/1.1 queries coming from crawlers/bots, but of course we will reconsider if you find out that legitimate DoH clients are impacted.

we see about 5-10% non-HTTP/2 DoH requests by looking at:

sum by (version) (irate(dnsdist_frontend_doh_http_version_queries{job="$job"}[$__rate_interval]))

So the practical solution to use dnsdist 1.9.0 with nghttp2 and still support HTTP/1.1 clients is to use a webserver like nginx in front of dnsdist?

I expected an increase of this metric during our partial outage, but this value did not increase. Is this expected?

irate(dnsdist_frontend_doh_version_status_responses{httpversion="1",status="400",job="$job"}[$__rate_interval])

dnsdist_frontend_noncompliantqueries also didn't increase. Which value is expected to increase?

btw: dnsdist 1.9.0's answer to HTTP requests not using HTTP/2:

This server implements RFC 8484 - DNS Queries over HTTP, and requires HTTP/2 in accordance with section 5.2 of the RFC.

but RFC 8484 does not actually require HTTP/2, right?
https://www.rfc-editor.org/rfc/rfc8484.html#section-5.2

> 5.2. HTTP/2
> HTTP/2 [RFC7540] is the minimum RECOMMENDED version of HTTP for use with DoH.

It is recommended but not a "MUST".

best regards,
Christoph
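The PromQL expressions in this message wrap counters in irate(). The Python sketch below is a simplified re-implementation for illustration only (not Prometheus' code): it uses just the last two samples in the window and treats a decrease as a counter reset. It shows why a counter that is not incremented for a given failure produces a flat 0 rate, however many requests actually fail.

```python
from typing import List, Optional, Tuple


def irate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Per-second rate of a counter from the last two samples of a window.

    Simplified illustration of PromQL's irate(), not Prometheus' code.
    samples: (timestamp_seconds, counter_value) pairs sorted by time.
    """
    if len(samples) < 2:
        return None  # irate() needs at least two samples
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A lower value than before means the counter was reset (e.g. after a
    # restart); count the new value as the increase since the reset.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / (t_last - t_prev)


# A counter that never gets incremented stays flat, so its rate is 0
print(irate([(0.0, 1200.0), (15.0, 1200.0)]))  # 0.0
print(irate([(0.0, 100.0), (10.0, 150.0)]))    # 5.0
```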
[dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade
Hi,

in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0 and did not notice any problems, so yesterday we upgraded our production server from 1.8.3 to 1.9.0 as well.

Immediately after the upgrade, our monitoring claimed our DoH service was unavailable (HTTP 400), but we were unable to reproduce this using Firefox. A closer look confirmed that there is an issue: our Grafana graphs show about 50% fewer DoH requests. The request rates per HTTP method suggest that we "lose" almost all GET requests, but also a significant fraction of POST requests:

sum by (method) (irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))

Looking at the TLS versions graph, I noticed a clear correlation, but then I realized that all our DoH requests use TLS 1.3 because we set minTLSVersion='tls1.3', so this might be irrelevant:

irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])

2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)

monitoring requests this:
https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE

Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208: exited on signal 11

-> also interesting, but likely unrelated?

Today we downgraded to 1.8.3, and everything went back to normal.

Is anyone else observing similar issues with dnsdist 1.9.0? DoT does not appear to be affected.

best regards,
Christoph

OS: FreeBSD 13.2, dnsdist installed via pkg

our dnsdist config:

newServer({address="109.70.100.136", maxInFlight=1000, sockets=32, name="clamps"})
newServer({address="109.70.100.140", maxInFlight=1000, sockets=32, name="roberto"})
--newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
setServerPolicy(leastOutstanding)
addTLSLocal("0.0.0.0", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addDOHLocal("0.0.0.0:444", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", "/query", {minTLSVersion='tls1.3', serverTokens='doh', tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", "/query", {minTLSVersion='tls1.3', serverTokens='doh', tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')
setKey()
pc = newPacketCache(5, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60, staleTTL=60, dontAge=false})
getPool(""):setCache(pc)
webserver("127.0.0.1:8083")
setWebserverConfig({...})
setVerboseHealthChecks(true)
addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))
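The monitoring URL above uses the RFC 8484 GET form: the dns parameter carries a DNS query in wire format, base64url-encoded without padding. A minimal Python sketch for building and decoding such URLs, for example to reproduce a monitoring check by hand; the function names are illustrative, the query uses DNS ID 0 (which RFC 8484 recommends for HTTP cache friendliness), and the example encodes the RFC's own www.example.com A query, not the query the monitoring sent.

```python
import base64
import struct


def build_query(qname: str, qtype: int = 1, qid: int = 0) -> bytes:
    """Minimal DNS query in wire format (RFC 1035) with the RD bit set."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label {label!r}")
        question += bytes([len(raw)]) + raw
    # Root label, then QTYPE and QCLASS IN (1)
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(endpoint: str, wire: bytes) -> str:
    """RFC 8484 GET form: the 'dns' parameter is base64url without padding."""
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


def decode_dns_param(value: str) -> bytes:
    """Inverse of the encoding above: restore the padding, then decode."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


url = doh_get_url("https://doh.applied-privacy.net/query", build_query("www.example.com"))
print(url)
# https://doh.applied-privacy.net/query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

The sketch only builds the URL and sends nothing. Note that Python's standard urllib only speaks HTTP/1.1, so fetching the URL with it would exercise exactly the HTTP/1.1 path that the nghttp2-based provider in 1.9.0 does not support; an HTTP/2-capable client is needed to test the normal path.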
Re: [dnsdist] grepq() output columns
Hi Remi,

> I don't think we have a way to log only these, unfortunately :-/ If you have the dnsdist console set up, you can use grepq('1000ms') to look at all queries that took more than 1 second, which is usually indicative of a problem, or even grepq('2000ms'), as dnsdist records timeouts with a very high response time.

Thanks for this suggestion.

Out of ~200 lines of grepq('3000ms') output, 184 lines end with:

... T.O RD No Error. 0 answers

examples:

aPPLE.CoM. A T.O RD No Error. 0 answers
fACeboOK.COm. T.O RD No Error. 0 answers

Does "T.O" in the Lat. column stand for timeout?

best regards,
Christoph
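To tally this over a larger capture, grepq() output pasted from the console into a text file can be counted with a few lines of Python. This is a sketch: it assumes "T.O" appears as its own whitespace-separated token and means timeout (the open question in this message), and the third sample line is invented for illustration.

```python
from typing import Iterable


def count_timeouts(grepq_lines: Iterable[str]) -> int:
    """Count grepq() output lines whose latency column reads "T.O".

    Assumes "T.O" is its own whitespace-separated token and that it
    marks a timed-out query.
    """
    return sum(1 for line in grepq_lines if "T.O" in line.split())


sample = [
    "aPPLE.CoM. A T.O RD No Error. 0 answers",
    "fACeboOK.COm. T.O RD No Error. 0 answers",
    "example.com. A 12.3 RD No Error. 1 answers",  # made-up non-timeout line
]
print(count_timeouts(sample))  # 2
```

With a saved capture in a file (the name grepq.txt is hypothetical), `count_timeouts(open("grepq.txt"))` gives the same kind of tally, since iterating a file yields its lines.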
Re: [dnsdist] backend drops metrics for TCP
> This counter will always be 0 for TCP backends indeed, it is only incremented when we give up waiting on a UDP response.

Thanks for confirming.

> The default timeout for TCP backends is set at 30s, while for UDP responses it is at 2s. So it is very possible that dnsdist no longer considers the response a timeout but the application now does. You might try to tune the 'tcpRecvTimeout' on `newServer`. Note that this suggests that the backend is slow to answer, so tuning dnsdist might not help at all and investigating why the backend struggles with these queries might be needed.

I've switched back to using UDP. Is there an easy way to log queries that time out (2s), and not log any others, so I can investigate some examples further?

https://dnsdist.org/rules-actions.html?highlight=addaction#ERCodeRule
https://dnsdist.org/reference/constants.html#dnsrcode

The only RCode with "time" in it is DNSRCode.BADTIME.

Yes, I'm also investigating the increased timeout rate on the backend Recursor side and I'm in contact with Otto about it. So far, disabling aggressive NSEC caching has been the most significant workaround for that problem.

> Do you enable out-of-order processing, via 'maxInFlight' on `newServer`?

yes (1k)

> If so, are you sure that the backend actually supports it?

A while back you pointed out a problem in our Recursor config; since then, Recursor should work with the maxInFlight setting.

best regards,
Christoph
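Remi's point about the differing default timeouts can be made concrete with a small Python sketch that, for a given backend latency, shows which side gives up. The defaults are the ones quoted above (2 s for UDP, 30 s for TCP); the application timeout is a hypothetical parameter, and the sketch ignores retries and whether dnsdist compares with >= or >.

```python
from typing import Tuple

# Default backend timeouts as quoted in this thread (seconds)
DNSDIST_TIMEOUT_S = {"udp": 2.0, "tcp": 30.0}


def who_times_out(latency_s: float, transport: str, app_timeout_s: float) -> Tuple[bool, bool]:
    """Return (dnsdist_gives_up, application_gives_up) for one query.

    app_timeout_s is a hypothetical client-side timeout; dnsdist's
    value comes from the defaults quoted above.
    """
    dnsdist_gives_up = latency_s >= DNSDIST_TIMEOUT_S[transport]
    application_gives_up = latency_s >= app_timeout_s
    return dnsdist_gives_up, application_gives_up


# A backend answer after 5 s, with an application timeout of 3 s:
print(who_times_out(5.0, "udp", 3.0))  # (True, True): dnsdist gave up waiting, counted as a drop
print(who_times_out(5.0, "tcp", 3.0))  # (False, True): only the application sees a timeout
```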
[dnsdist] backend drops metrics for TCP
Hello!

When playing around with things to reduce the drop rate, I noticed that TCP-based backends always have 0 drops in the showServers() output and in these metrics:

dnsdist_server_drops
dnsdist_downstream_timeouts

Is that always the case, i.e. does that counter have no meaning for TCP-based backends, or can it be non-zero for TCP backends as well?

dnsdist's CPU usage doubled after switching to TCP via tcpOnly=true, and the DNS timeout rate, as measured by the application generating the queries (running on the same host as dnsdist), actually increased after switching dnsdist to TCP instead of UDP. So switching to TCP eliminated the drops problem as measured by dnsdist, but made things worse for the application.

All of these values are also at 0:

dnsdist_server_tcpdiedsendingquery{address="127.0.0.1:54"} 0
dnsdist_server_tcpdiedreadingresponse{address="127.0.0.1:54"} 0
dnsdist_server_tcpgaveup{address="127.0.0.1:54"} 0
dnsdist_server_tcpreadtimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpwritetimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpconnecttimeouts{address="127.0.0.1:54"} 0

dnsdist_server_latency and dnsdist_server_tcplatency are at the same level after switching the specific backend to TCP.

sockets=NUM in newServer() only applies to UDP, and dnsdist_server_tcpcurrentconnections{address="127.0.0.1:54"} 10 suggests that dnsdist uses only 10 TCP sockets. How can this be configured? sockets was set to 32, so this implicit change when switching from UDP to TCP might also have an effect here.

best regards,
Christoph
Re: [dnsdist] dnsdist.conf: "list" example (solved)
addDOHLocal("127.0.0.1",nil,nil,{"/a","/b"})

For some reason this config works now, so it was likely a problem on my end.

best regards,
Christoph
Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?
> Did you compile dnsdist yourself?

I installed it via pkg.

> If I try to install dnsdist on 13.1-RELEASE-p6 I only get 1.7.3:

You are likely using the default FreeBSD repo (quarterly). If you use the latest repo, you will get version 1.8.0:

mkdir -p /usr/local/etc/pkg/repos

Then create the file /usr/local/etc/pkg/repos/FreeBSD.conf to use the latest repo:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/share/keys/pkg",
  enabled: yes
}

> A quick test with a self-compiled 1.8.0 dnsdist shows a non-zero sum for me, so I'm confused what's going on.

Here is our dnsdist.conf; maybe it helps to reproduce the issue.

thanks for your help!
Christoph

newServer({address="109.70.100.136", maxInFlight=1000})
newServer({address="109.70.100.140", maxInFlight=1000})
setServerPolicy(leastOutstanding)
addTLSLocal("0.0.0.0", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addDOHLocal("0.0.0.0:444", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", "/query", {minTLSVersion='tls1.3', serverTokens='doh', tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", "/query", {minTLSVersion='tls1.3', serverTokens='doh', tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')
setKey("xxx")
pc = newPacketCache(5, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60, staleTTL=60, dontAge=false})
getPool(""):setCache(pc)
webserver("127.0.0.1:8083")
setWebserverConfig({password="xxx"})
setVerboseHealthChecks(true)
Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?
Remi Gacogne via dnsdist wrote:
> The fix not being backported is an oversight, I added the "backport to 1.7.x" flag so we include it in an upcoming 1.7.x release.

Great to hear that this was unexpected.

>> Recently we upgraded our dnsdist instances to 1.8.0 but the upgrade did not improve the values in dnsdist_latency_bucket. Now after the upgrade, the graph shows basically a flat line. This only affects our FreeBSD servers, not our Debian-based dnsdist instances.
> That's weird. Would you be able to share the prometheus output, or the dumpStats() one, so we know if this is the same bug or a related one?

I added it here because I wanted to add the graph as well, but the GitHub upload is failing for me, so it is just the Prometheus output:
https://github.com/PowerDNS/pdns/issues/11239#issuecomment-1507536007

thanks for looking into it!
Christoph
[dnsdist] dnsdist latency bucket metric still broken in 1.8.0?
Hi,

ever since [1] got the dnsdist-1.8.0 milestone, we were looking forward to the 1.8.0 release, and were also a bit surprised that the fix for this regression will not be in a 1.7.x bugfix release.

Recently we upgraded our dnsdist instances to 1.8.0, but the upgrade did not improve the values in dnsdist_latency_bucket. Since the upgrade, the graph shows basically a flat line. This only affects our FreeBSD servers, not our Debian-based dnsdist instances.

[1] https://github.com/PowerDNS/pdns/issues/11239

best regards,
Christoph
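One way to tell a genuinely flat latency distribution from a broken metric is to inspect the raw cumulative buckets of one scrape. The Python sketch below runs basic sanity checks on Prometheus-style histogram buckets (such as the dnsdist_latency_bucket series) and estimates a quantile by linear interpolation, roughly the way PromQL's histogram_quantile() does. It is a simplified illustration, not Prometheus' implementation; function names are illustrative and the bucket bounds and counts in the example are made up.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def bucket_problems(buckets: List[Bucket]) -> List[str]:
    """Basic sanity checks for one set of cumulative histogram buckets,
    sorted by upper bound and ending with the +Inf bucket."""
    problems = []
    if not buckets or not math.isinf(buckets[-1][0]):
        problems.append("missing +Inf bucket")
    for (_, prev_count), (le, count) in zip(buckets, buckets[1:]):
        if count < prev_count:  # cumulative counts must never decrease
            problems.append(f"count decreases at le={le}")
    if buckets and buckets[-1][1] == 0:
        problems.append("all buckets are zero")
    return problems


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) by linear interpolation inside
    the bucket holding the target rank (simplified histogram_quantile())."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0  # upper bound and cumulative count of previous bucket
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return lower  # rank lies in the +Inf bucket: no upper bound known
            in_bucket = count - below
            if in_bucket == 0:
                return le
            return lower + (le - lower) * (rank - below) / in_bucket
        lower, below = le, count
    return lower


# Made-up buckets: 10 observations <= 1, 30 <= 5, 40 <= 10, none above
example = [(1.0, 10.0), (5.0, 30.0), (10.0, 40.0), (math.inf, 40.0)]
print(bucket_problems(example))         # []
print(estimate_quantile(0.5, example))  # 3.0
```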
Re: [dnsdist] prometheus values queries-per-connection and connection-duration always 0 for DoH?
Remi Gacogne via dnsdist wrote:
> These metrics are not yet implemented for DoH, but as they are inherited
> from the generic frontend structure they do appear in our metrics.
>
> The reason why it was not yet implemented is that the current API of the
> library we are using to handle HTTP/2, h2o, makes that a bit difficult.
> I just implemented [1] an external table to match the connection to DoH
> queries, so we should have these metrics in 1.6.0.
>
> [1]: https://github.com/PowerDNS/pdns/pull/9738

Thanks for your comprehensive reply and for implementing it, we are looking forward to running the next release :)

best regards,
Christoph
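Remi's fix maps DoH queries back to their connection through an external table. The Python sketch below illustrates the general idea of deriving both averages from such a per-connection table; the class and method names are hypothetical, it uses plain arithmetic means, and dnsdist may compute its own metrics differently. It also shows why such metrics read 0 when nothing feeds the table: with no closed connections recorded, both averages stay at 0.

```python
import time
from typing import Dict, Optional, Tuple


class ConnectionStats:
    """Hypothetical per-connection table for deriving average queries per
    connection and average connection duration (illustration only; this
    is not dnsdist's implementation)."""

    def __init__(self) -> None:
        self._open: Dict[int, Tuple[float, int]] = {}  # conn_id -> (start, queries)
        self._closed = 0
        self._total_queries = 0
        self._total_duration = 0.0

    def on_connect(self, conn_id: int, now: Optional[float] = None) -> None:
        start = time.monotonic() if now is None else now
        self._open[conn_id] = (start, 0)

    def on_query(self, conn_id: int) -> None:
        # Attributing each query to its connection is the step that needs
        # a lookup table when the HTTP library does not expose it directly.
        start, queries = self._open[conn_id]
        self._open[conn_id] = (start, queries + 1)

    def on_close(self, conn_id: int, now: Optional[float] = None) -> None:
        start, queries = self._open.pop(conn_id)
        end = time.monotonic() if now is None else now
        self._closed += 1
        self._total_queries += queries
        self._total_duration += end - start

    def avg_queries_per_connection(self) -> float:
        return self._total_queries / self._closed if self._closed else 0.0

    def avg_connection_duration(self) -> float:
        return self._total_duration / self._closed if self._closed else 0.0


stats = ConnectionStats()
stats.on_connect(1, now=0.0)
for _ in range(3):
    stats.on_query(1)
stats.on_connect(2, now=1.0)
stats.on_query(2)
stats.on_close(1, now=10.0)
stats.on_close(2, now=3.0)
print(stats.avg_queries_per_connection())  # 2.0
print(stats.avg_connection_duration())     # 6.0
```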
[dnsdist] prometheus values queries-per-connection and connection-duration always 0 for DoH?
Hi,

while creating a dashboard for dnsdist Prometheus metrics, we noticed that the following values are always 0 for DoH, while for DoT they appear to work fine:

dnsdist_frontend_tcpavgqueriesperconnection
dnsdist_frontend_tcpavgconnectionduration

We do use DoH and there are ongoing DoH queries.

dnsdist version: 1.5.1

Is anyone successfully seeing non-zero values in these metrics for DoH, or is this a bug?

thanks,
Christoph
[dnsdist] grafana dashboard for dnsdist? (incl. DoH, DoT)
Hi,

I was wondering if there are any pre-existing Grafana dashboards for dnsdist Prometheus metrics? I didn't find anything current and dnsdist-related at the usual place:
https://grafana.com/grafana/dashboards?search=dnsdist

Found on GitHub, but older:
https://gist.github.com/mrlesmithjr/54e0dd24417337bd2509f212c6c72545
https://github.com/PowerDNS/grafana-metronome/tree/master/dashboards

thanks,
Christoph
Re: [dnsdist] how to increase connection qlen on DoH listener?
> please open a feature request [1] if you
> think it's worth it.

thanks for considering this:
https://github.com/PowerDNS/pdns/issues/8986

>> Reading
>> https://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html
>> I would expect that you want to increase kern.ipc.soacceptqueue
>>
>> -Otto
>
> https://docs.freebsd.org/doc/12.1-RELEASE/usr/local/share/doc/freebsd/en/books/handbook/configtuning-kernel-limits.html
>
> confirms that that is very likely the proper sysctl for your version,

They are the same setting, but as Remi said, it is not supported by dnsdist.

from listen(2):

The kern.ipc.somaxconn sysctl(3) has been replaced with kern.ipc.soacceptqueue in FreeBSD 10.0 to prevent confusion about its actual functionality. The original sysctl(3) kern.ipc.somaxconn is still available but hidden from a sysctl(3) -a output so that existing applications and scripts continue to work.
Re: [dnsdist] how to increase connection qlen on DoH listener?
I also tried:

setMaxTCPQueuedConnections(2048)

from: https://dnsdist.org/reference/tuning.html

but it had no effect on the netstat -Lan output after restarting dnsdist.
[dnsdist] how to increase connection qlen on DoH listener?
Hi,

due to log entries saying:

"Listen queue overflow: 193 already in queue awaiting acceptance"

we increased kern.ipc.somaxconn to 2048. After restarting dnsdist, we noticed that while nginx takes the new setting into account, dnsdist remains at 128:

netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen
tcp4  0/0/128   <<< dnsdist
tcp4  5/0/2048  <<< nginx

Is there a way to tell dnsdist to increase the connection queue on the DoH listener? I did not see anything like that in the documentation:
https://dnsdist.org/reference/config.html?highlight=adddohlocal#addDOHLocal

This is on FreeBSD 12.1 with dnsdist v1.4.0.

thanks,
Christoph

refs:

kern.ipc.somaxconn: Maximum listen socket pending connection accept queue size

from the FreeBSD netstat(1) manual page:

-L  Show the size of the various listen queues. The first count shows the number of unaccepted connections, the second count shows the amount of unaccepted incomplete connections, and the third count is the maximum number of queued connections.
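The three counts described in the netstat(1) excerpt above can also be checked from a script, for example to warn before the kernel starts logging listen queue overflows. A small Python sketch; the function names and the 80% threshold are arbitrary choices for illustration.

```python
from typing import Tuple


def parse_listen_queue(field: str) -> Tuple[int, int, int]:
    """Split a netstat -L "qlen/incqlen/maxqlen" field such as "5/0/2048"."""
    qlen, incqlen, maxqlen = (int(part) for part in field.split("/"))
    return qlen, incqlen, maxqlen


def near_capacity(field: str, threshold: float = 0.8) -> bool:
    """True if unaccepted connections reach `threshold` of the maximum."""
    qlen, _incqlen, maxqlen = parse_listen_queue(field)
    return maxqlen > 0 and qlen >= threshold * maxqlen


# Values from the netstat -Lan output above
for field, owner in [("0/0/128", "dnsdist"), ("5/0/2048", "nginx")]:
    print(owner, parse_listen_queue(field), near_capacity(field))
```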