Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-19 Thread Christoph via dnsdist

Remi Gacogne via dnsdist:

thanks for the pointer, really looking forward to the dnsdist version
that has this solved.


Sure, I expect to release 1.9.2 including this fix in the next couple 
weeks.


thanks!

Note that this metric (doh_http_version_queries) is incremented after 
doing some sanity checks but before actually parsing the DNS query, so 
unfortunately we cannot be sure these are valid DoH queries. At this 
point they could be bots. Can you check doh_version_status_responses for 
httpversion=1 and status=200 instead?


Thanks for pointing that out.
In our case these two graphs overlap very closely.
Maybe because only requests using the correct hostname in the SNI
actually reach dnsdist in the first place.



So the practical solution to use dnsdist 1.9.0 with nghttp2 and
still support HTTP/1.1 clients is to use a webserver like nginx in 
front of dnsdist?


Yes, a reverse proxy like nginx or HAProxy might be the best option to 
keep HTTP/1.1 support at this point.


Turns out nginx does not speak HTTP/2 with upstream servers
but HAProxy does according to the documentation.

I'm afraid we are currently not increasing any counter in this exact 
case, I'll see what I can do about it.


Thanks, appreciated.

You are correct, but in practice I am yet to see a DoH client using 
HTTP/1.1 in production.


Would be interesting to know how much non-HTTP/2 traffic large DoH
service providers see in practice, maybe I'm going to reach out on the
dns-operations mailing list.

I just don't want to increase the 
code complexity and attack surface just to reply to crawlers..


Yes, that makes sense :)

best regards,
Christoph
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-18 Thread Christoph via dnsdist

Otto Moerbeek wrote:
This might be related: https://github.com/PowerDNS/pdns/issues/13850,
not backported yet


thanks for the pointer, really looking forward to the dnsdist version
that has this solved.

Remi wrote:

In addition to the issue mentioned by Otto, it might also be that the
monitoring does not support HTTP/2.


Yes, that appears to be the case: uptimerobot does not support HTTP/2 and
was affected, while our blackbox_exporter appears to support HTTP/2 and was
not affected.



The new nghttp2 provider for
incoming DNS over HTTPS does not support HTTP/1.1. In 1.9.x it's
still possible to switch back to the legacy h2o provider but note
that it will likely go away in the next major version of DNSdist. In
our testing the lack of HTTP/1.1 support was not an issue for actual
DNS over HTTPS clients, with most of HTTP/1.1 queries coming from
crawlers/bots, but of course we will reconsider if you find out that
legitimate DoH clients are impacted.


About 5-10% of our DoH requests do not use HTTP/2, according to:

sum by (version)
(irate(dnsdist_frontend_doh_http_version_queries{job="$job"}[$__rate_interval]))
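The same share can be checked by hand from two scrapes of the counter. This is only a rough sketch: the counter values are made up, and it assumes the `version` label takes the values "1" and "2" and that the counter did not reset between scrapes.

```python
from typing import Dict


def non_http2_share(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Fraction of DoH requests between two scrapes that did not use HTTP/2.

    Both arguments map the value of the `version` label to the counter value
    of dnsdist_frontend_doh_http_version_queries at scrape time.
    """
    # Per-version increase between the two scrapes (counter resets are ignored).
    deltas = {version: after[version] - before.get(version, 0.0) for version in after}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    non_h2 = sum(delta for version, delta in deltas.items() if version != "2")
    return non_h2 / total


if __name__ == "__main__":
    # Hypothetical counter values, not taken from a real server.
    first_scrape = {"1": 1000.0, "2": 9000.0}
    second_scrape = {"1": 1080.0, "2": 9920.0}
    print(f"{non_http2_share(first_scrape, second_scrape):.1%}")
```

With these sample values, 80 of 1000 new requests are non-HTTP/2, which lands inside the 5-10% range mentioned above.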

So the practical solution to use dnsdist 1.9.0 with nghttp2 and
still support HTTP/1.1 clients is to use a webserver like nginx in front 
of dnsdist?


I expected the following metric to increase during our partial outage,
but it did not. Is this expected?

irate(dnsdist_frontend_doh_version_status_responses{httpversion="1",status="400",job="$job"}[$__rate_interval])

dnsdist_frontend_noncompliantqueries also didn't increase.
Which value is expected to increase?


btw:
dnsdist's v1.9.0 answer to HTTP requests not using HTTP/2:


This server implements RFC 8484 - DNS Queries over HTTP, and
requires HTTP/2 in accordance with section 5.2 of the RFC.


but RFC8484 does not actually require HTTP/2, right?

https://www.rfc-editor.org/rfc/rfc8484.html#section-5.2
> 5.2.  HTTP/2


HTTP/2 [RFC7540] is the minimum RECOMMENDED version of HTTP for use 
with DoH.


It is recommended but not a "MUST".

best regards,
Christoph


[dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-17 Thread Christoph via dnsdist

Hi,

in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
and did not notice any problems, so we upgraded our production server
from 1.8.3 to 1.9.0 yesterday.

Immediately after upgrading, our monitoring reported our DoH service as
unavailable (HTTP 400), but we were unable to reproduce this using Firefox.


A closer look confirmed that there is an issue: we see about
50% fewer DoH requests in our grafana graphs showing DoH request rates.


A look at the request rates per HTTP method suggests that we
lose almost all GET requests but also a significant fraction of POST
DoH requests.


sum by (method) 
(irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))


After looking at the TLS versions graph I noticed a clear correlation
but then I realized that all our DoH requests are TLS version 1.3
because we set minTLSVersion='tls1.3' - so this might be irrelevant.

irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])

2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
monitoring requests this: 
https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 
208: exited on signal 11 -> also interesting but likely unrelated?
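
For anyone who wants to reproduce the GET check by hand: in a request of this form the `dns` parameter carries the DNS query in wire format, base64url-encoded with the padding stripped (RFC 8484, section 4.1). A minimal sketch of how such a URL can be built follows. The helper names are made up for illustration, and doh.example.net is a placeholder host, not our endpoint.

```python
import base64
import struct


def build_dns_query(qname: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a minimal DNS query in wire format (one question, RD flag set)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        question += bytes([len(encoded)]) + encoded
    question += b"\x00"  # the root label terminates the name
    question += struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


def doh_get_url(base_url: str, qname: str, qtype: int = 1) -> str:
    """Return a DoH GET URL with the query base64url-encoded, padding stripped."""
    wire = build_dns_query(qname, qtype)
    dns_param = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{base_url}?dns={dns_param}"


if __name__ == "__main__":
    # Placeholder endpoint; replace it with the DoH URL under test.
    print(doh_get_url("https://doh.example.net/query", "www.example.com"))
```

A query ID of 0 is used here, as in the RFC 8484 examples, since it keeps GET requests cache friendly.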


Today we downgraded to 1.8.3, and everything went back to normal.

Is anyone else observing similar issues on dnsdist 1.9.0?

DoT does not appear to be affected.

best regards,
Christoph

OS: FreeBSD 13.2
dnsdist installed via pkg

our dnsdist config:

newServer({address="109.70.100.136", maxInFlight=1000, sockets=32, 
name="clamps"})
newServer({address="109.70.100.140", maxInFlight=1000, sockets=32, 
name="roberto"})

--newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
setServerPolicy(leastOutstanding)

addTLSLocal("0.0.0.0", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })


addDOHLocal("0.0.0.0:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })


setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')

setKey()

pc = newPacketCache(5, {maxTTL=86400, minTTL=3, 
temporaryFailureTTL=60, staleTTL=60, dontAge=false})

getPool(""):setCache(pc)

webserver("127.0.0.1:8083")
setWebserverConfig({...})
setVerboseHealthChecks(true)
addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))





Re: [dnsdist] greqp() output columns

2023-10-02 Thread Christoph via dnsdist

Hi Remi,

I don't think we have a way to log only these, unfortunately :-/ If you 
have the dnsdist console set up, you can use grepq('1000ms') to look at 
all queries that took more than 1 second, which is usually indicative of 
a problem, or even grepq('2000ms'), as dnsdist records timeouts with a 
very high response time.


Thanks for this suggestion.

out of ~200 lines from the grepq('3000ms') output 184 lines end with
... T.O RD No Error. 0 answers

examples:
aPPLE.CoM.    A   T.O   RD   No Error. 0 answers
fACeboOK.COm.     T.O   RD   No Error. 0 answers

does "T.O" in the Lat. column stand for timeout?

best regards,
Christoph


Re: [dnsdist] backend drops metrics for TCP

2023-09-12 Thread Christoph via dnsdist
This counter will always be 0 for TCP backends indeed, it is only 
incremented when we give up waiting on a UDP response.


Thanks for confirming.

The default timeout for TCP backends is set at 30s, while for UDP 
responses it is at 2s. So it is very possible that dnsdist no longer 
considers the response a timeout but the application now does. You might 
try to tune the 'tcpRecvTimeout' on `newServer`. Note that this suggests 
that the backend is slow to answer, so tuning dnsdist might not help at 
all and investigating why the backend struggles with these queries might 
be needed.
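
To make the point about the two timeouts concrete, here is a small sketch of which side gives up on a slow backend response. The 2s and 30s values are the defaults mentioned above; the 5s application timeout is purely an assumption for illustration, as is the function name.

```python
# Timeouts mentioned in the thread: by default dnsdist waits 2s for a UDP
# response and 30s for a TCP response from the backend.
DNSDIST_TIMEOUT_S = {"udp": 2.0, "tcp": 30.0}

# Hypothetical client-side timeout; the thread does not state the real value.
APP_TIMEOUT_S = 5.0


def who_sees_timeout(backend_latency_s: float, transport: str) -> dict:
    """Report which side gives up on a response that takes backend_latency_s."""
    return {
        "dnsdist": backend_latency_s > DNSDIST_TIMEOUT_S[transport],
        "application": backend_latency_s > APP_TIMEOUT_S,
    }


if __name__ == "__main__":
    # A backend answer that takes 8 seconds, seen over each transport.
    for transport in ("udp", "tcp"):
        print(transport, who_sees_timeout(8.0, transport))
```

Over UDP both sides count the 8s answer as a timeout; over TCP only the application does, which matches drops disappearing in dnsdist while the application still sees timeouts.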


I've switched back to using UDP.
Is there an easy way to log only the queries that time out (2s), and no
others, so that some examples can be investigated further?


https://dnsdist.org/rules-actions.html?highlight=addaction#ERCodeRule
https://dnsdist.org/reference/constants.html#dnsrcode
The only RCode with "time" in it: DNSRCode.BADTIME

Yes, I'm also investigating the increased timeout rate on the backend
Recursor side and I'm in contact with Otto about it. So far disabling
aggressive NSEC caching has been the most significant workaround for that problem.


Do you enable out-of-order processing, 
via 'maxInFlight' on `newServer`? 


yes (1k)

If so, are you sure that the backend actually supports it?


A while back you pointed out a problem in our Recursor config;
since then Recursor should work with the maxInFlight config.

best regards,
Christoph


[dnsdist] backend drops metrics for TCP

2023-09-11 Thread Christoph via dnsdist

Hello!

when playing around with things to reduce the drop rate I noticed
that TCP based backends always have 0 drops in showServers() output and 
these metrics:

dnsdist_server_drops
dnsdist_downstream_timeouts

Is that always the case and that counter has no meaning for TCP based 
backends or can this counter be non-zero for TCP backends as well?


dnsdist's CPU usage doubled after switching to TCP via tcpOnly=true
and the DNS timeout rate as measured by the application generating the 
queries running on the same host as dnsdist actually increased after 
switching dnsdist to use TCP instead of UDP. So switching to TCP 
eliminated the drops problem when measured by dnsdist but it made things 
worse for the application.


All of these values are also at 0:

dnsdist_server_tcpdiedsendingquery{address="127.0.0.1:54"} 0
dnsdist_server_tcpdiedreadingresponse{address="127.0.0.1:54"} 0
dnsdist_server_tcpgaveup{address="127.0.0.1:54"} 0
dnsdist_server_tcpreadtimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpwritetimeouts{address="127.0.0.1:54"} 0
dnsdist_server_tcpconnecttimeouts{address="127.0.0.1:54"} 0

dnsdist_server_latency and
dnsdist_server_tcplatency
are on the same level after switching to TCP for the specific backend.

sockets=NUM in newServer() applies only to UDP, and
dnsdist_server_tcpcurrentconnections{address="127.0.0.1:54"} 10
suggests that only 10 TCP sockets are in use. How can this be configured?
sockets was set to 32, so this implicit change when switching from UDP to
TCP might also have an effect here.


best regards,
Christoph


Re: [dnsdist] dnsdist.conf: "list" example (solved)

2023-09-07 Thread Christoph via dnsdist

addDOHLocal("127.0.0.1",nil,nil,{"/a","/b"})



for some reason this config works now, so it
was likely a problem on my end.

best regards,
Christoph


Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?

2023-04-14 Thread Christoph via dnsdist
Did you compile dnsdist yourself? 

I installed it via pkg.


If I try to install dnsdist on
13.1-RELEASE-p6 I only get 1.7.3:


You are likely using the default FreeBSD repo (quarterly). If you use the
latest repo you will get version 1.8.0:


mkdir -p /usr/local/etc/pkg/repos
you can create this file /usr/local/etc/pkg/repos/FreeBSD.conf
to use the latest repo:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/share/keys/pkg",
  enabled: yes
}


A quick test with a self-compiled 1.8.0 dnsdist shows a non-zero sum
for me, so I'm confused about what's going on.


here is our dnsdist.conf,
maybe it helps to reproduce the issue.

thanks for your help!
Christoph


newServer({address="109.70.100.136", maxInFlight=1000})
newServer({address="109.70.100.140", maxInFlight=1000})
setServerPolicy(leastOutstanding)

addTLSLocal("0.0.0.0", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })


addDOHLocal("0.0.0.0:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })


setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')

setKey("xxx")

pc = newPacketCache(5, {maxTTL=86400, minTTL=3, 
temporaryFailureTTL=60, staleTTL=60, dontAge=false})

getPool(""):setCache(pc)

webserver("127.0.0.1:8083")
setWebserverConfig({password="xxx"})
setVerboseHealthChecks(true)



Re: [dnsdist] dnsdist latency bucket metric still broken in 1.8.0?

2023-04-13 Thread Christoph via dnsdist

Remi Gacogne via dnsdist:
The fix not being backported is an oversight, I added the "backport to 
1.7.x" flag so we include it in an upcoming 1.7.x release.


Great to hear that this was unexpected.


Recently we upgraded our dnsdist instances to 1.8.0
but the upgrade did not improve the values in dnsdist_latency_bucket.
Now after the upgrade, the graph shows basically a flat line.
This only affects our FreeBSD servers, not our Debian based dnsdist 
instances.


That's weird. Would you be able to share the prometheus output, or the 
dumpStats() one, so we know if this is the same bug or a related one?


I added it here because I wanted to add the graph as well but github 
upload is failing me, so just prometheus:

https://github.com/PowerDNS/pdns/issues/11239#issuecomment-1507536007

thanks for looking into it!
Christoph


[dnsdist] dnsdist latency bucket metric still broken in 1.8.0?

2023-04-13 Thread Christoph via dnsdist

Hi,

ever since [1] got the dnsdist-1.8.0 milestone
we were looking forward to the 1.8.0 release
and were also a bit surprised that the fix for this regression
would not be in a 1.7.x bugfix release.

Recently we upgraded our dnsdist instances to 1.8.0
but the upgrade did not improve the values in dnsdist_latency_bucket.
Now after the upgrade, the graph shows basically a flat line.
This only affects our FreeBSD servers, not our Debian based dnsdist 
instances.


[1] https://github.com/PowerDNS/pdns/issues/11239


best regards,
Christoph


Re: [dnsdist] prometheus values queries-per-connection and connection-duration always 0 for DoH?

2020-11-18 Thread Christoph via dnsdist
Remi Gacogne via dnsdist wrote:
> These metrics are not yet implemented for DoH, but as they are inherited
> from the generic frontend structure they do appear in our metrics.
> 
> The reason why it was not yet implemented is that the current API of the
> library we are using to handle HTTP/2, h2o, makes that a bit difficult.
> I just implemented [1] an external table to match the connection to DoH
> queries, so we should have these metrics in 1.6.0.
> 
> [1]: https://github.com/PowerDNS/pdns/pull/9738

Thanks for your comprehensive reply and for implementing it, we are
looking forward to running the next release :)

best regards,
Christoph


[dnsdist] prometheus values queries-per-connection and connection-duration always 0 for DoH?

2020-11-14 Thread Christoph via dnsdist
Hi,

while creating a dashboard for dnsdist prometheus metrics
we noticed that the following values are always 0 in case of DoH,
in case of DoT they appear to work fine:

dnsdist_frontend_tcpavgqueriesperconnection
dnsdist_frontend_tcpavgconnectionduration

We do use DoH and there are ongoing DoH queries.

dnsdist version: 1.5.1

Is anyone successfully seeing non-zero values in these metrics for DoH
or is this a bug?

thanks,
Christoph


[dnsdist] grafana dashboard for dnsdist? (incl. DoH, DoT)

2020-11-14 Thread Christoph via dnsdist
Hi,

I was wondering if there are any pre-existing grafana dashboards for
dnsdist prometheus metrics?

I didn't find anything current and dnsdist related at the usual place:
https://grafana.com/grafana/dashboards?search=dnsdist


found on github but older:
https://gist.github.com/mrlesmithjr/54e0dd24417337bd2509f212c6c72545
https://github.com/PowerDNS/grafana-metronome/tree/master/dashboards


thanks,
Christoph





Re: [dnsdist] how to increase connection qlen on DoH listener?

2020-03-30 Thread Christoph via dnsdist
> please open a feature request [1] if you
> think it's worth it.

thanks for considering this
https://github.com/PowerDNS/pdns/issues/8986



>> Reading 
>> https://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html
>> I would expect that you want to increase kern.ipc.soacceptqueue
>>
>>  -Otto
> https://docs.freebsd.org/doc/12.1-RELEASE/usr/local/share/doc/freebsd/en/books/handbook/configtuning-kernel-limits.html
> 
> confirms that that is very likely the proper sysctl for your version,

They are the same setting but as Remi said it is not supported by dnsdist.

from listen(2):
 The kern.ipc.somaxconn sysctl(3) has been replaced with
 kern.ipc.soacceptqueue in FreeBSD 10.0 to prevent confusion about
 its actual functionality.  The original sysctl(3) kern.ipc.somaxconn
 is still available but hidden from a sysctl(3) -a output so that
 existing applications and scripts continue to work.



Re: [dnsdist] how to increase connection qlen on DoH listener?

2020-03-29 Thread Christoph via dnsdist
I also tried:
setMaxTCPQueuedConnections(2048)

from:
https://dnsdist.org/reference/tuning.html

but it had no effect on the netstat -Lan output
after restarting dnsdist.


[dnsdist] how to increase connection qlen on DoH listener?

2020-03-29 Thread Christoph via dnsdist
Hi,

due to log entries saying:
"Listen queue overflow: 193 already in queue awaiting acceptance"
we increased
kern.ipc.somaxconn to 2048


after restarting dnsdist we noticed that while nginx takes
the new setting into account dnsdist remains at 128:

netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen
tcp4  0/0/128  <<< dnsdist
tcp4  5/0/2048 <<< nginx


Is there a way to tell dnsdist to increase the connection queue on the
DoH listener?

I did not see anything like that in the documentation:
https://dnsdist.org/reference/config.html?highlight=adddohlocal#addDOHLocal


This is on FreeBSD 12.1 with dnsdist v1.4.0

thanks,
Christoph


refs:

kern.ipc.somaxconn: Maximum listen socket pending connection accept
queue size

from FreeBSD netstat(1) manual page:
-L  Show the size of the various listen queues.  The first
count shows the number of unaccepted connections, the
second count shows the amount of unaccepted incomplete
connections, and the third count is the maximum number of
queued connections.
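
To keep an eye on this, the three counts can be pulled out of the netstat -Lan output with a few lines of Python. This is only a sketch: it assumes each listener line starts with the protocol followed by qlen/incqlen/maxqlen as in the output above, and the 80% threshold and all names are arbitrary choices for illustration.

```python
import re
from typing import List, NamedTuple


class ListenQueue(NamedTuple):
    proto: str
    qlen: int      # unaccepted connections
    incqlen: int   # unaccepted incomplete connections
    maxqlen: int   # maximum number of queued connections


# Matches lines such as "tcp4  5/0/2048 ..."; header lines do not match.
_LINE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\b")


def parse_listen_queues(output: str) -> List[ListenQueue]:
    """Extract proto and qlen/incqlen/maxqlen from `netstat -Lan` output."""
    queues = []
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            proto, qlen, incqlen, maxqlen = match.groups()
            queues.append(ListenQueue(proto, int(qlen), int(incqlen), int(maxqlen)))
    return queues


def nearly_full(queues: List[ListenQueue], threshold: float = 0.8) -> List[ListenQueue]:
    """Return the listeners whose accept queue is at or above the threshold."""
    return [q for q in queues if q.maxqlen and q.qlen / q.maxqlen >= threshold]


if __name__ == "__main__":
    sample = """Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen
tcp4  0/0/128  <<< dnsdist
tcp4  5/0/2048 <<< nginx
"""
    for queue in parse_listen_queues(sample):
        print(queue)
```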
