Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-19 Thread Christoph via dnsdist

Remi Gacogne via dnsdist:

>> thanks for the pointer, really looking forward to the dnsdist version
>> that has this solved.
>
> Sure, I expect to release 1.9.2 including this fix in the next couple
> of weeks.

thanks!

> Note that this metric (doh_http_version_queries) is incremented after
> doing some sanity checks but before actually parsing the DNS query, so
> unfortunately we cannot be sure these are valid DoH queries. At this
> point they could be bots. Can you check doh_version_status_responses for
> httpversion=1 and status=200 instead?

Thanks for pointing that out.
In our case these two graphs overlap very closely,
maybe because only requests using the correct hostname in the SNI
actually reach dnsdist in the first place.

>> So the practical solution to use dnsdist 1.9.0 with nghttp2 and
>> still support HTTP/1.1 clients is to use a webserver like nginx in
>> front of dnsdist?
>
> Yes, a reverse proxy like nginx or HAProxy might be the best option to
> keep HTTP/1.1 support at this point.

It turns out that nginx does not speak HTTP/2 to upstream servers,
but HAProxy does, according to its documentation.

> I'm afraid we are currently not increasing any counter in this exact
> case, I'll see what I can do about it.

Thanks, appreciated.

> You are correct, but in practice I am yet to see a DoH client using
> HTTP/1.1 in production.

It would be interesting to know how much non-HTTP/2 traffic large DoH
service providers see in practice; maybe I'm going to reach out on the
dns-operations mailing list.

> I just don't want to increase the
> code complexity and attack surface just to reply to crawlers..

Yes, that makes sense :)

best regards,
Christoph
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-19 Thread Remi Gacogne via dnsdist

Hi,

On 18/03/2024 22:00, Christoph via dnsdist wrote:
>> This might be related: https://github.com/PowerDNS/pdns/issues/13850,
>> not backported yet
>
> thanks for the pointer, really looking forward to the dnsdist version
> that has this solved.

Sure, I expect to release 1.9.2 including this fix in the next couple of weeks.

>> The new nghttp2 provider for
>> incoming DNS over HTTPS does not support HTTP/1.1. In 1.9.x it's
>> still possible to switch back to the legacy h2o provider but note
>> that it will likely go away in the next major version of DNSdist. In
>> our testing the lack of HTTP/1.1 support was not an issue for actual
>> DNS over HTTPS clients, with most of HTTP/1.1 queries coming from
>> crawlers/bots, but of course we will reconsider if you find out that
>> legitimate DoH clients are impacted.
>
> we see about 5-10% of non-version 2 DoH requests by looking at:
>
> sum by (version)
> (irate(dnsdist_frontend_doh_http_version_queries{job="$job"}[$__rate_interval]))

Note that this metric (doh_http_version_queries) is incremented after
doing some sanity checks but before actually parsing the DNS query, so
unfortunately we cannot be sure these are valid DoH queries. At this
point they could be bots. Can you check doh_version_status_responses for
httpversion=1 and status=200 instead?

> So the practical solution to use dnsdist 1.9.0 with nghttp2 and
> still support HTTP/1.1 clients is to use a webserver like nginx in front
> of dnsdist?

Yes, a reverse proxy like nginx or HAProxy might be the best option to
keep HTTP/1.1 support at this point.

> I expected an increase of this metric during our partial outage but
> this value did not increase, is this expected?
>
> irate(dnsdist_frontend_doh_version_status_responses{httpversion="1",status="400",job="$job"}[$__rate_interval])
>
> dnsdist_frontend_noncompliantqueries also didn't increase.
> Which value is expected to increase?

I'm afraid we are currently not increasing any counter in this exact
case, I'll see what I can do about it.

> btw:
> dnsdist's v1.9.0 answer to HTTP requests not using HTTP/2:
>
>> This server implements RFC 8484 - DNS Queries over HTTP, and
>> requires HTTP/2 in accordance with section 5.2 of the RFC.
>
> but RFC 8484 does not actually require HTTP/2, right?
>
> https://www.rfc-editor.org/rfc/rfc8484.html#section-5.2
>> 5.2.  HTTP/2
>>
>> HTTP/2 [RFC7540] is the minimum RECOMMENDED version of HTTP for use
>> with DoH.
>
> It is recommended but not a "MUST".

You are correct, but in practice I am yet to see a DoH client using
HTTP/1.1 in production. Bind 9, Unbound and Knot also only support DNS
over HTTP/2. That being said, I'm really open to implementing DNS over
HTTP/1.1 if it serves a real purpose, I just don't want to increase the
code complexity and attack surface just to reply to crawlers..


Best regards,
--
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-18 Thread Christoph via dnsdist

Otto Moerbeek wrote:
> This might be related: https://github.com/PowerDNS/pdns/issues/13850,
> not backported yet

thanks for the pointer, really looking forward to the dnsdist version
that has this solved.

Remi wrote:
> In addition to the issue mentioned by Otto, it might also be that the
> monitoring does not support HTTP/2.

yes, that appears to be the case: uptimerobot does not support HTTP/2 and
was affected, while our blackbox_exporter appears to support HTTP/2 and
was not affected.

> The new nghttp2 provider for
> incoming DNS over HTTPS does not support HTTP/1.1. In 1.9.x it's
> still possible to switch back to the legacy h2o provider but note
> that it will likely go away in the next major version of DNSdist. In
> our testing the lack of HTTP/1.1 support was not an issue for actual
> DNS over HTTPS clients, with most of HTTP/1.1 queries coming from
> crawlers/bots, but of course we will reconsider if you find out that
> legitimate DoH clients are impacted.


we see about 5-10% of non-version 2 DoH requests by looking at:

sum by (version)
(irate(dnsdist_frontend_doh_http_version_queries{job="$job"}[$__rate_interval]))
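
The expression above graphs per-version request rates. As a rough cross-check outside Grafana, the same share can be estimated from a single scrape of the raw counters. The following is only a sketch: the sample scrape text, its numbers, the frontend label and the assumption that the version label takes the values "1" and "2" are all made up for illustration, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; the numbers and label values are invented.
SAMPLE = '''\
# Hypothetical scrape output, not real data.
dnsdist_frontend_doh_http_version_queries{frontend="0.0.0.0:444",version="1"} 80
dnsdist_frontend_doh_http_version_queries{frontend="0.0.0.0:444",version="2"} 920
dnsdist_frontend_doh_http_version_queries{frontend="[::]:444",version="1"} 20
dnsdist_frontend_doh_http_version_queries{frontend="[::]:444",version="2"} 980
'''

# One sample line: metric name, {labels}, value.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text, metric, label):
    """Sum the values of `metric` grouped by `label`, like `sum by (label)`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skips comment lines and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


def share_not_http2(totals):
    """Fraction of counted queries whose version label is not "2"."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return (total - totals.get("2", 0.0)) / total


totals = sum_by_label(SAMPLE, "dnsdist_frontend_doh_http_version_queries", "version")
share = share_not_http2(totals)
print(f"non-HTTP/2 share: {share:.1%}")
```

Note that these counters are cumulative, so a single scrape gives the share since dnsdist started rather than a current rate; comparing two scrapes would be closer to what irate() shows.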

So the practical solution to use dnsdist 1.9.0 with nghttp2 and
still support HTTP/1.1 clients is to use a webserver like nginx in front 
of dnsdist?


I expected an increase of this metric during our partial outage but
this value did not increase, is this expected?

irate(dnsdist_frontend_doh_version_status_responses{httpversion="1",status="400",job="$job"}[$__rate_interval])

dnsdist_frontend_noncompliantqueries also didn't increase.
Which value is expected to increase?


btw:
dnsdist's v1.9.0 answer to HTTP requests not using HTTP/2:

> This server implements RFC 8484 - DNS Queries over HTTP, and
> requires HTTP/2 in accordance with section 5.2 of the RFC.

but RFC 8484 does not actually require HTTP/2, right?

https://www.rfc-editor.org/rfc/rfc8484.html#section-5.2
> 5.2.  HTTP/2
>
> HTTP/2 [RFC7540] is the minimum RECOMMENDED version of HTTP for use
> with DoH.

It is recommended but not a "MUST".

best regards,
Christoph


Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-18 Thread Remi Gacogne via dnsdist

Hi Christoph,

In addition to the issue mentioned by Otto, it might also be that the 
monitoring does not support HTTP/2. The new nghttp2 provider for 
incoming DNS over HTTPS does not support HTTP/1.1. In 1.9.x it's still 
possible to switch back to the legacy h2o provider but note that it will 
likely go away in the next major version of DNSdist. In our testing the 
lack of HTTP/1.1 support was not an issue for actual DNS over HTTPS 
clients, with most of HTTP/1.1 queries coming from crawlers/bots, but of 
course we will reconsider if you find out that legitimate DoH clients 
are impacted.
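
One way to check whether a monitoring probe's lack of HTTP/2 support is the cause is to see which ALPN protocol a DoH frontend negotiates when the client offers only h2 or only http/1.1. The sketch below uses Python's standard ssl module; the function names are illustrative, and how dnsdist reacts to an http/1.1-only offer at the TLS level is not stated in this thread, so treat the result as a hint rather than proof.

```python
import socket
import ssl
from typing import List, Optional


def alpn_context(protocols: List[str]) -> ssl.SSLContext:
    """TLS client context that offers only the given ALPN protocols."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(protocols)
    return ctx


def negotiated_protocol(host: str, port: int, protocols: List[str],
                        timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server selected.

    Returns None if the server selected no protocol or aborted the
    TLS handshake; connection errors are left to the caller.
    """
    ctx = alpn_context(protocols)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.selected_alpn_protocol()
    except ssl.SSLError:
        return None
```

For example, comparing negotiated_protocol("doh.example.net", 443, ["h2"]) with negotiated_protocol("doh.example.net", 443, ["http/1.1"]) (the host name is a placeholder) shows whether an HTTP/1.1-only client gets a different result from an HTTP/2-capable one.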


Best regards,

Remi

On 17/03/2024 19:12, Otto Moerbeek via dnsdist wrote:
> On Sun, Mar 17, 2024 at 06:41:13PM +0100, Christoph via dnsdist wrote:
>
>> Hi,
>>
>> in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
>> and did not notice any problems, so we upgraded our production server
>> from 1.8.3 to 1.9.0 yesterday.
>>
>> Immediately after upgrading, our monitoring claimed our DoH service was
>> unavailable (HTTP 400), but we were unable to reproduce it using Firefox.
>>
>> A closer look confirmed that there is some issue, because we see about 50%
>> fewer DoH requests in our Grafana graphs showing DoH request rates.
>>
>> Having a look at the request rates per HTTP method suggests that we lose
>> almost all GET requests but also a significant fraction of POST DoH
>> requests.
>>
>> sum by (method)
>> (irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))
>>
>> After looking at the TLS versions graph I noticed a clear correlation
>> but then I realized that all our DoH requests are TLS version 1.3
>> because we set minTLSVersion='tls1.3' - so this might be irrelevant.
>>
>> irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])
>>
>> 2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
>> 2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
>> monitoring requests this:
>> https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
>> Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208:
>> exited on signal 11 -> also interesting but likely unrelated?
>>
>> Today we downgraded to 1.8.3, and everything went back to normal.
>>
>> Is anyone else observing similar issues on dnsdist 1.9.0?
>>
>> DoT does not appear to be affected.
>>
>> best regards,
>> Christoph
>>
>> OS: FreeBSD 13.2
>> dnsdist installed via pkg
>>
>> our dnsdist config:
>>
>> newServer({address="109.70.100.136", maxInFlight=1000, sockets=32,
>> name="clamps"})
>> newServer({address="109.70.100.140", maxInFlight=1000, sockets=32,
>> name="roberto"})
>> --newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
>> setServerPolicy(leastOutstanding)
>>
>> addTLSLocal("0.0.0.0",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
>> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
>> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
>> addTLSLocal("[::]",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
>> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
>> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
>>
>> addDOHLocal("0.0.0.0:444",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
>> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
>> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
>> addDOHLocal("[::]:444",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
>> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
>> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
>> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
>>
>> setACL({'0.0.0.0/0', '::/0'})
>> controlSocket('127.0.0.1:5199')
>> setConsoleACL('127.0.0.1/8')
>>
>> setKey()
>>
>> pc = newPacketCache(5, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60,
>> staleTTL=60, dontAge=false})
>> getPool(""):setCache(pc)
>>
>> webserver("127.0.0.1:8083")
>> setWebserverConfig({...})
>> setVerboseHealthChecks(true)
>> addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))
>
> This might be related: https://github.com/PowerDNS/pdns/issues/13850,
> not backported yet
>
> -Otto



Re: [dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-17 Thread Otto Moerbeek via dnsdist

On Sun, Mar 17, 2024 at 06:41:13PM +0100, Christoph via dnsdist wrote:

> Hi,
> 
> in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
> and did not notice any problems, so we upgraded our production server
> from 1.8.3 to 1.9.0 yesterday.
>
> Immediately after upgrading, our monitoring claimed our DoH service was
> unavailable (HTTP 400), but we were unable to reproduce it using Firefox.
>
> A closer look confirmed that there is some issue, because we see about 50%
> fewer DoH requests in our Grafana graphs showing DoH request rates.
>
> Having a look at the request rates per HTTP method suggests that we lose
> almost all GET requests but also a significant fraction of POST DoH
> requests.
> 
> sum by (method) 
> (irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))
> 
> After looking at the TLS versions graph I noticed a clear correlation
> but then I realized that all our DoH requests are TLS version 1.3
> because we set minTLSVersion='tls1.3' - so this might be irrelevant.
> 
> irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])
> 
> 2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
> 2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
> monitoring requests this: 
> https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
> Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208:
> exited on signal 11 -> also interesting but likely unrelated?
> 
> Today we downgraded to 1.8.3, and everything went back to normal.
> 
> Is anyone else observing similar issues on dnsdist 1.9.0?
> 
> DoT does not appear to be affected.
> 
> best regards,
> Christoph
> 
> OS: FreeBSD 13.2
> dnsdist installed via pkg
> 
> our dnsdist config:
> 
> newServer({address="109.70.100.136", maxInFlight=1000, sockets=32,
> name="clamps"})
> newServer({address="109.70.100.140", maxInFlight=1000, sockets=32,
> name="roberto"})
> --newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
> setServerPolicy(leastOutstanding)
> 
> addTLSLocal("0.0.0.0",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> addTLSLocal("[::]",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> 
> addDOHLocal("0.0.0.0:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> addDOHLocal("[::]:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> 
> setACL({'0.0.0.0/0', '::/0'})
> controlSocket('127.0.0.1:5199')
> setConsoleACL('127.0.0.1/8')
> 
> setKey()
> 
> pc = newPacketCache(5, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60,
> staleTTL=60, dontAge=false})
> getPool(""):setCache(pc)
> 
> webserver("127.0.0.1:8083")
> setWebserverConfig({...})
> setVerboseHealthChecks(true)
> addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))


This might be related: https://github.com/PowerDNS/pdns/issues/13850,
not backported yet

-Otto



[dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

2024-03-17 Thread Christoph via dnsdist

Hi,

in February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
and did not notice any problems, so we upgraded our production server
from 1.8.3 to 1.9.0 yesterday.

Immediately after upgrading, our monitoring claimed our DoH service was
unavailable (HTTP 400), but we were unable to reproduce it using Firefox.

A closer look confirmed that there is some issue, because we see about
50% fewer DoH requests in our Grafana graphs showing DoH request rates.

Having a look at the request rates per HTTP method suggests that we
lose almost all GET requests but also a significant fraction of POST
DoH requests.


sum by (method) 
(irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))


After looking at the TLS versions graph I noticed a clear correlation
but then I realized that all our DoH requests are TLS version 1.3
because we set minTLSVersion='tls1.3' - so this might be irrelevant.

irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])

2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
monitoring requests this: 
https://doh.applied-privacy.net/query?dns=l1sBAAABA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid
208: exited on signal 11 -> also interesting but likely unrelated?
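
For reference, the dns= parameter in a DoH GET request such as the monitoring URL above carries a DNS message in wire format, base64url-encoded without padding (RFC 8484, section 4.1). A minimal sketch of building such a URL is shown here; the host doh.example.net, the queried name and the helper names are illustrative assumptions and not part of the setup described in this mail.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int, query_id: int = 0) -> bytes:
    """Encode a single-question DNS query in wire format (RFC 1035).

    RFC 8484 suggests a DNS ID of 0 so that HTTP caches can share answers.
    """
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by the root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def doh_get_url(base_url: str, wire: bytes) -> str:
    """Build an RFC 8484 GET URL: base64url without '=' padding."""
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{base_url}?dns={encoded}"


# Hypothetical endpoint and name; 28 is the AAAA record type.
query = build_dns_query("www.example.com", 28)
url = doh_get_url("https://doh.example.net/query", query)
print(url)
```

A probe built this way can be sent over both HTTP/1.1 and HTTP/2 to compare how the frontend answers each.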


Today we downgraded to 1.8.3, and everything went back to normal.

Is anyone else observing similar issues on dnsdist 1.9.0?

DoT does not appear to be affected.

best regards,
Christoph

OS: FreeBSD 13.2
dnsdist installed via pkg

our dnsdist config:

newServer({address="109.70.100.136", maxInFlight=1000, sockets=32, 
name="clamps"})
newServer({address="109.70.100.140", maxInFlight=1000, sockets=32, 
name="roberto"})

--newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
setServerPolicy(leastOutstanding)

addTLSLocal("0.0.0.0", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
addTLSLocal("[::]", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
{ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256', 
minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })


addDOHLocal("0.0.0.0:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
addDOHLocal("[::]:444", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt", 
"/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key", 
"/query", {minTLSVersion='tls1.3', serverTokens='doh', 
tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })


setACL({'0.0.0.0/0', '::/0'})
controlSocket('127.0.0.1:5199')
setConsoleACL('127.0.0.1/8')

setKey()

pc = newPacketCache(5, {maxTTL=86400, minTTL=3, 
temporaryFailureTTL=60, staleTTL=60, dontAge=false})

getPool(""):setCache(pc)

webserver("127.0.0.1:8083")
setWebserverConfig({...})
setVerboseHealthChecks(true)
addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))


