Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
On 9/10/20 3:40 PM, Christian Degenkolb wrote:
> what is a reasonable low value for udp-truncation-threshold? I tried with
> 900 and 600 (as low as half the default value) but found no improvements.

I use 1220, because the commonly recommended 1232 does not work for me over IPv6. Some months ago the network team forgot to configure fragment handling correctly on JunOS. As soon as I lowered udp-truncation-threshold, dhl.com and others started working immediately.

> Also I don't think this is a vmware.com problem since I have the same
> problem with multiple domains.

Another thing I noticed is that lightly used recursors perform badly, because they have to work through the whole chain from . to the zone's nameservers, including many extra queries for DNSSEC. "Lightly used" here means fewer than 10k queries/second. Please try to "preheat" your recursor and see what changes.

For use at home I've written https://github.com/miesi/DNS-Standheizung to keep all TLD nameservers with their A/AAAA/... records in the recursor's cache.

Cheers
Thomas

___
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users
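The "preheating" idea can be sketched in a few lines of Python. This is not the DNS-Standheizung tool linked above and makes no claim about how that tool works; it is a minimal illustration that assumes a recursor listening on 127.0.0.1:53 and a hand-picked list of TLDs. The function names build_query and preheat are made up for this sketch.

```python
import socket
import struct


def build_query(name, qtype=2, query_id=0x1234):
    """Encode a minimal DNS query (RD set, class IN). qtype 2 is NS."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def preheat(tlds, server="127.0.0.1", port=53, timeout=2.0):
    """Ask the recursor for the NS set of each TLD so the answers get cached.

    Returns a dict mapping each TLD to the response code, or None if no
    matching reply arrived (timeout or socket error).
    """
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for query_id, tld in enumerate(tlds, start=1):
            wire_id = query_id & 0xFFFF
            results[tld] = None
            try:
                sock.sendto(build_query(tld, query_id=wire_id), (server, port))
                while True:
                    reply, _ = sock.recvfrom(4096)
                    # Skip late replies that belong to an earlier query.
                    if len(reply) >= 12 and reply[:2] == struct.pack("!H", wire_id):
                        results[tld] = reply[3] & 0x0F
                        break
            except OSError:
                pass
    return results
```

Calling preheat(["com", "net", "org", "de"]) from cron every few minutes would keep those delegations warm; a fuller version would presumably also query the A and AAAA records of the returned nameservers.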
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
On Thu, Sep 10, 2020 at 03:40:54PM +0200, Christian Degenkolb via Pdns-users wrote:
> Hi Thomas,
>
> what is a reasonable low value for udp-truncation-threshold? I tried with
> 900 and 600 (as low as half the default value) but found no improvements.

Try edns-outgoing-bufsize; that is the setting that influences traffic between the recursor and the authoritative servers.

> Also I don't think this is a vmware.com problem since I have the same
> problem with multiple domains.

Yes, there are clear indications that your connectivity is hampered somewhere.

-Otto
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
Hi Thomas,

what is a reasonable low value for udp-truncation-threshold? I tried with 900 and 600 (as low as half the default value) but found no improvements.

Also I don't think this is a vmware.com problem since I have the same problem with multiple domains.

To illustrate, I found the tool dnsperf from https://www.dns-oarc.net/tools/dnsperf and created a query file with the list of 500 domains from https://moz.com/top500, see https://paste.ubuntu.com/p/DxGBqRvngv/

If I run the tool against my local resolver with a clean cache (even with udp-truncation-threshold=600) I get the following output.

# rec_control wipe-cache $
wiped 4154 records, 8 negative records, 500 packets
# ./dnsperf -d queryfile_top500_clean
DNS Performance Testing Tool
Version 2.3.4

[Status] Command line: dnsperf -d queryfile_top500_clean
[Status] Sending queries (to 127.0.0.1)
[Status] Started at: Thu Sep 10 15:29:26 2020
[Status] Stopping after 1 run through file
Warning: received a response with an unexpected (maybe timed out) id: 162
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         500
  Queries completed:    278 (55.60%)
  Queries lost:         222 (44.40%)

  Response codes:       NOERROR 209 (75.18%), SERVFAIL 69 (24.82%)
  Average packet size:  request 29, response 56
  Run time (s):         16.455935
  Queries per second:   16.893601

  Average Latency (s):  1.313376 (min 0.000543, max 4.491949)
  Latency StdDev (s):   1.446709

# ./dnsperf -d queryfile_top500_clean
DNS Performance Testing Tool
Version 2.3.4

[Status] Command line: dnsperf -d queryfile_top500_clean
[Status] Sending queries (to 127.0.0.1)
[Status] Started at: Thu Sep 10 15:29:49 2020
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         500
  Queries completed:    500 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 281 (56.20%), SERVFAIL 219 (43.80%)
  Average packet size:  request 29, response 50
  Run time (s):         4.571526
  Queries per second:   109.372669

  Average Latency (s):  0.015253 (min 0.54, max 4.556146)
  Latency StdDev (s):   0.244755

As I see it, far too many queries are lost without a filled cache, and the SERVFAIL rate is far too high for this kind of domain, even on retries. The SERVFAIL rate stays high on subsequent runs.

Whereas if I run it against 1.1.1.1 (or the hoster's DNS server) I get the following output.

# ./dnsperf -d queryfile_top500_clean -s 1.1.1.1
DNS Performance Testing Tool
Version 2.3.4

[Status] Command line: dnsperf -d queryfile_top500_clean -s 1.1.1.1
[Status] Sending queries (to 1.1.1.1)
[Status] Started at: Thu Sep 10 15:33:24 2020
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         500
  Queries completed:    500 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 499 (99.80%), SERVFAIL 1 (0.20%)
  Average packet size:  request 29, response 77
  Run time (s):         0.882704
  Queries per second:   566.441299

  Average Latency (s):  0.013521 (min 0.005065, max 0.863349)
  Latency StdDev (s):   0.054510

A near perfect score.

Doesn't this mean the problem lies within the local resolver, since dnsperf makes the same requests to the external DNS server that the local resolver would make? Or at least that there is no uplink problem, but something local to my server?

regards
Chris

On 2020-09-09 10:05, Thomas Mieslinger via Pdns-users wrote:
> Hi Christian,
>
> Hetzner might filter ip fragments. Please try if your situation gets
> better if you set udp-truncation-threshold to a reasonable low value.
>
> By default pdns-recursor does dnssec. I would like to suggest to set
> +dnssec on your dig queries.
>
> A possible workaround for the vmware.com problems is to add a negative
> trust anchor for vmware.com. in pdns config.
>
> Cheers Thomas
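To compare dnsperf runs like the ones above without reading the percentages by eye, the summary block can be parsed and reduced to one figure: the share of sent queries that came back as NOERROR. The sketch below only assumes the summary layout shown in this thread (dnsperf 2.3.4); parse_dnsperf is a hypothetical helper name, not part of dnsperf.

```python
import re


def parse_dnsperf(summary):
    """Extract the main counters from the text of a dnsperf statistics block."""
    def counter(label):
        return int(re.search(label + r":\s*(\d+)", summary).group(1))

    sent = counter("Queries sent")
    codes_line = re.search(r"Response codes:\s*(.*)", summary).group(1)
    # e.g. "NOERROR 209 (75.18%), SERVFAIL 69 (24.82%)"
    rcodes = {name: int(n) for name, n in re.findall(r"([A-Z]+)\s+(\d+)", codes_line)}
    return {
        "sent": sent,
        "completed": counter("Queries completed"),
        "lost": counter("Queries lost"),
        "rcodes": rcodes,
        # dnsperf reports rcode percentages relative to completed queries;
        # this figure is relative to everything that was sent.
        "noerror_share": rcodes.get("NOERROR", 0) / sent,
    }
```

Applied to the three runs in this mail, the NOERROR share of sent queries is 209/500 = 41.8% (cold cache), 281/500 = 56.2% (second run) and 499/500 = 99.8% (1.1.1.1), which makes the gap more visible than the per-run percentages do.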
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
Hi Christian,

Hetzner might filter IP fragments. Please check whether your situation improves if you set udp-truncation-threshold to a reasonably low value.

By default pdns-recursor does DNSSEC. I would suggest setting +dnssec on your dig queries.

A possible workaround for the vmware.com problems is to add a negative trust anchor for vmware.com. in the pdns config.

Cheers
Thomas

On 9/8/20 2:16 PM, Christian Degenkolb via Pdns-users wrote:
> Hi,
>
> I set the trace=yes option in the recursor config and redid the tests
> for pubs.vmware.com.
>
> The log can be found here https://paste.debian.net/hidden/07526601/
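One way to test the fragment-filtering suspicion directly is to send the same question to an authoritative server with different advertised EDNS buffer sizes and the DO bit set, then see whether a reply arrives and whether it is truncated. The sketch below is an assumption-laden illustration, not something taken from PowerDNS: build_edns_query and probe are made-up names, and the server, name and sizes to test are up to the reader.

```python
import socket
import struct


def build_edns_query(name, qtype=48, bufsize=1232, dnssec_ok=True, query_id=0x4242):
    """Encode a non-recursive DNS query carrying an EDNS0 OPT record.

    bufsize is the advertised UDP payload size; qtype 48 is DNSKEY.
    """
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    # OPT RR: root name, type 41, class = payload size, TTL carries the DO flag.
    ttl = 0x00008000 if dnssec_ok else 0
    opt = struct.pack("!BHHIH", 0, 41, bufsize, ttl, 0)
    return header + question + opt


def probe(server, name, bufsize, qtype=48, timeout=2.0):
    """Return reply size and TC flag, or None if nothing came back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_edns_query(name, qtype, bufsize), (server, 53))
            reply, _ = sock.recvfrom(65535)
        except OSError:
            return None
    flags = struct.unpack("!H", reply[2:4])[0]
    return {"size": len(reply), "truncated": bool(flags & 0x0200)}
```

If, for example, probe(server, "com", 4096) returns None while probe(server, "com", 1220) returns a truncated or small reply, large UDP responses are probably being dropped on the path, which would be consistent with filtered fragments.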
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
Hi,

I set the trace=yes option in the recursor config and redid the tests for pubs.vmware.com.

The log can be found here https://paste.debian.net/hidden/07526601/

I found two timeouts in the logs.

Line 41:

Sep 8 10:21:54 rho pdns_recursor[25208]: [3] pubs.vmware.com: Resolved 'vmware.com' NS ns01.vmwdns.com to: 45.54.11.1
Sep 8 10:21:54 rho pdns_recursor[25208]: [3] pubs.vmware.com: Trying IP 45.54.11.1:53, asking 'pubs.vmware.com|A'
Sep 8 10:21:56 rho pdns_recursor[25208]: [3] pubs.vmware.com: timeout resolving after 1501.63msec
Sep 8 10:21:56 rho pdns_recursor[25208]: [3] pubs.vmware.com: Trying to resolve NS 'ns04.vmwdns.com' (2/8)

But a request to 45.54.11.1 for pubs.vmware.com comes back within 11 msec.

$ dig -t A @45.54.11.1 pubs.vmware.com

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @45.54.11.1 pubs.vmware.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24122
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pubs.vmware.com. IN A

;; ANSWER SECTION:
pubs.vmware.com. 30 IN CNAME pubs.vmware.com.ds.edgekey.net.

;; Query time: 11 msec
;; SERVER: 45.54.11.1#53(45.54.11.1)
;; WHEN: Tue Sep 08 13:29:57 CEST 2020
;; MSG SIZE rcvd: 88

and a second timeout in line 159:

Sep 8 10:21:56 rho pdns_recursor[25208]: [3] e751.dscx.akamaiedge.net: Trying IP 2.16.106.23:53, asking 'e751.dscx.akamaiedge.net|A'
Sep 8 10:21:57 rho pdns_recursor[25208]: [3] e751.dscx.akamaiedge.net: timeout resolving after 1501.74msec
Sep 8 10:21:57 rho pdns_recursor[25208]: [3] e751.dscx.akamaiedge.net: Trying to resolve NS 'n3dscx.akamaiedge.net' (2/8)

Same picture here, with a very good response time on a direct query.

$ dig -t A @2.16.106.23 e751.dscx.akamaiedge.net

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @2.16.106.23 e751.dscx.akamaiedge.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7947
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;e751.dscx.akamaiedge.net. IN A

;; ANSWER SECTION:
e751.dscx.akamaiedge.net. 20 IN A 104.111.214.47

;; Query time: 5 msec
;; SERVER: 2.16.106.23#53(2.16.106.23)
;; WHEN: Tue Sep 08 13:31:32 CEST 2020
;; MSG SIZE rcvd: 69

To check that this is not a vmware.com problem I tested some more domains and got the same timeouts. One more example:

$ dig nameservers.dnscheck.co @127.0.0.1

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> nameservers.dnscheck.co @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 23852
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nameservers.dnscheck.co. IN A

;; Query time: 3005 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Sep 08 12:15:29 CEST 2020
;; MSG SIZE rcvd: 52

The log can be found here https://paste.debian.net/hidden/b48a78a2/.

This time there are multiple timeouts involving the root name servers, for example g.root-servers.net:

Sep 8 12:15:21 rho pdns_recursor[25208]: [50] nameservers.dnscheck.co: Resolved '.' NS g.root-servers.net to: 192.112.36.4
Sep 8 12:15:21 rho pdns_recursor[25208]: [50] nameservers.dnscheck.co: Trying IP 192.112.36.4:53, asking 'nameservers.dnscheck.co|A'
Sep 8 12:15:22 rho pdns_recursor[25208]: [50] nameservers.dnscheck.co: timeout resolving after 1501.63msec
Sep 8 12:15:22 rho pdns_recursor[25208]: [50] nameservers.dnscheck.co: Trying to resolve NS 'j.root-servers.net' (2/13)

Whereas a direct request via dig works like a charm.

$ dig -t A @192.112.36.4 nameservers.dnscheck.co

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @192.112.36.4 nameservers.dnscheck.co
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18641
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 13
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: ce9eaf15bb34977b41354b5f5f576c3841785bfba5901e93 (good)
;; QUESTION SECTION:
;nameservers.dnscheck.co. IN A

;; AUTHORITY SECTION:
co. 172800 IN NS ns5.cctld.co.
co. 172800 IN NS ns1.cctld.co.
co. 172800 IN NS ns6.cctld.co.
co. 172800 IN NS ns4.cctld.co.
co. 172800 IN NS ns3.cctld.co.
co. 172800 IN NS ns2.cctld.co.

;; ADDITIONAL SECTION:
ns1.cctld.co. 172800 IN A 156.154.100.25
ns2.cctld.co. 172800 IN A 156.154.101.25
ns3.cctld.co. 172800 IN A 156.154.102.25
ns4.cctld.co. 172800 IN A 156.154.103.25
ns5.cctld.co. 172800 IN A 156.154.104.25
ns6.cctld.co. 172800 IN A 156.154.105.25
ns1.cctld.co. 172800 IN AAAA 2001:502:2eda::21
ns2.cctld.co. 172800
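Rather than picking timeouts out of the trace by hand, the "Trying IP ..." and "timeout resolving after ..." pairs can be matched up and counted per authoritative server. The sketch below assumes only the trace line format visible in the excerpts above; timeouts_per_server is a hypothetical helper, and other recursor versions may log differently.

```python
import re
from collections import Counter

# "[3] pubs.vmware.com: Trying IP 45.54.11.1:53, asking 'pubs.vmware.com|A'"
TRY_RE = re.compile(r"\[(\d+)\] (\S+): Trying IP (\S+):(\d+), asking")
# "[3] pubs.vmware.com: timeout resolving after 1501.63msec"
TIMEOUT_RE = re.compile(r"\[(\d+)\] (\S+): timeout resolving after ([\d.]+)msec")


def timeouts_per_server(lines):
    """Count timeouts per server IP in pdns_recursor trace output."""
    last_ip = {}  # (trace tag, query name) -> IP most recently tried
    counts = Counter()
    for line in lines:
        tried = TRY_RE.search(line)
        if tried:
            key = (tried.group(1), tried.group(2))
            last_ip[key] = tried.group(3).strip("[]")
            continue
        timed_out = TIMEOUT_RE.search(line)
        if timed_out:
            ip = last_ip.get((timed_out.group(1), timed_out.group(2)))
            if ip is not None:
                counts[ip] += 1
    return counts
```

Fed with the full log (for example via open("/var/log/syslog")), counts.most_common() shows whether the timeouts cluster on a few servers or are spread across everything, which is what the root-server example above suggests.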
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
On Tue, Sep 08, 2020 at 09:22:31AM +0200, Christian Degenkolb wrote:
> (sent again, the first answer was not cc'ed to the ML)
>
> Hi,
>
> sorry for not sending any configs. pdns_recursor runs more or less with the
> vanilla config with the following changes:
>
> forward-zones-recurse=zen.spamhaus.org=1.1.1.1;1.0.0.1 (that's why I wanted
> to use the local recursor; as mentioned, the server is located in the Hetzner
> IP range, which apparently is blocked for the Spamhaus DNSBL)
> loglevel=6
> log-common-errors=yes
> quiet=no
> root-nx-trust=no (found this as a solution for the SERVFAIL but it did not
> work)
>
> and
> # rec_control set-carbon-server 37.252.122.50 rho-test (for the graphs)
>
> A trace for the same resolves from my last mail:
>
> $ time dig +trace pubs.vmware.com @127.0.0.1
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
(sent again, the first answer was not cc'ed to the ML)

Hi,

sorry for not sending any configs. pdns_recursor runs more or less with the vanilla config, with the following changes:

forward-zones-recurse=zen.spamhaus.org=1.1.1.1;1.0.0.1 (that's why I wanted to use the local recursor; as mentioned, the server is located in the Hetzner IP range, which apparently is blocked for the Spamhaus DNSBL)
loglevel=6
log-common-errors=yes
quiet=no
root-nx-trust=no (found this as a solution for the SERVFAIL but it did not work)

and

# rec_control set-carbon-server 37.252.122.50 rho-test (for the graphs)

A trace for the same resolves from my last mail:

$ time dig +trace pubs.vmware.com @127.0.0.1

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> +trace pubs.vmware.com @127.0.0.1
;; global options: +cmd
. 86118 IN NS d.root-servers.net.
. 86118 IN NS c.root-servers.net.
. 86118 IN NS l.root-servers.net.
. 86118 IN NS b.root-servers.net.
. 86118 IN NS f.root-servers.net.
. 86118 IN NS m.root-servers.net.
. 86118 IN NS e.root-servers.net.
. 86118 IN NS a.root-servers.net.
. 86118 IN NS i.root-servers.net.
. 86118 IN NS k.root-servers.net.
. 86118 IN NS g.root-servers.net.
. 86118 IN NS h.root-servers.net.
. 86118 IN NS j.root-servers.net.
. 86118 IN RRSIG NS 8 0 518400 2020092105 2020090804 46594 . wgnBz8tKA9hjwIxmMQgTVwnZaiUpAB9a1+oC5T/syHzqNj1e5qhApLQN NLok43hu5Ykt8RFe/IiDZuYxIdyyzItwk 4QN8xNgsQsfhVfBbZ26bWRz fskquwnFn6Gmvq2qI6o42tsBxXUw09X4sNlNYI2zHB3sKaaMu0AbN9WI Pe14jpX/PwaP3m78+XqMy9CiKmuDon6g3BuyecPhCZL5Pa8ZPC7nrKfV pfyNSiPoBODsJE96UHGlOCJTFcbu/6Ia4ek3AGOJf+WC84HPrxLT riyk XHfbPl7EjTbFSPgT8D7jGBfVCTQU3JSfynv29VFAHWZu1gm5VJWNQGaw u5gatA==
;; Received 540 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 2020092105 2020090804 46594 . zz85z6R/YUHxyW+ywA6zrgiYILjPo0i248M3wU+2XCRCneBH6yknQfjM LIcbo3vADVUlkJd0l4W2TLd7NPgC255hr2 +ALojzzHa07jyFmE203Kdw ma7XL0C55TdFrCEMhARkZf4EncfJH9JH+fdWRWdMr0EQZd1A+FzMYemO o7/L/8ZYq4FOt0vz+zheAJNDveGii+QpXAoDyw4xt3HMUVM+40Z/VgD1 tk9Y3K9e2wwRNISeHdlq21JFVA2SY/gDgPCzBtM1r9Yz7oFZ2ld5W AD0 P84GPEUMgUceAGofwxlV9+dSawhunskb+yVrpdjpizLageyJRWEu/F9A zDXxew==
;; Received 1175 bytes from 198.97.190.53#53(h.root-servers.net) in 5 ms

vmware.com. 172800 IN NS dns1.p05.nsone.net.
vmware.com. 172800 IN NS dns2.p05.nsone.net.
vmware.com. 172800 IN NS dns3.p05.nsone.net.
vmware.com. 172800 IN NS dns4.p05.nsone.net.
vmware.com. 172800 IN NS ns01.vmwdns.com.
vmware.com. 172800 IN NS ns02.vmwdns.com.
vmware.com. 172800 IN NS ns03.vmwdns.com.
vmware.com. 172800 IN NS ns04.vmwdns.com.
vmware.com. 86400 IN DS 48553 13 2 AA2C697F3990472642AF01509A18224828E403CA8608EC75D5C83002 CE21847E
vmware.com. 86400 IN RRSIG DS 8 2 86400 20200915062203 20200908051203 24966 com. FA2xsJKvT2LLn5UEy7hAE7PaYmds7FBkQB0SGhm8riwJRKnxbHAY0tvv I1T/k0EzXJ4wy1J5qzNLMjhzFgPxEQB 6BwBfJm8qo8Cnzxm4YC5Ko1/9 pDWooVBHoFfMmJgu14Dk+u1AcHobxH9pPs7az16cLK/3YeaFW3dCrIVQ NK2fZc0d/pc7CY0Zl1LjYQdTq+MsZiL2kbepEHD6A/4J6g==
;; Received 523 bytes from 2001:503:eea3::30#53(g.gtld-servers.net) in 6 ms

pubs.vmware.com. 30 IN CNAME pubs.vmware.com.ds.edgekey.net.
pubs.vmware.com. 30 IN RRSIG CNAME 13 3 30 20200909071011 20200907071011 12752
Re: [Pdns-users] Slow query and SERVERFAIL from local pdns_recursor
On Wed, Sep 02, 2020 at 09:44:37AM +0200, Christian Degenkolb via Pdns-users wrote:
> Hi,
>
> I hope somebody on the ML can help me figure out what I'm doing wrong.
> I have a local pdns_recursor (version 4.1.11-1+deb10u1 from Debian 10)
> running and added it at the top of my /etc/resolv.conf as 127.0.0.1.
>
> However, I see some strange SERVFAIL resolves happening and, all in all, a
> slow DNS system.
>
> For example, see the following two consecutive resolves and a direct request
> to the NS.
> The first one takes nearly 3 seconds vs 11 ms from the same system if I
> query the NS directly.
>
> $ dig pubs.vmware.com @127.0.0.1
>
> ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com @127.0.0.1
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4929
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;pubs.vmware.com. IN A
>
> ;; ANSWER SECTION:
> pubs.vmware.com. 30 IN CNAME pubs.vmware.com.ds.edgekey.net.
> pubs.vmware.com.ds.edgekey.net. 10 IN CNAME e751.dscx.akamaiedge.net.
>
> ;; Query time: 3009 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Wed Sep 02 09:19:04 CEST 2020
> ;; MSG SIZE rcvd: 123
>
> $ dig pubs.vmware.com @127.0.0.1
>
> ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com @127.0.0.1
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1345
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;pubs.vmware.com. IN A
>
> ;; ANSWER SECTION:
> pubs.vmware.com. 18 IN CNAME pubs.vmware.com.ds.edgekey.net.
> pubs.vmware.com.ds.edgekey.net. 4 IN CNAME e751.dscx.akamaiedge.net.
> e751.dscx.akamaiedge.net. 16 IN A 104.111.214.47
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Wed Sep 02 09:19:08 CEST 2020
> ;; MSG SIZE rcvd: 139
>
> $ dig pubs.vmware.com @ns03.vmwdns.com
>
> ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com @ns03.vmwdns.com
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5509
> ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> ;; WARNING: recursion requested but not available
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;pubs.vmware.com. IN A
>
> ;; ANSWER SECTION:
> pubs.vmware.com. 30 IN CNAME pubs.vmware.com.ds.edgekey.net.
>
> ;; Query time: 11 msec
> ;; SERVER: 45.54.11.129#53(45.54.11.129)
> ;; WHEN: Wed Sep 02 09:34:42 CEST 2020
> ;; MSG SIZE rcvd: 88
>
> Also, I have a number of SERVFAILs in /var/log/syslog (pdns_recursor is
> currently running with loglevel=6).
> For example:
>
> Sep 2 08:45:35 rho pdns_recursor[19311]: Sending SERVFAIL to 127.0.0.1
> during resolve of 'pubs.vmware.com' because: Too much time waiting for
> pubs.vmware.com.ds.edgekey.net|A, timeouts: 5,
> throttles: 1, queries: 6, 7991msec
>
> # grep 'Too much time waiting for' /var/log/syslog | wc -l
> 184
>
> As per https://blog.powerdns.com/2014/12/11/powerdns-graphing-as-a-service/
> I send the metrics to
> https://metronome1.powerdns.com/?server=pdns.rho-test.recursor=-172800
>
> Does anybody have an idea what's wrong? This seems way too slow for DNS and
> the SERVFAIL shouldn't happen this often.
> The server in question is running in a DC of the German hoster hetzner.de.
> Besides the strange DNS behaviour I don't have any problems with the
> reliability of the network connection.
>
> thanks
> Chris

You did not share any config or traces, so it's hard to tell.

A wild guess: it might be that you enabled IPv6 but your IPv6 connectivity is bad.

-Otto