Re: unbound NXDOMAIN TTL shared between records
Hi Patrik,

On 22/08/15 07:27, Patrik Lundin wrote:
> On Fri, Aug 21, 2015 at 11:13:34PM +0200, Wouter Wijngaards via Unbound-users wrote:
>> This is because the RRset cache is shared between answers. The SOA record is in that cache. When you query the second time, unbound detects that the SOA record has not changed, and therefore keeps timing out the existing SOA record. And then you get a lower TTL, of that SOA record, when you query again. This is because of cache update rules, which are complicated. We want to time out existing records, so that we look them up again when they expire. If the newer SOA record was different (i.e. contained different data), it would have been updated. These cache update rules are set to stop eg. cache poisoning, and the resolver sticking to an old nameserver after a nameserver change.
>
> Thanks for the explanation. Just knowing that this is by design and not due to me triggering some bug or memory starvation issue is comforting.
>
> One of the domains that were confusing me further was looking up stuff under google.se where the TTL would sometimes be shared and sometimes not. But now that I know what to look for I notice that there seem to be discrepancies in the SOA serial, below is an example of running +nssearch a few times in a row:
>
> ===
> $ dig +nssearch google.se
> SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
> SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
> SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns2.google.com in 24 ms.
> SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
> $ dig +nssearch google.se
> SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
> SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
> SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
> SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
> $ dig +nssearch google.se
> SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
> SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 10 ms.
> SOA ns4.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
> SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
> ===
>
> While on the topic of corner cases, was the TTL of 600 for a google.com NXDOMAIN (being a result of NOERROR for the NS hostnames) expected as well?

I think this may be an issue that is fixed in (the most recent) 1.5.4 release. It would have TTL 300 like it says in the rdata (because that is lower).

Best regards, Wouter

> ===
> $ dig nonexistant1.google.com
> ; <<>> DiG 9.4.2-P2 <<>> nonexistant1.google.com
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50243
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> ;; QUESTION SECTION:
> ;nonexistant1.google.com. IN A
> ;; AUTHORITY SECTION:
> google.com. 600 IN SOA ns1.google.com. dns-admin.google.com. 101273744 7200 1800 1209600 300
> ;; Query time: 621 msec
> ;; SERVER: 192.168.1.1#53(192.168.1.1)
> ;; WHEN: Sat Aug 22 07:24:13 2015
> ;; MSG SIZE rcvd: 91
> ===
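The serial mismatch visible in the +nssearch output quoted in this message can be checked mechanically. The following is a minimal Python sketch, assuming the output lines look exactly like the ones pasted in the thread. The helper name `soa_variants` and the regular expression are illustrative only and are not part of dig or unbound.

```python
import re
from collections import defaultdict

# Matches one line of `dig +nssearch` output in the shape pasted in this thread.
NSSEARCH_LINE = re.compile(
    r"SOA\s+(?P<mname>\S+)\s+(?P<rname>\S+)\s+(?P<serial>\d+)\s+"
    r"(?P<refresh>\d+)\s+(?P<retry>\d+)\s+(?P<expire>\d+)\s+(?P<minimum>\d+)\s+"
    r"from server\s+(?P<server>\S+)\s+in\s+\d+\s+ms\."
)


def soa_variants(nssearch_output):
    """Map each distinct SOA rdata tuple to the list of servers that returned it."""
    variants = defaultdict(list)
    for m in NSSEARCH_LINE.finditer(nssearch_output):
        rdata = (
            m["mname"], m["rname"], int(m["serial"]), int(m["refresh"]),
            int(m["retry"]), int(m["expire"]), int(m["minimum"]),
        )
        variants[rdata].append(m["server"])
    return dict(variants)


# The second +nssearch run from the message, copied verbatim.
SECOND_RUN = """\
SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
"""

variants = soa_variants(SECOND_RUN)
serials = sorted({rdata[2] for rdata in variants})
print(len(variants), "distinct SOA rdata values; serials seen:", serials)
```

For this run the four servers return four different rdata values, because both the serial and the primary-server field vary. By the explanation quoted above, a SOA with different data replaces the cached one, which would fit the TTL being shared only some of the time.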
Re: unbound NXDOMAIN TTL shared between records
On Fri, Aug 21, 2015 at 11:13:34PM +0200, Wouter Wijngaards via Unbound-users wrote:
> This is because the RRset cache is shared between answers. The SOA record is in that cache. When you query the second time, unbound detects that the SOA record has not changed, and therefore keeps timing out the existing SOA record. And then you get a lower TTL, of that SOA record, when you query again. This is because of cache update rules, which are complicated. We want to time out existing records, so that we look them up again when they expire. If the newer SOA record was different (i.e. contained different data), it would have been updated. These cache update rules are set to stop eg. cache poisoning, and the resolver sticking to an old nameserver after a nameserver change.

Thanks for the explanation. Just knowing that this is by design and not due to me triggering some bug or memory starvation issue is comforting.

One of the domains that were confusing me further was looking up stuff under google.se where the TTL would sometimes be shared and sometimes not. But now that I know what to look for I notice that there seem to be discrepancies in the SOA serial, below is an example of running +nssearch a few times in a row:

===
$ dig +nssearch google.se
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
$ dig +nssearch google.se
SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
$ dig +nssearch google.se
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 10 ms.
SOA ns4.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
===

While on the topic of corner cases, was the TTL of 600 for a google.com NXDOMAIN (being a result of NOERROR for the NS hostnames) expected as well?

===
$ dig nonexistant1.google.com
; <<>> DiG 9.4.2-P2 <<>> nonexistant1.google.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50243
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;nonexistant1.google.com. IN A
;; AUTHORITY SECTION:
google.com. 600 IN SOA ns1.google.com. dns-admin.google.com. 101273744 7200 1800 1209600 300
;; Query time: 621 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Aug 22 07:24:13 2015
;; MSG SIZE rcvd: 91
===

--
Patrik Lundin
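The cache behaviour Wouter describes (one SOA RRset shared by every negative answer in a zone, left to count down while its data is unchanged, replaced when the data differs) can be illustrated with a toy model. This is only a sketch under those stated assumptions: the class `ToyRRsetCache` and its methods are hypothetical and bear no relation to unbound's actual code or data structures, and its update rules are far simpler than the real ones.

```python
class ToyRRsetCache:
    """Toy model of a shared RRset cache. NOT unbound's real implementation."""

    def __init__(self):
        # (owner name, record type) -> (rdata, absolute expiry time in seconds)
        self._entries = {}

    def store(self, owner, rrtype, rdata, ttl, now):
        """Insert an RRset, keeping an unexpired entry whose data is identical."""
        key = (owner, rrtype)
        cached = self._entries.get(key)
        if cached is not None:
            cached_rdata, expires = cached
            if expires > now and cached_rdata == rdata:
                # Same data, still valid: keep the old expiry so the
                # existing record continues to time out.
                return
        # New, expired, or changed data: (re)start the countdown.
        self._entries[key] = (rdata, now + ttl)

    def remaining_ttl(self, owner, rrtype, now):
        """Return the TTL a client would see, or None if nothing valid is cached."""
        cached = self._entries.get((owner, rrtype))
        if cached is None or cached[1] <= now:
            return None
        return cached[1] - now


# Replay of the three unbound.net lookups from the original report,
# with time measured in seconds since the first query.
soa = ("ns.nlnetlabs.nl.", "postmaster.unbound.net.",
       2015081500, 28800, 7200, 604800, 18000)
cache = ToyRRsetCache()

cache.store("unbound.net.", "SOA", soa, ttl=7200, now=0)    # nonexistant1, first query
print(cache.remaining_ttl("unbound.net.", "SOA", now=5))    # 7195 (same name, 5 s later)

cache.store("unbound.net.", "SOA", soa, ttl=7200, now=11)   # nonexistant2, same SOA data
print(cache.remaining_ttl("unbound.net.", "SOA", now=11))   # 7189 (countdown is shared)

changed = soa[:2] + (2015081501,) + soa[3:]                 # hypothetical new serial
cache.store("unbound.net.", "SOA", changed, ttl=7200, now=20)
print(cache.remaining_ttl("unbound.net.", "SOA", now=20))   # 7200 (different data replaces)
```

The first two printed values match the 7195 and 7189 seen in the original report. The last step uses an invented serial purely to show the replacement case.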
Re: unbound NXDOMAIN TTL shared between records
Hi, Patrik

> which also suspiciously seems to use the SOA TTL of 7200 rather than the NXDOMAIN TTL of 18000

That is how the negative cache TTL is calculated (by the authoritative server). RFC 2308 reads: "(Negative cache) TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL."

Regards, Daisuke HIGASHI
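As a worked example of the RFC 2308 rule quoted in this message, here is a small Python sketch. The function name `negative_ttl` is made up for illustration; the input values are the ones reported elsewhere in this thread.

```python
def negative_ttl(soa_ttl, soa_minimum):
    """TTL an authoritative server puts on the SOA in a negative answer (RFC 2308)."""
    return min(soa_ttl, soa_minimum)


# unbound.net: SOA TTL 7200, SOA.MINIMUM 18000 (from the original report).
print(negative_ttl(7200, 18000))   # 7200

# google.com: SOA TTL 86400, SOA.MINIMUM 300 (values mentioned later in the thread).
print(negative_ttl(86400, 300))    # 300
```

This is why the first unbound.net lookup shows 7200 rather than 18000: the SOA's own TTL is the smaller of the two values.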
RE: unbound NXDOMAIN TTL shared between records
Hi Patrik,

Yes, I can confirm that unbound has domain-wide NXD caching. As long as the returned TTL for your second query is lower than the max TTL for the record, this (IMHO) is not a violation of RFC 2308. However, there are domains out there that return a higher TTL for EMPTY NOERROR vs NXDOMAIN, and this can trick unbound into caching the value longer than expected. This issue was reported to unbound.

Using the SOA TTL is expected, see RFC 2308 section 3: "The TTL of this record is set from the minimum of the MINIMUM field of the SOA record and the TTL of the SOA itself, and indicates how long a resolver may cache the negative answer."

For more info watch the video from the DNS OARC workshop in Amsterdam about 39 minutes in https://www.youtube.com/watch?v=UcAygzNSxlI

Thanks, Stephan Lagerholm

-Original Message-
From: Unbound-users [mailto:unbound-users-boun...@unbound.net] On Behalf Of Patrik Lundin via Unbound-users
Sent: Friday, August 21, 2015 8:15 AM
To: unbound-users@unbound.net
Subject: unbound NXDOMAIN TTL shared between records

Hello,

I recently noticed what to me is a strange caching behaviour for NXDOMAIN results. This has been seen both on Ubuntu 14.04 with unbound 1.4.22 and on OpenBSD with unbound 1.5.2.

I noticed that for some domains, the cache TTL for NXDOMAIN results seemed to be shared for all nonexistant replies under that domain:

The first lookup (which also suspiciously seems to use the SOA TTL of 7200 rather than the NXDOMAIN TTL of 18000):

===
$ dig nonexistant1.unbound.net
; <<>> DiG 9.4.2-P2 <<>> nonexistant1.unbound.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35933
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;nonexistant1.unbound.net. IN A
;; AUTHORITY SECTION:
unbound.net. 7200 IN SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
;; Query time: 474 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Aug 21 16:51:23 2015
;; MSG SIZE rcvd: 104
===

The second lookup for that same name, which as one would expect has a decremented TTL:

===
$ dig nonexistant1.unbound.net
; <<>> DiG 9.4.2-P2 <<>> nonexistant1.unbound.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9365
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;nonexistant1.unbound.net. IN A
;; AUTHORITY SECTION:
unbound.net. 7195 IN SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
;; Query time: 0 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Aug 21 16:51:28 2015
;; MSG SIZE rcvd: 104
===

Now we look up another nonexistant domain, which I would expect to have a TTL of 7200 (18000?), but this one shares the reported TTL with my previous lookup:

===
$ dig nonexistant2.unbound.net
; <<>> DiG 9.4.2-P2 <<>> nonexistant2.unbound.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 27898
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;nonexistant2.unbound.net. IN A
;; AUTHORITY SECTION:
unbound.net. 7189 IN SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
;; Query time: 32 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Fri Aug 21 16:51:34 2015
;; MSG SIZE rcvd: 104
===

Does anyone else see this? Is it by design? What makes this
Re: unbound NXDOMAIN TTL shared between records
Patrik Lundin via Unbound-users unbound-users@unbound.net wrote:
> The first lookup (which also suspiciously seems to use the SOA TTL of 7200 rather than the NXDOMAIN TTL of 18000):

RFC 2308 section 5:

"Like normal answers negative answers have a time to live (TTL). As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL."

Tony.
--
f.anthony.n.finch d...@dotat.at http://dotat.at/
Plymouth: Southerly 4 or 5 becoming variable 3 or 4. Slight or moderate. Rain or drizzle, fog patches. Moderate or good, occasionally very poor.
Re: unbound NXDOMAIN TTL shared between records
On Fri, Aug 21, 2015 at 03:40:14PM +0000, Stephan Lagerholm wrote:
> Yes, I can confirm that unbound has domain-wide NXD caching. As long as the returned TTL for your second query is lower than the max TTL for the record, this (IMHO) is not a violation of RFC 2308.

Interesting... Is it documented somewhere why it is done this way? I was actually worried that it could be a symptom of getting close to my configured msg-cache-size or something like that.

> However, there are domains out there that return a higher TTL for EMPTY NOERROR vs NXDOMAIN, and this can trick unbound into caching the value longer than expected. This issue was reported to unbound.
>
> For more info watch the video from the DNS OARC workshop in Amsterdam about 39 minutes in https://www.youtube.com/watch?v=UcAygzNSxlI

Thanks a lot for pointing out your presentation. I just looked through it and it was very informative. I had specifically scratched my head looking at nonexistant1.google.com returning a TTL of 600 to my client, which matched neither the 86400 SOA TTL nor the 300 minimum TTL. It was interesting to hear that the 600 came from the NXDOMAIN response for the equivalent lookup of nonexistant1.google.com.

--
Patrik Lundin
Re: unbound NXDOMAIN TTL shared between records
On Fri, Aug 21, 2015 at 04:32:33PM +0100, Tony Finch wrote:
> RFC 2308 section 5
>
> Like normal answers negative answers have a time to live (TTL). As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.

Thanks for pointing that out, it explains the length of the initial TTL.

--
Patrik Lundin