Re: unbound NXDOMAIN TTL shared between records

2015-08-24 Thread W.C.A. Wijngaards via Unbound-users
Hi Patrik,

On 22/08/15 07:27, Patrik Lundin wrote:
 On Fri, Aug 21, 2015 at 11:13:34PM +0200, Wouter Wijngaards via
 Unbound-users wrote:
 
 This is because the RRset cache is shared between answers.  The
 SOA record is in that cache.  When you query the second time,
 unbound detects that the SOA record has not changed, and
 therefore keeps timing out the existing SOA record.  And then you
 get a lower TTL, of that SOA record, when you query again.
 
 This is because of cache update rules, which are complicated.  We
 want to time out existing records, so that we look them up again
 when they expire.  If the newer SOA record was different (i.e.
 contained different data), it would have been updated.  These
 cache update rules are set to stop e.g. cache poisoning, and the
 resolver sticking to an old nameserver after a nameserver
 change.
 
 
 Thanks for the explanation. Just knowing that this is by design and
 not due to me triggering some bug or memory starvation issue is
 comforting.
 
 One of the domains that were confusing me further was looking up
 stuff under google.se where the TTL would sometimes be shared and
 sometimes not. But now that I know what to look for, I notice that
 there seem to be discrepancies in the SOA serial. Below is an
 example of running +nssearch a few times in a row:
 
 ===
 $ dig +nssearch google.se
 SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
 SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
 SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns2.google.com in 24 ms.
 SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.

 $ dig +nssearch google.se
 SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
 SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
 SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
 SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.

 $ dig +nssearch google.se
 SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
 SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 10 ms.
 SOA ns4.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
 SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
 ===
 
 While on the topic of corner cases, was the TTL of 600 for a
 google.com NXDOMAIN (being a result of  NOERROR for the NS
 hostnames) expected as well?

I think this may be an issue that is fixed in the most recent release,
1.5.4.  With that fix the answer would have TTL 300, as given by the
MINIMUM field in the rdata (because that is lower).

Best regards, Wouter

 ===
 $ dig nonexistant1.google.com

 ; <<>> DiG 9.4.2-P2 <<>> nonexistant1.google.com
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50243
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

 ;; QUESTION SECTION:
 ;nonexistant1.google.com.   IN  A

 ;; AUTHORITY SECTION:
 google.com. 600 IN  SOA ns1.google.com. dns-admin.google.com. 101273744 7200 1800 1209600 300

 ;; Query time: 621 msec
 ;; SERVER: 192.168.1.1#53(192.168.1.1)
 ;; WHEN: Sat Aug 22 07:24:13 2015
 ;; MSG SIZE  rcvd: 91
 ===
 



Re: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Patrik Lundin via Unbound-users
On Fri, Aug 21, 2015 at 11:13:34PM +0200, Wouter Wijngaards via Unbound-users 
wrote:
 
 This is because the RRset cache is shared between answers.  The SOA
 record is in that cache.  When you query the second time, unbound
 detects that the SOA record has not changed, and therefore keeps
 timing out the existing SOA record.  And then you get a lower TTL, of
 that SOA record, when you query again.
 
 This is because of cache update rules, which are complicated.  We want
 to time out existing records, so that we look them up again when they
 expire.  If the newer SOA record was different (i.e. contained
 different data), it would have been updated.  These cache update rules
 are set to stop e.g. cache poisoning, and the resolver sticking to an
 old nameserver after a nameserver change.
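
The update rule described above can be sketched as a toy model in Python. This is only an illustration of the idea, not Unbound's actual implementation: the class and method names are hypothetical, and the 60 second TTL is an assumed value chosen for the example. The two serials are the ones seen for google.se in this thread.

```python
from dataclasses import dataclass


@dataclass
class CachedRRset:
    rdata: str         # record data, e.g. the SOA fields
    expires_at: float  # absolute expiry time in seconds


class RRsetCache:
    """Toy RRset cache shared by every answer that references an RRset."""

    def __init__(self):
        self._store = {}

    def update(self, key, rdata, ttl, now):
        entry = self._store.get(key)
        # Unchanged, unexpired data keeps its original expiry time,
        # so the existing record continues to time out.
        if entry and entry.expires_at > now and entry.rdata == rdata:
            return entry
        # New or different data replaces the entry and restarts the TTL.
        entry = CachedRRset(rdata, now + ttl)
        self._store[key] = entry
        return entry

    def remaining_ttl(self, key, now):
        return int(self._store[key].expires_at - now)


cache = RRsetCache()
key = ("google.se.", "SOA")
soa_old = "ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60"
soa_new = "ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60"

cache.update(key, soa_old, ttl=60, now=0)    # first negative answer
cache.update(key, soa_old, ttl=60, now=20)   # same SOA seen again
shared_ttl = cache.remaining_ttl(key, now=20)

cache.update(key, soa_new, ttl=60, now=20)   # SOA with a different serial
fresh_ttl = cache.remaining_ttl(key, now=20)

print(shared_ttl, fresh_ttl)
```

In this model the second lookup inherits the remaining 40 seconds, while the answer carrying a different serial starts again at 60. That would explain a TTL that is sometimes shared and sometimes not.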
 

Thanks for the explanation. Just knowing that this is by design and not
due to me triggering some bug or memory starvation issue is comforting.

One of the domains that were confusing me further was looking up stuff under
google.se where the TTL would sometimes be shared and sometimes not. But now
that I know what to look for, I notice that there seem to be discrepancies in
the SOA serial. Below is an example of running +nssearch a few times in a row:

===
$ dig +nssearch google.se
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns3.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.

$ dig +nssearch google.se
SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.

$ dig +nssearch google.se
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns1.google.com in 10 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 10 ms.
SOA ns4.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
===
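
For anyone wanting to spot such serial discrepancies without reading the output by eye, here is a small Python sketch that groups the responding servers by the serial they returned. The regular expression and the function name are my own assumptions about the +nssearch line format shown above, and the sample input is the second run, with each record on a single line.

```python
import re
from collections import defaultdict

# The four records from the second +nssearch run, one record per line.
output = """\
SOA ns2.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns1.google.com in 11 ms.
SOA ns2.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns3.google.com in 11 ms.
SOA ns1.google.com. dns-admin.google.com. 101275644 900 900 1800 60 from server ns2.google.com in 24 ms.
SOA ns1.google.com. dns-admin.google.com. 101273744 900 900 1800 60 from server ns4.google.com in 25 ms.
"""

# SOA <mname> <rname> <serial> ... from server <server> in <n> ms.
LINE = re.compile(r"^SOA \S+ \S+ (?P<serial>\d+) .* from server (?P<server>\S+) in")


def serials_by_server(text):
    """Map each SOA serial to the list of servers that returned it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            groups[int(match.group("serial"))].append(match.group("server"))
    return dict(groups)


result = serials_by_server(output)
for serial, servers in sorted(result.items()):
    print(serial, ", ".join(servers))
```

More than one key in the result means the servers disagreed on the serial during that run.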

While on the topic of corner cases, was the TTL of 600 for a google.com NXDOMAIN
(being a result of  NOERROR for the NS hostnames) expected as well?
===
$ dig nonexistant1.google.com 

; <<>> DiG 9.4.2-P2 <<>> nonexistant1.google.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50243
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;nonexistant1.google.com.   IN  A

;; AUTHORITY SECTION:
google.com. 600 IN  SOA ns1.google.com. dns-admin.google.com. 101273744 7200 1800 1209600 300

;; Query time: 621 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Aug 22 07:24:13 2015
;; MSG SIZE  rcvd: 91
===

-- 
Patrik Lundin


Re: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Daisuke HIGASHI via Unbound-users
Hi, Patrik -

 which also suspiciously seems to use the SOA TTL of 7200
 rather than the NXDOMAIN TTL of 18000

That is how the negative cache TTL is calculated (by the authoritative
server). RFC 2308 reads:

  (Negative cache) TTL is taken from the minimum of the SOA.MINIMUM field
   and SOA's TTL.

Regards,
Daisuke HIGASHI


RE: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Stephan Lagerholm via Unbound-users
Hi Patrik,

Yes, I can confirm that Unbound has domain-wide NXDOMAIN caching. As long as the 
returned TTL for your second query is lower than the max TTL for the record, 
this (IMHO) is not a violation of RFC 2308. However, there are domains out there 
that return a higher TTL for empty NOERROR than for NXDOMAIN, and this can trick 
Unbound into caching the value longer than expected. This issue was reported to 
Unbound. 

Using the SOA TTL is expected; see RFC 2308 section 3: 
The TTL of this record is set from the minimum of the MINIMUM field of the SOA 
record and the TTL of the SOA itself, and indicates how long a resolver may 
cache the negative answer.
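
As a quick worked example of that rule, here is a minimal Python sketch. The function name is hypothetical, and the input values are the SOA TTL and MINIMUM figures quoted in this thread for unbound.net and google.com.

```python
def negative_cache_ttl(soa_ttl, soa_minimum):
    """Negative-answer TTL per RFC 2308: the lower of the SOA record's
    own TTL and its MINIMUM field."""
    return min(soa_ttl, soa_minimum)


# unbound.net: SOA TTL 7200, MINIMUM 18000
unbound_net = negative_cache_ttl(7200, 18000)

# google.com: SOA TTL 86400, MINIMUM 300
google_com = negative_cache_ttl(86400, 300)

print(unbound_net, google_com)
```

For unbound.net the SOA TTL is the lower value, which is why the first lookup starts at 7200 rather than 18000.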

For more info, watch the video from the DNS OARC workshop in Amsterdam, 
starting about 39 minutes in: https://www.youtube.com/watch?v=UcAygzNSxlI

Thanks, Stephan Lagerholm

 -Original Message-
 From: Unbound-users [mailto:unbound-users-boun...@unbound.net] On
 Behalf Of Patrik Lundin via Unbound-users
 Sent: Friday, August 21, 2015 8:15 AM
 To: unbound-users@unbound.net
 Subject: unbound NXDOMAIN TTL shared between records
 
 Hello,
 
 I recently noticed what to me is a strange caching behaviour for NXDOMAIN
 results.
 
 This has been seen both on Ubuntu 14.04 with unbound 1.4.22 and on
 OpenBSD with unbound 1.5.2.
 
 I noticed that for some domains, the cache TTL for NXDOMAIN results
 seemed to be shared for all nonexistant replies under that domain:
 
 The first lookup (which also suspiciously seems to use the SOA TTL of 7200
 rather than the NXDOMAIN TTL of 18000):
 ===
 $ dig nonexistant1.unbound.net
 
 ; <<>> DiG 9.4.2-P2 <<>> nonexistant1.unbound.net
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 35933
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 
 ;; QUESTION SECTION:
 ;nonexistant1.unbound.net.  IN  A
 
 ;; AUTHORITY SECTION:
 unbound.net.  7200  IN  SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
 
 ;; Query time: 474 msec
 ;; SERVER: 192.168.1.1#53(192.168.1.1)
 ;; WHEN: Fri Aug 21 16:51:23 2015
 ;; MSG SIZE  rcvd: 104
 ===
 
 The second lookup for that same name, which as one would expect has a
 decremented TTL:
 ===
 $ dig nonexistant1.unbound.net
 
 ; <<>> DiG 9.4.2-P2 <<>> nonexistant1.unbound.net
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9365
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 
 ;; QUESTION SECTION:
 ;nonexistant1.unbound.net.  IN  A
 
 ;; AUTHORITY SECTION:
 unbound.net.  7195  IN  SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
 
 ;; Query time: 0 msec
 ;; SERVER: 192.168.1.1#53(192.168.1.1)
 ;; WHEN: Fri Aug 21 16:51:28 2015
 ;; MSG SIZE  rcvd: 104
 ===
 
 Now we look up another nonexistant domain, which I would expect to have
 a TTL of 7200 (18000?), but this one shares the reported TTL with my
 previous lookup:
 ===
 $ dig nonexistant2.unbound.net
 
 ; <<>> DiG 9.4.2-P2 <<>> nonexistant2.unbound.net
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 27898
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 
 ;; QUESTION SECTION:
 ;nonexistant2.unbound.net.  IN  A
 
 ;; AUTHORITY SECTION:
 unbound.net.  7189  IN  SOA ns.nlnetlabs.nl. postmaster.unbound.net. 2015081500 28800 7200 604800 18000
 
 ;; Query time: 32 msec
 ;; SERVER: 192.168.1.1#53(192.168.1.1)
 ;; WHEN: Fri Aug 21 16:51:34 2015
 ;; MSG SIZE  rcvd: 104
 ===
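
The three reported TTLs are consistent with a single countdown that began at the first lookup. A short Python check, using only the WHEN timestamps and TTLs from the dig output above (the variable names are mine), could look like this:

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"   # format of dig's WHEN line
INITIAL_TTL = 7200             # TTL reported by the first lookup

first = datetime.strptime("Fri Aug 21 16:51:23 2015", FMT)

# (queried name, WHEN timestamp, TTL reported by dig)
observations = [
    ("nonexistant1.unbound.net.", "Fri Aug 21 16:51:28 2015", 7195),
    ("nonexistant2.unbound.net.", "Fri Aug 21 16:51:34 2015", 7189),
]

checks = []
for name, when, reported_ttl in observations:
    elapsed = int((datetime.strptime(when, FMT) - first).total_seconds())
    expected = INITIAL_TTL - elapsed
    checks.append((name, expected, reported_ttl))
    print(name, "expected", expected, "reported", reported_ttl)
```

Both later answers match 7200 minus the seconds elapsed since the first query, including the one for a name that had never been looked up before.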
 
 Does anyone else see this? Is it by design? What makes this 

Re: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Tony Finch via Unbound-users
Patrik Lundin via Unbound-users unbound-users@unbound.net wrote:

 The first lookup (which also suspiciously seems to use the SOA TTL of 7200
 rather than the NXDOMAIN TTL of 18000):

RFC 2308 section 5

   Like normal answers negative answers have a time to live (TTL).  As
   there is no record in the answer section to which this TTL can be
   applied, the TTL must be carried by another method.  This is done by
   including the SOA record from the zone in the authority section of
   the reply.  When the authoritative server creates this record its TTL
   is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
Plymouth: Southerly 4 or 5 becoming variable 3 or 4. Slight or moderate. Rain
or drizzle, fog patches. Moderate or good, occasionally very poor.


Re: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Patrik Lundin via Unbound-users
On Fri, Aug 21, 2015 at 03:40:14PM +, Stephan Lagerholm wrote:
 
 Yes, I can confirm that Unbound has domain-wide NXDOMAIN caching. As
 long as the returned TTL for your second query is lower than the max
 TTL for the record, this (IMHO) is not a violation of RFC 2308.


Interesting... Is it documented somewhere why it is done this way?
I was actually worried that it could be a symptom of getting close to
my configured msg-cache-size or something like that.

 However,
 there are domains out there that return a higher TTL for empty NOERROR
 than for NXDOMAIN, and this can trick Unbound into caching the value
 longer than expected. This issue was reported to Unbound. 
 
 For more info watch the video from the DNS OARC workshop in Amsterdam
 about 39 minutes in https://www.youtube.com/watch?v=UcAygzNSxlI
 

Thanks a lot for pointing out your presentation. I just looked through
it and it was very informative.

I had specifically scratched my head looking at nonexistant1.google.com
returning a TTL of 600 to my client, which matched neither the 86400 SOA
TTL nor the 300 minimum TTL.

It was interesting to hear that the 600 came from the NXDOMAIN response
for the equivalent  lookup of nonexistant1.google.com.

-- 
Patrik Lundin


Re: unbound NXDOMAIN TTL shared between records

2015-08-21 Thread Patrik Lundin via Unbound-users
On Fri, Aug 21, 2015 at 04:32:33PM +0100, Tony Finch wrote:
 
 RFC 2308 section 5
 
Like normal answers negative answers have a time to live (TTL).  As
there is no record in the answer section to which this TTL can be
applied, the TTL must be carried by another method.  This is done by
including the SOA record from the zone in the authority section of
the reply.  When the authoritative server creates this record its TTL
is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.
 

Thanks for pointing that out, it explains the length of the initial TTL.

-- 
Patrik Lundin