Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
On Mon, Nov 08, 2021 at 08:49:03AM +0100,
 Giovane C. M. Moura wrote a message of 58 lines which said:

> We wrote a new draft that adds a new requirement to existing solutions:
> recursive resolvers must detect and negative cache problematic (loop)
> records.

I basically agree with Petr Špaček and Ralf Weber. Resource limiting is:

* more general (it also addresses infinite recursion - CVE-2014-8500,
  CVE-2014-8602, CVE-2014-8601, not just loops),
* already implemented.

So, I'm not sure we need a new RFC.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
On 10. 11. 21 10:31, Giovane C. M. Moura wrote:
>> Ad the draft content:
>>
>>> 2. Past solutions
>> This section somehow does not mention RFC 2308 section 7.1 which
>> solves most of the problem if implemented. In fact BIND has an
>> implementation of it and is not vulnerable to the TsuNAME attack (or
>> at least I was not able to reproduce it).
>
> Yep, but 7.1 was unfortunately (for this case) optional, and a MAY.
>
> But when we privately disclosed tsuname at OARC34, we tested only if
> BIND and others would loop in the presence of a single client query.
> They don't. That covers only one source of loop: resolvers looping.
>
> But what happens when a client sends non-stop queries to the same
> resolver? Does bind answer from cache (7.1 RFC2308) OR will trigger
> new queries again? (we did not test for that, if you did, could you
> please share the findings)?

This is an interesting question. In the case of BIND there are two (or
three...) things which prevent it from generating queries to
authoritatives when queried repeatedly:

1] The first stage is an RFC 2308 section 7.1-style "SERVFAIL cache".
It is configured with a 1 second TTL by default (the "servfail-ttl"
option in named.conf). Identical queries which resulted in SERVFAIL are
answered from this cache without doing anything else. Please note that
this is an "output" cache, i.e. it stores SERVFAILs generated by the
resolver itself, which happens when a query fails for any of a number
of reasons, including resource limits.

2] If the answer is not in the SERVFAIL cache, the resolver starts
recursing, but naturally consults its RR cache at each step. While
processing the second query, the resolver will find the delegations
from the authoritative servers in the RR cache and use them instead of
re-querying the servers. I.e. no queries will be generated until the
TTL in the RR cache expires (or cache eviction kicks out the delegation
RRs for other reasons).

3] The third reason is a bug in older versions of BIND :-D
A subtle bug caused mishandling of queries with cyclic dependencies in
delegations, causing BIND to _delay_ responding with SERVFAIL by
roughly 10 seconds (yet another internal timeout).

All two/three mechanisms dampen the amount of outgoing queries. Of
course we need to look at it with an attacker's mindset and probe for
holes in it, but with this infrastructure in place I think it will not
be much worse than a regular TTL=0 query/answer flood, and that's only
possible if the attacker has control over the delegation TTL (which is
AFAIK not the case for most TLDs).

> Because if it does not cache, clients' recurrent queries would force
> the resolver to send many queries to the authoritative servers, and
> it would seem they'd be looping.
>
> See fig3(b) in [0], where we show that only some of Google resolvers
> would be aggressive -- and those were the ones that had these
> impatient clients. That's the second root cause: clients/forwarders
> looping.

Sure, that boils down to the generic problem "clients evading cache in
resolvers", which is always a PITA. We should declare TTL=0 illegal :-)

--
Petr Špaček
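The "SERVFAIL cache" from point 1] above can be sketched roughly as
follows. This is only an illustration, not BIND's code: the class name
ServfailCache, the answer() helper, the recurse callable and the
(qname, qtype, qclass) key are all assumptions made for the example.
Only the 1 second default TTL is taken from the message.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed cache key for this sketch: (qname, qtype, qclass).
QueryKey = Tuple[str, str, str]


class ServfailCache:
    """Remember queries that ended in SERVFAIL for a short, fixed TTL."""

    def __init__(self, ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # seconds a failure stays cached
        self.clock = clock      # injectable so the sketch is testable
        self._expires: Dict[QueryKey, float] = {}

    def remember(self, key: QueryKey) -> None:
        """Record that resolving `key` just failed."""
        self._expires[key] = self.clock() + self.ttl

    def is_cached(self, key: QueryKey) -> bool:
        """True while a recorded failure for `key` has not expired."""
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]   # drop the stale entry
            return False
        return True


def answer(key: QueryKey, cache: ServfailCache,
           recurse: Callable[[QueryKey], str]) -> str:
    """Answer from the failure cache if possible, else do the real work.

    `recurse` is a hypothetical stand-in for full recursion; it returns
    an rcode string such as "NOERROR" or "SERVFAIL".
    """
    if cache.is_cached(key):
        return "SERVFAIL"            # no upstream queries are sent
    rcode = recurse(key)
    if rcode == "SERVFAIL":
        cache.remember(key)          # "output" cache: store our own failure
    return rcode
```

With such a cache, a client that repeats a failing query within the TTL
triggers the expensive recursion once per TTL window rather than once
per query.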
Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
Thanks Ralf,

> I fully agree here. Most of the current or older implementations
> solve this by resource limiting and had no problem with tsuName. Only
> some new cloud implementations had problems. So please don’t
> require those that had working mitigations to change them.

Well, not only cloud implementations: we found 34 ASes that had issues
-- but again that is limited by our vantage points (sinkhole & ripe
atlas).

>> An additional nitpick: I think section 4. New requirement should
>> avoid the term "negative" caching. In my eyes it is a bit misleading
>> because "negative" is typically used for different kinds of
>> answers.

> Maybe failed resolution caching is a better term here.

Sure, will work on that.

Thanks Ralf,

/giovane
Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
Thanks a lot, Petr.

> If I understand this correctly, TL;DR summary essentially is
> """ make https://datatracker.ietf.org/doc/html/rfc2308#section-7.1
> mandatory """
> (even though your version is a bit stronger). Is that correct?

Thanks for pointing to this section. We missed it. We need to make a
new draft version incorporating this.

RFC2308's solution is not strong enough (MAY cache, we say MUST cache)
-- as you rightfully pointed out.

Question about "server failure": do loops qualify as a "server failure"
in the resolver's logic? I assume they do: the resolver will simply try
to resolve a qname, and after hitting, as you pointed out, "a resource
limit like, say, number of delegation steps per query", it
automatically classifies the query as a failure, even though all
*parent* authoritative servers are responsive when loops are present.

> If it is the case, then the document needs to clearly update 2308
> section 7.1 and go through standards track. Right now this might not be
> clear.

+1

> Ad the draft content:
>
>> 2. Past solutions
> This section somehow does not mention RFC 2308 section 7.1 which solves
> most of the problem if implemented. In fact BIND has an implementation
> of it and is not vulnerable to the TsuNAME attack (or at least I was not
> able to reproduce it).

Yep, but 7.1 was unfortunately (for this case) optional, and a MAY.

But when we privately disclosed tsuname at OARC34, we tested only if
BIND and others would loop in the presence of a single client query.
They don't. That covers only one source of loop: resolvers looping.

But what happens when a client sends non-stop queries to the same
resolver? Does bind answer from cache (7.1 RFC2308) OR will trigger new
queries again? (we did not test for that, if you did, could you please
share the findings)?

Because if it does not cache, clients' recurrent queries would force
the resolver to send many queries to the authoritative servers, and it
would seem they'd be looping.

See fig3(b) in [0], where we show that only some of Google resolvers
would be aggressive -- and those were the ones that had these impatient
clients. That's the second root cause: clients/forwarders looping.

>> 4. New requirement
> I think section 4 should not require full blown _loop_ detection, but
> any sort of limit should be good enough for compliance.
>
> I mean, implementing a loop detection algorithm in hot path might not be
> a good idea, mainly because most of the time it just wastes resources -
> compared to a simple resource limit like, say, number of delegation
> steps per query.

That sounds much simpler indeed, and that's what RFC1035 and RFC. Will
incorporate that.

> I hope this early feedback helps a bit.

It helps a lot, thanks for bringing the developer point-of-view into
the discussion.

best,

/giovane

[0] https://www.isi.edu/~johnh/PAPERS/Moura21b.pdf
Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
Moin!

On 9 Nov 2021, at 17:12, Petr Špaček wrote:

>> 4. New requirement
> I think section 4 should not require full blown _loop_ detection, but any
> sort of limit should be good enough for compliance.
>
> I mean, implementing a loop detection algorithm in hot path might not be a
> good idea, mainly because most of the time it just wastes resources -
> compared to a simple resource limit like, say, number of delegation steps per
> query.
>
> To be clear:
> I don't think the resolver _has to_ stop resolution at the earliest moment it
> has data to potentially detect the cycle. If the cycle has length 2, it
> should be okay to allow the resolver to do 4,6,8,... steps before giving up.
> For compliance it should be good enough to stop within "a" reasonable limit
> (not necessarily specified by a number).

I fully agree here. Most of the current or older implementations solve
this by resource limiting and had no problem with tsuName. Only some
new cloud implementations had problems. So please don’t require those
that had working mitigations to change them.

> An additional nitpick: I think section 4. New requirement should avoid the term
> "negative" caching. In my eyes it is a bit misleading because "negative" is
> typically used for different kinds of answers.

Maybe failed resolution caching is a better term here.

So long
-Ralf
--
Ralf Weber
Re: [DNSOP] draft-moura-dnsop-negative-cache-loop
On 08. 11. 21 8:49, Giovane C. M. Moura wrote:
> Folks,
>
> Loops in DNS are an old problem, but as our tsuname[0,1] disclosure
> last May shows, they are still a problem.
>
> We wrote a new draft that adds a new requirement to existing
> solutions: recursive resolvers must detect and negative cache
> problematic (loop) records.
>
> It would be nice to hear what folks have to say.

I generally support the direction; 22+ years after RFC 2308 was
published, it's time to have a look at it again.

If I understand this correctly, TL;DR summary essentially is
"""
make https://datatracker.ietf.org/doc/html/rfc2308#section-7.1 mandatory
"""
(even though your version is a bit stronger). Is that correct?

If it is the case, then the document needs to clearly update 2308
section 7.1 and go through standards track. Right now this might not be
clear.

Ad the draft content:

> 2. Past solutions
This section somehow does not mention RFC 2308 section 7.1 which solves
most of the problem if implemented. In fact BIND has an implementation
of it and is not vulnerable to the TsuNAME attack (or at least I was
not able to reproduce it).

> 3. Current Problem
Nitpick: Maybe this should go to an Appendix as there is no protocol
description in here?

> 4. New requirement
I think section 4 should not require full blown _loop_ detection, but
any sort of limit should be good enough for compliance.

I mean, implementing a loop detection algorithm in the hot path might
not be a good idea, mainly because most of the time it just wastes
resources - compared to a simple resource limit like, say, number of
delegation steps per query.

To be clear:
I don't think the resolver _has to_ stop resolution at the earliest
moment it has data to potentially detect the cycle. If the cycle has
length 2, it should be okay to allow the resolver to do 4,6,8,... steps
before giving up. For compliance it should be good enough to stop
within "a" reasonable limit (not necessarily specified by a number).

An additional nitpick: I think section 4. New requirement should avoid
the term "negative" caching. In my eyes it is a bit misleading because
"negative" is typically used for different kinds of answers.

I hope this early feedback helps a bit.

--
Petr Špaček
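The "simple resource limit" argued for above, as opposed to explicit
loop detection, might look like the sketch below. It is a toy model
under stated assumptions: the NS_DEPENDENCY table, the zone names, the
resolve() function and the default of 8 steps are all made up for
illustration and do not come from any resolver implementation.

```python
from typing import Dict, Optional

# Hypothetical toy data: each zone maps to the zone that hosts the names
# of its nameservers, or to None when a nameserver address is already
# known (e.g. via glue) and no further lookup is needed.
NS_DEPENDENCY: Dict[str, Optional[str]] = {
    "ok.example.": None,
    "via.example.": "ok.example.",
    "cat.example.": "dog.example.",   # cat and dog depend on each other:
    "dog.example.": "cat.example.",   # a cycle of length 2
}


def resolve(zone: str, max_steps: int = 8) -> str:
    """Follow nameserver dependencies, giving up after max_steps.

    No cycle detection is performed; the step budget alone guarantees
    termination, at the cost of a few wasted steps on looping zones.
    """
    current: Optional[str] = zone
    for _ in range(max_steps):
        if current not in NS_DEPENDENCY:
            return "SERVFAIL"         # unknown zone in this toy model
        current = NS_DEPENDENCY[current]
        if current is None:
            return "NOERROR"          # reached a usable nameserver
    return "SERVFAIL"                 # budget exhausted, treat as failure
```

Here resolve("cat.example.") walks the two-zone cycle until the budget
runs out and then fails, while resolve("via.example.") succeeds after
two steps. The hot path stays a plain counter, which is the trade-off
described in the message.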
[DNSOP] draft-moura-dnsop-negative-cache-loop
Folks,

Loops in DNS are an old problem, but as our tsuname[0,1] disclosure
last May shows, they are still a problem.

We wrote a new draft that adds a new requirement to existing solutions:
recursive resolvers must detect and negative cache problematic (loop)
records.

It would be nice to hear what folks have to say.

Thanks,

/giovane

Giovane C.M. Moura
SIDN Labs

[0] https://tsuname.io
[1] https://www.isi.edu/~johnh/PAPERS/Moura21b.pdf

--

A new version of I-D, draft-moura-dnsop-negative-cache-loop-00.txt
has been successfully submitted by Giovane C. M. Moura and posted to the
IETF repository.

Name:           draft-moura-dnsop-negative-cache-loop
Revision:       00
Title:          Negative Caching of Looping NS records
Document date:  2021-11-08
Group:          Individual Submission
Pages:          8
URL:            https://www.ietf.org/archive/id/draft-moura-dnsop-negative-cache-loop-00.txt
Status:         https://datatracker.ietf.org/doc/draft-moura-dnsop-negative-cache-loop/
Htmlized:       https://datatracker.ietf.org/doc/html/draft-moura-dnsop-negative-cache-loop

Abstract:
   This document updates guidance about detecting DNS loops in
   recursive resolver algorithms with new requirements to require
   recursive resolvers to detect loops and to implement negative
   caches.

The IETF Secretariat