Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-12 Thread Stephane Bortzmeyer
On Mon, Nov 08, 2021 at 08:49:03AM +0100,
 Giovane C. M. Moura  wrote 
 a message of 58 lines which said:

> We wrote a new draft that adds a new requirement to existing solutions:
> recursive resolvers must detect and negative cache problematic (loop)
> records.

I basically agree with Petr Špaček and Ralf Weber. Resource limiting is:

* more general (it also addresses infinite recursion - CVE-2014-8500,
CVE-2014-8602, CVE-2014-8601, not just loops),

* already implemented.

So, I'm not sure we need a new RFC.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-10 Thread Petr Špaček

On 10. 11. 21 10:31, Giovane C. M. Moura wrote:

>> Ad the draft content:
>>
>>> 2.  Past solutions
>> This section somehow does not mention RFC 2308 section 7.1 which solves
>> most of the problem if implemented. In fact BIND has an implementation
>> of it and is not vulnerable to the TsuNAME attack (or at least I was not
>> able to reproduce it).
>
> Yep, but 7.1 was unfortunately (for this case) optional, and a MAY.
>
> But when we privately disclosed tsuname at OARC34, we tested only if
> BIND and others would loop in the presence of a single client query.
> They don't. That covers only one source of loop: resolvers looping.
>
> But what happens when a client sends non-stop queries to the same
> resolver?  Does bind answer from cache (7.1 RFC2308) OR will trigger new
> queries again? (we did not test for that, if you did, could you please
> share the findings)?


This is an interesting question. In the case of BIND there are two (or
three...) things which prevent it from generating queries to
authoritatives when queried repeatedly:


1] The first stage is an RFC 2308 section 7.1-style "SERVFAIL cache". It
is configured by default with a 1 second TTL ("servfail-ttl" option in
named.conf).
Identical queries which resulted in SERVFAIL are answered from this
cache without doing anything else.
Please note that this is an "output" cache, i.e. it stores SERVFAILs
generated by the resolver itself - which happens when a query fails for
any of a number of reasons, including resource limits.


2] If the answer is not in the SERVFAIL cache, the resolver starts
recursing, but naturally consults its RR cache at each step. While
processing the second query, the resolver will find delegations from the
authoritative servers in the RR cache and use these instead of
re-querying the servers. I.e. no queries will be generated until the TTL
in the RR cache expires (or cache eviction kicks out the delegation RRs
for other reasons).


3] The third reason is a bug in older versions of BIND :-D A subtle bug
caused mishandling of queries with cyclic dependencies in delegations,
causing BIND to _delay_ responding with SERVFAIL by roughly 10 seconds
(another internal timeout).


All two/three mechanisms dampen the amount of outgoing queries. Of course
we need to look at it with an attacker's mindset and probe for holes in
it, but with this infrastructure in place I think it will not be much
worse than a regular TTL=0 query/answer flood, and that's only possible
if the attacker has control over the delegation TTL (which is AFAIK not
the case for most TLDs).
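
As an illustration of the first stage described above, here is a minimal
Python sketch of a SERVFAIL "output" cache. It is not BIND's
implementation: the class name, the method names, and the
(qname, qtype, qclass) key are assumptions made for the example. The
300 second cap reflects the five-minute upper bound that RFC 2308
section 7.1 places on cached server failures.

```python
import time

# RFC 2308 section 7.1: a cached server failure must not outlive five minutes.
MAX_FAILURE_TTL = 300.0


class ServfailCache:
    """Toy cache of SERVFAILs generated by the resolver itself (hypothetical API)."""

    def __init__(self, ttl=1.0, clock=time.monotonic):
        # Default of 1 second mirrors the default mentioned in the thread.
        self.ttl = min(ttl, MAX_FAILURE_TTL)
        self.clock = clock
        self._expires = {}  # (qname, qtype, qclass) -> expiry timestamp

    def remember_failure(self, qname, qtype, qclass="IN"):
        """Record that resolving this query tuple just ended in SERVFAIL."""
        key = (qname.lower(), qtype, qclass)
        self._expires[key] = self.clock() + self.ttl

    def is_cached_failure(self, qname, qtype, qclass="IN"):
        """Return True if an unexpired SERVFAIL is stored for this tuple."""
        key = (qname.lower(), qtype, qclass)
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]  # entry expired: the next query recurses again
            return False
        return True
```

With the 1 second default, a client repeating the same failing query
many times per second would cause roughly one new resolution attempt per
second for that tuple; everything in between is answered from the cache.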



> Because if it does not cache, clients' recurrent queries would force the
> resolver to send many queries to the authoritative servers, and the
> resolver would appear to be looping.  See fig3(b) in [0], where we show
> that only some of Google's resolvers would be aggressive -- and those
> were the ones that had these impatient clients.
> That's the second root cause: clients/forwarders looping.


Sure, that boils down to the generic problem of "clients evading the
cache in resolvers", which is always a PITA. We should declare TTL=0
illegal :-)


--
Petr Špaček



Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-10 Thread Giovane C. M. Moura
Thanks Ralf,

> I fully agree here. Most of the current or older implementations
> solve this by resource limiting and had no problem with tsuName. Only
> some new cloud implementations had problems. So please don’t
> require those that had working mitigations to change them.

Well, not only cloud implementations: we found 34 ASes that had issues
-- but again, that count is limited by our vantage points (sinkhole and
RIPE Atlas).


>> An additional nitpick: I think section 4 ("New requirement") should
>> avoid the term "negative" caching. In my eyes it is a bit misleading
>> because "negative" is typically used for different kinds of
>> answers.
> Maybe failed resolution caching is a better term here.

Sure, will work on that.

Thanks Ralf,

/giovane



Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-10 Thread Giovane C. M. Moura
Thanks a lot, Petr.

> 
> If I understand this correctly, TL;DR summary essentially is
> """ make https://datatracker.ietf.org/doc/html/rfc2308#section-7.1
> mandatory """
> (even though your version is a bit stronger). Is that correct?
> 

Thanks for pointing out this section. We missed it.
We need to make a new draft version incorporating this.

RFC2308's solution is not strong enough (MAY cache, we say MUST cache)
-- as you rightly pointed out.


Question about "server failure": do loops qualify as a "server failure"
in the resolver's logic?

I assume they do: the resolver will simply try to resolve a qname, and
after hitting, as you put it, "a resource limit like, say, number of
delegation steps per query", it automatically classifies the query as a
failure, even though all *parent* authoritative servers are
responsive when loops are present.


> If it is the case, then the document needs to clearly update 2308
> section 7.1 and go through standards track. Right now this might not be
> clear.
> 

+1

> Ad the draft content:
> 
>> 2.  Past solutions
> This section somehow does not mention RFC 2308 section 7.1 which solves
> most of the problem if implemented. In fact BIND has an implementation
> of it and is not vulnerable to the TsuNAME attack (or at least I was not
> able to reproduce it).
> 

Yep, but 7.1 was unfortunately (for this case) optional, and a MAY.

But when we privately disclosed tsuname at OARC34, we tested only if
BIND and others would loop in the presence of a single client query.
They don't. That covers only one source of loop: resolvers looping.

But what happens when a client sends non-stop queries to the same
resolver?  Does bind answer from cache (7.1 RFC2308) OR will trigger new
queries again? (we did not test for that, if you did, could you please
share the findings)?

Because if it does not cache, clients' recurrent queries would force the
resolver to send many queries to the authoritative servers, and the
resolver would appear to be looping.  See fig3(b) in [0], where we show
that only some of Google's resolvers would be aggressive -- and those
were the ones that had these impatient clients.
That's the second root cause: clients/forwarders looping.
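
To make the difference concrete, the following sketch counts how many
resolution attempts a single impatient client triggers with and without
failure caching. It is a toy model under stated assumptions: the function
name and its parameters are hypothetical, one client repeats one query at
a fixed integer rate, and a failure_ttl_s of 0 stands for a resolver that
does not cache failures at all. It counts resolution attempts, not the
individual queries each attempt sends to authoritative servers.

```python
def upstream_attempts(duration_s, client_qps, failure_ttl_s):
    """Count resolution attempts caused by one client repeating one query.

    Time advances in integer ticks of 1/client_qps seconds so that the
    result does not depend on floating-point rounding.
    """
    attempts = 0
    cached_until_tick = 0
    ttl_ticks = int(failure_ttl_s * client_qps)
    for tick in range(duration_s * client_qps):
        if tick < cached_until_tick:
            continue  # answered from the failure cache, nothing goes upstream
        attempts += 1  # resolver recurses and runs into the loop again
        cached_until_tick = tick + ttl_ticks
    return attempts


# One client, 100 queries per second, for 10 seconds.
print(upstream_attempts(10, 100, 0))  # no failure cache -> 1000
print(upstream_attempts(10, 100, 1))  # 1 second failure TTL -> 10
```

In this model the number of attempts without a failure cache grows with
the client's query rate, while with a failure cache it is bounded by the
duration divided by the TTL, whatever the client does.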



>> 4.  New requirement
> I think section 4 should not require full blown _loop_ detection, but
> any sort of limit should be good enough for compliance.
> 
> I mean, implementing a loop detection algorithm in hot path might not be
> a good idea, mainly because most of the time it just wastes resources -
> compared to a simple resource limit like, say, number of delegation
> steps per query.

That sounds much simpler indeed, and that's what RFC1035  and RFC. Will
incorporate that.


> I hope this early feedback helps a bit.

It helps a lot, thanks for bringing the developer point-of-view into the
discussion.

best,

/giovane


[0] https://www.isi.edu/~johnh/PAPERS/Moura21b.pdf



Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-09 Thread Ralf Weber
Moin!

On 9 Nov 2021, at 17:12, Petr Špaček wrote:
>> 4.  New requirement
> I think section 4 should not require full blown _loop_ detection, but any 
> sort of limit should be good enough for compliance.
>
> I mean, implementing a loop detection algorithm in hot path might not be a 
> good idea, mainly because most of the time it just wastes resources - 
> compared to a simple resource limit like, say, number of delegation steps per 
> query.
>
> To be clear:
> I don't think the resolver _has to_ stop resolution at the earliest moment it 
> has data to potentially detect the cycle. If the cycle has length 2, it 
> should be okay to allow the resolver to do 4,6,8,... steps before giving up. 
> For compliance it should be good enough to stop within "a" reasonable limit 
> (not necessarily specified by a number).
I fully agree here. Most of the current or older implementations solve this by
resource limiting and had no problem with tsuName. Only some new cloud
implementations had problems. So please don’t require those that had working
mitigations to change them.

> An additional nitpick: I think section 4 ("New requirement") should avoid the
> term "negative" caching. In my eyes it is a bit misleading because "negative" is
> typically used for different kinds of answers.
Maybe failed resolution caching is a better term here.

So long
-Ralf
——-
Ralf Weber



Re: [DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-09 Thread Petr Špaček

On 08. 11. 21 8:49, Giovane C. M. Moura wrote:

> Folks,
>
> Loops in DNS are an old problem, but as our tsuname[0,1] disclosure last
> May shows, they are still a problem.
>
> We wrote a new draft that adds a new requirement to existing solutions:
> recursive resolvers must detect and negative cache problematic (loop)
> records.
>
> It would be nice to hear what folks have to say.


I generally support the direction; 22+ years after RFC 2308 was
published, it's time to have a look at it again.



If I understand this correctly, TL;DR summary essentially is
""" make https://datatracker.ietf.org/doc/html/rfc2308#section-7.1 
mandatory """

(even though your version is a bit stronger). Is that correct?

If it is the case, then the document needs to clearly update 2308 
section 7.1 and go through standards track. Right now this might not be 
clear.



Ad the draft content:


> 2.  Past solutions
This section somehow does not mention RFC 2308 section 7.1 which solves
most of the problem if implemented. In fact BIND has an implementation
of it and is not vulnerable to the TsuNAME attack (or at least I was not
able to reproduce it).


> 3.  Current Problem
Nitpick: Maybe this should go to Appendix as there is no protocol
description in here?


> 4.  New requirement
I think section 4 should not require full blown _loop_ detection, but
any sort of limit should be good enough for compliance.


I mean, implementing a loop detection algorithm in hot path might not be 
a good idea, mainly because most of the time it just wastes resources - 
compared to a simple resource limit like, say, number of delegation 
steps per query.


To be clear:
I don't think the resolver _has to_ stop resolution at the earliest 
moment it has data to potentially detect the cycle. If the cycle has 
length 2, it should be okay to allow the resolver to do 4,6,8,... steps 
before giving up. For compliance it should be good enough to stop within 
"a" reasonable limit (not necessarily specified by a number).
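
The difference between a resource limit and explicit loop detection can
be sketched as follows. This is a toy model, not any resolver's actual
algorithm: "depends_on", "follow_delegations", "ResolutionFailed" and the
default budget of 8 steps are all hypothetical, and the zone names use
the reserved .example TLD.

```python
class ResolutionFailed(Exception):
    """Raised when the per-query step budget is exhausted."""


def follow_delegations(zone, depends_on, max_steps=8):
    """Follow nameserver dependencies until one needs no further lookup.

    depends_on maps a zone to the zone that hosts its nameserver names,
    or to None when the nameserver address is already known. No set of
    visited zones is kept: the only protection is the step counter.
    """
    steps = 0
    current = zone
    while depends_on.get(current) is not None:
        if steps >= max_steps:
            raise ResolutionFailed(f"gave up on {zone} after {steps} steps")
        current = depends_on[current]
        steps += 1
    return current, steps


# A cycle of length 2: each zone's nameservers live in the other zone.
cyclic = {"cat.example.": "dog.example.", "dog.example.": "cat.example."}
try:
    follow_delegations("cat.example.", cyclic)
except ResolutionFailed as err:
    print(err)  # the failure can now be answered as SERVFAIL and cached
```

The cycle is never identified as a cycle; the resolver simply goes around
it until the budget runs out. The hot path pays for one integer
comparison per step instead of bookkeeping for visited names, at the cost
of a few wasted steps in the rare looping case.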



An additional nitpick: I think section 4 ("New requirement") should avoid
the term "negative" caching. In my eyes it is a bit misleading because
"negative" is typically used for different kinds of answers.



I hope this early feedback helps a bit.

--
Petr Špaček



[DNSOP] draft-moura-dnsop-negative-cache-loop

2021-11-07 Thread Giovane C. M. Moura
Folks,

Loops in DNS are an old problem, but as our tsuname[0,1] disclosure last
May shows, they are still a problem.

We wrote a new draft that adds a new requirement to existing solutions:
recursive resolvers must detect and negative cache problematic (loop)
records.

It would be nice to hear what folks have to say.

Thanks,

/giovane

Giovane C.M. Moura
SIDN Labs


[0] https://tsuname.io
[1] https://www.isi.edu/~johnh/PAPERS/Moura21b.pdf

--

A new version of I-D, draft-moura-dnsop-negative-cache-loop-00.txt
has been successfully submitted by Giovane C. M. Moura and posted to the
IETF repository.

Name:   draft-moura-dnsop-negative-cache-loop
Revision:   00
Title:  Negative Caching of Looping NS records
Document date:  2021-11-08
Group:  Individual Submission
Pages:  8
URL:
https://www.ietf.org/archive/id/draft-moura-dnsop-negative-cache-loop-00.txt
Status:
https://datatracker.ietf.org/doc/draft-moura-dnsop-negative-cache-loop/
Htmlized:
https://datatracker.ietf.org/doc/html/draft-moura-dnsop-negative-cache-loop


Abstract:
   This document updates guidance about detecting DNS loops in recursive
   resolver algorithms with new requirements to require recursive
   resolvers to detect loops and to implement negative caches.





The IETF Secretariat

