Hello Duane & others,

thank you for your response. Comments inline below.


On Thu, 2023-06-29 at 23:58 +0000, Wessels, Duane wrote:
> 
> 
> > 
> > ## 2.2
> > 
> > The first paragraph correctly mentions "policy reasons". The second
> > paragraph correctly says "they are not authoritative". I am not sure not
> > being authoritative can be considered a policy reason, so perhaps these
> > two paragraphs can be connected with an "or"?
> 
> I see your point.  We propose this change to the introduction sentence:
> 
> A name server returns a message with the RCODE field set to REFUSED
> when it refuses to process the query, e.g., for policy or other reasons.

Works for me.

> 
> > 
> > ## 3.1
> > 
> > "A resolver MUST NOT retry a given query over a server's transport more
> > than twice" - should this be clarified to say "in a short period of time"
> > or something like that? Clearly a retry is allowed *eventually*.
> 
> For reference, here’s the sentence in question at the start of 3.1:
> 
>    A resolver MUST NOT retry a given query over a server's transport more
>    than twice (i.e., three queries in total) before considering the
>    server's transport unresponsive for that query.
> 
> We feel that “a given query” and “for that query” in the sentence
> sufficiently limit the scope here, and there is no need to qualify it by
> some amount of time.
> 
> As an example, let’s say that a recursive has been asked to look up
> www.example.com (our “given” query).  The example.com zone has two name
> servers, each of which has two IP addresses, and (presumably) two
> transports.  It can send 3 queries to 199.43.135.53 over UDP (then that
> transport is unresponsive), 3 queries to 199.43.133.53 over UDP, same over
> TCP, over IPv6, and so on.  In total the recursive can send 2x2x2x3 = 24
> queries before it has to give up if all servers and all transports are
> unresponsive.  At this point the resolver gives up on that query and
> returns SERVFAIL.
> 
> Then, section 3.2 is about caching and says that the resolution failure MUST 
> be cached for at least 5 seconds, but otherwise gives implementations a lot 
> of freedom in how to do that.  Could be by query tuple, by server/transport, 
> or some other way.

Right! 3.2 solves this.
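
To make the arithmetic in the example above concrete, here is a minimal sketch of the retry budget. It is only an illustration under stated assumptions: the names (`MAX_ATTEMPTS`, `SERVERS`, `query_budget`) are hypothetical, the IPv6 addresses are documentation placeholders, and nothing here is taken from the draft text or from any resolver implementation.

```python
from itertools import product

# One initial query plus two retries per server transport (section 3.1).
MAX_ATTEMPTS = 3

# Assumed layout: two name servers, each with one IPv4 and one IPv6 address.
# The IPv4 addresses come from the example; the IPv6 ones are placeholders
# from the 2001:db8::/32 documentation prefix.
SERVERS = {
    "ns-a": ["199.43.135.53", "2001:db8::a"],
    "ns-b": ["199.43.133.53", "2001:db8::b"],
}
TRANSPORTS = ["udp", "tcp"]


def query_budget(servers, transports, max_attempts=MAX_ATTEMPTS):
    """List every (address, transport, attempt) the resolver may send."""
    addresses = [addr for addrs in servers.values() for addr in addrs]
    return [
        (addr, transport, attempt)
        for addr, transport in product(addresses, transports)
        for attempt in range(1, max_attempts + 1)
    ]


budget = query_budget(SERVERS, TRANSPORTS)
# 2 servers x 2 addresses x 2 transports x 3 attempts
print(len(budget))
```

Once every entry in that list has been tried without an answer, the resolver would return SERVFAIL for the given query.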

> > Also, "MUST NOT" is pretty strong language. Given the various process
> > models of resolver implementations, two subprocesses (threads) both
> > retrying the same or a similar thing a few times cannot always be
> > avoided. Would you settle for SHOULD NOT? The "given" in "retry a given
> > query" gives some leeway, but not enough, I feel.
> 
> We feel that MUST NOT is appropriate but would like more input from working
> group members and implementors especially.

Ok

> > "may retry a given query over a different transport .. believe .. is
> > available" - this ignores that some transports have better security
> > properties than others. One currently active draft in this area is
> > draft-ietf-dprive-unilateral-probing. Perhaps add some wording, without
> > being too prescriptive, such as "available, and compatible with the
> > resolver's security policies, ..".
> 
> We think “compatible with the resolver’s security policies” goes without 
> saying, but don’t mind making it explicit.

I am inclined to agree, and will leave this for others to judge.

> > 
> > ## 3.2
> > 
> > A previous review
> > (https://mailarchive.ietf.org/arch/msg/dnsop/sJlbyhro-4bDhfGBnXhhD5Htcew/)
> > suggested that the then-chosen tuple was not specific enough, and also
> > said it was too prescriptive. I agree with both. The current draft
> > prescribes nothing, which I'm generally a fan of!
> > 
> > However, a coworker (the one likely responsible for implementing this
> > draft, if it turns out our implementation deviates from its final form)
> > told me "some guidance would be nice". After some discussion on
> > prescriptiveness, here is our suggestion: do not prescribe, but mention
> > (without wanting to be complete) a few tuple formats that might make
> > sense, and suggest that implementations document what they choose here.
> 
> The relevant text here currently says:
> 
>    The implementation might cache different resolution failure conditions
>    differently.  For example, DNSSEC validation failures might be cached
>    according to the queried name, class, and type, whereas unresponsive
>    servers might be cached only according to the server's IP address.
> 
> So we provide two examples, although not really phrased as “tuples”.  I guess 
> you’re suggesting to see more options here and talk about them more as tuples?

Yes, I think that would make sense.

> For the documentation suggestion, maybe something like this?: “Developers 
> SHOULD document their implementation choices so that operators know what 
> behaviors to expect when resolution failures are cached.”

Wonderful.
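
As one illustration of what "talking about them as tuples" could look like, here is a rough sketch of a failure cache that keys each failure condition differently, following the two examples in the quoted draft text. The class, the method names, and the tuple layouts are hypothetical and only one of many possible choices; the 5-second floor is the minimum mentioned earlier in this thread.

```python
import time

MIN_FAILURE_TTL = 5.0  # seconds; minimum caching time discussed for 3.2


class FailureCache:
    """Caches resolution failures under per-condition tuple keys."""

    def __init__(self, ttl=MIN_FAILURE_TTL, clock=time.monotonic):
        # Never cache for less than the required minimum.
        self._ttl = max(ttl, MIN_FAILURE_TTL)
        self._clock = clock
        self._expiry = {}

    @staticmethod
    def key_for(kind, qname=None, qclass=None, qtype=None, server_ip=None):
        # Validation failures: keyed by queried name, class, and type.
        if kind == "dnssec":
            return (kind, qname.lower(), qclass, qtype)
        # Unresponsive servers: keyed by the server's IP address only.
        if kind == "unresponsive":
            return (kind, server_ip)
        raise ValueError(f"unknown failure kind: {kind}")

    def record(self, key):
        self._expiry[key] = self._clock() + self._ttl

    def is_failed(self, key):
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # entry has expired; allow a fresh attempt
            return False
        return True


cache = FailureCache()
key = FailureCache.key_for("unresponsive", server_ip="199.43.135.53")
cache.record(key)
print(cache.is_failed(key))
```

An implementation that documents its choices, as suggested above, would essentially be documenting the body of `key_for`.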

> 
> 
> First, we apologize for not realizing that this and two other “for 
> discussion” questions were not yet resolved.  We plan to remove the first 
> (from the Introduction).
> 
> For the one that was in section 2.6, we propose this updated text and new 
> section 3.4:
> 
> 2.6.  DNSSEC Validation Failures
> 
>    For zones that are signed with DNSSEC, a resolution failure can occur
>    when a security-aware resolver believes it should be able to
>    establish a chain-of-trust for an RRset but is unable to do so,
>    possibly after trying multiple authoritative name servers.  DNSSEC
>    validation failures may be due to signature mismatch, missing DNSKEY
>    RRs, problems with denial-of-existence records, clock skew, or other
>    reasons.
> 
>    Section 4.7 of [RFC4035] already discusses the requirements and
>    reasons for caching validation failures.  Section 3.4 of this
>    document strengthens those requirements.

Good.

> 3.4.  DNSSEC Validation Failures
> 
>    Section 4.7 of [RFC4035] states:
> 
>    To prevent such unnecessary DNS traffic, security-aware resolvers MAY
>    cache data with invalid signatures, with some restrictions.
> 
>    This document updates [RFC4035] with the following, stronger
>    requirement:
> 
>    To prevent such unnecessary DNS traffic, security-aware resolvers
>    MUST cache DNSSEC validation failures, with some restrictions.

Good :)

> And for the one in section 3.3 we propose this:  
> 
> 3.3.  Requerying Delegation Information
> 
>    Section 2.1 of [RFC4697] identifies circumstances in which "every
>    name server in a zone's NS RRSet is unreachable (e.g., during a
>    network outage), unavailable (e.g., the name server process is not
>    running on the server host), or misconfigured (e.g., the name server
>    is not authoritative for the given zone, also known as 'lame')."  It
>    prohibits unnecessary "aggressive requerying" to the parent of a non-
>    responsive zone by sending NS queries.
> 
>    The problem of aggressive requerying to parent zones is not limited to
>    queries of type NS.  This document updates the requirement from
>    section 2.1.1 of [RFC4697] to apply more generally: Upon encountering
>    a zone whose name servers are all non-responsive, a resolver MUST
>    cache the resolution failure.  Furthermore, the resolver MUST limit
>    queries to the non-responsive zone's parent zone (and other ancestor
>    zones) just as it would limit subsequent queries to the non-
>    responsive zone.

Looks great.
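
For what it is worth, the proposed 3.3 behaviour could be sketched roughly as below: once all name servers of a zone are found non-responsive, queries for names at or below that zone are suppressed for a while, so they are not sent to the zone's parent or other ancestors either. All names here (`DeadZoneCache`, `mark_unresponsive`, `should_suppress`) and the reuse of the 5-second value are assumptions for illustration, not text from the draft, and the root zone is not handled.

```python
import time

FAILURE_TTL = 5.0  # seconds; assumed to match the minimum discussed for 3.2


def is_at_or_below(qname, zone):
    """True if qname equals zone or is a subdomain of it."""
    q = qname.lower().rstrip(".")
    z = zone.lower().rstrip(".")
    return q == z or q.endswith("." + z)


class DeadZoneCache:
    """Remembers zones whose name servers were all non-responsive."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._dead = {}  # zone name -> expiry timestamp

    def mark_unresponsive(self, zone):
        self._dead[zone.lower().rstrip(".")] = self._clock() + FAILURE_TTL

    def should_suppress(self, qname):
        # While the failure is cached, do not send queries for this name to
        # the failed zone's servers, nor requery its parent or ancestors.
        now = self._clock()
        return any(
            expiry > now and is_at_or_below(qname, zone)
            for zone, expiry in self._dead.items()
        )


zones = DeadZoneCache()
zones.mark_unresponsive("example.com.")
print(zones.should_suppress("www.example.com."))
```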

Thanks!

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/

