Hi Peter,

Thank you very much for these comments. I will look carefully at how to
address them in our new version.

Yours,
Daniel

On Tue, May 16, 2023 at 1:08 PM Peter Thomassen <pe...@desec.io> wrote:

>
>
> On 5/12/23 23:09, Viktor Dukhovni wrote:
> > Repost of my belated comments in the thread, apologies about not doing
> > it right the first time...
>
> Inspired by Viktor's comments, I spent some time to give the document a
> thorough review.
>
> I'd like to support Viktor's comments on the dependent RRset TTL cap
> described in Section 9.
>
> I feel that the recommendation there is potentially harmful while its
> benefit is unclear. As for the harm, it makes DS updates less flexible
> because it effectively pushes their TTL towards higher values (so that
> caches remain effective). While always-low DS TTLs are problematic, too, it
> doesn't seem like a sound concept for an auth's load to be essentially
> inversely proportional to the DS TTL when it is set to a low value
> temporarily.
>
> As for the benefit, the objective appears to be "preponing" the removal of
> cached RRsets from their scheduled expiry to "as soon as they potentially
> would no longer validate", as indicated by upstream TTLs related to the
> trust chain. However, there's no need to do this based on TTLs alone: if
> one wants to pursue this (optional) objective, it is sufficient to
> revalidate once an *actual* change in the DNSKEY or DS set is detected. But
> even in the face of a sudden change in the trust relationship, it's not
> clear whether ignoring a signed (!) long TTL is beneficial, as that might
> harm stability and resilience during time periods of configuration errors,
> which the cache would otherwise help survive.
>
I agree and will update the document (see my response to Viktor). Thanks
for raising this.
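
To make the concern concrete, here is a minimal sketch of the capping
behavior under discussion. It assumes the cap means "a cached RRset never
outlives the TTLs of the DS and DNSKEY RRsets in its trust chain"; the
function name and the numbers are made up for illustration and are not
taken from the draft.

```python
def capped_ttl(rrset_ttl, chain_ttls):
    """Return the TTL a cache would use if it capped an RRset's TTL by the
    TTLs of the DS/DNSKEY RRsets it depends on (an assumed reading of the
    Section 9 recommendation, not the draft's wording)."""
    return min([rrset_ttl, *chain_ttls])


# A signed RRset with a one-day TTL; DS and DNSKEY TTLs are one hour each.
normal = capped_ttl(86400, [3600, 3600])

# Same RRset while the DS TTL is temporarily lowered to 60 s for a rollover.
during_rollover = capped_ttl(86400, [60, 3600])

print(normal, during_rollover)
```

Under this reading, temporarily lowering the DS TTL shortens the cache
lifetime of everything below it, which is the load effect Peter describes.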

>
> Second, I'm confused about the normative language in this informational
> document. (There are about 20 occurrences of MUST and about 40 of
> SHOULD/RECOMMENDED.)
>
These are recommendations, so a MUST inside a recommendation is in many
cases to be read as "strongly recommended", while a SHOULD may be read as
"it is recommended". I do understand your concern, and we are trying to
find the best way to resolve it.

>
> Third, the document contains several inaccurate or contradictory
> statements. One example is related to Section 7.1.4, which says:
>     *  DNS resolver MUST validate the TA before starting the DNSSEC
>        resolver, and a failure of TA validity check MUST prevent the
>        DNSSEC resolver to be started.  Validation of the TA includes
>        coherence between out-out band values, values stored in the DNS as
>        well as corresponding DS RRsets.
>
> The recommendation says that a resolver may not be started if its trust
> anchors are incoherent with values obtained from the DNS.
>
> My understanding is that the purpose of a trust anchor is to pin a trusted
> key for a name, in a self-contained fashion, without relying on its
> confirmation through some other channel (e.g. corresponding DS records). If
> a trust anchor is required to be coherent with values stored in the DNS,
> then the trust anchor doesn't appear to be needed in the first place.
>

The primary intent of that recommendation is to prevent a resolver
configured with a deprecated TA, or without synchronised time, from
starting to resolve. The idea here is that the resolver starts by checking
the coherence of time and TA, and raises an error otherwise. There is more
control while the resolver is being started, so the network admin is likely
to investigate what happened if they do not see the resolver start.
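
As a rough illustration of such a start-up gate, here is a minimal sketch.
The `TrustAnchor` record, the `startup_errors` helper, and the way the clock
error is obtained are all hypothetical; the validity fields only loosely
mirror the validFrom / validUntil attributes used in RFC 7958, and the zone,
key tags, and dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TrustAnchor:
    # Hypothetical record for one configured trust anchor.
    zone: str
    key_tag: int
    valid_from: datetime
    valid_until: Optional[datetime] = None


def startup_errors(configured, now, clock_error_s, max_clock_error_s):
    """Collect the reasons why the resolver should refuse to start."""
    errors = []
    # clock_error_s is assumed to come from the local time-sync client.
    if abs(clock_error_s) > max_clock_error_s:
        errors.append(f"clock off by {clock_error_s:.0f}s")
    for ta in configured:
        if now < ta.valid_from:
            errors.append(f"{ta.zone} key {ta.key_tag}: not yet valid")
        elif ta.valid_until is not None and now >= ta.valid_until:
            errors.append(f"{ta.zone} key {ta.key_tag}: deprecated")
    return errors


utc = timezone.utc
now = datetime(2023, 5, 16, tzinfo=utc)
old = TrustAnchor("example.", 11111, datetime(2010, 1, 1, tzinfo=utc),
                  datetime(2019, 1, 1, tzinfo=utc))
new = TrustAnchor("example.", 22222, datetime(2017, 1, 1, tzinfo=utc))

problems = startup_errors([old, new], now, clock_error_s=0.2,
                          max_clock_error_s=5)
print(problems or "ok to start")
```

A non-empty list would abort the start, which is the point where the admin
is most likely to notice and investigate.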


>
> It is also left open how the DRO should check "coherence between out-out
> band values, values stored in the DNS as well as corresponding DS RRsets"
> for their root trust anchors. There are no DS records, so you can check ...
> DNSKEY?

Yes, that is essentially the keys.

> Hm. Then, what exactly to check? Also, what about IANA's root-anchors.xml
> file (RFC 7958)? -- The problem here is that "values stored in the DNS" is
> underspecified, although one MUST comply with it.
>
It is underspecified because there is no single procedure, and we did not
want to assume that only the root key is used. For the KSK of the root
zone, RFC 7958 is expected to be part of the validation procedure. We
mention that we expect other TAs to have a similar mechanism.

> What's more, Section 7.1.2.1 has:
>     Besides deployments in
>     networks other than the global public Internet (hence a different
>     root), operators may want to configure other trust points.
>
> Now, how would the above recommendation (enforce trust anchor coherence
> with DNS) be enforced in such a setting?
>
The idea here is that if a DRO wants to configure a specific Trust Anchor
(say, one other than the root), there is a need to implement a procedure
to validate that the TA can be used. One of the intents here is also to
point out the implications of adding a TA other than the root TA.

>
> That said, I wrote up some of my (pencil-on-printout) comments from the
> remainder of the document; you can find them below.
>
> Looking at my scribblings, large parts of the document seem to lack
> clarity (at least to me). The parts which are not unclear to me (few
> scribblings) are
>
We are actively refactoring the document, so we will look carefully at
making these points clearer.

> - Section 1-4 (intro material and boilerplate)
> - Section 6 (importance of correct wall time)
> - Section 7.1 until 7.1.2 included (trust anchor intro)
> - Section 13 (transport considerations)
> - Section 14 (IANA considerations)
>
> ... with the missing sections containing the meat of the recommendations.
> As I find many of them unclear, I'm not sure I support the document as
> is, simply because I have a hard time following what it says.
>
> I'd like to emphasize that I appreciate the work and effort that went into
> the document. I just think that for it to be helpful guidance (and for the
> actual recommendations and arguments to be discussed), a lot of work on
> clarity is needed. My review is intended as constructive feedback, not as
> harsh criticism.
>
> Best,
> Peter
>
>
> Section 5:
>     A DRO needs to be able to enable DNSSEC validation with sufficient
>     confidence they will not be held responsible in case their resolver
>     does not validate the DNSSEC response.  The minimization of these
>     risks
>
> This sounds like a managerial document from a business risk department. --
> As the opening paragraph of the section laying out the justification for
> the different recommendations types, I wonder if this is a sufficiently
> stringent argument for justifying technical guidance.
>
I see your point; this could be moved to a more high-level section such as
the introduction. The recommendations aim both at saying how to deploy
resolvers and at making folks confident that such a deployment is
reasonable. Another important aspect is to steer away from things that
could be perceived as good ideas but are not. Your comment on capping is a
good example ;-)


> In the same section, there are some occurrences of rather obscure language:
>     The
>     recommendations do not come with the same level of recommendations
>
> or
>
>     Some recommendations may simply not be
>     provided by the operated software
>
> I'm not sure what these things mean.
>
The intent of these sentences was to say that it can be fine if some
recommendations are not followed. In some cases the associated risk is
minimal; in others it is higher. One reason for not being able to implement
a recommendation is that the software does not provide it. In many cases,
the mechanisms described are already defined and implemented by the
software, whether enabled by default or not. In any case, we do not expect
the DRO to hack the software to fulfill the recommendations. I believe that
is what we intended here.



> Similarly, in Section 6:
>     For all recommendations, it is strongly RECOMMENDED that
>     recommendations are supported by automated processes.
>
> Section 6 also has:
>     *  While operating, a DRO MUST closely monitor time derivations of
>        the resolvers and maintain the time synchronized.
>
> s/derivations/deviations/
>
> A point that's missing here is (how) to take into account the effects of
> time adjustments on stored TTLs.
>
> Section 7 explains three types of trust-anchor-related recommendations,
> namely initial provisioning, updates, and reporting. It then says:
>     Note that TA update and TA reporting only concerns running resolvers.
>
> It's unclear to me why this is written down here. It's perfectly clear
> that when nothing's running, then no validation is going on, so there's no
> reporting or updating of validation trust anchors.
>
That is correct.

>
> This kind of "requirement fencing" is familiar to me from risk management
> documents, where the requirements author attempts to prevent a manager
> potentially not familiar with the topic to enforce certain requirements in
> contexts where they are not applicable.
>
I see your point. I think it is somehow related to the use of MUST within
recommendations. We can probably do better. That is useful input.

> I have no idea whether this is the case here, but I remain unconvinced of
> the need to say that validation-related things only apply to running
> resolvers. In fact, I find such statements distracting (and as such an
> anti-pattern), as they make me think "what's this, did I miss something? is
> there an edge case?".
>
> Section 7.1.2 has:
>     Although some bootstrapping mechanisms to securely retrieve publish
>     [RFC7958] and retrieve [UNBOUND-ANCHOR] the Root Zone Trust Anchor
>     have been defined, it is believed these mechanisms should be extended
>     to other KSKs or Trust Anchors.
>
> believed by whom?
>
At least by the authors.

>
> Another example of fuzzy language is in 7.1.2.1, which says:
>     For validators that may be used on the global public Internet (with
>     "may be" referring to general purpose, general release code),
>     handling the IANA managed root zone KSK trust anchor is a
>     consideration.
>
> It's a thing to get right (and not a consideration).
>
OK, we will be more directive in that case.

> Section 7.1.3:
>     The generation of a configuration file associated to the TA is
>     expected to be implementation independent.  The necessity of tweaking
>     the data [...]
>
> In general, TA configuration does not require generation of a
> configuration file. (An implementation might just as well take them from
> something like /etc/resolver/trust-anchors.d/, with each file therein
> containing DS-type records, and the domain somehow encoded in the
> filename.)
>
OK, configuration need not be restricted to a file.

> It's not clear what "tweaking the data" means (neither which data, nor in
> what way they are tweaked).
>
Here, "tweaking" means that different implementations may not take the
same input for configuring the TAs.

> This suggests that the author of this text has a specific context in mind,
> from which the line of argument descends (similarly to the managerial
> framing in other sections).
>
>

> Section 7.2:
>     This includes for a DRO the ability to
>     check which TA are in used as well as to resolve in collaboration of
>     authoritative servers and report the used TAs.
>
> I am not sure what this means. Resolve what -- DNS queries, or trust
> anchor issues? Something else?
>

I think what we meant was that the resolver is able to report the TA to the
authoritative server.

>
> Section 7.2.1:
>     Trust is inherently a matter of an operations policy.  As such, a DRO
>     will need to be able to update the list of Trust Anchors.  TA updates
>     are not expected to be handled manually.  This introduces a
>     potentially huge vector for configuration errors
>
> Probably the opposite is meant? (No manual handling --> less configuration
> errors?)
>
At least, the huge vector is associated with manual configuration ;-)

>
>     Instead DRO will rely on "Automated Updates to DNSSEC
>     Trust Anchors" [RFC5011]
>
> Well, perhaps; implementation is not mandatory.
>
Exactly. The "will" is not normative; SHOULD may be more appropriate. To
echo one of your comments, this is why we mentioned that some
recommendations might not be followed. But I will look at how to make that
better.


> The two SHOULD recommendations (check TA publisher commitment to RFC 5011,
> and enable RFC 5011 automatic updates) in that section are phrased as
> independent, but they are not. (There's no point to the first
> recommendation when the second is not conditional on the outcome of the
> first.)
>
Correct.

> The first paragraph of Section 7.2.2 says:
>     A DRO SHOULD regularly check the trust anchor used by the DNSSEC
>     resolver is up-to-date and that values used by the resolvers are
>     conform to the ones in the configuration
>
> I find this quite fuzzy. Does this mean that the software should detect
> configuration changes and reload the trust anchor?
>
We wanted to avoid relying on updating the configuration and reloading
such files, so that a read-only system does not end up keeping the old
configuration. Our expectation is rather that the resolver is restarted in
case of a mismatch. The point here is to detect a TA update that failed.

>
> In any case, this section is about "regular checks", but its first
> explicit recommendation is for "STARTUP", which seems inconsistent.
>
>     In the case of a key roll
>     over, the resolver is moving from an old value to an up-to date
>     value.  This up-to-date value does not need to survive reboot, and
>     there is no need to update the configuration file of the running
>     instances - configuration is updated by a separate process.  To put
>     it in other words, the updated value of the TA is only expected to be
>     stored in the resolver's memory.  Avoiding the configuration file to
>     be updated prevents old configuration file to survive to writing
>     error on read-only file systems.
>
> I'm not convinced. Rollover from a very old trust anchor to a new one may
> not be possible indefinitely, like when you reboot three years later.
>
Exactly, this is the reason we insist on starting in a clean state.

>
> Also, booting with a trust anchor that was broken long ago is insecure, as
> an attacker might exploit that by subsequently forging the rollover. It
> seems more prudent to write rollovers to permanent storage, at least when
> the algorithm or key size is changed. Not doing so is effectively trusting
> the old key indefinitely, against better knowledge.
>
What we recommend is to set up a procedure for a clean start, so that a
single procedure is in place that provides the correct, up-to-date TAs.

> The recommendation there says:
>     *  DRO SHOULD enable "Signaling Trust Anchor Knowledge in DNS
>        Security Extensions (DNSSEC)" [RFC8145] to provide visibility to
>        the TA used by the resolver.  The TA can be queried using a DNSKEY
>        query.
>
> This is not about querying the TA.
>
No, this is to make it possible to report the TA in use. Using that
mechanism is preferred over inspecting the memory or files on the system.

>     Note also that [RFC8145] does not only concern Trust Anchor but is
>     instead generic to DNSKEY RRsets.  As a result, unless for the root
>     zone, it is not possible to determine if the KSK/ZSK or DS is a Trust
>     Anchor or a KSK/ZSK obtained from regular DNSSEC resolutions.
>
> DROs (who are the subject of the document) can easily determine whether a
> key or DS has been obtained from a trust anchor or from regular
> resolution: just look at whether a trust anchor is configured for
> the name, or whether a DS query was issued.
>
Yes, I think we added that note to indicate that a response is not
necessarily a TA. Maybe it can be omitted.

> Transferring the note from my print-out, I realize that perhaps what was
> meant is that the recipient of an RFC 8145 signal cannot tell whether the
> signaled key is a trust anchor. I did not realize that the first few times
> I read it.
>
It is always better to be clear the first time, so I will try to correct
this. Thanks. But yes, the second interpretation is what we intended.

>     A failed key roll over or any other abnormal situation MUST trigger
>     an alarm.
>
> What does "alarm" mean in this context? (It's underspecified but
> mandatory.)
>
An alarm here is a notification to the admin.

>     If the mismatch is due to a failed key roll-over, this SHOULD be
>     considered as a bug by the DRO.  The DRO MUST restart the resolver
>     with updated TA.
>
> Why should it be considered a bug? It may just be a misconfiguration.
>
What we wanted to say here is that the DRO must not try to fix the
rollover, in the same way that they would not try to fix a software bug.
The DRO is just expected to restart the resolver.


> The situation here is after a failed rollover. Restart with what updated
> TA? Is the intention here that the DRO handles this manually? (That is
> discouraged in other parts of the document.)

The start procedure is such that the TAs are updated, and the procedure is
automated.

>
>
>     *  A DRO SHOULD be able to check the status of a TA as defined in
>        Section 3 of [RFC7583].
>
> I can't find anything like that in this section. (It deals with key
> rollover timing, not with trust anchor checks.)
>
I think these can be reused, no?

> Section 8:
>     The intent of this section is to position these
>     guidelines toward the operational recommendations provided in this
>     document.
>
> This is not technical advice. It sounds like an internal compliance
> document. Who is the audience of this document?
>
We did not want to repeat the NTA document.

>     *  DRO SHOULD set automated procedures to determine the NTA of DNSSEC
>        resolvers.
>
> What does that mean?
>
The expectation is that DROs make sure they are prepared, and not taken by
surprise, when an NTA needs to be put in place.

>
>     A failure in signaling validation is associated to a mismatch between
>     the key and the signature.
>
> What signaling?
>
This is a nit: "signaling" should be read as "signature".

> A validation mismatch is not necessarily between key and signature, it may
> also be between data and signature.
>
Correct, we need to clarify this.

>
>     In addition, DRO are likely to
>     have specific communication channel with TA maintainer which eases
>     trouble shooting.
>
> Why should that be so / what's the basis for the likelihood statement?
>
I do not see that as always true, for example when DLV was used, or with
the key of the root zone.

>
>     A signature validation failure is either an attack or a failure in
>     the signing operation on the authoritative servers.
>
> Or something else, like a misconfiguration of a DS record, or a validation
> bug, or ...
>
> The last recommendation in this section is MAY (which admits either way),
> although it is labeled a "recommendation" (which implies a preference for
> doing it).
>
> Section 9:
>     the DNSSEC validator performs a DNSSEC query to
>     the authoritative server that returns the RRset signed with the new
>     KSK / ZSK.  The DNSSEC validator may not be able to retrieve the new
>     KSK / ZSK
>
> Why should it be the case that the resolver can query some RRset, but not
> the DNSKEY RRset?
>
We took this as a hypothetical use case, but there is no reason for this to
be the expected behavior, so it is basically a bug associated with an
emergency key rollover.

>     This either results in a bogus resolution or in an
>     invalid signature check.
>
> What's the difference?
>
The outcome is the same, but the reasons are different.

>
>     Note that by comparing the Key Tag Fields,
>     the DNSSEC validator is able to notice the new KSK / ZSK used for
>     signing differs from the one used to generate the received generated
>     signature.
>
> The key tags may be the same even when the key differs.
>
Correct, though unlikely.
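
For reference, a minimal sketch of why key tags can collide: the key tag of
RFC 4034 Appendix B is a 16-bit checksum over the DNSKEY RDATA, so distinct
keys can share a tag. The two RDATA blobs below are made-up byte strings
chosen to collide, not real keys.

```python
def key_tag(rdata: bytes) -> int:
    """Key tag of a DNSKEY RDATA, following RFC 4034 Appendix B
    (the general case; algorithm 1 keys use a different rule)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes are the high octet of a 16-bit word.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


# Flags 257, protocol 3, algorithm 13, then arbitrary bytes standing in
# for a public key (illustrative only, not valid key material).
header = bytes([0x01, 0x01, 0x03, 0x0D])
key_a = header + bytes([0x10, 0x20, 0x30, 0x40])
key_b = header + bytes([0x30, 0x20, 0x10, 0x40])  # two bytes swapped

print(key_tag(key_a), key_tag(key_b), key_a != key_b)
```

Swapping two bytes that sit at even offsets leaves the sum unchanged, so
both blobs get the same tag although the RDATA differs. A validator
therefore cannot treat a matching key tag as proof that the key is the same.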

>     However, the DNSSEC validator is not expected to retrieve
>     the new ZSK / KSK, as such behavior could be used by an attacker.
>
> I am confused what this could mean.
>
I think what we meant is that when a signature fails, there is no reason
for the DRO to flush and retry the resolution.

>
>     Note also that even though the data may
>     not be associated to the KSK / ZSK that has been used to validate the
>     data, the link between the KSK / ZSK and the data is still stored in
>     the cache using the RRSIG.
>
> This seems highly implementation-dependent.
>
> All of the comments so far on Section 9 relate to two paragraphs, which I
> don't think are necessary for what follows. Instead of fixing the
> inaccuracies, it may be better to just drop them.
>
I think we agreed on that.

> Further down in the section, the text mentions "TTL associated with
> FQDNs", which is not accurate as a name can have several RRsets with
> different TTLs.
>
Correct.

> Apart from that, I disagree with the recommendation in this section (see
> beginning of this message).
>
> Section 10.1:
>     A DRO MAY regularly report the Trust Anchor used to the authoritative
>     server.  This would at least provide insight to the authoritative
>     server and provide some context before moving a key roll over
>     further.
>
> The question is what the authoritative should do with this information, if
> lots of trust anchors are reported that have not been updated.
>
That is beyond the scope of the DRO, but it would give the authoritative
operator some sense of whether the rollover can go wrong or not.

> That's probably out of scope for this document, but nevertheless an
> immediate question: Should the rollover process be stalled? That
> would open up a trivial path to block the rollover. If not, what then? --
> Perhaps it's better to not get into this and drop the last half sentence.
>
I think the process was delayed for the root zone rollover. The intention
is also to mention that the DRO and the authoritative server can/should
collaborate.


> Section 10.2
>     Similarly, a DRO may be informed by other channel a rogue
>     or unwilling DNSKEY has been emitted.
>
> What's an unwilling DNSKEY?
>
I think an "unwilling" key here designates a buggy key that is generated
and put into the zone by the zone owner. This is different from a rogue
key, which I would consider to be the result of an attack.

>
>     *  A DRO MUST be able to flush the cached data subtree associated to
>        a DNSKEY
>
> It seems to me that at the MUST level, it's sufficient to flush the cache
> as a whole.
>
Flushing the full cache is one way to do this. That is fine, though we
could refine this a bit.
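
As an illustration of the finer-grained option, here is a minimal sketch of
flushing only the names at or below the owner of a DNSKEY. The cache layout
(a dict keyed by owner name and type) and both helper names are
hypothetical, and the sketch ignores zone cuts, so it may flush more than
strictly needed.

```python
def in_subtree(name: str, apex: str) -> bool:
    """True if `name` is at or below `apex` (absolute, lower-case names)."""
    if apex == ".":
        return True
    return name == apex or name.endswith("." + apex)


def flush_subtree(cache: dict, apex: str) -> int:
    """Drop every cached RRset at or below `apex`; return how many."""
    doomed = [key for key in cache if in_subtree(key[0], apex)]
    for key in doomed:
        del cache[key]
    return len(doomed)


# Toy cache: (owner name, type) -> cached RRset placeholder.
cache = {
    ("example.", "DNSKEY"): "...",
    ("www.example.", "A"): "...",
    ("badexample.", "A"): "...",
    ("example.org.", "A"): "...",
}
removed = flush_subtree(cache, "example.")
print(removed, sorted(cache))
```

Calling `flush_subtree(cache, ".")` degenerates to flushing the whole cache,
which is the simpler behavior Peter suggests is sufficient at the MUST level.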

> Section 11:
>     *  A DRO SHOULD regularly request and monitor the signature scheme
>        supported by an authoritative server.
>
> What does that mean?
>
The intention was that DROs monitor, to some extent, the authoritative
servers in order to ease the deployment of new algorithms. One message we
want to carry is that collaboration should be encouraged.

>     *  A DRO SHOULD report a "Unsupported DNSKEY Algorithm" as defined in
>        [RFC8914] when a deprecated algorithm is used for validation.
>
> Is this meant for rcode 0 responses?

I was expecting rcode = 2 (SERVFAIL).

>     One inconvenient to such strategy i sthat it does not let one DRO to
>     take advantage of more recent cryptographic.
>
> Why?
>
The crypto is basically chosen by the authoritative side, which needs to be
aware that resolvers support a new algorithm before it updates its crypto.
This is why we recommend that the resolver advertise the crypto it
supports.

> Section 12:
>     12. Invalid Reporting Recommendations
>
> This section title seems confusing.
>
>     An invalid response may be the result of an attack or a
>     misconfiguration, and the DNSSEC validator may play an important role
>     in sharing this information with the authoritative server or domain
>     name owner.
>
> I'm not sure I agree with this. It's probably not a good idea if all
> validating resolvers start contacting a specific domain owner.
>
Good point, we may need to be more specific, though I expected the contact
to be between organizations as opposed to being on a per-request basis.

> Section 13:
>     RUNTIME: * DRO SHOULD regularly discover MTU
>
> I'm no expert here; does this really need regular checks, or is there a
> value that's generally considered safe? If regular checking is done, how
> frequently would be reasonable?
>
Yes. We need to check the frequency, but I am unsure we will have a
specific number. Let's check.

> Section 15:
>     Providing inappropriate information can lead to misconfiguring the
>     DNSSEC validator, and thus disrupting the DNSSEC resolution service.
>
> Not sure what "providing inappropriate information" means here.
>
This is a bad sentence. I think we meant information that will be taken
into account for the configuration.

>     RRSet that were
>     cached require a DNSSEC resolution over the Internet
>
> when queried.
>
>     An attacker may ask the DNSSEC validator to consider a rogue KSK/ZSK,
>     thus hijacking the DNS zone.
>
> How so?
>
>     An attacker (cf.  Section 7) can advertise a "known insecure" KSK or
>     ZSK is "back to secure"
>
> How so?
>
The intent was to list attack goals, not the attack itself, but maybe we
can be more specific.

> --
> https://desec.io/
>
> _______________________________________________
> DNSOP mailing list
> DNSOP@ietf.org
> https://www.ietf.org/mailman/listinfo/dnsop
>


-- 
Daniel Migault
Ericsson