[DNSOP] Re: New draft: DNS Servers MUST Shuffle Answers

2024-11-07 Thread Edward Lewis
I don’t think you intended this - but for DNSSEC validation, the set has to be 
sorted, so don’t MUST NOT that … but I get that this is just a matter of 
wording in a suggestion.  Perhaps “shuffle on send/reply” is what is desired; 
what a protocol element does internally is up to its maker.

The root cause of this is that programmers, in many situations, expect one value to 
be returned and not a list or set.  Dealing with what a “set” is is also a 
weakness in coding.  I keep thinking back to my first experiences with 
gethostbyname and only ever looking at the first returned value until I 
realized there was an array.  (I never thought much about the “[0]” thing in 
all the examples back then.)
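
The difference, as a quick Python sketch (the host name is just a 
placeholder):

  import socket

  hostname = "www.example.com"  # placeholder

  # The classic mistake: treat the lookup as returning one value.
  first = socket.gethostbyname(hostname)

  # What the API actually returns: a set of addresses.
  _, _, addresses = socket.gethostbyname_ex(hostname)
  for address in addresses:  # iterate, don't just take addresses[0]
      print(address)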

> On Nov 7, 2024, at 10:37, Ben Schwartz  
> wrote:
> 
> I would support a draft that says "every authoritative, recursive, forwarder, 
> stub, and application SHOULD shuffle the RRset, and MUST NOT sort it".  Yes, 
> it would suffice that any one of them complies with this recommendation, but 
> the more components comply, the lower the risk of a biased overall system.
> 
> --Ben Schwartz
> From: Joe Abley <jab...@strandkip.nl>
> Sent: Tuesday, November 5, 2024 9:13 AM
> To: Shane Kerr <sh...@time-travellers.org>
> Cc: dnsop@ietf.org
> Subject: [DNSOP] Re: New draft: DNS Servers MUST Shuffle Answers
>  
> 
> 
> On 5 Nov 2024, at 14:48, Joe Abley <jab...@strandkip.nl> wrote:
> 
> > The idea of making a protocol change in the DNS to work around behaviour 
> > that might be fixable in one point release of Android and iOS
> 
> ... seems less than ideal, I meant to say. Sorry, clicked send a bit early. 
> Perhaps both those things were obvious :-)
> 
> 
> Joe
___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: New draft: DNS Servers MUST Shuffle Answers

2024-11-06 Thread Edward Lewis
On Nov 6, 2024, at 12:18, Mark Andrews  wrote:
> 
> Round robin results in unbalanced traffic when one or more of the addresses 
> is unreachable.  It is not recommended.

This reminds me of another situation … we had a load balancer that would ping 
machines behind it; if they were up, they were included.  The trouble was that 
when BIND 8 was running (single-threaded code), the machine would answer to 
pings but not respond (in a timely manner) to port 53 requests if BIND was 
doing a zone transfer (i.e., then a fairly long-lived operation).  This isn’t 
quite the same situation as described in the draft, but the moral of the tale 
is that one really ought to be doing application-level/specific testing of 
servers to balance the load, relying on anything else risks breakage.
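
For example, instead of a ping, probe the service itself - a minimal sketch 
assuming the dnspython library, with a placeholder probe name and server 
address:

  import dns.message
  import dns.query

  def dns_alive(server_ip, probe_name="example.com.", timeout=2.0):
      # Application-level check: does the server answer DNS on port 53
      # in a timely manner, not merely respond to ICMP?
      query = dns.message.make_query(probe_name, "SOA")
      try:
          dns.query.udp(query, server_ip, timeout=timeout)
          return True
      except Exception:  # timeout or network error - treat as down
          return False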

Perhaps recommend that when an answer involves a multi-record set, apply some 
(predictable) shuffling, but otherwise don’t expect too much.  The “auth server” 
can’t control enough of the environment to be in a position to dictate what the 
eventual receiver will do.

An aside, when I see “addresses…unreachable” I’m reminded that reachability is 
not transitive, A->B might work and A->C might work, but B->C might not.  It’s 
hard for a third party to know if a client would be able to reach a server.

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: New draft: DNS Servers MUST Shuffle Answers

2024-11-06 Thread Edward Lewis
On Nov 5, 2024, at 6:56 AM, Shane Kerr  wrote:
> 
> I wrote a quick draft to specify that answers returned should be returned in 
> a random order:
> 
> https://datatracker.ietf.org/doc/draft-kerr-everybodys-shuffling/

(I’ve read the draft and the thread thru Wed 1400 UTC, but I am replying to the 
original post.)

I’m surprised no one mentioned “round robin” - a possibly undocumented feature 
of BIND 8 (showing my age) to rotate the records each time the set was included 
in a response.  With DNSSEC creating a need for a canonical ordering of records 
within a set (for the purposes of computing and validating answers), this was a 
bit of a headache.

Round robin seemed to assume that all the answers went to the same querier…a 
bold claim made because of the lack of any documentation…so the rotation might 
not have ever had the desired effect.  But the DNSSEC team was not permitted to 
remove that feature.

A few thoughts:

“Random” is probably a bad thing, or at least a “bad word to use” here.  In 
operations, I need predictability first so monitoring and debugging can be 
possible.  Determinism is good.  To that end, I’d propose, if you do want to 
rotate answers, to use, perhaps, the minutes on the wall clock modulo the 
number of records in the set to indicate which is presented first - the rest 
then follow in canonical order, wrapping over at the end.  This would give 
predictability…in the sense that a packet trace with time would be able to 
determine whether the “right” first record was sent.
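
In code, the idea is something like this (a sketch only - canonical ordering 
is simplified to a plain sort here, where real DNSSEC canonical order sorts by 
wire-format octets):

  import time

  def rotate_for_response(records):
      # Wall-clock minute modulo the set size picks the record presented
      # first; the rest follow in canonical order, wrapping at the end.
      ordered = sorted(records)  # stand-in for canonical ordering
      if not ordered:
          return ordered
      start = time.gmtime().tm_min % len(ordered)
      return ordered[start:] + ordered[:start]

A packet capture with timestamps is then enough to check whether the “right” 
record led the set.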

Mentioned in the thread by others:

But, there’s still that pesky problem that the records are a set and not an 
ordered list, so other elements in the pipeline may disrupt any attempt to do 
traffic shaping/load management in the DNS.

And, then there’s that DNSSEC precomputed answer feature.  (You could store all 
permutations…whatever.)

I don’t know that the DNS can support application load management all that 
well.  When you look at it through the lens of pure protocol engineering, there 
are a lot of obstacles.  Perhaps pragmatically there can be some benefit, but 
you have to remember that DNS is not a client-server protocol; the source can’t 
expect to control the destination.
___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: DNSOP[EDE] Registering a few more error codes

2024-09-18 Thread Edward Lewis
On Sep 18, 2024, at 10:22 AM, Shumon Huque  wrote:
> 
> On Tue, Sep 17, 2024 at 11:32 AM Wes Hardaker wrote:
>> Shumon Huque <shu...@gmail.com> writes:
>> 
>> > Yes, and more specifically, to quote the RFC, they aren't allowed to
>> > modify DNS protocol processing:
>> 
>> True, but debugging tools may be able to use the machine readable codes
>> as trigger points for diving into further analysis or as a hint for what
>> other fields might be related, etc, to display to the admin/user.
> 
> I agree that this is a reasonable use case. But the work flow of a debugging
> tool does not, in my view, fall under normal DNS protocol processing (the
> prohibition stated in the RFC). So, I think this is fine.

I wanted to second that.  “Trust” isn’t the same when debugging; in fact, doubt 
may be a necessary part of that process.

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: DNSOP[EDE] Registering a few more error codes

2024-09-18 Thread Edward Lewis
On Sep 17, 2024, at 10:37 AM, Petr Špaček  wrote:
> 
> On 17. 09. 24 15:57, Stephane Bortzmeyer wrote:
>> On Tue, Sep 17, 2024 at 03:16:43PM +0200,
>>  Petr Špaček  wrote
>>  a message of 30 lines which said:
>>> I think EDE 29 (Synthesized) with text note "RFC 8482" is perfectly
>>> appropriate for the made-up HINFO answer to ANY (or RRSIG or ...) query.
>> I tend to disagree since RFC 8482 is about removing data that exists,
>> not the opposite.
> 
> I tend to disagree. HINFO is (nowadays) very unlikely to exist in the first 
> place, so it is _also_ about making stuff up. Especially when asked for 
> nonexistent.whatever.example ANY the server might conjure new HINFO answer 
> from nothing.
> 
> In my view it's not about removing data, it's about _not even looking at the 
> data_ in the first place.
> 
>> But there is another question: should we try to save codepoint space
>> by using the same EDE for many different uses (and using the extra text
>> to demultiplex) or should we use the fact that the registration policy
>> is quite open to register many codes? RFC 8914, section 5.2, does not
>> offer any guidance.
> 
> I agree that code points are cheap, but on the other hand if there are too 
> many code points to choose from, implementations will use different codes for 
> the same thing and that will make it _harder_ for consumers to make sense of 
> meaning.
> 
> Specifically for RFC 8482, I think special code is warranted only for section 
> "4.2.  Answer with a Synthesized HINFO RRset", and the existing EDE 29 
> (Synthesized) fits that very nicely.
> 
> 
> All the rest is normal 'ANY means literally "any"' [1] and I think RFC 8482 
> sections 4.1 and 4.3 do not need special code because it would not add useful 
> information.
> 

To be sure, I looked up the title of “RFC 8482”: “Providing Minimal-Sized 
Responses to DNS Queries That Have QTYPE=ANY”.  (I’m too old to memorize 
numbers anymore.)  It’s important to start with this because “denying ANY” 
isn’t the same as a “synthesized response”, despite the fact that one practice 
in the document is to synthesize HINFO.  Examples of synthesized responses: 
wildcards and, to some extent, negative answers from cached NSEC information, 
plus there is always the “proprietary option” - synthesizing from some other 
(possibly undocumented) process.

I do recall giving a talk at DNS-OARC, a lightning one, in February 2020 (San 
Francisco) about the annoying nature of that document (RFC 8482).  It has too 
many ways a responder can “weasel out” of responding with all the data sets at 
a name.  To be clear, I applaud “eliminating” the practice of sending all data 
sets in response to one query.  But given that it was common practice at one 
time, an explicit, deterministic response is needed to say “no”.

The reason I cared is that I used to observe all the data published at TLD apex 
names to determine how DNSSEC was being implemented.  I used ANY queries to 
grab what was at an apex name; if I wasn’t allowed to do that, I’d poll for the 
types of interest.  The fallback is pretty simple.  A simple indicator that the 
responder would not reply to a query for type = ANY is sufficient.
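
The fallback amounts to this (a sketch assuming dnspython 2.x; the type list 
and error handling are illustrative only):

  import dns.rdatatype
  import dns.resolver

  TYPES_OF_INTEREST = ["SOA", "NS", "DNSKEY", "DS", "NSEC3PARAM"]  # placeholder

  def apex_rrsets(name):
      # Try ANY first; a lone synthesized HINFO is the RFC 8482 "no".
      try:
          rrsets = dns.resolver.resolve(name, "ANY").response.answer
          if not (len(rrsets) == 1 and rrsets[0].rdtype == dns.rdatatype.HINFO):
              return rrsets
      except dns.resolver.NoAnswer:
          pass
      # Fall back to polling each type of interest.
      rrsets = []
      for rdtype in TYPES_OF_INTEREST:
          try:
              rrsets.extend(dns.resolver.resolve(name, rdtype).response.answer)
          except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
              pass
      return rrsets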

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: [EDE] Registering a few more error codes

2024-09-11 Thread Edward Lewis


> On Sep 11, 2024, at 09:22, Stephane Bortzmeyer  wrote:
> 
> In the current registry for Extended DNS Error Codes (RFC 8914), there
> are codes that may be interesting to add:
> 
> * One to say that the response was deliberately minimal (RFC 8482)

Certainly.  I used to have code that prepared to ask for ANY; if that would not 
be honored, the code would then ask for each type of interest.  Knowing whether 
to fall back was a pain without an explicit signal.

> * One to say that the response comes from a local root (RFC 8806)

Certainly.  Could point to someone having a stale copy, resulting in the wrong 
IP address being hit for a next-level down.

> * One to say that the response has been tailored because of ECS (RFC
>  7871) [the most useful, IMHO]

Question - I certainly can see why knowing a response is tailored is useful, 
but does it matter why?  I.e., would this return code be only for tailoring due 
to ECS?  Would a different return code be needed for other tailoring reasons 
(like DNS load balancing/traffic mgmt)?

It is useful to know if a response is tailored when debugging someone else’s 
report - realizing that the different answer the debugging individual gets is 
deliberate, rather than some accident of, say, timing.

> I am thinking about asking for a registration. Policy for this
> registry is "first come, first served".  Before I start sending email
> to IANA, I ask your advice. Is it a good idea? Will the authors of
> resolver / authoritative software use it?
> 

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: [Ext] Request: Review changes - draft-ietf-dnsop-rfc7958bis-03 → 04.

2024-08-26 Thread Edward Lewis
On Aug 21, 2024, at 18:12, Warren Kumari  wrote:
> My initial email in this thread said:
> 
> "The IANA is eagerly awaiting this becoming a standard so that they can update 
> their trust anchor with the DNSKEY material - so, if you have any strong 
> objections to these changes, please let me know by end of day (anywhere!) on 
> Aug 18th."

Apologies for only replying now; I missed this message in my in-box until I saw 
Petr’s.

The way that quote is worded makes it sound like a quick approval is important 
and that this would make something a standard.

Even if this is just a document of how IANA publishes information about its 
trust anchors for the root zone it administers[1], there ought to be no 
ambiguity in the meaning of the fields or even the presence/absence of the 
fields.  Nothing should be left to the imagination of the reader.  There’s no 
underlying standard regarding trust anchors that supplies default assumptions.  
This document then has to stand on its own.

I am a bit surprised that this is a WG document, as it pertains to one 
operator’s approach in filling an undefined gap in the management of the 
protocol.  WG review of this is beneficial, arguably the best means to address 
risks involved and a worthy use of WG time.  Nevertheless, if IANA’s operations 
are to be defined by IANA, and they should be, then this document is “owned” by 
IANA and not by DNSOP.

I’m writing this to encourage consideration of Mike StJohns’ and Petr 
Špaček’s comments as opposed to pushing this through because “IANA is eagerly 
awaiting this becoming a standard.”

[1] qualification recognizing that the DNS protocol can be instantiated in 
different environments, IANA is administering the root zone for the global 
public Internet.___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: [Ext] Request: Review changes - draft-ietf-dnsop-rfc7958bis-03 → 04.

2024-08-21 Thread Edward Lewis
On Aug 20, 2024, at 20:42, Michael StJohns  wrote:
> 
> Hi Paul -
> 
> I'm confused from your responses below - is this a WG document where the WG 
> gets to decide, or is this an IANA document (like the one it was replacing) 
> where IANA gets to decide?  I *think* I saw you argue both ways in your 
> response below.

This question interests me.

When DNSSEC was designed, there was a decision to treat all zones the same.  
The fear was that large delegated zones (COM) would need special treatment; we 
didn’t want the protocol to differ per zone for that.  We didn’t address the 
uniqueness of the root zone though, specifically in distributing the trust 
anchor for it.  This left a gap we’ve never addressed.

IANA has addressed this for the DNS running on the global public Internet.  
Said in the sense that there is one DNS protocol and possibly many 
instantiations of a running DNS system.  (I knew of a non-Internet DNS at one 
time, operating on a separate, private inter-network.  It may not be around any 
more, on the other hand, when it comes to the inter-planetary work, there may 
be a DNS system per, say, planet.)  This document is addressing how IANA is, 
has been, and will be distributing the trust anchor for the root zone it manages.

On the one hand, IANA wants to do what is in the best interests of the global 
public Internet and as such, seeks expert opinions of which this document is an 
example.  The WG can’t materially change the document - without convincing IANA 
to alter something operational.  This doesn’t make WG review futile, a “rubber 
stamp” step; IANA is listening to the feedback.  OTOH, I wonder if this is 
truly a WG document or something that is best handled through the Independent 
stream but reviewed by the DNSOP WG.

I doubt there is enough energy for the WG to design a “standards based” means 
for root zone trust anchor management and distribution that is out of band, 
despite the gap, as there is only one working example (IANA’s) and IANA has its 
methods (including this document) in place.

“Automated Updates of DNSSEC Trust Anchors” is the WG’s in-band mechanism.  A 
while ago I wrote a replacement for that to address issues uncovered in looking 
at a root zone DNS Security Algorithm change but abandoned the work once I 
realized the only operational deployment of it would be for the root zone, 
which isn’t enough to justify the standards work.  The root zone implementation 
of Automated Updates isn’t precisely “by the RFC” but it works and for any 
change to the DNS Security Algorithm, it’ll be “made to work”, an alternate 
approach isn’t worth pursuing.

> Syntax is easy.  Semantics are hard and this document has a bit too much 
> ambiguity for a naive relying party.  Strangely, if this were simply a signed 
> file of hashes with a time associated with it indicating the IANA's current 
> view (at time of publication) of the trust anchor set, I'd have a lot less to 
> argue about.  Someone tried to do too much I think.

Protocol-defining IETF documents are meant to spur implementations, seeing 
multiple independent implementations interoperate is the goal.  As a result, 
the documents often leave details up to the reader/implementer.

But this document is not a pure protocol-defining document, it is an 
operational process document.  As such, it ought to be more concrete. That is, 
if the goal is to describe the entirety of distributing the trust anchors.  The 
document could be here to just present the marshaling of the trust anchor 
materials - describing the syntax as it does - leaving the interpretation up to 
the writer (IANA) and reader (relying parties).

Maybe this document ought to just describe what’s in the file.  Maybe this 
document ought to expand to include rules for relying on the document as Mike 
suggests.  I’m not decided on this; frankly, I need to go over the thread again. 
 But it’s going to be a debate over whether this document is only about 
marshaling the trust anchors or about managing the trust anchors.

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: [Ext] New draft on collision free key tags in DNSSEC

2024-07-27 Thread Edward Lewis
On Jul 27, 2024, at 20:00, John Levine  wrote:
> 
> I am a bad person. My zone uses the new algorithm and I put in two keys with 
> the same tag. Now what? Other than perhaps stopping at two keys rather than 
> three, what is the difference in what resolvers do?

Answering just to further exploration of this - a resolver could elect to 
declare a service failure if it sees two keys in a DNSKEY resource record set 
suffering a collision.  (Caveats - same DNS security algorithm as well.)
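
The check itself is cheap.  A self-contained sketch, using the RFC 4034 
Appendix B key tag computation over the DNSKEY RDATA wire format:

  from collections import Counter

  def key_tag(rdata: bytes) -> int:
      # RFC 4034 Appendix B, computed over the DNSKEY RDATA.
      total = 0
      for i, byte in enumerate(rdata):
          total += byte << 8 if i % 2 == 0 else byte
      total += (total >> 16) & 0xFFFF
      return total & 0xFFFF

  def rrset_has_collision(dnskey_rdatas) -> bool:
      # DNSKEY RDATA layout: flags(2) | protocol(1) | algorithm(1) | key,
      # so rd[3] is the algorithm octet.
      tags = Counter((rd[3], key_tag(rd)) for rd in dnskey_rdatas)
      return any(count > 1 for count in tags.values())

A resolver with the strict local policy above would answer SERVFAIL whenever 
rrset_has_collision() is true.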

Resolvers already are allowed to behave according to local policy and refuse to 
“work too hard” to validate data.  An idea I think is often overlooked is that 
the beneficiaries of DNSSEC are resolvers (specifically caches); DNSSEC is 
supplying them cryptographic data to decide whether a data set has made it from 
the source to them unscathed.  Often we (collectively) talk about DNSSEC being 
an extension of a zone administrator’s policy, but it isn’t, despite the zone 
setting all the parameters.

As part of my answering to "further exploration", I’m skeptical that it is 
possible to eliminate key tag collisions from the protocol.  Not that 
collisions are in any way desirable or worthy of being tolerated; my skepticism 
is whether or not elimination is possible.  Which is why I was thinking of where 
it would be enforced - in a benign setting at the primary, which of course 
doesn’t mean it would catch malignant/malicious use cases.  For the latter, the 
resolver is where duplicates would be, well, “forbidden” from causing wasted 
cycles.

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: Introducing Relative Label for DNS

2024-07-21 Thread Edward Lewis
The draft reads very differently from the email message.  The draft sticks to a 
protocol definition while the email describes a use case.  This difference is 
significant.

Reading the draft (no use case assumed), my first question is how “www.subpage” 
would be encoded (referring to section 4.1’s wire format).  Are both labels 
relative labels or just the final?  What if the encoding marks the first label 
(deepest) as relative and the next as “ordinary”?

In general, what suffix is affixed to a relative label?  In printed zone files, 
there is the $ORIGIN directive, and dynamic update specifies a zone, but those 
are part of use cases (and are not the same, not equivalent).  I say not 
equivalent as dynamic update needs to know what zone to edit, the value isn’t 
intended the same way $ORIGIN is - in a zone file $ORIGIN may be redefined as 
desired.

Where trouble will come is in handling unknown types, see "Handling of Unknown 
DNS Resource Record (RR) Types” (RFC 3597), specifically section 4 on Domain 
Name Compression.  Prior to that document, there was much confusion about where 
domain names could be compressed; it was clarified that only the original set of 
resource records was eligible for compression because those are the only 
resource records “every server has to know”.  (I.e., RFC 1034/1035 are the 
base, all others are optional add-ons.)

I don’t think there’s any good to come from shrinking the in-memory size of the 
zone this way.  Saving space, sure, but I don’t think the cost in code 
complexity will be favorable.

I see this as a UI issue.  A (secure) dynamic update client can elect to append 
the zone name (from that section of the message) where there is no ending dot.  
In a zone file, $ORIGIN can be used at will (but doing so for each name would 
be overkill).
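
The client-side convenience is a few lines (illustrative):

  def fully_qualify(name: str, zone: str) -> str:
      # Append the zone from the update message when the user omitted
      # the trailing dot, keeping what goes on the wire fully qualified.
      if name.endswith("."):
          return name
      return name + "." + zone.rstrip(".") + "."

  # fully_qualify("mail", "example.com.") -> "mail.example.com."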

Another concern is whether this would bleed into the search string issue - when 
and where search strings are applied.

I think it is best if the server internal representations remain fully 
qualified (even if not the same as the on-the-wire FQDN), as well as in zone 
files to avoid any ambiguity.  As part of that, I doubt there’s ever been a 
comprehensive definition of the grammar of zone files - particularly the 
directives.  ($TTL, $ORIGIN, $GENERATE, etc.) which be useful before judging 
the concept of relative labels.

In a binary zone file [which I have not played with], can’t compression of the 
initial types be done to save space - if needed?  Just a passing thought.

And - I understand the idea that remembering whether the label was entered 
“relative” is desirable, but we’ve always had a larger problem with comments.  
Comments in an original zone file are lost once the file is loaded as a zone and 
then written back to disk (or whatever).

Ed

> On Jul 21, 2024, at 14:50, Ben van Hartingsveldt 
>  wrote:
> 
> Dear all,
> 
> In the recent years I started working on my own coded DNS server, because I 
> was done with the synchronization between BIND and DirectAdmin that broke all 
> the time. It resulted in a Java server that is running on 4 IPs for some 
> years now. Because of this, I had to read many RFCs to have it pass tests 
> like Zonemaster, DNSViz, IntoDNS, etc. While reading and implementing things, 
> I also came across some shortcomings of DNS. On advice of someone at SIDN, I 
> will share my draft that I published today. It solves one of the shortcomings 
> that DNS has in its core: relative domain names.
> 
> I'm talking about 
> https://datatracker.ietf.org/doc/html/draft-yocto-dns-relative-label-00. This 
> draft is meant to solve the problem that we cannot use relative domain names 
> in the DNS system, specifically in DNS UPDATE and in binary zone files. This 
> also means that this draft is not meant for use with the QUERY opcode (except 
> for possibly AXFR and IXFR). Let me explain those two usecases.
> 
> 1) DNS UPDATE: In DNS UPDATE it is possible to update the zone using DNS 
> itself. This can be used in routers when dynamic DNS is wanted, but also in 
> other situations. Imagine wanting to add an MX record. Using a webinterface, 
> you are likely able to chooses one of the following four options:
> - mail IN MX 10 mx
> - mail IN MX 10 mx.example.com.
> - mail.example.com. IN MX 10 mx
> - mail.example.com. IN MX 10 mx.example.com.
> However, using DNS UPDATE you are only able to add the record with fourth 
> format; both record name and FQDN field have to be absolute. This means that 
> when I return to the webinterface, I will likely see absolute domain names, 
> even when I use relative domain names in my other records. My draft wants to 
> give the client more control over when to use relative and when to use 
> absolute domain names by adding a new label type.
> 
> 2) Binary Zone Files: Since BIND 9, it is possible to save zones in a binary 
> format. This is possible to enable/disable using `masterfile-format`. It is 
> possible to convert the textual format to binary and vice versa. However, 
> whe

[DNSOP] Re: Side Meeting - DNS Load Balancing

2024-07-18 Thread Edward Lewis
On Jun 28, 2024, at 12:47, Ben Schwartz  
wrote:
> 
> Hi DNSOP,
> 
> The practice of DNS Load Balancing -- sending different answers to different 
> resolvers to optimize latency and avoid overload — 

A request - can you call this “Traffic Engineering via DNS” (or “DNS Traffic 
Engineering”)?

When I first saw this I had a to do a double-take because DNS Load Balancing 
also refers to the practice of sending queries to different servers behind a 
load balancer.

It is conceivable that, instead of ECS, which raises the ire of privacy 
enthusiasts, a name server could return a single “formula” that a recursive 
server could use to determine what answer to pass onward.  The formula would be 
large (in bytes) though, making this a “pipe dream” in the current DNS protocol.


___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


[DNSOP] Re: Wallet is not implementable.

2024-06-24 Thread Edward Lewis
I recall a long-ago problem involving SIP DNS resource records related to 
confusion over escaped values.  I believe it was the NAPTR resource record.

The problem presented to me involved two documents, both related to NAPTR and 
both published as RFCs (not Internet-Drafts).

The conflict between the two related to one RFC adhering strictly to the 
conventions for zone files, while the other ignored a fine detail of the 
convention.  This came down to the representation of a ‘\’ character in the 
record - one RFC wrote it as “\\” (escaping the backslash) and the other RFC 
did not perform the escape.

Historical data (I’m surprised I could dig this back up.):

In https://www.ietf.org/rfc/rfc2915.txt 
>  http.uri.arpa. IN NAPTR
> ;;  order   pref flags service  regexp replacement
>  100 90   ""  ""   "!http://([^/:]+)!\1!i"   .
In https://www.rfc-editor.org/rfc/rfc5483.txt 

>   A correct way to write this example [referring to the example as written 
> just before this, not RFC 2915] is:
>   * IN NAPTR 100 10 "u"
>   "E2U+sip" "!^\\+4655(.*)$!sip:\\1...@example.net!" .
>
>   Note that when a NAPTR resource record is shown in DNS master file
>   syntax (as in this example above), the backslash itself must be
>   escaped using a second backslash.  The DNS on-the-wire packet will
>   have only a single backslash.
The latter is more strict - the escape (for zone files) is shown.  Ironically, 
the latter, while trying to isolate the escape character, omits the line 
continuation parentheses.

Regarding the two above, my explanation (to colleagues) was that the first RFC 
is showing the presentation of the record external to a zone file, while the 
latter shows the record in a zone file.

With this in mind, I’ve believed that there are three forms of a resource 
record.

One is the wire format.  Ignoring for now the compression of some domain names 
in some of the early resource record types, there is just one way to express these 
in octets.  I’ve learned that this is the format that matters - it is what is 
sent over the wire.

Two is the convention of having a presentation format. This is the above 
converted into what we can see, speak, and hear when talking, listening, 
reading and writing documents.

Three is the format that appears in zone files.  The difference between this 
and the above format is that zone files have parsing instructions embedded, 
things external to the DNS protocol.  What is in this difference are the 
(albeit quirky) line continuation characters, the ‘$’ directives, comments, 
escape for DDD (three-digit decimal values), and the regular escape.  It’s been 
a long time since I wrote a parser; I may be forgetting some rarer cases.

To understand how WALLET is “unimplementable”, I need to ask this… if the 
resource record is saying that ‘(‘ has no special meaning in the presentation 
format, when converting to zone file (presentation) format, wouldn’t you stick 
an escape character in front of it?  That escape character would only be in the 
zone file, inserted and removed when writing and parsing it.
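
Something like this, on the way out to a zone file (the set of special 
characters is illustrative, not a complete zone-file grammar):

  ZONE_FILE_SPECIALS = '()";@$\\'  # assumed set; a real parser's list may differ

  def escape_for_zone_file(presentation: str) -> str:
      # Insert zone-file escapes so characters that are ordinary in the
      # presentation format (e.g. '(') are not taken as parsing
      # instructions; the parser removes them again when reading.
      out = []
      for ch in presentation:
          if ch in ZONE_FILE_SPECIALS or ch.isspace():
              out.append("\\" + ch)
          else:
              out.append(ch)
      return "".join(out)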



> On Jun 24, 2024, at 01:25, Mark Andrews  wrote:
> 
> I was meaning to send a more meaningful message.
> 
> WALLET has the following paragraph that should have prevented it being 
> approved
> because multi-line continuation can be anywhere in a record and there is no 
> way
> actually write a zone file parser and also do the requested behaviour.
> 
> "None of the characters in either the  or
>  are special. For example, a backslash
> character (U+005C) does not act as an escape character of any sort.”
> 
> Note the opening ‘(‘ occurs before the type is defined so one can’t even
> disable multi-line parsing once the type is specified.
> 
> % cat multi-line.db
> @ ( SOA . . 0 0 0 0 0 )
> @ NS .
> % named-checkzone -D example multi-line.db
> multi-line.db:1: no TTL specified; using SOA MINTTL instead
> zone example/IN: loaded serial 0
> example.   0 IN SOA . . 0 0 0 0 0
> example.   0 IN NS .
> OK
> % 
> 
> Now the specification of WALLET doesn’t disallow ‘)’ as a currency identifier.
> 
> I know people don’t like having to deal with escape processing but that isn’t
> actually negotiable.
> 
> Mark
> 
>> On 23 Jun 2024, at 12:57, Mark Andrews  wrote:
>> 
>> Turning off escape processing prevents turning off multi line processing. 
>> 
>> 
>> 
>> -- 
>> Mark Andrews
>> 
>> ___
>> DNSOP mailing list -- dnsop@ietf.org
>> To unsubscribe send an email to dnsop-le...@ietf.org
> 
> -- 
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
> 
> ___
> DNSOP mailing list -- dnsop@ietf.org
> To unsubscribe send an email to dnsop-le...@ietf.org

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org

[DNSOP]Re: [Ext] Our reading of consensus on draft-hardaker-dnsop-rfc8624-bis, and the "must-not-algorithm" docs.

2024-05-14 Thread Edward Lewis
>From: Warren Kumari 
>Date: Tuesday, May 14, 2024 at 16:59
>To: dnsop 
>Subject: [Ext] [DNSOP]Our reading of consensus on 
>draft-hardaker-dnsop-rfc8624-bis, and the "must-not-algorithm" docs.

>Option 1: Pivot this document from providing implementers with guidance 
>(“Implementers MUST NOT use Foo for signing”) to providing guidance to 
>operators instead (“Operators MUST NOT use Foo for signing”).

I would tweak this question a bit, sparked by the idea the implementers’ job 
here is not to “use”:  For operators, it’s a use/don’t use recommendation.  For 
implementers, it’s a support/don’t support recommendation and a recommended 
‘default’ value.

___
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org


Re: [DNSOP] [Ext] Do we need new draft that recommends number limits ?

2024-03-14 Thread Edward Lewis
The DNS needs operational profile documents.  Documents that set societal norms 
for the global public Internet while still allowing the protocol to be overly 
flexible ("my network, my rules" world).

On 3/12/24, 04:19, "DNSOP on behalf of Kazunori Fujiwara" 
 wrote:

With DNS, there are several things to consider, such as the number and
number of times that can complicate name resolution or cause DoS.

For example, number of CNAME chains or number of chains of "unrelated"
name server names are not limited. (Each implementations limit.)

"KeyTrap" also seems to be caused by the configuration of a large
number of DNSKEY RRs and RRSIG RRs in one domain name.

For example,

- Number of CNAME chains
- Number of "unrelated" name server name resolutions (hard to write)
- Number of NS RRs in each delegation
- Number of RRs in one RRSet.
- Number of RRSIG RRs in one RRSet
- Number of DNSKEY RRs in one domain name

DNSOP WG limitted NSEC3 Parameters in RFC 9276,
beyond which DNSSEC validation was not required.

Then, we can generate new recommendations that limit numbers and
if it exceeds that limits,
it might be a name resolution error or no validation.

Rather than writing a draft for each limitation,
I think it would be better to compile them all into one draft.

--
Kazunori Fujiwara, JPRS 

___
DNSOP mailing list
DNSOP@ietf.org

https://www.ietf.org/mailman/listinfo/dnsop

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-03-01 Thread Edward Lewis
On 3/1/24, 13:45, "pch-b538d2...@u-1.phicoh.com on behalf of Philip Homburg" 
 wrote:

>If we have a protocol where validators are allowed to discard RR sets with
>duplicate key tags but we place no restriction on signers, then we have a 
>protocol with a high chance of failure even if all parties follow the 
>protocol.

From what I gather, from what I've measured, and from what I've heard from 
others, key tag collisions generally don't happen often in natural operations. 
(They may begin in malicious operations.)

If a validator chooses to discard all signatures for which there are multiple 
DNSKEY resource records matching the key tag in the RRSIG resource record, 
there'll be SERVFAILs across the population that cares about the data involved. 
 From past observations, when there's a widespread "I can't get to that", it 
bubbles up to the service provider, who then takes steps to fix it.

This kind of feedback loop seems to be the state of the art in the Internet 
today.  I'm not sure we need to take on what would be a large effort to do 
better.  At least given anecdotal evidence to date.

>At the end of the day, following the protocol is voluntary. But if we want
>to be able to reason about the protocol, then we have to assume that all
>interested parties try to follow the protocol.

To use an anecdote - "When crossing a street with a cross-walk signal you 
should still look for vehicles.  While no vehicle ought to be entering the 
cross-walk against the signal, don't bet your life on it."  Something like this 
was on a police safety poster.

In designing a protocol, you can't assume that the remote end will do anything 
sensible.  You need to focus on what you can control locally.

>Indeed. But the question is, if a validator finds both RRSIGs associated with a
>RR set and we have guarantees about uniqueness of key tags for public key,
>can the validator then discard those signatures?

What if both signatures were generated by the same key (private of the pair) 
but the data changed between the inception time of one and the inception time 
of another?  One signature may be over a stale copy of the data, not from a 
different key.

>The first step to conclude is that for the core DNSSEC protocol, requiring
>unique key tags is doable. Even without a lot of effort (other the usual
>of coordinating changes to the protocol).

Back in the day, prefacing because it may no longer be true, BIND would 
generate keys and place them in a default directory.  Each key would be in a 
file whose name included the owner name, the DNSSEC security algorithm number, 
and key tag.  A key tag collision would be detected if the file name about to 
be used was already present in the directory.  This strategy only worked, 
though, if the user of BIND did not move the keys elsewhere - something the 
strategy couldn't control.
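
From memory, the check amounted to no more than this (the file-name convention 
is as I recall dnssec-keygen's output; treat it as a sketch):

  import os

  def tag_already_used(keydir: str, owner: str, algorithm: int, tag: int) -> bool:
      # A collision shows up as a pre-existing K<owner>.+<alg>+<tag>.key
      # file in the default key directory.
      filename = "K%s.+%03d+%05d.key" % (owner.rstrip("."), algorithm, tag)
      return os.path.exists(os.path.join(keydir, filename))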

I'm not sure it's doable, even for "simple" DNSSEC, if you have to account for 
the myriad of ways signer processes are implemented.  Perhaps I'm being 
obstinate about the ease in which collisions can be detected because I still 
maintain it just doesn't matter.  Validators still need to protect themselves 
and when something that matters breaks, it'll light up the social media sphere. 
 (As it has in the past)

>But the protocol also has to take reasonable measures to limit the amount
>of time a validator has to spend on normal (including random exceptional)
>cases.
>
>For example, without key tags, validators would have to try all keys in
>a typical DNSKEY RR set or face high random failures.

For the most part, zones don't have many keys.  Usually only one ZSK and one 
KSK unless there is a roll happening.  There are some zones with lots of keys, 
but that doesn't seem the norm.  I don't know if there is a study that finds 
the average number of keys in zones weighted by the use of the data in the 
zone.  (Meaning, TLDs would be weighted more highly than an ill-managed hobby 
zone.)

>So the question is, does requiring unique key tags significantly reduce the
>attack surface for a validator?
>
>Are there other benefits (for example in diagnostic tools) for unique key 
>tags that outweigh the downside or making multi signer protocols more
>complex?

Key tag collisions are not desirable, we know that.

My diagnostic tool has crashed the two times it came across them, in one case I 
could differentiate by assuming the role (KSK vs. ZSK) and in the other I shot 
off a (possibly futile) message to the operator and the collision cleared 
quickly, plus I was able to smudge my code a bit.  Sooner or later, I know I 
won't be able to distinguish a collision unless I grab more data.

The question isn't about the goodness of collisions.  It's about the best way 
to address the resource consumption problem that they can exacerbate.  Ruling 
them out of bounds doesn't mean they can't come back onto the field and cause 
problems.  Treat the problem - resource consumption - that can be done.

And

Re: [DNSOP] [Ext] About key tags

2024-03-01 Thread Edward Lewis
On 3/1/24, 11:13, "pch-b538d2...@u-1.phicoh.com on behalf of Philip Homburg" 
 wrote:

I removed a lot of logic, as it seems dead on.  But...

>This would allow validators to reject any DS or DNSKEY RR set that has a
>duplicate key tag.

"This" refers to barring keys from having duplicate key tags.  My knee-jerk 
response is that validators are already permitted to reject anything they want 
to reject.  (We used to talk about the catch-all "local policy" statements in 
the early specs.)  You don't have to bar duplicate key tags to allow validators 
to dump them, validators already have that "right."

>Duplicate key tags in RRSIGs is a harder problem

I'm not clear on what you mean.

I could have RRSIGs generated by the same key (binarily speaking, not key 
tag-speaking) that have different, overlapping temporal validities.  If you 
want to draw a malicious use case, I could take an RRSIG resource record signed 
in January with an expiration in December for an address record that is changed 
in March, and replay that along with a new signature record, signed in April 
and valid in December.  One would validate and the other not.  But this isn't a 
key tag issue, it's a bad signing process issue.

Not a completely fictitious one.  There was a TLD whose signatures always expired 
on New Year's eve.  Not sure if the TLD in question does this anymore, but for 
a number of years (at least 3), all signatures they generated expired on the 
next New Year's eve.

>But for the simple question, would requiring unique key tags in DNSSEC be
>doable without significant negative effects, then I think the answer is yes.

Heh, heh, if you make the problem simpler, then solving it is possible.

Seriously, while I do believe in the need for a coherent DNSKEY resource record 
set, there are some multi-signer proposals that do not.  If the key set has to 
be coherent, then someone can guard against two keys being published with the 
same key tag.  The recovery may not be easy as you'd have to determine what key 
needs to be kicked and who does it and where (physically in HSMs or 
process-wise).  I have some doubt that key tag collisions can be entirely 
avoided.

Even if you could - you still have the probability that someone intentionally 
concocts a key tag collision.  Not everyone plays by the rules, especially when 
they don't want to.

So - to me - it keeps coming back to - a validator has to make reasonable 
choices when it comes to using time/space/cpu to evaluate an answer.  No matter 
whether or not the protocol "bars" duplicate key tags and whether or not 
signers are instructed to avoid such duplication.
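
Concretely, the local choice can be as simple as a work budget, enforced no 
matter what the zone publishes (the limit here is arbitrary):

  class ValidationBudget:
      # Cap the signature verifications spent on one response; when the
      # budget runs out, give up and SERVFAIL.
      def __init__(self, max_verifications: int = 8):
          self.remaining = max_verifications

      def charge(self) -> bool:
          # Returns True if another verification may be attempted.
          if self.remaining == 0:
              return False
          self.remaining -= 1
          return True

Each candidate (key, signature) pairing calls charge() first; colliding tags, 
benign or malicious, then cost the validator only a bounded amount of work.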

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-29 Thread Edward Lewis
From: DNSOP  on behalf of Shumon Huque 

Date: Wednesday, February 28, 2024 at 16:22
To: Edward Lewis 
Cc: John Levine , "dnsop@ietf.org" 
Subject: Re: [DNSOP] [Ext] About key tags

>… I think writing a BCP telling folks how to avoid collisions would make sense 
>though (and yes, it needs to cover the multi-signer case too).

I support that.  Key tag collisions make one of my pet projects (visualizing 
key management over time) cry.  And collisions are a multiplier in a malicious 
use case.  Discouraging them is a good thing.

The point I’m belaboring is that how the issue of resource over-consumption is 
addressed matters.  We can’t ban the problem out of existence; even if it were 
simple to restrict it from ever happening, we need to enforce limits where the 
resources at risk are managed.

If this means a validator experiences some false positives, I could live with 
that.  There are very few good reasons to have a complex DNS set up and such 
situations are supported and tolerated in the protocol that doesn’t mean they 
are good ideas or have simpler alternatives.  Discouraging wacky configurations 
isn’t a terrible thing to do, especially since we can have (or imagine) highly 
complex signing scenarios which could, if the planets align correctly, permit a 
key tag collision no matter to what length we go to prevent a collision from 
seeing the light of day.

Keeping in mind - this entire topic is covering an unusual state of the 
protocol, one that fears a malicious activity I believe has not been 
encountered in the wild.  (If no action is taken, malicious activity might 
follow now that it is described, but I have not heard of a historical case of 
it.)  We are dealing with the odd, we need to mitigate its impact, eliminating 
it might just be -relatively speaking - too much work.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-28 Thread Edward Lewis
On 2/27/24, 17:09, "DNSOP on behalf of John Levine"  wrote:

>The kind of load is different but in each case the client needs to
>limit the amount of work it's willing to do. We can forbid it in the
>protocol but unless you have better contacts at the Protocol Police
>than I do, people will do it anyway.

I side with John Levine's line of reasoning, that the solution is defending 
against taking on too much work (in this case, the validator caps its effort - 
in whatever way is appropriate).  It would be futile to prevent key tag 
collisions from happening via a protocol change as a malicious actor is not 
bounded by specifications.

If it is forbidden in the protocol, it might still happen.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About key tags

2024-02-21 Thread Edward Lewis
On 2/20/24, 16:35, "Mark Andrews"  wrote:
>Validator resource consumption (CPU) *is* tied to tags.

The number of tag collisions is related but is not the only cause of the 
validator resource consumption vulnerability.

>Without tags the cost of verification increases and the number of cache misses 
>that can be handled decreases as the number of keys per algorithm increase.  A 
>tag collision undoes the value of the tag for the keys that collide.
>1 -> 1
>2 -> 1.5
>3 -> 2
>4 -> 2.5

There are two basic ways to put a cap on how much effort you are willing to put 
into accomplishing something.  One is to cap the time taken and another is to 
cap the steps taken.  One can combine the two, and/or substitute resources for 
steps.  Capping time is capping the most generic commodity.  Capping by steps 
means having to decide what is the reasonable limit, perhaps having to make a 
judgement call on what is a step.

Using CNAME and DNAME redirects, negative answer proofs, etc., also contribute. 
 Therefore, I don't see eliminating key tag collisions as the root cause to 
solve.  It's certainly one of the root causes, but eliminating collisions 
("barring them by specification") is not going to completely solve the problem 
and a validator still has to deal with the possibility of encountering 
collisions, via non-compliant (old, buggy) code or receiving maliciously 
intentional colliding keys.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About key tags

2024-02-20 Thread Edward Lewis
From: DNSOP  on behalf of Bob Harold 

Date: Tuesday, February 20, 2024 at 09:53
To: Edward Lewis 
Cc: "dnsop@ietf.org" , Paul Wouters 
Subject: Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About 
key tags

>But if I have a 'standby' DS record, to allow faster rollover if a key is 
>compromised,
>then there will always be two DS records.  And without a key tag or 
>equivalent, resolvers
>would have to do extra work to check the DS record because they would not know
>which one was active.  Key tags add efficiency, we just need to handle 
>occasional collisions.

Instead of a hash algorithm and hash value, the key’s bits themselves could be 
in the DS resource record.  You could drop a field from the DS resource record, 
but it would make the record longer.

Not to get into a history lesson but to add context, in the early iterations of 
DNSSEC, before there was the DNSKEY, there was the KEY resource record.  Around 
that time there was no DS resource record.

I’ll inject that the DS resource record, with the hash, was proposed to address 
design issues and I have to be honest I’ve forgotten some of them.

I believe one was the fear that an advertised public key would enable reverse 
engineering of the private key, so to stop that from happening, only a hash was 
used.  I could be wrong on that reasoning - and I won’t vouch for the idea that 
one can reverse engineer the private key from just the public key (and even 
with some data and signature samples), as if that were
possible, public key cryptography wouldn’t work at all.

There were concerns about sending the public key to the parent.  I do recall 
that I felt strongly the child should only send the hash but operating 
registries began requiring the keys so that they would calculate the hash.

There was also concern about how to signal that a child was not signing with 
DNSSEC, something had to be at the parent.

Looking back, I can’t figure out why we didn’t just include the whole key in 
the DS resource record.  Perhaps we didn’t want to recreate the NS at parent 
and at child situation.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About key tags

2024-02-20 Thread Edward Lewis
That’s where I’m heading as well…

1) Benign collisions aren’t major headaches, except perhaps for the key manager 
(because rare events are headaches)
2) Validator resource consumption is a general issue, not tied to key tag 
collisions

My kicking this off was not the KeyTrap issue, a report of a potential 
maliciously abused vulnerability.  My kickoff was the report that a TLD “went 
down in DNSSEC” because they published the wrong key in a key tag collision 
set.  That’s why I keep raising the key management angle.

From: Ted Lemon 
Date: Tuesday, February 20, 2024 at 09:48
To: Edward Lewis 
Cc: Mark Andrews , Paul Wouters , 
"dnsop@ietf.org" 
Subject: Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About 
key tags

Sorry, I did not mean that the attack isn't a serious problem. I mean that 
insisting that there be no key hash collisions in a verification attempt is not 
as hard a problem as you were suggesting. The main issue is that it would 
require a flag day, but the number of affected zones in the wild is probably 
small enough that this could be managed. My point was that the keying and 
signing of large, sensitive zones should not be an impediment to having it be 
the rule that key hash collisions aren't allowed. Like, I'm not saying there 
aren't problems, but this is not an insurmountable problem.

On Tue, Feb 20, 2024 at 9:41 AM Edward Lewis <edward.le...@icann.org> wrote:


From: Ted Lemon <mel...@fugue.com>
Date: Tuesday, February 20, 2024 at 09:05
To: Edward Lewis <edward.le...@icann.org>
Cc: Mark Andrews <ma...@isc.org>, Paul Wouters <p...@nohats.ca>, 
"dnsop@ietf.org" <dnsop@ietf.org>
Subject: Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About 
key tags

>This seems like an implementation detail.

I don’t want to brush this off that quickly.

>The random likelihood of the root and com key hashes colliding seems pretty 
>small.

This is very true - in nature.  The scare raised here is that someone may 
intentionally concoct a situation, intending to cause havoc.  I do have a dose 
of skepticism when a use case is discovered academically as opposed to being 
seen in operational packet flows, but that doesn’t mean the vulnerability is 
irrelevant.  Probably there are lots of holes remaining in the protocol design, 
not yet discovered; so long as they aren’t being exploited they aren’t 
operationally impactful.

The KeyTrap issue is a resource consumption/depletion attack and it mentions key 
tag collisions as an ingredient, which is driving the urgency of this 
discussion.  My read of the paper is that, at heart, this is a general resource 
exhaustion problem which stems from the agility of the DNS protocol to find an 
answer no matter how hard it is to find.  Key tag collisions help in hiding 
intent of a malicious configuration by lowering the number of signature records 
needed.

>And while com is rather large, computes aren't as expensive as they were when 
>y'all invented the ritual. I suspect that if you just always pick two keys and 
>sign the zones twice, this problem becomes so improbable that we never have to 
>fall back to actually re-doing the ceremony. But if we did have to fall back 
>once in a blue moon and re-do the ceremony, that might be quite a bit cheaper 
>than allowing key hash collisions in situations where it's actually a problem. 
>I think it would be completely reasonable to insist that if there is a key 
>collision between e.g. com and fugue.com 
>[fugue.com]<https://urldefense.com/v3/__http:/fugue.com__;!!PtGJab4!52sLdAP9_ILh2m4N5k6puN4H9Muh5caOPGzze8vHKdSPc_3Kk48D2xgluq5vE9VesRqSm1Hbnpk8sfr1PuR_81s$>,
> that fugue.com 
>[fugue.com]<https://urldefense.com/v3/__http:/fugue.com__;!!PtGJab4!52sLdAP9_ILh2m4N5k6puN4H9Muh5caOPGzze8vHKdSPc_3Kk48D2xgluq5vE9VesRqSm1Hbnpk8sfr1PuR_81s$>
> could be obligated to regenerate its key rather than com.

In validation, key tag collisions are a problem when there is malicious intent 
and no more than a nuisance in a benign collision.

If an operator had two active ZSKs, there would be two signatures and two keys. 
 With non-colliding key tags, it would be easier to line them up - and recall 
the rule that it only takes one successful operation to declare success.  With 
a collision, there’s a 50% chance of a misalignment at first, which is where 
the figure of 1.5 signature verification operations per instance comes from 
(0.5 × 1 + 0.5 × 2 = 1.5).  Given the low probability of a collision (it’s 
rare!) that 1.5 isn’t a big 
deal.  (No one has suggested a 3-key collision, which would be rarer, especially 
as most operators never exceed two keys of the same role per DNS security 
algorithm.)

Nevertheless, in a malicious case (no more need be said) … this makes me think 
the appropriate solution is for validators implemen

Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About key tags

2024-02-20 Thread Edward Lewis


From: Ted Lemon 
Date: Tuesday, February 20, 2024 at 09:05
To: Edward Lewis 
Cc: Mark Andrews , Paul Wouters , 
"dnsop@ietf.org" 
Subject: Re: [DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About 
key tags

>This seems like an implementation detail.

I don’t want to brush this off that quickly.

>The random likelihood of the root and com key hashes colliding seems pretty 
>small.

This is very true - in nature.  The scare raised here is that someone may 
intentionally concoct a situation, intending to cause havoc.  I do have a dose 
of skepticism when a use case is discovered academically as opposed to being 
seen in operational packet flows, but that doesn’t mean the vulnerability is 
irrelevant.  Probably there are lots of holes remaining in the protocol design, 
not yet discovered; so long as they aren’t being exploited they aren’t 
operationally impactful.

The KeyTrap issue is a resource consumption/depletion attack and it mentions key 
tag collisions as an ingredient, which is driving the urgency of this 
discussion.  My read of the paper is that, at heart, this is a general resource 
exhaustion problem which stems from the agility of the DNS protocol to find an 
answer no matter how hard it is to find.  Key tag collisions help in hiding 
intent of a malicious configuration by lowering the number of signature records 
needed.

>And while com is rather large, computes aren't as expensive as they were when 
>y'all invented the ritual. I suspect that if you just always pick two keys and 
>sign the zones twice, this problem becomes so improbable that we never have to 
>fall back to actually re-doing the ceremony. But if we did have to fall back 
>once in a blue moon and re-do the ceremony, that might be quite a bit cheaper 
>than allowing key hash collisions in situations where it's actually a problem. 
>I think it would be completely reasonable to insist that if there is a key 
>collision between e.g. com and fugue.com 
>[fugue.com]<https://urldefense.com/v3/__http:/fugue.com__;!!PtGJab4!52sLdAP9_ILh2m4N5k6puN4H9Muh5caOPGzze8vHKdSPc_3Kk48D2xgluq5vE9VesRqSm1Hbnpk8sfr1PuR_81s$>,
> that fugue.com 
>[fugue.com]<https://urldefense.com/v3/__http:/fugue.com__;!!PtGJab4!52sLdAP9_ILh2m4N5k6puN4H9Muh5caOPGzze8vHKdSPc_3Kk48D2xgluq5vE9VesRqSm1Hbnpk8sfr1PuR_81s$>
> could be obligated to regenerate its key rather than com.

In validation, key tag collisions are a problem when there is malicious intent 
and no more than a nuisance in a benign collision.

If an operator had two active ZSKs, there would be two signatures and two keys. 
 With non-colliding key tags, it would be easier to line them up - and recall 
the rule that it only takes one successful operation to declare success.  With 
a collision, there’s a 50% chance of a misalignment at first, which is where 
the figure of 1.5 signature verification operations per instance comes from 
(0.5 × 1 + 0.5 × 2 = 1.5).  Given the low probability of a collision (it’s 
rare!) that 1.5 isn’t a big 
deal.  (No one has suggested a 3-key collision, which would be rarer, especially 
as most operators never exceed two keys of the same role per DNS security 
algorithm.)

Nevertheless, in a malicious case (no more need be said) … this makes me think 
the appropriate solution is for validators to implement self-protection 
(timeouts) rather than trying to avoid collisions.

‘Course, collisions still are a problem for the key managers, but that is a 
local problem.  Unless they publish the wrong key in the collision set.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] Detecting, regeneration and avoidance was Re: [Ext] About key tags

2024-02-20 Thread Edward Lewis
On 2/16/24, 15:05, "DNSOP on behalf of Mark Andrews"  wrote:

Pardon ... perhaps this issue has died down, but I've been off a few days, and 
I just saw this...

>Generating a new key is not hard to do.

That's not the issue, it's knowing that it would be the wise thing to do that 
is the issue.

>Adding a check against the common key store is not hard to do in a 
>multi-signer scenario.  It can be completely automated.

I'm not in agreement with that.  Some keys are managed with off-net HSM 
devices, accessed only during a key ceremony.  There may be some cases where 
the key set is assembled and signed without access to the 'net.  This is a 
result of an early design rule in DNSSEC: we had to design around a system that 
air-gapped the private keys from the open network.

This does underscore the importance of coherency in the key set even in a 
multi-signer scenario.  (There was talk of trying to let each server have its 
own key set perspective.)  In order to detect key tag collisions, the managing 
entity has to be able to see the entire set.

>We could even use the DNS and UPDATE to do that. Records with tuples of 
>algorithm, tag and operator. Grab the current RRset. Add it as a prerequisite 
>with a update for the new tag.  

This approach leaves open a race condition.  It's possible that two signers 
simultaneously generate keys with colliding key tags, and each gets to add its 
key because they don't see each other's.  My point: while this is admirable, 
achieving the perfect solution is out of reach, so let's not assume we can ever 
totally avoid key tag collisions.

My thesis is: key tag collisions are not the driver for validation resource 
consumption.  In the research paper, collisions do contribute, by scaling the 
impact up.  Through invalid signature values, resources can be drained by 
throwing multiple "good-looking" signatures along with a data set and having 
many keys.  The fact that key tags can collide only means that I can cause 
multiple checks per signature, which may help hide my malicious tracks.

And remember, the paper requires that the crypto operations always fail.  I.e., 
there is no success to be missed by not trying all the combinations of keys and 
signatures.  A simple timer is all that is needed.
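
A minimal sketch of that timer (names are illustrative; verify stands in for 
one cryptographic verification, not any particular library's API):

    import time

    def validate_with_budget(rrsigs, keys, verify, budget=0.5):
        """Try signature/key pairs until one verifies or the budget is spent."""
        deadline = time.monotonic() + budget
        for sig in rrsigs:
            for key in keys:
                if time.monotonic() > deadline:
                    return False  # budget spent: treat the set as bogus
                if verify(sig, key):
                    return True   # one success is all DNSSEC requires
        return False

The loop is bounded by the budget, not by the size of the signature/key 
cross-product the zone (or an attacker) presents.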

Key tag collisions are a pain in key management; operators that have 
experienced them have shown they won't tolerate them for long, even when there 
were no outages.  To me, whatever can be done to easily avoid them would be 
good; trying to define an interoperable way (a standard) to eliminate them 
would prove to be overkill.  And...my original point was...don't include this 
idea in a future design.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] On suffering ... Re: [Ext] About key tags

2024-02-16 Thread Edward Lewis
On 2/16/24, 11:13, "DNSOP on behalf of Petr Špaček"  wrote:
> should resolvers suffer from more complex code & work, or should signers 
> suffer if they do something very unusual?

Coming from this perspective, finding a solution may be difficult.

At the core, the DNS is extremely flexible, overly so.  DNSSEC arrived as a 
layer to add protection to the DNS without crimping functionality.  The DNSSEC 
approach is to present to the validator the "proof of work" that was done to 
arrive at the response given, proving that the DNS protocol was followed at 
each step.

Because the DNS is so permissive, this means that there's a lot of work to do 
in validation.  To make validation easier, what's allowed in the DNS has to be 
constrained.  If that is unacceptable, validation has to be constrained within 
some performance budget.

I think colliding key tags are a red herring.  In KeyTrap, the role colliding 
key tags play is to sneak as many cryptographic operations as possible into the 
validation of each signature.  I.e., instead of just having lots of RRSIG 
resource records, key tag collisions provide a multiplier on top of those RRSIG 
resource records.  The underlying issue is resource consumption; key tag 
collisions are just one ingredient in scaling it.  For any data set, I could 
have many temporally overlapping RRSIG resource records signed by the same key 
(tag).  One RRSIG might be from Feb 1 to Feb 29, another Feb 2 to Feb 28, 
another Feb 1 to Feb 28, and so on.  Each would be accepted today (Feb 16), 
thus eligible to be cryptographically computed.  And if all fail - as designed 
- the validator would be tied up.
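
The multiplier is plain from a worst-case count (illustrative numbers, not the 
paper's exact parameters):

    # A naive validator tries every RRSIG against every DNSKEY whose key
    # tag matches, so the work is the product of the two counts.
    rrsigs_matching_tag = 10  # overlapping-validity signatures, as above
    keys_matching_tag = 10    # colliding keys
    print(rrsigs_matching_tag * keys_matching_tag)  # 100 verifications

Overlapping validity windows let an attacker grow the first factor with no 
collisions at all; collisions grow the second.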

It would be good to enforce sanity on configurations.  (In poking at something, 
I found a zone with DS resource records covering 13 different key tags.  The 
zone had just one DNSKEY resource record.  Trialing a validation in the zone, 
it worked, because the signing was working - they apparently forgot to contact 
the parent to remove DS resource records when no longer needed.  I don't have 
history on this case, just a spot check.)  But that's not likely to happen.

While I can't argue for key tag collisions' continued existence, I can't see a 
practical way to enforce a rule against them.  Discouraging collisions would be 
beneficial to key management crews; I can't see that it would be all that 
important to validators.

I can see encouraging validators to run in a time-resource envelope, using 
"timing out" as a valid excuse to fail a validation.  If a zone admin has a 
complex set up that exceeds validation budgets, their relying parties will let 
them know they can't get through.  I think this "crude" approach is the only 
one that is fair to all failure modes - it would even have to consider NSEC3 in 
the IPv6 reverse map - mindful of the closest encloser proof.  (Is NSEC3 
beneficial in the IPv6 reverse map?  That would take some thinking.)

IMHO, the reason this discussion is raging is that it's not a simple matter.  
What makes the DNS great has forced the design of DNSSEC to be a lot of work, 
given that DNSSEC was designed to keep all private keys air gapped from the 
network.   The "I have to show my work so you can trust me" approach is 
computationally hard on the relying party.



___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-16 Thread Edward Lewis
On 2/15/24, 13:03, "DNSOP on behalf of Ralf Weber"  wrote:
>... key collisions should not be allowed.

The problem with this statement is that you can't prevent them in advance.  So 
long as we have a short-hand means for referring to a key, you run this risk.  
And if someone sees an advantage in having a collision, they will hand-craft 
the situation.

One might think: but this is crypto, it's hard to craft collisions for hash 
functions (stepping away from the simple-to-collide 16-bit key tag field).  But 
a malicious actor doesn't need the crypto to work, they just need it to cause 
you to react (and chew up your resources).  So, collisions might happen 
on-demand.

We have the notion of a time out in the query-response exchange.  If we didn't, 
it would be possible to claim name servers that are not reachable and have 
resolvers waste time and resources waiting for responses that will never come.  
The same notion ought to be applied to validation - set a time limit.  This 
would penalize those with overly complex (but honest) configurations and cap 
the damage malicious actors can cause.  This approach is also neutral to how 
the complexity has come about; key tag collisions are neither the only cause 
nor really the culprit here.  A few key tag collisions have been observed 
(probabilistically there have been more), and for the most part there has been 
no widespread damage.

The potential for abuse does exist, but the potential isn't addressed by 
documenting "key collisions should not be allowed."

I do agree that key collisions should be avoided, for the sake of key 
management, but given the difficulty in avoiding them in all cases, I can't see 
that a protocol action can be taken to rule them out.  And there will always be 
non-compliant malicious-intent code available to cause collisions if collisions 
are indeed desired for abusive reasons.  The solution here is to roll out the 
notion across implementations that it is acceptable for a validator to fail a 
data set's DNSSEC validation based on time/computational complexity.
 

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-15 Thread Edward Lewis
On 2/15/24, 12:49, "Wellington, Brian"  wrote:
>A fairly simple way to deal with this issue is a Flag Day.  As Ralf said in a 
>later post, the number of zones with colliding key tags is relatively small.  
>It would certainly be reasonable to declare that at some time in the future, 
>colliding keys will not be handled by validators.

Thinking:
1) Operators need to be able to tell if they have colliding key tags; a 
detection sketch follows below.  (Mitigating is as simple [or complex] as a key 
roll.)
2) The recent colliding-key-tag TLD outage was related to key management, not 
validation.
3) Resource consumption issues in validation are wider than key tag collisions.
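
On point 1, detection is cheap from outside the zone; a minimal sketch using 
dnspython (dns.dnssec.key_id() implements the RFC 4034 key tag computation; a 
stricter check would also group by algorithm):

    from collections import Counter

    import dns.dnssec
    import dns.resolver

    def colliding_key_tags(zone):
        """Return {key_tag: count} for tags appearing more than once."""
        dnskeys = dns.resolver.resolve(zone, "DNSKEY")
        tags = Counter(dns.dnssec.key_id(rr) for rr in dnskeys)
        return {tag: n for tag, n in tags.items() if n > 1}

    print(colliding_key_tags("example.net."))  # hypothetical zone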

I'd save a flag day for a more general treatment of validator resource 
consumption - imposing limits on key tags, number of signatures to try, levels 
of dnssec-signed indirection (CNAME chains), and so on.

Getting validators to "ban" collisions doesn't seem to be the right direction, 
given that validators are fine with "sane" levels of collisions.  Realizing 
"sane" is a very subjective word.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-15 Thread Edward Lewis
From: Ben Schwartz 
Date: Wednesday, February 14, 2024 at 11:34
To: Edward Lewis , Manu Bretelle 
Cc: "dnsop@ietf.org" 
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

> For the "testing" flag, the descriptive information is basically "this 
> endpoint does not carry my SLA".

>You can see a variation on this problem in draft-ietf-tls-svcb-ech



In the DNS environment, we assume no SLA.  The protocol assumes the worst.  
That’s why there are so many retries, so many alternative sources and so much 
tolerance for error.  It would be hard to tell if a service is in testing or 
not, so the protocol doesn’t try.



For the ECH example, it sounds like it matters in that environment - and that 
is fine.  It’s different.



Which leads me back to - I don’t see the use case for “The "testing" flag for 
Service Binding (SVCB) Records” in the context of DNS or DELEG.  A flag is a 
flag, having it mean “I’m testing” I get.  But I don’t see that notice helping 
the DNS, and applying the banner of the “DNS Camel”, I don’t think it should be 
added.



…OTOH, seeing that this is a SVCB flag, perhaps you have other environments 
where such a flag would be useful.  I just don’t see it in the DNS.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-15 Thread Edward Lewis
On 2/15/24, 04:37, "DNSOP on behalf of Petr Špaček"  wrote:
>If you think colliding keys should be allowed, please propose your own limits 
>for sensible behavior. I will take popcorn and watch.

Hmmm, key tags were intended to simplify computation, somehow it seems that 
they've gone the other way.

Having, setting, or discussing, limits for sensible behavior deserves its own 
thread, independent of colliding keys.  I'd see this benefitting from a 
panel-of-implementers topic in operator fora (RIPE's DNS working group as an 
example, DNS-OARC as another) to gather operator-informed parameters, leading 
to a general document (IETF) on the recommendations.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
The reason this topic is on the list now isn't validation, it began as a key 
management issue.  Donald's right (of course) in what he now posted.  But the 
performance gain of sub-selecting keys based on key tag is less than originally 
anticipated and comes at the cost of confusion in key management.  
Specifically, when someone or some code drops the wrong key (of a set matching 
the key tag) from the published DNSKEY resource record set.

On 2/14/24, 11:16, "DNSOP on behalf of Donald Eastlake" 
 wrote:

So, I am the person who added key tags in the initial design of
DNSSEC. The idea was just to, probabilistically, avoid unnecessary
expensive validation attempts. If key tags are causing problems for
some resolvers and validations are not such a problem with modern
hardware, you can always just ignore the key tag and try validation
with all keys. You still need to bound the effort you are willing to
put in to evade some attacks.

Thanks,
Donald
===
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 2386 Panoramic Circle, Apopka, FL 32703 USA
 d3e...@gmail.com

On Wed, Feb 14, 2024 at 11:06 AM Jim Reid  wrote:
>
>
>
> > On 14 Feb 2024, at 15:17, Paul Hoffman  wrote:
> >
> > On Feb 14, 2024, at 07:10, Jim Reid  wrote:
> >> That said, I think a minor tweak to the core DNSSEC specs would be a
> >> good idea. For instance, whenever a validator comes across a key tag
> >> collision, it MUST stop validating and either return a hard error or an
> >> unvalidated response.
> >>
> >> My concern here is a bad actor using key tag collisions to disrupt
> >> important validating resolver services. For some definition of important.
> >
> > That is not a "minor tweak", that will occasionally break validation in
> > hard-to-detect ways.
>
> Could you please elaborate the hard-to-detect ways Paul? Key tag collision
> is an obscure corner case (modulo the current keytrap excitement) and
> refusing to validate in these circumstances seems more than reasonable to
> me. Fail early, fail “safe”. The resolver would presumably log the error
> and return a suitable response to the client.
>
> DNSSEC validation is already far too complex. Let’s not add more. IMO, the
> pragmatic approach here would be for a validator to say “Duplicate key tags
> mean the signer has messed up and I give up. Have a nice day.”.
>
> > The problem is not the collisions, it is the collisions causing almost
> > unbounded processing.
>
> Indeed. So at the earliest opportunity for a validating resolver, nuke that
> from orbit. It’s the only way to be sure. :-)
>
> > A better update would be to say "watch for excessive processing due to
> > keytag collisions and abort when you detect it".
>
> Seems a bit fluffy to me. Define “excessive” and “watch". More code/moving
> parts would be needed to implement this approach too.
>
> ___
> DNSOP mailing list
> DNSOP@ietf.org
> https://www.ietf.org/mailman/listinfo/dnsop

___
DNSOP mailing list
DNSOP@ietf.org

https://www.ietf.org/mailman/listinfo/dnsop

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
On 2/14/24, 10:14, "DNSOP on behalf of Yorgos Thessalonikefs" 
 wrote:

>(actively while validating) to 4. Recent data shared in dns-oarc showed 
>mainly 2 collisions observed in the wild and we thought 4 is a safe number.

That's certainly reasonable given the reality we live in.

If any validator ever witnessed two keys with the same key tag (owner/DNS 
security algorithm/length as well), it'd be enough to go "huh."  If you see 
three [or more], log it - I'd want to see that.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis


On 2/14/24, 10:10, "DNSOP on behalf of Paul Hoffman"  wrote:

>On Feb 14, 2024, at 01:39, Petr Špaček  wrote:
>> In my mind this is good enough reason to outlaw keytag collisions - 
> without them it would be _much_ easier to implement reasonable limits without 
> risk of breaking legitimate clients.
>
>Outlawing keytag collisions implies that the signer has to keep a copy of 
> every keytag they've ever emitted. Adding that requirement nearly 20 years 
> after the RFCs were finished is incredibly unlikely to work universally, so 
> validators could not rely on it. Why add a requirement that cannot be relied 
> on?

The requirement could cover only currently published keys - but then there is a 
risk that a cache somewhere has a copy of a now-unpublished key that collides 
with a now-published key.  You'd not need all key tags, just recent ones.  
('Recent' is a relative term I'll leave undefined.)

The fact that 20 years has passed points out the emerging nature of the field 
of operations.  This situation is based on probability, and only as time passes 
does the probability grow to the point where any problem is noticed.  There are 
lots of analogous situations where, in early years, the lack of participants 
meant interactions were rare and thus didn't need a thick set of rules, but as 
barriers to entry dropped, the crush of new participants called for stricter 
rules of order.  I see that it is inevitable that the passage of time will 
cause changes to the old rules - when needed.  (And to stress this again - the 
key tag situation is not one where I'd advocate for making changes, just noting 
that I wouldn't recommend anyone design this way again.)

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
On 2/14/24, 09:00, "Havard Eidnes"  wrote:

Not arguing, but to tease out the difficulty of all this:

>One is to line up the signers "behind each other", so that the second one 
>signs an already-signed-by-the-first-signer zone...

The issue is in deciding the order in which they are lined up.  In one case, I 
was told a backend was changing; the outgoing backend created the key that led 
to the collision.  (In this case, the outgoing key never signed data, hence no 
validation issues were recorded.)  The solution was to let this go - the 
collision-causing key was about to be deleted anyway.

It's difficult to derive a definitive rule.

>Or have I totally misunderstood, and your statement about required time of 
>enforcement is a universal property of all multi-signer scenarios?

I think it is a general protocol design consideration - if you need to 
communicate a piece of data and require some state be maintained because of it, 
you have to have some means to enforce the requirement.

If we were to require that the key tag point to a unique published key, we'd 
have to enforce this via a check on the zone's DNSKEY resource record set.  
Under some visions of multi-signer, different nameservers might publish 
different information based on the operator.  While I don't believe that 
incoherent versions of the DNSKEY resource record set can co-exist peacefully, 
it is possible that the RRSIG resource records may differ from source to 
source.  The problem in all this is - when and how could the protocol ensure 
there is no key tag collision?  Or limit the number of keys with matching key 
tags?

Going back a few messages, I raise the key tag issue in the sense of "let's not 
do this again" and not to try to change what it is now.  Clearly, changing it 
(to avoid collisions) would be difficult.  And, given the relative rarity of 
any problem stemming from it, not worth fixing at this point.  Just don't do it 
again.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
On 2/14/24, 08:54, "Petr Špaček"  wrote:
>How many keytag collisions are you willing to allow & at the same time  
> protect validators from 2023-50387?

Admitting to reading the CVE link after replying: the issue doesn't need there 
to be a collision between keys.  I could tie up validation by posting a data 
set with a lengthy (more than 1? 2? 3?) list of RRSIG resource records, each 
having the same key tag, different but still temporally valid 
inception/expiration time pairs, and falsely generated signatures.  That could 
tie up a validator.  The only reason I mention "same key tag" is that this 
meltdown happens only if there is indeed a matching key - only one is needed, 
but more could be there as well.

The key tag isn't the culprit here; it's that the process of validation is 
necessarily computationally complex, so falsely triggering a security response, 
which is a known tactic, is the root cause.  Putting time (or resource) limits 
on validation before denying is a reasonable response, provided the issue stems 
from a malicious situation.  The goal is to design out the benign causes of 
this.
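
Limits of that kind reduce to simple counters; a sketch (the thresholds and the 
key_tag attribute are illustrative, not any particular resolver's 
configuration):

    MAX_VERIFICATIONS_PER_RRSET = 8  # illustrative threshold
    MAX_KEYS_PER_TAG = 4             # illustrative threshold

    def bounded_pairs(rrsigs, keys_by_tag):
        """Yield (sig, key) pairs to try, refusing pathological inputs."""
        attempts = 0
        for sig in rrsigs:
            candidates = keys_by_tag.get(sig.key_tag, [])
            if len(candidates) > MAX_KEYS_PER_TAG:
                raise ValueError("too many keys share one key tag")
            for key in candidates:
                attempts += 1
                if attempts > MAX_VERIFICATIONS_PER_RRSET:
                    raise ValueError("validation work limit exceeded")
                yield sig, key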

Keep in mind, my argument against the key tag is that in ordinary operations, 
the key signing the data will usually be the ZSK, unless it is the DNSKEY 
resource record set, where it'll be the KSK.  That's "ordinary"; still, one can 
twist configurations in all sorts of ways - that is what the designers did in 
the early development - and come up with gnarly situations where the key tag 
can be justified.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
On 2/14/24, 08:38, "DNSOP on behalf of Joe Abley"  wrote:

>Is the triggering incident not just another cautionary note that we learn 
> from?

That was my original thought.   I don't mean my thought has changed, but that's 
the reason I bothered to raise this.

>Why is this particular incident a sign that we need to change the protocol 
> when so many others have not been?

I mentioned this only in an early private, off-list reply, and it deserves to 
be said here - I'm not advocating for changing anything.  As it is, colliding 
key tags are at worst a rare snag in operations.  When I say I've seen it three 
times over 13 years of observations, I mean to stress that it is a rare event, 
and that only once has it "hit the press"; it isn't usually (2 out of 3) 
operationally impacting.  I don't think anything seen so far justifies altering 
past definitions.

However...for any future re-designs I'd keep this tale in mind.  If I had a 
time machine, I'd go back 25-30 years and argue against the 16-bit field.  
I.e., when looking at DELEG or anything DELEG may enable down the road, I'd 
side away from the use of "pointers" like key tags or possibly hashes.

A side note: there is something I'm working on (which may never see the light 
of day) where I considered using a hash to identify a long-lived key.  Then I 
realized that if hash algorithms are changed, it would be impossible to tell if 
the old hash and the new hash indicated the same key.  Not a key collision 
issue; the fact that hashes are one-way means that you can't tell, from two 
different hash values, each of a different hash algorithm, whether they refer 
to the same (public) key - unless you have the (public) key itself.  And in my 
situation, having to have the key to do this negates the benefit of passing a 
(shortened) hash value around.
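
Concretely (a sketch; the key bytes are a stand-in):

    import hashlib

    key = b"...public key bytes..."           # stand-in for real key material
    old_id = hashlib.sha1(key).hexdigest()    # identifier under the old hash
    new_id = hashlib.sha256(key).hexdigest()  # identifier under the new hash
    # Given only old_id and new_id, nothing links them; confirming they name
    # the same key means recomputing both from the key itself, which defeats
    # the purpose of passing around a short identifier.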

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] About key tags

2024-02-14 Thread Edward Lewis
On 2/14/24, 04:40, "DNSOP on behalf of Petr Špaček"  wrote:

>In my mind this is good enough reason to outlaw keytag collisions - 
>without them it would be _much_ easier to implement reasonable limits 
>without risk of breaking legitimate clients.

That would make key tags meaningful. ;--)

The question is how, in a multi-signer friendly way.

Enforcement would have to be at zone load time; it might be only then that the 
entire DNSKEY resource record set is completely assembled.  Key generation time 
would be better, but if that happens off-line or is otherwise isolated, the 
check may not have the data needed, especially if multiple tools are used to 
generate keys (whether multi-signer or a transition of platforms).

Refusing to load a zone would be a very-late-in-the-game way to enforce this, 
it might be after a zone is entirely signed with the problem key, or after keys 
are generated at different locations and exchanged.

Maybe at data set signing time?  But it is possible to sign data at two 
locations and merge the RRSIG resource record sets after the fact, so the 
signer might not realize it is contributing to a key tag collision.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-14 Thread Edward Lewis
From: Manu Bretelle 
Date: Tuesday, February 13, 2024 at 19:03
To: Edward Lewis 
Cc: "dnsop@ietf.org" 
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

First - why am I resisting this proposal?  I believe that for the sake of 
operations, development of protocols must trend towards simplicity.  I would 
add a flag or field when necessary and only then, lest it be forgotten (a 
burden with no benefit upon code maintainers) or worse a stumbling block 
(misused, mis-set, generally mis-understood).

On Tue, Feb 13, 2024 at 7:35 AM Edward Lewis <edward.le...@icann.org> wrote:


>An operator dipping its toes with DELEG and encrypted protocols may be willing 
>to signal to a resolver that such failures are likely operational failure 
>because this is a testing endpoint that may be unstable due to lack of 
>operational expertise. A privacy aware resolver can then decide to fallback on 
>clear-text. Again, there is nothing preventing the resolver to fail hard here, 
>this is out of the control of the auth server operator. All that can be done 
>is to "signal".

Wouldn't the availability of the fallback transport be enough signal that the 
service operator does not have full faith in the preferred transport?  Having a 
separate flag is like a second source of data; there might be an inconsistency 
between the two, which is a generic root cause of failures.

>I could also imagine an operator going through their first cert rotation to be 
>erring on the side of safety and switching to "testing" mode temporarily.
A bit of my concern is that sometimes we forget to remove the training wheels 
once we’ve learned.  A common error in operations is to forget the cleanup 
phase (remove old files, etc.) once new functionality has been proven.  This is 
a reason why I’m hesitant to support having a flag like this.
>If you look back at DNSSEC, had it been possible to turn DNSSEC in 
>"permissive" mode, would more operators have taken the leap to enable it 
>knowing that resolvers that would validate records would have been willing to 
>fallback while the flag is on? I think from an operational point of view, this 
>is something that can be of great help to build operational confidence and 
>expertise without taking the risk to break one's DNS.

Yes, yes it would.  Early on there was criticism that DNSSEC was "ok" or 
"fail".  When operators messed up their key rotations (this happened quite 
often around 2010), there were calls to "purge caches" and even some thought 
given to automating a way for operators to initiate a global cache purge of 
their data.  (Failed, of course - there's no way.)  This was followed by the 
development of negative trust anchors after the COMCAST/NASA.gov issue, 
something that was an uphill battle by operators to get documented in an IETF 
document.  More recently, an operator asked me about developing a new resource 
record type that could be published at a zone apex to signal that all 
validations of records signed by the apex keyset ought to be ignored.  
(Sketched up, but not what the operator had in mind.)

Operators list the great leap of risk as a reason not to implement DNSSEC.  The 
protocol design did not accommodate a soft introduction.  The levels of 
certainty are binary - thumbs up or thumbs down thanks to the reliance on the 
DNS response code as the only error channel.

When I wrote a prototype validator during experimentation on DNSSEC, I realized 
that there were 50 or so if statements, any one of which would cause validation 
to fail.  Some of the ifs were likely transient, some persistent, and so on; 
this information would have informed the response.  But we didn't have enough 
bandwidth (that response code field was all) to feed that back up the chain.  
We probably ought then to have defined an extended response code mechanism - 
which is now a current work in progress in DNSOP, if I'm right.

In summary - I think this flag would be redundant to the availability of a 
means to fall back.  Basing the justification on a "testing phase" assumes that 
it is a distinct phase with a declared ending - which I don't believe is often 
true.  And I think we do need to build in ways for the risk of adoption 
(initial or otherwise) to be lower, one way being via better feedback, others 
via abilities to "test-in-prod" ("immediate trial period, when staff is able to 
watch it launch before leaving for lunch") and so on.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-13 Thread Edward Lewis
I’ve read this and don’t entirely understand the use case.

If I am running a service that uses an in-the-clear transport and then 
experimentally add an encrypted transport, I can see the desire to let the 
clients know that the latter is experimental and subject to accidental 
unavailability.

But once I decide that I want my service to only be available over an encrypted 
transport, why would I make it available over the in-the-clear transport any 
longer?  To prevent fallback, the latter must be disabled entirely.

Taking the perspective of the client, assume the client discovers the service 
and the service is available via a few transport options, with the testing flag 
clear (0, not in testing).  The client then chooses an encrypted transport but 
suffers a connection failure.  The client has the flag in hand, but for other 
reasons, presses on to connect via the in-the-clear transport.  As the service 
operator has indicated that this fallback is not desirable, how does the 
service provider react?

It would seem to me that the best way forward in this use case is for the 
operator to not offer the service over the in-the-clear transport.  That is the 
only way to enforce the "don't fall back" so-called rule of the service 
operator.

Why would the service operator leave the undesirable option open?  Is it for 
clients that are not able to use the encrypted transport option?  How can the 
service operator distinguish between clients that can (and should) and those 
that can’t? And what if a client sometimes can and other times can’t, like in a 
nomadic client (nomadic: changes LAN connections from time to time, as opposed 
to mobile, constantly moving)?

I don’t understand the reason for any kind of “negotiating” in this case.  If 
the service operator does not want fallback to occur, remove the option for it 
to occur.

If it is a matter that I want a new service offering to be preferred over an 
old offering, in the sense that I want to test the new offering with live 
traffic for clients willing to take a risk, then offer both the old and new 
side-by-side and encourage, in any way you can, risk-takers to try the new.  (I 
feel compelled to add this cynical retort: this strategy worked so well with 
IPv6!  But let’s move on…)

The protocol design concept involved here is that one side of a communication 
**cannot** enforce any required reaction to be taken by the remote side.  The 
two sides are independent, the medium in between unreliable.

The server side can't prevent the client side from attempting anything, in the 
same sense that you can never prevent an attack.  Neither side can demand the 
other react in a certain way; it's all about requests ("please do") and 
reactions ("here it is"/"nope").

A server-side doesn’t know the client-side’s context nearly as well as the 
client does, which means assumptions are limited.

Adding the testing flag is an interesting piece of meta-data to add for 
consideration by the remote side when connecting, but it isn’t something 
enforceable, hence just a complication in the configuration of the server and 
communication path.  (Misanthropically speaking: …as annoying as the dozens of 
happy birthday messages in the intra-office chat channel that happen weekly, 
interrupting any chain of thought one might have had!)

From: Ben Schwartz 
Date: Monday, February 12, 2024 at 16:39
To: Manu Bretelle , Peter Thomassen 
Cc: Edward Lewis , "dnsop@ietf.org" 
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

Manu and I have now published a draft describing this "testing" flag: 
https://datatracker.ietf.org/doc/draft-manuben-svcb-testing-flag/

While we think this is relevant to DELEG, it is entirely independent and could 
be used in any SVCB setting (although it doesn't have any obvious utility for 
HTTPS records at present).

--Ben Schwartz

From: Manu Bretelle 
Sent: Wednesday, February 7, 2024 2:19 PM
To: Peter Thomassen 
Cc: Edward Lewis ; Ben Schwartz ; 
dnsop@ietf.org 
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions


[DNSOP] Adding a URL ... Re: [Ext] Re: About key tags

2024-02-12 Thread Edward Lewis
I should have included this URL, pointing to the article (via Google Translate) 
saying the outage was rooted in a key tag collision...

https://www.rbc.ru/technology_and_media/07/02/2024/65c38fea9a794752176bd3a0


On 2/12/24, 08:50, "Edward Lewis"  wrote:

On 2/9/24, 20:37, "Wellington, Brian"  wrote:
>The behavior was never added into any standards document because it has 
nothing to do with the standard.

True - but still it created a situation where operators could get snagged 
on something.

>If an implementation doesn’t support multiple keys with the same key tag 
when validating, that would be noncompliant.  That was not the case, though.

Also true, this is the reason why "colliding" key tags have not resulted in 
operational events (until, allegedly - assuming the English translation of the 
report I saw is accurate - the RU outage).

But validation (and signing for that matter) is not the entirety of where 
DNSSEC operational gaffes can happen - they can happen in the handling of the 
keys, namely, inserting or deleting the wrong key when two or more have the 
same key tag.

The issue is - by relying only on the 5-digit, easy to read, key tag, an 
operator may wind up including/excluding the wrong key.  With the set of keys 
in operation at any time being 3-5, the benefit of having a key tag (to select 
a subset) isn't great enough to justify this risk.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Encourage by the operator ... Re: [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-12 Thread Edward Lewis
On 2/9/24, 11:02, "pch-b538d2...@u-1.phicoh.com on behalf of Philip Homburg" 
 wrote:

> One of the misconceptions in DNSSEC is that the zone administrator
> is in control of the situation, dictating the state of signing,
> the cryptography in use, and so on.  DNSSEC is for the benefit of
> the querier, not the responder.  A zone administrator can't force
> a querier to validate the results, it can't dictate what cryptographic
> library support the receiver must have.  

I don't see how this statement is relevant.

This was the text that made me react:

# If DELEG is mainly used to signal that a secure transport, such as DoT, DoH,
# or DoQ, is available then falling back to NS/DS might be preferred (by the
# zone operator) over failure.

...specifically, " then falling back to NS/DS might be ***preferred (by the 
zone operator)***"...

We need to approach the design with the knowledge that the querier is in the 
driver's seat, it is up to the querier to decide whether to fall back, or not, 
in any way.  The zone operator (the responder) can only present options to the 
querier (here), not dictate (that's too strong a word) or encourage (a bit 
softer) or influence (yet milder) how the querier will "act next".

It's the querier's prerogative to choose whether they fall back and how, and 
what privacy enhancement they crave (judging this being a concern by the 
inclusion of DoT and DoH in the context), not the responder's.

I'm not picking on fall back, or privacy - I'm picking on the process of how we 
design the protocol.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: About key tags

2024-02-12 Thread Edward Lewis
On 2/9/24, 22:05, "Mark Andrews"  wrote:

>The primary use of the key tag is to select the correct key to validate the 
>signature from multiple keys. 

Yes - which is great if 1) you need to pare down the potential set of keys into 
something you can handle (like, from tens to 3) and 2) you have some way to 
request only those keys.

Operators generally only publish 2 keys outside of rolls, 3 when rolling the 
ZSK or the KSK, maybe more if they aren't optimizing.  There's no need to 
specify a subset.  I say this with complete hindsight.

And, in the DNSSEC protocol, there's never been a way to request the DNSKEY 
resource record set (to validate something) that says 'but only those key(s) 
with key tag ABCDE'.  So, subsetting doesn't help the response size issue.

My reason for raising this is...not to deprecate key tags as they exist today - 
it's not worth it - but to avoid designing something like them in the future.  
We don't need them, and they have contributed to operational issues and, 
reportedly, one significant outage.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: About key tags

2024-02-12 Thread Edward Lewis
On 2/9/24, 20:37, "Wellington, Brian"  wrote:
>The behavior was never added into any standards document because it has 
>nothing to do with the standard.

True - but still it created a situation where operators could get snagged on 
something.

>If an implementation doesn’t support multiple keys with the same key tag when 
>validating, that would be noncompliant.  That was not the case, though.

Also true, this is the reason why "colliding" key tags have not resulted in 
operational events (until, allegedly - assuming the English translation of the 
report I saw is accurate - the RU outage).

But validation (and signing for that matter) is not the entirety of where 
DNSSEC operational gaffes can happen - they can happen in the handling of the 
keys, namely, inserting or deleting the wrong key when two or more have the 
same key tag.

The issue is - by relying only on the 5-digit, easy to read, key tag, an 
operator may wind up including/excluding the wrong key.  With the set of keys 
in operation at any time being 3-5, the benefit of having a key tag (to select 
a subset) isn't great enough to justify this risk.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] Encourage by the operator ... Re: [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-08 Thread Edward Lewis
On 2/8/24, 09:25, "DNSOP on behalf of Philip Homburg"  wrote:

>whether fallback to NS/DS is encouraged by the operator of the zone.
>
>If DELEG is mainly used to signal that a secure transport, such as DoT, DoH, 
>or DoQ, is available then falling back to NS/DS might be preferred (by the 
>zone operator) over failure.

One of the misconceptions in DNSSEC is that the zone administrator is in 
control of the situation, dictating the state of signing, the cryptography in 
use, and so on.  DNSSEC is for the benefit of the querier, not the responder.  
A zone administrator can't force a querier to validate the results, it can't 
dictate what cryptographic library support the receiver must have.  Whatever a 
zone administrator publishes in a zone on a name server is open to the world, 
although NSEC3 hashing does help to stem, to some extent, abusive mining of 
what is published.  All choices of how to proceed are made by the recipient.  I 
mention this as a precursor to DELEG design.

A zone administrator isn't the beneficiary of secured transports, the receiver 
is.  (The zone administrator already has the data - no need for transporting 
it.)  It is the receiver's choice to attempt to look something up with any 
receiver-set expectation of privacy, it is the receiver's choice to lower that 
expectation if it can't be met.  The zone administrator is out there in plain 
sight, anyone can see the data, anyone can see activity.  One can't (always) 
identify the receiver, that's what the privacy-enhancing transports support.

A zone administrator may elect to not make data within a zone available via 
NS/DS delegation, a zone administrator may elect to support only certain 
transports, akin to only supporting IPv6 but not offering IPv4.  The zone 
administrator does not direct how any fallback happens.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] About key tags

2024-02-08 Thread Edward Lewis
Prior to the news breaking that having two keys with the same key tag in a TLD 
led to an outage in late January, I was debugging some analysis code of mine 
that broke when a different TLD simultaneously published two DNSKEY resource 
records with the same key tag.  This code had been fixed once before, when yet 
another TLD in 2019 had a KSK and a ZSK share a key tag.  Yes, my code sucks; 
everyone already knew that a 16-bit value would inevitably suffer collisions 
when used to identify thousands-of-bits-long pieces of data.  Neither of the 
two instances I'd dealt with were operationally impacting.

When DNSSEC was designed, the possibility of tags colliding was known.  The 
validation process was defined to expect that a tag might lead to a 
non-singleton set of keys.  When it came to key management, and the practice of 
storing keys in files named K<name>-<algorithm>-<key tag>.public was derived, 
the lone developer of DNSSEC code (at the time) sidestepped the odds that key 
tags would collide by deleting the work done when creating a new key pair if 
the file to be used already existed.  This side-stepping practice was never 
added to a standards document.  (BTW, the reason for the "K" in the filename 
was that we developed the protocol using a test root zone, meaning the <name> 
would be ".".  A filename starting with "." is hidden in the OS systems we used 
then, so we needed a prefix.)

Thinking back on why the key tag was created and used, it was to help 
distinguish what key was to be used to validate a signature in an RRSIG 
resource record.  At the time we imagined a future where cryptography was used 
differently than it is today.  We assumed that zones would have many keys, keys 
of different "strengths", lengths, and algorithms, all in current use.  We 
didn't think about response size, that wasn’t in the forefront, because we knew 
we would blow past 512 bytes.  Given a lot of keys, even being able to identify 
a subset from which to try all possible keys would be a win.
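
For reference, the tag is just a cheap checksum over the DNSKEY RDATA (RFC 
4034, Appendix B; algorithm 1 had a different rule), which is why collisions 
are a matter of probability, not of key strength:

    def key_tag(rdata):
        """RFC 4034 Appendix B key tag over the DNSKEY RDATA wire format."""
        acc = 0
        for i, b in enumerate(rdata):
            acc += (b << 8) if i % 2 == 0 else b
        acc += (acc >> 16) & 0xFFFF
        return acc & 0xFFFF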

Operators today will typically have two published keys - a KSK and a ZSK - with 
one used to sign just the DNSKEY resource record set and the other to sign all 
the other sets in the zone.  Only when keys are being rolled would multiple 
keys of either type (cryptographic role) be seen.  Operator practice has come 
to eliminate the need to subset the keys for manageability.

Even if operators did deploy enough keys so that sub-setting the keys would be 
useful, the protocol lacks the ability to ask for keys with a certain key tag 
from a DNSKEY resource record set.  When a query is issued for keys, it will 
always fetch the entire set.  This might have been a design-time oversight.

If the key tag had not been invented, the RRSIG resource record would still 
need to list the owner of the DNSKEY resource record that is needed to validate 
the signature, and this would be sufficient.  Even ignoring the SEP bit in 
validation (as is required by rule), there would be few keys to try.  Trying 
all the keys in sets today (usually 2) would be no worse than if there were 2 
keys with the same key tag.  Perhaps it is a stretch - time needed to validate 
a signature is less of a concern now than it was on 1990's hardware (what we 
used in the lab).

In the DS resource record, there's a hash of the key to provide some more 
protection which obviates the need for the key tag, come to think of it.  (I 
could make the argument that the hash should have just been the key bits 
itself, but that's for another tirade.)
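
(dnspython shows the point - the DS digest covers the owner name plus the full 
DNSKEY RDATA, so it pins the exact key bits.  A sketch; the key material below 
is syntactically valid but meaningless:)

    import dns.dnssec
    import dns.name
    import dns.rdata
    import dns.rdataclass
    import dns.rdatatype

    key = dns.rdata.from_text(dns.rdataclass.IN, dns.rdatatype.DNSKEY,
                              "256 3 8 AwEAAcFcGsaxxdgiuuGmCkVI")
    ds = dns.dnssec.make_ds(dns.name.from_text("example.org."), key, "SHA256")
    print(ds)  # digest over owner name + DNSKEY RDATA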

I can't think of a good reason for the key tag anymore, seeing how 20 years of 
operations have evolved DNSSEC outside the lab, eliminating the need to subset 
the DNSKEY resource record set.  And now confusion over keys due to colliding 
key tags has contributed to one significant outage.  Ok, one outage doesn't 
mean it's time to panic.  The point of this tirade is not to eliminate the key 
tag from current use, but to raise something to keep in mind when designing 
DELEG or future delegation mechanisms.

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-08 Thread Edward Lewis
From: Manu Bretelle 
Date: Wednesday, February 7, 2024 at 14:19
To: Peter Thomassen 
Cc: Edward Lewis , Ben Schwartz , 
"dnsop@ietf.org" 
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

>Agreed, I don't think that the protocol should prescribe what to do in case of 
>"operational error". Differentiating an "operational error" from an actual 
>malicious interference is very likely going to be a slippery slope.

Diagnosis is much more difficult than detecting symptoms, so attempting to 
differentiate between errors and attacks is bound to be difficult, whether for 
a human operator or an automated script.

>That being said, I think it will be useful for adoption that resolvers provide 
>a feature to use DELEG and fallback to NS when things are not correct. This is 
>not something that is to be part of the protocol though.
>
>What I see could be useful is if we could signal something alike the qualifier 
>in SPF [0]. This way an operator could onboard their zone into DELEG in 
>"testing mode", allowing them to enable DELEG with the comfort of falling back 
>to NS, build confidence and flip the switch. This could have the side effect 
>of ever having DELEG delegations in "testing mode" though.
>
>[0] https://www.spf-record.com/syntax

We do need to recognize that one barrier to operator adoption is the risk of 
taking that first step, hence whatever is defined needs to address that.

When it comes to DELEG vs. NS, I don’t see the “fallback to NS” as a downgrade 
attack.  The design of the transition shouldn’t think of a “fallback” as that 
but just as a transition mechanism.  The receiver, though, should be able to 
know when it should be able to get more information that might be in a DELEG 
record - setting expectations.

If a receiver builds the expectation of getting a DELEG and a needed piece of 
information to set up a more secured (encrypted?) channel, but can’t get that 
information, whether the receiver gives up or goes the old route is not part of 
DELEG, it’s part of the context the receiver is operating within.

I don’t think the DNS is capable of supporting any notion that a receiver will 
get the information that they ought to receive.  DNSSEC is about validating 
what you receive, not ensuring the data gets through.  In fact, no system can 
guarantee delivery - your “wires” may be yanked before the signals come back.  
Trying to overcome this would … consume an eternity of research budgets.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-01 Thread Edward Lewis
Automation isn't a solution in and of itself.  When I recently mentioned, 
during a panel discussion, that automation is essential (for scalability), an 
operator on the same panel responded that automation is also a great way to 
scale problems.  Automation is needed, but it must be automated correctly and 
rely on good heuristics when it can't be deterministic.  Automation contributes 
to resiliency insofar as addressing "fat-fingering" and "forgot to do it", but 
it won't address systemic issues.  Automation won't fix weaknesses, but 
appropriately done it enables scalability and contributes to stability.  (Much 
the same as "rebooting" never fixes problems, but it does make them go away - 
for a while.)

Nevertheless, the protocol definition has to expect, and react appropriately 
to, benign operations errors.  What this means is that the protocol definition 
needs to include features that a receiver can use, when expectations are not 
met, to determine how to react.  In the first DNSSEC validator (circa 1998), 
there were 50-100 different error codes; some indicated transient problems, 
some persistent, some suspicious, some superficial.  The problem became that 
only SERVFAIL was available to signal an error, a well-known knock on the 
DNSSEC design.  In a perfect world, the protocol definition would not give rise 
to mistakes; a design ought to be graded on how far it goes towards that goal, 
but there'll never be a perfect world.

As far as deployment, I think measurements of that ought to be integral in 
judging how well a protocol is designed.  I attended part of the "Evolvability, 
Deployability, & Maintainability" (edm) WG session at IETF 118 and joined the 
mailing list to make that point, but have heard no reaction.  The discussion 
was focused only on seeing multiple implementations, falling short of examining 
whether anyone made use of the code (paths).  Deployment, to me, is how the 
field of operations grades a protocol definition.

On 2/1/24, 07:49, "DNSOP on behalf of Peter Thomassen"  wrote:



On 2/1/24 13:34, Edward Lewis wrote:
> The proper response will depend on the reason - more accurately the 
presumed (lacking any out-of-band signals) reason - why the record is absent.

Barring any other information, the proper response should IMHO not depend 
on the presumed reason, but assume the worst case. Anything else would break 
expected security guarantees.

> From observations of the deployment of DNSSEC, [...]
> It’s very important that a secured protocol be able to thwart or limit 
damage due to malicious behavior, but it also needs to tolerate benign 
operational mistakes.  If mistakes are frequent and addressed by dropping the 
guard, then the security system is a wasted in investment.

That latter sentence seems right to me, but it doesn't follow that the 
protocol needs to tolerate "benign operational mistakes".

Another approach would be to accompany protocol deployment with a suitable 
set of automation tools, so that the chance of operational mistakes goes down. 
That would be my main take-away from DNSSEC observations.

In other words, perhaps we should consider a protocol incomplete if the 
spec doesn't easily accommodate automation and deployment without it would 
yield significant operational risk.

Let's try to include automation aspects from the beginning.

Peter

-- 

https://desec.io/

___
DNSOP mailing list
DNSOP@ietf.org

https://www.ietf.org/mailman/listinfo/dnsop

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

2024-02-01 Thread Edward Lewis
After thinking about the response below a bit, my question would be - when a 
receiver expects a record to be present, but it isn’t, what is the proper 
response?

The proper response will depend on the reason - more accurately the presumed 
(lacking any out-of-band signals) reason - why the record is absent.

From observations of the deployment of DNSSEC, most of the times that needed 
records weren't present have resulted from operational errors and not malicious 
activity.  This has led to a mindset in which new deployments are seen as high 
risk.

It's very important that a secured protocol be able to thwart or limit damage 
due to malicious behavior, but it also needs to tolerate benign operational 
mistakes.  If mistakes are frequent and addressed by dropping the guard, then 
the security system is a wasted investment.

I’ve not yet equated a missing DELEG record to a sign of a malicious attack.  I 
can see that during a transitional phase, the absence/presence of a DELEG 
record may be a matter of one or more operators experiencing roll-out jitters.  
Or even caching effects, where (apparently) overly long TTLs on existing 
resource record sets for a zone might have pinned them into memory while 
servers are now publishing DELEG.

That is why the protocol definition needs to treat this as 'setting 
expectations' and defining how to react, rather than assuming a mindset of 
addressing 'downgrade attacks.'  How one reacts to a benign error is different 
from the reaction to a malicious attack; in the past, protocol definitions have 
been written as if everything unexpected was malicious.

The question remains - can a receiver distinguish between a benign error and a 
malicious attack?

From: Ben Schwartz 
Date: Tuesday, January 30, 2024 at 13:59
To: Edward Lewis , "dnsop@ietf.org" 
Subject: [Ext] Re: General comment about downgrades vs. setting expectations in 
protocol definitions

In this line of reasoning, let's remember the "herd immunity" effect.  If 
receivers mostly respond to expectation violations by transparent fallback, an 
attacker on the wire has more incentive to attempt the downgrade attack.  If 
receivers mostly "fail closed", this incentive is reduced.  This is a 
collective security effect, not something that can be determined unilaterally 
by each receiver.

--Ben Schwartz
____
From: DNSOP  on behalf of Edward Lewis 

Sent: Tuesday, January 30, 2024 1:21 PM
To: dnsop@ietf.org 
Subject: [DNSOP] General comment about downgrades vs. setting expectations in 
protocol definitions


I hear talk about "downgrade attacks" frequently, across different ideas.  
Hearing it again in the context of DELEG, I had this thought.

We often wind up mired in discussions about downgrades, what they mean, the 
consequences.  I'd suggest, as definers of protocols, we think in terms of 
ensuring that receivers of messages have an expectation of something.  Inside 
protocol rules, data may be expected and arrive, expected and not, unexpected 
and arrive, or unexpected and not arrive.  A downgrade attack is a diagnosis of 
"expected and not".

A protocol ought to be documented to set up the receiver's expectations and 
define what the receiver does when they are not met.

Apologies for this generic message, when looking at the DELEG documents again, 
it'll be something I'll keep in mind.  I.e., the proposal to define one of the 
flags in the DNSKEY resource record format is setting up an expectation



Re: [DNSOP] Two points by Joe - was Re: [Ext] Re: DELEG and parent only resolution

2024-01-31 Thread Edward Lewis
On 1/31/24, 13:04, "DNSOP on behalf of Dave Lawrence"  wrote:
>
>Edward Lewis writes:
>> The impact on the registration system wasn’t raised at the table.
>
>Not entirely true.  We did recognize that there'd need to be an EPP
>draft too.

Ah, yes.  Joe suggested not making changes to the registration process (I'm not 
using "system") to ease DELEG into the existing registration systems (here I am). 
 What I recall from the table is that it was suggested we avoid changing the 
namespace (tree), the notion of zones, and the stub resolver experience; we 
didn't consider barring changes to the registration process.  However, in 
elevating operators into the protocol we did talk about the need to have 
registration information to support this.  So, yeah, what I wrote wasn't 
accurate; what I was thinking was that we hadn't tried to keep registration as 
it is.

I see DELEG as an enabler, not an end goal.  It will be deployed because it'll 
enable improvements (even if only reduction in operator pain/costs), so it 
needs to be deployed ahead of the gains.  Joe raises a good point about this 
first phase: DELEG-implementing code needs to be distributed before it'll start 
to offer a payoff.



[DNSOP] Two points by Joe - was Re: [Ext] Re: DELEG and parent only resolution

2024-01-31 Thread Edward Lewis
Two things buried in Joe’s message I want to build on:

From: DNSOP  on behalf of Joe Abley 

Date: Wednesday, January 31, 2024 at 03:17
>
>However, that will require new metadata to be bundled with domain registration 
>in transactions between registrant and registrar and between registrar and 
>registry.

One of the topics discussed was the boundary of the work.  We don’t want to 
alter the namespace, at least not at the start.  (I say “at least” as other 
progress may enable future features.)  We don’t want to alter the management 
model; that would upend operations, not improve the operability of the system.  
And we don’t want to alter the “stub resolver experience” so that applications 
are not required to change.  DELEG is meant to address the pain of operating 
the current publication protocol, the warts and rough edges in the messages and 
message passing.

The impact on the registration system wasn’t raised at the table.  I think 
implicitly, there was a desire to alter it for the better, namely in elevating 
the role of a DNS operator within registration (in having a registrant 
designate an operator).  But I suppose for the initial phase we can try to 
define DELEG within the constraints of the existing registration system.  But 
keep in mind that you can divide the registration system three ways (of course, 
more): some TLDs are operated under a contract that requires a specific set of 
standard behaviors (most generic TLDs); some TLDs (or ‘like TLDs’) are operated 
under various environments (mostly the jurisdiction TLDs, RIRs, and some NIRs); 
and then there are registrations in all other parts of the DNS name space.

So - working within current registration data flows would help get the first 
DELEG out the door, yet one of the early desires of DELEG is to address the lack 
of presence of the operator in registration, which suggests changes to 
registration data flows.

>There are various reasons why that might take a while, even in the most 
>optimistic success scenario for DELEG.

What the global public Internet lacks is a Project Manager, and this is one 
reason why deployments drag on for decades.  Incremental improvements need to 
be self-justifying, which is often not the case when investing in future 
infrastructure.  I don’t see an Internet Project Management Task Force ever 
being created - as in there are no Protocol Police - so it’s important to make 
sure the goals and motivation, or justification, are clearly documented.  It’s 
said there is no profit in running DNS; DELEG is about cost reduction (more 
manageable, etc.).


[DNSOP] General comment about downgrades vs. setting expectations in protocol definitions

2024-01-30 Thread Edward Lewis
I hear talk about "downgrade attacks" frequently, across different ideas.  
Hearing it again in the context of DELEG, I had this thought.

We often wind up mired in discussions about downgrades, what they mean, the 
consequences.  I'd suggest, as definers of protocols, we think in terms of 
ensuring that receivers of messages have an expectation of something.  Inside 
protocol rules, data may be expected and arrive, expected and not, unexpected 
and arrive, or unexpected and not arrive.  A downgrade attack is a diagnosis of 
"expected and not".

A protocol ought to be documented to set up the receiver's expectations and 
define what the receiver does when they are not met.
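
As a sketch of the four combinations in code (a minimal illustration in Python; 
the names and the reactions are mine, purely illustrative, not from any draft):

    from enum import Enum

    class Expectation(Enum):
        EXPECTED = 1
        UNEXPECTED = 2

    def classify(expectation: Expectation, arrived: bool) -> str:
        # The four combinations above; only "expected and not arrived"
        # is the downgrade-attack diagnosis.
        if expectation is Expectation.EXPECTED and arrived:
            return "normal operation"
        if expectation is Expectation.EXPECTED and not arrived:
            return "expected and not: possible downgrade, react per the spec"
        if expectation is Expectation.UNEXPECTED and arrived:
            return "unexpected data: ignore or log, per the spec"
        return "normal absence"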

Apologies for this generic message, when looking at the DELEG documents again, 
it'll be something I'll keep in mind.  I.e., the proposal to define one of the 
flags in the DNSKEY resource record format is setting up an expectation



[DNSOP] Conflicting info was - Re: [Ext] Re: draft-dnsop-deleg-00

2024-01-30 Thread Edward Lewis
On 1/30/24, 09:57, "DNSOP on behalf of Roy Arends"  wrote:
>> On 30 Jan 2024, at 12:57, Joe Abley  wrote:
>
>> Related, what to do when the ipv4hints are not the same as the 
> corresponding A RRSet?
>
>IMHO, potential unsigned glue records from elsewhere are inferior to 
> address records in a signed DELEG record. If a validator supports DELEG, and 
> has information such as Nameserver names and name server addresses, it should 
> ignore glue and NS records.

The question of "what happens when two sources differ on information" is a good 
one.

In "Clarifications to the DNS Specification" a trustworthiness scale is in the 
"Ranking data" section. (That's RFC 2181, section 5.4.1. for those that address 
via numbers.)  Nevertheless, I've seen aggressive resolvers rely on glue records 
when higher ranking data led to no response (query went out, no response within 
a set time out) or was inconclusive (meaning no address resource record sets could 
be found).  "Aggressive" meant that the resolver tried all tricks, 
protocol-following or not, to get an answer back to the requester.

What I mean is - it would be good to give a crisp, specific prescription for 
this case, but history shows that implementers can be crafty.  I don't know if 
that is better or worse for operations though.  In operations it would be good 
if the events were predictable (meets expected behavior) but it is also good if 
we limit faults.




[DNSOP] Extensible from the start - was - Re: [Ext] Re: DNSOPComments on draft-dnsop-deleg-00.txt - section 1

2024-01-30 Thread Edward Lewis
On 1/30/24, 01:14, "DNSOP on behalf of Ralf Weber"  wrote:
>... but having a
>record type that is extensible from the start ...

Designing in extensibility is a very good idea, ah, essential idea, but isn't a 
no-brainer.

Start by asking and documenting: What information is needed at a DNS 
delegation?  There's the service address of course.  There's a security context 
to be related.  And there are arguably other meta-data to include.  Consensus 
on this answer needs to be achieved, from this we can determine whether the 
construct of the resource record is necessary and sufficient.

Because the draft only defines DELEG via examples, I need to ask this question 
this way:

The RDATA has three fields -
  <SvcPriority> <TargetName> <SvcParams>

Is there going to be an assumed "standard set" of keywords?  (And a defined 
manner to know how to deal with unknown-to-the-receiver keywords.)  In asking 
this I'm thinking of the early experience with message compression: it was 
supposed to only work for the types defined in the base DNS documents [those 
labeled STD 13] but then compression was accidentally/inappropriately added for 
more, which led to a mess that "Handling of Unknown DNS Resource Record (RR) 
Types" had to deal with.




[DNSOP] Comments on draft-dnsop-deleg-00.txt - Section 3

2024-01-25 Thread Edward Lewis
# 3.  Implementation
...
# 3.1.  Including DELEG RRs in a Zone
# 
#A DELEG RRset MAY be present at a delegation point.  The DELEG RRset
#MAY contain multiple records.  DELEG RRsets MUST NOT appear at a
#zone's apex.
# 
#A DELEG RRset MAY be present with or without NS or DS RRsets at the
#delegation point.
# 

...the question is, during a search, where a domain name is the closest 
encloser in the zone for a query name and the domain name has a DELEG but no NS 
records, the authoritative server will respond with a referral, containing the 
DELEG in the authority section.  The querier, if not DELEG-aware, can't be told 
what to do here (if it could, it'd probably just implement DELEG support) so 
it'll do what it does now.  I'm sure this has been worked on (tested) and 
probably here would be a good place to include that, so that the implementer 
can have an idea of how the querier will react...

The next paragraph doesn't belong here - it belongs either in section 2, 
semantically defining the record, or in a separate part of section 3.

#Construction of a DELEG RR requires knowledge which implies
#communication between the operators of the child and parent zones.
#This communication is an operational matter not covered by this
#document.
# 
# 3.1.1.  Signing DELEG RRs
# 
#A DELEG RRset MUST be DNSSEC signed if the zone is signed.
# 
#If a signed zone contains DELEG records, the zone MUST be signed with
#a DNSKEY that has the DELEG flag set.

To start DELEG, a zone would have to first add a new DNSSEC (zone) key with the 
flag on, then add the first DELEG resource record set(s) and sign them, and 
then probably roll all the other signatures to the new key.

I'm just a little concerned about adding key flags because that changes the key 
tag and validators would have to know how to match the keys with this new 
wrinkle.
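
To make the key tag concern concrete: the tag is computed over the entire 
DNSKEY RDATA, flags field included, so setting any new flag bit yields a 
different tag.  A minimal sketch of the RFC 4034 Appendix B calculation, in 
Python (the DELEG flag value below is hypothetical, for illustration only):

    def key_tag(flags: int, protocol: int, algorithm: int, pubkey: bytes) -> int:
        # RFC 4034 Appendix B: sum alternating high/low 8-bit values
        # over the DNSKEY RDATA, then fold the carry back in.
        rdata = flags.to_bytes(2, "big") + bytes([protocol, algorithm]) + pubkey
        acc = 0
        for i, b in enumerate(rdata):
            acc += (b << 8) if i % 2 == 0 else b
        return (acc + ((acc >> 16) & 0xFFFF)) & 0xFFFF

    ZONE = 0x0100                       # zone key flag, RFC 4034
    DELEG = 0x0002                      # hypothetical flag bit, illustration only
    key = bytes.fromhex("03010001aa")   # dummy key material
    assert key_tag(ZONE, 3, 13, key) != key_tag(ZONE | DELEG, 3, 13, key)

Validators and management tools would see the second tag, not the first, once 
the flag is set - that is the matching wrinkle.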

# 3.2.  Authoritative Name Servers
# 
# 3.2.1.  Including DELEG RRs in a Response
# 

...assuming the QTYPE is not DELEG or AXFR or '*' (or anything that matches 
DELEG) which would cause the DELEG to be in the answer section:

#If a DELEG RRset is present at the delegation point, the name server
#MUST return both the DELEG RRset and its associated RRSIG RR in the
#Authority section along with the DS RRset and its associated RRSIG RR
#and the NS RRset.

RRSIG RR - there could be multiple signature records.

You need to account for having a DELEG resource record set, having a NS 
resource record set, and no DS resource record set, meaning, having to include 
proof of non-existence of a DS resource record set.

And what happens if there is a DELEG resource record set, DS resource record 
set, and no NS resource record set?  I would think this an anomaly, but I've 
seen many permutations of DNS configurations in the wild and a good spec will 
account for all cases.

I would think that when there is an expectation of a DELEG record (deferring 
that definition for now), a referral has to include in its authority section 
these things: A DELEG resource record set or proof of non-existence, A NS 
resource record set, and A DS resource record set or proof of non-existence.

To establish the expectation of DELEG, is it enough to require that at least 
one non-revoked zone key (key flag bit 7 is 1, key flag bit 8 is 0, with key 
flag bit 15 being 0 or 1) be included in the set?  In terms of a "chain of 
security", if the parent state is secured, then the cut point DS (or DELEG) 
resource record set will refer to the apex DNSKEY resource record set and in 
there, a DNSKEY resource record with the DELEG-supporting bit can be seen.

(I think section 1.5.1. of the document needs to be tightened up, perhaps to be 
as specific as the above paragraph.)
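
A sketch of that flag test, using the bit numbering above (bit 7 = zone key = 
0x0100, bit 8 = revoke = 0x0080, bit 15 = SEP = 0x0001, per RFC 4034/5011; the 
DELEG bit value itself would be whatever the draft ends up assigning):

    ZONE, REVOKE = 0x0100, 0x0080

    def establishes_deleg_expectation(dnskey_flags: list[int],
                                      deleg_bit: int) -> bool:
        # True if at least one non-revoked zone key carries the
        # DELEG-supporting flag; the SEP bit may be either value.
        return any((f & ZONE) and not (f & REVOKE) and (f & deleg_bit)
                   for f in dnskey_flags)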

#If no DELEG RRset is present at the delegation point, and the zone
#was signed with a DNSKEY that has the DELEG flag set, the name server
#MUST return the NSEC or NSEC3 RR that proves that the DELEG RRset is
#not present including its associated RRSIG RR along with the DS RRset
#and its associated RRSIG RR if present and the NS RRset, if present.
# 
#Including these DELEG, DS, NSEC or NSEC3, and RRSIG RRs increases the
#size of referral messages.  If space does not permit inclusion of
#these records, including glue address records, the name server MUST
#set the TC bit on the response.
# 

# 3.2.2.  Responding to Queries for Type DELEG
# 
#DELEG records, when present, are included in referrals.  When a
#parent and child are served from the same authoritative server, this
#referral will not be sent because the authoritative server will
#respond with information from the child zone.  In that case, the
#resolver may query for type DELEG.

The above is hard to read.  It's true that referrals are suppressed when a 
hierarchy is all on a server, but this has nothing to do with how a server 
responds to a QTYPE=DELEG, QTYPE=T_ANY (*), or QTYPE=AXFR, each 

[DNSOP] Comments on draft-dnsop-deleg-00.txt - section 2

2024-01-25 Thread Edward Lewis
As I'm not writing code, my comments will be less detailed...

# 2.  DELEG Record Type
...

# 2.1.  Difference between the records
# 
#This document uses two different resource record types.  Both records
#have the same functionality, with the difference between them being
#that the DELEG record MUST only be used at a delegation point, while
#the SVCB is used as a normal resource record and does not indicate
#that the label is being delegated.  For example, take the following
#DELEG record:
# 
#Zone com.:
#example.com.  86400  IN DELEG 1   config2.example.net.

Introducing something via an example leads to problems later on, as 
"definitions by example" create corner case gaps.  Even though it is said that 
the DELEG shares the format of the SVCB record, it would be good to give the 
generic definition first.

For one, having not read the SVCB document, the first RDATA field (1 or 0) 
seems to indicate whether the second field is the target for the requester or 
is another domain name the requester needs to consult.  It's not clear what 
happens if this value is 2 or more.
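
For what it's worth, RFC 9460 answers the "2 or more" question for SVCB itself: 
0 means AliasMode and any non-zero value means ServiceMode, with lower values 
preferred.  Whether DELEG inherits that rule unchanged is exactly what a 
generic definition should state.  A sketch of the SVCB rule:

    def svcb_mode(svc_priority: int) -> str:
        # RFC 9460: SvcPriority 0 = AliasMode (follow TargetName and
        # re-resolve); any non-zero value = ServiceMode (TargetName is
        # the endpoint; lower values preferred among multiple records).
        return "AliasMode" if svc_priority == 0 else "ServiceMode"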
 
...
 
#The primary difference between the two records is that the DELEG
#record means that anything under the record label should be queried
#at the delegated server while the SVCB record is just for redirection
#purposes, and any names under the record's label should still be
#resolved using the same server unless otherwise delegated.

Earlier it is said that the DELEG resource record is like the SVCB resource 
record, with some differences. When applying for an entry in the Resource 
Record (RR) TYPEs registry an application has to justify why a new type is 
needed.  For this, it would be good to have a section enumerating the reasons 
why DELEG needs a different type than SVCB.  (I know there are reasons, would 
be good to put them up with a generic definition of the resource record.)
 
# 2.2.  AliasMode Record Type
...
# 
#example.com.86400IN  DELEG 0   config1.example.net.

Does the "0" mean that this is AliasMode?

# 2.2.2.  Loop Prevention
# 
#The TargetName of an SVCB or DELEG record MAY be the owner of a CNAME
#record.  Resolvers MUST follow CNAMEs as well as further alias SVCB
#records as normal, but MUST not allow more then 4 total lookups per
#delegation, with the first one being the DELEG referral and then 3
#SVCB/CNAME lookups maximal.

A few questions - why 4?  (Besides it being tradition?)
Does the count of 4 include CNAME's?  What if it is DELEG CNAME CNAME CNAME 
SVCB CNAME CNAME CNAME?
What are the practical reasons for such chains (e.g., why would an operator 
ever see this)?

...I'm asking in the sense that some protocol elements are defined to be as 
general as possible, which winds up being a code implementation headache and 
then operator headache with little or no payoff...I'm not denying this may be 
useful, but a case should be made for it.

#Special care should be taken by both the zone owner and the delegated
#zone operator to ensure that a lookup loop is not created by having
#two AliasMode records rely on each other to serve the zone.  Doing so
#may result in a resolution loop, and likely a denial of service.  The
#mechanism on following CNAME and SVCB alias above should prevent
#exhaustion of server resources.  If a resolution can not be found
#after 4 lookups the server should reply with a SERVFAIL error code.

It seems that the intent is to count each DELEG->CNAME->SVCB transition.  Okay, 
but the question of "why 4" still holds.
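
Reading the quoted text literally, the budget works out to something like this 
sketch (lookup_alias is a stand-in for the resolver's alias-chasing step, not a 
real API):

    MAX_LOOKUPS = 4   # the draft's limit; "why 4" remains open

    def follow_delegation(name, lookup_alias):
        # lookup_alias(name) returns the next target name, or None when
        # the chain ends in usable addresses.  Purely illustrative.
        lookups = 1   # the DELEG referral itself is the first lookup
        while (target := lookup_alias(name)) is not None:
            lookups += 1
            if lookups > MAX_LOOKUPS:
                return "SERVFAIL"
            name = target
        return name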

# 2.3.  Deployment Considerations
...
# 2.3.3.  Availability
# 
#If a zone operator removes all NS records before DELEG and SVCB
#records are implemented by all clients, the availability of their
#zones will be impacted for the clients that are using non-supporting
#resolvers.  In some cases, this may be a desired quality, but should
#be noted by zone owners and operators.

There are application servers (web servers) that function (poorly) today where 
there is a delegation from a parent to a child zone that does not answer to NS 
records.  I used to see a lot of mangled DNS implementations that would only 
return address records (A resource records and maybe even AAAA resource 
records).  These situations work because of liberal resolvers, making use of 
the glue to send address requests to what the glue indicates, and those servers 
answering to A or AAAA requests.

When an operator decides to go all DELEG, meaning no more NS (DS), it might 
look like today's "broken set up" that only answers addresses.  I don't think 
that last sentence adequately covers the situation.  I don't have a suggestion now.

# 2.4.  Response Size Considerations
# 
...This has become a larger concern (meaning from the 90's to the now's) than 
when DNSSEC was first designed.  Besides trying to make the DELEG as 
bitspace-efficient as possible, what can be done to simp

[DNSOP] Comments on draft-dnsop-deleg-00.txt - section 1

2024-01-25 Thread Edward Lewis
I won't be pedantic (nits, wording) this time around, just raise conceptual 
issues in section 1

# 1.  Introduction
# 
#In the Domain Name System [STD13], subdomains within the domain name
#hierarchy are indicated by delegations to servers which are
#authoritative for their portion of the namespace.  The DNS records
#that do this, called NS records, contain hostnames of nameservers,
#which resolve to addresses.  No other information is available to the
#resolver.  It is limited to connect to the authoritative servers over
#UDP and TCP port 53.

Other information must be assumed...this is a crucial weakness.

#This limitation is a barrier for efficient introduction of new DNS
#technology.  New features come with additional overhead as they are
#constrained by the intersection of resolver and nameserver
#functionality.  New functionality could be discovered insecurely by
#trial and error, or negotiated after first connection, which is
#costly and unsafe.

Note that the DNS protocol does not have "connections" - which is a stumbling 
block before even getting to the "costly and unsafe" stage.  One could point 
out that it would be possible to introduce connections, thus sessions, in the 
DNS, but you would need a mechanism like DELEG to get to that point.

#The proposed DELEG record type remedies this problem by providing
#extensible parameters to indicate capabilities that a resolver may
#use for the delegated authority, for example that it should be
#contacted using a transport mechanism other than DNS over UDP or TCP
#on port 53.

"it"  each of the authoritative servers.  Sorry, I won't be pedantic, just 
to remember that the document has to be very clear and precise.  (Looseness 
breeds clarification documents...)
 
#DELEG records are served with NS and DS records in the Authority
#section of DNS delegation type responses.  Standard behavior of
#legacy DNS resolvers is to ignore the DELEG type and continue to rely
#on NS and DS records (see compliance testing described in
#Appendix A).  Resolvers that do understand DELEG and its associated
#parameters can efficiently switch to the new mechanism.

and (possibly) ignore the NS and DS resource records.  It would be good to have 
an idea when NS and DS could be retired.  Perhaps not by us, but by our 
students' students.
 
#The DELEG record leverages the Service Binding (SVCB) record format
#defined in [RFC9460], using a subset of the already defined service
#parameters.
# 
#DELEG can use AliasMode, inherited from SVCB, to insert a level of
#indirection to ease the operational maintenance of multiple zones by
#the same servers.  For example, an operator can have numerous
#customer domains all aliased to nameserver sets whose operational
#characteristics can be easily updated without intervention from the
#customers.  Most notably, this provides a method for addressing the
#long-standing problem operators have with maintaining DS records on
#behalf of their customers.  This type of facility will be handled in
#separate documents.

I believe there is a lot more to be said here, relating the benefits of this to 
the DNS operator community.  It will take a lot of resources to convert from 
the DNS protocol using NS resource records to a DNS protocol using DELEG 
resource records for delegation (sounds like ipv4 vs. ipv6) and these resources 
will come from DNS operations.  The pain of operating DNS today is what DELEG 
wants to ease; there's the tradeoff.

# 1.1.  Terminology
# 
#The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
#"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
#"OPTIONAL" in this document are to be interpreted as described in
#BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
#capitals, as shown here.
# 
#Terminology regarding the Domain Name System comes from [BCP219],
#with addition terms defined here:
# 
#*  legacy name servers: An authoritative server that does not support
#   the DELEG record.
# 
#*  legacy resolvers: A resolver that does not support the DELEG
#   record.
# 

Legacy is such a generic word.  I'll propose NS-only name servers and NS-only 
resolvers to capture this distinction.

# 1.2.  Motivation for DELEG
# 
#*  There is no secure way to signal capabilities or new features of
#   an authoritative server, such as authenticated DNS-over-TLS.  A
#   resolver must resort to trial-and-error methods that can
#   potentially fall victim to downgrade attacks.
# 
#*  Delegation point NS records and glue address records are, by
#   design, not DNSSEC signed.  This presents a leap of faith.
#   Spoofed delegation point NS records can be detected eventually if
#   the delegated domain was signed, but only after traffic was sent
#   to the (potentially) sp

[DNSOP] Comments on draft-dnsop-deleg-00.txt - abstract

2024-01-25 Thread Edward Lewis
Being clear on the motivation for taking on the development and eventual 
deployment of DELEG is important to carry this work forward.  I'll start with 
some pedantic observations about the abstract:

#Abstract
#
#   A delegation in the Domain Name System (DNS) is a mechanism that
#   enables efficient and distributed management of the DNS namespace.

"enables highly scalable distributed management of the DNS namespace."

Rationale: The term "efficient" is rather vague especially in a context like 
this one where there is no metric for efficiency. The most endearing, if you 
will, facet of the DNS protocol design is its ability to scale to massive size 
and that deserves to be called out explicitly in any effort that has as grand a 
scale as DELEG. 
#   It involves delegating authority over subdomains to specific DNS
#   servers via NS records, allowing for a hierarchical structure and

Replace "servers" with "nameservers" (and be consistent: when covering 
"server", use "nameserver" or "name server").
Replace "NS records" with "NS resource records".  In general, use "resource 
records" to refer to the DNS data structure and "records" to the generic idea 
of grouped data.
Replace the word "and" with "by" - the hierarchy enables distribution

#   distributing the responsibility for maintaining DNS records.
#
#   An NS record contains the hostname of the nameserver for the

An NS resource record contains only the hostname of the nameserver for the

#   delegated namespace.  Any facilities of that nameserver must be

delegated subdomain.  The network address of the nameserver must be determined 
from other records, but other necessary information, such as service port 
number, is assumed according to the defined protocol and thus hard coded.  As a 
result, the DNS protocol definition lacks the necessary 
flexibility to adapt to changing network environments.  Any advanced 
capabilities of that nameserver must be

#   discovered through other mechanisms.  This document proposes a new
#   extensible DNS record type, DELEG, which contains additional

DNS resource record type, DELEG, to co-exist with and ultimately replace the NS 
resource record, containing the necessary

#   information about the delegated namespace and the capabilities of

information about the delegated subdomain and its
#   authoritative nameservers for the delegated namespace.

authoritative nameservers for the purposes of increasing the flexibility of the 
DNS protocol.



Putting together my suggested abstract:

A delegation in the Domain Name System (DNS) is a mechanism that enables highly 
scalable distributed management of the DNS namespace.  It involves delegating 
authority over subdomains to specific DNS name servers via NS resource records 
allowing for a hierarchical structure by distributing the responsibility for 
maintaining DNS records.

An NS resource record contains only the hostname of the nameserver for the 
delegated subdomain.  The network address of the nameserver must be determined 
from other resource records, but other necessary information, such as service 
port number, is assumed according to the defined protocol and thus hard coded.  
As a result, the DNS protocol lacks the necessary flexibility to adapt to 
changing network environments.  Any advanced capabilities of that nameserver 
must be discovered through other mechanisms.  This document proposes a new DNS 
resource record type, DELEG, to co-exist with and ultimately replace the NS 
resource record, containing the necessary information about the delegated 
subdomain and its authoritative nameservers for the purposes of increasing the 
flexibility of the DNS protocol.

--

I think it is important to include "scalability" as it is a key factor in the 
DNS, and that the crucial need is "flexibility" in the protocol's definition.  
The first shapes DELEG, the latter is ultimately the reason for DELEG, and 
these ought to be in the abstract.



Re: [DNSOP] [Ext] Re: Automated delegation management via DDNS

2023-10-30 Thread Edward Lewis
On 10/25/23, 12:19 PM, "DNSOP on behalf of Johan Stenstam" 
 wrote:

>Of course you have to agree that “push” is a better mechanism than “pull” when 
>it comes to propagating information about a change in one end to the other end.

Perhaps pulling this entirely out of context, but I'm not sure I agree with 
that statement.

Certainly, push is more elegant than pull, but that doesn't mean it is better.

Pull is simpler, and I'd argue more in line with the flow of the DNS protocol 
than push.

The action to be taken is by the parent.  The parent is being requested to 
change its zone contents based on information at the child (much like a name 
server change).  As the parent is going to have to do something, it's more 
natural to have the parent initiate the action by finding the request and 
acting upon it.  In contrast, if a child pushes a notice to the parent, the 
parent needs to be prepared to react at that non-predictable moment.
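
As a concrete picture of the pull side, a parent-side scan is small; a minimal 
sketch with dnspython (assuming only that the parent knows its delegations; 
DNSSEC validation of the CDS data and error handling are omitted):

    import dns.resolver

    def scan_for_cds(children):
        # The parent polls each delegation at a time of its own choosing,
        # rather than standing ready for pushes at arbitrary moments.
        pending = {}
        for child in children:
            try:
                answer = dns.resolver.resolve(child, "CDS")
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                continue   # nothing requested; move on
            pending[child] = [rr.to_text() for rr in answer]
        return pending     # candidate DS changes to validate and apply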

The DNS is a downward protocol.  Parents point to children and not the other 
way around.  This contributes to the protocol's ability to scale so well.  The 
"downwardness" is evidenced in the query process (referrals) and in the 
provisioning process (NS, DS records).  We used to have a "referral to the 
root" for lame delegations - that didn't go so well.  (This observation comes 
from failed research efforts trying to build more robust DNSSEC 
chains-of-trust.  Never could make security go back up the tree.  Upward 
components don't seem to work so well.)

NOTIFY, as it is defined today, is a lateral mechanism, used within a zone.  The 
name server named in the SOA record (the MNAME, not the owner name) tells the 
zone's other name servers that it has a new version of the zone.  NOTIFY is not a generalized 
messaging platform, not defined for inter-zone management.  Defining NOTIFY for 
inter-zone or inter-DNS-administration management would be precedent setting, 
it would represent a change to the way NOTIFY works and possibly the role it 
plays in AXFR and IXFR.

I have some hesitation in trying to shoe-horn provisioning matters into the DNS 
query-response protocol "band".  If it works, fine, but realize that the IETF 
has a different protocol defined for DNS provisioning - EPP.  That protocol is 
designed for registrar to registry traffic, not DNS operator traffic, but I 
hold it up to say provisioning is so different from publishing data about names 
that DNS provisioning has its own protocol.  Further, EPP and RDAP are entirely 
different protocols that exist to handle DNS meta-data, examples of how not 
everything the DNS needs to function can be done "in-band."

FWIW, I've never heard an operator say they can't scan their own zone's 
delegations.  I've heard a few say that scanning is easier on them than setting 
up a place where registrants (via registrars if needed) can send them notices (or 
NOTIFYs).

I think it is great that zone operators can publish the "desired" DS and DNSKEY 
resource records (CDS and CDNSKEY), that is a great way to marshal the data 
from source to destination.  The gap is error reporting - what if the registry 
will not accommodate the request?  The debate between polling (scanning) and 
event driven (NOTIFY) is significant but does not address that gap.  But this 
is getting off the topic a bit.

Okay, so what I mean to say...I'm not sold that push is better than pull.




[DNSOP] NXDOMAIN and NoError/NoData was Re: [Ext] Compact DoE sentinel choice

2023-08-09 Thread Edward Lewis
>Note however that Cloudflare quite deliberately implemented this differential 
>behavior (to preserve NXDOMAIN visibility for pre DNSSEC clients I suspect). 
>Some other implementations of Compact DoE return a uniform (NOERROR) RCODE for 
>either case.

The trouble I have, thinking about the difference between NXDOMAIN, 
NoError/NoData, and empty non-terminals is wondering about the impact of the 
difference between them.

The reason that existence of a name is (re)defined in “The Role of Wildcards in 
the Domain Name System” (RFC 4592) is that, in classical DNS, the only time it 
mattered whether a name existed or didn’t was during the process of synthesizing 
a response.  Whenever a query for a “name, class, type” discovered there was no 
matching data, it didn’t really matter whether it was the inability to match a 
name or, when a name matched, an inability to match a data set, unless it 
became a question of whether an answer could be synthesized.  Within the 
protocol, and hence as a DNS protocol engineer, the difference between NXDOMAIN 
and NoError/NoData doesn’t seem terribly important.

I ought to stop and observe that I am reluctant to say whether a name exists or 
not, instead I qualify a name as having descendants or associated data.  The 
reason is that a non-existent name may still return a response for data while 
an existing name may not return a response for data, and this confuses the 
issue.  A non-existent name (no descendant, no data) may be matched by a source 
of synthesis (wildcard) and appear to the querier as having an answer and 
therefore, in some sense, “existing.”  Meanwhile an empty non-terminal may 
appear to not exist to a querier because it has no data to return.  This is 
what makes this topic really confusing.  What’s sufficient for the DNS protocol 
is at odds with how other protocols rely upon the data in the DNS.

When I mentioned “classical DNS” I meant to exclude the “minimal queries” 
approach.  (I haven’t given minimal queries much thought.)  For now, I’ll 
assume that this is adequately handled elsewhere and skip it.

What I’m driving at is this is a case where, if we solve for the needs of the 
DNS protocol, problems with other applications may arise.  For the most part, 
if there is no data to match a query, it doesn’t matter if it is NXDOMAIN or 
NoError/NoData to the DNS.  The question is how does it matter for other 
applications, especially if Compact Denial of Existence changes the way things 
are now - in any direction - will it upset other applications?
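
One way to see how it matters: stub libraries surface the two cases differently 
to applications.  With dnspython, for instance, the distinction arrives as two 
different exceptions (a sketch):

    import dns.resolver

    def probe(name: str, rdtype: str = "A") -> str:
        try:
            dns.resolver.resolve(name, rdtype)
            return "answer"
        except dns.resolver.NXDOMAIN:
            return "NXDOMAIN: no data and no descendants"
        except dns.resolver.NoAnswer:
            return "NoError/NoData: the name exists, the type does not"

An application branching on that result will behave differently against a zone 
served with Compact DoE than against the same zone served conventionally.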


Re: [DNSOP] [Ext] Compact DoE sentinel choice

2023-08-09 Thread Edward Lewis
>… I do want to point out that Compact DoE handles wildcards quite differently, 
>and this may not be readily apparent to the casual observer.

FWIW, I noticed.  (Not meaning to say “I’m not a casual observer” but…). This 
is something that was playing in my mind when reviewing the proposal.

In basic DNS, if a name has no records and no descendants, then the responder 
looks for a relevant source of synthesis.  Relevant means that it is “up the 
tree”, its owner's first label is “*”, and there is no “blocking” name.  (I won’t 
define “blocking” here, there’s an RFC for that.)  If a relevant source is found, 
a response is cobbled from that.

In year 2004 DNSSEC, the same process is followed, but the responder will add 
DNSSEC records to support the work it is doing.  In essence, DNSSEC is 
“proving” that the responder followed the protocol faithfully in answering as 
it is.  For zones using NSEC, this means showing the query name does not have 
records or descendants (NXDOMAIN) but there is a relevant source of synthesis 
and there is no blocking name.  That can be done in one or two NSEC resource 
record sets.  For zones using NSEC3, this means showing the name doesn’t exist, 
then showing the source of synthesis exists, and then showing there is no 
blocking - the three steps may each need their own NSEC3 resource record set.
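
A toy version of the “relevant source of synthesis” search, over a flat set of 
names that exist in the zone (ignoring zone cuts and the formal blocking rules; 
illustration only):

    def source_of_synthesis(qname: str, zone_names: set[str]):
        if qname in zone_names:
            return None                    # name exists; nothing to synthesize
        labels = qname.rstrip(".").split(".")
        for i in range(1, len(labels)):
            ancestor = ".".join(labels[i:])
            if ancestor in zone_names:     # closest encloser found
                wildcard = "*." + ancestor
                return wildcard if wildcard in zone_names else None
        return None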

(Sorry, now I’m on a roll…)

All of this is because of the way DNS has opted to implement synthesized 
responses. It was pointed out during the writing of the “Clarifications of 
Wildcards” there are many other ways to synthesize a response, “wildcards” was 
just one, the one publicly defined, method.  This argument was from Dan 
Bernstein.  For those who recall his years of participation, there was often 
frustration in understanding his points.  He was correct in this instance.  It 
just took a long time to understand what he meant.  Instead of “wildcards are 
the way to synthesize” he wanted “wildcards are a way to synthesize”.  I’m 
including this because the point is germane here.

If a server is synthesizing responses on the fly according to other algorithms, 
then the NSEC/NSEC3 proofs do not apply.  The server isn’t following the 
“standard” synthesis method.  The server then just needs to figure out how to 
express its response.  For a positive response, the label count has to be 
“right”.  For negative responses, the server needs to decide how it wants to 
answer.  In a purely on-the-fly, synthesizing responder, there may be no 
difference in response between a name not having any records nor descendants 
and a name not having what the query is seeking.

For some part of me, I don’t understand why the original protocol distinguishes 
between name error (NXDOMAIN) and no error/no data.  It’s been said that the 
DNS matching process has two phases, matching name first and then matching 
resource records.  This is apparent in the protocols basic design.  But why it 
is this way, I don’t know.  I don’t see this as terribly significant - except 
when you try to understand how answer synthesis (“wildcards”) is done.

Based on that, I can see that Compact denial of existence handles “wildcards” 
differently.  I don’t think wildcards are integral elements of the protocol, 
but it is a defined mechanism.  A zone administrator can elect to fully 
enumerate the names in the zone, and this can be implemented either by rote 
(explicitly listing all the names in a file) or by function (fully 
synthesizing).  If the zone administrator elects either, and implements the 
protocol fully, they won’t have wildcards and no incidence of Name Error 
(NXDOMAIN).

For any application relying on NXDOMAIN - such a zone has all names “existing” 
(in some sense), there simply will never be an NXDOMAIN.  (There can be empty 
non-terminals, but if the zone is properly defined, all the leaf names - those 
without in-zone descendants - would have to own some record set.)

So…that Compact Denial of Existence handles wildcards “differently” - this is 
significant…


Re: [DNSOP] [Ext] Compact DoE sentinel choice

2023-08-08 Thread Edward Lewis
>Compact DoE, and RFC4470 already appear to violate it for ENT responses. And 
>it was (arguably) already violated by
>pre-computed NSEC3 (5155), where an empty non-terminal name (or rather the 
>hash of it) does solely own an
>NSEC3 record.

NSEC3 is different.  Because NSEC3 hashes the labels into a flat space, it 
hides the in-zone structure, which is something a multi-label deep zone [rather 
uncommon] would need.  The impact is that empty non-terminals must be 
represented in the NSEC3 chain to adequately prove a name does not have records 
or subordinates (NXDOMAIN).

Due to NSEC resource record exposing the full name involved, the resolver can 
infer where empty, non-terminal names exist in the zone.  This is the reason 
behind the notion that at most two NSEC resource record sets are needed to 
answer negatively, whereas up to three NSEC3 resource record sets may be needed.
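
The flattening comes from the hash itself.  A minimal version of the RFC 5155 
calculation (SHA-1 over the canonical wire-format name plus salt, then the 
extra iterations over the digest; base64.b32hexencode needs Python 3.10+):

    import base64, hashlib

    def nsec3_hash(name: str, salt: bytes, iterations: int) -> str:
        # Canonical wire format: lowercase labels, each length-prefixed,
        # ending with a zero octet.
        wire = b""
        for label in name.rstrip(".").lower().split("."):
            wire += bytes([len(label)]) + label.encode("ascii")
        wire += b"\x00"
        digest = hashlib.sha1(wire + salt).digest()
        for _ in range(iterations):
            digest = hashlib.sha1(digest + salt).digest()
        return base64.b32hexencode(digest).decode().lower()

Every name, whatever its depth in the zone, lands as one opaque label in a 
single flat chain - which is why the empty non-terminals must be hashed in too.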



Re: [DNSOP] [Ext] Compact DoE sentinel choice

2023-08-08 Thread Edward Lewis
On Mon, Jul 31, 2023 at 11:58 AM Edward Lewis  wrote:
>You've probably stumbled across Cloudflare's differential behavior for DO=0 vs
>DO=1 queries. With non-DNSSEC queries it provides a vanilla, unsigned
>NXDOMAIN response. With DNSSEC enabled queries, it provides the
>Compact Answer NODATA response.

Stumbled isn’t the right word - I purposely went looking for it, found it as 
I had expected.  This is what was “feared” in the section in “Protocol 
Modifications for the DNS Security Extensions” titled “Including NSEC RRs in a 
Zone“ [a.k.a. RFC 4035, 2.3] - the divergence of the unsecured and secured view 
of a zone.

Backwards compatibility was one of the chief concerns in designing DNSSEC as it 
was expected that it would take a very long time to achieve full deployment 
- and it was anticipated that “islands of security” would emerge before 
top-down deployment.  (I don’t think there are many “islands of security”, 
especially the way the DNS service economy has emerged this century.)

>Your 1st query probably was DO=0. For your 2nd query, I assume the recursive
>server that you used sent DO=1 queries upstream by default.

Yep.  Well, not “by default” - I diddled the DiG run time parameters to make 
sure I did that…

>Yes, this is kind of confusing, and I'm not particularly a fan of this 
>differential
>behavior.

“Confusing” situations ought to be avoided.  Confusing is a problem in 
situations when “mean time to repair” matters.

My general concern is that although things may “work” in practice today and 
there’s a desire for expediency, the way in which this pleases or 
displeases operations will be a large factor in whether deployment is achieved.

As has been the case in other finer points of the extension definitions, the 
rule against names only having an NSEC (and RRSIG) emerged in the context of 
developing the signing process.  At the time, the prevailing winds would have 
justified preparing an NSEC resource record set for each name in the zone, 
including empty non-terminals and possibly even those that did not exist (no 
data and no descendants).  I can’t think of a negative impact on a validator 
verifying that a name had only an NSEC record, but that wasn’t the concern at 
the time.

What wasn’t done was disallowing queries for NSEC, by the time NSEC3 was 
derived, this was “fixed” (meaning, explicitly barred).

Buried in here is the notion that we want to tailor the response to match the 
query.  The only time this is done in base DNS is in answer synthesis 
(wildcards) and the only field modified there is the owner field.  DNSSEC 
accommodates this (and any decrement to the TTL).  We don’t have any precedent 
in the protocol for modifying the RDATA field based on the query, and DNSSEC 
was not built anticipating that.



Re: [DNSOP] [Ext] Compact DoE sentinel choice

2023-07-31 Thread Edward Lewis
On 7/28/23, 1:48 PM, "DNSOP on behalf of Viktor Dukhovni" 
 wrote:

>We rolled out NSEC3 by introducing new algorithm code points, and
>eventually these weere widely enough deployed to allow zones to be
>signed with 7/8/10/13/14 without being seen as insecure by a significant
>fraction of resolvers.  I don't expect CDoE to wait for the ~5 years or
>more that this would take.

"Minimally Covering NSEC Records and DNSSEC On-line Signing" is referenced in 
the Compact Denial of Existence draft, it was published in 2004 (aka RFC 4470). 
 I can't determine which internet draft led to that document so I can't tell 
when discussions on this topic began.  Suffice it to say, this has been hanging 
around a very long time - enough time for a person to be born, raised and 
graduate from public schools (~18 years).  Persuasively I'll claim that this is 
the result of trying to be pragmatic when updating a protocol.  (Meaning - 
"what's another few years"?.)

I also think that software is updated more quickly, when motivated.  That's one 
lesson from the 2018 root zone KSK roll.  But I won't concentrate on that here.

What's pragmatic for protocol engineering may not be suitable for operations.  
I'm concerned with the low deployment of DNSSEC, 25 years since the first 
meeting to spur adoption.  Having sat through years of messaging that 
"operators need to be informed" and "we need to present the business case" 
without much success has led me to think inward.  My hypothesis 
(note-hypothesis) is that DNSSEC is not (entirely) suitable for operations.  My 
theory is that we need to be driving towards a simpler protocol, and as part of 
that, we need to avoid trying to retrofit "what is needed in the world now" 
into "what was designed for the world we anticipated in 1990."

This is the reason I'm objecting to this approach.

One of my objections is that this approach will take names that are 
non-existent (per the definition in "The Role of Wildcards in the Domain Name 
System") and reply to queries with records owned by those names.  In replies 
without DNSSEC records, the response code would be NXDOMAIN and in replies with 
DNSSEC records, the answer appears to be a no error but no data response.  This 
means the zone would be seen differently depending on whether the recipient 
reads DNSSEC or not.

Another objection is in the redefining of fields.  While the implementation of 
signing and validation may be able to accommodate using "dummy resource record 
types" (such as meta types designed to be in the range 128-255), whether 
management tools will be able to keep up needs to be kept in mind as well as 
the increasing skillset needed by the operations staff (who will be called in 
when customers do not get what they expect).

E.g., while preparing this message I tried these two dig messages:

dig somename.cloudflare.com a @ns3.cloudflare.com.
and
dig somename.cloudflare.com a

The first returned NXDOMAIN, the latter NoError/NoData.  If I were a human 
trying to figure out a problem with a wildcard not matching, the difference 
between these two responses would be significant.  (The reason existence is 
defined in the wildcard document is that existence matters when applying the 
synthesis rules.)

I encourage updating DNSSEC to fit into the modern world.  The result ought to 
lead towards higher adoption - by making DNSSEC a "no brainer" to deploy and 
operate.  I'm urging that this be done in the (unquantified here) right way.  I 
have my doubts that fitting new meanings into old formats is the way to go.



Re: [DNSOP] [Ext] what could we do with 15 unused bits of QDCOUNT?

2023-07-27 Thread Edward Lewis
On 7/26/23, 4:11 PM, "DNSOP on behalf of George Michaelson" 
 wrote:
>
>if QDCOUNT is defined as [0|1] then we have 15 new bits of freedom in
>the header.

I don't think you can repurpose them.

One concern is backwards compatibility - code in place now wouldn't understand.

The practice of repurposing fields in the header will make it harder for future 
generations of operators to debug the protocol.  Operator goals include 
avoiding outages (via capacity) and mitigating outages (when they happen).  The 
more complex the protocol becomes the harder mitigation becomes.




Re: [DNSOP] [Ext] Compact DoE sentinel choice

2023-07-26 Thread Edward Lewis
On 7/24/23, 1:55 PM, "DNSOP on behalf of Viktor Dukhovni" 
 wrote:
>2.  That said, there are multiple ways to *distinguish* ENT vs. NXDOMAIN
>responses:
>
>a.  Sentinel RTYPE for NXDOMAIN with just NSEC + RRSIG for ENT.
>b.  Sentinel RTYPE for ENT with just NSEC + RRSIG for NXDOMAIN.
>c.  Distinct sentinel RTYPEs for both ENTs and NXDOMAIN.
>
>Presently, the draft is proposing option "a".  My point is that with "a"
>we get a response that is promising the existence of an RRset for a name
>that does not exist:

Launching off this observation (and realizing there's a whole thread following 
it), I wanted to register some discomfort with this approach.

The definition of DNSSEC in RFC 4035 contains this paragraph:

#   An NSEC record (and its associated RRSIG RRset) MUST NOT be the only
#   RRset at any particular owner name.  That is, the signing process
#   MUST NOT create NSEC or RRSIG RRs for owner name nodes that were not
#   the owner name of any RRset before the zone was signed.  The main
#   reasons for this are a desire for namespace consistency between
#   signed and unsigned versions of the same zone and a desire to reduce
#   the risk of response inconsistency in security oblivious recursive
#   name servers.

What is most significant in that text is the "desire for namespace consistency 
between signed and unsigned versions of the same zone".  With this proposal, 
an answer that says "yes the name exists but it doesn't have what you 
want" contradicts the unsigned response that would indicate NXDOMAIN; there is 
an inconsistency in what is seen in the signed and unsigned zone.

Note: I'm not trying to say "we have to stick to the old rules", I'm trying to 
look at the environment in which the DNSSEC was born and how we went from 
concept to reality (then).

In some sense, this proposal is establishing a (set of) wildcard(s) (source[s] 
of synthesis) that owns just an NSEC record when it applies to otherwise 
NXDOMAIN responses.  Mulling this over, it becomes apparent that the next name 
field in the NSEC record is a problem - wildcards allow for the inclusion of an 
owner name pulled from the query (and DNSSEC accommodates that via the label 
count) but there is no process for modifying the RDATA in a synthesized record. 
 The lack of a process for modifying the RDATA means that "this is something 
entirely new".

 I think that signing on the fly is a great idea.  But when DNSSEC was defined, 
and specifically here the NSEC record, it was assumed that DNSSEC records would 
be generated on machines air-gapped from the network because the state of the 
art in host security was simply poor.  This forced the design to take on an 
approach of showing the querier "here's what I do have, you can deduce that 
your request has no answer (NXDOMAIN)".  With signing on the fly, that approach 
makes no sense - you should be able to send a tailored response.

A tailored response, i.e., "there's no name matching the QNAME", means there's 
no need to mention the next name.  This would be great - no need to sort the 
zone, no need to assist zone walking, etc.  The NSEC record is just not built 
for that though, it's an entirely ill-fit.
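
To make the ill-fit concrete, here is roughly what an on-the-fly signer ends up 
doing for a tailored negative answer, following RFC 4470's convention that the 
minimal successor of a name is that name with a single null-octet label 
prepended (a sketch of the name arithmetic, not any implementation's actual 
code):

    def tailored_nsec(qname: str) -> dict:
        # Owner is the (non-existent) QNAME itself; "next" is its minimal
        # successor in canonical order.  The record thereby asserts the
        # QNAME exists and owns NSEC + RRSIG - the inconsistency at issue.
        return {
            "owner": qname,
            "next": "\\000." + qname,
            "types": ["NSEC", "RRSIG"],
        }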

We have the NSEC record, it's implemented and deployed.  (I'm just not 
considering NSEC3 as it's similar in approach despite it working on hashes and 
not names.)  Rolling out a new approach (say "NSEC17") would be an uphill 
battle (although we did roll out NSEC3...), assumed to be more work that 
force-fitting on-the-fly signing into NSEC.

But I'm not so sure - because of the potential problems with inconsistencies 
introduced.

I'm not saying "this can't work" - I'm raising a concern...and hoping some 
thought/work is done to come up with a new approach that is built to support 
the modern world.



[DNSOP] Early comments on https://www.ietf.org/archive/id/draft-thomassen-dnsop-generalized-dns-notify-01.txt

2023-06-22 Thread Edward Lewis
After a quick read of Generalized DNS Notifications, -01, I have some comments:

It would be ludicrous of me to argue against the notion that event driven 
approaches are superior to polling approaches.  However, event driven 
approaches require more design work which is why it is natural for polling 
approaches to normally lead the way until they break.

In section 3, there is this phrase which caught my eye: "An increasing number 
of TLD registries are implementing periodic scanning for CDS and CDNSKEY 
records in child zones."  Operators are choosing and deploying the scanning 
approach.  Perhaps because it is the only option, nevertheless, the early 
deployments are scanning.  At a recent event, I heard a few measurements of the 
scanning approach, done with skepticism regarding the ability of such an 
approach to scale, but none of the measurements pointed to a bottleneck.  There 
was even an off-hand comment that perhaps scanning isn't scary at all.

I don't doubt the logic that is present in section 3 leading to the conclusion 
that there's a need to develop an event-driven approach.  In fact, I'm a fan of 
the logic.  However, there's no data to back up the claims that the polling 
approach won't work.  It would be good to have that included in the document to 
establish the need for the NOTIFY approach.

And why establish the need?  Deep in the nature of the DNS is the notion that 
parents know about children, but not vice versa.  In the DNS, delegations - 
both name server (NS) and security (DS/DNSKEY) - to date point downward.  
Nothing points upward.  CNAME and DNAME are query-rewriting tools, not 
delegation tools, I'm excluding them.

The historical architecture of the DNS is hostile to the idea of an 
event-driven approach - that's my fear.

NOTIFY, as it is in use today, does not cross zones; it works only within the 
set of nameservers that a zone administrator has configured for a zone.  
"Also-notify" is a static configuration option available in implementations but, 
being a configuration-plane feature, is not evident or supported in the standard 
protocol.  NOTIFY exists in a pool of familiar servers; all participants are 
managed by one entity or via an out of band arrangement.  It does not challenge 
the DNS "downward" architecture.

Using NOTIFY in another context may prove to be a significant change.  Perhaps 
the resource record format is general enough, but how a recipient would respond 
to the resource record would be different.  This is why I'm not greatly 
encouraged by the observation that we already have a record defined although 
that helps.

Using NOTIFY from a child to parent to trigger a CDS, CDNSKEY, CSYNC action 
makes sense, but the context is novel (for NOTIFY).  The message is used to 
cross administrative boundaries, upward even.  Mentioned earlier, in the DNS 
children don't talk up to the parent (easily), so a few things are needed.  One 
is the proposed NOTIFY record to tell the child where to send the notification 
query.  The other is figuring out how to set up a receiving server at the 
parent that is not a new burden on the zone administration.  This latter item 
concerns me the most as adding more modules to operations is a burden unless 
this can be adequately automated, buried in code, to the point it has no 
operational knobs an operator needs to manage and track.  (And there's always 
that DDoS/Firewall hole punching/traffic engineering challenge.)

Using NOTIFY from one operator for a zone administration to an independent 
other operator (aka multi-signer) is another novel environment.  In this case 
it is not shaped by a child to parent situation, but it exists in a potentially 
flat, out-of-band defined space.  I usually hear of multi-signer featuring two 
operators, but there could be more.  E.g., a TLD might decide to have a 
different signer for each of their half-dozen or so anycast name server 
contracted operators. There are quite a few design considerations for this.

Another way is to classify NOTIFY today as a "1:me" protocol, from one of my 
elements to another element by me.  From child to parent, it is "many:1" - a 
parent has many children (the many) with all the children trying to hit the 1 
parent.  Between co-operators of a zone, I'm not sure how to put that into a 
category of "1:1" or "1:many" or "many:1" protocol, but it's different.  I 
suppose it's 1:1 but neither of these might be the zone administrator, which is 
a problem.

Struggling to define the latter, here's what I'm concerned with.  When you have 
an administrator who contracts to two, or more, operators for service and there 
is a fault, how is the fault handled?  I.e., the error handling will need 
attention.

NOTIFY now, when it breaks, it's all within one administration to heal.  From 
child to parent, you could (perhaps) define fault handling in the registration 
agreement.  But between co-operators for one zone, it's harder to manage.  This 
puts the onus on the proto

Re: [DNSOP] [Ext] Coming soon: WG interim meeting on the definition of "lame delegation"

2023-06-22 Thread Edward Lewis
On 6/21/23, 4:46 PM, "DNSOP on behalf of Robert Edmonds" 
 wrote:

>"In-bailiwick" vs. "out-of-bailiwick"

I think the topic is no longer important.  But I'll explain why I brought up 
"bailiwick" in this context.

Bailiwick, according to a non-technical/natural language dictionary (such as 
Merriam-Webster), means:
1) one's sphere of operations or particular area of interest.
2) the district or jurisdiction of a bailie or bailiff.

When a query arrives at a nameserver, one of the early steps is to:
(Copied from "Domain Names - Concepts and Facilities" [RFC 1034], section 
"Algorithm" [4.3.2])
   2. Search the available zones for the zone which is the nearest
  ancestor to QNAME.  If such a zone is found, go to step 3,
  otherwise step 4.

My use of bailiwick comes from the "sphere of operations" mapping to "the 
available zones" in the nameserver.
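
As a concrete sketch of that step, in Python and assuming nothing beyond the
standard library ("zones" stands in for whatever set of zone names a server is
configured to serve - a hypothetical name, not from the RFC):

    # Find the nearest ancestor zone for a QNAME (RFC 1034, section 4.3.2,
    # step 2).  Zone names are given without the trailing dot; "." is the root.
    def nearest_ancestor_zone(qname, zones):
        labels = qname.rstrip(".").split(".")
        # Try the QNAME itself, then each shorter suffix, ending at the root.
        for i in range(len(labels) + 1):
            candidate = ".".join(labels[i:]) or "."
            if candidate in zones:
                return candidate
        return None  # step 4 of the algorithm: no available zone matched

    # nearest_ancestor_zone("www.example.org", {"example.org", "org"}) returns
    # "example.org" - the query is within this server's sphere of operations.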

However, after the discussion in the interim meeting, I don't think there's any 
need to "replace" lame delegation with anything as the situation I've seen it 
used in no longer is a topic of discussion, except when we are dredging up 
history for the sake of history.




Re: [DNSOP] [Ext] Coming soon: WG interim meeting on the definition of "lame delegation"

2023-06-20 Thread Edward Lewis
I’ve just come across this message (I have been out a bit recently)… sorry if
this is late.

These are suggestions…

For the situation where an (active) nameserver is not configured to answer a
query it received (which is the case where my use of lame delegation comes 
from), I’d suggest the following more accurate labels:

“out of bailiwick query” - from the perspective of the server, the query is 
something it can’t answer
“incorrect referral” - from the perspective of the recipient of the answer (= 
the querier, hopefully) as it was told to go there by some other party (the 
parent), but it’s a dead end.

For the situation where there is a problem related to the delegation of a 
domain to a set of nameservers:

“incorrect delegation” or “malformed delegation” or perhaps 
“some-other-adjective delegation” - a third party view of a situation stated by 
one party (the parent zone file) and refuted by another party (the collective 
landing points of the referral).  The latter parenthesized comment is meant to 
include IP addresses that are not actively hosting something answering on port 
53 - in addition to nameservers experiencing “out of bailiwick” queries.

From: DNSOP  on behalf of Paul Hoffman 

Date: Saturday, June 17, 2023 at 5:00 PM
To: dnsop 
Subject: [Ext] [DNSOP] Coming soon: WG interim meeting on the definition of 
"lame delegation"

Greetings again. A bunch of you have opinions in this area. In advance of our 
WG interim meeting on Tuesday, it would be grand if people with opinions would 
review those opinions and review the threads on the list where other peoples' 
opinions were expressed. This will make our time together in the interim 
meeting more valuable.

FWIW, I'm glad I'm not the one who will be deciding consensus here; the chairs 
will be.

--Paul Hoffman

Begin forwarded message:

From: IESG Secretary 
Subject: [Ext] [DNSOP] Domain Name System Operations (dnsop) WG Virtual 
Meeting: 2023-06-20
Date: June 6, 2023 at 7:19:23 AM PDT
To: "IETF-Announce" 
Cc: dnsop@ietf.org

The Domain Name System Operations (dnsop) WG will hold a virtual interim
meeting on 2023-06-20 from 20:00 to 21:00 Europe/Amsterdam (18:00 to 19:00
UTC).

Agenda:
## Agenda

### Administrivia

* Agenda Bashing, Blue Sheets, etc,  5 min


### Current Working Group Business

*   DNS Terminology and definition "lame delegation"
   - https://datatracker.ietf.org/doc/draft-ietf-dnsop-rfc8499bis/
   - WG Chairs, Paul Hoffman and Kazunori Fujiwara, 55 min
   - Chairs Action:


Information about remote participation:
https://meetings.conf.meetecho.com/interim/?short=7e86a1ea-71f7-40c0-8c34-eb16a1a57a6e



--
A calendar subscription for all dnsop meetings is available at
https://datatracker.ietf.org/meeting/upcoming.ics?show=dnsop



Re: [DNSOP] [Ext] Re: Coming soon: WG interim meeting on the definition of "lame delegation"

2023-06-20 Thread Edward Lewis
From: DNSOP  on behalf of Vladimír Čunát 

Date: Tuesday, June 20, 2023 at 6:01 AM
To: "dnsop@ietf.org" 
Subject: [Ext] Re: [DNSOP] Coming soon: WG interim meeting on the definition of 
"lame delegation"

>On 19/06/2023 17.00, Masataka Ohta wrote:
>>I can't see any problem with "lame" delegation than a "secondary"
or "slave" server, because of the history of racial discrimination
in US.

>Honestly, I'm personally still failing to understand the problem of using a 
>slightly offending word when referring to a machine (e.g. "slave" or "lame").

I sympathize, but when communicating, there are three elements - the sender, 
the medium, and the recipient.  Even if the sender doesn’t see a term as 
problematic, the recipient might, and that can hamper the communication.  As 
the word about the technology with which we surround ourselves spills out into 
other communities, it’s good to shake off our jargon so that others may 
understand, accept, listen, and learn what is necessary.

The “old labels” may have been arbitrarily applied and, unless you’ve walked 
the path for a long time, the terms are not accurately descriptive.  In this 
case, that there are multiple meanings to “lame delegation” tells me that it is
time to have a more precise labelling, or we will continue to confuse
ourselves.  In an earlier message, what I experienced as “lame” was the
situation where the query seen by a server was one that the server had no
answer for.  “Lame” isn’t all that descriptive, whether or not some may see it
as an insulting term.  (I’ll leave my soft-pedaled suggestions for the other
message. 😉 )








Re: [DNSOP] [Ext] Re: rfc8499bis: lame

2023-06-12 Thread Edward Lewis
On 6/8/23, 11:23 PM, "DNSOP on behalf of Bob Bownes -Seiri" 
 wrote:

>I would posit that the potential to view the word as offensive has increased 
>as language usage has changed in the intervening years since it was first used 
>in this context. 

Researching a now-abandoned draft on the origin of domain names, I struggled to 
find a dictionary definition of 'resolve' that matched what we now call DNS
resolution.  In the early IENs and RFCs (Internet Engineering Notes and
Requests for Comments), the first uses of 'resolve' were in the context of a group of
people deciding on a path forward.  As in "the committee resolved to 
investigate..."

It wasn't until I asked the authors of one of the old RFCs (I now forget which
one) that I learned where the term 'resolve' began to mean mapping a name to a
network address.  The answer was 'from the field of compiler design.'  As in resolving
a variable name to a memory location.  In hindsight, this was obvious but 
trying to go from dictionary definitions and common use then and now, I didn't 
see the link.

As far as 'lame' - besides the term sliding from being an objective assessment 
to a derogatory term as time goes by, its meaning in the DNS context is not 
clear.   The use I am familiar with covers a server's response to a query for a 
name for which the server has absolutely no information, as opposed to looking 
at a delegation set which has 'issues.'  Both of those deserve terms and 
different ones as they are different situations.

I'm not sure the case of a server receiving a query for which it has no 
information is very important anymore.  Servers now will return either SERVFAIL 
or REFUSED for it, and the operationally impacting situation I was working on has 
been mitigated by this.  However, I have seen a situation when earnest traffic 
(not DDoS flood) has been sent to a server that was not (yet) configured for a 
zone.  But this happened once and was taken care of locally once the sender of 
the traffic realized what they were doing (or hadn't done). Perhaps the use of 
'lame' for this can be left in the dustbin, like so many other objects we no 
longer use in life.  Perhaps the queries are just 'out of bailiwick' queries 
relative to the server.

As far as assessing the health of a parent to child delegation, I'll leave 
terminology about that to those who work that area.  Broken, damaged, etc., but 
I bet that just about any descriptive term today may drift into other meanings 
as languages evolve.




Re: [DNSOP] Delegation acceptance checks [was: Re: [Ext] WGLC rfc8499bis one week extension for lame delegation definition]

2023-05-12 Thread Edward Lewis
On 5/11/23, 7:30 PM, "DNSOP on behalf of Mark Andrews"  wrote:
>
>It’s not a challenge to track what is lame.  It’s dead simple.  You just 
> have to look.  Getting
>it addressed is the challenge.

Speaking from experience (which means I'm about to launch an amplification 
attack here: taking a short message and adding in stories from the past few 
decades in this area), this is very true.  Using the analogy of observing 
symptoms, diagnosing cause, prescribing a fix, and following through, it's easy 
to tell when someone coughs - but, if the cause happens to be from a personal 
habit, very difficult to mitigate.

In following this discussion, admitting that I have no idea what "rfc8499bis" 
is (not a title, not a document file name, not a link), I ought to throw in 
this question: "Why do broken delegations (lame, unreachable address, etc.) 
matter?"  So in some sense I'm committing an IETF sin.

In my experience with lameness, the problem was rather specific.  In that era, 
a server, given a question it could not answer would refer the querier to the 
root zone - perhaps as some sort of joke initially.  The trouble was that some 
resolvers were not in on the joke (and I bet there was no technical document 
specifying what an "upwards referral" signified) but it turned out to be pretty 
easy to fix the problematic resolvers.  (It was one major, proprietary source 
who, surprisingly to many in the open source fandom, actually fixed their bug.) 
 I'm not sure my work in quantifying the amount of lameness ever mattered as 
the eradication process undertaken seemed to be overcome by the fix of the 
resolvers (nevertheless, it was pretty interesting research to conduct).

(This is the central thought of this rant:)
It's true that if a registrant misconfigures their delegation or servers, their 
service will suffer.  But does this have fallout for anyone other than their 
service users?  Other than researchers who poke into this stuff (like me)?  
Does it impact the registry delegating to the registrant?

From other experience, I once dove into an incident (details I can't divulge).  
One of the things I identified was the source of the queries for a particular 
DNSSEC-related data set.  In the top-ten queries was a "labs" machine - a 
research organization was "pounding" the servers for this data set to the 
extent they were a noticeable portion of the incoming traffic.  At times, 
research is not measuring traffic but becoming the traffic.

And from another experience, I once had to deal with a customer who had a zone 
delegated to the servers that we operated but neglected to tell us.  (The very 
picture of lameness, completely unrelated to my earlier lameness 
quantification.)  Our monitors did not know the incoming queries were headed 
for one of our current customer's zone as we didn't know the zone was theirs.  
We thought it was simply DDoS-related traffic.  A factor here is that the 
operations I am talking about were not a TLD of any kind (hence the last label 
was not tell-tale).

And yet another experience, I've dealt with situations where a major change was 
being proposed but those that needed to play along had absolutely no 
relationship with us.  The Internet encourages people to plug in and play 
without being subject to remote monitoring.  At times that is very good.  At 
times that is very frustrating.  What I learned from that was no one can set 
expectations on what someone else will do on the Internet.  One can't expect 
protocol conformance from an entity with which you have no relationship, but 
you need to be prepared to deal with whatever comes (a take on "liberal accept, 
conservative send" - that adage attributed to Jon Postel).

It's fine to quantify situations.  It's fine to launch campaigns to improve the 
"health&safety" of the Internet and fine to measure the impact of the 
campaigns.  But you can never expect to see "success".  This is really 
frustrating when changes (including clarifications) are sorely needed to 
improve the state of operations across the global public Internet.




Re: [DNSOP] [Ext] WGLC rfc8499bis one week extension for lame delegation definition

2023-05-05 Thread Edward Lewis
On 5/4/23, 5:08 AM, "DNSOP on behalf of Mark Delany"  wrote:
>
>I have one last question. Regardless of whether we agree precisely on what 
> "lame" means,
>what is the call to action when a zone or its name servers are declared 
> lame?
>
>And how is that different from any other form of miscreant auth behaviour 
> such as
>inconsistency?

At the time when I was working on lame delegations, I had a specific purpose - 
identify where in the name tree "upwards referrals" were being sent.  At the 
time, there was some importance to this, I presented at a number of conferences 
and was dubbed "Mr. Lame" for a while.  But the importance of this faded 
quickly as the buggy implementations mishandling upwards referrals were fixed.  
That's about it - and my 15 minutes of lame fame.

>I mean if "lame" is a precious historical term that warrants considered 
> clarification,
>surely it has a very specific value that we can all act on, right? So what 
> is that
>very specific value?

I'll agree that it seems odd that the term "lame delegation" is getting a lot 
of attention now.  In the scheme of things, it's just one more arcane element 
in the DNS landscape, a landscape littered by misnomers, archaic-references, 
multiple-meaning terms, and things that aren't "things" anymore.

I do think that consistency is important so long as there are old RFC documents 
and other materials laying around.  If a term takes on a new meaning that is 
fine, but a reader trawling through the Internet Engineering Notes or RFCs with 
only 1-, 2-, or 3-digit numbers ought to be able to make sense of the old words. 
 It's not the sacredness of the old terms, but the need to still be able to 
read the documents.  Outside of that, I'm a bit surprised that I'm bothering to 
spend any time typing about lameness anymore.



Re: [DNSOP] [Ext] WGLC rfc8499bis one week extension for lame delegation definition

2023-05-03 Thread Edward Lewis
On 5/1/23, 12:43 PM, "DNSOP on behalf of Wessels, Duane" 
 
wrote:

>My preferred definition is the one originally given by Paul Vixie, amended 
> by myself, and further amended by Peter Thomassen:
>
>A lame delegation is said to exist when one or more authoritative
>servers designated by the delegating NS rrset or by the child's apex NS
>rrset answers non-authoritatively for a zone.

The trouble I have with this definition is that servers don't "answer ... for a 
zone", they answer specific queries.

Plus, the adjective "authoritative" is redundant, as "designated by the
delegating NS rrset or by the child's apex NS rrset" includes all authoritative
servers (and then some, if you don’t count a parent NS name absent from the
child NS set as authoritative).

And, as DNS data is constantly changing, what's in or out of an NS set or 
authoritatively answered may change from moment to moment (so I add 'assumed' 
below):

A lame delegation is said to exist when a server assumed (by the querier) to be 
authoritative for a zone replies non-authoritatively for {any|all} data within 
the zone.

1) Answering authoritatively means that the answer section matches the query 
and the AUTHORITATIVE ANSWER bit is properly set - this ought to be in its own 
definition.

2) A server may be assumed to be authoritative for a zone if the server is 
listed in a current NS set for the zone, whether that set is published by the 
delegating zone at a cut point or by the zone itself at its apex. This also 
should be a separate definition. ...The undefined term in that is "current" - 
meaning - a NS set that is still within the TTL upon arrival...

3) {any|all} - an open question: can a server be "partially lame"?
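
To make the test implied by this definition concrete, a minimal sketch in
Python, assuming the dnspython library; it returns None on a timeout because,
as argued elsewhere in these threads, a timeout supports no conclusion about
the remote end:

    import dns.exception
    import dns.flags
    import dns.message
    import dns.query

    def answers_authoritatively(server_ip, zone):
        # Ask a server assumed (from a current NS set) to be authoritative,
        # querying the zone's SOA as a representative name, then check the
        # AUTHORITATIVE ANSWER bit in the reply.
        query = dns.message.make_query(zone, "SOA")
        try:
            response = dns.query.udp(query, server_ip, timeout=3)
        except dns.exception.Timeout:
            return None  # no response, no conclusion
        # Simplified per definition 1: a fuller check would also confirm
        # that the answer section matches the query.
        return bool(response.flags & dns.flags.AA)

A False result from a server listed in a current NS set is the lame case of
the definition above.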




Re: [DNSOP] [Ext] WGLC rfc8499bis one week extension for lame delegation definition

2023-05-03 Thread Edward Lewis
On 5/1/23, 4:25 PM, "DNSOP on behalf of Mark Delany"  wrote:

>On 01May23, John Kristoff apparently wrote:
>> (usually due to a bad configuration)
>
>Was any "lame" situation defined which wasn't the result of a bad 
> configuration?

The difference between observing a symptom and diagnosing a cause is great.  I 
say this to caution against tying the "why it is" with "what it is."

(E.g., a lame delegation may have been set up to test software's reaction to a 
lame delegation, like seeing zone data mis-signed for a DNSSEC test.)

Further, prescribing a remedy, even when a symptom is identified and a 
diagnosis nailing the cause, can still be tricky.

In defining what the term "lame delegation" means, stick to the situation as 
observed, without guessing why or the fix.



Re: [DNSOP] [Ext] WGLC rfc8499bis one week extension for lame delegation definition

2023-05-03 Thread Edward Lewis
On 5/1/23, 12:58 PM, "DNSOP on behalf of John Kristoff"  wrote:

On Mon, 1 May 2023 16:09:23 +
Paul Hoffman  wrote:

> It would be grand if a bunch more people would speak up on this
> thread.

I'm not particularly satisfied with the requirement that there must be
a response to meet the definition, but that seems to be the consensus
even if most seem to agree it is imperfect.  I won't derail.  Until
someone comes up with better terminology, I'm likely still going to
refer to all those many cases we see in operation (usually due to a bad
configuration) as a form of lame delegation when a delegation is
effectively broken. :-)

When there is a timeout situation, there can be no conclusion about the remote 
end's status.

It could be that the remote end is properly set up to answer for a zone, but 
queries to the server are dropped on the way there.  Or responses dropped on 
the way back.  Or that the timeout is simply too quick.  The timeout may have 
nothing at all to do with the remote end.




Re: [DNSOP] [Ext] Meaning of lame delegation

2023-04-18 Thread Edward Lewis
On 4/17/23, 5:18 PM, "DNSOP on behalf of Wes Hardaker"  wrote:

>I'm not saying that some people don't understand it.  It's just a weird
>english choice that we're sticking with because of history.  ...

There are lots of "weird English choices" in play.  Consistency is most 
important, especially with documents explaining how any word is used.  Even if 
operators don't go reading RFCs...someone else (code developers) will, and that 
someone will benefit from consistency.



Re: [DNSOP] [Ext] Meaning of lame delegation

2023-04-17 Thread Edward Lewis
On 4/3/23, 4:02 PM, "DNSOP on behalf of Wessels, Duane"  wrote:
>
>(1) NS.EXAMPLE.ORG resolves to an IP address.  Queries to the IP address 
> result in a REFUSED, SERVFAIL, upward referral, or some other indication the 
> name server is not configured to serve the zone.
>
>(2) NS.EXAMPLE.ORG resolves to an IP address.  Queries to the IP address 
> do not elicit a response (e.g., timeout).
>
>(3) NS.EXAMPLE.ORG does not resolve to an IP address, so there is nowhere 
> to send a query.

In 2003+/-1 I worked on a project to identify lame delegations in response to a 
then bug-in-a-popular-resolver that followed upwards referrals to the point 
that it would no longer answer any other queries.  "Lame" meant that the 
nameserver indicated in an NS resource record answered (i.e., responded) 
explicitly that it was not authoritative for the subzone; at the time, this was 
indicated only by the upwards referral.  The SERVFAIL and REFUSED responses 
came later in history, partly as a response to amplification concerns.

While this may not be the fully correct answer, I was taught that a lame 
delegation is "singular" (i.e., not applied to a set) and "active".  After all, 
the problem I was to help solve was the upwards referral issue (by eliminating 
lame delegations in a portion of the DNS tree), which may bias what I was 
taught.

Singular - in the sense that the label "lame" referred to individual instances 
named in a NS resource record set and not the resource record set in its 
entirety.

Active - in the sense that lameness was judged by a response, not a timeout.  
Timeouts were ruled out (as they didn't suffer the upwards referral issue I 
was supposed to address).

So, only #1 would qualify as lame under that usage.  #1, #2, and #3 would all
make a delegation "broken" (an unofficial term)...



Re: [DNSOP] [Ext] Working Group Last Call for aft-ietf-dnsop-dnssec-bcp

2022-08-02 Thread Edward Lewis
On 8/2/22, 11:02 AM, "DNSOP on behalf of Paul Hoffman"  wrote:
>I would rather mention NSEC/NSEC3 so the reader gets an idea for the mechanism 
>in RFC 8198. I left off NSEC3 because I thought that basically all use of 
>NSEC3 was with opt-out, but if I'm wrong, I could put it in the text.

Just as a data point, there are two gTLDs with over 1 million delegations each
that use NSEC3 without opt-out.

I have no data on ccTLDs, nor lower in the namespace.



Re: [DNSOP] The DO bit - was Re: [Ext] Re: signing parent-side NS (was: Re: Updating RFC 7344 for cross-NS consistency)

2022-07-30 Thread Edward Lewis
On 7/29/22, 10:49 AM, "DNSOP on behalf of Paul Wouters"  wrote:
> I would have expected (and have taught) that this was by design to not 
> disrupt systems with new data unless we knew they were ready for it. I didn’t 
> realize we first tried to do it without that 😀

This response made me think a bit - besides the early DNSSEC issue, there have 
been other times when we, collectively, did something that should have been a 
no-brainer but were surprised.  After the root zone KSK rollover, during the 
period where the old key appeared as revoked, there was a concerning rise of 
queries.  Once the revoking record was pulled, the queries abated [lessened].  
Note: I made sure my memory of this coincided with Wes H and Duane W.  As the 
situation passed, I don't recall any published study definitively diagnosing 
the cause although some work may have led to a likely culprit.  I'll put a plug 
here for this paper: https://www.isi.edu/~hardaker/papers/2019-10-ksk-roll.pdf.

I don't think is possible to achieve the point where any change can be made 
avoiding unpredicted repercussions [responses].  The operational state of the 
system has grown much too complex.  Obscure code paths, old versions still 
running, other home-crafted code all contributes to the randomness.  We can 
only hope to contain operational impacts and have good roll back plans in place.



[DNSOP] The DO bit - was Re: [Ext] Re: signing parent-side NS (was: Re: Updating RFC 7344 for cross-NS consistency)

2022-07-29 Thread Edward Lewis
On 7/29/22, 3:53 AM, "Petr Špaček"  wrote:
>By any chance, do you remember in what iteration the DO=1 in query was 
>introduced? I wonder what sort of disruption was anticipated/feared.
>
>In hindsight it seems that the DO=1 requirement for "new" behavior (like, 
>say, adding RRSIG to delegations sent from the parent zone) could be 
> enough.

There was a specific incident, I don't recall the year, but it was in a later 
iteration.

DNSSEC's code development was carried out by a small contractor to the US 
government, physically located in a farm-like setting about an hour's drive 
from any city (providing a sense of isolation).  With the company's willingness 
to take on technical risk, DNSSEC had progressed to the point where we decided 
to put it into production, signing our corporate zone.

Everything seemed to be fine.  No one was able to verify the signatures as 
there were no trust anchor points set, but the records would be included in 
responses.

On the third(*) day, one of the principal investigators (project leads) 
realized she hadn't been getting mail from the government contracting offices 
(who were paying for DNSSEC and other projects).  It seemed no other principal 
investigator had received mail either.  A call went to the contracting offices, 
it was discovered that the government's name servers were rejecting our 
DNSSEC-signed responses.  The mail they needed to send us was "dropping on the 
floor" at their end.

All involved were highly sympathetic to the situation, so we initially rolled 
back, mail resumed, and the DO bit was invented (and eventually documented in 
https://www.rfc-editor.org/rfc/rfc3225.html).

* Well, I recall "3" being the number of days.  It was definitely between 1 and 
5...



Re: [DNSOP] [Ext] Re: signing parent-side NS (was: Re: Updating RFC 7344 for cross-NS consistency)

2022-07-28 Thread Edward Lewis
On 7/26/22, 3:05 PM, "DNSOP on behalf of Petr Špaček"  wrote:

>Interesting history lesson, thank you.
>Can you elaborate on
> > therefore only one can be signed.
>please?

>What is the reasoning behind it?

There were a few iterations in the development of DNSSEC.  RFC 4033-4035 are 
the third iteration.  Part of the "reason" is that the DNSSEC definition 
evolved over a period of years.

In the first two iterations, the rules for signing (or not) the cut points were 
set.  NS and glue, carrying information "reported" to the parent were not 
"from" the parent, hence not signed.  The NSEC (and later NSEC3) record did 
indicate the presence/absence of a zone cut as the presence of the cut was 
determined by the parent.  This design was deemed to be the most 
backwards-compatible approach (anticipating it would be a very long road to 
adoption).  FWIW, these iterations toyed with having a key set from the child 
up or something from the parent sent down, none it worked.

The DS record was added in the third iteration (though maybe the count differs
for others).  Although it contains a hash of what is reported to it by the
child, it is signed.  This is in some sense historically inconsistent.  It was
felt that the signature here was needed; there had to be some signed statement from 
the parent to an iterator as it left for the child.  Given the DS was "new" 
there was no backwards compatibility to be maintained, although having this 
record be authoritative above the cut (well, so was the NSEC/3) was new - yet 
only seen when "doing" DNSSEC.

There was never any sympathy for signing the parent-side NS set at the time.  
It wouldn't add to the security goals of DNSSEC and could potentially lead to a
confusing case - when the NS sets are out of alignment, which happens when name
servers are changed (or when someone makes a mistake).  The decision to leave
the parent-side NS set unsigned was never completely accepted; there were many
thoughts on "fixing" the delegation in the DNS.  But doing so was thought to be
too disruptive to the current running system.
 



Re: [DNSOP] [Ext] Re: draft-ietf-dnsop-algorithm-update

2019-04-15 Thread Edward Lewis
A few follow ups:

On 4/14/19, 22:35, "DNSOP on behalf of Mark Andrews"  wrote:

>You don’t publish DS records (or trust anchors) for a algorithm until the 
>incoherent state is resolved (incremental signing with the new algorithm is 
>complete).

While that makes sense, the protocol can't (not simply doesn't) forbid it.  The 
publisher of the DS resource record set may be a different entity than the 
publisher of the corresponding DNSKEY resource record set.  Because of the 
possibility of misalignment, the protocol has to be specific in order to be 
robust.

>You can only check if all records are signed with a given algorithm by 
>performing a transfer of a zone and analysing that.  There is no way to do it 
>with individual queries.

The historic error involved a resolver, upon receipt of a response, declaring a 
data set invalid when the set of RRSIG resource records did not cover all the 
DNSSEC security algorithms that the rules for zone signing specified, as 
opposed to validating the data set in question because there were sufficient 
records to build a secure chain.

>As for the original question, if all the DNSKEYs for a algorithm are revoked I 
>would only be signing the DNSKEY RRset with that algorithm.

This makes complete sense, but is not in-line with the letter of the protocol's 
rules.  That's the issue.

The consequence of following the protocol's current rules is a lot of 
deadweight.  Namely, unusable RRSIG resource records sent in each reply of 
authoritative data just to include the DNSSEC security algorithm.  The 
signatures need not make mathematic sense - as no one would need to validate 
them - with one exception. Where ever there is a division of key 
responsibilities such as having one organization manage the KSK and a different 
manage the ZSK, a ZSK may be "forced" to exist by rule and operational 
configuration.

(Removed the remainder of the thread history...)





Re: [DNSOP] [Ext] Re: draft-ietf-dnsop-algorithm-update

2019-04-12 Thread Edward Lewis
I've been inactive a long time, but someone alerted me to this message.
(Apologies if what's below looks like it's from a ranting lunatic.  But it is.)

On 4/12/19, 11:31, "DNSOP on behalf of Mark Andrews"  wrote:

Well given that the actual rule is all the algorithms listed in the DS RRset
rather than DNSKEY RRset and is designed to ensure that there is always a
signature present for each of the algorithms that could be used to
declare that the child zone is treated as secure, the answer is NO.

Mark

Looking at "Protocol Modifications for the DNS Security Extensions" (aka RFC 
4035):
...
2.  Zone Signing
...
2.2.  Including RRSIG RRs in a Zone
...
   There MUST be an RRSIG for each RRset using at least one DNSKEY of
   each algorithm in the zone apex DNSKEY RRset.  The apex DNSKEY RRset
   itself MUST be signed by each algorithm appearing in the DS RRset
   located at the delegating parent (if any).

From this I believe Mark's words are incorrect.  What I read is that the 
determining factor is what is in a zone's DNSKEY resource record set and that 
the set must be signed by a key of each of the algorithms in the DS set.  This 
allows the child administrator to have DNSKEY records for DNSSEC security 
algorithms other than those represented in the parent's DS resource record set 
and does accommodate other signatures covering the DNSKEY record set.
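
Reading the rule that way, a minimal sketch in Python, treating RRsets simply
as sets of DNSSEC algorithm numbers (the function and parameter names are
illustrative, not from the RFC):

    def signing_rule_violations(apex_dnskey_algs, ds_algs,
                                rrsig_algs_by_rrset, dnskey_rrsig_algs):
        problems = []
        # First sentence: every RRset needs an RRSIG by at least one DNSKEY
        # of each algorithm in the zone apex DNSKEY RRset.
        for rrset, algs in rrsig_algs_by_rrset.items():
            for alg in sorted(apex_dnskey_algs - algs):
                problems.append("%s lacks an RRSIG for algorithm %d" % (rrset, alg))
        # Second sentence: the apex DNSKEY RRset itself must be signed by
        # each algorithm appearing in the parent's DS RRset.
        for alg in sorted(ds_algs - dnskey_rrsig_algs):
            problems.append("DNSKEY RRset not signed by DS algorithm %d" % alg)
        return problems

    # With apex DNSKEY algorithms {8, 13} and a parent DS set listing only
    # {8}: every RRset must carry RRSIGs for both 8 and 13, while the DNSKEY
    # RRset must (at minimum) be signed with 8 - which is what lets a child
    # hold keys for algorithms the parent does not represent.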

Historically there's been great confusion over this passage, partly because the 
context is usually missed.  This rule governs the actions of signing and 
serving a zone, not validating a zone.  The purpose of the rule is to allow a 
validator to "know" deterministically what signatures ought to be present for 
data.  This is needed to mitigate a downgrade attack enacted by simply 
filtering signature records.  The validator need not check to make sure all 
"required" signatures are available, it should be content with anything that 
works.  I.e., be aggressive in declaring success in the validation process to 
combat the brittleness introduced by DNSSEC.  (This is intentional, not an 
afterthought.)

Imagine one is descending the tree, and determines that the parent's zone data 
is signed by DNSSEC.  Taking the next step downward toward the desired data, 
the DS resource record set indicates how the child is secured.  Proven absence 
of a DS resource record set means the parent considers the child is unsigned 
(whether the child is or not).

One ought to imagine that when these rules were written, it was assumed 
validators taking these steps might not "know" all of the DNSSEC security 
algorithms.  I.e., a validator might know '8' but not '13'.  If a parent used 
'8' to sign, and the DS resource record set indicated that only '13' was used 
by the child (the DS resource record set itself signed by '8'), then the 
validator would treat the child as unsigned.  But if there were an '8' in 
there, then the child's DNSKEY set would have to be signed with '8' so the 
chain would continue.

It's possible that a child may have DNSKEY resource records for DNSSEC security 
algorithms not supported by the parent operator.  A child operator may have 
arranged to have trust anchors in all relying validators out of band to make 
this "still work."  This is the reason the determination is from the zone's 
apex DNSKEY resource record set and not the parent's DS resource record set.

Remember a child is "delegated" responsibility for a domain.  The parent only 
"gives it life".  What a child zone says about itself rules, local policy and 
all that.  A child may be under a non-DNSSEC parent and still practice DNSSEC 
with the validators it has out-of-band contact with.

> On 13 Apr 2019, at 1:05 am, Michael StJohns  
wrote:
> 
> Hi -
> 
> I had someone ask me (last night!!) whether or not the "must sign each 
RRSet with all of the algorithms in the DNSKEY RRSet" rule applies if the only 
key with algorithm A in the RRSet has the revoke bit set.  A question I had 
never previously considered.
> 
> Given that you can't trace trust through that revoked key, and any RRSig 
originated by that key is just extraneous bits, I came to three conclusions:  
1) A key must not be counted for the purposes of the rule if it has the 
(RFC5011) revoke bit set, (2) the only RRSigs created by a revoked key are over 
the DNSKEY RRSet and 3) it's possible/probable that interpretations could 
differ.
> 
> I tagged this email with the algorithm update ID/RFC candidate because 
about the only time you're going to see a revoked singleton key of a given 
algorithm is when you're transitioning the algorithms for the zone.
> 
> I hesitate to ask - and apologize for asking given the late date for this 
document, but should the statements (1) and (2) above or something similar be 
included in this document for completeness?
> 
> Alternatively, what breaks if publishers omit the extraneous signatures 
just because?
> 

Re: [DNSOP] [Ext] Re: Last Call: (DNS Terminology) to Best Current Practice

2018-08-14 Thread Edward Lewis
Here we go with a thread on the set of Domain Names being a superset of host 
names again. ;)

On 8/14/18, 09:09, "DNSOP on behalf of Tony Finch"  wrote:

Viktor Dukhovni  wrote:
>
> Indeed in a non-public network, I'm free to provision a
> ".1" TLD, and even create hosts as sub-domains of this name:

This would break a (non-normative) promise in RFC 1123

   However, a valid host name can never
   have the dotted-decimal form #.#.#.#, since at least the
   highest-level component label will be alphabetic.

Tony.
-- 
f.anthony.n.finch
http://dotat.at/





Re: [DNSOP] [Ext] Re: Comments on draft-wessels-dns-zone-digest-02

2018-08-14 Thread Edward Lewis
My reason for replying to this thread is to say that “we have other solutions in 
place for this, why one more?”

On 8/13/18, 16:43, "DNSOP on behalf of Brian Dickson"  wrote:

>While it is easy to misunderstand what Duane is referencing, or perhaps there 
>was some minimization on his part as well, there is a weakness caused by the 
>unsigned nature of delegations, whereby not protecting (e.g. via zonemd) a 
>publication point against a host of vulnerabilities, by protecting the data 
>itself, creates a very attractive target that lets an adversary scale their 
>attack very effectively.

I think the fears are greatly exaggerated.  We have much experience with bad 
glue, usually as a result of fat-fingering and poor decommissioning practices.  
So there’s some real-world information to work on.

When glue is fat-fingered, traffic is either reduced or not received at the 
authoritative server.  As fat fingering is the culprit, the zone administrator 
(usually well connected to the authoritative servers) will know to make an edit 
somewhere.

I’ve handled one very interesting case of malicious cache poisoning that proved 
to be due to poor decommissioning. A zone administrator noted that their 
mailhost address was “poisoned” repeatedly at an open recursive service.  This 
service permitted remote purging of entries, so even I could go to the service, 
purge the record, execute a query, see the correct value appear and then 
replaced by the poisoned value.

The result of this “went to litigation” so I never got a documented final 
report.  (I’m omitting some details so as not to expose the case, despite this 
being about 6 years ago.)  But what I did uncover, via calls to the victim and 
other traces was this.  A sysadmin was fired from the store (from the phone 
call).  He then (because the cell number traced him in WhoIs to the next 
employer) went to a competitor and set up their DNS (WhoIs records).  He 
apparently knew that the victim had transitioned their DNS services from hoster 
A to hoster B but neglected to purge all of the glue records at the registrar 
(otherwise, the needle in the haystack would be seen).  This left the glue 
record still in the database of the TLD and he was able to have an NS record 
refer to the owner, thus getting the TLD to respond with the should-be-cruft 
glue.

Due to the failure of the recursive server algorithm (it was home grown) to 
follow at least the trustworthiness rules in “Clarifications to the DNS”, and 
without DNSSEC in place, the recursive service believed the stale glue over the 
authoritative answer for the glue.  After a query for the victim was seen, all 
that needed to happen was a query for the attacker’s new zone to swap out the 
glue value, the latter query could have been delivered via a cron entry.
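
For reference, a simplified sketch in Python of the trustworthiness ranking in
"Clarifications to the DNS Specification" (RFC 2181, section 5.4.1) that the
resolver failed to follow; the tiers here are condensed from the RFC, not
quoted from it:

    # Lower rank is more trustworthy; a cache choosing between two copies of
    # the same RRset should keep the more trusted one.
    TRUST_RANK = {
        "authoritative_answer": 1,  # answer section of an authoritative reply
        "referral_glue": 2,         # NS/address data learned from a referral
        "additional_data": 3,       # unsolicited additional-section data
    }

    def keep(current_source, incoming_source):
        # Replace the cached copy only when the incoming copy ranks at least
        # as trustworthy as what is already held.
        if TRUST_RANK[incoming_source] <= TRUST_RANK[current_source]:
            return "incoming"
        return "current"

    # keep("authoritative_answer", "referral_glue") -> "current": stale glue
    # should never displace an authoritative answer, the choice the
    # home-grown resolver in this story got backwards.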

The danger in this case was that mail traffic was misdirected from the victim 
to the attacker as a denial of service (impact on business operations).  The 
victim’s MX record’s RDATA domain name was the same as the malicious NS record 
mentioned before.

In that case, DNSSEC would have helped, a solution we already have but not in 
place.  Additionally, and this was a conversation I never got to have, the open 
recursive service’s implementation choice permitted the attack to happen (I do 
know it wasn’t a bug; I did get to speak to the lead engineer but never had the 
chance to educate them); once again, a solution is already there (better 
software!).

So, I can attest to glue being a weakness.  That’s not in question.

>Here's the gist of the problem, inherent in unsigned glue:

>IF (big if, with the how/when/where etc kept as a separate discussion) an 
>attacker manages to modify glue (for example, poisoning a resolver's cache for 
>glue info), the attacker has the opportunity to selectively return unmodified 
>glue, or to replace further glue data (and continue to be a DNS-MITM) and thus 
>both view queries, and if/when the queries cross to insecure delegations, 
>modify non-glue data. For example, if there is a TLD "foo", and the attacker 
>manages to poison the A record for one (or more) names of NS for "foo", the 
>attacker can act as a forwarder for most *.foo names, but then selectively 
>modify the A records for the NS for "bar.foo", and then for "blech.bar.foo", 
>until there is an insecure delegation, at which point the attacker can spoof 
>any RR type for any name below that zone cut. The attacker also has control 
>over TTLs of any/all spoofed records, modulo recursive resolver's TTL 
>ceiling/floor. The attacker can gain further information about the ongoing 
>success of attacks by TTL-based meta-monitoring (high TTL on delegation glue, 
>low TTL on sub-delegation glue, observe sub-delegation re-queries at the 
>spoofed delegation point.)

Once the tree falls to “insecure” land, all bets are off.  (And this is hard 
to determine.)

Re: [DNSOP] [Ext] Re: Comments on draft-wessels-dns-zone-digest-02

2018-08-13 Thread Edward Lewis
On 8/13/18, 13:35, "John R Levine"  wrote:


>Hey, I have a great idea.  We could make sure that the zone file received 
>matches the zone file sent by including a hash of the zone in a record in the 
>zone.  Whaddaya think?

In some sense, it's re-inventing the wheel.

>I realize you could refetch all the glue and check it but that's a lot more 
>work.

Some code already does that, in the sense that the glue may be needed for other 
queries.

If not, what happens if bad glue (meaning the address is not in use by the 
intended server) is included?  Either no response, lame server response, or a 
response with false data.  The first two are denials of service but the querier 
ought to (as in not guaranteed to) be able to find another source.  The latter 
may be a mistake (neglected decommissioning of a zone when service is 
transferred) or malicious (the use case usually in mind).  For the latter to 
"disrupt", the data would have to be correctly signed to get past DNSSEC 
validation.

What this keeps coming back to is - is this new invention giving us anything 
that DNSSEC doesn't already give us?  As in, if it seems that DNSSEC is needed 
to validate it, why not just validate the data we are after?



Re: [DNSOP] [Ext] Re: Comments on draft-wessels-dns-zone-digest-02

2018-08-13 Thread Edward Lewis
On 8/13/18, 11:50, "John Levine"  wrote:

>As we may have mentioned once or twice before in this discussion, it lets you 
>do zone transfers over insecure channels and batch verify the zone before 
>using it.

Yes, I see that.  As in no more argument is needed to convince me of that.

>but the obvious consumer is a DNS server.

Maybe, maybe not.  I've seen DNS used in turnkey ways.  Nevertheless, given the 
complexity of DNSSEC validation, a wise implementer should re-use the parts of 
a DNS server for this.

(I.e., there's a lot of wacky stuff out there.  See "DNS Zone Transfer Protocol 
(AXFR)", Definition of Terms section [RFC 5936, Sec. 1.1] definitions of 
"General-purpose DNS implementation" and "Turnkey DNS implementation.  I've run 
across a lot of weird stuff in my studies of responder behavior [shuddering to 
think of these beasts as servers ;) ].)

>it'd be nice to be able to check that the zone is correct and get notified of 
>failure

There are many existing tools for such a setup.  For one, use a VPN or in-band 
channel security, and/or make sure the zone file received matches the zone file 
sent.  (If you use AXFR, which can only run on TCP, make sure the first and 
last resource record are the SOA.  If you use RSYNC, use other filesystem 
checks - like file size for one.)
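
For the RSYNC case, the "zone file received matches the zone file sent" check
can be as simple as comparing digests computed on both ends; a minimal sketch
in Python, assuming the digest itself travels over a separately secured
channel:

    import hashlib

    def zone_file_digest(path):
        # Hash the received file in chunks and compare the result against
        # the digest the sender published.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # if zone_file_digest("example.org.zone") != published_digest: re-fetch.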

I think the use case is the "we'll put the zone on some third party server and 
let others [the second party] retrieve it" - that's where you would want an 
end-to-end "secure" setup.

This is where my "belts and suspenders" comment comes in: if whatever uses data 
sets from the zone as transferred will do DNSSEC validation on the sets, what's 
the need for a signed "checksum" of the whole zone?  And what if the relying 
party does not do validation?

In essence, what would do DNSSEC validation for the ZONEMD and not for data 
sets within the zone?



Re: [DNSOP] [Ext] Re: Comments on draft-wessels-dns-zone-digest-02

2018-08-13 Thread Edward Lewis
On 8/11/18, 10:44, "DNSOP on behalf of John Levine" wrote:

>The way that ZONEMD is defined in the draft, it's not very useful if the 
>ZONEMD record isn't signed.

That's my read too, which is why I question the incremental benefit over 
relying on DNSSEC while doing the query/response over port 53 "thing".  
Question, not doubt, that is.

What I'm struggling with is the applicability to other uses of the zone file.  
There too, for the consumer making use of the ZONEMD: if the record isn't 
signed, then it could be recomputed by the manager of the repository from which 
the zone file came.
implement DNSSEC.  'Course, one signature verification would be cheaper than 
"$lots" (hundreds, thousands, millions).
 


 



Re: [DNSOP] [Ext] Re: Comments on draft-wessels-dns-zone-digest-02

2018-08-10 Thread Edward Lewis
On 8/9/18, 20:24, "DNSOP on behalf of Paul Wouters"  wrote:

>The point was to allow redistribution and to not depend on a trusted source 

That, FWIW, is the heart of DNSSEC.  Source authentication and data integrity 
for data sets is the advertised goal (as well as provable non-existence) of the 
extensions.  The DNS is not a strict client-server protocol, there is no 
telling what path a resource record (set) would take from the zone 
administrator to the would-be validating receiver.

Originally the set of paths included authoritative servers, forwarding servers, 
recursive servers, iterative servers, and such down to the point where DNSSEC 
validation could no longer be done.  (Originally, CPU horsepower and reliable 
libraries weren't assumed to me on end devices, hence the stub resolvers aren't 
included.  No reason they couldn't, no reason applications can't validate, 
except for performance/trust anchor management concerns.)

Now we are envisioning the transfer of zones in bulk, not just single datasets, 
via non-port 53 means.  But is that any different?  Does it really matter 
whether RSYNC of a zone was used to get a dataset from A to B?

There is a concern I can see.  When a server is loaded from disk (secondary 
storage), it does not validate the DNSSEC records to save time.  There's a risk 
that a maliciously inserted zone version could be loaded and served, possible if the 
channel security is defeated.  But, if that happens, validating recipients of 
data sets will fail the data sets that are altered.  While this could be a 
denial of service for the server, DNS provides fallback to other servers, so 
relying parties are just as protected as they ordinarily are (including, no 
DNSSEC, no protection...).

How realistic is it that a forged zone could defeat all of the channel security 
for a zone?  How likely would it be for someone to load a false zone on all the 
places a recursive server would look for it?  Answering that would be a crucial 
step in deciding whether to add a zone hash mechanism.



[DNSOP] Comments on draft-wessels-dns-zone-digest-02

2018-08-09 Thread Edward Lewis
FWIW, this message was spurred by this comic strip [yes, today as I write]: 
http://dilbert.com/strip/2018-08-09.

"Will the time taken to generate and verify this record add to the security of 
a zone transfer?"

I understand that there is no protection for cut point or glue records now, nor 
any guarantee for the occluded records, and there's a desire to cover them.  It 
would be great to have the whole zone (as a data structure) be subject to 
source integrity and authenticity protections.
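
For a sense of what the draft is after, a toy sketch in Python of a whole-zone
digest - deliberately not the draft's exact scheme, which canonicalizes the
records and excludes the ZONEMD record itself:

    import hashlib

    def toy_zone_digest(records):
        # "records" is an iterable of the zone's records in presentation
        # format; sorting makes the digest independent of transfer order.
        h = hashlib.sha384()
        for line in sorted(records):
            h.update(line.encode("utf-8"))
            h.update(b"\n")
        return h.hexdigest()

Any change to any record - cut point, glue, or occluded - changes the digest,
which is the whole-zone property that per-set DNSSEC signatures do not provide.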

But there are already mechanisms for this at the data set level.  (This is a 
"belts and suspenders" style argument.)  What if -err- when, in a zone's 
distribution, the glue records are either forged or simply fat-fingered?  
That's covered, in a way that is more efficient - in a lazy evaluation way.  
Mangled glue that is never referenced need not be checked; when it is needed, 
there's backup in the authoritative version.  If all else fails, DNSSEC will 
flag the response as suspect.

I don't know if this is documented, but at one time, prototype authoritative 
servers would validate all the signatures in a zone upon load (before setting 
the AD bit).  This was discarded as it made zone loading (and reloading) take 
f-o-r-e-v-e-r.  (I recall this mostly because I was on the losing end of the 
argument.)  Today, we assume the server can set the AD as it can trust what it 
gets from disk or from AXFR {which had better be done with channel security!}.

One concern is what or who makes the decision to enable ZONEMD for a zone.  We 
are marching toward more automation in NOCs, so this will be a buried 
parameter.  What happens if a zone that enabled ZONEMD at the beginning grows 
astronomically over time?  (Similarly, zones have had to transition from NSEC 
to NSEC3 with opt-out.)  File this under "fear, uncertainty or doubt" but it 
stems from how I see the operations of the DNS evolving.

In general, the proposal is more or less fine.  I don't know if it is feasible 
(speed of execution).  I don't know that it adds to the safety of the system 
(DNSSEC already protects at the data set level in the same manner this protects 
at zone load time).  I don't know if the added configuration knob is justified 
by the benefit (forgotten settings, "trusted-keys"-mania?).

And ... returning to the comic strip ... "Your plan is dumb because it reminds 
me of something different that didn't work out." ;)  "History repeats."  And 
today I am wearing a blue shirt (but it's a Knot DNS shirt!).




Re: [DNSOP] [Ext] Re: zonemd/xhash versus nothing new

2018-08-01 Thread Edward Lewis
On 8/1/18, 07:29, "DNSOP on behalf of Tony Finch"  wrote:

>I was kind of assuming that the NSEC chain would include the glue -
>obviously delegations and glue in opt-out intervals are not protected at
>all.

The reason cut point information is not signed is that the copies are not 
authoritative, that is, the delegating zone is not the source of the records, 
they are merely hints.

The reason this wasn't seen as an oversight is the thought that if bad glue 
were followed, the DNSSEC chain would not work (for the data set sought) so 
long as the private keys (involved) were private.

The reason there is no overall zone signature is that the goal was data (set) 
integrity, not zone transfer integrity.

If the issue is zone transfer integrity, a solution will need to go beyond what 
DNSSEC is defined to be now.  (Not saying this is an obstacle, pointing out 
that DNSSEC wasn't designed to accomplish that.)

FWIW, there's TSIG protection of AXFR messages (hop-by-hop), which isn't 
DNSSEC, and then other operational practices, as examples of other tools.  (That is 
obvious to many, just including for completeness.)



Re: [DNSOP] [Ext] Re: Ghost of a zone signature effort from the long ago days...

2018-08-01 Thread Edward Lewis
On 7/31/18, 15:25, "DNSOP on behalf of Wessels, Duane" wrote:

>Olafur wrote a little about this a couple weeks ago.  He said:

Oh, okay, never mind me then.




[DNSOP] Ghost of a zone signature effort from the long ago days...

2018-07-31 Thread Edward Lewis
I hear there are proposals to sign the entire contents of zones. zonemd/xhash 
in some subject lines.

(Forgive me if SIG(AXFR) was mentioned before...)

"Domain Name System Security Extensions", a'la RFC 2065, section 4.1.3 Zone 
Transfer (AXFR) SIG:

"However, to efficiently
   assure the completeness and security of zone transfers, a SIG RR
   owned by the zone name must be created with a type covered of AXFR
   that covers all zone signed RRs in the zone and their zone SIGs but
   not the SIG AXFR itself."

"Domain Name System Security Extensions", a'la RFC 2535, Appendix B: Changes 
from RFC 2065:

"3. ...In addition, the SIG covering type AXFR has been
  eliminated..."

I wish I could recall why.  (Anyone else recall why this was dropped?  I recall 
realizing it was a fool's errand but not the reasons.)  Yes, today's network is 
different.

I would think, if there is concern that the glue records were mucked with and 
a validator were misdirected by malicious glue, the DS record would provide 
evidence of a redirection.  For unsigned delegations, this would be an 
incentive to sign, for non-validating resolvers, an incentive to validate.

Now, pushing for universal deployment of DNSSEC might be improper at this 
juncture.

The option is to develop, implement, and operate a way to "sign the contents of 
a zone."  (Especially considering the pushback on full DNSSEC deployment.)

History isn't always the guide to follow, but we tried this once and gave up.

(Note: no comment on the merits of zonemd/xhash, just throwing in some history.)



Re: [DNSOP] [Ext] Re: [Doh] [Driu] Resolverless DNS Side Meeting in Montreal

2018-07-11 Thread Edward Lewis
I caught wind of this in my DNSOP folder...(cutting down the reply a *little* 
bit)

On 7/11/18, 04:23, "DNSOP on behalf of Petr Špaček"  wrote:

>On 10.7.2018 20:57, Ryan Sleevi wrote:
>> On Tue, Jul 10, 2018 at 2:09 PM, Mike Bishop wrote:
>>> sufficient, but how fresh is sufficiently fresh?  And does DNSSEC
>>> provide that freshness guarantee?
>>
>> Right, these are questions that need to be answered, and not just with
>> abstract theoretical mitigations that don't or can't be deployed.
>
>Signatures in DNSSEC (i.e. RRSIG records) have validity period with
>1-second granularity so in theory it allows fine-tuning. Of course
>shorter validity means more cryptographic operations to update
>signatures etc.
>
>Taking into account that Cloudflare signs all records on the fly, it is
>clearly feasible for CDN to generate fresh signatures on every DNS request.

This reminds me of one of the dreams chased while designing DNSSEC in the 
1990's (the in-the-lab era).  The idea that the signature validity span could 
convey freshness of data, or other timeliness, came up and was kicked around.

The purpose of the signature validity data in the RRSIG records is to defend 
against replay of the signatures.  For that reason, for the first time in the 
evolution of the DNS protocol, we let wall-clock time (absolute time) into the 
protocol.  In the early history of protocol development, whether there was a 
universal clock at play was a big design choice, so the concept of clocks and 
time occupied our minds.

DNS has always had relative time: the SOA record timers, for instance, and the 
TTL.  These marked the passage of time (locally) and are not dependent on a 
coordinated clock.  (FWIW, Network Time Protocol wasn't a utility yet; NTP 
today makes this all seem like a tempest in a teapot.)  Until the RRSIG, and 
then TSIG, wall-clock time wasn't part of the protocol.
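
The two kinds of time look like this in practice - a minimal sketch in Python
of the distinction, nothing DNS-library specific:

    import time

    # Relative time: a cached RRset expires some interval after it arrived,
    # measured on the local clock only (the TTL counting down).
    def cache_entry_expired(arrival_timestamp, ttl_seconds):
        return time.time() > arrival_timestamp + ttl_seconds

    # Absolute time: an RRSIG is valid only between two wall-clock instants
    # (inception and expiration), which is what requires synchronized clocks.
    def rrsig_within_validity(inception, expiration):
        return inception <= time.time() <= expiration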

So absolute time in DNS started with defending against potential replay 
attacks.  Because of this, we needed wall-clock time; it was in the door with 
that, and there was no going back.

First we then tried things like "what if we want to sign data for Monday and 
different for Tuesday" - a sort of versioning.  This was an artifact of DNSSEC 
assuming air-gap signing as host software security was really weak back then 
(can't trust servers with private keys).  This didn't go far; no one has ever 
asked the DNS to have time-defined "versions", and we just update the servers 
on demand now.

We then thought about freshness of data.  Should a set with an RRSIG with a 
"newer" validity span knock a set that was validated-and-cached out of a cache? 
 What I recall was that we weren't going to "go there" - that is - the validity 
period would remain solely for the purpose of preventing replays and not be 
used to compare the "value" of one set of validated data with another set of 
validated data.

Why?  I will admit that I am foggy on that; I wish I could be more definitive 
or provide a useful reference.  Nevertheless, here is what I recall (mindful 
this may be personal opinion, not consensus of any group):

1. Two validity periods may overlap, complicating what it means to be 
"fresher".  I.e., one might be from Jan 1 to Dec 31, the other from Feb 1 to 
Feb 10.  (Assume same year.)  Which is fresher?  Is more specificity important? 
 This is a practical question for the code developers.
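To make the ambiguity concrete, here is a minimal sketch (hypothetical Python, 
not from any implementation) of why "fresher" has no single answer for 
overlapping validity spans:

    from datetime import datetime

    # Two hypothetical RRSIG validity spans for the same RRset (same year):
    a = (datetime(2018, 1, 1), datetime(2018, 12, 31))  # inception, expiration
    b = (datetime(2018, 2, 1), datetime(2018, 2, 10))

    # Candidate definitions of "fresher" disagree with each other:
    later_inception  = b[0] > a[0]                      # True:  b signed later
    later_expiration = b[1] > a[1]                      # False: a valid longer
    narrower_span    = (b[1] - b[0]) < (a[1] - a[0])    # True:  b more specific

    print(later_inception, later_expiration, narrower_span)
    # No single ordering falls out - which is the point.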

2. The philosophy against scope creep: why expand the semantics of the fields 
based on a loosely, possibly incompletely defined concept (referring to 
"freshness of DNS data")?  (A hand-wavy push back.)

3. The philosophy that once a data set is validated, it stays validated in the 
cache.  There are use cases where a "hijack" might get unauthorized (yet 
validated) data into caches, along with calls to flush such data; those calls 
put this philosophy into question.  The reason for the philosophy is that it 
eases implementations (and with flushing available as a manual operator 
option, that seemed acceptable).  Caches don't have to re-test what's in them, 
as is the case now.

4. Under what use case would a "fresher" set of data come to the cache?  If 
the data was already in the cache, the resolver wouldn't fetch a new copy, so 
the need to compare wasn't apparent.  This was in the era when the additional 
data section was seen as the carrier of disease (see "Clarifications to the 
DNS Specification"), so that wasn't an active avenue.

Reasons 1 and 4 I think were the most compelling back then, although I can see 
reason 4 falling apart if data in the additional section is properly validated.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Lameness terminology

2018-05-04 Thread Edward Lewis
Ah, yes, I missed the definition in "Tools for DNS Debugging" in quickly 
skimming this morning.

I read the "A lame delegation is a serious error in DNS configurations, yet a 
(too) common one." and skipped to the next paragraph, assuming this was yet 
another definition by example.

On 5/4/18, 11:53, "Paul Hoffman"  wrote:

On 4 May 2018, at 8:16, Edward Lewis wrote:

> FWIW, if one is to cite a definition of lame, I'd use "Common DNS 
> Operational and Configuration Errors", February 1996, aka RFC 1912.

Yep, that's the one we already have. I think it's worthwhile to also add 
the wider definition that Amreesh pointed out in RFC 1713.

--Paul Hoffman


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Lameness terminology

2018-05-04 Thread Edward Lewis
On 5/4/18, 05:59, "DNSOP on behalf of Shane Kerr"  wrote:

>Pending Ed's archival research, it seems like we need to actually do
>some work to structure the concepts around lameness. Digging in...

Quick and dirty, using a directory with "rfc$i.txt" files so far up to 5000 or 
so:

(BTW, my search is: "grep lame * | grep -e delegation -e server | sort -u -k 1,1".
Yes, perhaps not optimal but quick and easy to run.)

rfc1470.txt - mentions a tool called LAMER to detect lame delegations
rfc1612.txt - a MIB definition tracking lame delegations
rfc1713.txt - first example of a lame delegation (when describing LAMER)
rfc1912.txt - first definition, here it is (with grammatical error):

"A lame delegations exists when a nameserver is delegated responsibility for 
providing nameservice for a zone (via NS records) but is not performing 
nameservice for that zone (usually because it is not set up as a primary or 
secondary for the zone)."

rfc2308.txt - author uses "lame server" instead of delegation (poke at Mark)
rfc2832.txt - back to lame delegation, when talking about RRP the "pre-"EPP
rfc3658.txt - defining the DS and uses the term lame nameserver
rfc4074.txt - lame delegation in the context of IPv6
rfc4697.txt - lame or lame server is used

From this quick and dirty pass, my comments:

The concept is uniform - the situation is that the server of interest is not 
configured to host the zone.  I.e., this is a per-nameserver concept but not a 
per-cutpoint concept.

The precise term - lame "delegation", "server", or "name server" - seems never 
to have been cemented, although "lame delegation" is the most common.  If I 
had a vote, I'd word it as: "A lame delegation occurs when a name server is 
named by a parent to be authoritative for a zone despite the name server not 
being configured to be authoritative for the zone.  If a name server receives 
a query for a domain (as opposed to zone!) for which it is not configured, the 
response is (Blank)."  The latter part of that would be important when writing 
interoperable code. ;)

I say Blank because there is no universally used response in this situation and 
I'm not going to promote any solution here.
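Detection, as opposed to the response, is at least sketchable.  Here is a 
minimal probe (hypothetical Python, assuming the dnspython library; the names 
are illustrative) in the RFC 1912 sense of lame - the server answers, just 
not as a configured authority for the zone:

    import dns.exception
    import dns.flags
    import dns.message
    import dns.query

    def looks_lame(server_ip, zone):
        """True if server_ip answers but without authority for zone.

        A timeout is *not* lameness in the RFC 1912 sense - a lame
        server is one that responds, just not as an authority."""
        q = dns.message.make_query(zone, "SOA")
        q.flags &= ~dns.flags.RD              # ask non-recursively
        try:
            r = dns.query.udp(q, server_ip, timeout=3)
        except dns.exception.Timeout:
            return False                      # non-responsive: a different failure
        return not (r.flags & dns.flags.AA)   # no AA bit -> not authoritative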

FWIW, if one is to cite a definition of lame, I'd use "Common DNS Operational 
and Configuration Errors", February 1996, aka RFC 1912.



___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] Lameness, registries, and enforcement was Re: [Ext] Lameness terminology (was: Status of draft-ietf-dnsop-terminology-bis)

2018-05-04 Thread Edward Lewis
On 5/4/18, 10:26, "DNSOP on behalf of Joe Abley"  wrote:

> from the perspective of whom? There's issue 6 for you.

I included that in issue 3.  Reference the "yadda, yadda, yadda".

[https://en.wikipedia.org/wiki/The_Yada_Yada]

I thought it was "yadda" not "yada", but since there are no subtitles on my TV 
sitcoms, I didn't know.


___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Lameness terminology (was: Status of draft-ietf-dnsop-terminology-bis)

2018-05-04 Thread Edward Lewis
Just noticed this (and why terminology is a problem):

On 5/3/18, 17:25, "Mark Andrews"  wrote:

>Start removing lame delegation ...

Are we talking about "lame servers" or "lame delegations"?  If the latter, is 
a "delegation" a single NS / glue record, or the set of NS records and 
associated glue for the owning domain name?  I've been "trained" to think of 
lame servers, not the entire delegation of a name, when thinking about this 
kind of maintenance activity.

I've been looking through documents and find, for example, "DNS Resolver MIB 
Extensions" from May 1994 using the term "lame delegation" while "Negative 
Caching of DNS Queries (DNS NCACHE)" from March 1998 uses "lame server."  
I.e., the documents flip-flop.

I still need more time to trawl the RFCs.  Apparently I haven't kept my cache 
of them around.

 

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


[DNSOP] Lameness, registries, and enforcement was Re: [Ext] Lameness terminology (was: Status of draft-ietf-dnsop-terminology-bis)

2018-05-04 Thread Edward Lewis
This isn't about terminology but about the recurring debate over a registry's 
responsibility here.

It's simple to state a policy that says:

If a registered NS record does not function properly, the registrant will be 
notified and the NS record will be removed from the DNS until such time that 
it functions properly.

Nice, simple, clean.  Sounds like something a responsible registry would do.  
But it sits on top of an iceberg of issues.

Issue 1: define "function properly".  That can be done.  Lame, non-responsive, 
and so on.  But as I said privately to the original poster, the "science" of 
bad responses is vastly different from the "science" of no response.  (I recall 
from my experimentation that for some addresses, I could repeat the question 
over 10 times [some seconds apart], maybe 13, and still get back a "first" 
response from the address.  I used the id field to tell the queries apart.  To 
this day, I am astonished by that.)
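For the curious, a hypothetical re-creation of that experiment - Python with 
the dnspython library for message handling and a plain socket for transport; 
all names and numbers are illustrative, not what I ran back then:

    import socket
    import time
    import dns.message

    def probe(server_ip, qname, count=13, gap=2.0, wait=30.0):
        """Send `count` copies of one question, seconds apart, each with
        a distinct id, then watch which ids ever draw a response."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setblocking(False)
        sent = {}
        for _ in range(count):
            q = dns.message.make_query(qname, "SOA")  # picks a random id
            sent[q.id] = time.monotonic()
            s.sendto(q.to_wire(), (server_ip, 53))
            time.sleep(gap)
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            try:
                wire, _ = s.recvfrom(4096)
            except BlockingIOError:
                time.sleep(0.1)
                continue
            r = dns.message.from_wire(wire)
            if r.id in sent:
                late = time.monotonic() - sent[r.id]
                print(f"id {r.id}: answered after {late:.1f}s")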

Issue 2: how is the registrant notified, and what constitutes "success" in 
notifying the registrant?  Is an email to the NOC contact enough?  A robo-call? 
 What if the contact information is inaccurate?  This question is needed to 
tell whether the registry is properly implementing the policy they have.

Issue 3: determining the state of the service.  This is tougher than it seems.  
Multiple vantage points, sampling over time, setting a threshold for how many 
failed responses per time quantum constitute failure, yadda, yadda, yadda.  
Keep in mind, the NS record may be part of an anycast cloud and, if the 
registry is hitting one instance, that one might be affected by a spurious 
traffic flood.
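A toy aggregation rule, purely illustrative (the threshold and the majority 
test are made up, not a recommendation): declare failure only when most 
vantage points see a high failure rate within the same window.

    def zone_failed(results, threshold=0.5):
        """results: {vantage_point: [True/False per probe in one window]}"""
        failing = sum(
            1 for probes in results.values()
            if probes and probes.count(False) / len(probes) > threshold
        )
        return failing > len(results) / 2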

My concern is the liability for false positives in failure testing.  I've been 
at the wrong end of such a test, where the registry had failures on their end 
and pointed the finger at us.  (IPv6 was the subject of the test.)  Even if 
the customer impact of that was low, we spent a lot of resources poring 
through logs, contacting service providers, and tracing the routes, only to 
find the error was a scripting error by the registry.  I tracked that down by 
meeting the tester - in person - and going over the test results.

Issue 4: If the registry pulls the NS record, the operator can't test their 
changes until the registry re-tests.  This makes operating the registration 
harder: the tech doing the work has to either engage the registry tech support 
"live" (including any language barrier) or suspend completing the ticket until 
the registry gets around to the next test.

Issue 5: Even if the registry pulls the offending NS record, it might still be 
in the authoritative set, meaning caches will still have it present.  I.e., 
pulling the NS record at the parent is trumped by the child.  (This assumes 
some other NS is working, making the authoritative set visible.)

Philosophically, in DNS, once a delegation is made, it's the child's.  For 
better or worse, the protocol doesn't equip the registry to "coach" the child 
well.  Any work done towards that is "fighting entropy".  It can be done, but 
consumes energy (instead of producing it).



___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Lameness terminology (was: Status of draft-ietf-dnsop-terminology-bis)

2018-05-02 Thread Edward Lewis
On 4/23/18, 10:23, "DNSOP on behalf of Shane Kerr"  wrote:

>I don't know if this is documented anywhere so that it can be
>referenced properly, sorry. I am happy to discuss further but I think
>this basically covers all I know. I don't mind proposing text, but
>probably someone (Ed maybe?) would be a better person.

I've been looking for something I recall writing a long time ago, but haven't 
found it.  I'm not the authority on lameness (insert self-deprecating joke 
here); the term has been repeatedly defined in a number of documents.  What I 
wrote was a survey of those definitions, including the citations, but I 
haven't been able to find it.  Not on the web, that is.

The reason I bother with this is that, when I was tasked with the work in 2002 
or 2003, I was genuinely surprised by the definition of "lame" delegation.  I 
was surprised that it did not include non-responsive servers - the term 
referred to responding servers only.

If I can't find the text soon, I'll try to recreate the list of references at 
least.

(Only if you like reading history:)

The reason was a flaw in "certain old resolvers" that followed the "upwards 
referral" to the root that the "predominant server code of the time" had 
decided to use for lameness.  The result was a lot of resolvers stuck in an 
infinite loop, hitting the root servers.  I.e., this was an operational issue. 
The solution was updating and redeploying the buggy code, not stamping out 
lame servers (which was the goal of the task).  FWIW, the "upwards referrals" 
were discontinued when it became apparent they were being used in noticeable 
amplification attacks.


 

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


Re: [DNSOP] [Ext] Re: tdns, 'hello-dns' progress, feedback requested

2018-04-13 Thread Edward Lewis
On 4/13/18, 15:02, "DNSOP on behalf of Matthew Pounsett" wrote:

>On 13 April 2018 at 11:11, bert hubert  wrote:

>>RFC 1034, 4.3.2, step 3, a. It says to go back to step 1, which means that
>>in step 2 we look up the best zone again for the target of the CNAME. I have
>>not looked if newer RFCs deprecate this or not. So with 'chase' I mean,
>>consult other zones it is authoritative for. There might be millions of
>>these btw, operated by other people.

>Wouldn't there be a security concern with doing that?

Certainly.

And you just woke up a hole I've seen but not realized.

A name server may be properly authoritative (legit?) for a zone - meaning that 
somehow, when consulting the (grand)parent of the zone on another server, you 
can manage to get to the zone.  (The wording here is hairy as I'm accounting 
for cases where the NS sets above the cut and below the cut aren't the same, 
having some overlap.  Yes, that's broken, but it works.)

Name servers may also be configured to host zones they "ought not" (ill-legit) 
- such as a zone of an exited customer of a DNS hosting service.  I know this 
could happen at multi-tenant hosting providers who don't check whether their 
customers are registered "owners" of the zones the customers configure, or who 
forget to clean up after departed customers.

If a name server CNAME chases from a "legit" zone to an "ill-legit" zone, 
things could get interesting, especially when debugging.
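To see where the chase crosses zones, here is a toy rendering of the RFC 1034 
section 4.3.2 loop (hypothetical Python; `zones` is whatever set of zones this 
one server happens to be configured for, and the data model is entirely 
illustrative):

    def answer(zones, qname, qtype):
        """Toy RFC 1034 4.3.2 loop: step 2 finds the best local zone,
        step 3.a restarts at step 1 for a CNAME target.
        zones: {zone name: {(owner name, type): rdata}}
        (No CNAME-loop protection - it's a toy.)"""
        while True:
            zone = best_match(zones, qname)    # step 2: closest enclosing zone
            if zone is None:
                return None                    # not ours; referral/iteration instead
            cname = zones[zone].get((qname, "CNAME"))
            if cname and qtype != "CNAME":
                qname = cname                  # step 3.a: restart with the target,
                continue                       # possibly landing in a *different*
                                               # local zone ("legit" or not)
            return zones[zone].get((qname, qtype))

    def best_match(zones, qname):
        """Longest zone name that is a suffix of qname (label-aligned)."""
        candidates = [z for z in zones
                      if qname == z or qname.endswith("." + z)]
        return max(candidates, key=len) if candidates else None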

I have thought, for a very long time, that "chasing" any DNS "rewrite rule" is 
wrong - CNAME or DNAME.  Leave the query gymnastics to the DNS iterator; the 
trust policy ought to be homed at the searcher's end.  But that is just 
opinion...

Oh, I meant to say - I held the above opinion when I worked for a hoster.  
Since the hoster charged by query volume, if I had made that public I'd have 
been accused of trying to raise revenue artificially.  But as a protocol 
analyst, I think that a few more queries are worth the simplicity... but you 
can argue "round trips" and all the latency problems... but then I'd argue 
"that's what caching is for..."





___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop


  1   2   3   4   5   6   >