Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards

2015-03-13 Thread D. J. Bernstein
I remain puzzled at the entire technological motivation that CloudFlare
claims for this deliberate creation of interoperability problems.

In particular, what exactly is the programming difficulty that they
claim they're encountering in implementing QTYPE=*? Are they also having
trouble implementing NXDOMAIN, which from a programming perspective is a
very similar unification of data across all types?

The rest of this message identifies specific rules that CloudFlare's
threatened plan will violate in IETF's mandatory DNS standards.

David C Lawrence writes:
 RFC 1035 explicitly allows for a server to indicate that a kind of
 query is not implemented. Whether it is a good idea to respond to ANY
 this way is a separate argument that is worth having. You just won't
 win on the foundation that it is a violation of the standard.

Let's look at what the standards actually say.

RFC 1034 clearly defines QTYPE=* to match all RR types, along with,
e.g., QTYPE=A to match just that type. It explicitly says that the
name server looks for matching RRs.
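
As a minimal sketch of that matching rule (the function names and sample records below are hypothetical illustrations, not text from the RFC or from any server's code):

```python
# Hypothetical illustration of the RFC 1034 matching rule described above:
# QTYPE=* matches every RR type at a node; any other QTYPE matches only itself.

def qtype_matches(qtype, rrtype):
    """Return True if a record of type rrtype matches the query type."""
    return qtype == "*" or qtype == rrtype


def matching_rrs(node_rrs, qtype):
    """node_rrs is a list of (rrtype, rdata) pairs stored at one node."""
    return [rr for rr in node_rrs if qtype_matches(qtype, rr[0])]


# Made-up records for one node (documentation address space).
node = [
    ("A", "192.0.2.1"),
    ("MX", "10 mail.example.com."),
    ("NS", "ns1.example.com."),
]

print(matching_rrs(node, "A"))   # only the A record
print(matching_rrs(node, "*"))   # every record at the node
```

Under this reading, QTYPE=* is the same lookup as any other type with a weaker filter.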

CloudFlare's stated plans will violate this rule. This matching for
QTYPE=* is precisely what CloudFlare claims is too hard to implement!

You claim that this violation of a rule in IETF's mandatory DNS
standards doesn't constitute a violation of the standards. Evidently
you believe that the standards contain some relevant exception to the
rule. What exactly do you claim that this exception is?

The foundation for your argument, apparently, is the fact that RFC 1035
defines a syntax for a NOTIMP response. But why do you think this is of
any relevance to the matching RRs rule? The rule doesn't say the name
server looks for matching RRs, except for types that the server doesn't
want to bother implementing. Where exactly, and what exactly, is the
CloudFlare exception?

Do you believe that the availability of a NOTIMP syntax overrides all
other rules and frees the server to use this syntax for anything that it
doesn't want to implement? Here's a hypothetical example to consider:

   * A very large cache operator is opposed to the usage of DNSSEC. (The
     operator's reason for this isn't relevant to this example.)

   * To deter usage of DNSSEC, the cache operator decides to create
     large-scale DNSSEC interoperability problems, augmenting DNSSEC's
     existing fragility.

   * Specifically, the cache operator issues DNSSEC queries to servers;
     and then, if a server response shows DNSSEC support, the cache
     operator returns NOTIMP to clients for _all_ of the server's data.

   * To avoid any sudden changes, the cache operator slowly ramps up
     this behavior, turning on the DNSSEC queries with higher and higher
     probability as time passes, but jumping immediately to probability
     1 for servers that don't show DNSSEC support.

   * To justify the use of NOTIMP, the cache operator claims that it
     _wanted_ to implement actually returning DNSSEC records to clients
     but found this too complicated given geoip blah blah blah, so it
     had to return NOTIMP. It quotes your claim that RFC 1035
     explicitly allows returning NOTIMP.

Would you call this cache behavior compliant with the mandatory DNS
standards? No? Why not? Why isn't the cache free to use NOTIMP whenever
it hasn't implemented something? Are you quite sure that you've thought
through what you're claiming?

Let's continue looking at the mandatory DNS standards. RFC 1034
explicitly allows not-implemented queries in _some_ cases, such as
inverse queries:

   Implementation of this service is optional in a name server, but all
   name servers must at least be able to understand an inverse query
   message and return a not-implemented error response.

RFC 1035 is also quite clear about this:

   Inverse queries are an optional part of the DNS. Name servers are
   not required to support any form of inverse queries. If a name server
   receives an inverse query that it does not support, it returns an
   error response with the Not Implemented error set in the header. 
   While inverse query support is optional, all name servers must be at
   least able to return the error response.

If the RFCs had meant to say that _all_ DNS features are optional
(leaving interoperability up to the whim of bullies, apparently what
you're endorsing), then why didn't the RFCs simply say that? Why did
they explicitly highlight particular features as being optional?

Furthermore, RFC 1123 explicitly requires DNS software to support all
well-known, class-independent formats. This is another mandatory rule
that CloudFlare's plan clearly violates. What exactly do you think this
support requirement means, if servers are free to use NOTIMP whenever
they want, for example for QTYPE=*?

RFC 1034 explicitly says that a server is free to refuse to perform
recursive services for any or all clients (and also explicitly says
that an AXFR may cause an error, such as refused) but explicitly says
that All name servers must 

Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards

2015-03-12 Thread D. J. Bernstein
Paul Wouters writes:
 So if the MX or  record has expired from the cache but another
 RRtype with larger TTL (say NS) is still in there, your ANY query will
 fail to find records.

The client is behaving correctly. The ANY query isn't guaranteed to find
the MX, but you're wrong in claiming that the client is relying on this.
I realize that you don't understand how this type of DNS client works,
so let me go through the details, slowly, covering

   (1) why the queries reliably produce the desired information and
   (2) why QTYPE=ANY ends up reducing the number of queries that the
   authoritative server has to handle for producing this information.

The client begins with an ANY query to the cache. There are several
possibilities for the cache state at this moment:

   * maybe there's nothing;
   * maybe there's an MX record;
   * maybe there's an A record;
   * maybe there's some other regular record, such as NS;
   * maybe there's a mix of regular records;
   * maybe there's a CNAME record;
   * etc.

In the nothing case, the cache forwards the query to the server. This
could end up retrieving, e.g., an MX set, an A set, and an NS set; it
could end up retrieving a CNAME; there are many possibilities.

The cache then returns whatever it has to the client. The client notices
failure cases (such as SERVFAIL) and, in those cases, defers mail
delivery. Let's now focus on what happens in the success cases.

If the cache returns a CNAME: The client sees the CNAME, replaces the
original name with the CNAME, and starts over.

If the cache returns, e.g., an NS record: The client sees this record
and concludes that there _isn't_ a CNAME. This conclusion is justified
by the rule that a name can't have CNAME together with regular records.
An administrator who sets up a single name with both CNAME and NS (or
CNAME and MX, etc.) is entirely at fault for the resulting confusion.

The client then sends an MX query to the cache. Again failures are
caught and defer mail delivery. The most common success case is that
the cache has an MX set at this point; the client sees the MX set and
acts accordingly. (The A that the MX points to is normally cached too.)
If there's a successful response without an MX (case 5 described in
http://cr.yp.to/djbdns/notes.html#response-parsing) then the client
correctly concludes that there isn't an MX and falls back to an A query
for the original name.

To summarize, this type of ANY-MX-A client correctly sees whether
there's a CNAME, correctly sees whether there's an MX, and (when there
isn't an MX) correctly sees whether there's an A. If there's a server
failure (e.g., NOTIMP) or cache failure, the client correctly defers
mail delivery.
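
The control flow just summarized can be sketched as follows. This is an illustration only, not qmail's code: the `query` callback, the tuple return values, the CNAME hop limit, and the sample data are all assumptions made for the sketch.

```python
# Sketch of an ANY-MX-A client. query(name, qtype) is a hypothetical callback
# standing in for "ask the cache"; it returns (ok, records), where ok is False
# for failures such as SERVFAIL or NOTIMP, and records is a list of
# (rrtype, data) pairs.

def resolve_mail_target(query, name, max_cname_hops=8):
    # Step 1: ANY query, used only to detect a CNAME.
    for _ in range(max_cname_hops):
        ok, records = query(name, "ANY")
        if not ok:
            return ("defer", name, [])
        cnames = [data for rrtype, data in records if rrtype == "CNAME"]
        if not cnames:
            break                      # no CNAME at this name
        name = cnames[0]               # replace the name and start over
    else:
        # Hop limit is an assumption of this sketch, to avoid looping forever.
        return ("defer", name, [])

    # Step 2: MX query; a failure defers delivery.
    ok, records = query(name, "MX")
    if not ok:
        return ("defer", name, [])
    mx = [data for rrtype, data in records if rrtype == "MX"]
    if mx:
        return ("mx", name, mx)

    # Step 3: no MX, so fall back to an A query for the same name.
    ok, records = query(name, "A")
    if not ok:
        return ("defer", name, [])
    return ("a", name, [data for rrtype, data in records if rrtype == "A"])


# Made-up cache contents for demonstration.
zone = {
    ("www.example.org", "ANY"): (True, [("CNAME", "example.org")]),
    ("example.org", "ANY"): (True, [("NS", "ns1.example.org")]),
    ("example.org", "MX"): (True, [("MX", "mail.example.org")]),
    ("plain.example", "ANY"): (True, [("A", "192.0.2.7")]),
    ("plain.example", "MX"): (True, []),
    ("plain.example", "A"): (True, [("A", "192.0.2.7")]),
}


def fake_query(name, qtype):
    # Anything not listed is treated as a failure (e.g., SERVFAIL).
    return zone.get((name, qtype), (False, []))


print(resolve_mail_target(fake_query, "www.example.org"))
print(resolve_mail_target(fake_query, "plain.example"))
print(resolve_mail_target(fake_query, "missing.example"))
```

Note that the ANY answer is never trusted to contain the MX; it is used only to decide whether a CNAME is present.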

A comprehensive efficiency analysis requires detailed measurements for
many years at many sites, but spot checks have consistently shown that
the following cases are most important:

   * Most common: MX in server and in cache. The ANY-MX-A client ends
     up generating 0 queries to the server: the cache answers ANY with
     MX, and answers MX with MX.

     For comparison, a CNAME-MX-A client would also end up sending 0
     queries to the server _if_ the cache were smart enough to use the
     MX as a reason to deny CNAME. But typical caches aren't that smart,
     so this type of client would end up sending 1 query to the server.

     An MX-A client uninterested in CNAME would end up sending 0
     queries to the server.

   * MX in server but not in cache. The ANY-MX-A client ends up
     generating 1 query to the server: the server answers ANY with MX
     (and typically A), and then the cache answers MX with MX.

     For comparison, a CNAME-MX-A client would end up generating 2
     queries to the server: the server answers CNAME with no data, then
     answers MX with MX.

     An MX-A client uninterested in CNAME would end up sending 1 query
     to the server: the server answers MX with MX.

   * A in server and in cache, no MX in server: The ANY-MX-A client
     ends up sending 1 query to the server. The cache answers ANY with
     A; the server answers MX with no data; the cache answers A with A.

     For comparison, a CNAME-MX-A client would end up generating 2
     queries to the server (or 1 if the cache is smarter): the server
     answers CNAME with no data; the server answers MX with no data; the
     cache answers A with A.

     An MX-A client uninterested in CNAME would end up sending 1 query
     to the server. The server answers MX with no data, and the cache
     answers A with A.

   * A in server, nothing in cache: The ANY-MX-A client ends up
     sending 2 queries to the server. The server answers ANY with A; the
     server answers MX with no data; the cache answers A with A.

     For comparison, a CNAME-MX-A client would end up generating 3
     queries to the server: the server answers CNAME with no data; the
     server answers MX with no data; the server answers A with A.

     An MX-A client uninterested in 
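
The per-case query counts above can be tallied with a toy model. Everything here is an assumption for illustration: a "typical" cache that stores only positive RRsets for a single name, does no negative caching, answers ANY from whatever it already holds, and forwards ANY only when it holds nothing. The name is assumed to have no CNAME.

```python
# Toy model for counting queries that reach the authoritative server.
# server: dict of RR type -> data at the name; cached: what the cache holds
# before the client starts; qtypes: the client's query sequence.

def count_server_queries(server, cached, qtypes):
    cache = dict(cached)               # copy so callers' data is not mutated
    forwarded = 0
    for qtype in qtypes:
        if qtype == "ANY":
            if not cache:              # nothing cached: forward to the server
                forwarded += 1
                cache.update(server)
            answer = dict(cache)       # otherwise return whatever is cached
        else:
            if qtype not in cache:     # no positive entry: forward
                forwarded += 1
                if qtype in server:
                    cache[qtype] = server[qtype]
            answer = {qtype: cache[qtype]} if qtype in cache else {}
        # The client stops at the first MX or A answer that contains data.
        if qtype in ("MX", "A") and answer:
            break
    return forwarded


CLIENTS = {
    "ANY-MX-A": ["ANY", "MX", "A"],
    "CNAME-MX-A": ["CNAME", "MX", "A"],
    "MX-A": ["MX", "A"],
}

# (label, server contents, initial cache contents); data values are made up.
CASES = [
    ("MX in server and in cache", {"MX": "mail"}, {"MX": "mail"}),
    ("MX in server, not in cache", {"MX": "mail"}, {}),
    ("A in server and in cache, no MX", {"A": "192.0.2.1"}, {"A": "192.0.2.1"}),
    ("A in server, nothing in cache", {"A": "192.0.2.1"}, {}),
]


def tabulate():
    return {
        (label, client): count_server_queries(server, cached, qtypes)
        for label, server, cached in CASES
        for client, qtypes in CLIENTS.items()
    }


for (label, client), n in tabulate().items():
    print(f"{label:32} {client:11} {n}")
```

The model deliberately uses the "not that smart" cache; a cache that treats a cached MX or A as grounds to deny CNAME would lower the CNAME-MX-A figures.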

[DNSOP] dnsop-any-notimp violates the DNS standards

2015-03-09 Thread D. J. Bernstein
My qmail software is very widely deployed (on roughly 1 million SMTP
server IP addresses) and, by default, relies upon ANY queries in a way
that is guaranteed to work by the mandatory DNS standards.

Specifically, query type ANY matches all RR types for that node on
that server. There's an example in RFC 1034 of how a CNAME record is
returned by a type CNAME or * query. There's nothing telling clients
to avoid this query type; it's perfectly valid for a client to treat a
server that refuses this query type as a broken server, because that's
exactly what the server is. Of course, there's no guarantee of which RR
types for a node are available at a cache, but a client is guaranteed to
be able to detect CNAME records from responses to query type ANY (as
qmail does), since a CNAME type prohibits all regular RR types.

I started using these standard ANY queries for interoperability reasons
(working around a BIND bug and matching sendmail behavior at the time)
but the choice continues to provide some efficiency benefits, larger
than many of the efficiency benefits used as rationales in IETF
protocols. In new software today I would sacrifice these efficiency
benefits for the sake of simplicity, but this doesn't mean that I'm
going to frivolously inflict retroactive punishment upon administrators
who have installed standards-compliant software and done nothing wrong.

Let's now take a look at draft-ogud-dnsop-any-notimp-00.txt with this
background in mind. The I-D specifies behavior that

   * violates the existing mandatory DNS standards and that
   * breaks interoperability with this standards-compliant use of ANY.

An accompanying blog post says that in a few weeks the author's
organization will begin deploying this standards violation---with the
effect of immediately damaging Internet email delivery. The blog post
and I-D contain a number of dubious assertions attempting to justify this
change.

I don't mean to suggest that protocols must never change in incompatible
ways. However, modifications to IETF's standard protocols need to be
handled through the established IETF procedures, with appropriate
respect for the existing standards and the installed base:

   * First: The proposed protocol modification has to be taken to an
     IETF working group chartered to modify the protocol, so that
     stakeholders will have a proper chance to evaluate and comment on
     the proposal.

   * Second: The merits of the protocol modification have to be properly
     discussed in that working group, to evaluate the costs and benefits
     of the protocol modification---and to consider whether there are
     better ways to achieve the same benefits.

   * Third: _If_ the benefits of the modification are judged to outweigh
     the costs, a sunset period---in this case, a timeline for the
     client to stop using ANY queries---has to be specified, to avoid
     interoperability problems. This period has to be several years,
     recognizing the time required for client administrators to hear
     about and carry out the necessary redeployment. (It's not as if
     we're talking about an emergency security change.)

   * Fourth: After the sunset period expires, the server will be free to
     use the modified protocol---in this case, to refuse ANY queries.

I understand how a sufficiently large site might acquire the impression
that it can safely take radical action at its own whim, violating the
existing protocol standards and as a result creating interoperability
problems---but this is making a mockery of the IETF standardization
process. This is _not_ a mere operational change within the existing
protocol; it is _not_ a private extension using the standard negotiation
mechanisms; it is a flagrant violation of the required DNS standards.

My understanding is that dnsop@ietf.org is not chartered to make DNS
protocol changes, so any discussion here will have to be repeated in an
appropriate working group, but let me nevertheless comment on the two
benefits claimed for having servers refuse ANY queries:

   * Refusal would reduce DNS amplification: This argument already seems
     to have been dismissed by various people, and doesn't seem to be
     defended. It's obviously less effective than standards-compliant
     approaches such as limiting UDP responses to 512 bytes.

   * Attempting to handle ANY queries creates enormous complexity in
     our DNS server code base: This is a quite puzzling claim,
     especially since the specified features (load balancing, geoip,
     etc.) have been supported for many years by other software that has
     no trouble handling ANY. What exactly is the claimed difficulty in
     copying records from, e.g., the A key to the * key in the
     underlying database?
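
To make the last question concrete, here is one possible reading of "copying records to the * key", sketched as an in-memory key-value store. The `RecordStore` class and its layout are hypothetical; they are not taken from any real DNS server.

```python
# Hypothetical key-value layout: every record is stored under (name, type)
# and also appended under (name, "*"), so a QTYPE=* lookup is a single fetch.
from collections import defaultdict


class RecordStore:
    def __init__(self):
        self._db = defaultdict(list)

    def add(self, name, rrtype, rdata):
        record = (rrtype, rdata)
        self._db[(name, rrtype)].append(record)   # per-type key
        self._db[(name, "*")].append(record)      # copy under the * key

    def lookup(self, name, qtype):
        # .get avoids creating empty entries for names that do not exist.
        return list(self._db.get((name, qtype), []))


store = RecordStore()
store.add("example.com", "A", "192.0.2.1")
store.add("example.com", "MX", "10 mail.example.com.")

print(store.lookup("example.com", "A"))
print(store.lookup("example.com", "*"))
```

In this layout whatever per-type logic selects the A answer runs once at insertion time, and the * lookup needs no extra code path.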

Apparently Firefox recently deployed ANY queries. I haven't looked at
the details but I gather that they're related to the well-known
annoyances of handling  etc. Firefox was browbeaten into reverting
this change on the 

Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards

2015-03-09 Thread D. J. Bernstein
Edward Lewis writes:
 Operators are not bound to comply with what the IETF documents.

As I said before, this is making a mockery of the IETF standardization
process. Instead of

   * obeying the existing mandatory standards,
   * giving due respect to the installed base relying on the standards,
   * trying to build consensus for a transition that demands action from
     the installed base, and
   * taking the slow steps necessary to maintain interoperability during
     a transition if any transition happens,

a large operator is using its market position to violate the standards
and _create_ interoperability failures as a tool to enforce a protocol
change that it wants. Furthermore, a few weeks before this standards
violation is going to be deployed, the stated rationale for the change
is undergoing massive revisions, making serious discussion difficult and
leading observers to wonder how carefully the change was thought through
in the first place.

If you want IETF standards to be taken seriously---if you think that the
basic rules of Internet communication should be established by consensus
in IETF, and not simply overridden by future developers and operators
who think they know better, including cases where you _don't_ agree with
them---then you have to stop endorsing standards violations.

Tony Finch writes:
 qmail uses ANY queries for domain canonicalization on outgoing
 messages, as specified by RFC 1123. But canonicalization is not
 required by the current SMTP specification. It is a waste of time.

I fully agree that this was made optional in SMTP---that qmail is no
longer required to do this. But how do you leap to the wild conclusion
(stated twice in your message) that this is a bug in qmail?

More importantly, why do you think this is relevant to anything that I
said? I didn't say that qmail's behavior is currently required for SMTP.
I said that qmail is very widely deployed and relies upon ANY queries in
a way that is guaranteed to work by the mandatory DNS standards. The
dnsop-any-notimp proposal ignores those standards and will create real
interoperability problems with mail delivery.

 Using an ANY query suppresses alias processing, so qmail makes a
 series of queries to follow CNAME chains. This is inefficient and
 wasteful.

No, you have the efficiency picture backwards. CNAME chains have always
been an extremely small fraction of the DNS queries inside mail, while
the QTYPE=ANY side effect of preloading MX/A records has always produced
a significantly larger reduction in DNS queries.

This item is something else that you explicitly label as a bug. You
keep using that word; I do not think that word means what you think it
means.

 qmail's DNS response buffer is too small to accommodate a complete DNS
 message, so it fails if it gets a large response.

Precisely how many bytes do you believe are in a complete DNS message?
65535, the TCP limit? Do you seriously believe that 65535-byte responses
work reliably today, and that any failure to handle such responses is a
bug? How about 512, given the fact that the mandatory DNS standards do
_not_ require TCP support? Or maybe 1280, the recommended safe size for
EDNS0 UDP (depending on the network etc.)?

Even in an imaginary world where 65535-byte responses work, what do you
think happens if a server administrator puts _more_ than 65535 bytes of
records at a single node? Do you blame the server administrator? If DNS
implementors handle this use case by introducing a DNS transport capable
of handling 4-gigabyte responses, will you then claim that there's a
bug in every DNS client that has less RAM available or that simply
insists on a smaller limit to control its resource use?

In fact, all DNS client and server deployments have size limits on DNS
responses, and these limits have always varied, making the system as a
whole increasingly fragile for the unfortunate administrators pushing
the limits (notably DNSSEC administrators). The only sane way out is for
the protocol to declare a single reasonable size limit to be respected
by all clients and servers---but this implies reengineering large DNS
record types to split data across nodes (the same way that TCP splits
streams across packets), and nobody seems willing to do this work. It's
much easier to stuff all data into one node, pretend that the system
works, and blame the administrators for all of the resulting failures.

I'd be happy to discuss this issue further if anyone is interested in
fixing these aspects of the DNS protocol. I would, however, suggest
starting a separate thread for that, since this really isn't relevant to
how dnsop-any-notimp violates the DNS standards.

---Dan
