Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards
I remain puzzled at the entire technological motivation that CloudFlare claims for this deliberate creation of interoperability problems. In particular, what exactly is the programming difficulty that they claim they're encountering in implementing QTYPE=*? Are they also having trouble implementing NXDOMAIN, which from a programming perspective is a very similar unification of data across all types?

The rest of this message identifies specific rules that CloudFlare's threatened plan will violate in IETF's mandatory DNS standards.

David C Lawrence writes:
> RFC 1035 explicitly allows for a server to indicate that a kind of
> query is not implemented. Whether it is a good idea to respond to ANY
> this way is a separate argument that is worth having. You just won't
> win on the foundation that it is a violation of the standard.

Let's look at what the standards actually say.

RFC 1034 clearly defines QTYPE=* to match "all RR types", along with, e.g., QTYPE=A to match "just that type". It explicitly says that "the name server looks for matching RRs". CloudFlare's stated plans will violate this rule. This "matching" for QTYPE=* is precisely what CloudFlare claims is too hard to implement!

You claim that this violation of a rule in IETF's mandatory DNS standards doesn't constitute a violation of the standards. Evidently you believe that the standards contain some relevant exception to the rule. What exactly do you claim that this exception is?

The foundation for your argument, apparently, is the fact that RFC 1035 defines a syntax for a NOTIMP response. But why do you think this is of any relevance to the "matching RRs" rule? The rule doesn't say "the name server looks for matching RRs, except for types that the server doesn't want to bother implementing". Where exactly, and what exactly, is the CloudFlare exception?
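
To make the "matching RRs" rule concrete, here is a minimal sketch in Python. The `Node` type and the `matching_rrs` function are names invented for this illustration. The sketch ignores classes, CNAME processing, wildcards, and everything else a real server does, and it is not meant to describe any particular server's code.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory node: RR type -> list of record data strings.
Node = Dict[str, List[str]]


def matching_rrs(node: Node, qtype: str) -> List[Tuple[str, str]]:
    """Return the (type, rdata) pairs at `node` that match `qtype`.

    QTYPE "*" matches all RR types; any other QTYPE matches just that type.
    """
    return [
        (rrtype, rdata)
        for rrtype, records in node.items()
        if qtype in ("*", rrtype)
        for rdata in records
    ]
```

For a node holding one A record and one MX record, `matching_rrs(node, "A")` returns only the A record, while `matching_rrs(node, "*")` returns both.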
Do you believe that the availability of a NOTIMP syntax overrides all other rules and frees the server to use this syntax for anything that it doesn't want to implement? Here's a hypothetical example to consider:

* A very large cache operator is opposed to the usage of DNSSEC. (The operator's reason for this isn't relevant to this example.)

* To deter usage of DNSSEC, the cache operator decides to create large-scale DNSSEC interoperability problems, augmenting DNSSEC's existing fragility.

* Specifically, the cache operator issues DNSSEC queries to servers; and then, if a server response shows DNSSEC support, the cache operator returns NOTIMP to clients for _all_ of the server's data.

* To avoid any sudden changes, the cache operator slowly ramps up this behavior, turning on the DNSSEC queries with higher and higher probability as time passes, but jumping immediately to probability 1 for servers that don't show DNSSEC support.

* To justify the use of NOTIMP, the cache operator claims that it _wanted_ to implement actually returning DNSSEC records to clients but found this too complicated given geoip blah blah blah, so it had to return NOTIMP. It quotes your claim that RFC 1035 "explicitly allows" returning NOTIMP.

Would you call this cache behavior compliant with the mandatory DNS standards? No? Why not? Why isn't the cache free to use NOTIMP whenever it hasn't implemented something? Are you quite sure that you've thought through what you're claiming?

Let's continue looking at the mandatory DNS standards. RFC 1034 explicitly allows not-implemented queries in _some_ cases, such as inverse queries:

   Implementation of this service is optional in a name server, but all name servers must at least be able to understand an inverse query message and return a not-implemented error response.

RFC 1035 is also quite clear about this:

   Inverse queries are an optional part of the DNS. Name servers are not required to support any form of inverse queries. If a name server receives an inverse query that it does not support, it returns an error response with the "Not Implemented" error set in the header. While inverse query support is optional, all name servers must be at least able to return the error response.

If the RFCs had meant to say that _all_ DNS features are optional (leaving interoperability up to the whim of bullies, apparently what you're endorsing), then why didn't the RFCs simply say that? Why did they explicitly highlight particular features as being optional?

Furthermore, RFC 1123 explicitly requires DNS software to "support all well-known, class-independent formats". This is another mandatory rule that CloudFlare's plan clearly violates. What exactly do you think this "support" requirement means, if servers are free to use NOTIMP whenever they want, for example for QTYPE=*?

RFC 1034 explicitly says that a server "is free to refuse to perform recursive services for any or all clients" (and also explicitly says that an AXFR "may cause an error, such as refused") but explicitly says t
Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards
Paul Wouters writes:
> So if the MX or record has expired from the cache but another
> RRtype with larger TTL (say NS) is still in there, your ANY query will
> fail to find records.

The client is behaving correctly. The ANY query isn't guaranteed to find the MX, but you're wrong in claiming that the client is relying on this.

I realize that you don't understand how this type of DNS client works, so let me go through the details, slowly, covering (1) why the queries reliably produce the desired information and (2) why QTYPE=ANY ends up reducing the number of queries that the authoritative server has to handle for producing this information.

The client begins with an ANY query to the cache. There are several possibilities for the cache state at this moment:

* maybe there's nothing;
* maybe there's an MX record;
* maybe there's an A record;
* maybe there's some other regular record, such as NS;
* maybe there's a mix of regular records;
* maybe there's a CNAME record;
* etc.

In the "nothing" case, the cache forwards the query to the server. This could end up retrieving, e.g., an MX set, an A set, and an NS set; it could end up retrieving a CNAME; there are many possibilities.

The cache then returns whatever it has to the client. The client notices failure cases (such as SERVFAIL) and, in those cases, defers mail delivery. Let's now focus on what happens in the success cases.

If the cache returns a CNAME: The client sees the CNAME, replaces the original name with the CNAME, and starts over.

If the cache returns, e.g., an NS record: The client sees this record and concludes that there _isn't_ a CNAME. This conclusion is justified by the rule that a name can't have CNAME together with regular records. An administrator who sets up a single name with both CNAME and NS (or CNAME and MX, etc.) is entirely at fault for the resulting confusion.

The client then sends an MX query to the cache. Again failures are caught and defer mail delivery.
The most common success case is that the cache has an MX set at this point; the client sees the MX set and acts accordingly. (The A that the MX points to is normally cached too.) If there's a successful response without an MX (case 5 described in http://cr.yp.to/djbdns/notes.html#response-parsing) then the client correctly concludes that there isn't an MX and falls back to an A query for the original name.

To summarize, this type of ANY->MX->A client correctly sees whether there's a CNAME, correctly sees whether there's an MX, and (when there isn't an MX) correctly sees whether there's an A. If there's a server failure (e.g., NOTIMP) or cache failure, the client correctly defers mail delivery.

A comprehensive efficiency analysis requires detailed measurements for many years at many sites, but spot checks have consistently shown that the following cases are most important:

* Most common: MX in server and in cache. The ANY->MX->A client ends up generating 0 queries to the server: the cache answers ANY with MX, and answers MX with MX.

  For comparison, a CNAME->MX->A client would also end up sending 0 queries to the server _if_ the cache were smart enough to use the MX as a reason to deny CNAME. But typical caches aren't that smart, so this type of client would end up sending 1 query to the server.

  An MX->A client uninterested in CNAME would end up sending 0 queries to the server.

* MX in server but not in cache. The ANY->MX->A client ends up generating 1 query to the server: the server answers ANY with MX (and typically A), and then the cache answers MX with MX.

  For comparison, a CNAME->MX->A client would end up generating 2 queries to the server: the server answers CNAME with no data, then answers MX with MX.

  An MX->A client uninterested in CNAME would end up sending 1 query to the server: the server answers MX with MX.

* A in server and in cache, no MX in server: The ANY->MX->A client ends up sending 1 query to the server. The cache answers ANY with A; the server answers MX with no data; the cache answers A with A.

  For comparison, a CNAME->MX->A client would end up generating 2 queries to the server (or 1 if the cache is smarter): the server answers CNAME with no data; the server answers MX with no data; the cache answers A with A.

  An MX->A client uninterested in CNAME would end up sending 1 query to the server. The server answers MX with no data, and the cache answers A with A.

* A in server, nothing in cache: The ANY->MX->A client ends up sending 2 queries to the server. The server answers ANY with A; the server answers MX with no data; the cache answers A with A.

  For comparison, a CNAME->MX->A client would end up generating 3 queries to the server: the server answers CNAME with no data; the server answers MX with no data; the server answers A with A.

  An MX
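
The query counts in the cases listed in this message can be reproduced with a small simulation. The following is only a toy sketch in Python: the `Server`, `Cache`, `lookup`, and `server_queries` names, the `MAX_HOPS` limit, and the example zone data are all invented for illustration. The cache has no TTLs and no negative caching, and it does not use a cached MX as a reason to deny CNAME (the "typical cache" assumption). Failure handling (SERVFAIL and deferred delivery) is not modeled, and the toy server does not return a CNAME in answer to an MX or A query.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, List[str]]   # RR type -> records at one name
Zone = Dict[str, Node]        # name -> node
MAX_HOPS = 8                  # assumed CNAME-chain limit, for illustration only


def copy_zone(zone: Optional[Zone]) -> Zone:
    return {n: {t: list(rs) for t, rs in node.items()} for n, node in (zone or {}).items()}


class Server:
    """Toy authoritative server that counts the queries reaching it."""

    def __init__(self, zone: Zone) -> None:
        self.zone = copy_zone(zone)
        self.queries = 0

    def query(self, name: str, qtype: str) -> Node:
        self.queries += 1
        node = self.zone.get(name, {})
        if qtype == "ANY":
            return {t: list(rs) for t, rs in node.items()}
        return {qtype: list(node[qtype])} if qtype in node else {}


class Cache:
    """Toy cache: no TTLs, no negative caching, no CNAME denial from MX."""

    def __init__(self, server: Server, preload: Optional[Zone] = None) -> None:
        self.server = server
        self.data = copy_zone(preload)

    def query(self, name: str, qtype: str) -> Node:
        cached = self.data.setdefault(name, {})
        if qtype == "ANY":
            if not cached:  # nothing cached: forward the ANY query
                cached.update(self.server.query(name, "ANY"))
            return {t: list(rs) for t, rs in cached.items()}
        if qtype not in cached:  # forward a query for just this type
            cached.update(self.server.query(name, qtype))
        return {qtype: list(cached[qtype])} if qtype in cached else {}


def lookup(cache: Cache, name: str, first: Optional[str]) -> Tuple[str, List[str]]:
    """Resolve a mail destination. `first` is "ANY", "CNAME", or None
    (a plain MX->A client that never asks about CNAME)."""
    for _ in range(MAX_HOPS):
        if first is not None:
            answer = cache.query(name, first)
            if "CNAME" in answer:  # replace the name and start over
                name = answer["CNAME"][0]
                continue
        mx = cache.query(name, "MX")
        if mx:
            return "MX", mx["MX"]
        return "A", cache.query(name, "A").get("A", [])  # no MX: fall back to A
    raise RuntimeError("CNAME chain too long")


def server_queries(zone: Zone, preload: Optional[Zone], first: Optional[str]) -> int:
    """Count the queries that reach the server for one lookup of example.org."""
    server = Server(zone)
    lookup(Cache(server, preload), "example.org", first)
    return server.queries


MX_ZONE: Zone = {"example.org": {"MX": ["10 mail.example.org"], "A": ["192.0.2.1"]}}
A_ZONE: Zone = {"example.org": {"A": ["192.0.2.1"]}}
CASES = [
    ("MX in server and in cache", MX_ZONE, {"example.org": {"MX": ["10 mail.example.org"]}}),
    ("MX in server, not in cache", MX_ZONE, None),
    ("A in server and in cache, no MX", A_ZONE, A_ZONE),
    ("A in server, nothing in cache", A_ZONE, None),
]

for label, zone, preload in CASES:
    counts = [server_queries(zone, preload, first) for first in ("ANY", "CNAME", None)]
    print(label, counts)  # order: ANY->MX->A, CNAME->MX->A, MX->A
```

Under these assumptions the ANY->MX->A, CNAME->MX->A, and MX->A clients generate 0/1/0, 1/2/1, and 1/2/1 server queries in the first three cases, and the first two clients generate 2 and 3 server queries in the last case, in line with the figures given in the list.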
Re: [DNSOP] [dns-operations] dnsop-any-notimp violates the DNS standards
Edward Lewis writes:
> Operators are not bound to comply with what the IETF documents.

As I said before, this is making a mockery of the IETF standardization process. Instead of

* obeying the existing mandatory standards,
* giving due respect to the installed base relying on the standards,
* trying to build consensus for a transition that demands action from the installed base, and
* taking the slow steps necessary to maintain interoperability during a transition if any transition happens,

a large operator is using its market position to violate the standards and _create_ interoperability failures as a tool to enforce a protocol change that it wants.

Furthermore, a few weeks before this standards violation is going to be deployed, the stated rationale for the change is undergoing massive revisions, making serious discussion difficult and leading observers to wonder how carefully the change was thought through in the first place.

If you want IETF standards to be taken seriously---if you think that the basic rules of Internet communication should be established by consensus in IETF, and not simply overridden by future developers and operators who think they know better, including cases where you _don't_ agree with them---then you have to stop endorsing standards violations.

Tony Finch writes:
> qmail uses ANY queries for domain canonicalization on outgoing
> messages, as specified by RFC 1123. But canonicalization is not
> required by the current SMTP specification. It is a waste of time.

I fully agree that this was made optional in SMTP---that qmail is no longer required to do this. But how do you leap to the wild conclusion (stated twice in your message) that this is a "bug" in qmail?

More importantly, why do you think this is relevant to anything that I said? I didn't say that qmail's behavior is currently required for SMTP. I said that qmail is very widely deployed and relies upon ANY queries in a way that is guaranteed to work by the mandatory DNS standards.
The dnsop-any-notimp proposal ignores those standards and will create real interoperability problems with mail delivery.

> Using an ANY query suppresses alias processing, so qmail makes a
> series of queries to follow CNAME chains. This is inefficient and
> wasteful.

No, you have the efficiency picture backwards. CNAME chains have always been an extremely small fraction of the DNS queries inside mail, while the QTYPE=ANY side effect of preloading MX/A records has always produced a significantly larger reduction in DNS queries.

This item is something else that you explicitly label as a "bug". You keep using that word; I do not think that word means what you think it means.

> qmail's DNS response buffer is too small to accommodate a complete DNS
> message, so it fails if it gets a large response.

Precisely how many bytes do you believe are in "a complete DNS message"? 65535, the TCP limit? Do you seriously believe that 65535-byte responses work reliably today, and that any failure to handle such responses is a "bug"? How about 512, given the fact that the mandatory DNS standards do _not_ require TCP support? Or maybe 1280, the recommended safe size for EDNS0 UDP (depending on the network etc.)?

Even in an imaginary world where 65535-byte responses work, what do you think happens if a server administrator puts _more_ than 65535 bytes of records at a single node? Do you blame the server administrator? If DNS implementors handle this use case by introducing a DNS transport capable of handling 4-gigabyte responses, will you then claim that there's a "bug" in every DNS client that has less RAM available or that simply insists on a smaller limit to control its resource use?

In fact, all DNS client and server deployments have size limits on DNS responses, and these limits have always varied, making the system as a whole increasingly fragile for the unfortunate administrators pushing the limits (notably DNSSEC administrators).
The only sane way out is for the protocol to declare a single reasonable size limit to be respected by all clients and servers---but this implies reengineering large DNS record types to split data across nodes (the same way that TCP splits streams across packets), and nobody seems willing to do this work. It's much easier to stuff all data into one node, pretend that the system works, and blame the administrators for all of the resulting failures.

I'd be happy to discuss this issue further if anyone is interested in fixing these aspects of the DNS protocol. I would, however, suggest starting a separate thread for that, since this really isn't relevant to how dnsop-any-notimp violates the DNS standards.

---Dan

___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
[DNSOP] dnsop-any-notimp violates the DNS standards
My "qmail" software is very widely deployed (on roughly 1 million SMTP server IP addresses) and, by default, relies upon ANY queries in a way that is guaranteed to work by the mandatory DNS standards.

Specifically, query type ANY "matches all RR types" for that node on that server. There's an example in RFC 1034 of how a CNAME record is returned by "a type CNAME or * query". There's nothing telling clients to avoid this query type; it's perfectly valid for a client to treat a server that refuses this query type as a broken server, because that's exactly what the server is.

Of course, there's no guarantee of which RR types for a node are available at a cache, but a client is guaranteed to be able to detect CNAME records from responses to query type ANY (as qmail does), since a CNAME type prohibits all regular RR types.

I started using these standard ANY queries for interoperability reasons (working around a BIND bug and matching sendmail behavior at the time) but the choice continues to provide some efficiency benefits, larger than many of the efficiency benefits used as rationales in IETF protocols. In new software today I would sacrifice these efficiency benefits for the sake of simplicity, but this doesn't mean that I'm going to frivolously inflict retroactive punishment upon administrators who have installed standards-compliant software and done nothing wrong.

Let's now take a look at draft-ogud-dnsop-any-notimp-00.txt with this background in mind. The I-D specifies behavior that

* violates the existing mandatory DNS standards and that
* breaks interoperability with this standards-compliant use of ANY.

An accompanying blog post says that "in a few weeks" the author's organization will begin deploying this standards violation---with the effect of immediately damaging Internet email delivery. The blog post and I-D contain a number of dubious assertions attempting to justify this change.

I don't mean to suggest that protocols must never change in incompatible ways.
However, modifications to IETF's standard protocols need to be handled through the established IETF procedures, with appropriate respect for the existing standards and the installed base:

* First: The proposed protocol modification has to be taken to an IETF working group chartered to modify the protocol, so that stakeholders will have a proper chance to evaluate and comment on the proposal.

* Second: The merits of the protocol modification have to be properly discussed in that working group, to evaluate the costs and benefits of the protocol modification---and to consider whether there are better ways to achieve the same benefits.

* Third: _If_ the benefits of the modification are judged to outweigh the costs, a sunset period---in this case, a timeline for the client to stop using ANY queries---has to be specified, to avoid interoperability problems. This period has to be several years, recognizing the time required for client administrators to hear about and carry out the necessary redeployment. (It's not as if we're talking about an emergency security change.)

* Fourth: After the sunset period expires, the server will be free to use the modified protocol---in this case, to refuse ANY queries.

I understand how a sufficiently large site might acquire the impression that it can safely take radical action at its own whim, violating the existing protocol standards and as a result creating interoperability problems---but this is making a mockery of the IETF standardization process. This is _not_ a mere operational change within the existing protocol; it is _not_ a private extension using the standard negotiation mechanisms; it is a flagrant violation of the required DNS standards.
My understanding is that dnsop@ietf.org is not chartered to make DNS protocol changes, so any discussion here will have to be repeated in an appropriate working group, but let me nevertheless comment on the two benefits claimed for having servers refuse ANY queries:

* Refusal would reduce DNS amplification: This argument already seems to have been dismissed by various people, and doesn't seem to be defended. It's obviously less effective than standards-compliant approaches such as limiting UDP responses to 512 bytes.

* "Attempting to handle ANY queries creates enormous complexity in our DNS server code base": This is a quite puzzling claim, especially since the specified features (load balancing, geoip, etc.) have been supported for many years by other software that has no trouble handling ANY. What exactly is the claimed difficulty in copying records from, e.g., the "A" key to the "*" key in the underlying database?

Apparently Firefox recently deployed ANY queries. I haven't looked at the details but I gather that they're related to the well-known annoyances of handling etc. Firefox was browbeaten into reverting this cha