On Tue, Sep 29, 2015 at 04:26:38PM -0400, Dave Lawrence wrote:
> David Dagon writes:
> > I have some concerns, which I describe below. [...]
>
> David,
>
> Thank you very much for your thoughtful comments.  Broadly speaking, I
> very much agree with the bulk of them.  Yet my current reaction is not
> to make any more alterations to the existing document.  It describes
> the deployed protocol as-is, and your comments are appropriate for
> consideration for the revised protocol, where I can assure you they
> will definitely be integrated.
>
> Is there something specific about documenting (yet not endorsing) the
> in-use protocol that you think is important to get into the document
> before publication?

I'm preparing more notes, but wanted to offer a few more observations:

1) Testing Sundown?

 -- Many authorities still answer edns-client-subnet iterative queries
    that use the draft/testing option code (0x50FA, instead of the
    assigned 0x0008).

 -- Some return an appropriate RFC 1035 RCODE error for 0x50FA-encoded
    queries.

 -- Some answer 0x50FA-typed queries with 0x0008 answers.  (This was a
    surprise.)

I wonder if the document you're working on would need to comment on
this practice.

Some response patterns seem logical (e.g., RCODE=1 Format Error under
RFC 1035 s4.1.1) in response to 0x50FA option-coded queries.  Some are
merely helpful (e.g., still answering test option-coded queries, even
after there's an IANA-assigned code).

Other behaviors seem helpful for very early testing, but are perhaps
not a useful status quo and might be discouraged, e.g., returning
0x0008 in response to queries with option 0x50FA, since this raises
anti-poisoning questions at the recursive.  (Is query tuple matching at
the recursive to additionally include the option code?  If not, that
doubles the probability of success for attack.)

Perhaps if there are authority implementors on list, they can clarify
the thinking here?  (I'd be particularly interested in those zones that
formerly answered 0x50FA, and now issue FORMERR or similar responses.
That change denotes some re-evaluation, or maybe a new tool.)

I'll have some stats on this shortly, if there's interest.

2) Probe Delay for Authority Behavior?

I either don't understand or am not convinced by the draft's discussion
of a possible probe delay for testing ECS behavior in authorities.
Here's my current thinking:

A naive in-line implementation of probes would of course incur delay
when iterating to an authority for which a recursive has no cache
evidence of ECS.  But surely all recursive implementations have done
other out-of-query-band testing of authorities for ECS behavior, at
least from what I can determine from my logs.  (Indeed, some are still
manual.)

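To make the matching question in (1) concrete, here is a rough sketch of
what "include the option code in the match" could look like.  The
option layout (code, length, family, source and scope prefix lengths,
truncated address) follows the draft; the function names and the
match_code switch are mine, for illustration only, and are not taken
from any implementation.

```python
import ipaddress
import struct

ECS_ASSIGNED = 0x0008  # IANA-assigned option code
ECS_DRAFT = 0x50FA     # draft/testing option code still seen in the wild


def encode_ecs(option_code, subnet, scope_len=0):
    """Encode an edns-client-subnet option as it appears in OPT RDATA."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2
    # Only the bytes needed to cover the source prefix are sent.
    nbytes = (net.prefixlen + 7) // 8
    addr = net.network_address.packed[:nbytes]
    body = struct.pack("!HBB", family, net.prefixlen, scope_len) + addr
    return struct.pack("!HH", option_code, len(body)) + body


def response_matches(query_option, response_option, match_code=True):
    """Hypothetical acceptance check at the recursive (an assumption).

    Compares family, source prefix length and address; the scope prefix
    length is ignored because the authority is expected to set it.
    With match_code=True the option code must also be identical.
    """
    q_code = struct.unpack("!H", query_option[:2])[0]
    r_code = struct.unpack("!H", response_option[:2])[0]
    if match_code and q_code != r_code:
        return False
    q_family, q_src = struct.unpack("!HB", query_option[4:7])
    r_family, r_src = struct.unpack("!HB", response_option[4:7])
    return (q_family, q_src, query_option[8:]) == (
        r_family, r_src, response_option[8:])
```

Under this sketch, a 0x0008 answer to a 0x50FA query is rejected when
match_code is set and accepted when it is not, which is exactly the
choice I'd like implementors to clarify.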
Section 12.1 does note the need for periodic probing.  I'm not clear
why section 12.2 notes a "possible query loss/delay" for such probes.
In the worst case, wouldn't a busy recursive just provide a stock zone
answer, without subnet localization?

I speculate that, in the worst case, the first query for a novel zone
results in this non-localized answer (sorry; no ECS for novel NS/novel
zones; just plain vanilla RFC 1034), but after the recursive validates
ECS awareness (either out-of-band, or through manual whitelisting),
subsequent queries become subnet aware.

Worst case, if the whitelisting and/or periodic probing contemplated by
section 12.2 were a linear scale of the TTL for the NS record (or the
default for the zone), then even naive, in-line querying for ECS would
be able to limit "loss/delay" to once per TTL expiration.  And again,
the recursive could avoid this by simply not sending an ECS-endowed
message, falling back to stock RFC 1034 behavior instead of failure.

So I'm afraid I do not understand the "loss/delay" discussion in the
document.  Granted, it's probably there to motivate the need for
whitelisting.  But I focus on this because I'd like to understand (and
hopefully avoid) any language that diminishes the operational value or
potential for adding probe records such as this to any ECS-aware zone:

    _edns-client-subnet.${HOST}.in-addr.arpa IN TXT "v=ecs1 optin"

This is operationally not done, AFAIK.  But if it were (and also only
honored in response to 0x0008-typed queries from the recursive), or in
some similar form, it would become evident to the stubs: the first
evidence they'd have of both recursive and authority treatment of ECS.

If there are more complexities in NS ECS status maintenance, I'd like
to better understand them.  There are only two implementors of the
protocol, AFAIK, so perhaps someone can help?

I'm still digesting the rest of the document, and running tests.  It's
well written, and helpfully annotated.  I'm just a bit slow in this
process.

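To illustrate the worst-case argument in (2), here is a minimal sketch
of per-nameserver ECS status tracking, under my own assumptions: status
is cached for the NS TTL, an unknown or expired entry gets a plain
(non-ECS) query right away, and the probe is scheduled out of band.
The class and function names are hypothetical and do not describe any
existing recursive.

```python
import time


class EcsStatusCache:
    """Hypothetical per-nameserver record of ECS support."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._status = {}  # nameserver -> (supports_ecs, expires_at)

    def record(self, nameserver, supports_ecs, ttl):
        """Store a probe (or whitelist) result for ttl seconds."""
        self._status[nameserver] = (supports_ecs, self._clock() + ttl)

    def lookup(self, nameserver):
        """Return True/False for a fresh entry, None if unknown or expired."""
        entry = self._status.get(nameserver)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


def plan_query(cache, nameserver):
    """Return (send_ecs, schedule_probe) for the next query to nameserver."""
    status = cache.lookup(nameserver)
    if status is None:
        # No evidence yet: answer the client via a plain query now and
        # probe separately, so the client sees neither loss nor delay.
        return (False, True)
    return (status, False)
```

With this shape, the only cost of an unknown or expired entry is one
non-localized answer per TTL period, rather than a lost or delayed
query.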
I will endeavor, in the time that remains for this IETF review, to
identify more comments about the draft, which documents current
practices.  My general sense, summarized in my earlier post, is that
this protocol is a significant change due to the re-injection of user
metadata, has caused or will cause user surprise (I use that word
descriptively, based on experience), affects proxies/VPNs and hidden
services, and could be better detailed in some parts (e.g., no encoding
for PTR?, MX?, discussing FORMERR behavior for 0x50FA-typed queries,
etc.).  But I'm also aware that global recursive operators can point to
a competitive need for mirror localization.

In short, "interesting times".

-- 
David Dagon
da...@sudo.sh
D970 6D9E E500 E877 B1E3 D3F8 5937 48DC 0FDC E717

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop