On Tue, Sep 29, 2015 at 04:26:38PM -0400, Dave Lawrence wrote:
> David Dagon writes:
> > I have some concerns, which I describe below. [...]
> 
> David,
> 
> Thank you very much for your thoughtful comments.  Broadly speaking, I
> very much agree with the bulk of them.  Yet my current reaction is not
> to make any more alterations to the existing document.  It describes
> the deployed protocol as-is, and your comments are appropriate for
> consideration for the revised protocol, where I can assure you they
> will definitely be integrated.
> 
> Is there something specific about documenting (yet not endorsing) the
> in-use protocol that you think is important to get into the document
> before publication?

I'm preparing more notes, but wanted to offer two more observations:

1)  Testing Sundown?

      -- Many authorities still answer edns-client-subnet queries sent
         during iteration with the draft/testing option code (0x50FA,
         instead of the assigned 0x0008).

      -- Some return an appropriate RFC 1035 RCODE error for
         0x50FA-encoded queries.

      -- Some answer 0x50FA-typed queries with 0x0008 answers.  (This was
         a surprise.)
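
For readers who want to reproduce these tests, here is a minimal sketch of
how the same client-subnet payload can be wrapped in either option code.
The wire layout (option code, option length, family, source prefix length,
scope prefix length, truncated address) follows the published ECS option
format; the helper name encode_ecs_option and the constant names are
hypothetical, chosen only for illustration.

```python
import ipaddress
import struct

DRAFT_ECS_CODE = 0x50FA     # draft/testing option code discussed above
ASSIGNED_ECS_CODE = 0x0008  # IANA-assigned option code


def encode_ecs_option(subnet: str, option_code: int = ASSIGNED_ECS_CODE) -> bytes:
    """Encode one EDNS option carrying a client subnet (illustrative helper)."""
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2      # 1 = IPv4, 2 = IPv6
    source_prefix = net.prefixlen
    # Only as many address bytes as the prefix needs are sent.
    addr_len = (source_prefix + 7) // 8
    addr = net.network_address.packed[:addr_len]
    # The scope prefix length is zero in queries.
    payload = struct.pack("!HBB", family, source_prefix, 0) + addr
    return struct.pack("!HH", option_code, len(payload)) + payload


if __name__ == "__main__":
    print(encode_ecs_option("192.0.2.0/24").hex())
    print(encode_ecs_option("192.0.2.0/24", DRAFT_ECS_CODE).hex())
```

The two encodings differ only in the first two bytes, which is why an
authority can easily accept one code and answer with the other.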

    I wonder if the document you're working on would need to comment
    on this practice.  Some response patterns seem logical (e.g.,
    RCODE=1 Format Error under 1035 s4.1.1), in response to 0x50FA
    option coded queries.  Some are merely helpful (e.g., still
    answering test option coded queries, even after there's an IANA
    assigned field).

    Other behaviors seem helpful for very early testing, but are
    perhaps not a useful status quo and might be discouraged, e.g.,
    returning 0x0008 in response to queries with option 0x50FA, since
    this raises anti-poisoning questions at the recursive.  (Is query
    tuple matching at the recursive to additionally include the option
    code?  If not, that doubles the probability of success for an
    attack.)
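
To make the tuple-matching question concrete, here is a minimal sketch of a
recursive that additionally checks the ECS option code when pairing a
response with an outstanding query.  Everything here is an assumption for
illustration: the PendingQuery record, the response_matches function, and
in particular the policy of accepting a response that carries no ECS
option at all.  It is not taken from any implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PendingQuery:
    """Hypothetical record of a query the recursive has in flight."""
    txid: int
    qname: str
    qtype: int
    server: str
    ecs_option_code: Optional[int]  # None if no ECS option was sent


def response_matches(pending: PendingQuery, txid: int, qname: str, qtype: int,
                     server: str, ecs_option_code: Optional[int]) -> bool:
    """Return True only if the response fits the outstanding query."""
    basic = (
        pending.txid == txid
        and pending.qname.lower() == qname.lower()
        and pending.qtype == qtype
        and pending.server == server
    )
    if not basic:
        return False
    # Assumed policy: a response with no ECS option is still acceptable,
    # since the authority may simply have ignored the option.
    if ecs_option_code is None:
        return True
    # Otherwise the option code must be the one the query used.
    return ecs_option_code == pending.ecs_option_code
```

Under this policy, a 0x0008 answer to a 0x50FA query is dropped rather than
cached.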

    Perhaps if there are authority implementors on list, they can
    clarify the thinking here?  (I'd be particularly interested in
    those zones that formerly answered 0x50FA, and now issue FORMERR
    or similar responses.  That change denotes some re-evaluation, or
    maybe a new tool.)

    I'll have some stats on this shortly, if there's interest.


2)  Probe Delay for Authority Behavior?

    I either don't understand or am not convinced by the draft's
    discussion of a possible probe delay for testing ECS behavior in
    authorities.  Here's my current thinking: A naive in-line
    implementation of probes would of course incur delay when
    iterating to an authority for which a recursive has no cache
    evidence of ECS.  But surely all recursive implementations have
    done other out-of-query-band testing of authorities for ECS
    behavior, at least from what I can determine from my logs.
    (Indeed, some are still manual.)

    Section 12.1 does note the need for periodic probing.  I'm not
    clear why section 12.2 notes a "possible query loss/delay" for
    such probes.  Wouldn't a busy recursive just provide a stock zone
    answer, without subnet localization?  I speculate that, in the
    worst case, the first query for a novel zone results in this
    non-localized answer (sorry; no ECS for novel NS/novel zones; just
    plain vanilla 1034), but after the recursive validates ECS
    awareness (either out-of-band, or through manual whitelisting),
    subsequent queries become subnet aware.

    Worst case, if the whitelisting and/or periodic probing
    contemplated by S.12.2 were a linear scale of the TTL for the NS
    record (or the default for the zone), then even naive, in-line
    querying for ECS would be able to limit "loss/delay" to
    once-per-TTL expiration.  And again, the recursive could avoid
    this, by simply not returning an ECS-endowed message, falling back
    to stock 1034 instead of failure.
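
The fallback described above can be sketched as a small per-nameserver
status cache.  This is a minimal illustration under stated assumptions,
not how any recursive actually does it: the class EcsStatusCache, its
method names, and the idea of a probe_queue drained by an out-of-band
prober are all hypothetical.

```python
import time


class EcsStatusCache:
    """Tracks whether each authority is known to support ECS (illustrative)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}     # ns name -> (supports_ecs, expires_at)
        self.probe_queue = []  # nameservers awaiting an out-of-band probe

    def record_probe(self, ns: str, supports_ecs: bool, ttl: float) -> None:
        """Store a probe result for one TTL period."""
        self._entries[ns] = (supports_ecs, self._clock() + ttl)
        if ns in self.probe_queue:
            self.probe_queue.remove(ns)

    def should_send_ecs(self, ns: str) -> bool:
        """Decide, without blocking, whether to attach ECS to a query."""
        entry = self._entries.get(ns)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        # Unknown or expired: queue a probe and fall back to a plain
        # query, so the client sees no loss or delay.
        if ns not in self.probe_queue:
            self.probe_queue.append(ns)
        return False
```

With this shape, the cost of an unknown or expired nameserver is one
non-localized answer per TTL period, not a lost or delayed query.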

    So I'm afraid I do not understand the "loss/delay" discussion in
    the document.  Granted, it's probably there to motivate the need for
    whitelisting.  But I focus on this, because I'd like to understand
    (and hopefully avoid) any language that diminishes the operational
    value or potential for adding probe records such as this to any
    ECS-aware zone:

       _edns-client-subnet.${HOST}.in-addr.arpa IN TXT "v=ecs1 optin"

    This is operationally not done, AFAIK.  But if it were (and also
    only honored in response to 0x0008 typed queries from the
    recursive), or in some similar form, it would become evident to
    the stubs---the first evidence they'd have of both recursive and
    authority treatment of ECS.  If there are more complexities in NS
    ECS status maintenance, I'd like to better understand them.  There
    are only two implementors of the protocol, AFAIK, so perhaps
    someone can help?
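
As a minimal sketch of how a prober might read such a record: the TXT
format "v=ecs1 optin" is only the hypothetical example given above, not a
deployed convention, and the function name parse_ecs_optin_txt is likewise
made up for illustration.

```python
def parse_ecs_optin_txt(txt: str) -> bool:
    """Return True if a TXT string is a hypothetical 'v=ecs1' opt-in record."""
    tokens = txt.strip().split()
    if not tokens or tokens[0].lower() != "v=ecs1":
        return False  # not an ECS probe record at all
    # Any remaining token equal to "optin" signals ECS awareness.
    return "optin" in (t.lower() for t in tokens[1:])


if __name__ == "__main__":
    print(parse_ecs_optin_txt("v=ecs1 optin"))
```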

I'm still digesting the rest of the document, and running tests.  It's
well written, and helpfully annotated.  I'm just a bit slow in this
process.

I will endeavor in the time that remains for this IETF review to
identify more comments about the draft, which documents current
practices.

My general sense, summarized in my earlier post, is that this protocol
is a significant change due to the re-injection of user metadata,
has caused and will cause user surprise (I use that word descriptively,
based on experience), affects proxies/VPNs and hidden services, and
could be better detailed in some parts (e.g., no encoding for PTR?, MX?,
discussing FORMERR behavior for 0x50FA type queries, etc.).

But I'm also aware that global recursive operators can point to a
competitive need for mirror localization.  In short, "interesting
times".

-- 
David Dagon
da...@sudo.sh
D970 6D9E E500 E877 B1E3  D3F8 5937 48DC 0FDC E717

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop