On Wed, Sep 16, 2015 at 12:05:58PM -0400, Suzanne Woolf wrote:

>  The current version is at:
> http://datatracker.ietf.org/doc/draft-ietf-dnsop-edns-client-subnet/
> <http://datatracker.ietf.org/doc/draft-ietf-dnsop-edns-client-subnet/>

I have some concerns, which I describe below.  I've attempted to offer
constructive improvements, but fell short of detailed suggestions.  My
comments are lengthy, even by my standards, so my general observations
are summarized:

1)  "The cost/benefit ratio must not divide by zero".

   -- The draft might account for situations where the zone offers no
      localized records, yet the authorities nonetheless engage in the
      protocol.  (I.e., the authorities witness user network
      information, because of whitelisting, but offer only non-diverse
      answers to any such query.  All cost; no benefit.)

   -- I motivate this issue by describing a global survey of who is
      using the protocol, and whether they actually have need for
      CDN-services, based on active and passive enumeration of their
      zone content.  A category of such authorities, mainly
      governments, have their authorities whitelisted, but do not
      offer localized responses.  I suggest they are unlikely to
      engage in localized content, for policy reasons.

  ==> Perhaps the whitelisting process may account for this situation,
      to avoid undesired information transfer, when the authority
      objectively cannot improve or localize the user's subsequent TCP
      sessions.  (I.e., any implicit cost/benefit ratio considered by
      the draft might guard against DIVZERO.)

2) "Protocol implementations should avoid surprise".

   -- The privacy considerations might be improved, to consider
      situations where proxies are used for TCP traffic, but not UDP.
      There are numerous VPNs, web proxies, and privacy technologies
      that behave this way, and the protocol would now transmit user
      information to authorities, all while the users direct TCP
      through a proxy.  Likely this is not desired.  Perhaps the draft
      can address this, somehow.  

   -- I'm skeptical users should signal a desire for protocol
      avoidance in such situations, since this merely makes their
      sensitive data uses easier to identify.  Perhaps the best one
      might do here is counsel network operators who recommend
      protocol-enabled recursives, which formerly were viewed as mixnets
      in which to hide the non-proxied udp/53 traffic separate from
      the (vpn, http proxy) tcp traffic.

   -- To motivate this, I conducted an experiment, to induce such
      'information loss' for various TCP-only services.  This included
      some anonymity networks such as hidden services, and others.

  ==> While the draft might remain agnostic about such events (or view
      it as the responsibility of the user or 3rd-party tool maker),
      some discussion is warranted, to avoid surprise.

3) "Not All Queries Prognosticate Impending Real-Time TCP"

   -- There is no need, as far as I can tell, for protocol responses
      in certain query types.  E.g., PTR?, which is not necessarily or
      reliably an indicator of an immediate user TCP session.
      Likewise, not all answers preceding a "real time" user TCP
      session require balancing (or the rrtype already includes some
      implicit weighting/distance metrics, or the protocol is
      queue-based and not real-time, e.g., MX, suggesting perhaps
      that localized responses are not needed.)

  ==> PTR, MX, and other records might be excluded in the draft, or
      the draft amended to suggest a type-specific consideration by
      the protocol implementors.  Or, perhaps improved whitelisting
      methodologies might be described.
       
In detail:

The current draft states that it "provides recommendations on when
this protocol extension should be used."

One early recommendation suggests that the "option will primarily be
used between Recursive Resolvers and Authoritative Nameservers that
are sensitive to network location issues."  (Section 5, "Overview").
I would recommend stronger wording than 'primarily' (either here or in
other places).

Following a survey of all NS one could discover (zone AXFR, NS queries
to parent and child NS, passive DNS, etc.) and also reach (via
instrumented DNS probes), it appears that edns-client-subnet ("ECS")
is widely used in situations where it provides no benefit to users.
For example, in a survey of the NS for the 'top million' domains
(however that is defined, e.g., Alexa), I can identify many
nameservers that:

  -- Are authority for a given zone, and few other zones (i.e., they
     are not likely a commercial provider, who enabled ECS by
     default).

  -- Have, according to passive dns and geo-diverse interaction with
     the authority (i.e., open recursive probes from diverse /24s)
     only one RRset to offer.  (Informally, there's only one possible
     answer in the only NS serving the zone, and no possible
     location-based service improvement.)

  -- Are not anycasted.

  -- Speak ECS to my iterative probes.  (Informally: With this, I
     presume they are whitelisted by existing ECS recursive
     implementers, either automatically or as a candidate for likely
     manual listing).

  -- Have low TTLs.  (Admittedly, that term is not well defined, and
     there's a long-term trend towards short TTLs for agility and
     expedited 'event' recovery).

  -- In many cases, are governments, hosting their own non-CDN
     infrastructure.

A note on that last item: To be clear, there's nothing wrong with
governments implementing ECS on their nameservers, and with recursives
sending them user network data.  But a survey of the 'top ECS'
implementers shows that many use ECS, despite having 'singleton
RRsets'.  (I.e., there's no evident CDN balancing of the zone, or
'benefit' to weigh against the privacy 'cost'.)  I only noted the
prevalence of this last property by accident, when reasoning about why
some zones collected user network information, but did not appear to
use it in any witnessed response (either through my probes, or as
evidenced via passive DNS).

I decided to note that some are governments, merely to underscore
their status as political authorities, who for policy reasons are
unlikely to export their DNS authorities.  (I.e., if they've not
engaged commercial secondaries to date, and keep all authority
operations "in country" on their own networks, they'd seem less
likely to do so in the future.)  Thus, they do not need (and as
governments, are unlikely to want) commercial ECS-enabled secondary
services.  I do not know why some, who offer no localized answers
anyway, took the trouble to enable the protocol (and presumably
become ECS whitelisted).

But focusing on just the first four items noted above, I suggest that
the draft include language that recommends against/prohibits
ECS-whitelisting such 'RRset singletons'.  Apart from the 'sexy'
question of why some organizations use ECS in their otherwise simple
zones, either by accident of service contract or explicit intent, we
objectively can measure that there's no demonstrated engineering
benefit from sending user address information to authorities for
non-diverse zone records.  I.e., this is the 'DIVZERO' problem for a
cost/benefit reasoning about the protocol.
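
The 'RRset singleton' test described above can be sketched roughly as
follows.  This is only an illustration in Python, assuming the probe
results have already been collected as (client subnet, rdata list)
pairs.  The function names, the min_subnets threshold, and the sample
data (documentation address ranges) are hypothetical; they come from
neither the draft nor any existing whitelisting process.

```python
def distinct_rrsets(observations):
    """Return the set of distinct RRsets seen across all probed subnets.

    observations: iterable of (client_subnet, rdata_list) pairs, one per
    probe sent with a different ECS source subnet.
    """
    return {frozenset(rdata) for _subnet, rdata in observations}


def ecs_whitelist_candidate(observations, min_subnets=3):
    """True only if answers actually vary with the client subnet.

    min_subnets is a hypothetical evidence threshold: with fewer probed
    subnets we assume no demonstrated benefit and decline.
    """
    subnets = {subnet for subnet, _rdata in observations}
    if len(subnets) < min_subnets:
        return False
    return len(distinct_rrsets(observations)) > 1


# Illustrative data only (RFC 5737 documentation prefixes).
singleton_zone = [
    ("192.0.2.0/24", ["203.0.113.10"]),
    ("198.51.100.0/24", ["203.0.113.10"]),
    ("203.0.113.0/24", ["203.0.113.10"]),
]
localized_zone = [
    ("192.0.2.0/24", ["203.0.113.10"]),
    ("198.51.100.0/24", ["203.0.113.20"]),
    ("203.0.113.0/24", ["203.0.113.10"]),
]

print(ecs_whitelist_candidate(singleton_zone))  # False
print(ecs_whitelist_candidate(localized_zone))  # True
```

Comparing RRsets as unordered sets means simple round-robin rotation of
the same records does not count as diversity.  A real survey would
still need to separate location-based answers from random load
balancing, which this sketch does not attempt.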

To fully understand the potential loss of privacy, I decided to try
and 'abuse' ECS myself: what can be learned about user network
origins, while at the same time providing no answer diversity?  I set
up some domains associated with the following categories of interests:
document leaking, torrent use, bitcoin, and tor hidden service use.  I
had the authorities for the associated domains ECS whitelisted, and
witnessed the following from the authority logs:

   -- Because of DNS prefetching, I now see (at a /24 level) the IPs
      of hosts visiting specific pages on sites such as Wikileaks
      forums, various torrent sites, and some tor hidden services.
      This observation even holds after controlling for zone scraping,
      hosts that appear to just resolve everything, academic and
      industry researchers, and the normal "DNS radiation" that comes
      from NS delegation.  

      And because the hosts also had associated http content, I could
      eliminate ECS data that came as a result of attempted http
      scraping (e.g., web crawlers/indexers) or actual user visits.

      That is, I can craft an ECS-lit zone to observe when users visit
      a 3rd-party forum or exhibit an interest in a given topic, even
      if they don't visit my site (due to DNS prefetching).

   -- I can in some instances identify (again, only at a /24 level)
      networks hosting Tor hidden services.  There is much detail to
      explain here, but in short, many privacy networks, VPN operators
      and "how to" pages generally instruct users to use Google or
      OpenDNS, as a type of informal 'Chaum-mixnet'---essentially, a
      large 'haystack' proxy, where the attributable origins of
      queries can be hidden among a much larger pool of other user
      traffic.
  
      While not a perfect mitigation of "DNS leaks" that challenge
      privacy networks, the "just use Google/OpenDNS" practice remains
      very common, particularly among commodity and low-end VPN
      providers (but also among implementations such as ttdns, tails,
      etc.).  A web search for the IPs for the public recursives and
      the term 'proxy' or such, will literally turn up hundreds of
      sites, howtos, and tools used by multitudes.  The average use
      case of such tools is not always evident, but the
      torrent-related ECS zones saw considerable quantities of udp/53
      traffic (with ECS payloads) from users who otherwise proxied
      their TCP traffic.

      The original advice to "just use Google or OpenDNS" was
      presumably sound, or at least a passable version of a mixnet,
      for such privacy networks.  However, the recursives identified
      as "DNS Chaum mixnets" by these proxy services are now ECS
      implementers, and therefore permit fine-grained tracking of
      queries.  I believe this would surprise most Tor and VPN users,
      since the ability to attribute traffic becomes easy, and the
      impact of a DNS leak goes from "potentially harmful" to
      "attributable".  This is especially true in situations where the
      origin network has a low query volume, the subject zone has a
      low TTL, and the privacy tool treats TCP and UDP paths
      separately.  It seems with only rare exception, the implementers
      of these privacy tools have historically recommended recursives
      that are now ECS implementers.

      The current draft contemplates some loss of user privacy, and I
      thank the authors for giving this space for consideration.  But
      the analysis now seems limited, in light of this exercise, and
      excludes the scenarios I tested (low-end VPN privacy leaks via
      ECS, revelation of Tor hidden services, enriched collection of
      user network data in non-UDP proxy privacy tools).

      The current draft makes the observation (quite reasonably IMHO),
      that the /24 network information is hardly a significant loss
      for a user about to make a TCP connection (and thereby revealing
      a /32) to a service.  I.e., an operator will see user host data
      in the httpd logs, so revealing the stub's /24 via DNS is hardly
      a significant cost, and it permits judicious improvements in
      services for the user and subsequent queries handled in cache.
      All this seems logical, and originally I found this very
      persuasive.

      But the draft does not contemplate DNS resolutions in support of
      privacy-sensitive applications, such as (what I call) the
      "low-end" VPN services (namely those that don't proxy both TCP
      and UDP traffic, and therefore have a half-decade's worth of
      users still relying on current ECS implementers to continue to
      act as Chaum mixnets for DNS.)

      I suggest the draft should address such privacy sensitive
      scenarios, though I'm at a loss to know how a user might signal
      to a recursive that it needs recursion for privacy-sensitive
      applications.  I also don't think such signaling would be
      wise---it merely illuminates such queries for any opportunistic
      measurement.  Perhaps at the least, the draft might suggest that
      local operators, tool makers, and privacy-based software systems
      consider this issue.  In some cases, such tools leak private
      user data; in other cases, the udp traffic is also proxied,
      resulting in the revelation of the proxy nodes' network
      information.

      I fully expect that such proxy users don't expect "speedy"
      mirror selection, or other benefits offered by the draft.  Such
      users have opted already for privacy over convenience.  I do not
      know how the current draft can preserve that bargain, or even
      communicate that preference.  This seems troubling, now that
      recursives that formerly operated as mixnets are now offering
      the user /24 data, in exchange for no benefit (due to the
      proxied TCP flow).

      This dilemma (not merely speculated, but actually happening to
      such tool users), and my inability to find a creative solution,
      are ultimately why I've embraced Stephane Bortzmeyer's
      suggestion that the protocol be made opt-in rather than opt-out.

      This might not be black and white.  Perhaps this means that some
      curated user device platforms (e.g., Android) opt-in to a new
      ECS-enabled service, or something.  While the draft can avoid
      such details, I urge that it either find the creative solution
      that eludes me (how to signal privacy preference, for existing
      users expecting a mixnet?), or that it counsel careful study by
      recursive adopters.  I suggest: Do not surprise the people who
      have, for 5+ years, assumed (wisely or otherwise) that the
      recursive is a mixnet.  It might not have been, but it clearly
      is not, if it speaks the protocol.

   -- There is no protocol-level indication to a stub user which
      recursives are ECS enabled.  I believe one has to repeat my
      steps, and read blogs to find that Google and OpenDNS have
      adopted this protocol.  Perhaps other recursives are ECS enabled
      as well.

      But without fail, everyone I've shown ECS has been
      surprised---some even urgently ending meetings to check their
      infrastructure for 'data leaks' (their words, not mine).  This
      has acutely impressed those DNS users who try to identify
      abusive domains, and find it useful to resolve domains through a
      large service like Google or OpenDNS (again, using them as an
      informal Chaum-mixnet for recursion), simply because malicious
      authorities often attempt to identify such researchers and
      corrupt or ban their recursive traffic.

      I suggest that the draft seek to minimize this surprise, through
      some sort of discovery.  (How can users meaningfully opt-out, if
      they cannot easily test for ECS-enabled recursion?)  I suggest
      that recursives speaking ECS exhibit this behavior to the stubs.
      Others have suggested such a record:

        _edns-client-subnet.${HOST}.in-addr.arpa IN TXT "v=ecs1 optin"

      which could be DNSSEC signed as well.  As an unrepentant abuser
      of TXT records, I'm usually cautious of serious applications
      using TXT policy encodings.  But others I've spoken to have
      given this aspect more serious thought.  Perhaps they'll suggest
      something detailed enough to both convince and assist the
      authors.

      In short: if opt-out is to become more meaningful, users need
      paths beyond web logs to discover ECS behavior.

   -- If opt-out is the majority view of either this group or those
      implementing ECS, then I suggest that opt-out tools be made
      widely available.  If I'm correct, there's currently only a
      development branch of unbound that supports a zero-scoped
      forward from the stub.  Suggesting that users can 'opt-out',
      when there are few tools, seems prematurely hopeful.
      (Informally, we call this the "go pound sand" clause of the
      proposal, since users must potentially write their own stub
      handler, or switch production resolution to development branches
      of code.  I've never found a user educated about ECS to be
      satisfied with this suggestion of the draft.)

      I should note that opt-out is in fact honored by all ECS-enabled
      recursives I can find, and I can find no violations of requested
      stub-to-recursive scope restrictions.

      At a minimum, I suggest we learn how users opt-out, and whether
      the opt-out process needs refinement in this draft.  (I.e., is
      this merely an engineering/tools exercise, or could the IETF
      draft help uphold user privacy through more direct means in this
      draft?)  I simply do not know the answer here, beyond the
      (unanimous) request from users I've chatted with to somehow
      opt-out, while still retaining the other benefits of the global
      recursives.  Please note that in some cases, the stub's path to
      an ECS-enabled recursive is part of a firmware image (e.g., most
      of the wifi hotspot devices), which even the hobbyist users
      find difficult to amend or filter for opt-out.

   -- The ECS draft might be amended to exclude certain query types.
      As another experiment, I obtained a reverse delegation for a
      netblock, and had the authority ECS whitelisted.  I then could
      observe reverse scanning by hosts---clearly a needless
      solicitation of user network data, for an answer that will never
      be tempered by localized mirror selection.

      I hope others will agree, there is no use for these ECS
      augmented PTR answers beyond logging on the remote host
      side---there's no TCP session to be balanced, no CDN to
      consider, no mirror to select.  There's no point in ECS in PTR?,
      that I can see.  The same may apply to MX and many other record
      types I've not deeply considered.  Informally, they provide no
      'CDN benefit' in exchange for the 'cost' of user privacy, or
      they portend only queue-based TCP flows (for which mirror
      selection and localization are less urgent), or the rdata
      definition includes 'weights' or distance metrics (e.g., MX),
      making ECS redundant.

      I suggest the draft limit the use of ECS to those query types
      likely to benefit from global balancing, where possible.  Absent
      an exhaustive listing of types, the draft might discourage the
      protocol use (i.e., either enabled by authorities or honored by
      recursives) where the answer cannot reasonably benefit from
      localized rdata.
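
A type-specific restriction of this kind could look roughly like the
sketch below, written from the recursive's side.  It is only an
illustration in Python: the allow-list contents (A and AAAA) and the
names ECS_ELIGIBLE_QTYPES and should_send_ecs are assumptions made
here, not anything the draft specifies.

```python
# Hypothetical allow-list of query types; the draft defines no such list.
ECS_ELIGIBLE_QTYPES = frozenset({"A", "AAAA"})


def should_send_ecs(qtype, authority_whitelisted):
    """Decide whether a recursive attaches client subnet data to a query.

    ECS is attached only when the authority is whitelisted AND the query
    type is one where a localized answer is plausible.
    """
    if not authority_whitelisted:
        return False
    return qtype.upper() in ECS_ELIGIBLE_QTYPES


for qtype in ("A", "AAAA", "PTR", "MX"):
    print(qtype, should_send_ecs(qtype, authority_whitelisted=True))
```

An allow-list (rather than a deny-list of PTR, MX, and so on) fails
closed: a query type nobody has considered yet sends no user network
data by default.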

On a final note, I'd like to thank the authors of the draft for their
patient and thoughtful consideration.  In the course of this inquiry,
I've come to understand the operational benefits they seek to provide
users, and how it will make the Internet a better experience.  My
research focus is on networks and operators that seek to do literally
the opposite.  I hope that the caution I've suggested above does not
in any way diminish anyone's estimation of the proponents' efforts.

I apologize in advance for all the errors in this post.  Time is
short, and I believe I needed to say something on this topic.  We have
prepared a detailed study of this problem, describing the above tests
and experiments.  I can put others in touch with the student author, if
they need more details.  The paper is under consideration in academic
contexts, which requires anonymous submission.  (And ironically, many
of the DNS authorities for these privacy conferences collect user ECS
data, evidently without beneficial purpose.)

-- 
David Dagon
da...@sudo.sh
D970 6D9E E500 E877 B1E3  D3F8 5937 48DC 0FDC E717

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
