On Wed, Nov 21, 2018 at 01:53:09PM +0000, Sara Dickinson wrote:
> 
> 
> > Begin forwarded message:
> > 
> > From: Benjamin Kaduk <ka...@mit.edu>
> > Subject: Benjamin Kaduk's Discuss on 
> > draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" <i...@ietf.org>
> > Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org, Tim Wicinski
> > <tjw.i...@gmail.com>, dnsop-cha...@ietf.org, tjw.i...@gmail.com,
> > dnsop@ietf.org
> > Resent-From: <alias-boun...@ietf.org>
> > Resent-To: j...@sinodun.com, j...@sinodun.com, s...@sinodun.com,
> > terry.mander...@icann.org, john.b...@icann.org
> 
> Many thanks for the detailed review. 
> 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > It is pretty shocking to not see any discussion of the privacy
> > considerations of storing data including client addresses (and ports)
> > alongside DNS transactions, given how central DNS resolution is to user
> > behavior on the web.  (Note that there are mentions of potentially
> > anonymized data in Sections 6.2 and 6.2.3 which would presumably
> > forward-reference the privacy considerations.)  Data normalization would
> > probably also be mentioned in this section, since (e.g.) the case used for
> > a query/response could be used in fingerprinting an implementation.
> 
> There has been extensive discussion of data storage risks and practices in 
> two DPRIVE documents so I’d suggest the following changes in the first 
> instance to address this:

This is exactly the sort of thing I was hoping to see, thank you!  I have
just a couple tweaks to suggest, inline.

> New Privacy Considerations section:
> “Storage of DNS traffic by operators in PCAP and other formats is a
> long-standing and widespread practice. Section 2.5 of
> draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet
> users of the storage of DNS traffic data in servers (recursive resolvers,
> authoritative and rogue servers).
> 
> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those 
> risks for data stored on recursive resolvers (but which could by extension 
> apply to authoritative servers). These include data handling practices and 
> methods for data minimisation, IP address pseudonymization and anonymization. 
> Appendix B of that document presents an analysis of 7 published anonymization 
> processes. In addition, RSSAC have recently published RSSAC040:
> "Recommendations on Anonymization Processes for Source IP Addresses Submitted
> for Future Analysis" [1].
> 
> The above analyses consider full data capture (e.g. using PCAP) as a
> baseline for privacy considerations and therefore this format
> specification introduces no new user privacy issues beyond those of full
> data capture. It does provide mechanisms to selectively record only

I would say "beyond those of full data capture (which are quite severe)".
That is, while the current state of affairs is a valid baseline for
comparison, that does not absolve us of responsibility for analyzing the
current state of affairs.  (To be clear,
draft-bortzmeyer-dprive-rfc7626-bis is a fine place for the bulk of that
analysis to live, but in this document we should not pretend that the
current state of affairs is a good situation to be in.)

> certain fields at the time of data capture to improve user privacy and to
> explicitly indicate that data is sampled and/or anonymised. It also
> provides flags to indicate if data normalisation has been performed; data
> normalisation increases user privacy by reducing the potential for
> fingerprinting individuals however a trade-off is potentially reducing

I think "however" would be offset by commas on both sides.

> the capacity to identify attack traffic via query name signatures.
> Operators should carefully consider their operational requirements and
> privacy policies and SHOULD capture at source the minimum user data
> required to meet their needs.”
> 
> [1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf
> 
> 
> As noted, there are a few other places we can also highlight the privacy 
> aspects:
> 
> Introduction:
> OLD: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in 
> practice for packet captures, but these file formats can contain a great deal 
> of additional  information that is not directly pertinent to DNS traffic 
> analysis  and thus unnecessarily increases the capture file size.”
> 
> NEW: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in 
> practice for packet captures, but these file formats can contain a great deal 
> of additional  information that is not directly pertinent to DNS traffic 
> analysis  and thus unnecessarily increases the capture file size. 
> Additionally, these tools and formats typically have no filter mechanism to 
> selectively record only certain fields at capture time, requiring 
> post-processing for anonymisation or pseudonymisation of data to protect user 
> privacy.”
> 
> Section 4, bullet point 2:
> 
> OLD: “Different users will have different requirements
>           for data to be available for analysis.  Users with minimal
>           requirements should not have to pay the cost of recording full
>           data, though this will limit the ability to perform certain
>           kinds of data analysis and also to reconstruct packet
>           captures.  For example, omitting the resource records from a
>           Response will reduce the C-DNS file size; in principle
>           responses can be synthesized if there is enough context.”
> 
> NEW: “Different operators will have different requirements
>           for data to be available for analysis.  Operators with minimal
>           requirements should not have to pay the cost of recording full
>           data, though this will limit the ability to perform certain
>           kinds of data analysis and also to reconstruct packet
>           captures.  For example, omitting the resource records from a
>           Response will reduce the C-DNS file size; in principle
>           responses can be synthesized if there is enough context.
>           Operators may have different policies for collecting user data
>           and can choose to omit or anonymise certain fields at
>           capture time, e.g. client address.”
> 
> And yes, in both sections 6.2 and 6.2.3 add forward references to the Privacy 
> Considerations section
> 
> 
> > 
> > I'm also concerned about the policy/procedure for allocating/extending the
> > various bitfields and similar potential extension points in the data
> > structures.  Section 8 covers the major/minor versioning semantics with
> > respect to new map keys and new maps, but not addition of new bits within
> > existing (uint) bitmaps.  Given the usage of the CDDL .bits constraint,
> > it's not really clear that an IANA registry is the right tool to use, but I
> > think some indication of the expected way to allocate new bits is in order,
> > whether it's "a future standards-track document that updates this document"
> > or otherwise.  (I've noted many, but not all, instances of such bitmaps in
> > my COMMENT section.)
> 
> We are inclined to follow the lead of existing RFCs making use of CBOR, namely
> * RFC8152 'CBOR Object Signing and Encryption' (July 2017)
> * RFC8392 'CBOR Web Token (CWT)' (May 2018) and 
> * RFC8428 'Sensor Measurement Lists (SenML)' (Aug 2018) 
> and request IANA create a C-DNS registry with
> subregistries with keys for each of the different maps used in C-DNS.
> New entries in these subregistries would follow Expert Review as defined
> in RFC8126. This appears to be the emerging usual way of dealing with
> CBOR map key values, particularly integer ones.

That sounds like a fine path forward, thanks.

> > 
> > There are also a couple of fields whose semantics don't seem to be
> > sufficiently well specified for a proposed-standard document, such as
> > vlan-ids, generator-id, name-rdata, and ae-code.  (I understand that some
> > of them are probably only going to have locally relevant semantics, but we
> > should be explicit about when that's the case.)
> 
> Acknowledged, we’ll add references or clarifications for these (will put 
> details in a follow up mail that will also address your comments below).

Sounds good.

> > 
> > If I'm reading things correctly that the IP address type is inferred from
> > the bytestring length, then I think we need to enforce a restriction on the
> > address prefix length(s) to allow for that inference to be unambiguous
> > (noting that we only have the *byte* length of the address fields at our
> > disposal for disambiguation, and not the more precise bit-length).
> 
> Ah, the first bit of the qr-transport-flags contains an IPv4/IPv6 flag, so the 
> address type can be explicitly determined from that if it is set. But of 
> course there is a corner case we hadn’t considered, where that field isn’t 
> present, so we’ll have to address that. Making that field mandatory if 
> prefixes are used would be simplest. 

I guess I had forgotten about that bit in the qr-transport-flags on my
first read.  Making it mandatory if prefix lengths are present ought to
work.
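
As an illustration only (not text from the draft), a rough Python sketch of
the ambiguity might look like the following. It assumes that a truncated
address is stored in the smallest whole number of bytes that holds the
prefix, and that "first bit" means the least significant bit of
qr-transport-flags with 0 = IPv4 and 1 = IPv6. The helper names
stored_prefix_bytes and family_from_flags are hypothetical.

```python
import ipaddress


def stored_prefix_bytes(addr: str, prefix_bits: int) -> bytes:
    """Keep only the bytes needed to hold the first prefix_bits of addr.

    Assumption: storage is rounded up to whole bytes and the host bits
    are zeroed before truncation.
    """
    net = ipaddress.ip_network(f"{addr}/{prefix_bits}", strict=False)
    n_bytes = (prefix_bits + 7) // 8  # round up to whole bytes
    return net.network_address.packed[:n_bytes]


def family_from_flags(qr_transport_flags: int) -> str:
    """Assumed layout: least significant bit 0 = IPv4, 1 = IPv6."""
    return "IPv6" if qr_transport_flags & 0x01 else "IPv4"


v4 = stored_prefix_bytes("192.0.2.55", 32)
v6 = stored_prefix_bytes("2001:db8::1", 32)

# Both byte strings have the same length, so length alone cannot
# distinguish a full IPv4 address from a 32-bit IPv6 prefix.
print(len(v4), len(v6))

# A 33-bit IPv6 prefix needs one byte more than any IPv4 address.
print(len(stored_prefix_bytes("2001:db8::1", 33)))

# With the flag present, the address family is explicit.
print(family_from_flags(0b1))
```

Under those assumptions, either restricting the IPv6 prefix length to 33 or
more, or requiring the flag whenever prefixes are in use, removes the
ambiguity.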

-Benjamin

> 
> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Section 2
> > 
> > Please consider using the RFC 8174 version of the BCP 14 boilerplate.
> > 
> > Section 3
> > 
> >   Because of these considerations, a major factor in the design of the
> >   format is minimal storage size of the capture files.
> > 
> > maybe "storage and transmission"?
> > 
> > Section 6
> > 
> > In Figure 2, the Query name is marked as "(q)" (only present if there is a
> > query), but the running text in Section 4 (bullet 1) says that the Question
> > section from the response can be used as an identifying QNAME if there is a
> > response with no corresponding query.  Am I misexpanding QNAME here, or is
> > there a disagreement between these two parts of the text?  In particular, I
> > do not see a part of Figure 2 that would correspond to a Question section
> > in the response, given the various "(q)"/"(r)" markings.
> > 
> > Section 6.2.2
> > 
> >   Messages with OPCODES known to the recording application but not
> >   listed in the Storage Parameters are discarded (regardless of whether
> >   they are malformed or not).
> > 
> > (Do we need to say anything that the "discarded" is only w.r.t. the capture
> > process, and not meant to imply that DNS queries would not get a normal
> > response?)
> > 
> > Section 6.2.4
> > 
> > Please consider using IPv6 examples, per
> > https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .
> > 
> > Section 7.2
> > 
> >   o  The column T gives the CBOR data type of the item.
> > 
> >      *  U - Unsigned integer
> > 
> >      *  I - Signed integer
> > 
> > This is venturing a bit far from my normal area of expertise, but my
> > understanding is that CBOR native major types are only provided for
> > unsigned integer and negative integer, with "signed integer" being an
> > abstraction at a slightly higher layer that needs to be managed in the
> > application.  Do we need to add any clarifying text here or will the
> > meaning be clear to the reader?
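
For readers less familiar with CBOR, a minimal Python sketch of the integer
encoding described in RFC 7049 may help here. It is an illustration only and
is not taken from the draft; the helper name encode_cbor_int is hypothetical.
A non-negative value n uses major type 0 with argument n, and a negative
value n uses major type 1 with argument -1 - n, so a "signed integer" is
something the application assembles from the two.

```python
def encode_cbor_int(value: int) -> bytes:
    """Encode a Python int as a CBOR integer data item (illustrative)."""
    if value >= 0:
        major, arg = 0, value        # major type 0: unsigned integer
    else:
        major, arg = 1, -1 - value   # major type 1: negative integer

    head = major << 5                # major type sits in the top 3 bits
    if arg < 24:
        return bytes([head | arg])   # argument fits in the initial byte
    if arg < 2**8:
        return bytes([head | 24]) + arg.to_bytes(1, "big")
    if arg < 2**16:
        return bytes([head | 25]) + arg.to_bytes(2, "big")
    if arg < 2**32:
        return bytes([head | 26]) + arg.to_bytes(4, "big")
    if arg < 2**64:
        return bytes([head | 27]) + arg.to_bytes(8, "big")
    raise ValueError("value does not fit in a CBOR integer")


print(encode_cbor_int(10).hex())
print(encode_cbor_int(-10).hex())
```

A decoder reverses this by checking the major type and, for major type 1,
computing -1 - argument.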
> > 
> > Section 7.4
> > 
> > Should probably forward-reference section 8 for the format version numbers'
> > semantics.
> > 
> > Section 7.4.1.1
> > 
> > We should reference the IANA registries by name for any of these fields
> > (e.g., opcodes, rr-types, etc.).  (Also in Section 7.5.3.1, etc.)
> > 
> > Are the storage flags going to be allocated in sequence by updating
> > standards-track documents, or some other mechanism?  (Is a registry
> > necessary?)
> > 
> > For the various address prefix fields, do we need to specify that the full
> > addresses are stored when the corresponding prefix field is absent?
> > 
> > Section 7.4.1.1.1
> > 
> > Am I parsing the "query-response-hints" text correctly to say that a bit is
> > set in the bitmap if the corresponding field is recorded (if present) by
> > the collecting implementation?  The causality of "if the field is omitted
> > the bit is unset" goes in a direction that is not what I expected.
> > (Similarly for the other fields in this table.)
> > 
> > Section 7.4.2
> > 
> > Do we need a reference for "promiscuous mode"?
> > 
> > Just to check: in "server-addresses", I just infer the IP version from the
> > length of the byte string?
> > 
> > Do we need to say more about where the vlan-ids identifiers are taken from?
> > 
> > Is the "generator-id" string intended to only be human readable?  Only
> > within a specific (administrative) context?
> > 
> > Section 7.5.1
> > 
> > Does "earliest-time" include leap seconds?
> > 
> > Section 7.5.3
> > 
> > The "ip-address" description seems to imply that very short ipv6 prefix
> > lengths could cause confusion as to the address type being indicated (e.g.,
> > setting to 32 when no ipv4 prefix length is set, or setting to the same
> > value as the ipv4 prefix length).  Do we need to restrict the ipv6 prefix
> > lengths to being 33 or larger?
> > 
> > Are the "name-rdata" contents in wire format or presentation format?
> > 
> > Section 7.5.3.2
> > 
> > What's the allocation policy/procedure for the remaining
> > qr-transport-flags transport values?  For additional bits in any/all of the
> > flags fields listed here?
> > 
> > Something of a side note, what's the mnemonic for the "sig" in
> > "qr-sig-flags"?  That is, what is it a signature of or over (it doesn't
> > seem like it's a cryptographic signature, which may be what is confusing
> > me)?
> > 
> > For "query-rcode"/"response-rcode", should there be a reference for "OPT",
> > and/or for any of the EDNS stuff in here?  (The Terminology section only
> > mentions using the naming from RFC 1035, that I can see.)
> > 
> > The "mm-transport-flags" here bear a striking resemblance to the
> > "qr-transport-flags" from Section 7.5.3.2; should there be a shared
> > registry for their contents?  (I guess the TransportFlags CDDL to some
> > extent serves this function.)
> > 
> > Section 7.7
> > 
> > How is the value of the "ae-code" determined?
> > 
> > Appendix A
> > 
> > We could perhaps apply some constraints on (e.g.) the address-prefix length
> > fields to be .le the relevant lengths.
> > 
> > Appendix C.6
> > 
> >                                           Using a strong compression,
> >   block sizes over 10,000 query/response pairs would seem to offer
> >   limited improvements.
> > 
> > nit: Using a strong compression scheme
> > 
> > 

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
