On Wed, Nov 21, 2018 at 01:53:09PM +0000, Sara Dickinson wrote:
>
> > Begin forwarded message:
> >
> > From: Benjamin Kaduk <ka...@mit.edu>
> > Subject: Benjamin Kaduk's Discuss on
> > draft-ietf-dnsop-dns-capture-format-08: (with DISCUSS and COMMENT)
> > Date: 19 November 2018 at 00:28:19 GMT
> > To: "The IESG" <i...@ietf.org>
> > Cc: draft-ietf-dnsop-dns-capture-for...@ietf.org, Tim Wicinski
> > <tjw.i...@gmail.com>, dnsop-cha...@ietf.org, tjw.i...@gmail.com,
> > dnsop@ietf.org
> > Resent-From: <alias-boun...@ietf.org>
> > Resent-To: j...@sinodun.com, j...@sinodun.com, s...@sinodun.com,
> > terry.mander...@icann.org, john.b...@icann.org
>
> Many thanks for the detailed review.
>
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > It is pretty shocking to not see any discussion of the privacy
> > considerations of storing data including client addresses (and ports)
> > alongside DNS transactions, given how central DNS resolution is to user
> > behavior on the web. (Note that there are mentions of potentially
> > anonymized data in Sections 6.2 and 6.2.3 which would presumably
> > forward-reference the privacy considerations.) Data normalization would
> > probably also be mentioned in this section, since (e.g.) the case used for
> > a query/response could be used in fingerprinting an implementation.
>
> There has been extensive discussion of data storage risks and practices in
> two DPRIVE documents so I’d suggest the following changes in the first
> instance to address this:
This is exactly the sort of thing I was hoping to see, thank you! I have
just a couple tweaks to suggest, inline.

> New Privacy Considerations section:
> “Storage of DNS traffic by operators in PCAP and other formats is a long
> standing and widespread practice. Section 2.5 of
> draft-bortzmeyer-dprive-rfc7626-bis is an analysis of the risks to Internet
> users of the storage of DNS traffic data in servers (recursive resolvers,
> authoritative and rogue servers).
>
> Section 5.2 of draft-dickinson-dprive-bcp-op describes mitigations for those
> risks for data stored on recursive resolvers (but which could by extension
> apply to authoritative servers). These include data handling practices and
> methods for data minimisation, IP address pseudonymization and anonymization.
> Appendix B of that document presents an analysis of 7 published anonymization
> processes. In addition RSSAC have recently published RSSAC040:
> "Recommendations on Anonymization Processes for Source IP Addresses Submitted
> for Future Analysis”[1].
>
> The above analyses consider full data capture (e.g. using PCAP) as a
> baseline for privacy considerations and therefore this format
> specification introduces no new user privacy issues beyond those of full
> data capture. It does provide mechanisms to selectively record only

I would say "beyond those of full data capture (which are quite severe)".
That is, while the current state of affairs is a valid baseline for
comparison, that does not absolve us of responsibility for analyzing the
current state of affairs. (To be clear, draft-bortzmeyer-dprive-rfc7626-bis
is a fine place for the bulk of that analysis to live, but in this document
we should not pretend that the current state of affairs is a good situation
to be in.)

> certain fields at the time of data capture to improve user privacy and to
> explicitly indicate that data is sampled and/or anonymised.
> It also provides flags to indicate if data normalisation has been
> performed; data
> normalisation increases user privacy by reducing the potential for
> fingerprinting individuals however a trade-off is potentially reducing

I think "however" would be offset by commas on both sides.

> the capacity to identify attack traffic via query name signatures.
> Operators should carefully consider their operational requirements and
> privacy policies and SHOULD capture at source the minimum user data
> required to meet their needs.”
>
> [1] https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf
>
> As noted, there are a few other places we can also highlight the privacy
> aspects:
>
> Introduction:
> OLD: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in
> practice for packet captures, but these file formats can contain a great deal
> of additional information that is not directly pertinent to DNS traffic
> analysis and thus unnecessarily increases the capture file size.”
>
> NEW: “The PCAP [pcap] or PCAP-NG [pcapng] formats are typically used in
> practice for packet captures, but these file formats can contain a great deal
> of additional information that is not directly pertinent to DNS traffic
> analysis and thus unnecessarily increases the capture file size.
> Additionally these tools and formats typically have no filter mechanism to
> selectively record only certain fields at capture time, requiring
> post-processing for anonymisation or pseudonymisation of data to protect user
> privacy.”
>
> Section 4, bullet point 2:
>
> OLD: “Different users will have different requirements
> for data to be available for analysis. Users with minimal
> requirements should not have to pay the cost of recording full
> data, though this will limit the ability to perform certain
> kinds of data analysis and also to reconstruct packet
> captures.
> For example, omitting the resource records from a
> Response will reduce the C-DNS file size; in principle
> responses can be synthesized if there is enough context.”
>
> NEW: “Different operators will have different requirements
> for data to be available for analysis. Operators with minimal
> requirements should not have to pay the cost of recording full
> data, though this will limit the ability to perform certain
> kinds of data analysis and also to reconstruct packet
> captures. For example, omitting the resource records from a
> Response will reduce the C-DNS file size; in principle
> responses can be synthesized if there is enough context.
> Operators may have different policies for collecting user data
> and can choose to omit or anonymise certain fields at
> capture time e.g. client address.”
>
> And yes, in both sections 6.2 and 6.2.3 add forward references to the Privacy
> Considerations section
>
> > I'm also concerned about the policy/procedure for allocating/extending the
> > various bitfields and similar potential extension points in the data
> > structures. Section 8 covers the major/minor versioning semantics with
> > respect to new map keys and new maps, but not addition of new bits within
> > existing (uint) bitmaps. Given the usage of the CDDL .bits constraint,
> > it's not really clear that an IANA registry is the right tool to use, but I
> > think some indication of the expected way to allocate new bits is in order,
> > whether it's "a future standards-track document that updates this document"
> > or otherwise. (I've noted many, but not all, instances of such bitmaps in
> > my COMMENT section.)
>
> We are inclined to follow the lead of existing RFCs making use of CBOR, namely
> * RFC8152 'CBOR Object Signing and Encryption' (July 2017)
> * RFC8392 'CBOR Web Token (CWT)' (May 2018) and
> * RFC8428 'Sensor Measurement Lists (SenML)' (Aug 2018)
> and request IANA create a C-DNS registry with
> subregistries with keys for each of the different maps used in C-DNS.
> New entries in these subregistries would follow Expert Review as defined
> in RFC8126. This appears to be the emerging usual way of dealing with
> CBOR map key values, particularly integer.

That sounds like a fine path forward, thanks.

> > There are also a couple of fields whose semantics don't seem to be
> > sufficiently well specified for a proposed-standard document, such as
> > vlan-ids, generator-id, name-rdata, and ae-code. (I understand that some
> > of them are probably only going to have locally relevant semantics, but we
> > should be explicit about when that's the case.)
>
> Acknowledged, we’ll add references or clarifications for these (will put
> details in a follow up mail that will also address your comments below).

Sounds good.

> > If I'm reading things correctly that the IP address type is inferred from
> > the bytestring length, then I think we need to enforce a restriction on the
> > address prefix length(s) to allow for that inference to be unambiguous
> > (noting that we only have the *byte* length of the address fields at our
> > disposal for disambiguation, and not the more precise bit-length).
>
> Ah, the first bit of the qr-transport-flags contains an IPv4/IPv6 flag so the
> address type can be explicitly determined from that if it is set but of
> course there is a corner case where that field isn’t present that we hadn’t
> considered so we’ll have to address that. Making that field mandatory if
> prefixes are used would be simplest.

I guess I had forgotten about that bit in the qr-transport-flags on my first
read.
Making it mandatory if prefix lengths are present ought to work.

-Benjamin

> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> >
> > Section 2
> >
> > Please consider using the RFC 8174 version of the BCP 14 boilerplate.
> >
> > Section 3
> >
> >    Because of these considerations, a major factor in the design of the
> >    format is minimal storage size of the capture files.
> >
> > maybe "storage and transmission"?
> >
> > Section 6
> >
> > In Figure 2, the Query name is marked as "(q)" (only present if there is a
> > query), but the running text in Section 4 (bullet 1) says that the Question
> > section from the response can be used as an identifying QNAME if there is a
> > response with no corresponding query. Am I misexpanding QNAME here, or is
> > there a disagreement between these two parts of the text? In particular, I
> > do not see a part of Figure 2 that would correspond to a Question section
> > in the response, given the various "(q)"/"(r)" markings.
> >
> > Section 6.2.2
> >
> >    Messages with OPCODES known to the recording application but not
> >    listed in the Storage Parameters are discarded (regardless of whether
> >    they are malformed or not).
> >
> > (Do we need to say anything that the "discarded" is only w.r.t. the capture
> > process, and not meant to imply that DNS queries would not get a normal
> > response?)
> >
> > Section 6.2.4
> >
> > Please consider using IPv6 examples, per
> > https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ .
> >
> > Section 7.2
> >
> >    o  The column T gives the CBOR data type of the item.
> >
> >       *  U - Unsigned integer
> >
> >       *  I - Signed integer
> >
> > This is venturing a bit far from my normal area of expertise, but my
> > understanding is that CBOR native major types are only provided for
> > unsigned integer and negative integer, with "signed integer" being an
> > abstraction at a slightly higher layer that needs to be managed in the
> > application. Do we need to add any clarifying text here or will the
> > meaning be clear to the reader?
> >
> > Section 7.4
> >
> > Should probably forward-reference section 8 for the format version numbers'
> > semantics.
> >
> > Section 7.4.1.1
> >
> > We should reference the IANA registries by name for any of these fields
> > (e.g., opcodes, rr-types, etc.). (Also in Section 7.5.3.1, etc.)
> >
> > Are the storage flags going to be allocated in sequence by updating
> > standards-track documents, or some other mechanism? (Is a registry
> > necessary?)
> >
> > For the various address prefix fields, do we need to specify that the full
> > addresses are stored when the corresponding prefix field is absent?
> >
> > Section 7.4.1.1.1
> >
> > Am I parsing the "query-response-hints" text correctly to say that a bit is
> > set in the bitmap if the corresponding field is recorded (if present) by
> > the collecting implementation? The causality of "if the field is omitted
> > the bit is unset" goes in a direction that is not what I expected.
> > (Similarly for the other fields in this table.)
> >
> > Section 7.4.2
> >
> > Do we need a reference for "promiscuous mode"?
> >
> > Just to check: in "server-addresses", I just infer the IP version from the
> > length of the byte string?
> >
> > Do we need to say more about where the vlan-ids identifiers are taken from?
> >
> > Is the "generator-id" string intended to only be human readable? Only
> > within a specific (administrative) context?
> >
> > Section 7.5.1
> >
> > Does "earliest-time" include leap seconds?
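As a rough illustration of the Section 7.2 question above (a sketch, not text from the draft): CBOR defines major type 0 for unsigned integers and major type 1 for negative integers, so a "signed integer" column has to be mapped onto one of the two by whatever writes the file. The helper name `encode_cbor_int` is hypothetical and exists only for this example; it hand-encodes a Python int using the CBOR integer rules.

```python
import struct


def encode_cbor_int(value: int) -> bytes:
    """Encode an int as CBOR major type 0 (unsigned) or 1 (negative).

    Hypothetical helper for illustration only.
    """
    if value >= 0:
        major, arg = 0, value          # major type 0: the value itself
    else:
        major, arg = 1, -1 - value     # major type 1: encodes -1 - n
    if arg < 24:
        # Small arguments live in the low 5 bits of the initial byte.
        return bytes([(major << 5) | arg])
    # Otherwise the additional-information value selects a 1/2/4/8 byte argument.
    for additional, fmt, limit in (
        (24, ">B", 1 << 8),
        (25, ">H", 1 << 16),
        (26, ">I", 1 << 32),
        (27, ">Q", 1 << 64),
    ):
        if arg < limit:
            return bytes([(major << 5) | additional]) + struct.pack(fmt, arg)
    raise ValueError("outside the 64-bit range of CBOR major types 0 and 1")


print(encode_cbor_int(1000).hex())
print(encode_cbor_int(-1000).hex())
```

The point for the draft is only that "I - Signed integer" is not a single wire-level type: a reader has to accept either major type for such a field.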
> >
> > Section 7.5.3
> >
> > The "ip-address" description seems to imply that very short ipv6 prefix
> > lengths could cause confusion as to the address type being indicated (e.g.,
> > setting to 32 when no ipv4 prefix length is set, or setting to the same
> > value as the ipv4 prefix length). Do we need to restrict the ipv6 prefix
> > lengths to being 33 or larger?
> >
> > Are the "name-rdata" contents in wire format or presentation format?
> >
> > Section 7.5.3.2
> >
> > What's the allocation policy/procedure for the remaining
> > qr-transport-flags transport values? For additional bits in any/all of the
> > flags fields listed here?
> >
> > Something of a side note, what's the mnemonic for the "sig" in
> > "qr-sig-flags"? That is, what is it a signature of or over (it doesn't
> > seem like it's a cryptographic signature, which may be what is confusing
> > me)?
> >
> > For "query-rcode"/"response-rcode", should there be a reference for "OPT",
> > and/or for any of the EDNS stuff in here? (The Terminology section only
> > mentions using the naming from RFC 1035, that I can see.)
> >
> > The "mm-transport-flags" here bear a striking resemblance to the
> > "qr-transport-flags" from Section 7.5.3.2; should there be a shared
> > registry for their contents? (I guess the TransportFlags CDDL to some
> > extent serves this function.)
> >
> > Section 7.7
> >
> > How is the value of the "ae-code" determined?
> >
> > Appendix A
> >
> > We could perhaps apply some constraints on (e.g.) the address-prefix length
> > fields to be .le the relevant lengths.
> >
> > Appendix C.6
> >
> >    Using a strong compression,
> >    block sizes over 10,000 query/response pairs would seem to offer
> >    limited improvements.
> >
> > nit: Using a strong compression scheme
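To make the prefix-length concern from the DISCUSS and from Section 7.5.3 concrete, here is a small sketch. It assumes (my assumption, not a statement of what the draft specifies) that a prefixed address is stored truncated to the smallest whole number of bytes that holds the prefix, and that an absent prefix length means the full 32 or 128 bits. The function names are hypothetical.

```python
def stored_length(prefix_bits: int) -> int:
    """Bytes needed to hold a prefix of the given bit length (assumed truncation rule)."""
    return (prefix_bits + 7) // 8


def is_ambiguous(ipv4_prefix: int = 32, ipv6_prefix: int = 128) -> bool:
    """True if IPv4 and IPv6 addresses would be stored with the same byte length,
    so the address family cannot be inferred from the length alone."""
    return stored_length(ipv4_prefix) == stored_length(ipv6_prefix)


# Full addresses: 4 bytes vs 16 bytes, no ambiguity.
print(is_ambiguous())
# IPv6 prefix of 32 with no IPv4 prefix set: both are 4 bytes.
print(is_ambiguous(32, 32))
# Different bit lengths can still collide at byte granularity (24 vs 20 bits).
print(is_ambiguous(24, 20))
# An IPv6 prefix of 33 or more always needs at least 5 bytes.
print(is_ambiguous(32, 33))
```

Under that assumption the collision happens at byte rather than bit granularity, which is why either a floor of 33 bits on the IPv6 prefix or a mandatory IPv4/IPv6 flag in qr-transport-flags would remove the ambiguity.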