Apologies for my sending this after the deadline. I hope the comments
are still usable...
Review of: Privacy Considerations for Internet Protocols
I-D: draft-iab-privacy-considerations-07.txt
Reviewed by: D. Crocker
Review date: 14 March 2013
Summary:
The document provides a broad introduction to the needs, nature and
details of adding privacy considerations to IETF specifications.
Broadly, it is divided into introduction, terminology, generic
exposure/analysis model, threats, mitigations, and analysis guidelines.
The document is generally well-organized and written clearly. An
example analysis is provided that concretely demonstrates the approach
to doing a considerations analysis; it was intentionally chosen as a
difficult case, with inherent tradeoffs between privacy and required
functionality.
As an introduction to the topic, the document is accessible and
practical.
A glaring deficiency of the document is its conscious choice to
refrain from defining the term 'privacy'. The choice is understandable,
given the long, messy and varying real-world history of the term. However,
the reader is left to formulate their own -- possibly
unvoiced and therefore entirely ambiguous -- working definition. For
the technical work needed in a specification, this simply does
not give the reader the linchpin needed to anchor their
understanding in a way that will be consistent across authors and
readers of specifications. The draft needs to choose a definition, in
spite of the fact that other groups, people and contexts will use other
definitions. We do specifications and this starts with definitions. It
simply makes no sense to be missing a definition for the key word.
By way of priming that pump, I'll proffer the simplest definition that
seems plausible here:
Privacy is the concern for protecting information
of or about an individual person.
Tweak this or replace it entirely, but /please/ provide a concrete,
pragmatic definition that explicitly defines what is in scope and what
is out, so that authors know where to focus their considerations.
Also, given the challenges of this topic and the desire to get useful
privacy considerations into IETF work, I suggest creating a privacy
directorate, which can be asked to assist authors and review their work.
Think of it as a topic-specific mentoring group...
Except for the requirement to define its motivating term, the draft is
usable in its current form, although a number of specific improvements
cited in the detailed comments are recommended.
Detailed Comments:
The following comments are left raw, written as I read the draft...
Abstract
This document offers guidance for developing privacy considerations
for inclusion in protocol specifications. It aims to make protocol
designers aware of privacy-related design choices. It suggests that
whether any individual RFC warrants a specific privacy considerations
section will depend on the document's content.
Given the degree of ambiguity in the word 'privacy' -- since there is
such a wide range of definitions people assign it, as noted in the
second paragraph of the Introduction -- the Abstract needs to provide a
summary of its definition here, so that the reader can understand the
focus and scope of the term's use in this document. The definitional
text needs to refrain from using the word 'privacy' as part of the
definition...
1. Introduction
[RFC3552] provides detailed guidance to protocol designers about both
how to consider security as part of protocol design and how to inform
readers of protocol specifications about security issues. This
document intends to provide a similar set of guidance for considering
privacy in protocol design.
Privacy is a complicated concept with a rich history that spans many
disciplines. With regard to data, often it is a concept applied to
"With regard to data" implies that it could be with regard to something
else. What?
"personal data," information relating to an identified or
identifiable individual. Many sets of privacy principles and privacy
design frameworks have been developed in different forums over the
years. These include the Fair Information Practices [FIPs], a
baseline set of privacy protections pertaining to the collection and
use of personal data (often based on the principles established in
[OECD], for example), and the Privacy by Design concept, which
provides high-level privacy guidance for systems design (see [PbD]
for one example). The guidance provided in this document is inspired
by this prior work, but it aims to be more concrete, pointing
protocol designers to specific engineering choices that can impact
the privacy of the individuals that make use of Internet protocols.
Different people have radically different conceptions of what privacy
means, both in general, and as it relates to them personally
[Westin]. Furthermore, privacy as a legal concept is understood
differently in different jurisdictions. The guidance provided in
this document is generic and can be used to inform the design of any
protocol to be used anywhere in the world, without reference to
specific legal frameworks.
Whether any individual document warrants a specific privacy
considerations section will depend on the document's content.
Documents whose entire focus is privacy may not merit a separate
OK. Enough is enough. It's fine to have a quick survey of earlier
work, but that's not sufficient.
You keep using the word privacy, and I don't know what you mean.
The typical writer and reader of RFCs is not experienced in the topic of
privacy. They won't know what you mean either: they need very concrete
guidance about the word's meaning.
Telling me that different people mean different things with the term
merely assures me that I have no idea what /you/ mean unless you tell
me. Having each reader make guesses about the meaning is a way to
ensure non-interoperability of the construct.
Guidance can't be very helpful if the reader has no idea when to apply it.
section (for example, "Private Extensions to the Session Initiation
Protocol (SIP) for Asserted Identity within Trusted Networks"
[RFC3325]). For certain specifications, privacy considerations are a
subset of security considerations and can be discussed explicitly in
I strongly suggest that any explicit privacy discussion be required to
be in a section entirely separate from the 'security considerations' section.
My reasoning is simple: This community sees 'security' in terms of
encryption and signing, traffic analysis, and other such mechanical,
relatively low-level components. Privacy is an entirely different and
broader and more human beast, even when its details devolve to these
familiar mechanics.
At the least, making it a separate section will help writers and readers
to distinguish privacy from the security stuff we are used to seeing
discussed.
the security considerations section. Some documents will not require
discussion of privacy considerations (for example, "Definition of the
Opus Audio Codec" [RFC6716]). The guidance provided here can and
should be used to assess the privacy considerations of protocol,
architectural, and operational specifications and to decide whether
those considerations are to be documented in a stand-alone section,
within the security considerations section, or throughout the
document.
Not sure whether this is a question or a suggestion; if it's the latter,
I'm not sure what to suggest: privacy issues often develop as a
combinatorial problem -- 'correlation' as you note farther down -- that
is, developing out of unpredicted integration of information from
discrete services. While any specific IETF specification might have its
own, direct privacy issues needing consideration, where should these
combinatorial dangers be discussed?
2. Terminology
This section defines basic terms used in this document, with
references to pre-existing definitions as appropriate. As in
[RFC4949], each entry is preceded by a dollar sign ($) and a space
for automated searching. Note that this document does not try to
attempt to define the term 'privacy' itself. Instead privacy is the
sum of what is contained in this document. We therefore follow the
approach taken by [RFC3552].
Sorry. Not workable, if you want meaningful consideration by authors
and meaningful understanding by readers.
2.1. Entities
Several of these terms are further elaborated in Section 3.
$ Attacker: An entity that intentionally works against some privacy
protection goal. Unlike observers, attackers' behavior is
unauthorized.
This precludes accidental privacy violations?
$ Eavesdropper: A type of attacker that passively observes an
initiator's communications without the initiator's knowledge or
authorization. See [RFC4949].
$ Enabler: A protocol entity that facilitates communication between
an initiator and a recipient without being directly in the
communications path.
For example...?
2.3. Identifiability
...
$ Personal Name: A natural name for an individual. Personal names
are often not unique, and often comprise given names in
combination with a family name. An individual may have multiple
personal names at any time and over a lifetime, including official
names. From a technological perspective, it cannot always be
determined whether a given reference to an individual is, or is
based upon, the individual's personal name(s) (see Pseudonym).
Official Names also are typically not unique.
$ Pseudonym: A name assumed by an individual in some context,
unrelated to the individual's personal names known by others in
that context, with an intent of not revealing the individual's
identities associated with her other names.
(Might be worth mentioning that this is sometimes called "persona".)
Pseudonyms also often are not unique.
My point is that it's good that you mentioned this issue, and you should
repeat it for each term to which it applies.
3. Communications Model
To understand attacks in the privacy-harm sense, it is helpful to
consider the overall communication architecture and different actors'
roles within it. Consider a protocol entity, the "initiator," that
initiates communication with some recipient. Privacy analysis is
most relevant for protocols with use cases in which the initiator
acts on behalf of an individual (or different individuals at
different times). It is this individual whose privacy is potentially
threatened.
If I receive a credit dunning notice or a legal notification, I'm the
recipient, but unauthorized disclosure of such messages would be a
privacy harm to me. It isn't just initiator-side individuals whose privacy
is at risk.
Communications may be direct between the initiator and the recipient,
or they may involve an application-layer intermediary (such as a
proxy or cache) that is necessary for the two parties to communicate.
proxy or cache -> proxy, cache or (mail) relay
In some cases this intermediary stays in the communication path for
the entire duration of the communication and sometimes it is only
used for communication establishment, for either inbound or outbound
communication. In rare cases there may be a series of intermediaries
For email, it isn't rare at all. In fact, it's universal, probably for
literally every email sent.
that are traversed. At lower layers, additional entities are
involved in packet forwarding that may interfere with privacy
protection goals as well.
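The email case can be made concrete. SMTP requires each server that handles a message to prepend a Received trace field, so the message itself records every intermediary that touched it.

The Python sketch below lists those hops. It is illustrative only:
- The message and host names are invented.
- It assumes every Received field carries a simple "by <host>;" clause.
- Real-world Received fields vary far more than this, so the parser is not robust.

```python
from email import message_from_string

# Hypothetical message: all hosts and addresses are invented for illustration.
RAW = """\
Received: from relay2.example.net (relay2.example.net [192.0.2.20])
 by mx.example.org; Thu, 14 Mar 2013 10:02:00 +0000
Received: from relay1.example.com (relay1.example.com [192.0.2.10])
 by relay2.example.net; Thu, 14 Mar 2013 10:01:30 +0000
Received: from [198.51.100.7] (client.example.com)
 by relay1.example.com; Thu, 14 Mar 2013 10:01:00 +0000
From: alice@example.com
To: bob@example.org
Subject: test

Hello.
"""


def relay_path(raw: str) -> list[str]:
    """Return the 'by' host of each Received field, first hop first."""
    msg = message_from_string(raw)
    hops = []
    # Each relay prepends its own Received field, so the header list is
    # newest-first; reverse it to get transit order.
    for value in reversed(msg.get_all("Received", [])):
        # Collapse header folding (newline plus leading space) before searching.
        text = " ".join(str(value).split())
        by_clause = text.split(" by ", 1)[1]
        hops.append(by_clause.split(";", 1)[0].strip())
    return hops


print(relay_path(RAW))  # ['relay1.example.com', 'relay2.example.net', 'mx.example.org']
```

Each listed host was in a position to observe, log, or retain the message. That is the multi-intermediary exposure the draft calls rare.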
...
Protocol design is often predicated on the notion that recipients,
intermediaries, and enablers are assumed to be authorized to receive
and handle data from initiators. As [RFC3552] explains, "we assume
that the end-systems engaging in a protocol exchange have not
Cooper, et al. Expires August 27, 2013 [Page 10]
Internet-Draft Privacy Considerations February 2013
themselves been compromised." However, by its nature privacy
which nature?
Seriously, how is the reader to know (or even guess) exactly what is
being implied?
analysis requires questioning this assumption since systems are often
compromised for the purpose of obtaining personal data.
Although recipients, intermediaries, and enablers may not generally
be considered as attackers, they may all pose privacy threats
(depending on the context) because they are able to observe, collect,
exactly!
4. Privacy Threats
...
This section lists common privacy threats (drawing liberally from
[Solove], as well as [CoE]), showing how each of them may cause
individuals to incur privacy harms and providing examples of how
these threats can exist on the Internet.
Some privacy threats are already considered in IETF protocols as a
cite some examples.
matter of routine security analysis. Others are more pure privacy
What does it mean to be a "more pure privacy threat"? Really, I can't
guess.
threats that existing security considerations do not usually address.
The threats described here are divided into those that may also be
considered security threats and those that are primarily privacy
threats.
Note that an individual's awareness of and consent to the practices
described below may change an individual's perception of and concern
for the extent to which they threaten privacy. If an individual
authorizes surveillance of his own activities, for example, the
individual may be able to take actions to mitigate the harms
associated with it, or may consider the risk of harm to be tolerable.
4.1. Combined Security-Privacy Threats
The fact that you have a string like "Combined Security-Privacy"
supports the view that Privacy Considerations is distinct from Security
and should not be in the Security Considerations section...
4.1.4. Misattribution
Misattribution occurs when data or communications related to one
individual are attributed to another. Misattribution can result in
adverse reputational, financial, or other consequences for
individuals that are misidentified.
It's probably worth mentioning that for spam, this is often called
"spoofing".
5.1. Data Minimization
...
However, the most direct application of data minimization to protocol
design is limiting identifiability. Reducing the identifiability of
data by using pseudonyms or no identifiers at all helps to weaken the
link between an individual and his or her communications. Allowing
for the periodic creation of new identifiers reduces the possibility
Also worth mentioning: randomization of chosen identifiers.
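Periodic replacement and randomization of identifiers can be combined, as in the minimal Python sketch below. The class and its parameters are hypothetical and do not come from the draft.

It assumes rotation after a fixed number of interactions. A time-based schedule would be an equally plausible choice.

```python
import secrets


class RotatingIdentifier:
    """Hypothetical helper: hands out a random identifier and replaces it
    with a fresh random one after `max_uses` interactions."""

    def __init__(self, max_uses: int, nbytes: int = 16) -> None:
        if max_uses < 1:
            raise ValueError("max_uses must be at least 1")
        self.max_uses = max_uses
        self.nbytes = nbytes
        self._uses = 0
        # Drawn from a cryptographically secure RNG, so the value itself
        # encodes nothing about the individual.
        self._current = secrets.token_hex(nbytes)

    def for_interaction(self) -> str:
        """Return the identifier to present in the next protocol interaction."""
        if self._uses == self.max_uses:
            # Rotate: the replacement is independent of the old value,
            # so an observer cannot derive one from the other.
            self._current = secrets.token_hex(self.nbytes)
            self._uses = 0
        self._uses += 1
        return self._current


ids = RotatingIdentifier(max_uses=3)
seen = [ids.for_interaction() for _ in range(9)]
print(len(set(seen)))  # 3: interactions are linkable only within each group of three
```

Rotation only weakens linkage if nothing else in the protocol re-links the values. A long-lived account name sent alongside each identifier is one example of something that would.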
5.2. User Participation
As explained in Section 4.2.5, data collection and use that happens
"in secret," without the individual's knowledge, is apt to violate
the individual's expectation of privacy and may create incentives for
misuse of data. As a result, privacy regimes tend to include
provisions to require informing individuals about data collection and
use and involving them in decisions about the treatment of their
data. In an engineering context, supporting the goal of user
participation usually means providing ways for users to control the
data that is shared about them. It may also mean providing ways for
users to signal how they expect their data to be used and shared.
There is a serious downside to this. It presumes that this burden on
users is reasonable. For many scenarios, it isn't. Rather, the focus
on user participation is often used as an alternative to the difficult
work (or research) on mechanisms that require less user participation.
6. Scope of Privacy Implications of Internet Protocols
Internet protocols are often built flexibly, making them useful in a
variety of architectures, contexts, and deployment scenarios without
requiring significant interdependency between disparately designed
components. Although protocol designers often have a particular
target architecture or set of architectures in mind at design time,
it is not uncommon for architectural frameworks to develop later,
after implementations exist and have been deployed in combination
with other protocols or components to form complete systems.
Independent of the purpose of this draft, the above paragraph is quite a
nice bit of text about an aspect of IETF technical work.
As a consequence, the extent to which protocol designers can foresee
all of the privacy implications of a particular protocol at design
time is limited. An individual protocol may be relatively benign on
its own, and it may make use of privacy and security features at
lower layers of the protocol stack (Internet Protocol Security,
Transport Layer Security, and so forth) to mitigate the risk of
attack. But when deployed within a larger system or used in a way
not envisioned at design time, its use may create new privacy risks.
Protocols are often implemented and deployed long after design time
by different people than those who did the protocol design. The
guidelines in Section 7 ask protocol designers to consider how their
protocols are expected to interact with systems and information that
exist outside the protocol bounds, but not to imagine every possible
deployment scenario.
Furthermore, in many cases the privacy properties of a system are
dependent upon the complete system design where various protocols are
combined together to form a product solution; the implementation,
which includes the user interface design; and operational deployment
practices, including default privacy settings and security processes
within the company doing the deployment. These details are specific
to particular instantiations and generally outside the scope of the
work conducted in the IETF. The guidance provided here may be useful
in making choices about these details, but its primary aim is to
assist with the design, implementation, and operation of protocols.
Perhaps the largest challenge I repeatedly see in the IETF is what I
call "systems thinking", which is considering an integrated set of
components and their interactions. The above three paragraphs very
nicely target exactly that scope of concern, in the context of privacy.
So I /strongly/ suggest you move the three paragraphs up to the
Introduction. Note that this would largely resolve the concern I raised
there, that the Introduction really doesn't introduce cross-component
(multi-specification) scoping issues for privacy. Add a citation in it
to this section.
Transparency of data collection and use -- often effectuated through
user interface design -- is normally a key factor in determining the
I realize that's a common view, but has it been validated or is it
merely the default perspective that user permission solves everything?
privacy impact of a system. Although most IETF activities do not
involve standardizing user interfaces or user-facing communications,
in some cases understanding expected user interactions can be
important for protocol design. Unexpected user behavior may have an
adverse impact on security and/or privacy.
While a generically reasonable view, the challenge with its application
in the IETF is our general tendency to think that we understand UI and
UX issues, although few in the IETF actually have the background for it.
For example we tend to think that simply giving users more information
is a universal palliative. Most discussions here about "expected user
interactions" are simply wrong. Worse, I've no idea what to suggest to
counter this for the draft.
7. Guidelines
This section provides guidance for document authors in the form of a
questionnaire about a protocol being designed. The questionnaire may
be useful at any point in the design process, particularly after
document authors have developed a high-level protocol model as
described in [RFC4101].
Note that the guidance does not recommend specific practices. The
range of protocols developed in the IETF is too broad to make
recommendations about particular uses of data or how privacy might be
balanced against other design goals. However, by carefully
considering the answers to each question, document authors should be
able to produce a comprehensive analysis that can serve as the basis
for discussion of whether the protocol adequately protects against
privacy threats.
For some years after Security Considerations were made mandatory,
authors mostly floundered with the topic, given their/our lack of
background for assessing security considerations. Eventually the IETF
focused on making the section useful.
While this draft goes a long way to making the nature and requirements
of a Privacy Considerations section substantive, it's going to be some
time before the community develops helpful skills at writing these sections.
I suggest setting up a Privacy Directorate, essentially as a
consulting/review service for authors to use in developing their text
for the section in their documents. The Directorate might also take
initiative at reviewing new documents.
The framework is divided into four sections that address each of the
mitigation classes from Section 5, plus a general section. Security
is not fully elaborated since substantial guidance already exists in
[RFC3552].
7.1. Data Minimization
a. Identifiers. What identifiers does the protocol use for
distinguishing initiators of communications? Does the protocol
use identifiers that allow different protocol interactions to be
correlated? What identifiers could be omitted or be made less
identifying while still fulfilling the protocol's goals?
I'd think that retention of recipient identifiers might also be an issue?
b. Data. What information does the protocol expose about
individuals, their devices, and/or their device usage (other than
the identifiers discussed in (a))? To what extent is this
information linked to the identities of the individuals? How does
the protocol combine personal data with the identifiers discussed
in (a)?
c. Observers. Which information discussed in (a) and (b) is
exposed to each other protocol entity (i.e., recipients,
intermediaries, and enablers)? Are there ways for protocol
implementers to choose to limit the information shared with each
entity? Are there operational controls available to limit the
information shared with each entity?
d. Fingerprinting. In many cases the specific ordering and/or
occurrences of information elements in a protocol allow users,
devices, or software using the protocol to be fingerprinted. Is
this protocol vulnerable to fingerprinting? If so, how? Can it
be designed to reduce or eliminate the vulnerability? If not, why
not?
e. Persistence of identifiers. What assumptions are made in the
protocol design about the lifetime of the identifiers discussed in
(a)? Does the protocol allow implementers or users to delete or
replace identifiers? How often does the specification recommend
to delete or replace identifiers by default? Can the identifiers,
along with other state information, be set to automatically
expire?
f. Correlation. Does the protocol allow for correlation of
identifiers? Are there expected ways that information exposed by
Is it productive to also look for 'unexpected' ways? This could be a
silly and wasteful exercise, or thinking creatively about strange
combinations might trigger better insight. I've no direct experience,
so can't judge.
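Item (d) can be illustrated with a small Python sketch. An observer can tell implementations apart, and link requests from the same one, purely from the order of protocol fields, with no identifier involved.

Assumptions:
- The field lists are hypothetical; HTTP-style names are used only as an example.
- The 16-hex-digit truncation is an arbitrary choice.
- Real fingerprinting typically combines many more signals.

```python
import hashlib


def fingerprint(field_names: list[str]) -> str:
    """Derive a fingerprint from which fields appear and in what order.
    Field values are ignored entirely."""
    canonical = "\n".join(name.lower() for name in field_names)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical requests from two client implementations.
client_a = ["Host", "User-Agent", "Accept", "Accept-Language"]
client_a_later = ["Host", "User-Agent", "Accept", "Accept-Language"]
client_b = ["Host", "Accept", "User-Agent", "Accept-Language"]

print(fingerprint(client_a) == fingerprint(client_a_later))  # True: same software, linkable
print(fingerprint(client_a) == fingerprint(client_b))        # False: implementations distinguishable

# One possible mitigation: if a specification mandated a single field order,
# this particular signal would disappear.
print(fingerprint(sorted(client_a)) == fingerprint(sorted(client_b)))  # True
```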
8. Example
...
The fundamental architecture defined in RFC 2778 and RFC 3859 is a
mediated one. Clients (presentities in RFC 2778 terms) publish their
presence information to presence servers, which in turn distribute
information to authorized watchers. Presence servers thus retain
presence information for an interval of time, until it either changes
or expires, so that it can be revealed to authorized watchers upon
request. This architecture mirrors existing pre-standard deployment
models. The integration of an explicit authorization mechanism into
the presence architecture has been widely successful in involving the
end users in the decision making process before sharing information.
Nearly all presence systems deployed today provide such a mechanism,
typically through a reciprocal authorization system by which a pair
of users, when they agree to be "buddies," consent to divulge their
presence information to one another. Buddylists are managed by
servers but controlled by end users. Users can also explicitly block
one another through a similar interface, and in some deployments it
is desirable to provide "polite blocking" of various kinds.
As the discussion moves into the details of analyzing each type of
privacy concern, I suggest making the format be bulleted and/or tabular.
This will make each segment of analysis more accessible to the reader
and easier to correlate with the lists of privacy concerns/attributes
provided earlier in the document. It will also aid scanning for review
and later consultation.
From a perspective of privacy design, however, the classical presence
architecture represents nearly a worst-case scenario. In terms of
data minimization, presentities share their sensitive information
with presence services, and while services only share this presence
information with watchers authorized by the user, no technical
mechanism constrains those watchers from relaying presence to further
Offhand, I don't know what mechanisms are practical to impose such a
constraint, in a protocol specification. It would help to see an example.
third parties. Any of these entities could conceivably log or retain
presence information indefinitely. The sensitivity cannot be
mitigated by rendering the user anonymous, as it is indeed the
purpose of the system to facilitate communications between users who
know one another. The identifiers employed by users are long-lived
and often contain personal information, including personal names and
the domains of service providers. While users do participate in the
construction of buddylists and blacklists, they do so with little
prospect for accountability: the user effectively throws their
presence information over the wall to a presence server that in turn
distributes the information to watchers. Users typically have no way
to verify that presence is being distributed only to authorized
watchers, especially as it is the server that authenticates watchers,
not the end user. Connections between the server and all publishers
and consumers of presence data are moreover an attractive target for
eavesdroppers, and require strong confidentiality mechanisms, though
again the end user has no way to verify what mechanisms are in place
between the presence server and a watcher.
Again, what would be realistic choices for fixing this? (It's possible
that there aren't any and that privacy considerations would merely need
to document an inherent and unfixable exposure. In terms of guidance to
writers of privacy considerations, that's ok, but it's worth making this
point clear.)
...
Privacy concerns about presence information largely arise due to the
built-in mediation of the presence architecture. The need for a
presence server is motivated by two primary design requirements of
presence: in the first place, the server can respond with an
"offline" indication when the user is not online; in the second
place, the server can compose presence information published by
different devices under the user's control. Additionally, to
preserve the use of URIs as identifiers for entities, some service
"preserve"?
must operate a host with the domain name appearing in a presence URI,
and in practical terms no commercial presence architecture would
force end users to own and operate their own domain names. Many end
users of applications like presence are behind NATs or firewalls, and
effectively cannot receive direct connections from the Internet - the
persistent bidirectional channel these clients open and maintain with
a presence server is essential to the operation of the protocol.
So? I'm not understanding what makes this a privacy issue.
One must first ask if the trade-off of mediation for presence is
worth it. Does a server need to be in the middle of all publications
worth it -> worthwhile.
of presence information? It might seem that end-to-end encryption of
the presence information could solve many of these problems. A
Not as described: You'd still have mediation. That is, the solution
you offer does not answer the question you ask.
I think you mean to ask whether the intermediary needs to see all
presence information in the clear. If you really intend to suggest that
an intermediary isn't needed, then you need to describe a scenario
without one.
presentity could encrypt the presence information with the public key
of a watcher, and only then send the presence information through the
server. The IETF defined an object format for presence information
called the Presence Information Data Format (PIDF), which for the
purposes of conveying location information was extended to the PIDF
Location Object (PIDF-LO) - these XML objects were designed to
accommodate an encrypted wrapper. Encrypting this data would have
the added benefit of preventing stored cleartext presence information
from being seized by an attacker who manages to compromise a presence
server. This proposal, however, quickly runs into usability
problems. Discovering the public keys of watchers is the first
difficulty, one that few Internet protocols have addressed
successfully. This solution would then require the presentity to
publish one encrypted copy of its presence information per authorized
watcher to the presence service, regardless of whether or not a
watcher is actively seeking presence information - for a presentity
with many watchers, this may place an unacceptable burden on the
presence server, especially given the dynamism of presence
information. Finally, it prevents the server from composing presence
information reported by multiple devices under the same user's
control. On the whole, these difficulties render object encryption
of presence information a doubtful prospect.
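The scaling objection to per-watcher encryption is simple arithmetic, sketched in Python below.

Assumptions:
- There is a single publishing device.
- The watcher count and update rate are invented.
- Key discovery and composition, the other two objections the draft raises, are ignored.

```python
def uploads_to_server(watchers: int, updates: int, per_watcher_encryption: bool) -> int:
    """Count presence documents a single-device presentity sends to the server."""
    if watchers < 0 or updates < 0:
        raise ValueError("counts must be non-negative")
    if per_watcher_encryption:
        # One copy per authorized watcher per change, each encrypted to that
        # watcher's public key, whether or not the watcher is looking.
        return watchers * updates
    # Server-readable publication: one document per change; the server
    # decides who receives it.
    return updates


# Hypothetical load: 200 authorized watchers, 50 presence changes in a day.
print(uploads_to_server(200, 50, per_watcher_encryption=False))  # 50
print(uploads_to_server(200, 50, per_watcher_encryption=True))   # 10000
```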
Some protocols that provide presence information, such as SIP, can
Hmmm. I didn't think that SIP itself provided presence
information. SIMPLE uses SIP, but it isn't SIP doing the presence work.
operate intermediaries in a redirecting mode, rather than a
publishing or proxying mode. Instead of sending presence information
through the server, in other words, these protocols can merely
redirect watchers to the presentity, and then presence information
could pass directly and securely from the presentity to the watcher.
It is worth noting that this would disclose the IP address of the
presentity to the watcher, which has its own set of risks. In that
case, the presentity can decide exactly what information it would
like to share with the watcher in question, it can authenticate the
watcher itself with whatever strength of credential it chooses, and
with end-to-end encryption it can reduce the likelihood of any
eavesdropping. In a redirection architecture, a presence server
could still provide the necessary "offline" indication, without
requiring the presence server to observe and forward all information
itself. This mechanism is more promising than encryption, but also
suffers from significant difficulties. It too does not provide for
composition of presence information from multiple devices - it in
fact forces the watcher to perform this composition itself. The
largest single impediment to this approach is however the difficulty
of creating end-to-end connections between the presentity's device(s)
and a watcher, as some or all of these endpoints may be behind NATs
or firewalls that prevent peer-to-peer connections. While there are
potential solutions for this problem, like STUN and TURN, they add
complexity to the overall system.
Given the pragmatics, I'm surprised you'd call this 'promising'.
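A deliberately simplified Python model shows the difference between the mediated and redirect modes, in terms of what the server observes.

Everything here (class, method names, the address) is hypothetical and does not correspond to actual SIP or SIMPLE messages. It also leaves out NAT traversal, which the draft identifies as the real obstacle, and composition across devices.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PresenceServer:
    """Toy model (hypothetical, not SIP/SIMPLE) of what a presence server
    learns in mediated versus redirect mode."""

    redirect_mode: bool
    contacts: dict = field(default_factory=dict)  # presentity -> device address
    statuses: dict = field(default_factory=dict)  # presentity -> presence status

    def register(self, presentity: str, address: str) -> None:
        # Needed in both modes so the server can still answer "offline".
        self.contacts[presentity] = address

    def publish(self, presentity: str, status: str) -> None:
        if self.redirect_mode:
            raise RuntimeError("redirect mode: status goes end to end, not via the server")
        self.statuses[presentity] = status

    def subscribe(self, presentity: str) -> tuple[str, Optional[str]]:
        if presentity not in self.contacts:
            return ("offline", None)
        if self.redirect_mode:
            # The watcher is told where to connect, which reveals the device address.
            return ("redirect", self.contacts[presentity])
        return ("status", self.statuses.get(presentity))


mediated = PresenceServer(redirect_mode=False)
mediated.register("alice", "198.51.100.7")
mediated.publish("alice", "in a meeting")
print(mediated.subscribe("alice"), mediated.statuses)
# ('status', 'in a meeting') {'alice': 'in a meeting'}

redirect = PresenceServer(redirect_mode=True)
redirect.register("alice", "198.51.100.7")
print(redirect.subscribe("alice"), redirect.statuses)
# ('redirect', '198.51.100.7') {}
print(redirect.subscribe("bob"))  # ('offline', None)
```

In redirect mode the server never holds the status, but the watcher learns the presentity's address. That matches the trade-off the draft describes.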
Consequently, mediation is a difficult feature of the presence
architecture to remove, and due especially to the requirement for
composition it is hard to minimize the data shared with
intermediaries. Control over sharing with intermediaries must
therefore come from some other explicit component of the
architecture. As such, the presence work in the IETF focused on
improving the user participation over the activities of the presence
server. This work began in the GEOPRIV working group, with controls
on location privacy, as location of users is perceived as having
especially sensitive properties. With the aim to meet the privacy
requirements defined in [RFC2779] a set of usage indications, such as
whether retransmission is allowed or when the retention period
expires, have been added to PIDF-LO that always travel with location
information itself. These privacy preferences apply not only to the
intermediaries that store and forward presence information, but also
to the watchers who consume it.
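A minimal Python sketch shows how a recipient might act on the two usage indications mentioned above: whether retransmission is allowed, and when retention expires.

Assumptions:
- The class and function names are hypothetical.
- This is not the PIDF-LO XML encoding.
- The dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRules:
    """Hypothetical model of two usage indications that travel with the
    location object; field names are illustrative, not PIDF-LO syntax."""

    retransmission_allowed: bool
    retention_expiry: datetime


def may_forward(rules: UsageRules) -> bool:
    """Whether a recipient may pass the location on to third parties."""
    return rules.retransmission_allowed


def purge_expired(store: dict, now: datetime) -> dict:
    """Keep only entries whose retention period has not yet expired.
    `store` maps a key to a (location, UsageRules) pair."""
    return {key: (loc, rules) for key, (loc, rules) in store.items()
            if now < rules.retention_expiry}


rules = UsageRules(retransmission_allowed=False,
                   retention_expiry=datetime(2013, 3, 15, tzinfo=timezone.utc))
store = {"alice": ("hypothetical location", rules)}

print(may_forward(rules))  # False
print(len(purge_expired(store, datetime(2013, 3, 14, tzinfo=timezone.utc))))  # 1
print(len(purge_expired(store, datetime(2013, 3, 16, tzinfo=timezone.utc))))  # 0
```

These checks run only if the recipient's software chooses to run them. The indications express the user's preferences but do not technically prevent relaying, which is the gap the draft notes earlier in the example.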
This approach very much follows the spirit of Creative Commons [CC],
namely the usage of a limited number of conditions (such as 'Share
Alike' [CC-SA]). Unlike Creative Commons, the GEOPRIV working group
did not, however, initiate work to produce legal language nor to
design graphical icons since this would fall outside the scope of the
Hmmm. This raises a possible issue: finding and liaising with other
groups that are relevant to privacy and have complementary skills. Here,
for example, work needed to aid privacy was identified but had to be
handed off to another group.
Lining up such contacts ahead of time could be a useful bit of work for
a privacy directorate?
d/
--
Dave Crocker
Brandenburg InternetWorking
bbiw.net