Re: Usage of Differential Privacy & RAPPOR

2017-10-24 Thread Andrew Meyer via governance
That's pretty much what differential privacy already does. Some percentage
of the data each person sends will be fake.

On Mon, Oct 23, 2017, 1:29 AM  wrote:

> On Monday, September 11, 2017 at 7:08:54 AM UTC-6, Georg Fritzsche wrote:
> > Thanks for the feedback, we are evaluating it and will follow up in the
> > next weeks.
>
> Regardless of what you decide, you'll have to take into account the fact
> that some people will, to preserve their own privacy, feed you nonsense
> data in some unknown amount from some unknown group of IPs if you make it
> opt-out instead of opt-in.
>
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-10-24 Thread peterevans111--- via governance
On Monday, 21 August 2017 16:56:44 UTC+1, Georg Fritzsche  wrote:
> Hi,
> 
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
> 
> The problem.
> 
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
> 
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
> 
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
> 
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
> 
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
> 
> The solution.
> 
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
> 
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
> 
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
> 
> We have been investigating the use of RAPPOR for these kinds of use cases,
> with initial simulation results being promising.
> 
> Our plan.
> 
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population. We
> are hoping to launch this in mid-September.
> 
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
> 
> Georg
> 
> References:
> 
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6:
> https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

The changes that have been made benefit the majority of first-time users, 
inasmuch as there is normally a specific purpose in mind when downloading 
Firefox that takes priority over reading the Terms and Conditions immediately.
Additionally, a seasoned user moving to a new computer will almost always 
take for granted the long-standing objectives built into Mozilla products and 
only refer to the privacy principles when there is a clear reason for doing 
so. I count myself in this group, but I do have security measures in place to 
defend myself against unauthorized intrusions, and the instant I witness an 
unwanted incursion I unplug the internet and avoid "frozen" pages and worse!
Well done Mozilla for giving us a stable product that helps us to stay safe!
Peter Evans


Re: Usage of Differential Privacy & RAPPOR

2017-10-23 Thread chbarts--- via governance
On Monday, September 11, 2017 at 7:08:54 AM UTC-6, Georg Fritzsche wrote:
> Thanks for the feedback, we are evaluating it and will follow up in the
> next weeks.

Regardless of what you decide, you'll have to take into account the fact that 
some people will, to preserve their own privacy, feed you nonsense data in some 
unknown amount from some unknown group of IPs if you make it opt-out instead of 
opt-in.


Re: Usage of Differential Privacy & RAPPOR

2017-09-12 Thread Tom via governance



> The current plan is to run a SHIELD study, to confirm that we can get
> answers for this kind of data from the Firefox population. Using RAPPOR we
> collect aggregate data on the most common domain value users set their
> homepage to (e.g. foo.com) or the value of "about:home".
>
> Through the use of RAPPOR


Will you run that study both with RAPPOR+opt-out and with the current 
opt-in, so that the results from the supposedly biased current state 
(opt-in) can be compared with the supposedly less biased new method 
(RAPPOR+opt-out) and the real improvement that RAPPOR+opt-out gives can be 
measured?



Re: Usage of Differential Privacy & RAPPOR

2017-09-11 Thread Georg Fritzsche via governance
Thanks for the feedback, we are evaluating it and will follow up in the
coming weeks.

I'll summarize and expand on some key points that came up.

Some background.

As originally outlined, there are some important questions that we
currently can not answer. Our existing data sources can either not give us
the data needed or are not sufficiently representative of our user base.

There are different constraints that we are working with:

We have a strong stance on our privacy principles and about how we impact
our users.

We also need reliable and representative data to make decisions, within the
limits of these principles.

Some kinds of data we will not collect by default based on our principles,
instead requiring additional consent. This means that we can not generally
get representative populations, due to selection bias.

Filling the gap.

This is why we are exploring techniques to address this. What if we could
collect data in a way that ensured a strict level of privacy inside
Firefox, before anything was sent to a server?

If we could achieve this, then we get both anonymous and representative
data collection.

This would be subject to our privacy policy. If users turn off data sharing
through the preferences, we would not submit this data - Firefox will
always respect user choice.

How this works.

This is where Differential Privacy techniques come in, which rely on
practices of hashing and noise injection so that no conclusions can be made
about individual users. The most common example comes from Social Science
studies, where participants lie about their answers as determined by a
coin flip.

RAPPOR is one specific technique that
can be applied to strings and allows giving formal privacy guarantees,
depending on the choice of parameters.
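
To make the coin-flip example concrete, here is a minimal Python sketch of the classic randomized-response survey. It is an illustration only, not the RAPPOR algorithm and not the planned implementation; the function names and the 30% "true" rate are made up for the example.

```python
import random
from typing import Sequence


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """One participant: a first coin decides whether to answer truthfully."""
    if rng.random() < 0.5:       # heads: give the truthful answer
        return truth
    return rng.random() < 0.5    # tails: answer with a second, unrelated coin


def estimate_true_rate(reports: Sequence[bool]) -> float:
    """Invert the noise, using P(report is yes) = 0.5 * true_rate + 0.25."""
    observed = sum(reports) / len(reports)
    return 2.0 * (observed - 0.25)


rng = random.Random(42)
true_rate = 0.30  # hypothetical share of participants whose honest answer is "yes"
population = [rng.random() < true_rate for _ in range(100_000)]
reports = [randomized_response(t, rng) for t in population]
estimate = estimate_true_rate(reports)  # close to 0.30 for a population this large
```

No single report can be taken at face value, since any "yes" may come from the second coin, yet the aggregate still recovers the overall rate.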

What we plan to do.

The current plan is to run a SHIELD study, to confirm that we can get
answers for this kind of data from the Firefox population. Using RAPPOR we
collect aggregate data on the most common domain value users set their
homepage to (e.g. foo.com) or the value of "about:home".

Through the use of RAPPOR, only obfuscated data will leave Firefox. By
sending out noisy data, we protect the privacy of individual users.

Then, for getting answers out of the noisy data we receive, we can test the
aggregated data of all users against a list of domains from other sources.
We can use e.g. the Alexa Top 500 sites to estimate if any of them are
present in the aggregated data.
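
As a rough illustration of testing noisy aggregate data against a candidate list, the sketch below uses a heavily simplified scheme: one hash per domain and a single round of per-bit noise. The real RAPPOR decoding uses several hash functions, cohorts and a regression step, none of which are shown here. `NUM_BITS`, `P`, `Q`, the function names and the example domains are all assumptions made for the example.

```python
import hashlib
import random

NUM_BITS = 128       # assumed report size, not a value from the study
P, Q = 0.25, 0.75    # assumed P(bit reported as 1) when the true bit is 0 / 1


def bit_for(domain):
    """Map a domain to one bit position (a one-hash Bloom filter)."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BITS


def noisy_report(domain, rng):
    """Client side: every bit is randomized before it leaves the client."""
    true_bit = bit_for(domain)
    return [int(rng.random() < (Q if i == true_bit else P)) for i in range(NUM_BITS)]


def estimate_counts(reports, candidates):
    """Server side: de-bias the per-bit sums, then look up each candidate's bit."""
    n = len(reports)
    column_sums = [sum(column) for column in zip(*reports)]
    return {d: (column_sums[bit_for(d)] - P * n) / (Q - P) for d in candidates}


rng = random.Random(7)
homepages = ["foo.com"] * 2400 + ["bar.org"] * 1600   # hypothetical population
reports = [noisy_report(d, rng) for d in homepages]
estimates = estimate_counts(reports, ["foo.com", "bar.org", "absent.net"])
```

The estimates are noisy counts per candidate. In this simplified version two candidates that hash to the same bit cannot be told apart, which is one reason the full algorithm uses multiple hashes and cohorts.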

What's next.

Currently we are evaluating the feedback and will decide about next steps.

We will continue to work in the open and maintain a dialog around our data
collection practices. Stay tuned for further communications regarding our
research into differential privacy, best practices, scientific
collaborations and more technical details.

Georg


Re: Usage of Differential Privacy & RAPPOR

2017-09-01 Thread Georg Fritzsche via governance
On Tue, Aug 29, 2017 at 5:13 PM, Kurt Roeckx via governance <
governance@lists.mozilla.org> wrote:

> On 2017-08-29 15:50, Georg Fritzsche wrote:
>
>> On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
>> governance@lists.mozilla.org> wrote:
>>
>> On 2017-08-23 16:33, Alex Gaynor wrote:
>>>
>>>> I had the same question, but it looks like RAPPOR has gotten
>>>> significantly more advanced since I originally learned about the "just
>>>> boolean questions" version. https://arxiv.org/pdf/1503.01214.pdf explains
>>>> how to build privacy preserving measurements without knowing the values
>>>> of the population.
>>>
>>> So if I understand things correctly from the paper, you create a bloom
>>> filter for the URL/hostname you want to send, then randomly change it,
>>> store that. And each time they ask about the URL/hostname you take the
>>> stored version, randomly change it and that's what you send.
>>>
>>> What I understand from that is that you don't get to learn the
>>> URL/hostname at all, but can query if a URL/hostname has been submitted.
>>> You don't get to learn what the population is, but the whole population
>>> can be sent.
>>>
>>> Is that accurate?
>>>
>>>
>> Hi,
>>
>> through RAPPOR, we can send randomized values for all encountered domain
>> values.
>>
>> Then, in analysis, we can test the noisy aggregate data against known
>> domain values and get an estimate of how frequently they occurred.
>>
>> This gives immediate insights and we can increase the detail by adding
>> more sources for known domain values.
>>
>
> The paper has several algorithms in it. The first is described in "II.
> BACKGROUND", which does not allow you to learn the dictionary, but you can
> check that certain URLs are in it or not.
>
> Then in "III. ESTIMATING JOINT DISTRIBUTIONS" they describe how you can
> correlate different answers with each other.
>
> Then in "IV. RAPPOR WITHOUT A KNOWN DICTIONARY" they describe that you can
> send some additional data, and then using the algorithm from III to learn
> something about the dictionary.
>
> Do you intend to use the algorithm from II or from IV?
>
> From what I understand, for the algorithm of II there are various
> parameters that affect the noise, and how likely it is someone can learn
> something about the data you're sending. I think they at least include:
> - The size of the bloom filter
> - The number of hashes you use
> - probability of the randomization for the PRR (f in the paper)
> - probability of the randomization for the IRR (q and p from the paper)
>
> Do you have any idea which you plan to use, and what the effect of that is?
>

The referenced paper is a newer one ("Building RAPPOR with the unknown
[...]").

Our current work is based on the first paper. The anonymization part is
described in sections 3.1 and 3.2. The aggregation/decoding is described in
section 4.

We will publish a summary of the technical details if we decide to move
forward with this.

We are also working on a blog post that will share the best practices and
approaches that we found from working on this.

Georg


Re: Usage of Differential Privacy & RAPPOR

2017-09-01 Thread Georg Fritzsche via governance
On Sun, Aug 27, 2017 at 2:47 PM, David Bruant  wrote:

>> Asks for sensitive data center most commonly around knowing something in
>> relation to which sites a user visits:
>>
>> - "Which top sites are users visiting?"
>> - "Which sites using Flash does a user encounter?"
>> - "Which sites does a user see heavy Jank on?"
>>
>> In summary most asks are for occurrences of an event X per domain (more
>> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>>
>> The solution.
>>
>> One solution is the use of differential privacy [2] [3], which allows us
>> to collect sensitive data without being able to make conclusions about
>> individual users, thus preserving their privacy.
>>
>> An attacker that has access to the data a single user submits is not able
>> to tell whether a specific site was visited by that user or not.
>>
> Just to be 100% sure I understand, what will happen is that Firefox will
> lie (or answer randomly) to the question with probability p. This way, even
> if an attacker reaches to Moz servers, they can trust the answer only with
> probability 1-p.
> There is a trade-off between utility (low p) and stronger privacy (high p).
> Could this trade-off be documented and a hard low limit be decided?
> Should each study decide on a different p based on data sensitivity?
>

Yes, once the value is encoded we will lie or answer randomly about the
status of each bit with a certain probability. This probability depends on
the prior state of the bit in the bloom filter which holds potential
responses (whether it was a 1 or a 0) and on all the parameters of the
RAPPOR algorithm.

As an end-result we effectively constrain it to 1-p.
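
The per-bit "lying" can be sketched roughly as follows, using the two randomization steps from the RAPPOR paper (permanent and instantaneous randomized response) with its parameter names f, p and q. The hash construction, the function names and the parameter values in the usage lines are assumptions for illustration, not the values used in Firefox.

```python
import hashlib
import random


def bloom_bits(value, num_bits, num_hashes):
    """Encode a string into a small Bloom filter (hash scheme is illustrative)."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def permanent_randomized_response(bits, f, rng):
    """PRR: per bit, report 1 with prob. f/2, 0 with prob. f/2, else the true bit.

    A client would compute this once per value and store the result.
    """
    out = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)
        elif u < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_randomized_response(prr_bits, p, q, rng):
    """IRR: per report, send 1 with prob. q if the stored bit is 1, else prob. p."""
    return [int(rng.random() < (q if b else p)) for b in prr_bits]


rng = random.Random(1)
true_bits = bloom_bits("foo.com", num_bits=32, num_hashes=2)
memoized = permanent_randomized_response(true_bits, f=0.5, rng=rng)
report = instantaneous_randomized_response(memoized, p=0.5, q=0.75, rng=rng)
```

Only `report` would ever leave the client; `true_bits` and `memoized` stay local.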

As your intuition correctly suggests, there is a balance between utility
and privacy. Our goal is to choose parameters such that the privacy of
users is assured, while also getting statistical insights from the
aggregate data.

The privacy guarantee is expressed in terms of the ε parameter. For RAPPOR
this takes into account the addition of noise on the client-side via the
“lying” mechanism described above. Depending on the data sensitivity, the
population size and the collection frequency (one-time or repeated) the
ε-level should be fixed and the appropriate set of parameters needs to be
tuned. Under some circumstances this may mean that useful data may not be
collected, in which case user privacy is still preserved.
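
For a feel of how the parameters map to ε, the RAPPOR paper gives the longitudinal bound ε∞ = 2h · ln((1 − f/2) / (f/2)), where h is the number of hash functions and f the permanent randomization probability. The small helper below just evaluates that formula; the function name and the example values are assumptions and say nothing about the parameters that will be chosen for Firefox.

```python
import math


def epsilon_infinity(num_hashes: int, f: float) -> float:
    """Longitudinal privacy bound from the RAPPOR paper for h hashes and noise f."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    return 2 * num_hashes * math.log((1 - f / 2) / (f / 2))


# Illustrative values only: more noise (larger f) gives a smaller, stronger epsilon.
eps_moderate = epsilon_infinity(num_hashes=2, f=0.5)    # 4 * ln(3)
eps_stronger = epsilon_infinity(num_hashes=2, f=0.75)
```

Increasing f or reducing h lowers ε (stronger privacy), at the cost of needing more reports to get a useful signal.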

This parameter choice should be transparently documented and we need to
establish hard limits as well as best practices around choosing them.

We are working on a blog post that will share the best practices and
approaches that we found.


Georg


Re: Usage of Differential Privacy & RAPPOR

2017-08-31 Thread Henri Sivonen via governance
On Tue, Aug 22, 2017 at 5:59 PM, dan.callahan--- via governance
 wrote:
> Differential privacy is a great tool, however, I'm concerned that even if we 
> do everything *technically* correctly to preserve user privacy, the *optics* 
> associated with this sort of data collection were not addressed in this email.
>
> We attempted to do similarly with User Profile ("UP") / Directory Tiles 
> projects in Content Services, which proposed completely local history 
> analysis for purposes of advertising and content discovery. All of which was 
> done in a way that absolutely protected user privacy (the analysis never left 
> the local machine), but we weren't able to overcome the superficial 
> impression that Firefox was tracking users.

I think Dan's point is super-important. Reputational damage will occur
if people *think* Mozilla performs a privacy violation even if the
technical implementation was carefully privacy-preserving.

It's difficult for me to imagine a scenario where the usefulness of
the results of the planned study could outweigh the risk of a meme of
Mozilla doing something privacy-violating spreading around. That's
why I think Mozilla should not gather opt-out telemetry that sends
information about the sites accessed in any manner (even if users
could deem it privacy-preserving after looking into the details of the
implementation; my concern is about the case when users form their
opinion without reading papers from arxiv, etc.).

As a Gecko developer, I very much want to see feature usage data and,
while I haven't had the need yet, I can very well imagine needing
in-the-field performance metrics.  I don't want users to opt out of or
not to opt into feature usage and performance telemetry because they
think that enabling it would send a list of the sites accessed to
Mozilla.

So I would like to ask that Mozilla categorically not gather telemetry
about sites accessed and *clearly say so* in order to maximize user
comfort with having feature usage and performance telemetry enabled.
Failing that, I would like to ask that feature usage and performance
metrics be behind a different checkbox than telemetry about sites
accessed and the latter be clearly opt-in. (And, yes, I realize that
having a different checkbox for the latter makes it look more
nefarious, because the distinct checkbox implies an admission that the
two are somehow of different impact.)

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Usage of Differential Privacy & RAPPOR

2017-08-30 Thread Georg Fritzsche via governance
On Wed, Aug 23, 2017 at 4:14 PM, oliver.hodgson--- via governance <
governance@lists.mozilla.org> wrote:

> I know it's super-anonymised, but given the controversial nature of the
> subject, it might help people to understand more about the actual data
> you're planning to collect.
>
> So what exactly do you plan to collect? How long do you plan to store this
> data? How will it be stored? Is there a process in place to ensure it isn't
> kept for any longer than necessary? Who will have access to the data and
> how?
>
> Thanks,
> Olly.
>

For the currently planned study, we intend to collect aggregate data on the
domain value users set their homepage to (e.g. foo.com) or the value of
"about:home".

We plan to submit and store it through our existing Telemetry system, where
we keep individual submissions for up to 180 days. This is subject to our
privacy policy.

Access to this kind of data is limited to Mozilla staff. We intend to
publish the sanitized conclusions from the study publicly.


Georg


Re: Usage of Differential Privacy & RAPPOR

2017-08-29 Thread Laurențiu Nicola via governance
On Tuesday, 29 August 2017 16:54:39 UTC+3, Georg Fritzsche  wrote:
> For opt-out, we inform users about our data collection on first use and
> that they can turn it off.
> 
> Georg

Just so you know, this alleviates a large part of the concerns. Thanks for 
implementing it.


Re: Usage of Differential Privacy & RAPPOR

2017-08-29 Thread Kurt Roeckx via governance

On 2017-08-29 15:50, Georg Fritzsche wrote:
> On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
> governance@lists.mozilla.org> wrote:
>
>> On 2017-08-23 16:33, Alex Gaynor wrote:
>>
>>> I had the same question, but it looks like RAPPOR has gotten significantly
>>> more advanced since I originally learned about the "just boolean
>>> questions" version. https://arxiv.org/pdf/1503.01214.pdf explains how to
>>> build privacy preserving measurements without knowing the values of the
>>> population.
>>
>> So if I understand things correctly from the paper, you create a bloom
>> filter for the URL/hostname you want to send, then randomly change it,
>> store that. And each time they ask about the URL/hostname you take the
>> stored version, randomly change it and that's what you send.
>>
>> What I understand from that is that you don't get to learn the
>> URL/hostname at all, but can query if a URL/hostname has been submitted.
>> You don't get to learn what the population is, but the whole population can
>> be sent.
>>
>> Is that accurate?
>
> Hi,
>
> through RAPPOR, we can send randomized values for all encountered domain
> values.
>
> Then, in analysis, we can test the noisy aggregate data against known
> domain values and get an estimate of how frequently they occurred.
>
> This gives immediate insights and we can increase the detail by adding more
> sources for known domain values.


The paper has several algorithms in it. The first is described in "II. 
BACKGROUND", which does not allow you to learn the dictionary, but you 
can check that certain URLs are in it or not.


Then in "III. ESTIMATING JOINT DISTRIBUTIONS" they describe how you can 
correlate different answers with each other.


Then in "IV. RAPPOR WITHOUT A KNOWN DICTIONARY" they describe that you 
can send some additional data, and then using the algorithm from III to 
learn something about the dictionary.


Do you intend to use the algorithm from II or from IV?

From what I understand, for the algorithm of II there are various 
parameters that affect the noise, and how likely it is someone can learn 
something about the data you're sending. I think they at least include:

- The size of the bloom filter
- The number of hashes you use
- probability of the randomization for the PRR (f in the paper)
- probability of the randomization for the IRR (q and p from the paper)

Do you have any idea which you plan to use, and what the effect of that is?


Kurt


Re: Usage of Differential Privacy & RAPPOR

2017-08-29 Thread Georg Fritzsche via governance
On Fri, Aug 25, 2017 at 12:53 AM, hagis6789--- via governance <
governance@lists.mozilla.org> wrote:

> It would be better to keep it opt-in as the default, but during
> installation prompt with the option to opt-in to telemetry, rather than
> quietly letting the install go with the default as opted-out.
>

For opt-out, we inform users about our data collection on first use and
that they can turn it off.


> 1. Keep personally identifiable information (like the combination of
> browser, add-ons, computer hardware, and antivirus) separate and
> disconnected from records of what websites were visited.
>

What is being proposed here will use RAPPOR to submit randomized data on
domains, so we will not be able to see any individual records. We only get
statistical information out of this data when looking at the aggregate of
all collected data.

Georg


Re: Usage of Differential Privacy & RAPPOR

2017-08-29 Thread Georg Fritzsche via governance
On Fri, Aug 25, 2017 at 12:50 AM, hagis6789--- via governance <
governance@lists.mozilla.org> wrote:

> It is one thing to give a count of how many cases of syphilis there are in
> a city, quite another to give a count of how many cases there are in a town
> of 300.  It is one thing to say that credit card spending is up 20%, quite
> another to say that Bob Smith's credit card spending is up 20%.
>

Usage of techniques like RAPPOR allows us to keep individual data private,
regardless of the population size.

The trade-off is that we need a minimum population of a certain size to get
answers out of the aggregate data we collect.

In your specific example, we would not be able to get answers from a
population size of 300 people.
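
As a back-of-the-envelope illustration of why small populations give no usable answers, consider a simplified scheme where each report bit is 1 with probability q if the true bit is set and p otherwise. The estimated fraction is then (observed rate − p) / (q − p), and its standard error is at most 0.5 / (sqrt(N) · (q − p)). The values of p and q below are assumptions for the example, not the study's parameters.

```python
import math


def worst_case_std_error(population: int, p: float = 0.25, q: float = 0.75) -> float:
    """Upper bound on the standard error of an estimated fraction.

    Assumes one noisy bit per user; a single 0/1 report has a standard
    deviation of at most 0.5, and de-biasing divides by (q - p).
    """
    return 0.5 / (math.sqrt(population) * (q - p))


small_town = worst_case_std_error(300)            # roughly 0.058
large_population = worst_case_std_error(1_000_000)  # 0.001
```

With 300 users the noise alone is several percentage points per candidate, which swamps all but the most dominant values; with a million users it shrinks to about a tenth of a percentage point.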


Georg


Re: Usage of Differential Privacy & RAPPOR

2017-08-29 Thread Georg Fritzsche via governance
On Thu, Aug 24, 2017 at 10:23 AM, Kurt Roeckx via governance <
governance@lists.mozilla.org> wrote:

> On 2017-08-23 16:33, Alex Gaynor wrote:
>
>> I had the same question, but it looks like RAPPOR has gotten significantly
>> more advanced since I originally learned about the "just boolean
>> questions"
>> version. https://arxiv.org/pdf/1503.01214.pdf explains how to build
>> privacy
>> preserving measurements without knowing the values of the population.
>>
>
> So if I understand things correctly from the paper, you create a bloom
> filter for the URL/hostname you want to send, then randomly change it,
> store that. And each time they ask about the URL/hostname you take the
> stored version, randomly change it and that's what you send.
>
> What I understand from that is that you don't get to learn the
> URL/hostname at all, but can query if a URL/hostname has been submitted.
> You don't get to learn what the population is, but the whole population can
> be sent.
>
> Is that accurate?
>

Hi,

through RAPPOR, we can send randomized values for all encountered domain
values.

Then, in analysis, we can test the noisy aggregate data against known
domain values and get an estimate of how frequently they occurred.

This gives immediate insights and we can increase the detail by adding more
sources for known domain values.

Georg


Re: Usage of Differential Privacy & RAPPOR

2017-08-27 Thread David Bruant via governance

Hi Georg,

Some questions inlined

On 21/08/2017 at 17:56, Georg Fritzsche via governance wrote:

> Hi,
>
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
>
> The problem.
>
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
>
> Currently we can collect this data when the user opts in, but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).

What is the current percentage of Firefox users opting in?
What are the known biases? How do they affect the study results?


> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
>
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
>
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
>
> The solution.
>
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
>
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.

Just to be 100% sure I understand: what will happen is that Firefox will 
lie (or answer randomly) to the question with probability p. This way, 
even if an attacker gains access to Mozilla's servers, they can trust the 
answer only with probability 1-p.

There is a trade-off between utility (low p) and stronger privacy (high p).
Could this trade-off be documented and a hard lower limit be decided?
Should each study decide on a different p based on data sensitivity?


> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
>
> We have been investigating the use of RAPPOR for these kind of use-cases,
> with initial simulation results being promising.
>
> Our plan.
>
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population. We
> are hoping to launch this in mid-September.
>
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.

When this is on, can you publish the opt-out percentage and how it evolves 
somewhere?


Maybe I'm unfamiliar with Firefox data collection and Shield studies, 
but what's the policy regarding data deletion? Should the one for 
opt-out study data be stricter?
The Shield study page suggests a study lasts 7 days but "can last much 
longer" [1]. Could there be a strict policy about opt-out studies?
Or, if a study needs to run longer, could it be implemented so that no user 
has the add-on for more than 7 days (for instance) in a row?


Thanks,

David

[1] 
https://wiki.mozilla.org/Firefox/Shield/Shield_Studies#How_long_do_Shield_Studies_last.3F 




Re: Usage of Differential Privacy & RAPPOR

2017-08-26 Thread Athena82 via governance
On Monday, 21 August 2017 17:56:44 UTC+2, Georg Fritzsche wrote:
> Hi,
> 
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
> 
> The problem.
> 
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
> 
> Currently we can collect this data when the user opts in,  but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
> 
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
> 
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
> 
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
> 
> The solution.
> 
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
> 
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
> 
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
> 
> We have been investigating the use of RAPPOR for these kind of use-cases,
> with initial simulation results being promising.
> 
> Our plan.
> 
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population  We
> are hoping to launch this in mid-September.
> 
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
> 
> Georg
> 
> References:
> 
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6:
> https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

Thank you for reaching out to the community for feedback on this topic!
It is this kind of openness and transparency that makes me trust Mozilla's 
products more than anything else.

Which brings me to the planned opt-out SHIELD study.
I took the time to read a few things about the mechanism behind differential 
privacy, and while I believe this technology is promising and could be of value 
for Firefox to anonymize the more sensitive data, I don't think the goal of the 
study and the technology alone justify acquiring this data in an opt-out 
fashion.

The benefits (eliminating occasional performance issues on popular websites) do 
not outweigh the drawbacks (the perception that Firefox resorts to 
techniques that take control away from the user, negative media coverage, 
declining user trust). I also wonder how this can be compatible with the GDPR's 
principles of consent.

So may I suggest making this kind of anonymized but sensitive data collection 
always opt-in, and persuading more users than ever (UX exercise!) to participate 
by building trust and informing them about the purpose and the technology being 
used? Remember: trust takes years to build, seconds to break and forever to 
repair...
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-24 Thread hagis6789--- via governance
It would be better to keep it opt-in as the default, but during installation 
prompt with the option to opt-in to telemetry, rather than quietly letting the 
install go with the default as opted-out.

Also please make sure you've got the security precautions mentioned in my 
earlier post.

1. Keep personally identifiable information (like the combination of browser, 
add-ons, computer hardware, and antivirus) separate and disconnected from 
records of what websites were visited.

2. Ensure transmissions of monitoring data are encrypted.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-24 Thread hagis6789--- via governance
I'm just a regular Firefox user, albeit one who spent over 30 years in IT, 
including time on sensitive applications like BANKING and HEALTH CARE.

Whether kept as Opt-in or changed to Opt-out this is my feedback on maintaining 
privacy while monitoring product use:

1. I trust telemetry and dumps are encrypted for transmission.

2. The big thing is being very careful with "PERSONALLY IDENTIFIABLE" 
information. 

The combination of add-ons, computer hardware, antivirus, and other options is 
UNIQUE ENOUGH to qualify as personally identifiable.  That information MUST be 
kept disconnected from lists of what sites we visit and how often we visit them.

3. Since it is not typical monitoring, I think most people would accept that 
website info is included when there is a dump/error/crash.

It is one thing to give a count of how many cases of syphilis there are in a 
city, quite another to give a count of how many cases there are in a town of 
300.  It is one thing to say that credit card spending is up 20%, quite another 
to say that Bob Smith's credit card spending is up 20%.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-24 Thread Georg Fritzsche via governance
Hi,

if users turn off data sharing through the preferences, we will not submit
this data. Firefox will always respect the user's choice.

Georg

On Tue, Aug 22, 2017 at 9:32 PM, genetizen--- via governance <
governance@lists.mozilla.org> wrote:

> How do you see this study lining up with the data already being collected
> for the Firefox Health Report? This discussion strikes me as raising
> similar questions and tradeoffs, which were discussed on this forum and
> also blogged about by Mitchell and the metrics team.*
>
> Users currently have options under Preferences to "automatically send
> technical and interaction data to Mozilla" and to "send crash reports to
> Mozilla." How do you see handling the opt-out for the study from a UX
> perspective? Is there a way to automatically opt out those of us sending a
> DNT header? Or respect a user with data sharing turned off?
>
> *https://blog.lizardwrangler.com/2012/09/21/firefox-health-report/ and
> https://blog.mozilla.org/metrics/2012/09/21/firefox-health-report/
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-24 Thread Kurt Roeckx via governance

On 2017-08-23 16:33, Alex Gaynor wrote:

> I had the same question, but it looks like RAPPOR has gotten significantly
> more advanced since I originally learned about the "just boolean questions"
> version. https://arxiv.org/pdf/1503.01214.pdf explains how to build privacy
> preserving measurements without knowing the values of the population.


So if I understand things correctly from the paper, you create a Bloom 
filter for the URL/hostname you want to send, then randomly change it 
and store that. Each time they ask about the URL/hostname, you take the 
stored version, randomly change it again, and that's what you send.
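
For readers following along, the two-step scheme described above can be sketched roughly as follows. This is only an illustration of the idea in the RAPPOR paper, not Firefox's or Google's implementation: the filter size, the number of hashes, the probabilities F, P and Q, and all function names are assumptions picked for the example.

```python
import hashlib
import random

NUM_BITS = 32      # Bloom filter size (illustrative assumption)
NUM_HASHES = 2     # hash functions per value (illustrative assumption)
F = 0.5            # chance that a stored bit is replaced by noise (assumption)
P, Q = 0.25, 0.75  # chance of reporting 1 for a stored 0 / stored 1 (assumption)


def bloom_bits(value):
    """Encode a hostname into a small Bloom filter (list of 0/1 bits)."""
    bits = [0] * NUM_BITS
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % NUM_BITS] = 1
    return bits


def permanent_response(bits, rng):
    """First randomization: computed once per value and then stored."""
    stored = []
    for b in bits:
        r = rng.random()
        if r < F / 2:
            stored.append(1)   # forced to 1
        elif r < F:
            stored.append(0)   # forced to 0
        else:
            stored.append(b)   # true bit kept
    return stored


def instantaneous_response(stored, rng):
    """Second randomization: applied again to the stored bits for every report."""
    return [1 if rng.random() < (Q if b else P) else 0 for b in stored]


rng = random.Random()
stored = permanent_response(bloom_bits("example.com"), rng)
report = instantaneous_response(stored, rng)  # this is what would be sent
```

The point of storing the first randomization is that repeated reports are all derived from the same noisy bits, so collecting many reports does not average the noise away.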


What I understand from that is that you don't get to learn the 
URL/hostname at all, but can query whether a URL/hostname has been submitted. 
You don't get to learn what the population is, but the whole population 
can be sent.


Is that accurate?


Kurt
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread sk.griffinix--- via governance
Do it with google, apple, windows first
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Danny via governance
On Wednesday, August 23, 2017 at 10:37:58 AM UTC-7, Doug Thayer wrote:
> Sticking to the farting analogy, it would be more like a methane detector in a
> large building. If one person farts, really we couldn't tell since we couldn't
> distinguish between one fart and regular fluctuations in the methane content
> of the air. However, if lots of people are farting, we should be able to
> estimate roughly how many farts are happening in a given time period. I think
> it's important to make this distinction, because it means that we can only
> observe _common_ behaviors of the crowd, while deviant behaviors of an
> individual can _never_ be observed.

Hi Doug,

Thanks for the response.

I definitely wrote that when I hadn't understood RAPPOR as well, so I 
apologize for that quick-trigger response.

Reading the RAPPOR paper more, it looks like it does think through the case I 
was alluding to. The situation I was worried about is multiple collections 
over a period of time. Yes, one participation in the methane detection test 
might not reveal much. But what's being asked for is automatic participation in 
all subsequent tests.

The RAPPOR paper does talk about this situation and does describe the 
precautions needed to mitigate it, especially things like multiple accidental 
participations (installing Firefox and also Firefox Nightly, for example).
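
To make the repeated-collection worry concrete, here is a small simulation sketch. It is a toy model, not RAPPOR itself: a single yes/no answer is randomized with an assumed TRUTH_PROB, and a hypothetical observer guesses the true answer by majority vote over many reports. All names and numbers are made up for illustration.

```python
import random

TRUTH_PROB = 0.5  # chance that a report carries the true answer (assumption)


def noisy(answer, rng):
    """Report the true answer with TRUTH_PROB, otherwise a fair coin flip."""
    if rng.random() < TRUTH_PROB:
        return answer
    return rng.random() < 0.5


def guess_from_reports(reports):
    """Observer's guess: majority vote over everything one client has sent."""
    return sum(reports) > len(reports) / 2


def attack_success_rate(num_reports, memoize, trials, rng):
    """Fraction of simulated clients whose true answer the observer guesses."""
    correct = 0
    for _ in range(trials):
        truth = rng.random() < 0.5
        if memoize:
            # Noise drawn once and reused for every report.
            reports = [noisy(truth, rng)] * num_reports
        else:
            # Fresh noise for every report.
            reports = [noisy(truth, rng) for _ in range(num_reports)]
        correct += guess_from_reports(reports) == truth
    return correct / trials


rng = random.Random()
print("fresh noise:   ", attack_success_rate(101, False, 2000, rng))
print("memoized noise:", attack_success_rate(101, True, 2000, rng))
```

With fresh noise on every report the observer's guess becomes nearly always right as reports accumulate, while with memoized noise it stays at the single-report level. A second install that draws its own noise behaves like a second independent report, which is why accidental multiple participation matters.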

I guess it's impossible for me to actually drill into whether Firefox's 
implementation would have all the cases covered. But just to say that what's 
asked is still the automatic trust of all and future behaviors (of the 
implementation).
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Danny via governance
On Wednesday, August 23, 2017 at 10:47:38 AM UTC-7, Georg Fritzsche wrote:
> Hi Danny.
> 
> On Tue, Aug 22, 2017 at 5:19 PM, Danny via governance <
> governance@lists.mozilla.org> wrote:
> 
> > I know that perf data is extremely important. In fact, I was just seeing
> > freezes yesterday and that's kinda frustrating. But I still won't enable
> > automatic data collection. What I think would be nice is if you actually
> > just prompted me "crash reporting" style. Ask me, "hey... we know Firefox
> > was a bit slow for you on such and such site, would you like to let us
> > know?" And then give me the option of "Yes, this one time", "Yes, always on
> > this site", "No".
> 
> 
> This would be subject to opt-in bias.
> 
> This works well to answer some kind of questions, but not generally when we
> need representative samples. Generally submission rates for this kind of
> opt-in mechanism are often low, which limits our possible insights.
> 
> Georg

Hi Georg,

I'm still not convinced. Does anyone actually have data on this?

We're not talking about opt-in vs opt-out in _general_. In general, you are 
absolutely right.

I meant to suggest the solution specifically for the perf issues. I meant to 
suggest to do an in-context, in-the-moment type request.

Like Laurențiu mentioned, this is very common in iOS apps, which I'm very 
comfortable with. An app wants my contacts? I can decide at the moment of 
request. An app wants my location? I can decide to allow it while the app is 
running or allow it also in the background.

Apple goes even further and once in a while asks you "this app has been using 
your location in the background, do you want to continue to allow this?"

Apple received a ton of backlash early on and has implemented mechanisms to 
protect users' privacy. That level of protection I'm comfortable with.

The stated purpose is to discover top sites that users experience "heavy jank" 
on. My question is more that if users are experiencing "heavy jank", would they 
not want to submit a report? Are there desktop apps where such an approach has 
been taken?

I don't know of any.

It's always the all or nothing approach. Always "key to my house" vs not. To 
those requests, I always opt out.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Laurențiu Nicola via governance
Hi Georg,

I have a couple of questions and/or concerns and they don't seem to be 
addressed too well in this thread. It's probably going to be rather long, so 
sorry for that.

> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
> 
> Currently we can collect this data when the user opts in,  but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).

Does this refer to the Firefox Pioneer [1] add-on, or something else?

>> (...) We are not in the advertising business, (...)
> In fairness, we have been, at one moment, to the surprise of many. I can
> understand that people could fear that happens again some day.

Thanks, Mike, for admitting this. I assume it's about the sponsored/suggested 
tiles functionality, but I'm not convinced that it stopped. Are there still 
plans to make about:newtab load from the Mozilla servers [2]? Is Activity 
Stream fundamentally different?

Note that RAPPOR was originally implemented years ago with the intention of 
being used for this kind of data collection [4] and not for monitoring 
performance.

I also think it's happening with Pocket.

> This data will be sent once (and only once) per copy of Firefox, to
> make sure that nobody (including Mozilla) can deduce more detailed data
> by observing specific users.

That is the promise for this SHIELD experiment. We don't know how RAPPOR will 
be used in the future. It might, for example, be expanded to cover whole 
domains instead of eTLD+1s (that's been considered in the past [5], so it's not 
just a slippery slope argument).

> What we would be sending is a neat list of jumbled garbage that is almost 
> indistinguishable from random noise. No conclusions can be made about what 
> websites you visit from this.

My (admittedly shallow) understanding of DP is that there is always a risk of 
data being exposed. This is a parameter of the implementation and can be tuned 
in one direction or another, but it's always there. DP is not perfect privacy.

There's also a discussion of client identifiers (FHR/Telemetry ids) being 
included or not in the data. This is not obviously safe.

>> Offering to send anonymous info on one of these events, through a popup or
>> dropdown hanger (similar to the password manager, security certificates,
>> etc), would fulfill the same objective. A user is inclined to help when
>> his/her favorite website suddenly starts slowing down, or throwing errors.
>> At this point it's also easy to check a box to "always do this from now on".
> We don't want to annoy users _more_ by asking them to tell us about their
> performance issue.

I feel like you're too eager to dismiss suggestions like this. Please don't. 
Mobile applications on iOS and Android do something similar [6], so users 
might already be familiar with the pattern. Don't ask for a thousand permissions 
at install time. Ask nicely when you need something and show what you need it 
for. Allow the user to decide on a site-by-site basis.

> For crawling the sites, this will allow us to see how many sites use Flash,
> but can't tell us which sites our users encounter it on.

If I understand it correctly, RAPPOR needs a pre-defined list of sites. If 
users encounter Flash applets in a RAPPOR study, you will already know it's on 
a site in that pre-defined list. It can most likely be found via crawling.

Now you might be interested in how often users interact with Flash on those 
sites. I admit that's not possible with only crawling, but it's not obvious 
from your message.

I strongly dislike you giving the example of Flash. It's already dying and we 
all know that. Adobe will discontinue it in a couple of years. My guess is that 
the top visited sites are no longer using it. Would any information obtained 
via RAPPOR change Mozilla or Adobe's stance on Flash support? Compare this with 
the XUL add-on situation, where Mozilla already knows exactly what add-ons the 
users have and what they are installing.

>"Which top sites are users visiting?"

Alexa's top list should be enough, or whatever list you would be preloading 
into RAPPOR. If Firefox works well on those sites, the users will be happy. 
There's no reason to believe that Firefox users are interested in completely 
different sites from other Internet users.

The feeling I got from your first post is that you want to have the mechanism 
available without a clear idea of how it's going to be used. Myself, I'm really 
uncomfortable with this.

> Hello, Redditors... 

Please don't dismiss posters from high-profile sites like HN and Reddit. They 
came here because they care. They're the ones that recommend Firefox to their 
friends. Some of them are the ones who offered constructive feedback related to 
the issue at hand. And they are the Firefox users, even if they might not care 
to read a Wikipedia page full of formulae.

I understand how you might be annoyed 

Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Georg Fritzsche via governance
Hi Danny.

On Tue, Aug 22, 2017 at 5:19 PM, Danny via governance <
governance@lists.mozilla.org> wrote:

> I know that perf data is extremely important. In fact, I was just seeing
> freezes yesterday and that's kinda frustrating. But I still won't enable
> automatic data collection. What I think would be nice is if you actually
> just prompted me "crash reporting" style. Ask me, "hey... we know Firefox
> was a bit slow for you on such and such site, would you like to let us
> know?" And then give me the option of "Yes, this one time", "Yes, always on
> this site", "No".


This would be subject to opt-in bias.

This works well to answer some kind of questions, but not generally when we
need representative samples. Generally submission rates for this kind of
opt-in mechanism are often low, which limits our possible insights.

Georg
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Georg Fritzsche via governance
On Tue, Aug 22, 2017 at 5:19 PM, jotaf98--- via governance <
governance@lists.mozilla.org> wrote:

> The example questions can be answered with no need for the bulk telemetry
> that's proposed:
>
> >"Which top sites are users visiting?"
>
> There's enough public data available on what sites are most popular. No
> need for yet another database on that.
>
> >"Which sites using Flash does a user encounter?"
>
> Mozilla can crawl this information itself, based on the above websites
> list. It doesn't need to ask users to do it.


For crawling the sites, this will allow us to see how many sites use Flash,
but can't tell us which sites our users encounter it on.

Similarly for the top sites - third-party data is useful, but can't
reliably tell us which the actual top-sites for our Firefox users are.

Georg
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Doug Thayer via governance

(Edit: sending this again because it didn't seem to make it to the archives)

> The objection is not to DP's privacy guarantees, but to the fact that FF
> will phone home with every website we visit. A neat list of all the websites
> I visit will be sent to a central location, in chronological order.

I think this is misleading. What we would be sending is a neat list of jumbled
garbage that is almost indistinguishable from random noise. No conclusions can
be made about what websites you visit from this. With many records, we could
tell that a given site was probably visited X number of times by various
people, but at no point in time will anyone be able to say that you visited a
particular website. Apologies if you already understood this, but I wanted to
make it clear to anyone else reading your comment that it's not as if we're
sending "sketchywebsite.com" back to a central location.

> RAPPOR is kind of like the protection of farting in a crowded elevator.
> Somebody in that group did it, but we don't know who for sure. Yes, that's
> better privacy for sure, but is it total privacy? Not to me. Because you
> still know that somebody in that elevator did it very likely. Not a perfect
> analogy, but hopefully demonstrates the cracks.

Sticking to the farting analogy, it would be more like a methane detector in a
large building. If one person farts, really we couldn't tell since we couldn't
distinguish between one fart and regular fluctuations in the methane content
of the air. However, if lots of people are farting, we should be able to
estimate roughly how many farts are happening in a given time period. I think
it's important to make this distinction, because it means that we can only
observe _common_ behaviors of the crowd, while deviant behaviors of an
individual can _never_ be observed.

> Offering to send anonymous info on one of these events, through a popup or
> dropdown hanger (similar to the password manager, security certificates,
> etc), would fulfill the same objective. A user is inclined to help when
> his/her favorite website suddenly starts slowing down, or throwing errors.
> At this point it's also easy to check a box to "always do this from now on".


We don't want to annoy users _more_ by asking them to tell us about their
performance issue. Crashes are severe enough and can require detailed enough
information to diagnose that it's worth it in this case, but we would like to
be able to observe information about more minor events without pestering
people. This doesn't justify sacrificing their privacy, but the claim is that
RAPPOR allows us to do this without degrading anyone's privacy, since no
conclusions can be made about individual users or highly uncommon behavior.

> Exactly. Because the data is more sensitive the idea of opt-out comes into
> question before the question of the technology. If a person thinks that
> opt-out data collection is wrong it does not matter how effective the privacy
> technology is.
>
> This definitely has the potential to hurt the Firefox brand as a product
> that respects choice and does not try to trick you.
>
> Anyway since you wish a greater discussion on the actual technology i will
> stop here. Thank you for the replies.

We're focusing on the technology because the claim is that the technology
means that this data is not _actually_ more sensitive than the data we're
already collecting in an opt-out manner. We're not trying to hush users who
can't talk about the technical aspects of RAPPOR, but rather trying to keep it
on the topic of whether RAPPOR satisfies your definition of privacy or not. My
understanding of privacy is that if no one at all (malicious or not) is
capable of making conclusions about me in particular, then my privacy is being
protected. Differential privacy satisfies that definition, but privacy can
mean different things to different people.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread David Teller via governance
 Hi Olly,

 This is a good question. I am not part of that team. However, I have
followed (admittedly from afar) some of the project, so I'll try to
answer from my limited knowledge.

If I understand correctly, the plan is the following:

- Start with a pre-defined list of the N most visited websites around
the world (I don't know the value of N, but I would guess somewhere
around 1000). A website being "google.com" or "blogspot.com", for
instance, but without any further detail, so no differentiating between
blogs or actual search requests.

- Each copy of Firefox involved in this survey will send once a list of
booleans "yes I have visited this website during the past ... days" ...
except the data will be partially falsified by Firefox to make sure that
nobody (including Mozilla) has accurate data on individual users.

- This data will be sent once (and only once) per copy of Firefox, to
make sure that nobody (including Mozilla) can deduce more detailed data
by observing specific users.
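
For anyone who wants to see how partially falsified booleans can still be useful, here is a minimal sketch of the classic randomized-response idea. It is my own illustration of the general technique, not the study's code: TRUTH_PROB, the function names and the simulated numbers are all assumptions.

```python
import random

TRUTH_PROB = 0.75  # chance that a client reports its real answer (assumption)


def randomize(answer, rng):
    """Client side: real answer with TRUTH_PROB, otherwise a fair coin flip."""
    if rng.random() < TRUTH_PROB:
        return answer
    return rng.random() < 0.5


def estimate_true_fraction(reports):
    """Server side: undo the known noise on the aggregate, never per user.

    Expected observed rate = TRUTH_PROB * true_rate + (1 - TRUTH_PROB) * 0.5
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - TRUTH_PROB) * 0.5) / TRUTH_PROB


# Simulate 100,000 clients, 30% of whom really visited the site in question.
rng = random.Random()
truths = [rng.random() < 0.30 for _ in range(100_000)]
reports = [randomize(t, rng) for t in truths]
print(round(estimate_true_fraction(reports), 3))  # close to 0.30
```

Any single report may be false, so it says little about one person, but the noise is known in aggregate and can be subtracted once there are many reports. That also means the estimate is only reliable for sites that many people visit.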


I will let you judge whether this information is privacy-invasive.

I have no information regarding storage and retention policy. I am
nearly certain, however, that the IP is not stored.

I imagine that someone actually working on the project is working on a
more detailed presentation that would answer all your questions.

Best regards,
 David

On 23/08/17 16:14, oliver.hodgson--- via governance wrote:
> I know it's super-anonymised, but given the controversial nature of the 
> subject, it might help people to understand more about the actual data you're 
> planning to collect.
> 
> So what exactly do you plan to collect? How long do you plan to store this 
> data? How will it be stored? Is there a process in place to ensure it isn't 
> kept for any longer than necessary? Who will have access to the data and how?
> 
> Thanks,
> Olly.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Kurt Roeckx via governance

On 2017-08-21 17:56, Georg Fritzsche wrote:

> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population  We
> are hoping to launch this in mid-September.


This at least looks confusing to me. Will Firefox have a list of 
possible homepages, and then send some answer to Mozilla for a random 
sample (or all) of those?


That is at least how I expect it to work, but "collect the value" can be 
interpreted in multiple ways. I suggest someone write a nice 
explanation of how this works.



Kurt
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Display name via governance
On Monday, August 21, 2017 at 5:56:44 PM UTC+2, Georg Fritzsche wrote:
> for Firefox we want to better understand how people use our product to
> improve their experience. 

You already have the tools to do that. It's called a *survey*.
Plus, we never heard any answer from you to all the suggestions we have sent 
from my Computer Group (20+ persons). In years, never one single answer...
And now you want to "improve our experience"??? Really???
LOL
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread pgnet.dev--- via governance
On Wednesday, August 23, 2017 at 6:05:52 AM UTC-7, Mike Hommey wrote:
> In fairness, we have been, at one moment, to the surprise of many. I can
> understand that people could fear that happens again some day.

Well noted. And then some.

This debate is astonishing, but no longer surprising.

In my book, if a vendor wants to *begin* to establish trust, step *1* is, as 
Irving above referenced, "Privacy by Default".

OTOH, if they want to ensure that they lose it rapidly, "Opt-Out" is, 
respectively, a GREAT starting point.

Is there at least going to be an about:config param, ENV var, etc. to set PRIOR 
to this auto-toggle to Opt-Out? So that the toggle, even if only for moments, 
is preventable for both existing users, and for new installs?  Or do we need to 
start sleuthing for, and firewalling, telemetry endpoints?
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread oliver.hodgson--- via governance
I know it's super-anonymised, but given the controversial nature of the 
subject, it might help people to understand more about the actual data you're 
planning to collect.

So what exactly do you plan to collect? How long do you plan to store this 
data? How will it be stored? Is there a process in place to ensure it isn't 
kept for any longer than necessary? Who will have access to the data and how?

Thanks,
Olly.
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Mike Hommey via governance
On Wed, Aug 23, 2017 at 03:10:31PM +0300, Panos Astithas via governance wrote:
> (...) We are not in the advertising business, (...)

In fairness, we have been, at one moment, to the surprise of many. I can
understand that people could fear that happens again some day.

Mike
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-23 Thread Panos Astithas via governance
On Tue, Aug 22, 2017 at 9:45 PM, hen000.c.young--- via governance <
governance@lists.mozilla.org> wrote:

> On Monday, 21 August 2017 15:56:44 UTC, Georg Fritzsche  wrote:
> > "Which sites does a user see heavy Jank on?"
>
>
> Why can't FireFox display a bar at the top asking the user to report the
> page for issues instead?
>

Because this is the definition of opt-in data collection ("can we collect
this data? Sure, I'm in!"), which has the data quality issues already
mentioned. Opt-out data collection means that by default we would be
collecting the data, unless the user goes to the preferences panel and opts
out of it (there is also a notification bar for every new installation to
remind users of this policy and how to opt out).

There is already both opt-out data collection going on (e.g. longest cycle
collection pause) and opt-in data collection (e.g. whether the device
supports touch input) in Firefox. The differential privacy approach of
RAPPOR we believe gives us the mathematical proof that we can collect some
of the more privacy-sensitive data in a way that doesn't reduce user
privacy.

And to reiterate the obvious: we don't collect user data to build user
profiles and we never want to. We are not in the advertising business, we
are a non profit working for the public benefit. We will always be
providing ways to disable data collection, even for the non personal
identifying information we need to collect in order to improve the browser.
We believe the academic research of the last few years on differential
privacy has figured out ways to collect data without infringing on user
privacy. Apple, Google and others are already using the fruits of this
research in their products. If there are reasons to believe these methods
aren't working well enough, we would very much like to know about them!

Panos
___
governance mailing list
governance@lists.mozilla.org
https://lists.mozilla.org/listinfo/governance


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread patricksmyth01--- via governance
I think the premise that you need to collect data on the top sites that a user 
visits may be flawed. Won't you be contributing to the dominance of 
(already-dominant) top sites by optimizing for them specifically?

It also seems that you could get a reasonably accurate idea of what sites are 
most popular among Firefox users by looking at the most popular sites overall 
and optimizing for those. Do you expect that Firefox users are so wildly 
different that their top sites don't look more or less the same as the top 
sites overall?

Further, as has been shown again and again, data thought to be untraceable to 
any particular user has been deanonymized through correlations with other data 
sets. Something like top visited sites is actually a pretty juicy target as 
well for state actors, blackmailers, etc.

Finally, the mere act of doing random (from the user's perspective) telemetry 
is problematic. First, users on limited connections don't need to be using more 
data than they already are. Second, the mere act of making a request with IP 
endpoints, even if it sends only a ping, can expose an unprepared user who 
needs privacy. I understand that Firefox already does some of this, but that's 
not really a reason to do more.

From a business perspective, a major differentiating factor (arguably the only 
differentiating factor) of Firefox is that Mozilla isn't Google. The closer you 
get to that line, the more damage you'll do to the trust users have in Mozilla.

I recommend that you take the high road on this one. I'm not sure what the 
motivator is here (does having more data give you leverage with partners?). But 
the stated justification (improving speeds on particular websites) seems too 
weak to excuse the valid privacy concerns.

Mozilla: we want to trust you. We do trust you. We know it's tough out there. 
You're playing with the big kids, and they have intel that, admittedly, 
probably helps them improve their products. But the way you can improve your 
product is by NOT collecting that intel. Do the Mozilla thing, not the Google 
thing.

On Monday, August 21, 2017 at 11:56:44 AM UTC-4, Georg Fritzsche wrote:
> Hi,
> 
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
> 
> The problem.
> 
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
> 
> Currently we can collect this data when the user opts in,  but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
> 
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
> 
> - "Which top sites are users visiting?"
> - "Which sites using Flash does a user encounter?"
> - "Which sites does a user see heavy Jank on?"
> 
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
> 
> The solution.
> 
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
> 
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
> 
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
> 
> We have been investigating the use of RAPPOR for these kinds of use cases,
> with initial simulation results being promising.
> 
> Our plan.
> 
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population. We
> are hoping to launch this in mid-September.
> 
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
> 
> Georg
> 
> References:
> 
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies



Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread moroccanhashish--- via governance
My initial thought was that RAPPOR is just another data collection system that 
claims to respect its users' privacy but doesn't really. After a little 
research, though, I've found it does the exact opposite: it really does respect 
privacy by mixing real data with fake, random data. It reminds me of what 
Wikileaks did to prevent its real sources from being discovered. I just hope 
more sites take this approach to data collection, since it gives the best of 
both worlds.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Rubén Martín via governance
Hi,

I completely agree with Irvin here. This proposal is extremely
uncomfortable for me. Even after reading about RAPPOR, I share the same
concerns others have expressed in this thread about privacy and user
expectations.

I think we can be creative if the problem is that we need to understand
which sites are not performing OK on Firefox without compromising our
values. Sending sensitive information as opt-out is not, in my opinion,
the way to go.

Cheers.

On 22/08/17 at 19:54, Irvin Chen via governance wrote:
> I totally support any user research, as long as it follows the rules we
> advocate for...
>
> “Individuals’ security and privacy on the Internet are fundamental and must
> not be treated as optional.”
> https://www.mozilla.org/en-US/about/manifesto/#principle-04
>
> “No surprises
> Use and share information in a way that is transparent and benefits the
> user.”
> https://www.mozilla.org/en-US/privacy/principles/
>
> “Privacy as the default setting: ...privacy must be top of mind. It also
> means that strong privacy should always be the ‘by-default setting’.”
> https://blog.mozilla.org/netpolicy/2016/05/25/the-countdown-is-on-24-months-to-gdpr-compliance/
>
> “Privacy by Default
> Privacy by Default simply means that the strictest privacy settings
> automatically apply once a customer acquires a new product or service. In
> other words, no manual change to the privacy settings should be required on
> the part of the user.”
> http://www.eudataprotectionregulation.com/data-protection-design-by-default
>
>
>
> On Wed, Aug 23, 2017 at 1:21 AM, Aaron Klotz via governance wrote:
>
>> For the purposes of this thread I am not taking a specific position on
>> the overall issue, but as somebody who has worked on performance I would
>> like to point something out for discussion:
>>
>> On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
>>>> "Which top sites are users visiting?"
>>> There's enough public data available on what sites are most popular. No
>>> need for yet another database on that.
>>>> "Which sites using Flash does a user encounter?"
>>> Mozilla can crawl this information itself, based on the above websites
>>> list. It doesn't need to ask users to do it.
>>
>> I don't think it's that simple. Plenty of content on top sites is
>> tailored to the user in some way. To measure how a browser is performing
>> on those sites, one would want to measure performance for actual content
>> that real users are seeing. Throwing some kind of crawler at it is
>> unlikely to produce representative samples of encounters with such
>> content, IMHO.


-- 
Rubén Martín [Nukeador]
Mozilla Reps Mentor
http://www.mozilla-hispano.org
http://twitter.com/mozilla_hispano
http://facebook.com/mozillahispano






Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread genetizen--- via governance
How do you see this study lining up with the data already being collected for 
the Firefox Health Report? This discussion strikes me as raising similar 
questions and tradeoffs, which were discussed on this forum and also blogged 
about by Mitchell and the metrics team.*

Users currently have options under Preferences to "automatically send technical 
and interaction data to Mozilla" and to "send crash reports to Mozilla." How do 
you see handling the opt-out for the study from a UX perspective? Is there a 
way to automatically opt out those of us sending a DNT header? Or respect a 
user with data sharing turned off?

*https://blog.lizardwrangler.com/2012/09/21/firefox-health-report/ and 
https://blog.mozilla.org/metrics/2012/09/21/firefox-health-report/


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Doug Thayer via governance
> The objection is not to DP's privacy guarantees, but to the fact that FF
> will phone home with every website we visit. A neat list of all the
> websites I visit will be sent to a central location, in chronological
> order.

I think this is misleading. What we would be sending is a neat list of
jumbled garbage that is almost indistinguishable from random noise. No
conclusions can be made about what websites you visit from this. With many
records, we could tell that a given site was probably visited X number of
times by various people, but at no point in time will anyone be able to say
that you visited a particular website. Apologies if you already understood
this, but I wanted to make it clear to anyone else reading your comment
that it's not as if we're sending "sketchywebsite.com" back to a central
location.
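To make the "jumbled garbage" point concrete, here is a minimal sketch of a
RAPPOR-style encoding in Python. It is an illustration under stated
assumptions, not Mozilla's or Google's implementation: the function names,
the 32-bit filter size and the values chosen for f, p and q are all
hypothetical, and the real scheme also memoizes the permanent step per client
and assigns clients to cohorts, which is omitted here.

```python
import hashlib
import random

# Illustrative parameters only; a real deployment would tune these.
NUM_BITS = 32             # size of the Bloom filter
NUM_HASHES = 2            # hash functions applied to each value
F, P, Q = 0.5, 0.5, 0.75  # noise parameters for the two randomization steps


def bloom_bits(value):
    """Hash the value into a Bloom filter (a list of 0/1 bits)."""
    bits = [0] * NUM_BITS
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % NUM_BITS] = 1
    return bits


def permanent_response(bits, rng):
    """With probability F replace each bit by a fair coin flip, else keep it."""
    return [rng.randint(0, 1) if rng.random() < F else b for b in bits]


def instantaneous_response(bits, rng):
    """Report 1 with probability Q where the bit is set, P where it is not."""
    return [1 if rng.random() < (Q if b else P) else 0 for b in bits]


def encode_report(domain, rng):
    """Produce the noisy bit string that would actually leave the client."""
    noisy = permanent_response(bloom_bits(domain), rng)
    return instantaneous_response(noisy, rng)


rng = random.Random(0)
report = encode_report("example.com", rng)
print("".join(str(b) for b in report))
```

The printed bit string is what the server would receive: the domain itself is
never sent, and every bit has a substantial chance of being noise.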

> RAPPOR is kind of like the protection of farting in a crowded elevator.
> Somebody in that group did it, but we don't know who for sure. Yes, that's
> better privacy for sure, but is it total privacy? Not to me. Because you
> still know that somebody in that elevator did it very likely. Not a
> perfect analogy, but hopefully demonstrates the cracks.

Sticking to the farting analogy, it would be more like a methane detector in
a large building. If one person farts, really we couldn't tell since we
couldn't distinguish between one fart and regular fluctuations in the
methane content of the air. However, if lots of people are farting, we
should be able to estimate roughly how many farts are happening in a given
time period. I think it's important to make this distinction, because it
means that we can only observe _common_ behaviors of the crowd, while
deviant behaviors of an individual can _never_ be observed.
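The "common behaviors only" idea can be sketched with the simplest form of
randomized response. This is a toy simulation, not RAPPOR itself: the
truth probability, the population size and the 30% visit rate are made-up
values, and the function names are hypothetical.

```python
import random

P_TRUTH = 0.5  # assumed probability that a client reports its true bit


def randomized_bit(true_bit, rng):
    """Report the true bit with probability P_TRUTH, else a fair coin flip."""
    if rng.random() < P_TRUTH:
        return true_bit
    return rng.randint(0, 1)


def estimate_fraction(reports):
    """Invert the noise: E[observed] = P_TRUTH * t + (1 - P_TRUTH) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - P_TRUTH) / 2) / P_TRUTH


rng = random.Random(42)
true_fraction = 0.30   # hypothetical share of users who visited the site
num_users = 100_000

true_bits = [1 if rng.random() < true_fraction else 0 for _ in range(num_users)]
reports = [randomized_bit(b, rng) for b in true_bits]
estimate = estimate_fraction(reports)

print(f"true fraction:      {sum(true_bits) / num_users:.3f}")
print(f"estimated fraction: {estimate:.3f}")
```

A single report is close to a coin flip, so it says little about the person
who sent it, while the estimate over many reports lands near the true
fraction. The same arithmetic shows why rare behavior disappears: with few
contributors, the signal is smaller than the sampling noise.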

> Offering to send anonymous info on one of these events, through a popup or
> dropdown hanger (similar to the password manager, security certificates,
> etc), would fulfill the same objective. A user is inclined to help when
> his/her favorite website suddenly starts slowing down, or throwing errors.
> At this point it's also easy to check a box to "always do this from now
> on".

We don't want to annoy users _more_ by asking them to tell us about their
performance issue. Crashes are severe enough and can require detailed enough
information to diagnose that it's worth it in this case, but we would like
to be able to observe information about more minor events without pestering
people. This doesn't justify sacrificing their privacy, but the claim is
that RAPPOR allows us to do this without degrading anyone's privacy, since
no conclusions can be made about individual users or highly uncommon
behavior.

> Exactly. Because the data is more sensitive the idea of opt-out comes into
> question before the question of the technology. If a person thinks that
> opt-out data collection is wrong it does not matter how effective the
> privacy technology is.
>
> This definitely has the potential to hurt the Firefox brand as a product
> that respects choice and does not try to trick you.
>
> Anyway since you wish a greater discussion on the actual technology i will
> stop here. Thank you for the replies.

We're focusing on the technology because the claim is that the technology
means that this data is not _actually_ more sensitive than the data we're
already collecting in an opt-out manner. We're not trying to hush users who
can't talk about the technical aspects of RAPPOR, but rather trying to keep
it on the topic of whether RAPPOR satisfies your definition of privacy or
not. My understanding of privacy is that if no one at all (malicious or not)
is capable of making conclusions about me in particular, then my privacy is
being protected. Differential privacy satisfies that definition, but privacy
can mean different things to different people.
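For readers who want the formal version of that definition, here is a worked
example. It assumes the simplest one-bit randomized response (report the true
bit with probability p, otherwise a fair coin flip), which is an illustrative
scheme and simpler than RAPPOR; the symbols p and epsilon are introduced here
for the example.

```latex
\Pr[\text{report}=1 \mid \text{bit}=1] = p + \frac{1-p}{2} = \frac{1+p}{2},
\qquad
\Pr[\text{report}=1 \mid \text{bit}=0] = \frac{1-p}{2}

\frac{\Pr[\text{report}=1 \mid \text{bit}=1]}
     {\Pr[\text{report}=1 \mid \text{bit}=0]}
  = \frac{1+p}{1-p} = e^{\varepsilon}
\quad\Longrightarrow\quad
\varepsilon = \ln\frac{1+p}{1-p}
```

With p = 1/2 this gives epsilon = ln 3, roughly 1.10. In other words, the
guarantee is a bound rather than an absolute: under this scheme a single
report can shift an observer's odds about the true bit by at most a factor
of 3.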


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Irvin Chen via governance
I totally support any user research, as long as it follows the rules we
advocate for...

“Individuals’ security and privacy on the Internet are fundamental and must
not be treated as optional.”
https://www.mozilla.org/en-US/about/manifesto/#principle-04

“No surprises
Use and share information in a way that is transparent and benefits the
user.”
https://www.mozilla.org/en-US/privacy/principles/

“Privacy as the default setting: ...privacy must be top of mind. It also
means that strong privacy should always be the ‘by-default setting’.”
https://blog.mozilla.org/netpolicy/2016/05/25/the-countdown-is-on-24-months-to-gdpr-compliance/

“Privacy by Default
Privacy by Default simply means that the strictest privacy settings
automatically apply once a customer acquires a new product or service. In
other words, no manual change to the privacy settings should be required on
the part of the user.”
http://www.eudataprotectionregulation.com/data-protection-design-by-default



On Wed, Aug 23, 2017 at 1:21 AM, Aaron Klotz via governance wrote:

> For the purposes of this thread I am not taking a specific position on
> the overall issue, but as somebody who has worked on performance I would
> like to point something out for discussion:
>
> On 8/22/2017 9:19 AM, jotaf98--- via governance wrote:
> >> "Which top sites are users visiting?"
> > There's enough public data available on what sites are most popular. No
> > need for yet another database on that.
> >
> >> "Which sites using Flash does a user encounter?"
> > Mozilla can crawl this information itself, based on the above websites
> > list. It doesn't need to ask users to do it.
>
> I don't think it's that simple. Plenty of content on top sites is
> tailored to the user in some way. To measure how a browser is performing
> on those sites, one would want to measure performance for actual content
> that real users are seeing. Throwing some kind of crawler at it is
> unlikely to produce representative samples of encounters with such
> content, IMHO.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread turin231--- via governance

> 
> The idea of opt-out data collection is not really the
> question; the difference here is that the data is potentially more
> sensitive. 
> 
> Gerv

Exactly. Because the data is more sensitive the idea of opt-out comes into 
question before the question of the technology. If a person thinks that opt-out 
data collection is wrong it does not matter how effective the privacy 
technology is. 

This definitely has the potential to hurt the Firefox brand as a product that 
respects choice and does not try to trick you. 

Anyway, since you wish for a greater discussion of the actual technology, I will 
stop here. Thank you for the replies.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Danny via governance
Hoping to provide constructive feedback.

A little about me as a user first so you can understand:
1) I purposely do not use Chrome
2) I purposely do not use Google and use DuckDuckGo instead as my search engine
3) I purposely do not use Gmail and use FastMail instead
4) I use uBlock, self-destructing cookies, Privacy Badger, etc
5) I use container tabs
6) I opt-out of any data collection
7) The first thing I do with Firefox Focus on iOS is to opt out of data 
collection
8) I like the idea of Firefox Send being end-to-end encrypted
9) I encrypt my backup locally prior to sending it to the cloud
10) I do not use Dropbox and use Tresorit instead
11) I disabled all telemetry on Windows 10

Now, if this was pushed out, the first thing I would do is still to disable it.
But why is that? RAPPOR is awesome right?

I briefly read the overview for it, so please correct me if I have any 
misunderstanding.

RAPPOR is kind of like the protection of farting in a crowded elevator. 
Somebody in that group did it, but we don't know who for sure. Yes, that's 
better privacy for sure, but is it total privacy? Not to me. Because you still 
know that somebody in that elevator did it very likely. Not a perfect analogy, 
but hopefully demonstrates the cracks.

Why do users like me do end-to-end encryption? What does that give me?
It gives me the ability to trust nobody except my end.

RAPPOR does not offer that same level of protection, and I think that's 
hopefully clear by the elevator example. That's why the first thing I'll do is 
disable it.

Why do users like me use uBlock and other things? What do those things give us? 
Total control in our hands. Does RAPPOR measure up to that standard? I think no.

But I very much want Firefox to succeed because the alternative of Chrome or 
Edge is a sad world. And I very much would like to submit data to Firefox, but 
not in an automatic and uncontrolled (by me) way.

Why does the choice have to be binary?

If I may suggest, could Mozilla investigate doing a bit more UX work to make 
data collection palatable to users like me?

And that means putting me in control.

I know that perf data is extremely important. In fact, I was just seeing 
freezes yesterday and that's kinda frustrating. But I still won't enable 
automatic data collection. What I think would be nice is if you actually just 
prompted me "crash reporting" style. Ask me, "hey... we know Firefox was a bit 
slow for you on such and such site, would you like to let us know?" And then 
give me the option of "Yes, this one time", "Yes, always on this site", "No".

Of course you still have to anonymize that data, but what this does is you've 
given me control. See the distinction?

What if you want to know what top sites I'm visiting? Or what sites with Flash 
I encounter? Same thing. Yes, I know the suggestion is eTLD+1, but hopefully 
the mindset of users like me has been explained well enough at this point to 
show that it still does not measure up. I would love to let you know what sites I 
visit. Give me a little feedback ability, show me the sites that you're going 
to send and let me check off anything I don't want. Again, you probably don't 
want to annoy the user with needless prompts, so maybe the ability to say "Yes, 
these sites are always ok to send", or "Always exclude this site".

I'm sure if you proceed as planned and automatically opt users in to this 
collection, you're going to get more data. But you can bet you'll get none from 
me.

I want to send you data, so please help me help you.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread jotaf98--- via governance

I made a thoughtful comment and it was rejected with a response that I need to 
show I'm familiar with Differential Privacy and RAPPOR before commenting. I'll 
do that before my actual comment.

I'm a computer scientist working in an adjacent field and I've read enough 
papers on Differential Privacy to understand it.

The objection is not to DP's privacy guarantees, but to the fact that FF will 
phone home with every website we visit. A neat list of all the websites I visit 
will be sent to a central location, in chronological order.

A second objection is the users' response, regardless of guarantees. You can't 
explain DP to everyone. For many users it will amount to "trust us". Microsoft 
did the same with the Windows 10 telemetry and it resulted in enormous backlash 
from users, widely reported in tech websites. Consider that before committing.

---

What follows was my actual suggestion, which is orthogonal to DP.

The example questions can be answered with no need for the bulk telemetry 
that's proposed:

>"Which top sites are users visiting?"

There's enough public data available on what sites are most popular. No need 
for yet another database on that.

>"Which sites using Flash does a user encounter?"

Mozilla can crawl this information itself, based on the above websites list. It 
doesn't need to ask users to do it.

>"Which sites does a user see heavy Jank on?"

Slowdowns and similar bad user experiences would better be treated like crash 
reports.

Offering to send anonymous info on one of these events, through a popup or 
dropdown hanger (similar to the password manager, security certificates, etc), 
would fulfill the same objective. A user is inclined to help when his/her 
favorite website suddenly starts slowing down, or throwing errors. At this 
point it's also easy to check a box to "always do this from now on".

Rather than authorizing abstract, bulk usage, the user would see the value in 
sending a report about the current issue, because he/she is experiencing it and 
wants Mozilla to fix it. I'm sure there would be more reports in this manner, 
just like there are more than enough crash reports being sent.

---

In conclusion, no telemetry is one of the main reasons for adopting FF over 
Chrome. Without dismissing the developers' point of view, given the importance 
of this feature, the onus should be on them to show that the alternatives have 
been explored and are not feasible, rather than putting the onus on users to 
show holes in the DP scheme, which is too restrictive for a discussion.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Gervase Markham via governance
On 22/08/17 08:07, turin...@gmail.com wrote:
> Correct me if I am wrong, but this is presented as a solution to
> collect data without having to get explicit consent. It is not clear
> whether users will be able to disable it. If this is the case then
> please be more clear as it will lead to misunderstandings.

Perhaps it could have been more clear in the initial post, but Georg
definitely says:

"This is not the type of data we have collected as opt-out in the past
and is a new approach for Mozilla."

So this data collection will have an opt-out.

> How will this work? Having it enabled by default without making
> explicitly clear that it is happening is still morally wrong and
> anti-privacy. The policy pretty much hopes that people will be either
> uninformed or too complacent to disable it. Otherwise, what's the
> difference from asking for explicit consent?

We have other data we collect which is opt-out, such as how the browser
is performing. The idea of opt-out data collection is not really the
question; the difference here is that the data is potentially more
sensitive. To address that, we want to use differential privacy and
RAPPOR; a good discussion to have, therefore, is whether those tools do
the job or not.

Gerv


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread turin231--- via governance
On Tuesday, August 22, 2017 at 4:49:56 PM UTC+2, Gervase Markham wrote:
> On 22/08/17 07:45,
> > But the disagreement is not about the idea that the technology does
> > not work. But that in principle collecting more data without users
> > having the option to disable it is morally wrong no matter how
> > trustworthy you are or useful it is for the product.
> 
> Users _do_ have the option to disable it.
> 
> Gerv

Correct me if I am wrong, but this is presented as a solution to collect data 
without having to get explicit consent. It is not clear whether users will be 
able to disable it. If this is the case then please be more clear, as it will 
lead to misunderstandings. 

How will this work? Having it enabled by default without making explicitly 
clear that it is happening is still morally wrong and anti-privacy. The policy 
pretty much hopes that people will be either uninformed or too complacent to 
disable it. Otherwise, what's the difference from asking for explicit consent?


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread David Teller via governance
  Hello Siva,

 I'll try and chime in.

1. The main problem that is at stake here is improving Firefox for
websites that our users actually use. We fight a perpetual fight to
improve Firefox for our users, which means that we need to know where to
spend our limited resources. While we can manually or semi-automatically
test a number of websites to find out whether, for instance, Firefox 55
is faster or slower than the previous version, for the moment, we have
to rely upon guesses to determine whether these are websites that our
users actually use.

With more data, we could automatically determine such information and more.

For instance, we could correlate this with crash reports of users who
choose to submit these reports and automatically find out that since
version 60 of Firefox, site foobar.com causes crashes, or maybe that
crashes have decreased on that site since the release of a new graphics
driver. Or we could correlate this with performance reports of users who
opt-in for such reports and automatically find out that since version 60
of Firefox, our performance on foobar.com has improved/decreased.

So, to summarize:

- being able to apply effort to websites that matter to our users;

- being able to automatically detect problems (or improvements) on websites.



2. I don't know the details sufficiently to answer on this point.
However, I can give you my personal thoughts on it.

We have long known (by comparing our data with other available
sources of data such as Alexa) that there is a considerable bias between
users who opt in to Telemetry and the rest of our users. Users who
opt in to Telemetry are typically much more technically aware than
other users, and some countries are also heavily over-represented.



3. That's a UX question, so that's pretty far from my expertise, but
there is already a section "Firefox Data Collection and Use" in
preferences, which may be used to opt-in/opt-out. I also seem to
remember that Firefox actually asks you upon first installation whether
you are ok with sending data.
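The opt-in bias described in point 2 can be illustrated with a small
arithmetic sketch. Every number below is hypothetical and chosen only to show
the mechanism; none of it is Mozilla data, and the group names and function
names are made up for the example.

```python
# Hypothetical population:
# (group, share of all users, opt-in rate, share of the group visiting site X)
GROUPS = [
    ("technical", 0.10, 0.50, 0.60),
    ("general",   0.90, 0.05, 0.20),
]


def true_rate(groups):
    """Share of all users who visit site X."""
    return sum(share * visit for _, share, _, visit in groups)


def opt_in_estimate(groups):
    """Share of opted-in users who visit site X (what opt-in data shows)."""
    opted_in = sum(share * opt for _, share, opt, _ in groups)
    visiting = sum(share * opt * visit for _, share, opt, visit in groups)
    return visiting / opted_in


print(f"true rate across all users: {true_rate(GROUPS):.3f}")
print(f"rate seen in opt-in data:   {opt_in_estimate(GROUPS):.3f}")
```

Because the group that opts in more often also behaves differently, the
opt-in sample overstates how popular site X is. Collecting more opt-in data
does not fix this, since the skew comes from who is in the sample rather than
from its size.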


I hope this helps,
 David

On 22/08/17 16:44, siva.rk.sw--- via governance wrote:
>> Asks for sensitive data center most commonly around knowing something in
>> relation to which sites a user visits:
>>
>> - "Which top sites are users visiting?"
>> - "Which sites using Flash does a user encounter?"
>> - "Which sites does a user see heavy Jank on?"
>>
>> In summary most asks are for occurrences of an event X per domain (more
>> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
> 
> 
> Hello Georg, three questions:
> 
> 1. Could you explain exactly what kinds of problems (which are currently a 
> big source of trouble) would be solved easily with the currently proposed 
> plan? And also what kinds of problems cannot be solved with this data, but 
> could be solved with more invasive data collection?
> 
> 2. What exactly is the problem if the collection is opt-in? Yes the data is 
> "biased", so what? Are you worried that you might miss certain issues faced 
> mostly by users who don't opt in? Is there any justification for this 
> argument, or is it just a hunch?
> 
> 3. For those users who consider privacy most valuable, would there be an easy 
> way to opt out, which *guarantees* that Mozilla collects *no information* 
> about their browser usage?
> 
> Thank you.
> 
> Siva


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread malte.dik--- via governance
Why collect on the client side when the server side for the larger sites most 
definitely collects usage data in much more detail than you ever would?

Wouldn't Mozilla be in a strong enough position to ask for statistics of user 
agents from Facebook, Google, etc., and maybe even what hoops their respective 
engineering departments have to jump through to make the site work on all major 
platforms?


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread dan.callahan--- via governance
On Monday, August 21, 2017 at 10:56:44 AM UTC-5, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

Differential privacy is a great tool. However, I'm concerned that even if we do 
everything *technically* correctly to preserve user privacy, the *optics* 
associated with this sort of data collection were not addressed in this email.

We attempted something similar with the User Profile ("UP") / Directory Tiles 
projects in Content Services, which proposed completely local history analysis 
for purposes of advertising and content discovery. All of which was done in a 
way that absolutely protected user privacy (the analysis never left the local 
machine), but we weren't able to overcome the superficial impression that 
Firefox was tracking users.

1. How do you propose we address the change in (and mis-)perception of Firefox 
as a result of this telemetry?

2. Secondly, I'm far more comfortable with data collection that's strictly tied 
to performance (jank, Flash domains, etc.) than I am with personal data, like 
homepages or top sites. Would this project be as valuable *without* collecting 
personalized information like the above?

Best,
Dan


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Gervase Markham via governance
On 22/08/17 07:45, turin...@gmail.com wrote:
> But the disagreement is not about the idea that the technology does
> not work. But that in principle collecting more data without users
> having the option to disable it is morally wrong no matter how
> trustworthy you are or useful it is for the product.

Users _do_ have the option to disable it.

Gerv



Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread turin231--- via governance
On Tuesday, August 22, 2017 at 4:39:36 PM UTC+2, Gervase Markham wrote:
> Hello, Redditors...
> 
> On 21/08/17 08:56, Georg Fritzsche wrote:
> > One solution is the use of differential privacy [2] [3], which allows us to
> > collect sensitive data without being able to make conclusions about
> > individual users, thus preserving their privacy.
> 
> If you are going to comment here, your comment would be more useful if
> it showed that you have taken the time to understand differential
> privacy and RAPPOR, and explained why you think it's not sufficient (if
> that's what you think, after studying it).
> 
> Comments which assume that we are proposing to collect browser data with
> no privacy protections at all are not helpful, because they assume
> things which are not true.
> 
> Gerv

But the disagreement is not about whether the technology works. It is that, in 
principle, collecting more data without users having the option to disable it 
is morally wrong, no matter how trustworthy you are or how useful it is for 
the product.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread siva.rk.sw--- via governance
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
> 
>    - "Which top sites are users visiting?"
>    - "Which sites using Flash does a user encounter?"
>    - "Which sites does a user see heavy Jank on?"
> 
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).


Hello Georg, three questions:

1. Could you explain exactly what kinds of problems (which are currently a big 
source of trouble) would be solved easily with the currently proposed plan? And 
also what kinds of problems cannot be solved with this data, but could be 
solved with more invasive data collection?

2. What exactly is the problem if the collection is opt-in? Yes, the data is 
"biased", but so what? Are you worried that you might miss certain issues faced 
mostly by users who don't opt in? Is there any evidence for this concern, 
or is it just a hunch?

3. For those users who consider privacy most valuable, would there be an easy 
way to opt out, which *guarantees* that Mozilla collects *no information* about 
their browser usage?

Thank you.

Siva


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread Gervase Markham via governance
Hello, Redditors...

On 21/08/17 08:56, Georg Fritzsche wrote:
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.

If you are going to comment here, your comment would be more useful if
it showed that you have taken the time to understand differential
privacy and RAPPOR, and explained why you think it's not sufficient (if
that's what you think, after studying it).

Comments which assume that we are proposing to collect browser data with
no privacy protections at all are not helpful, because they assume
things which are not true.

Gerv


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread omar.abid2006--- via governance
On Monday, August 21, 2017 at 4:56:44 PM UTC+1, Georg Fritzsche wrote:
> Hi,
> 
> for Firefox we want to better understand how people use our product to
> improve their experience. To do that, we are planning to run a new SHIELD
> study that tests how we can collect additional data in a privacy preserving
> way. Check out the details below and send me your thoughts.
> 
> The problem.
> 
> One recurring ask from the Firefox product teams is the ability to collect
> more sensitive data, like top sites users visit and how features perform on
> specific sites.
> 
> Currently we can collect this data when the user opts in,  but we don't
> have a way to collect unbiased data, without explicit consent (opt-out).
> 
> Asks for sensitive data center most commonly around knowing something in
> relation to which sites a user visits:
> 
>    - "Which top sites are users visiting?"
>    - "Which sites using Flash does a user encounter?"
>    - "Which sites does a user see heavy Jank on?"
> 
> In summary most asks are for occurrences of an event X per domain (more
> specifically eTLD+1 [1], e.g. facebook.com or google.co.uk).
> 
> The solution.
> 
> One solution is the use of differential privacy [2] [3], which allows us to
> collect sensitive data without being able to make conclusions about
> individual users, thus preserving their privacy.
> 
> An attacker that has access to the data a single user submits is not able
> to tell whether a specific site was visited by that user or not.
> 
> The Google Open Source project called RAPPOR [4] [5] is the most widely
> known and deployed implementation of differential privacy.
> 
> We have been investigating the use of RAPPOR for these kinds of use cases,
> with initial simulation results being promising.
> 
> Our plan.
> 
> What we plan to do now is run an opt-out SHIELD study [6] to validate our
> implementation of RAPPOR. This study will collect the value for users’ home
> page (eTLD+1) for a randomly selected group of our release population. We
> are hoping to launch this in mid-September.
> 
> This is not the type of data we have collected as opt-out in the past and
> is a new approach for Mozilla. As such, we are still experimenting with the
> project and wanted to reach out for feedback.
> 
> Georg
> 
> References:
> 
> 1: https://en.wikipedia.org/wiki/Public_Suffix_List
> 
> 2: https://en.wikipedia.org/wiki/Differential_privacy
> 
> 3: https://robertovitillo.com/2016/07/29/differential-privacy-for-dummies/
> 
> 4: https://github.com/google/rappor
> 5: https://arxiv.org/abs/1407.6981
> 6: https://wiki.mozilla.org/Firefox/Shield/Shield_Studies
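
For readers unfamiliar with the mechanism behind the claim quoted above (that a
single user's submission does not reveal whether a site was visited), here is a
minimal sketch of classic randomized response for one yes/no question. This is
not RAPPOR itself, which layers Bloom filters and two rounds of randomization
on top of this idea, and it is not Mozilla's implementation. The function
names, the 50% truth probability and the 30% true rate are all assumptions
chosen for illustration.

```python
import random
from typing import Sequence


def randomize(visited: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return visited
    return rng.random() < 0.5


def estimate_rate(reports: Sequence[bool], p_truth: float) -> float:
    """Undo the noise in aggregate.

    E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5,
    so true_rate = (observed - (1 - p_truth) * 0.5) / p_truth.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


def simulate(n_users: int = 100_000, true_rate: float = 0.30,
             p_truth: float = 0.5, seed: int = 0) -> float:
    """Simulate n_users answering 'is site X your home page?' with noise."""
    rng = random.Random(seed)
    truths = [rng.random() < true_rate for _ in range(n_users)]
    reports = [randomize(t, p_truth, rng) for t in truths]
    return estimate_rate(reports, p_truth)


if __name__ == "__main__":
    print(f"estimated rate: {simulate():.3f}")
```

Any single report is deniable, because a "yes" may have come from the coin flip
rather than the truth, while the population-wide rate can still be estimated
once enough reports are combined. A lower p_truth gives stronger deniability
but needs more users for the same accuracy.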

What about the fact that I don't want to give my information even in an 
anonymous and untraceable way? You understand that anonymity is just part of 
the equation and not the single issue at stake here.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread philipp.koschinski--- via governance
If this is implemented, I’ll have to file a complaint with the relevant 
Landes- and Bundesbeauftragten für Datenschutz and, possibly, escalate this to 
the EU Data Privacy commissioner’s office.

I’d prefer if you’d avoid doing this.


Re: Usage of Differential Privacy & RAPPOR

2017-08-22 Thread djoacogala--- via governance
Hello.

I don't have the necessary information to say whether this is correct, moral, 
or necessary, but I will say that I believe Opt-in is pro-privacy, while 
Opt-out is anti-privacy.

If Firefox is dedicated to preserving privacy, then no Opt-out data feature 
should be added.

Thank you.