On 04/04/2016 11:01, Romain Testard wrote:
The privacy review bug is
https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
More details added below.

See response at the bottom.

On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch <gijskruitbo...@gmail.com>
wrote:
On 04/04/2016 10:01, Romain Testard wrote:

     We would use a whitelist client-side to only collect domains that are
     part of the top 2000 domains (Alexa list of top domains). This prevents
     personal identification based on obscure domain usage.
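
For illustration only, the client-side whitelist described above could look roughly like the sketch below: hosts that are not on the allow-list are dropped before anything is counted. The names `ALLOWED_DOMAINS` and `count_allowed_domains` are hypothetical, the three-entry set merely stands in for the top-2000 list, and none of this is taken from the actual Hello code.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical stand-in for the top-2000 allow-list (assumption, not real data).
ALLOWED_DOMAINS = {"example.com", "wikipedia.org", "youtube.com"}


def count_allowed_domains(urls):
    """Count visits per allow-listed domain; every other host is discarded."""
    counts = Counter()
    for url in urls:
        # urlsplit().hostname is already lower-cased; it is None for odd input.
        host = urlsplit(url).hostname or ""
        # Naive normalisation: strip a single leading "www." and nothing else.
        if host.startswith("www."):
            host = host[4:]
        if host in ALLOWED_DOMAINS:
            counts[host] += 1
    return dict(counts)
```

Note that this only limits which domains are reported; it does not by itself address whether the combination of reported domains identifies a user.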


Mathematically, the combination of a set of (popular) domains shared could
still be uniquely identifying, especially as, AIUI, you will get the counts
of each domain and in what sequence they were visited / which ones were
visited in which session. It all depends on the number of unique users and
the number of domains they visit / share (not clear: see above). Because
the total number of Hello users compared with the number of Firefox users
is quite low, this still seems somewhat concerning to me. Have you tried to
remedy this in any way?
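
To make the concern concrete, here is a minimal sketch of how one might measure it: treat each user's set of reported domains as a fingerprint and count how many users have a fingerprint nobody else shares. The function name `unique_fraction` and the sample data are made up for illustration; real reports would also carry counts, which would only make fingerprints more distinctive.

```python
from collections import Counter


def unique_fraction(reports):
    """Fraction of users whose set of reported domains is shared with nobody else."""
    if not reports:
        return 0.0
    # One fingerprint per user: the (unordered) set of domains they reported.
    fingerprints = {user: frozenset(domains) for user, domains in reports.items()}
    sizes = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if sizes[fp] == 1)
    return unique / len(fingerprints)


# Made-up example data, all domains assumed to be on the whitelist.
reports = {
    "a": ["example.com", "wikipedia.org"],
    "b": ["example.com", "wikipedia.org"],
    "c": ["example.com", "youtube.com"],
    "d": ["youtube.com"],
}
print(unique_fraction(reports))
```

With a small user population, even a whitelist of popular domains can leave a large share of users with a unique fingerprint.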


We are aggregating domain names, and are not storing session histories.
These are submitted at the end of the session, so exact timestamps of any
visit are not included.

But both Firefox and Hello sessions are commonly relatively short (<1d) and numerous. That means lots of data points, which will likely be enough to uniquely identify people even without exact timestamps of their visits. (FWIW, from a technical perspective, there is no reason why the submission time implies ("so") that exact timestamps of visits are not included.)

We looked into this approach originally although we found that we'd lose a
level of granularity that can have an importance. We may find that Hello
gets used a lot with a specific Website for a specific reason and using
client side categories would prevent us from learning this.

This was explicitly not in your original motivation, so you're moving the goalposts here. If the goal is about separate categories or separate sites then those are pretty distinct goals that require different approaches. If the real point is "we have no idea, so we figured we'd just get the data and then go from there", why not be upfront about it? But in that case, why not consider a survey or something less intrusive, like asking people explicitly what type of site they were using, or asking if Mozilla can use the domain in question?

Also Alexa
website categories are far from perfect which would add another level of
complexity to understand the collected data.

At no point did I say I expected you to use their categorization, whatever that is. Categorize as you see fit, rather than as Alexa does it.

Conversely, if their categorization is questionable, then your scrubbing of the Adult category sounds like it might need auditing? Also, why not other categories like "Banking" or "Medical" (NB: no idea what categorization Alexa employs, but these seem like categories that ought to be scrubbed, too)?


6 months also seems incredibly long. You should be able to aggregate the
data and keep that ("60% of users share on sites of type X") and throw away
the raw data much sooner than that.
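
As a sketch of what "aggregate and keep that" could mean in practice, the snippet below reduces per-user domain reports to per-category fractions, after which the raw reports would no longer be needed. The function `aggregate_by_category`, the `category_of` mapping and the "other" bucket are all assumptions for the sake of the example, not an existing pipeline.

```python
from collections import Counter


def aggregate_by_category(raw_reports, category_of):
    """Return {category: fraction of users who shared on at least one such site}."""
    total = len(raw_reports)
    if total == 0:
        return {}
    users_per_category = Counter()
    for domains in raw_reports.values():
        # Each user counts at most once per category; unknown domains go to "other".
        categories = {category_of.get(domain, "other") for domain in domains}
        users_per_category.update(categories)
    return {category: n / total for category, n in users_per_category.items()}
```

Only the returned dictionary would need to be retained; how long the raw input is kept before this step runs is a separate policy decision.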

Yes agreed, we'll look into what's the most optimal amount of time required
to process the data and extract the useful information. I agree we should
try to make this shorter - we'll learn from being on Beta and will adjust
this accordingly.

Well, why not make it 1 week to start with, and make it longer if you don't get enough information from beta (with a rationale as to why that is the case)?

Finally, I am surprised that you're sharing this 2 weeks before we're
releasing Firefox 46. Hasn't this been tested and verified on Nightly
and/or other channels? Why was no privacy update made at/before that time?


We are shipping Hello through Go Faster. The Go Faster process allows us to
uplift directly to Beta 46 since we're a system add-on
(development was done about 2 weeks ago).
Firefox Hello has its own privacy notice (details here
<https://www.mozilla.org/en-US/privacy/firefox-hello/>).

But shipping through Go Faster does not absolve you from adequately testing changes and getting feedback on them. Is the add-on not getting tested on Nightly at all? Or at the same time as it goes to Beta? When will it be used on release - when 46 ships as release, or earlier, or later?

It also seems like you filed the privacy review after the functionality was implemented and is now shipping, which per https://wiki.mozilla.org/Privacy/Reviews seems like it is too late to incorporate meaningful feedback. I'm not on the privacy team, but that order looks wrong to me.

Finally, that privacy policy at no point says anything about Mozilla having access to visited/shared domains and thereby potentially to personally identifying information.

~ Gijs
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
