On 04/04/2016 11:01, Romain Testard wrote:
The privacy review bug is
https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
More details added below.
See response at the bottom.
On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch <gijskruitbo...@gmail.com>
wrote:
On 04/04/2016 10:01, Romain Testard wrote:
We would use a whitelist client-side to only collect domains that are
part of the top 2000 domains (Alexa list of top domains). This prevents
personal identification based on obscure domain usage.
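For illustration only, the client-side whitelist check described above could look roughly like the sketch below. This is not the Hello add-on's code: the function name `reportable_domain` and the three-entry `ALLOWED_DOMAINS` set are hypothetical stand-ins for the real top-2000 list, and matching on parent domains is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the top-2000 list; the real list is not shown here.
ALLOWED_DOMAINS = frozenset({"example.com", "wikipedia.org", "mozilla.org"})


def reportable_domain(url, allowed=ALLOWED_DOMAINS):
    """Return the whitelisted domain for `url`, or None if it must not be reported."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Try the full host first, then each parent domain, so that
    # "en.wikipedia.org" is reported as "wikipedia.org".
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in allowed:
            return candidate
    # Anything not on the list is dropped on the client and never sent.
    return None
```

Under these assumptions, `reportable_domain("https://en.wikipedia.org/wiki/Privacy")` returns `"wikipedia.org"`, while a URL on an unlisted domain returns `None`.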
Mathematically, the combination of a set of (popular) domains shared could
still be uniquely identifying, especially as, AIUI, you will get the counts
of each domain and in what sequence they were visited / which ones were
visited in which session. It all depends on the number of unique users and
the number of domains they visit / share (not clear: see above). Because
the total number of Hello users compared with the number of Firefox users
is quite low, this still seems somewhat concerning to me. Have you tried to
remedy this in any way?
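To make the concern concrete, here is a minimal sketch of how one might measure it, assuming each user's report reduces to a set of whitelisted domains. The function name `unique_fraction` and the sample data are made up for illustration; real reports would also carry counts, which would make sets even more distinctive.

```python
from collections import Counter


def unique_fraction(reports):
    """Fraction of users whose set of reported domains is shared by nobody else.

    `reports` maps a user id to an iterable of whitelisted domains.
    """
    fingerprints = {user: frozenset(domains) for user, domains in reports.items()}
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if counts[fp] == 1)
    return unique / len(fingerprints)


# Made-up data: every domain is popular, yet two of four users are unique.
sample_reports = {
    "a": ["google.com", "youtube.com"],
    "b": ["google.com", "youtube.com"],
    "c": ["google.com", "wikipedia.org", "github.com"],
    "d": ["youtube.com", "amazon.com"],
}
print(unique_fraction(sample_reports))  # 0.5
```

The smaller the user population relative to the number of possible domain combinations, the closer this fraction is likely to get to 1.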
We are aggregating domain names, and are not storing session histories.
These are submitted at the end of the session, so exact timestamps of any
visit are not included.
But both Firefox and Hello sessions are commonly relatively short (<1d)
and numerous. That means lots of data points, which will likely be
enough to uniquely identify people even without exact timestamps of
their visits. (FWIW, from a technical perspective, there is no reason
why the submission time implies ("so") that exact timestamps of visits
are not included.)
We looked into this approach originally, but we found that we'd lose a
level of granularity that could be important. We may find that Hello
gets used a lot with a specific website for a specific reason, and using
client-side categories would prevent us from learning this.
This was explicitly not in your original motivation, so you're moving
the goalposts here. If the goal is about separate categories or separate
sites then those are pretty distinct goals that require different
approaches. If the real point is "we have no idea, so we figured we'd
just get the data and then go from there", why not be upfront about it?
But in that case, yeah, why not consider a survey or something less
intrusive, like asking people explicitly what type of site they were
using, or asking if Mozilla can use the domain in question?
Also, Alexa
website categories are far from perfect, which would add another level of
complexity to understanding the collected data.
At no point did I say I expected you to use their categorization,
whatever that is. Categorize as you see fit, rather than as Alexa does it.
Conversely, if their categorization is questionable, then your scrubbing
of the Adult category sounds like it might need auditing? Also, why not
other categories like "Banking" or "Medical" (NB: no idea what
categorization Alexa employs, but these seem like categories that ought
to be scrubbed, too)?
6 months also seems incredibly long. You should be able to aggregate the
data and keep that ("60% of users share on sites of type X") and throw away
the raw data much sooner than that.
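As a rough sketch of that aggregate-then-discard idea, assuming raw events are (user, domain) pairs and that some domain-to-category mapping exists (both the mapping and the name `aggregate_by_category` are hypothetical):

```python
def aggregate_by_category(raw_events, category_of):
    """Reduce raw (user, domain) events to per-category user fractions.

    Only the returned dict needs to be kept; the raw events can then be deleted.
    """
    users_by_category = {}
    all_users = set()
    for user, domain in raw_events:
        all_users.add(user)
        category = category_of.get(domain, "other")
        users_by_category.setdefault(category, set()).add(user)
    if not all_users:
        return {}
    return {
        category: len(users) / len(all_users)
        for category, users in users_by_category.items()
    }
```

The output has the shape "fraction of users who shared on sites of type X" and contains no per-user rows, so the retention period for the raw events only needs to cover the time it takes to run the aggregation.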
Yes, agreed, we'll look into the optimal amount of time required
to process the data and extract the useful information. I agree we should
try to make this shorter - we'll learn from being on Beta and will adjust
this accordingly.
Well, why not make it 1 week to start with, and make it longer if you
don't get enough information from beta (with a rationale as to why that
is the case)?
Finally, I am surprised that you're sharing this 2 weeks before we're
releasing Firefox 46. Hasn't this been tested and verified on Nightly
and/or other channels? Why was no privacy update made at/before that time?
We are shipping Hello through Go Faster. The Go Faster process allows us to
uplift directly to Beta 46 since we're a system add-on
(development was done about 2 weeks ago).
Firefox Hello has its own privacy notice (details here
<https://www.mozilla.org/en-US/privacy/firefox-hello/>).
But shipping through Go Faster does not absolve you from adequately
testing changes and getting feedback on them. Is the add-on not getting
tested on Nightly at all? Or at the same time as it goes to Beta? When
will it be used on release - when 46 ships as release, or earlier, or later?
It also seems like you filed the privacy review after the functionality
was implemented and is now shipping, which per
https://wiki.mozilla.org/Privacy/Reviews seems like it is too late to
incorporate meaningful feedback. I'm not on the privacy team, but that
order looks wrong to me.
Finally, that privacy policy at no point says anything about Mozilla
having access to visited/shared domains and thereby potentially to
personally identifying information.
~ Gijs
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform