Hello!

We here in Data Stewardship have been receiving inquiries about how the
Data Collection Review process[0] works now that more products are being
built out of reusable components. What follows is a memo about how to
approach Data Review when you're adding a data collection to a reusable
component, or adding a reusable component to a product. The living version
of this document is currently hosted here:
https://mana.mozilla.org/wiki/pages/viewpage.action?spaceKey=DATAPRACTICES&title=Data+Review+in+Components


Data Review was designed assuming that the Product was responsible for both
the data collection and reporting. The measurement code and the submission
code all lived in the same place so the developer instrumenting the probe
(using Telemetry.scalarSet or Telemetry::Accumulate or what-have-you) knew
not only what they were instrumenting, but across what populations this
probe would be reported.

This is because the code being instrumented was only ever a part of
Firefox[1].


With Android Components we radically changed how we build Firefox (and
other things) on Android. Instead of all the pieces living together and
being used by only one product, we now develop the pieces separately and
use them in any number of products.


This means that when a data collection is added, chances are it's being
added to a Component, not a Product[2]. The developer adding the data
collection may not be aware of all the Products currently using their
Component, and can't know of future Products that might integrate it. This
makes Data Collection Review difficult, because Question 7 tries to
ascertain what population is being measured by the new collection.

To solve this, the developer adding the data collection should list all the
Products they know of that currently embed their Component, and should also
include a phrase like "Users of products that embed $MyComponent" (where
$MyComponent is replaced with the name of their Component). This will help
the Data Steward understand where this collection is expected to be
collected today, and help anyone in the future learn which names to use
when looking these things up.

If a Product that submits data (usually by initializing the Glean SDK) adds
a Component that collects data (these can be identified by their metrics
documentation, usually in docs/metrics.md), then this is an expansion of
the population of a data collection. This means the Product needs to submit
a Data Collection Review to expand the scope of the Component's Data
Collection to the population using the Product.

To complete the review, some questions (like why the data is being
collected) will not need firm answers, as those will have been provided
when the collections were added. The list of metrics can be found in the
Component's documentation. The population is the population using the
Product, and the Product is best suited to describe it. The same is true
of the description of the opt-out mechanism.

With these small allowances, Data Review is adaptable to the new
component-based development situation on Android and wherever reusable
components are included. This is new, and we will make mistakes. Please do
ask questions of the Data Stewards along the way, and let them know if you
find anything they've missed.

Things that require Data Collection Review:

1. A new data collection.

2. A Product integrating a Component that collects data.

3. A Product adding a new Data Collection System (by integrating the Glean
SDK, for instance). In most cases merely integrating a new system will add
a data collection, so this will be covered under (1). In other cases, you
may need special permission to start using a new system.

Things that do not require Data Collection Review:

1. A Product upgrading an integrated Component to a new version that has
new data collections. (This is covered by the review required under (1) in
the list above. The Product could be included in that review by name, or
as a product that embeds $MyComponent. If clarification is desired, we can
amend the data collection review to specifically include the Product by
name. No biggie.)

Assumptions:

* All of the Products and Components engaging in this process are subject
to Mozilla's Privacy Policy.


If you have any questions, please find us at fx-datastewa...@mozilla.com,
at #data-stewards on chat.mozilla.org (when available), or reach out to any
Data Steward listed on the wiki[0].


Thanks!

[0]: https://wiki.mozilla.org/Firefox/Data_Collection

[1]: This isn't actually true. It could also be part of Thunderbird or
GeckoView, but let's keep it simple for now.

[2]: Data collections can be added to Products, too. In those cases, the
old mental model from Firefox still applies.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
