Hello Legal,

My name is Michael Winters, typically known here as @mwinters. I have some questions about Fedora's data privacy policies, but first, a bit of context.

There has been a long-standing desire within Fedora for better tools with which to analyze our user data and understand our community so that we can improve it. To this end, I have recently created a "Data Lakehouse" proof of concept known as "Hatlas", available at https://hatlas.mwinters.net . This technology consolidates data from existing public Fedora datasets and provides simplified tools to facilitate public access and analysis.

Since these datasets were previously quite difficult to access, I believe that most people are unaware of what data exists about them within Fedora and/or that it is being made public. I expect that the announcement of easy access to this data will raise some community concerns about data privacy, so this email is in anticipation of those concerns. I would like to have clear resources to refer people to, and current resources such as https://docs.fedoraproject.org/en-US/legal/privacy/ leave some questions open.

In particular, many of these datasets include usernames and records of user activity tied to those usernames, e.g. the contents and exact timing of forum posts, git commits, group membership changes, etc. My current questions are:

1) Does an arbitrary username (not necessarily tied to a real name) constitute PII which must be protected / anonymized? It is not currently anonymized in Fedora datasets.

2) Do current Fedora policies permit collecting user activity tied to usernames? This is not explicitly stated under "Information We Collect", though it is mentioned later under "Using (Processing) Your Personal Data."

3) Do current Fedora policies permit publishing user activity tied to usernames? Section "Sharing Your Personal Data" does mention "For research activities", but it does not specify that data must be shared *only* in aggregate.

4) How does GDPR view downstream users of public data sources, such as Hatlas? Is Hatlas a "data processor"? Must Hatlas integrate with Fedora's Personal Data Removal process? We intend to do so, but there seems to be no obligation on either party.

5) Are there any data licenses applicable to downstream users such as Hatlas? I intend to apply one restricting the use of Hatlas data to non-commercial purposes, but there seem to be no restrictions coming from Fedora.

Thanks in advance!

Michael Winters
--
_______________________________________________
legal mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
