Hello Legal,
My name is Michael Winters, typically known here as @mwinters. I have
some questions about Fedora's data privacy policies, but first I'll
provide a bit of context.
There has been a long-standing desire within Fedora for better tools
with which to analyze our user data and understand our community so that
we can improve it. To this end, I have recently created a "Data
Lakehouse" proof of concept known as "Hatlas", available at
https://hatlas.mwinters.net . This technology consolidates data from
existing public Fedora datasets and provides simplified tools to
facilitate public access and analysis.
Since these datasets were previously quite difficult to access, I
believe that most people are unaware of what data exists about them
within Fedora, or that it is being published. I
expect that the announcement of easy access to this data will raise some
community concerns about data privacy, so this email is in anticipation
of those concerns. I wish to have clear resources to refer people to,
and current resources such as
https://docs.fedoraproject.org/en-US/legal/privacy/ have left some
questions open.
In particular, many of these datasets include usernames and records of
user activity tied to those usernames, e.g. the contents and exact
timing of forum posts, git commits, group membership changes, etc. My
current questions are:
1) Does an arbitrary username (not necessarily tied to a real name)
constitute PII that must be protected or anonymized? Usernames are not
currently anonymized in Fedora datasets.
2) Do current Fedora policies permit collecting user activity tied to
usernames? This is not explicitly stated under "Information We
Collect", though it is mentioned later under "Using (Processing) Your
Personal Data."
3) Do current Fedora policies permit publishing user activity tied to
usernames? Section "Sharing Your Personal Data" does mention "For
research activities", but it does not specify that data must be shared
*only* in aggregate.
4) How does GDPR view downstream users of public data sources, such as
Hatlas? Is Hatlas a "data processor"? Must Hatlas integrate with
Fedora's Personal Data Removal process? We intend to do so, but there
seems to be no obligation for either party.
5) Are there any data licenses applicable to downstream users such as
Hatlas? I intend to apply one restricting the use of Hatlas data to
non-commercial purposes, but there seem to be no restrictions coming
from Fedora.
Thanks in advance!
Michael Winters