On Tue, Apr 17 2018, Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.
>
> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?
>
> Erasing author names from the history changes the commit hashes.  It is
> well known that this leads to a lot of problems.  So I don't consider
> this a workable solution.
>
> And how do you justify publishing your employee's name/email as part of
> a git commit under GDPR rules in the first place?
>
> github has the following page mentioning the "Right to erasure" but
> AFAICS nothing about how it will be implemented
> https://about.gitlab.com/gdpr/
>
> Here are discussions I found but they do not really provide a solution:
> https://law.stackexchange.com/questions/24623/gdpr-git-history
> https://news.ycombinator.com/item?id=16509755

[Not a lawyer and all that]

I've been loosely following a similar discussion around blockchains and
my understanding of the situation is that for a project such as say
Linux the GDPR gives you this potential out for that[1]:

    "the personal data are no longer necessary in relation to the
    purposes for which they were collected or otherwise processed"

I.e. you understand that when you submit a patch to linux.git how it's
going to get used, and that it's in a storage system that isn't going to
be pruned just because you ask for it.

In combination with the "Conditions for consent"[2] this becomes a bit
more tricky. I.e. "The data subject shall have the right to withdraw his
or her consent at any time".

You can make a compelling case that for say submitting your data to the
Bitcoin blockhcain the above quote from article 17 overrides it, but can
you for other hash-based-on-hash systems like linux.git? Maybe, maybe
not. I think nobody really knows at this point.

What I do think is for sure is that there's not going to be any one size
fits all solution based on the underlying technology.

If I start storing my webserver access logs with IP information in a git
repo, I don't get to say "sorry git stores stuff this way, I don't want
to rebase it". No court's going to buy that, I've just gone out of my
way to use technology that circumvents the GDPR for no particularly good
reason.

This is very different from you say joining a company, committing to its
internal git repo, and your name being there in perpetuity, or choosing
to submit a patch to linux.git or git.git.

I'd think that would be handled the same way as a structural engineering
firm being able to record in perpetuity who it was that drew up the
design for some bridge. I don't think it's plausible that the GDPR,
which is probably mainly going to be about consumer protection, is going
to concern itself with that in practice.

There's a lot of middle ground in between those two
though. E.g. children are specially protected under the GDPR. Is Linus
going to say he doesn't want to rebase linux.git after some 14 year old
who regrets submitting code doesn't want his name there anymore? Who
knows.

Depending on such common cases maybe git itself should eventually
support some ways to work around the issues. E.g. we could have some
mode to always supply a fake name/e-mail, or make the notice
implicit_ident_advice() spews out somewhat scarier.

1. https://gdpr-info.eu/art-17-gdpr/

2. https://gdpr-info.eu/art-7-gdpr/

Reply via email to