Grant Taylor via Mailman-Users writes:
> On 05/14/2018 06:33 AM, Andrew Hodgson wrote:
> > Current advice from the GDPR people is we may have to delete the whole
> > thread.
>
> What is their working definition of "thread"?

I would imagine that it is the subthread rooted at the first post
containing the complainant's PII -- "Personally Identifying Information".

> Why can't just the individual's message(s) be deleted? Or better
> redacted to not reflect them?

That is going to depend on the presence of PII in the messages. If
*whole messages* are to be deleted, that would presumably involve
content that somehow identifies the person. I would expect that we
don't have to delete whole bug reports on this list just because
somebody requests their PII be redacted.

What worries me more are the implications for blockchain, or more
precisely, DAG-based VCSes that use hashes for integrity checks, like
git: the identity of commits will change if authors and emails are
redacted, including if a commit log refers to PII of a bug reporter, as
they often do. I guess you'd need to maintain an index of pointers
from old commit ids, or at least for branches and tags (we do have the
reflog in git). And heaven help you if you're a security-conscious
group like the Linux kernel and use signed commits. I guess the person
who does the redaction would sign the new commits, but that's pretty
yucky -- that person could do anything and nobody would know when it
happened, because you have to delete the old commits and blobs that get
redacted.

> > Still under discussion, this is also complex because threads and
> > subjects change, if we delete the whole thread there may be
> > messages from the same author in other threads that don't have
> > correct attribution etc.

As I understand the "right to be forgotten", it's *not* a right to
arbitrarily edit content stored by someone else, it's the right to
redact *all* PII in that content. It's not just messages from a
person, it's headers containing their name and email address,
attribution lines for quoted material, quoted .sigs, etc etc.
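
To make the git worry above concrete, here is a minimal sketch of why
redacting one author ripples through history. It relies only on how git
derives an object id (SHA-1 over "<type> <size>", a NUL byte, and the
object body). Everything else is an assumption for illustration: the
names and addresses are made up, the history is a three-commit linear
chain over the empty tree, the committer is taken to be the author, and
redact() is a hypothetical string replacement, not anything git or
Mailman provides.

```python
import hashlib

# Well-known id of git's empty tree; used so the example needs no real repo.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns: sha1 over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Serialize an unsigned commit the way git stores it (no extra headers)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    return ("\n".join(lines) + "\n\n" + message).encode()


def redact(text: str) -> str:
    """Hypothetical redaction: blank out one complainant's name and address."""
    return (text.replace("Jane Reporter", "Redacted")
                .replace("jane@example.org", "redacted@example.invalid"))


def rewrite_chain(commits):
    """Rewrite a linear history, oldest first; return {old_id: new_id}."""
    old_to_new = {}
    old_parent = new_parent = None
    for c in commits:
        old = commit_body(EMPTY_TREE, [old_parent] if old_parent else [],
                          c["author"], c["author"], c["message"])
        new = commit_body(EMPTY_TREE, [new_parent] if new_parent else [],
                          redact(c["author"]), redact(c["author"]),
                          redact(c["message"]))
        old_parent = git_object_id("commit", old)
        new_parent = git_object_id("commit", new)
        old_to_new[old_parent] = new_parent
    return old_to_new


# Made-up history: only the middle commit carries the complainant's PII.
DEV = "Dev One <dev@example.org> 1526300000 +0000"
JANE = "Jane Reporter <jane@example.org> 1526300100 +0000"
history = [
    {"author": DEV, "message": "Initial import\n"},
    {"author": JANE, "message": "Fix crash on empty subject\n"},
    {"author": DEV, "message": "Add regression test\n"},
]
mapping = rewrite_chain(history)
for old_id, new_id in mapping.items():
    print(old_id[:12], "->", new_id[:12],
          "(unchanged)" if old_id == new_id else "(rewritten)")
```

The first commit keeps its id, the redacted one gets a new id, and so
does the third even though it contains no PII, because the parent id is
part of what gets hashed. The returned dict is the kind of old-id to
new-id index mentioned above.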
> I see six modes of access to the data:
>
> 1) List subscribers
> 2) List owners / administrators
> 3) Host system administrators
> 4) Administrators that are in the downstream SMTP / HTTP path and can
>    track things.
> 5) Backups.
> 6) Ongoing Discovery.

You're missing 0) Randos accessing public archives. For (0), the only
logging would be IP addresses in the webserver logs.

> I would expect that #1 requires authentication to MM for
> subscribers to see data, and I expect that this is logged in some
> (indirect) capacity.

No. The accessing IPs will be in the webserver logs, but I don't think
there is any logging in either Mailman 2 or Mailman 3 of authentication
data. All there would be is the implication that authentication was
successful if that data were accessed. In Mailman 2 there's no PII
whatsoever except for email address and (maybe) display name in the
subscriber data. I suppose you could put phone #s and junk like that
in the display name, but GDPR is more concerned with the database
fields that might store PII than the actual content.

> I would expect that #2 would have access to the data as part of their
> role of owning / administering a mailing list.

However, in Mailman 2 the various list passwords are shared, and would
not identify individuals in cases with multiple moderators or list
owners.

> I would also expect that #3 has the capability to access the data. But
> I would also expect that #3 would not access the data in normal day to
> day operations.

Indeed. The problem is identifying them if they do, since they can
just use normal filesystem operations from the shell, which are not
normally logged at all. In Mailman 3, we can configure databases like
PostgreSQL, which I suppose can log access to the subscriber databases,
and which make it hard (but not impossible) to access data via ordinary
filesystem operations. However, I think that the issue here is
basically moot.
You keep host access logs to check for suspicious IP addresses
(attempting to) log in, and otherwise (for #2 and #3) you just give the
list of all the people who can access that data in the normal course of
their duties.

I don't think the issue with logging is pinning down a particular
access to specific data, but rather determining who *could* access that
data. The relevant access might have been by a long-since fired
engineer who did a Snowden on your database. How could you possibly
know?

> Are you saying that GDPR is going to complicate things related to
> #3 and make it such that there is more of a union between #2 and
> #3? I.e. exclude 3rd party site hosters from being able to be #3?

I don't understand the "exclude third party site hosters". The GDPR
requirement is not to *limit* access, it's to *log* access.

> What is their working definition of "marketing"?

I'm pretty sure they're referring to CRM-type databases where you track
customer interactions over time, linked by PII, and build up a profile.
One-off "for sale" posts wouldn't matter. However, if this were a
common activity on the list, the *archives* might qualify as such a
database.

> IMHO: History happened. (Some) People will remember (some) details
> (for a while). Removing evidence of them does not mean that
> history did not happen.

Sure, the point is to make it difficult for 3rd parties to discover
that history ex post. I don't think the legislators envisioned people
invoking these rights frivolously or maliciously (though I do :-/).

> Are #5 and #6 accounted for?

Backups would need to be redacted as well, I suppose. I have no idea
what you mean by "ongoing discovery".

> What about #4 downstream?

Not Mailman host's problem, assuming all subscribers have properly been
opted in and are allowed to opt out at will, as is normally the case.
Distributing content downstream is the purpose of the software, and
subscribers are aware of that.
The only edge cases I can imagine offhand are the one discussed
elsewhere in the thread, where a subscriber posts a third party's
information without permission, and possibly an open-post list where
the poster doesn't realize that it's open subscription/public
archives/whatever.

> Or something like the NSA's PRISM program.

Not Mailman host's problem.

> I feel like there should be a GDPR counterpart of reasonable level of
> effort in good faith.

Sure, but you probably won't like what the courts consider reasonable.

> I'm not quite sure what to do in a situation of a litigation hold
> that suspends expunging of backups.

You lock up the backups offline unless and until the court asks for
them or you actually need to restore. That reasonably addresses the
privacy issue itself, and you're covered by the "essential to business
purpose" clause for the duration of the court order.

> I'm simply bringing up things that I think are potential concerns
> that the powers that be probably need to consider, and have a pat
> response to.