Re: [Mailman-Users] [Mailman-cabal] GDPR

Grant Taylor via Mailman-Users Tue, 22 May 2018 19:42:08 -0700

On 05/22/2018 07:33 PM, Stephen J. Turnbull wrote:

I would imagine that it is the subthread rooted at the first post containing complainant's PII -- "Personally Identifying Information".


I feel like that's a self referencing definition.

A "thread" is "a subthread rooted at the first post containing PII".

I agree that's where the focus should start. But I don't think it defines a thread in the way that I'm asking.


What is their working definition of "thread"?

Let's say:

1)  Bla
2)   +--- Re: Bla
3)   +--- Re: Bla
4)   |     +--- BlaBlaBla
5)   +--- Re: Bla
6)         +--- I hijacked this thread because I need help!!!

Let's say the PII was in message 3 and the person replying to it in message 4 removed the PII. Do messages 3 and 4 need to be removed (or otherwise modified)?

Let's say that message 1 had the PII, messages 2, 3, and 5 quoted it, but 4 did not and 6 is a hijacker that hit reply on the most convenient message (under his cursor) and removed all content. Do messages 4 and 6 need to be removed?
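To make the two scenarios concrete, here is a minimal sketch. It assumes a hypothetical in-memory model of the six-message example, where each message records its parent and a flag saying whether its own body still carries the PII. None of these names come from Mailman. It contrasts "the whole subthread under the first PII post" with "only the messages that actually contain the PII".

```python
# Hypothetical model of the six-message example; not Mailman's data structures.
PARENT = {1: None, 2: 1, 3: 1, 4: 3, 5: 1, 6: 5}

def subthread(root, parent=PARENT):
    """Return root plus every message that descends from it."""
    members = {root}
    changed = True
    while changed:
        changed = False
        for msg, par in parent.items():
            if par in members and msg not in members:
                members.add(msg)
                changed = True
    return members

def needs_redaction(has_pii):
    """Return only the messages whose own content carries the PII."""
    return {msg for msg, flag in has_pii.items() if flag}

# Scenario A: PII in message 3, stripped by the reply in message 4.
scenario_a = {1: False, 2: False, 3: True, 4: False, 5: False, 6: False}
# Scenario B: PII in message 1, quoted by 2, 3 and 5, absent from 4 and 6.
scenario_b = {1: True, 2: True, 3: True, 4: False, 5: True, 6: False}

print(sorted(subthread(3)), sorted(needs_redaction(scenario_a)))
print(sorted(subthread(1)), sorted(needs_redaction(scenario_b)))
```

Under the subthread definition, scenario B takes out all six messages, including the hijacked one. Under the content-based definition, messages 4 and 6 survive.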


What is the "(sub)thread" that needs to be removed?

That is going to depend on the presence of PII in the messages. If *whole messages* are to be deleted, that would presumably involve content that somehow identifies the person. I would expect that we don't have to delete whole bug reports on this list just because somebody requests their PII be redacted.

I agree that it's possible to remove / redact PII without deleting the items containing the PII.

Think about it this way: spooks don't shred the entire sheet of paper; instead they take a black marker and redact just the pieces that need to be removed.

I'm afraid that the infinite wisdom of politicians will say that the entire paper needs to be shredded.

I think it also significantly depends on what needs to be redacted. Removing "supercalifragilisticexpialidocious" is a LOT different than removing "Grant Taylor" from the Mailman-Users archive. "supercalifragilisticexpialidocious" would be like a reference to an event. "Grant Taylor" would be any mention of my (or an impostor's) name.

The former is likely MUCH simpler to do than the latter. The latter will also impact MANY more messages.

What worries me more is the implications for blockchain, or more precisely, DAG-based VCSes that use hashes for integrity check like git: the identity of commits will change if authors and emails are redacted, including if a commit log refers to PII of a bug reporter as they often do. I guess you'd need to maintain an index of pointers from old commit ids, or at least for branches and tags (we do have the reflog in git).
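A rough illustration of why redaction ripples through the history: git names an object by the SHA-1 of a short header plus the object's content, and each commit embeds its parent's id. The sketch below hand-builds simplified commit objects to show the effect. The authors, the timestamp, and the use of the empty tree are made up for the example, and it is not a substitute for git's own history-rewriting tools.

```python
import hashlib

def git_object_id(kind, body):
    """Hash an object the way git does: SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parent, author, message):
    """Build a simplified commit object; the timestamp is a made-up constant."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} 1527043328 -0700")
    lines.append(f"committer {author} 1527043328 -0700")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

EMPTY_TREE = git_object_id("tree", b"")

def build_history(authors):
    """Build a linear chain of commits and return their ids, oldest first."""
    ids, parent = [], None
    for n, author in enumerate(authors):
        body = commit_body(EMPTY_TREE, parent, author, f"change {n}")
        parent = git_object_id("commit", body)
        ids.append(parent)
    return ids

# Hypothetical authors; only the first commit's author is redacted.
original = build_history(["Alice Example <alice@example.org>",
                          "Bob Example <bob@example.org>"])
redacted = build_history(["REDACTED <redacted@invalid>",
                          "Bob Example <bob@example.org>"])

# The "index of pointers from old commit ids" mentioned above.
old_to_new = dict(zip(original, redacted))
for old, new in old_to_new.items():
    print(old[:12], "->", new[:12])
```

Bob's commit gets a new id even though nothing about Bob changed, because it points at the rewritten parent. That is why an old-to-new map would have to cover every descendant of a redacted commit.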


I don't want to try to work that out.

And heaven help you if you're a security conscious group like the Linux kernel and use signed commits. I guess the person who does the redaction would sign the new commits, but that's pretty yucky -- that person could do anything and nobody would know when it happened because you have to delete the old commits and blobs that get redacted.


Yep.

As I understand the "right to be forgotten", it's *not* a right to arbitrarily edit content stored by someone else, it's the right to redact *all* PII in that content.


Agreed.

In this case, I don't think that supercalifragilisticexpialidocious qualifies under GDPR's right to be forgotten. }:-)

It's not just messages from a person, it's headers containing their name and email address, attribution lines for quoted material, quoted .sigs, etc etc.
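As a rough sketch of what redacting "everywhere" involves for one archived message, the following uses Python's standard `email` package to scrub a name and address from every header and from the body. The function name, the marker text, and the sample message are all hypothetical. It assumes a single-part plain-text message and does simple case-insensitive substring matching, so it would miss misspellings, obfuscated addresses and MIME attachments.

```python
import re
from email import message_from_string

def redact_message(raw, name, address, marker="[redacted]"):
    """Replace a name and address in the headers and body of a plain-text message."""
    pattern = re.compile(
        "|".join(re.escape(item) for item in (name, address)), re.IGNORECASE
    )
    msg = message_from_string(raw)

    # Rebuild every header with the matches replaced, keeping the original order.
    headers = msg.items()
    for key in set(msg.keys()):
        del msg[key]
    for key, value in headers:
        msg[key] = pattern.sub(marker, value)

    # Covers attribution lines, quoted text and quoted .sigs in a single-part body.
    msg.set_payload(pattern.sub(marker, msg.get_payload()))
    return msg.as_string()

# Hypothetical sample message.
sample = (
    "From: Pat Example <pat@example.org>\n"
    "Subject: Re: Bla\n"
    "\n"
    "On 05/22/2018, Pat Example wrote:\n"
    "> see my earlier post\n"
    "> -- \n"
    "> Pat Example\n"
)
cleaned = redact_message(sample, "Pat Example", "pat@example.org")
print(cleaned)
```

This has to be run over every message in the archive, not just the ones the person sent, since replies carry the attribution lines and quoted signatures.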


Agreed.

What about headers containing message ID from an uncommon / single user domain like mine? I'd say that anything that can be used to identify less than a group of 1000 people would probably need to be redacted. (I just chose 1000 arbitrarily, but it's a starting point.)
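One way to apply that kind of threshold, purely as a sketch: count how many distinct senders in the archive share each Message-ID domain and flag the domains that fall below the cutoff. The function name and sample data are hypothetical. The count only reflects senders seen in this archive, which may be far fewer than the real number of people behind a domain, so it errs on the side of flagging too much.

```python
from collections import defaultdict

def narrow_message_id_domains(posts, threshold=1000):
    """Return Message-ID domains used by fewer than `threshold` distinct senders."""
    senders = defaultdict(set)
    for sender, message_id in posts:
        domain = message_id.strip("<>").rpartition("@")[2].lower()
        senders[domain].add(sender.lower())
    return {domain for domain, who in senders.items() if len(who) < threshold}

# Hypothetical (sender, Message-ID) pairs pulled from an archive.
posts = [
    ("pat@solo.example", "<1@solo.example>"),
    ("a@big.example", "<2@mail.big.example>"),
    ("b@big.example", "<3@mail.big.example>"),
]
# A tiny threshold so the toy data shows both outcomes.
flagged = narrow_message_id_domains(posts, threshold=2)
print(flagged)
```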

You're missing

0)  Randos accessing public archives.


What other modes have we collectively missed?

For (0), the only logging would be IP addresses in the webserver.


True.

No. The accessing IPs will be in the webserver logs, but I don't think there is any logging in either Mailman 2 or Mailman 3 of authentication data. All there would be is the implication that authentication was successful if that data were accessed.


Okay.

I wonder if there's any correlation between the IP that authenticated and the IP that accessed data.

In Mailman 2 there's no PII data whatsoever except for email address and (maybe) display name in the subscriber data.

I expect that either of those, the email address -or- the display name, is enough to count as PII.

I believe it's fair to say that people expect gtaylor (at) tnetconsulting (dot) net to reference a single person. I also believe it's fair to say that most people expect most email addresses to be associated with one person. The only exceptions to the rule are things like positional addresses: sales@ or info@ or webmaster@.

I suppose you could put phone #s and junk like that in the display name, but GDPR is more concerned with the database fields that might store PII than the actual content.

1) I'd consider the phone numbers in the display name to be a form of display name.

2) *sigh* It sounds like GDPR is talking about specific fields that could contain PII, even if they don't, while ignoring other fields that erroneously do contain PII.

However, in Mailman 2 the various list passwords are shared, and would not identify individuals in cases with multiple moderators or list owners.

IMHO that's an operational mis-step. I get that it does happen. But I think that it shouldn't. People tend to share the root password on unix too, despite multiple other options where it's not needed.

Indeed. The problem is identifying them if they do, since they can just use normal filesystem operations from the shell, which are not normally logged at all.

Where I've worked, it was assumed that if you had an ID on the box and file system level permission to access things then you effectively had accessed it. -- If you can't prove that they didn't access the data, then you assume that they did access the data.

In Mailman 3, we can configure databases like PostgreSQL, which I suppose can log access to the subscriber databases, and which make it hard (but not impossible) to access data via ordinary filesystem operations.

Having an RDBMS (et al) manage the files doesn't prevent file level access. I can very likely still copy the DB file(s) and do my own thing with them to extract the data.

This is where (and why) DB encryption comes into play. Though it doesn't help if a rogue admin has access to the decryption key through any method. (This includes extracting it out of memory.) }:-)

However, I think that the issue here is basically moot. You keep host access logs to check for suspicious IP addresses (attempting to) log in, and otherwise (for #2 and #3) you just give the list of all the people who can access that data in the normal course of their duties.


Yep.

I don't think the issue with logging is pinning down a particular access to specific data, but rather determining who *could* access that data.


Yep. Yep.

The relevant access might have been by a long-since fired engineer who did a Snowden on your database. How could you possibly know?


Yep. Yep. Yep.

I don't understand the "exclude third party site hosters". The GDPR requirement is not to *limit* access, it's to *log* access.

I was trying to imply that companies would need to host their own list servers, meaning that they couldn't outsource it to 3rd party companies, who have their own host system administrators.

I'm pretty sure they're referring to CRM-type databases where you track customer interactions over time, linked by PII, and build up a profile. One-off "for sale" posts wouldn't matter. However, if this were a common activity on the list, the *archives* might qualify as such a database.


~chuckle~

How many grains of sand does it take to make a pile?

IMHO none.  You just have to declare the pile's location.

Sure, the point is to make it difficult for 3rd parties to discover that history ex post.

Okay. I want to make sure I'm understanding you correctly. (Part of) GDPR is not about (just) knowing who has (had at the time) legitimate access to data, but additionally making it more difficult for other 3rd parties to gain access to the data in the future, by the fact that the data is removed from the corpus that the 3rd party is subsequently given access to.

I don't think the legislators envisioned people invoking these rights frivolously or maliciously (though I do :-/).


Agreed.

Backups would need to be redacted as well, I suppose.

Um... that also presents a severe technical problem. One that could impose large operational expenses. Suppose a company contracts to store their backup tapes off site. This means that they would need to recall the tapes that need to be redacted, do so, and send the tapes back to the offsite storage. This may involve an additional company that is simply the courier. Let's not forget about the off site company's handling fees and the courier's fees. Both ways for each tape. Let's also throw company policies in place that dictate that only X number of drives can be in transit or recalled at one time. That's a logistical nightmare that could take more than a trivial amount of time to complete and incur untold cost. Ouch!

I have no idea what you mean by "ongoing discovery".

Ah.

Let's say that Wile E. Coyote decides to sue Acme because of their bad products. As soon as the lawsuit is initiated, chances are very good that Acme's lawyers will 1) tell them to destroy all records or 2) tell Acme's IT staff that they can no longer rotate out any backups that may contain data pertinent to the lawsuit. This is to facilitate the legal process of discovering evidence to be used in the case. (Either way, for or against Mr. Coyote, doesn't matter.)

I frequently hear about this referred to as one of two things: "Litigation Hold" or "(Electronic) Discovery". Discovery is the more common term and applies to more than just electronics.

Not Mailman host's problem, assuming all subscribers have properly been opted in and are allowed to opt out at will, as is normally the case.

What about that pesky time where the moderator hasn't approved the unsubscribe request? (I think I remember seeing that option in Mailman.)

Distributing content downstream is the purpose of the software, and subscribers are aware of that. The only edge cases I can imagine offhand are the one discussed elsewhere in the thread, where a subscriber posts a third party's information without permission, and possibly an open-post list where the poster doesn't realize that it's open subscription/public archives/whatever.

I think you misinterpreted what I was referring to. Or I'm misinterpreting your reply.

I'm talking about 3rd party spam filtering services that are in the path, downstream between Mailman and the recipient's server. They collect logs / data all the time. Usually those logs and that data are what help them be better at their job of spam filtering.

Not Mailman host's problem.


Okay.

Sure, but you probably won't like what the courts consider reasonable.


"reasonable" is always subject to deliberation.

Lawyers get paid to tell a judge that "It will cost $Company $50,000 to recover the messages that $Plaintiff is requesting from $Defendant as part of their sunshine law request. Here's why:

1) We don't have a server that we can use so we must buy a low end machine. (Legit, when there is only one mail server and the business can't be without mail for days / weeks.)

2)  We need another tape drive to do the restores.
3)  It will take $X number of (wo)man hours at $Y dollars per hour.

4) We, $Defendant's lawyers, must go through the emails at $YYYYY dollars per hour to make sure there's nothing given out that's outside of the sunshine law request.

5) You just expanded the scope of your discovery? Well, now we need to increase #1 and #2 to go through the last 5 years of things in the next three weeks. Also #3 and #4. }:-)

So … the total bill for your sunshine request comes to just over $50,000. Are you willing to pay that bill to get an answer to your question via a sunshine law request?

Aside: A sunshine law request is a request from a citizen to a governmental body for data that was arguably paid for by tax funding and on behalf of citizens, thus the citizen effectively owns the data in a roundabout way. -- I don't know how widespread that is.

You lock up the backups offline unless and until the court asks for them or you actually need to restore. That reasonably addresses the privacy issue itself, and you're covered by the "essential to business purpose" clause for the duration of the court order.

6) We have to buy additional tapes to replace the tapes that are on Lit' Hold.

7) We have to pay for more storage to accommodate #6. (Or we have to pay someone to house the tapes in a secure manner.)


I digress.



--
Grant. . . .
unix || die
