On 05/22/2018 07:33 PM, Stephen J. Turnbull wrote:
I would imagine that it is the subthread rooted at the first post
containing complainant's PII -- "Personally Identifying Information".
I feel like that's a self referencing definition.
A "thread" is "a subthread rooted at the first post containing PII".
I agree that's where the focus should start. But I don't think it
defines a thread in the way that I'm asking.
What is their working definition of "thread"?
Let's say:
1) Bla
2)  +--- Re: Bla
3)  +--- Re: Bla
4)  |     +--- BlaBlaBla
5)  +--- Re: Bla
6)  +--- I hijacked this thread because I need help!!!
Let's say the PII was in message 3 and the person replying to it in
message 4 removed the PII. Do messages 3 and 4 need to be removed (or
otherwise modified)?
Let's say that message 1 had the PII, messages 2, 3, and 5 quoted it,
but 4 did not and 6 is a hijacker that hit reply on the most convenient
message (under his cursor) and removed all content. Do messages 4 and 6
need to be removed?
What is the "(sub)thread" that needs to be removed?
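To make the two scenarios concrete, here is a rough sketch of the difference between "the subthread rooted at a message" and "the messages that actually carry the PII". The parent map is my own model of the six-message example; in particular, which message number 6 replied to is an assumption (I picked message 1), and the function names are made up for illustration.

```python
# Hypothetical model of the six-message example: message -> parent (None = root).
# Message 4 replies to 3 (stated above); the parent of 6 is an assumption.
PARENT = {1: None, 2: 1, 3: 1, 4: 3, 5: 1, 6: 1}


def subthread(root, parent=PARENT):
    """Return the root plus every message that descends from it."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for msg, par in parent.items():
            if par in found and msg not in found:
                found.add(msg)
                changed = True
    return found


def carries_pii(has_pii):
    """Return only the messages flagged as still containing the PII."""
    return {msg for msg, flag in has_pii.items() if flag}


# Scenario 1: PII in message 3, stripped by the reply in message 4.
scenario_1 = {1: False, 2: False, 3: True, 4: False, 5: False, 6: False}
# Scenario 2: PII in message 1, quoted by 2, 3 and 5, absent from 4 and 6.
scenario_2 = {1: True, 2: True, 3: True, 4: False, 5: True, 6: False}
```

In scenario 1, subthread(3) is {3, 4} but only {3} carries the PII. In scenario 2, subthread(1) is the whole thread while only {1, 2, 3, 5} carry it. The gap between those two sets is exactly what I'm asking about.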
That is going to depend on the presence of PII in the messages. If *whole
messages* are to be deleted, that would presumably involve content that
somehow identifies the person. I would expect that we don't have to
delete whole bug reports on this list just because somebody requests
their PII be redacted.
I agree that it's possible to remove / redact PII without deleting the
items containing the PII.
Think about it this way, spooks don't shred the entire sheet of paper,
instead they take a black marker and redact just the pieces that need to
be removed.
I'm afraid that the infinite wisdom of politicians will say that the
entire paper needs to be shredded.
I think it also significantly depends on what needs to be redacted.
Removing "supercalifragilisticexpialidocious" is a LOT different than
removing "Grant Taylor" from the Mailman-Users archive.
"supercalifragilisticexpialidocious" would be like reference to an
event. "Grant Taylor" would be any mention of my (or an impostor's) name.
The former is likely MUCH simpler to do than the latter. The latter
will also impact MANY more messages.
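The black-marker approach can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the terms to redact are known literal strings, and the "[REDACTED]" marker and the redact() name are my own inventions, not anything Mailman provides.

```python
import re


def redact(text, terms, marker="[REDACTED]"):
    """Replace every case-insensitive occurrence of each literal term."""
    for term in terms:
        text = re.sub(re.escape(term), marker, text, flags=re.IGNORECASE)
    return text


line = "On Tuesday Grant Taylor wrote:"
cleaned = redact(line, ["Grant Taylor"])
```

Here cleaned is "On Tuesday [REDACTED] wrote:". A literal match like this will miss names split across wrapped lines, misspellings, and obfuscated addresses of the "user (at) domain (dot) net" kind, which is part of why redacting a name is so much harder than redacting a single unusual word.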
What worries me more is the implications for blockchain, or more
precisely, DAG-based VCSes that use hashes for integrity check like git:
the identity of commits will change if authors and emails are redacted,
including if a commit log refers to PII of a bug reporter as they often
do. I guess you'd need to maintain an index of pointers from old commit
ids, or at least for branches and tags (we do have the reflog in git).
I don't want to try to work that out.
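For anyone who wants to see why the commit identity changes, here is a small sketch. Git names an object by the SHA-1 of a "<type> <size>" header, a NUL byte, and the object body, so editing the author line or the log message yields a different id. The commit text below is made up (name, address and timestamp are placeholders), and the rewrite_map dictionary is just one hypothetical way to keep the old-to-new pointers mentioned above.

```python
import hashlib


def git_object_id(kind, body):
    """Hash '<kind> <size>' + NUL + body, which is how git names an object."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Made-up commit object; all personal details are placeholders.
original = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jane Doe <jane@example.org> 1527000000 +0000\n"
    b"committer Jane Doe <jane@example.org> 1527000000 +0000\n"
    b"\n"
    b"Fix crash reported by Jane Doe\n"
)

# Redact the identity lines first, then any remaining mention in the log.
redacted = original.replace(
    b"Jane Doe <jane@example.org>", b"Redacted <redacted@invalid>"
).replace(b"Jane Doe", b"[REDACTED]")

old_id = git_object_id("commit", original)
new_id = git_object_id("commit", redacted)

# Hypothetical index so that references to the old id can still be resolved.
rewrite_map = {old_id: new_id}
```

Every child commit embeds its parent's id, so the change ripples through all descendants, and any signature made over the old commit no longer applies to the new one.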
And heaven help you if you're a security conscious group like the Linux
kernel and use signed commits. I guess the person who does the redaction
would sign the new commits, but that's pretty yucky -- that person could
do anything and nobody would know when it happened because you have to
delete the old commits and blobs that get redacted.
Yep.
As I understand the "right to be forgotten", it's *not* a right to
arbitrarily edit content stored by someone else, it's the right to redact
*all* PII in that content.
Agreed.
In this case, I don't think that supercalifragilisticexpialidocious
qualifies under GDPR's right to be forgotten. }:-)
It's not just messages from a person, it's headers containing their name
and email address, attribution lines for quoted material, quoted .sigs,
etc etc.
Agreed.
What about headers containing message ID from an uncommon / single user
domain like mine? I'd say that anything that can be used to identify
a group of fewer than 1000 people would probably need to be redacted. (I
just chose 1000 arbitrarily, but it's a starting point.)
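As a sketch of what header-level redaction might look like, the following uses Python's standard email package to blank a set of headers in one message. Which headers count as identifying is a policy choice; the list below and the scrub_headers() name are assumptions of mine, not something Mailman or GDPR defines.

```python
from email import message_from_string

# Hypothetical list; deciding what belongs here is the hard part.
IDENTIFYING_HEADERS = ("From", "Sender", "Reply-To", "Message-ID", "Received")


def scrub_headers(raw, headers=IDENTIFYING_HEADERS, marker="redacted"):
    """Replace every occurrence of the listed headers with a marker value."""
    msg = message_from_string(raw)
    for name in headers:
        if name in msg:
            del msg[name]       # deletes all occurrences of this header
            msg[name] = marker  # re-add a single placeholder
    return msg.as_string()


sample = (
    "From: Someone <someone@example.net>\n"
    "Message-ID: <1234@example.net>\n"
    "Subject: Re: Bla\n"
    "\n"
    "body text\n"
)
scrubbed = scrub_headers(sample)
```

Note that blanking a Message-ID here does nothing about the In-Reply-To and References headers of other messages that point at it, and it breaks threading for anything that relied on it.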
You're missing
0) Randos accessing public archives.
What other modes have we collectively missed?
For (0), the only logging would be IP addresses in the webserver.
True.
No. The accessing IPs will be in the webserver logs, but I don't think
there is any logging in either Mailman 2 or Mailman 3 of authentication
data. All there would be is the implication that authentication was
successful if that data were accessed.
Okay.
I wonder if there's any correlation between the IP that authenticated
and the IP that accessed data.
In Mailman 2 there's no PII data whatsoever except for email address
and (maybe) display name in the subscriber data.
I expect that either of those, the email address -or- the display name
are enough to count as PII.
I believe it's fair to say that people expect gtaylor (at)
tnetconsulting (dot) net to reference a single person. I also believe
it's fair to say that most people expect most email addresses to
be associated with one person. The only exceptions to the rule
being things like positional addresses; sales@ or info@ or webmaster@.
I suppose you could put phone #s and junk like that in the display name,
but GDPR is more concerned with the database fields that might store
PII than the actual content.
1) I'd consider the phone numbers in the display name to be a form of
display name.
2) *sigh* It sounds like GDPR is talking about specific fields that
could contain PII, even if they don't, while ignoring other fields that
erroneously do contain PII.
However, in Mailman 2 the various list passwords are shared, and would
not identify individuals in cases with multiple moderators or list owners.
IMHO that's an operational mis-step. I get that it does happen. But I
think that it shouldn't. People tend to share the root password on Unix
too, despite multiple other options that make sharing unnecessary.
Indeed. The problem is identifying them if they do, since they can
just use normal filesystem operations from the shell, which are not
normally logged at all.
Where I've worked, it was assumed that if you had an ID on the box and
file system level permission to access things then you effectively had
accessed it. — If you can't prove that they didn't access the data,
then you assume that they did access the data.
In Mailman 3, we can configure databases like PostgreSQL, which I suppose
can log access to the subscriber databases, and which make it hard
(but not impossible) to access data via ordinary filesystem operations.
Having an RDBMS (et al) manage the files doesn't prevent file level
access. I can very likely still copy the DB file(s) and do my own thing
with them to extract the data.
This is where (and why) DB encryption comes into play. Though it doesn't
help if a rogue admin has access to the decryption key through any
method. (This includes extracting it out of memory.) }:-)
However, I think that the issue here is basically moot. You keep host
access logs to check for suspicious IP addresses (attempting to) log
in, and otherwise (for #2 and #3) you just give the list of all the
people who can access that data in the normal course of their duties.
Yep.
I don't think the issue with logging is pinning down a particular access
to specific data, but rather determining who *could* access that data.
Yep. Yep.
The relevant access might have been by a long-since fired engineer who
did a Snowden on your database. How could you possibly know?
Yep. Yep. Yep.
I don't understand the "exclude third party site hosters". The GDPR
requirement is not to *limit* access, it's to *log* access.
I was trying to imply that companies would need to host their own list
servers. Meaning that they couldn't outsource it to 3rd party
companies, who have their own host system administrators.
I'm pretty sure they're referring to CRM-type databases where you track
customer interactions over time, linked by PII, and build up a profile.
One-off "for sale" posts wouldn't matter. However, if this were a common
activity on the list, the *archives* might qualify as such a database.
~chuckle~
How many grains of sand does it take to make a pile?
IMHO none. You just have to declare the pile's location.
Sure, the point is to make it difficult for 3rd parties to discover
that history ex post.
Okay. I want to make sure I'm understanding you correctly. (Part of)
GDPR is not about (just) knowing who has (had at the time) legitimate
access to data, but additionally making it more difficult for other 3rd
parties to gain access to the data in the future, by virtue of the data
being removed from the corpus that the 3rd party is subsequently given
access to.
I don't think the legislators envisioned people invoking these rights
frivolously or maliciously (though I do :-/).
Agreed.
Backups would need to be redacted as well, I suppose.
Um... that also presents a severe technical problem. One that could
impose large operational expenses. Suppose a company contracts to store
their backup tapes off site. This means that they would need to recall
the tapes that need to be redacted, redact them, and send the tapes back
to the offsite storage. This may involve an additional company that is
simply the courier. Let's not forget about the offsite company's handling
fees and the courier's fees. Both ways for each tape. Let's also throw
company policies in place that dictate that only X number of drives can
be in transit or recalled at one time. That's a logistical nightmare,
could take more than a trivial amount of time to complete, and untold
cost. Ouch!
I have no idea what you mean by "ongoing discovery".
Ah.
Let's say that Wile E. Coyote decides to sue Acme because of their bad
products. As soon as the lawsuit is initiated, chances are very good
that Acme's lawyers will 1) tell them not to destroy any records or 2) tell
Acme's IT staff that they can no longer rotate out any backups that may
contain data pertinent to the lawsuit. This is to facilitate the legal
process of discovering evidence to be used in the case. (Either way,
for or against, Mr. Coyote, doesn't matter.)
I frequently hear this referred to as one of two things: "Litigation
Hold" or "(Electronic) Discovery". Discovery is the more common term and
applies to more than just electronics.
Not Mailman host's problem, assuming all subscribers have properly been
opted in and are allowed to opt out at will, as is normally the case.
What about that pesky time when the moderator hasn't yet approved the
unsubscribe request? (I think I remember seeing that option in Mailman.)
Distributing content downstream is the purpose of the software, and
subscribers are aware of that. The only edge cases I can imagine offhand
are the one discussed elsewhere in the thread, where a subscriber posts a
third party's information without permission, and possibly an open-post
list where the poster doesn't realize that it's open subscription/public
archives/whatever.
I think you misinterpreted what I was referring to. Or I'm
misinterpreting your reply.
I'm talking about 3rd party spam filtering services that sit in the path
downstream of Mailman, between it and the recipient's server. They
collect logs / data all the time. Usually those logs and that data are
what help them be better at their job of spam filtering.
Not Mailman host's problem.
Okay.
Sure, but you probably won't like what the courts consider reasonable.
"reasonable" is always subject to deliberation.
Lawyers get paid to tell a judge that "It will cost $Company $50,000
to recover the messages that $Plaintiff is requesting from
$Defendant as part of their sunshine law request. Here's why:
1) We don't have a server that we can use so we must buy a low end
machine. (Legit, when there is only one mail server and the business
can't be without mail for days / weeks.)
2) We need another tape drive to do the restores.
3) It will take $X number of (wo)man hours at $Y dollars per hour.
4) We, $Defendant's lawyers must go through the emails at $YYYYY
dollars per hour to make sure there's nothing given out that's outside
of the sunshine law request.
5) You just expanded the scope of your discovery? Well, now we need to
increase #1 and #2 to go through the last 5 years of things in the next
three weeks. Also #3 and #4. }:-)
So … the total bill for your sunshine request comes to just over
$50,000. Are you willing to pay that bill to get an answer to your
question via a sunshine law request?"
Aside: A sunshine law request is a request from a citizen to a
governmental body for data that was arguably paid for by tax funding
and on behalf of citizens, thus the citizen effectively owns the data in
a roundabout way. I don't know how widespread that is.
You lock up the backups offline unless and until the court asks for them
or you actually need to restore. That reasonably addresses the privacy
issue itself, and you're covered by the "essential to business purpose"
clause for the duration of the court order.
6) We have to buy additional tapes to replace the tapes that are on
Lit' Hold.
7) We have to pay for more storage to accommodate #6. (Or we have to
pay someone to house the tapes in a secure manner.)
I digress.
--
Grant. . . .
unix || die