On 5/15/18 11:51 AM, Grant Taylor via Mailman-Users wrote:
> 
> I would likely have (presuming sufficient motivation):
> 
> 1)  Get mailman into a state that I can safely modify the archive.
> 2)  Run a script (likely sed) to REDACT the contents.
>       sed -i$ticketID 's/phone number/REDACTED/g;s/Eventbright Link/REDACTED/g;#etc'
> 3)  Restarted Mailman and possibly web server serving the archive.
>     (Or otherwise flushed caches.)
> 
> I quite like "REDACTED" as it shows that there was something, and that
> it was removed, but it does not show what that something was.


I've been silent in this thread because it doesn't interest me that
much, but I want to point out that redacting a pipermail archive is more
difficult than it would first appear.

You not only have to redact the HTML pages, but also the .txt and
.txt.gz files, and if there is sensitive information in the index pages
(subject and sender info), you also have to redact that in the pipermail
database. See the script at <https://www.msapiro.net/scripts/hdfix> and
read its docstring for an idea of what that involves.
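
To make the scope of the file-level part concrete, here is a minimal
sketch in Python. It is an illustration under assumptions, not a tested
tool: the pattern, the function names, and the directory layout are all
hypothetical. It rewrites the .html, .txt and .txt.gz files under an
archive directory and deliberately leaves everything else alone, so the
pipermail database and the attachments still need separate handling.

```python
import gzip
import re
from pathlib import Path

# Hypothetical pattern; replace with the strings that actually need removing.
PATTERNS = [re.compile(rb"555-\d{4}")]
REPLACEMENT = b"REDACTED"


def redact_bytes(data, patterns=PATTERNS, replacement=REPLACEMENT):
    """Return data with every match of every pattern replaced."""
    for pattern in patterns:
        data = pattern.sub(replacement, data)
    return data


def redact_archive_tree(root, patterns=PATTERNS):
    """Redact .html, .txt and .txt.gz files under root.

    Returns the list of paths that were actually changed. Files with
    other names (for example a database file) are skipped untouched.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if name.endswith(".txt.gz"):
            opener = gzip.open      # compressed periodic text archive
        elif name.endswith((".html", ".txt")):
            opener = open           # message pages, index pages, text archive
        else:
            continue
        with opener(path, "rb") as fh:
            original = fh.read()
        redacted = redact_bytes(original, patterns)
        if redacted != original:
            with opener(path, "wb") as fh:
                fh.write(redacted)
            changed.append(path)
    return changed
```

Run it against a copy of the archive first, and only while Mailman is
not writing to that archive.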

Finally, you have to redact the cumulative LIST.mbox/LIST.mbox and maybe
the attachments directory.

Actually, the easiest way is to redact just the cumulative
LIST.mbox/LIST.mbox file and rebuild the archive with 'bin/arch --wipe',
but that can have undesired side effects, for example renumbering
messages and so breaking existing links into the archive.
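
As a rough sketch of the mbox redaction step, assuming Python and a
hypothetical helper named redact_mbox (it is not part of Mailman), the
following rewrites the mbox line by line and keeps a .bak copy of the
original. It works on raw bytes, so it will not find text inside
base64 or quoted-printable encoded parts, and a pattern cannot span a
line break.

```python
import re
import shutil
from pathlib import Path


def redact_mbox(mbox_path, patterns, replacement=b"REDACTED"):
    """Redact pattern matches in an mbox file, keeping a .bak copy.

    Hypothetical helper for illustration. Returns the number of
    substitutions made. "From " separator lines are passed through
    like any other line, so message boundaries are preserved as long
    as the patterns do not match them.
    """
    mbox_path = Path(mbox_path)
    backup = mbox_path.with_name(mbox_path.name + ".bak")
    shutil.copy2(mbox_path, backup)  # keep the unredacted original
    count = 0
    with open(backup, "rb") as src, open(mbox_path, "wb") as dst:
        for line in src:
            for pattern in patterns:
                line, n = pattern.subn(replacement, line)
                count += n
            dst.write(line)
    return count
```

A call might look like redact_mbox("LIST.mbox/LIST.mbox",
[re.compile(rb"555-\d{4}")]), followed by the archive rebuild. Remember
that the .bak file still contains the sensitive text and has to be
removed or secured afterwards.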

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
