>>>>> "BAW" == Barry A Warsaw <[EMAIL PROTECTED]> writes:
    BAW> If you're watching the CVS log messages, you might see some
    BAW> checkins to address the problems with Pipermail in 2.1a3.

    BAW> Had an all day meeting today, and I'm beat so I'll email more
    BAW> about it tomorrow, but I think I have a neat solution that
    BAW> will also address Ben's patch to clean attachments out of the
    BAW> archives, and may serve as a basis for a built-in de-mimer.

So here's the scoop. I've been thinking about Ben Gertzfield's code
to sanitize the archives, and I've been mulling about the de-mime
stuff. It all came to a head when 2.1a3 broke archiving for multipart
messages.

Here's what I've now got in cvs and it seems to work fairly well.
Only more testing will tell for sure.

There's a new handler module called Scrubber.py, but it's not in the
primary pipeline. Only Pipermail is going to call it, and that via
the new mm_cfg.py/Default.py variable ARCHIVE_SCRUBBER. This module
hardcodes the following de-mime decisions:

- text/plain parts are passed through unchanged

- text/html parts are removed completely. If the outer message is of
  type text/html then the whole message is discarded
  (i.e. DiscardMessage is raised).

- For all other non-multipart parts, we treat them as "attachments" by
  pulling the decoded payload out of the message, storing it in a file
  inside the list's private archive directory
  (e.g. archives/private/mylist/attachments) and rewriting the payload
  of the part to include a description of the attachment. Included in
  this description is a url to the attachment file, which Pipermail
  will hyperlink.

  One drawback here is that if archives are switched from public to
  private, or vice versa, all the attachment urls will break. But you
  could re-run bin/arch to regenerate the whole thing -- the key being
  that Scrubber works only on a copy of the message being prepped for
  the archiver, /not/ on the message being saved in the mbox.

- multiparts are ignored for the first pass, but are recursed to
  perform the above cleaning.
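
For anyone who wants to see the four decisions above in one place,
here is a rough sketch using the standard library email package. It
is not the code in Scrubber.py. The names scrub(), _scrub_part(), the
local DiscardMessage class, the attachment-NNNN.bin file naming and
the example.invalid url are all made up for illustration.

```python
import os
import tempfile
from email import message_from_string


class DiscardMessage(Exception):
    """Local stand-in for the exception named above (assumption)."""


def scrub(msg, attach_dir, base_url):
    """Scrub msg in place following the four decisions, and return it."""
    if msg.get_content_type() == 'text/html':
        # Outer message is HTML: discard the whole message.
        raise DiscardMessage('outer message is text/html')
    _scrub_part(msg, attach_dir, base_url, counter=[0])
    return msg


def _scrub_part(part, attach_dir, base_url, counter):
    if part.is_multipart():
        # Multiparts are not scrubbed themselves, only recursed into.
        kept = []
        for sub in part.get_payload():
            if sub.get_content_type() == 'text/html':
                continue  # text/html parts are removed completely
            _scrub_part(sub, attach_dir, base_url, counter)
            kept.append(sub)
        part.set_payload(kept)
        return
    ctype = part.get_content_type()
    if ctype == 'text/plain':
        return  # passed through unchanged
    # Anything else is an "attachment": save the decoded payload to a
    # file and replace the part's payload with a description and url.
    data = part.get_payload(decode=True) or b''
    counter[0] += 1
    filename = 'attachment-%04d.bin' % counter[0]  # hypothetical naming
    os.makedirs(attach_dir, exist_ok=True)
    with open(os.path.join(attach_dir, filename), 'wb') as fp:
        fp.write(data)
    url = '%s/%s' % (base_url.rstrip('/'), filename)
    del part['Content-Transfer-Encoding']
    del part['Content-Type']
    part['Content-Type'] = 'text/plain'
    part.set_payload('An attachment of type %s was scrubbed.\nURL: %s\n'
                     % (ctype, url))


RAW = """\
From: a@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain

hello
--BOUND
Content-Type: text/html

<p>hello</p>
--BOUND
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAEC
--BOUND--
"""

# Work on a parsed copy, the way Scrubber only touches the archiver's copy.
attach_dir = os.path.join(tempfile.mkdtemp(), 'attachments')
scrubbed = scrub(message_from_string(RAW), attach_dir,
                 'http://example.invalid/attachments')
```

In this sketch the html part disappears, the binary part becomes a
short text/plain note carrying the url, and the text/plain part is
left alone. The flattening step is not covered here.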

Then the entire scrubbed message is converted into a flat message,
where only the headers are parsed and the body is slurped in one gulp;
it isn't parsed recursively. Along the way, we throw out the headers
for any internal parts, and we play games with the inter-part boundary
strings so they are more useful (yes, this is a kludge). There's even
more kludgery involved to get Pipermail to archive scrubbed messages
without having to rewrite huge chunks of inscrutable code. But it
seems to work.

Now, the interesting thing is that Scrubber.py is written so that it
/could/ be used in the main pipeline. E.g. it supports the proper
signature and semantics for use in the pipeline. But I'm not adding
it there for now, primarily because it isn't configurable via the web.
All its decisions above are hardcoded because getting the u/i right is
more work than I want to do right now.

But if you were interested in mainlining Scrubber.py, here's how you
might do it: Add it to GLOBAL_PIPELINE in your mm_cfg.py. I would
suggest sticking it after ToArchive so that the mbox gets the original
unscrubbed message (this lets you adjust the scrubber's behavior for
archive purposes and regenerate from the raw mbox). In fact, what I'd
do is move ToArchive to just after the Hold module, and stick Scrubber
just between Hold and Tagger. This is untested.

I think this will give us a foothold into providing a cleaner archive
with Pipermail, and into experimenting with Mailman-supported
de-mime-ification. Probably the best that'll happen for MM2.1.

Enjoy,
-Barry

_______________________________________________
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-developers