On 22Jan2013 12:58, Michael Elkins <m...@sigpipe.org> wrote:
| On Tue, Jan 22, 2013 at 08:19:14PM +1100, Cameron Simpson wrote:
| >I have some useful scripts to undo the munging mailman does to the
| >archives which produces a sane mbox result. This I then open with mutt
| >and move the messages into the target folder.
| 
| Would you mind sharing your script?  I recently committed support 
| for parsing the munged From_ line that pipermail generates, but it 
| doesn't undo the munging that occurs in the message header.

The script itself is here:

  https://bitbucket.org/cameron_simpson/css/src/tip/bin/get-mailman-archive

The "Raw" button will fetch you the script. You invoke it as per the
usage message:

  get-mailman-archive archive-page-url... >mlist.mbox

passing the URL of the mailman archive page. It pulls all the archive
files, decompresses them, unmunges them and sticks them into an mbox
file, ready for refiling using mutt directly.
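
For anyone who wants the decompress-and-concatenate step without the
rest of the kit, here is a rough Python sketch. It is not what
get-mailman-archive does internally; build_mbox is a hypothetical name,
and it assumes each archive file has already been fetched into memory
as bytes, either gzipped or plain text.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def build_mbox(blobs):
    """Join archive files (bytes, gzipped or plain) into one mbox blob.

    Hypothetical helper, not part of the css kit.
    """
    parts = []
    for blob in blobs:
        # Decompress only when the data really is gzipped.
        if blob.startswith(GZIP_MAGIC):
            blob = gzip.decompress(blob)
        # End every file with exactly one blank line so the next
        # file's From_ line starts a new message cleanly.
        parts.append(blob.rstrip(b"\n") + b"\n\n")
    return b"".join(parts)
```

The result still carries the munged headers, so it would need the
unmunging filters before being treated as a clean mbox.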

It is rather dependent on my other scripts (pilfer in particular is a
web scraper, and still a bit unstable). You can get the kit here:

  http://www.cskk.ezoshosting.com/cs/css/

I need to update the tarball, too; I can make an up to date one soon
enough on request. [...] Here:

  https://www.dropbox.com/s/zy1z13xx5rr0fff/css-20130123.tar.gz

That unpacks to "css/..."; I traditionally install to /opt/css as a
third party install, and source /opt/css/env.sh to tweak $PATH etc.

If you don't want the whole kit, pilfer's used just to read the URLs
of the archive files and to fetch them. I imagine wget or curl could do
the last half for you easily enough.
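
If you would rather not use pilfer for the first half either, something
like the following Python sketch might do for finding the archive file
URLs. It assumes the archive page links to the monthly files with names
ending in ".txt" or ".txt.gz", which is what I would expect from a
pipermail index but is an assumption here; ArchiveLinkParser and
archive_file_urls are made-up names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArchiveLinkParser(HTMLParser):
    """Collect links that look like downloadable archive files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: archive files end in .txt or .txt.gz.
        if href and href.endswith((".txt", ".txt.gz")):
            self.urls.append(urljoin(self.base_url, href))


def archive_file_urls(page_html, base_url):
    """Return absolute URLs of the archive files linked from the page."""
    parser = ArchiveLinkParser(base_url)
    parser.feed(page_html)
    return parser.urls
```

Each URL could then be fetched with wget, curl, or
urllib.request.urlopen(url).read().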

The important unmunging bit is at the bottom, which takes the decompressed
archive files as input (they're almost mbox files as they are) and
filters them through:

  fix-mail-dates --mbox | un-at-

fix-mail-dates is here:

  https://bitbucket.org/cameron_simpson/css/src/tip/bin/fix-mail-dates

which is a wrapper for a sed command to reformat the mail date headers,
and un-at- is here:

  https://bitbucket.org/cameron_simpson/css/src/tip/bin/un-at-

which is a small Perl script to undo mailman's transcription of addresses
from "bill at snort.example.com" into "b...@snort.example.com" in the
message headers.
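
To give the idea without reading the Perl, here is a minimal Python
sketch of the same kind of rewrite. It is not a port of un-at-: the
header list, the regular expression and the name un_at are all my
assumptions, and it ignores folded (continuation) header lines and
unescaped "From " lines in message bodies.

```python
import re

# Assumption: these are the headers worth rewriting.
ADDRESS_HEADERS = ("from", "to", "cc", "reply-to", "sender")

# "user at host.domain" -> "user@host.domain"
AT_RE = re.compile(r"(\S+) at (\S+\.\S+)")


def un_at(lines):
    """Yield mbox lines with " at " turned back into "@" in headers."""
    in_headers = False
    for line in lines:
        if line.startswith("From "):
            # mbox From_ line: start of a message and its headers.
            in_headers = True
            yield AT_RE.sub(r"\1@\2", line, count=1)
        elif in_headers and line.strip() == "":
            # A blank line ends the header block; the body is untouched.
            in_headers = False
            yield line
        elif in_headers and line.split(":", 1)[0].lower() in ADDRESS_HEADERS:
            yield AT_RE.sub(r"\1@\2", line)
        else:
            yield line
```

Something like sys.stdout.writelines(un_at(sys.stdin)) would let it sit
in a pipeline the way un-at- does.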

I'm happy to offer any assistance to make these scripts work for you or
others; they're meant to be an out-of-the-box kit.

Cheers,
-- 
Cameron Simpson <c...@zip.com.au>

You can't always get what you want.     - The Rolling Stones
