Well, it turns out there was so much cruft in that data from YahooGroups that
it was easier to write an awk script to zap most of it. Here's the script:

**************** BEGIN LISTING **************
#!/usr/bin/awk -f
#
# Attempts to clean up some ugly header problems when importing
# mail from YahooGroups to mbox format.
#
# Author: Scott Courtney <[EMAIL PROTECTED]>
#
# License: GPL
#
# Disclaimer: Written for my own one-time use; NOT thoroughly tested.
#
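BEGIN {
        hdr = 0;        # 1 while we are inside a message's header block
}
# mbox "From " separator line: the headers of a new message start here.
/^From .*@.* .*:..:.. / {
        hdr = 1;
        print $0;
        next;
}
# Blank line: end of the headers (or an empty line in the body).
/^$/ {
        hdr = 0;
        print $0;
        next;           # skip the catch-all rule so the line prints once
}
# A "Name: value" line passes through unchanged.
/^[A-Za-z0-9-]+: / {
        print $0;
        next;
}
# Anything else: inside the headers, treat it as a continuation of the
# previous header and indent it; in the body, pass it through as-is.
{
        if (hdr) {
                print " " $0;
        } else {
                print $0;
        }
}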
**************** END LISTING ****************
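Since the script is not thoroughly tested, it may be worth previewing which
lines it would treat as header continuations before rewriting an archive.
The following is only a sketch, assuming the same three patterns as the
listing above; the function name show_folded is made up for illustration.
It prints the line number and text of every line inside a header block
that does not look like "Name: value". Lines that are already properly
indented continuations show up too.

```shell
# Hypothetical helper (not part of the original script): list the lines
# that sit inside a header block but are not "Name: value" headers.
show_folded() {
    awk '
        /^From .*@.* .*:..:.. / { hdr = 1; next }   # separator: headers begin
        /^$/                    { hdr = 0; next }   # blank line: headers end
        /^[A-Za-z0-9-]+: /      { next }            # ordinary header line
        hdr                     { print NR ": " $0 } # stray line in the headers
    ' "$@"
}
```

Called with an mbox file name as its argument (or with the data on standard
input), it produces one line of output per stray header line, and nothing
at all if the headers are already clean.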

Another change that may or may not apply to your lists: Some versions of KMail,
the mail client that comes with KDE, produce a header called "Message-Id:". The
parser in "arch" requires this to be "Message-ID:" or it chokes. I left that
out of my awk script because it may not apply everywhere, and fixing it is
just a matter of :%s/^Message-Id:/Message-ID:/ in vi, or the equivalent.
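For anyone who would rather not open a large archive in an editor, here is
one possible non-interactive equivalent. It is a sketch, not something I
have run against real list data. Unlike the vi command, it only touches
lines inside a header block, on the assumption that a body line starting
with "Message-Id:" should be left alone. The function name fix_message_id
is made up for illustration.

```shell
# Hypothetical helper: rewrite "Message-Id:" as "Message-ID:" in header
# blocks only, using the same separator pattern as the cleanup script.
fix_message_id() {
    awk '
        /^From .*@.* .*:..:.. / { hdr = 1 }   # separator: headers begin
        /^$/                    { hdr = 0 }   # blank line: headers end
        hdr && /^Message-Id:/   { sub(/^Message-Id:/, "Message-ID:") }
        { print }                             # every line is passed through
    ' "$@"
}
```

It reads the named file (or standard input) and writes the result to
standard output, so redirect it to a new file rather than over the
original.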

Hope this is helpful.

By the way, I apologize for posting so much today. Several people have been
in touch with me off-list indicating that I'm not the only one struggling
with these problems.

The good news: After running this new awk script, I'm able to import much
larger archives in a single chunk. The 80-message limit was highly
repeatable for me, and I still don't know why, but it's not hard-wired as
I had thought. It may just have been a coincidence, since all my data is
so homogeneous.

Good luck, everyone. I'm now up and running with four live lists. I hope
this documentation of the hurdles I've encountered will help the next
person in line to not have so many dents in his or her forehead. :-)

To bed, now, at last. :-)

Scott

-- 
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
[EMAIL PROTECTED]       | having a bad operating system."    -- Linus Torvalds
http://www.4th.com/    | ("The Rebel Code," NY Times, 21 February 1999)



------------------------------------------------------
Mailman-Users mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py