On August 12, 2006 at 13:28, "Jeff Breidenbach" wrote:

> The majority of mbox files I've been handed do not escape "From" like
> they should, and this causes problems on M-A's end; inc from the nmh
> suite gets unhappy and starts trashing messages. Are there any
> recommendations for an mbox2mbox converter that will clean up
> these wayward almost-but-not-quite-mbox files?

Depends on how the bogus "From" lines are structured. In mhonarc, the
MSGSEP resource can be set to provide a stricter check, which generally
gets around most cases of unescaped "From " lines.

For your case, a simple Perl script can be used to do what you want.
Maybe something like:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A real separator: From <addr> <day> <mon> <dd> <hh:mm:ss> <yyyy>
    my $msgsep = qr/^From\s+(?:"[^"]+"@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+/;

    while (<>) {
        # Pass through lines that do not start with "From ", and
        # "From " lines that look like genuine message separators.
        if (!/^From / || /$msgsep/) {
            print STDOUT $_;
            next;
        }
        # Anything else is a stray "From " line in a body: escape it.
        print STDOUT '>'.$_;
    }

If you call the above "escapefrom", invoke it like the following:

    escapefrom mbox > escaped-mbox

Then run a diff to see how well it worked.

The main limitation is when message bodies contain complete, unescaped
mbox "From " lines that match the separator pattern. In that case, it
takes a human to determine whether the line starts a new message or is
part of an existing one.

If your MDA creates a "From " line that is unique to your site, you can
modify the above regex to match just that.

--ewh

_______________________________________________
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip