On Thu, 2003-07-24 at 22:01, John Haywood wrote:

> The file in question is a corrupt Microsoft Entourage message file. It is 
> 1.8Gig in size (approx). I need to step through it and convert it to an mbox 
> format file, by searching for patterns such as :
> 
> received: from <name>
> Received: from <name>
> 
> and replace these with:
> 
> >From <name> 

Personally I would do it in Python, but then again that's what I use to
code with :-)  If you do it correctly (i.e. read the file a line at a
time instead of loading it all into memory at once), you could do it
with very little RAM on a pokey PC.

For the change you mention above, you can use this (paste it into a text
file, call it replace.py, then chmod +x replace.py, then run it):

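#!/usr/bin/python
import re

# Matches "received: from <" at the start of a line, in any case (re.I).
# Note: this expects a literal "<"; if <name> above is only a placeholder,
# drop the "<" from both the pattern and the replacement.
r = re.compile(r'^received: from <', re.I)

f = open('mesg.txt')
fo = open('output.txt', 'w')

# Read one line at a time so the 1.8 GB file is never held in memory.
while 1:
    line = f.readline()
    if not line: break
    line = r.sub('>From <', line)
    fo.write(line)

# Close both files so the output is flushed to disk.
f.close()
fo.close()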
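
Since the file is corrupt, it may well contain bytes that are not valid
text, and a text-mode open can choke on those under Python 3. Here is a
rough sketch of the same line-by-line idea in binary mode. It assumes
Python 3 and that lines end in "\n" (if the file uses bare "\r" line
endings this would need adjusting). The function name convert_headers
is made up for illustration, and it still matches a literal "<" as in
the script above.

```python
import re

# Same pattern as the script above, but as bytes, so undecodable data
# passes through untouched instead of raising a decode error.
PATTERN = re.compile(rb'^received: from <', re.IGNORECASE)

def convert_headers(src_path, dst_path):
    """Copy src_path to dst_path, rewriting matching header lines.

    Returns the number of lines that were rewritten.
    """
    changed = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:  # one line at a time; never loads the whole file
            new_line, n = PATTERN.subn(b'>From <', line)
            changed += n
            dst.write(new_line)
    return changed
```

You would call it as convert_headers('mesg.txt', 'output.txt'); the
returned count is a quick sanity check on how many headers were changed.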

> 
> also, some messages start with
> 
> From:
> Return Path:

I don't know what you want to do with these. I had a brief look at
the webpage you mention below, but I think it would be a nice exercise
for you to look at this: (!!)

http://www.amk.ca/python/howto/regex/

*grin*

Please note that with Python, whitespace (indentation) is significant,
so keep the leading spaces intact when you paste the script!
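
To illustrate with a throwaway example (both function names are
invented): the two functions below differ mainly in how far the
"total += 1" line is indented, and that alone decides whether it runs
only for matching lines or for every line.

```python
def count_inside(lines):
    total = 0
    for line in lines:
        if line.startswith('From '):
            total += 1      # indented under the if: runs only for matching lines
    return total

def count_outside(lines):
    total = 0
    for line in lines:
        if line.startswith('From '):
            pass            # nothing happens here for matching lines
        total += 1          # one level out: runs for every line in the loop
    return total
```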

Damon


