On Thu, 2003-07-24 at 22:01, John Haywood wrote:
> The file in question is a corrupt Microsoft Entourage message file. It is
> 1.8Gig in size (approx). I need to step through it and convert it to an mbox
> format file, by searching for patterns such as :
>
> received: from <name>
> Received: from <name>
>
> and replace these with:
>
> >From <name>

Personally I would do it in Python, but then again that's what I use to code with :-) If you do it correctly (i.e. don't read the whole file into memory at once), you can do it with very little RAM, even on a pokey PC.

For the change you mention above, you can use this (paste it into a text file, call it replace.py, then chmod +x replace.py, then run it):

    #!/usr/bin/python
    import re

    # Match "received: from <" at the start of a line, ignoring case
    r = re.compile('^received: from <', re.I)
    f = open('mesg.txt')
    fo = open('output.txt', 'w')
    while 1:
        # Read one line at a time so memory use stays small
        line = f.readline()
        if not line:
            break
        line = r.sub('>From <', line)
        fo.write(line)
    f.close()
    fo.close()

> also, some messages start with
>
> From:
> Return Path:

I don't know what you want to do with these...

I had a brief look at the webpage you mention below, but I think it would be a nice exercise for you to look at this: (!!)

http://www.amk.ca/python/howto/regex/

*grin*

Please note that with Python, whitespace is significant!

Damon
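
P.S. Here is a slightly more defensive variant, as a sketch only. It assumes that <name> in your mail was a placeholder rather than a literal angle bracket, and that the '>' before 'From' was added by your mail program, so it matches just the 'received: from ' prefix and writes 'From ' by default. If you really do want '>From ', pass that as the replacement. It works on bytes so that a corrupt file cannot trigger text decoding errors. The function name convert is my own invention.

```python
import re

# Assumption: only the "received: from " prefix is matched (any case);
# the rest of the line, whatever <name> turns out to be, is kept as-is.
PATTERN = re.compile(br'^received: from ', re.I)


def convert(src, dst, replacement=b'From '):
    """Copy src to dst one line at a time, rewriting matching prefixes.

    src and dst are binary file objects. Returns the number of lines
    that were rewritten.
    """
    changed = 0
    for line in src:  # iterating a file yields one line at a time
        new_line, n = PATTERN.subn(replacement, line)
        changed += n
        dst.write(new_line)
    return changed
```

To run it on the real file, open mesg.txt with mode 'rb' and output.txt with mode 'wb' and pass both to convert(). It still holds only one line in memory at a time, so the size of the file should not matter much.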