Re: [expert] OT: regular expression help

2003-07-24 Thread John Haywood
On Thu, 24 Jul 2003 01:28 pm, Brant Fitzsimmons wrote:
 John Haywood wrote:
[SNIPPY]
 I'm in a bit of a bind, trying to recover a corrupt file for a client.
 The first issue is that the file is 1.8gig, so I'm wondering if I'm going
  to have problems with cat file|grep regex > outputfile.txt

 Do you need the cat?

 Someone please tell me if I'm wrong, but shouldn't this work:

 grep regex file > outputfile.txt

 One less action to get in the way.

 or is there a text editor which could handle that size (I have 512-meg
  RAM, and I know some editors try to load the whole thing into memory
  first)
 
 Then the real crunch:
 
 The following is the expression which is given on a website to do exactly
  what I need:
 
 [^\received: from .*\r
 
 and the second,
 
 [^\t].*\rReceived: from .*\r
 
 
 but they both appear to have syntax errors!!!

 What are the exact strings that you are looking for?  It may be a little
 bit easier to determine what characters you need in your search pattern
 if we knew that.

[SNIPPY]

Thanks for that, Brant. Let me state as exactly as possible what I am trying 
to do:

The file in question is a corrupt Microsoft Entourage message file. It is 
1.8Gig in size (approx). I need to step through it and convert it to an mbox 
format file, by searching for patterns such as :

received: from name
Received: from name

and replace these with:

From name 

also, some messages start with

From:
Return Path:

Then I need to get rid of the garbage between messages (odd characters,
number strings, etc.).

Then save the whole thing out as a text file.


I'd prefer to use a GUI text editor if possible, as it looks as if I'll still
have to screen out some additional gumph manually, according to the author.

Just for completeness, here is the URL of what I am trying to do:

http://www.entourage.mvps.org/faq/database.html, under Database Woes section

Thanks for any help you may be able to offer
-- 
 john in sydney
 Mandrake Linux 9.1, Kernel version: 2.4.21-0.18mdk
 OpenPGP key available on www.keyserver.net
 1024D/3E4A902F B38A AB0F 8658 D9E1 4900 3050 08FA D4FA 3E4A 902F


Want to buy your Pack or Services from MandrakeSoft? 
Go to http://www.mandrakestore.com


Re: [expert] OT: regular expression help

2003-07-24 Thread Damon Lynch
On Thu, 2003-07-24 at 22:01, John Haywood wrote:

 The file in question is a corrupt Microsoft Entourage message file. It is 
 1.8Gig in size (approx). I need to step through it and convert it to an mbox 
 format file, by searching for patterns such as :
 
 received: from name
 Received: from name
 
 and replace these with:
 
 From name 

Personally I would do it in Python, but then again that's what I use to
code with :-)  If you do it correctly (i.e. don't open the whole file
all at once), you could do it with very little RAM on a pokey PC.

For the change you mention above, you can use this (paste it into a text
file, call it replace.py, then chmod +x replace.py, then run it):

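#!/usr/bin/python
import re

# Match "received: from " at the start of a line, in any letter case
r = re.compile('^received: from ', re.I)

f = open('mesg.txt')
fo = open('output.txt', 'w')

# Read one line at a time so the whole file is never held in memory
while 1:
    line = f.readline()
    if not line: break
    line = r.sub('From ', line)
    fo.write(line)

f.close()
fo.close()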

 
 also, some messages start with
 
 From:
 Return Path:

I don't know what you want to do with these. I had a brief look at
the webpage you mention below, but I think it would be a nice exercise
for you to look at this: (!!)

http://www.amk.ca/python/howto/regex/

*grin*

Please note that with Python, whitespace is significant!
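
Since the file is corrupt and may contain bytes that are not valid text, here is a rough variant of the same idea that reads and writes bytes instead. It is only a sketch: the function name convert and the file names are made up for illustration, and it assumes lines end in \n. If the file turns out to use bare \r line endings, the loop would not split it into lines and would need a different reader.

```python
import re

# Bytes pattern: a line starting with "received: from ", in any letter case.
RECEIVED = re.compile(rb'^received: from ', re.IGNORECASE)


def convert(src_path, dst_path):
    """Stream src_path to dst_path, rewriting 'received: from ' line starts.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:  # one line at a time, never the whole file
            new_line, n = RECEIVED.subn(b'From ', line)
            changed += n
            dst.write(new_line)
    return changed


if __name__ == '__main__':
    # Hypothetical file names, matching the script above.
    print(convert('mesg.txt', 'output.txt'))
```

Working in bytes means odd characters pass through untouched rather than raising a decoding error, so the cleanup of garbage can be left as a separate step.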

Damon





Re: [expert] OT: regular expression help

2003-07-24 Thread Kwan Lowe
On Thu, 2003-07-24 at 06:01, John Haywood wrote:

 [SNIPPY]
 
 Thanks for that, Brant. Let me state as exactly as possible what I am trying 
 to do:
 
 The file in question is a corrupt Microsoft Entourage message file. It is 
 1.8Gig in size (approx). I need to step through it and convert it to an mbox 
 format file, by searching for patterns such as :

It sounds like formail would do much of what you need. It reformats
text into mailbox format.

The other substitutions could be done with awk/sed. 
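
If you would rather keep everything in one Python script than switch to sed, the same kind of substitution might look roughly like the sketch below. It is in Python rather than awk/sed, and it rests on a guess: that the "odd characters" are control bytes other than tab, line feed and carriage return. The names CONTROL_BYTES and strip_control_bytes are made up for illustration, and number strings between messages are not handled.

```python
import re

# Assumption: "garbage" means control bytes other than tab (0x09),
# line feed (0x0a) and carriage return (0x0d).
CONTROL_BYTES = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')


def strip_control_bytes(line):
    """Return line (a bytes object) with the control bytes removed."""
    return CONTROL_BYTES.sub(b'', line)
```

Applied to each line as it is read, this keeps memory use flat, the same as a sed one-liner would.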




[expert] OT: regular expression help

2003-07-23 Thread John Haywood
Sorry to ask here - it's just that I know someone here can fix me up (shameless 
grovel!!).

I'm in a bit of a bind, trying to recover a corrupt file for a client. 
The first issue is that the file is 1.8gig, so I'm wondering if I'm going to have 
problems with cat file|grep regex > outputfile.txt

or is there a text editor which could handle that size (I have 512-meg RAM, and I know 
some editors try to load the whole thing into memory first)

Then the real crunch:

The following is the expression which is given on a website to do exactly what I need:

[^\received: from .*\r

and the second,

[^\t].*\rReceived: from .*\r


but they both appear to have syntax errors!!!

Can anyone help fix these up please - it's rather urgent.

Thanks a lot

john



Re: [expert] OT: regular expression help

2003-07-23 Thread Brant Fitzsimmons
John Haywood wrote:

 Sorry to ask here - it's just that I know someone here can fix me up (shameless grovel!!).

 I'm in a bit of a bind, trying to recover a corrupt file for a client.
 The first issue is that the file is 1.8gig, so I'm wondering if I'm going to have problems with cat file|grep regex > outputfile.txt

Do you need the cat?

Someone please tell me if I'm wrong, but shouldn't this work:

grep regex file > outputfile.txt

One less action to get in the way.

 or is there a text editor which could handle that size (I have 512-meg RAM, and I know some editors try to load the whole thing into memory first)

 Then the real crunch:

 The following is the expression which is given on a website to do exactly what I need:

 [^\received: from .*\r

 and the second,

 [^\t].*\rReceived: from .*\r

 but they both appear to have syntax errors!!!

What are the exact strings that you are looking for?  It may be a little 
bit easier to determine what characters you need in your search pattern 
if we knew that.
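
In the meantime, one quick way to see which of the two expressions is malformed is to hand them to a regex compiler. The sketch below uses Python's re module, which is only one regex dialect and may well differ from the tool the website had in mind, so treat the result as a hint. The helper name check is made up for illustration.

```python
import re

# The two patterns exactly as quoted from the website.
patterns = [
    r'[^\received: from .*\r',
    r'[^\t].*\rReceived: from .*\r',
]


def check(pattern):
    """Return None if the pattern compiles, otherwise the error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


for p in patterns:
    print(repr(p), '->', check(p) or 'compiles')
```

Under Python's rules the first pattern is rejected because the bracket opened by [^ is never closed, while the second compiles. That says nothing about whether grep reads \t and \r the same way, so the second one may still not match what you expect there.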

 Can anyone help fix these up please - it's rather urgent.

 Thanks a lot

 john

--
Brant Fitzsimmons
[EMAIL PROTECTED]

Liberty means responsibility. That is why most men dread it.
-George Bernard Shaw, Man and Superman (1903)
Maxims for Revolutionists


