Re: [expert] OT: regular expression help
On Thu, 24 Jul 2003 01:28 pm, Brant Fitzsimmons wrote:
> John Haywood wrote:
> > [SNIPPY]
> > I'm in a bit of a bind, trying to recover a corrupt file for a client.
> > The first issue is that the file is 1.8gig, so I'm wondering if I'm
> > going to have problems with
> >
> >     cat file | grep regex > outputfile.txt
>
> Do you need the cat? Someone please tell me if I'm wrong, but shouldn't
> this work:
>
>     grep regex file > outputfile.txt
>
> One less action to get in the way.
>
> > or is there a text editor which could handle that size (I have 512-meg
> > RAM, and I know some editors try to load the whole thing into memory
> > first)
> >
> > Then the real crunch: The following is the expression which is given
> > on a website to do exactly what I need:
> >
> >     [^\received: from .*\r
> >
> > and the second,
> >
> >     [^\t].*\rReceived: from .*\r
> >
> > but they both appear to have syntax errors!!!
>
> What are the exact strings that you are looking for? It may be a little
> bit easier to determine what characters you need in your search pattern
> if we knew that.

[SNIPPY]

Thanks for that, Brant. Let me state as exactly as possible what I am trying to do:

The file in question is a corrupt Microsoft Entourage message file. It is 1.8Gig in size (approx). I need to step through it and convert it to an mbox format file, by searching for patterns such as:

    received: from name
    Received: from name

and replace these with:

    From name

Also, some messages start with

    From:
    Return Path:

Then I need to get rid of the garbage between messages (odd characters, number strings, etc.), then save the whole thing out as a text file.

I'd prefer to use a GUI text editor if possible, as it looks as if I'll still have to screen out some additional gumph manually, according to the author.

Just for completeness, here is the URL of what I am trying to do: http://www.entourage.mvps.org/faq/database.html, under the Database Woes section.

Thanks for any help you may be able to offer.

--
john in sydney
Mandrake Linux 9.1, Kernel version: 2.4.21-0.18mdk
OpenPGP key available on www.keyserver.net
1024D/3E4A902F B38A AB0F 8658 D9E1 4900 3050 08FA D4FA 3E4A 902F

Want to buy your Pack or Services from MandrakeSoft?
Go to http://www.mandrakestore.com
Re: [expert] OT: regular expression help
On Thu, 2003-07-24 at 22:01, John Haywood wrote:
> The file in question is a corrupt Microsoft Entourage message file. It
> is 1.8Gig in size (approx). I need to step through it and convert it to
> an mbox format file, by searching for patterns such as:
>
>     received: from name
>     Received: from name
>
> and replace these with:
>
>     From name

Personally I would do it in Python, but then again that's what I use to code with :-) If you do it correctly (i.e. don't read the whole file into memory all at once), you could do it with very little RAM on a pokey PC.

For the change you mention above, you can use this (paste it into a text file, call it replace.py, then chmod +x replace.py, then run it):

    #!/usr/bin/python
    import re

    # Match "received: from " at the start of a line, in any letter case.
    r = re.compile('^received: from ', re.I)
    f = open('mesg.txt')
    fo = open('output.txt', 'w')
    # Read one line at a time so the whole file is never held in memory.
    while 1:
        line = f.readline()
        if not line:
            break
        line = r.sub('From ', line)
        fo.write(line)
    f.close()
    fo.close()

> also, some messages start with
>
>     From:
>     Return Path:

I don't know what you want to do with these. I had a brief look at the webpage you mention below, but I think it would be a nice exercise for you to look at this: (!!) http://www.amk.ca/python/howto/regex/ *grin*

Please note that with Python, whitespace is significant!

Damon
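The line-by-line idea in this message can be sketched in current Python roughly as follows. This is only a sketch under stated assumptions, not a tested recovery tool: the function name `convert` is made up for illustration, the file is assumed to possibly use bare `\r` line endings (the `\r` in the patterns quoted in the thread hints at that, but the thread does not confirm it), and latin-1 is chosen only because it never fails on stray bytes.

```python
import re

# Same pattern as in the message above: a line starting with
# "received: from " in any letter case.
RECEIVED = re.compile(r'^received: from ', re.IGNORECASE)


def convert(src_path, dst_path):
    """Copy src_path to dst_path one line at a time, rewriting header lines.

    `convert` is a hypothetical helper name used only for this sketch.
    """
    # latin-1 maps every byte to a character, so junk bytes in a corrupt
    # file cannot raise a decoding error and are written back unchanged.
    # newline=None (universal newlines) treats \r, \n and \r\n as line ends.
    with open(src_path, encoding='latin-1', newline=None) as src, \
         open(dst_path, 'w', encoding='latin-1', newline='\n') as dst:
        for line in src:
            dst.write(RECEIVED.sub('From ', line))
```

With the file names from the message it would be called as `convert('mesg.txt', 'output.txt')`. Only one line is held in memory at a time, although a long run of garbage with no line breaks in it would still come through as a single long line.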
Re: [expert] OT: regular expression help
On Thu, 2003-07-24 at 06:01, John Haywood wrote:
> [SNIPPY]
> Thanks for that, Brant. Let me state as exactly as possible what I am
> trying to do:
>
> The file in question is a corrupt Microsoft Entourage message file. It
> is 1.8Gig in size (approx). I need to step through it and convert it to
> an mbox format file, by searching for patterns such as:

It sounds like formail would do much of what you need. It reformats text into mailbox format. The other substitutions could be done with awk/sed.
[expert] OT: regular expression help
Sorry to ask here - it's just that I know someone here can fix me up (shameless grovel!!).

I'm in a bit of a bind, trying to recover a corrupt file for a client. The first issue is that the file is 1.8gig, so I'm wondering if I'm going to have problems with

    cat file | grep regex > outputfile.txt

or is there a text editor which could handle that size (I have 512-meg RAM, and I know some editors try to load the whole thing into memory first)

Then the real crunch: The following is the expression which is given on a website to do exactly what I need:

    [^\received: from .*\r

and the second,

    [^\t].*\rReceived: from .*\r

but they both appear to have syntax errors!!!

Can anyone help fix these up please - it's rather urgent.

Thanks a lot
john
Re: [expert] OT: regular expression help
John Haywood wrote:
> Sorry to ask here - it's just that I know someone here can fix me up
> (shameless grovel!!).
>
> I'm in a bit of a bind, trying to recover a corrupt file for a client.
> The first issue is that the file is 1.8gig, so I'm wondering if I'm
> going to have problems with
>
>     cat file | grep regex > outputfile.txt

Do you need the cat? Someone please tell me if I'm wrong, but shouldn't this work:

    grep regex file > outputfile.txt

One less action to get in the way.

> or is there a text editor which could handle that size (I have 512-meg
> RAM, and I know some editors try to load the whole thing into memory
> first)
>
> Then the real crunch: The following is the expression which is given on
> a website to do exactly what I need:
>
>     [^\received: from .*\r
>
> and the second,
>
>     [^\t].*\rReceived: from .*\r
>
> but they both appear to have syntax errors!!!

What are the exact strings that you are looking for? It may be a little bit easier to determine what characters you need in your search pattern if we knew that.

> Can anyone help fix these up please - it's rather urgent.
>
> Thanks a lot
> john

--
Brant Fitzsimmons
[EMAIL PROTECTED]

Liberty means responsibility. That is why most men dread it.
-George Bernard Shaw, Man and Superman (1903), Maxims for Revolutionists
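On the "syntax errors" question, one way to see which of the two quoted patterns is malformed is to ask a regex engine to compile them. The sketch below uses Python's `re` module and assumes its syntax is close enough to whatever dialect the website intended, which the thread does not state. The helper name `compiles` is hypothetical.

```python
import re

# The two patterns exactly as quoted in the thread.
FIRST = r'[^\received: from .*\r'
SECOND = r'[^\t].*\rReceived: from .*\r'


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


for name, pattern in (('first', FIRST), ('second', SECOND)):
    print(name, 'compiles' if compiles(pattern) else 'does not compile')
```

The first pattern opens a bracket expression with `[^` and never closes it, so Python rejects it; the second is accepted. What the website originally had inside the brackets of the first pattern cannot be recovered from the thread.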