Re: Extracting patterns after matching a regex

MRAB Tue, 08 Sep 2009 07:55:01 -0700

Mart. wrote:

On Sep 8, 3:14 pm, "Andreas Tawn" <[email protected]> wrote:

Hi,
I need to extract a string after a matching a regular expression. For
example I have the string...
s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
and once I match "FTPHOST" I would like to extract
"e4ftl01u.ecs.nasa.gov". I am not sure as to the best approach to the
problem, I had been trying to match the string using something like
this:
m = re.findall(r"FTPHOST", s)
But I couldn't then work out how to return the "e4ftl01u.ecs.nasa.gov"
part. Perhaps I need to find the string and then split it? I had some
help with a similar problem, but now I don't seem to be able to
transfer that to this problem!
Thanks in advance for the help,
Martin

No need for regex.
s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
If "FTPHOST" in s:
    return s[9:]
Cheers,
Drea

Sorry perhaps I didn't make it clear enough, so apologies. I only
presented the example  s = "FTPHOST: e4ftl01u.ecs.nasa.gov" as I
thought this easily encompassed the problem. The solution presented
works fine for this i.e. re.search(r'FTPHOST: (.*)',s).group(1). But
when I used this on the actual file I am trying to parse I realised it
is slightly more complicated as this also pulls out other information,
for example it prints
e4ftl01u.ecs.nasa.gov\r\n', 'FTPDIR: /PullDir/0301872638CySfQB\r\n',
'Ftp Pull Download Links: \r\n', 'ftp://e4ftl01u.ecs.nasa.gov/PullDir/
0301872638CySfQB\r\n', 'Down load ZIP file of packaged order:\r\n',
etc. So I need to find a way to stop it before the \r
slicing the string wouldn't work in this scenario as I can envisage a
situation where the string lenght increases and I would prefer not to
keep having to change the string.

If, as Terry suggested, you do have a tuple of strings and the first element has FTPHOST, 
then s[0].split(":")[1].strip() will work.


It is an email which contains information before and after the main
section I am interested in, namely...

FINISHED: 09/07/2009 08:42:31

MEDIATYPE: FtpPull
MEDIAFORMAT: FILEFORMAT
FTPHOST: e4ftl01u.ecs.nasa.gov
FTPDIR: /PullDir/0301872638CySfQB
Ftp Pull Download Links:
ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB
Down load ZIP file of packaged order:
ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB.zip
FTPEXPR: 09/12/2009 08:42:31
MEDIA 1 of 1
MEDIAID:

I have been doing this to turn the email into a string

email = sys.argv[1]
f = open(email, 'r')
s = str(f.readlines())

To me that seems a strange thing to do. You could just read the entire
file as a string:

    f = open(email, 'r')
    s = f.read()

so FTPHOST isn't the first element, it is just part of a larger
string. When I turn the email into a string it looks like...

'FINISHED: 09/07/2009 08:42:31\r\n', '\r\n', 'MEDIATYPE: FtpPull\r\n',
'MEDIAFORMAT: FILEFORMAT\r\n', 'FTPHOST: e4ftl01u.ecs.nasa.gov\r\n',
'FTPDIR: /PullDir/0301872638CySfQB\r\n', 'Ftp Pull Download Links: \r
\n', 'ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB\r\n', 'Down
load ZIP file of packaged order:\r\n',

So not sure splitting it like you suggested works in this case.


--
http://mail.python.org/mailman/listinfo/python-list

Re: Extracting patterns after matching a regex

Reply via email to