On Sat, 24 Jan 2009 19:03:26 -0200, Sean Brown
<sbrown.home@[spammy] gmail.com> wrote:
Using python 2.4.4 on OpenSolaris 2008.11
I have the following string created by opening a url that has the
following string in it:
td[ct] = [[ ... ]];\r\n
The ... above is what I'm interested in extracting which is really a
whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
The problem is it appears that python is escaping the \ in the regex
because I see this:
reg = '\[\[(.*)\]\];'
reg
'\\[\\[(.*)\\]\\];'
To me, that looks like it would match the string \[\[ ... \]\];
No. Python's escape character is the backslash \; if you want to include a
backslash inside a string, you have to double it. For example, these are
all single-character strings: 'a' '\n' '\\'
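
A quick way to check this is to ask for the length of each literal. This is
only a small sketch; the name `samples` is made up for the illustration.

```python
# Three literals, spelled differently in the source, one character each.
samples = ['a', '\n', '\\']
lengths = [len(s) for s in samples]
print(lengths)        # [1, 1, 1]

# The doubled backslash in the source is a single backslash in memory.
print(ord('\\'))      # 92, the character code of the backslash
print(ord('\n'))      # 10, the character code of the line feed
```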
Coincidentally (or not), the backslash has a similar meaning in a regular
expression, so both levels of escaping apply: if you want a string
containing \a (two characters) you have to write "\\a".
That's rather tedious and error-prone. To help with this, Python allows
for "raw string literals", where no escape interpretation is done. Just
put an r before the opening quote: r"\(\d+\)" (seven characters; matches
a number inside parentheses).
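
As a rough illustration, the raw literal and the doubled-backslash literal
produce exactly the same string; only the spelling in the source differs.
The sample text passed to re.search is invented for this sketch.

```python
import re

raw = r"\(\d+\)"          # raw literal: backslashes are kept as typed
escaped = "\\(\\d+\\)"    # ordinary literal: every backslash doubled

print(raw == escaped)     # True, both are the same string
print(len(raw))           # 7

# Made-up sample text; the pattern finds the number in parentheses.
m = re.search(raw, "see item (42) in the list")
print(m.group())          # (42)
```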
Also, note that when you *evaluate* an expression in the interpreter (like
the lone "reg" above), it prints the "repr" of the result: for a string,
that is the escaped contents surrounded by quotes. (That's very handy when
debugging, but may be confusing if you don't know how to interpret it.)
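
To see the difference between the repr and the real contents, here is a
small sketch with a made-up string `s` that holds one backslash and one
newline:

```python
s = 'one\\two\n'   # o n e \ t w o <newline> = 8 characters

print(len(s))      # 8
print(repr(s))     # 'one\\two\n'  (what the interactive prompt would echo)
print(s)           # one\two       (the actual characters, then a line break)
```

The repr shows the string the way you would have to type it as a literal,
which is why the backslash appears doubled there.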
Third, Python is very permissive with wrong escape sequences: they just
end up in the string instead of being flagged as an error. In your case,
\[ is an invalid escape sequence, which is left untouched in the string,
so your non-raw literal already contained the backslashes you wanted.
py> reg = r'\[\[(.*)\]\];'
py> reg
'\\[\\[(.*)\\]\\];'
py> print reg
\[\[(.*)\]\];
py> len(reg)
13
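
The permissive handling of \[ can be checked directly. This is a sketch, not
something to copy into real code: it uses eval() on the text of a non-raw
literal only so that the warning newer Python 3 releases print for
unrecognized escapes can be silenced. The names `literal_source` and `sloppy`
are made up for the illustration.

```python
import warnings

# Source text of a NON-raw literal containing the unrecognized escape \[
literal_source = r"'\[\[(.*)\]\];'"

with warnings.catch_warnings():
    # Recent Python 3 versions warn about unrecognized escapes; silence
    # the warning here so that only the resulting string is shown.
    warnings.simplefilter("ignore")
    sloppy = eval(literal_source)

raw = r'\[\[(.*)\]\];'
print(sloppy == raw)   # True: the backslashes were left in the string
print(len(sloppy))     # 13
```

Since later Python versions complain about such literals, the raw form is the
safer habit for regular expressions.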
Which obviously doesn't match anything because there are no literal \ in
the above string. Leaving the \ out of the \[\[ above has re.compile
throw an error because [ is a special regex character. Which is why it
needs to be escaped in the first place.
It works in this example:
py> txt = """
... Some text
... and td[ct] = [[ more things ]];
... more text"""
py> import re
py> m = re.search(reg, txt)
py> m
<_sre.SRE_Match object at 0x00AC66A0>
py> m.groups()
(' more things ',)
So maybe your regex doesn't match the actual text (the final ";"? whitespace?)
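
One more thing worth checking, as a guess about your data rather than
something I can see from here: by default "." does not match a newline, so
if the text between [[ and ]] spans several lines the pattern will fail
unless you pass re.DOTALL. A sketch with a made-up two-line sample:

```python
import re

# Made-up sample: the part between [[ and ]] spans two lines.
txt = "td[ct] = [[ first line\r\nsecond line ]];\r\n"

reg = r'\[\[(.*)\]\];'

print(re.search(reg, txt))              # None: '.' stops at the '\n'

m = re.search(reg, txt, re.DOTALL)      # let '.' match line breaks too
print(repr(m.group(1)))                 # ' first line\r\nsecond line '
```

If the page contains several [[ ... ]]; blocks, the non-greedy form (.*?)
may be what you want, since (.*) runs up to the last ]]; it can find.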
For more info, see the Regular Expressions HOWTO at
http://docs.python.org/howto/regex.html
--
Gabriel Genellina