On Sat, 24 Jan 2009 19:03:26 -0200, Sean Brown gmail.com> <"<sbrown.home"@[spammy]> wrote:

Using python 2.4.4 on OpenSolaris 2008.11

I have a string, obtained by opening a URL, that contains the
following:

td[ct] = [[ ... ]];\r\n

The ... above is what I'm interested in extracting, which is really a
whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
The problem is that Python appears to be escaping the \ in the regex,
because I see this:
reg = '\[\[(.*)\]\];'
reg
'\\[\\[(.*)\\]\\];'

Now to me that looks like it would match the string \[\[ ... \]\];

No. Python's escape character is the backslash \; if you want to include a backslash inside a string, you have to double it. For example, these are all single-character strings: 'a' '\n' '\\'. Coincidentally (or not), the backslash has a similar meaning in a regular expression: if you want a string containing \a (two characters), you should write "\\a". That's rather tedious and error-prone. To help with this, Python allows for "raw string literals", where no escape interpretation is done. Just put an r before the opening quote: r"\(\d+\)" (seven characters; matches numbers inside parentheses).
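A small check of the equivalence described above, written for Python 3 rather than the Python 2.4 of this thread (the variable names are just illustrative): the raw literal and the doubled-backslash literal build the same seven-character string, and that pattern finds numbers inside literal parentheses.

```python
import re

# The raw literal and the "escaped" literal build the same string.
raw = r"\(\d+\)"
escaped = "\\(\\d+\\)"

print(raw == escaped)  # True
print(len(raw))        # 7: \ ( \ d + \ )

# As a regex, it matches digits enclosed in literal parentheses.
print(re.findall(raw, "items (12) and (345), but not 678"))  # ['(12)', '(345)']
```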

Also, note that when you *evaluate* an expression in the interpreter (like the lone "reg" above), it prints the "repr" of the result: for a string, that is the escaped contents surrounded by quotes. (That's very handy when debugging, but may be confusing if you don't know how to interpret it.)
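To see the difference between the repr and the actual contents, here is a minimal sketch (Python 3 syntax; the name `s` is arbitrary). The string holds two characters, but its repr shows the backslash doubled, exactly like the echoed value of "reg" above.

```python
# A two-character string: a backslash followed by '['.
s = "\\["

print(len(s))   # 2
print(repr(s))  # '\\[' : the escaped form the interactive prompt echoes
print(s)        # \[ : the actual contents
```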

Third, Python is very permissive with unrecognized escape sequences: they are kept in the string as-is (backslash included) instead of being flagged as an error. In your case, \[ is not a valid escape sequence, so it is left untouched in the string.
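A sketch of that behaviour in modern Python 3. Newer versions still keep the backslash but emit a warning for unrecognized escapes (DeprecationWarning since 3.6, SyntaxWarning since 3.12), so this example compiles the literal with `eval` inside a `warnings.catch_warnings()` block to keep the demo quiet; that trick is only for illustration, not something you would do in real code.

```python
import warnings

# Compile the literal '\[' at run time so the warning newer Python 3
# versions emit for unrecognized escapes can be silenced for this demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    s = eval(r"'\['")

print(len(s))       # 2: the backslash was kept
print(s == "\\[")   # True
print(len("\n"))    # 1: a recognized escape becomes a single character
```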

py> reg = r'\[\[(.*)\]\];'
py> reg
'\\[\\[(.*)\\]\\];'
py> print reg
\[\[(.*)\]\];
py> len(reg)
13

Which obviously doesn't match anything, because there is no literal \ in
the above string. Leaving the \ out of the \[\[ above makes re.compile
throw an error, because [ is a special regex character, which is why it
needs to be escaped in the first place.
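As an aside on escaping the brackets: instead of writing the backslashes by hand, the literal parts of the pattern can be passed through `re.escape`, which inserts the needed backslashes in front of special characters such as [ and ]. A minimal Python 3 sketch, assuming the same sample line used in the example that follows; the names `prefix`, `suffix` and `pattern` are illustrative.

```python
import re

# Build the pattern from literal pieces; re.escape adds the backslashes
# needed in front of '[' and ']' so they are not treated as regex syntax.
prefix = re.escape("[[")
suffix = re.escape("]];")
pattern = prefix + "(.*)" + suffix

m = re.search(pattern, "td[ct] = [[ more things ]];")
print(repr(m.group(1)))  # ' more things '
```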

It works in this example:

py> txt = """
... Some text
... and td[ct] = [[ more things ]];
... more text"""
py> import re
py> m = re.search(reg, txt)
py> m
<_sre.SRE_Match object at 0x00AC66A0>
py> m.groups()
(' more things ',)

So maybe your regular expression doesn't match the actual text (the final ";"? whitespace?).
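One more possible cause, offered only as an assumption since the reply does not raise it: if the "..." spans several lines, "." does not match a newline unless re.DOTALL is given, and a greedy (.*) runs to the *last* ]]; on the page. The page text below is made up for illustration (Python 3 syntax; the same flag and the non-greedy *? also exist in Python 2.4).

```python
import re

# Made-up page text: the wanted block spans two lines, and a second
# [[ ... ]]; appears later on the page.
page = ("var x = 1;\r\n"
        "td[ct] = [[ first line\r\nsecond line ]];\r\n"
        "more = [[ other ]];\r\n")

# No DOTALL: '.' stops at '\n', so the multi-line block cannot match and
# the search falls through to the later single-line block.
no_dotall = re.search(r"\[\[(.*)\]\];", page).group(1)

# DOTALL with a greedy group: runs to the last ']];' on the page.
greedy = re.search(r"\[\[(.*)\]\];", page, re.DOTALL).group(1)

# DOTALL with a non-greedy group: stops at the first ']];'.
lazy = re.search(r"\[\[(.*?)\]\];", page, re.DOTALL).group(1)

print(repr(no_dotall))  # ' other '
print(repr(greedy))
print(repr(lazy))       # ' first line\r\nsecond line '
```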
For more info, see the Regular Expressions HOWTO at http://docs.python.org/howto/regex.html

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
