Re: RegEx issues

Gabriel Genellina Sat, 24 Jan 2009 16:47:43 -0800

On Sat, 24 Jan 2009 19:03:26 -0200, Sean Brown <sbrown.home@[spammy]gmail.com> wrote:

Using python 2.4.4 on OpenSolaris 2008.11


I have the following string created by opening a url that has the
following string in it:

td[ct] = [[ ... ]];\r\n

The ...  above is what I'm interested in extracting which is really a
whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
The problem is it appears that python is escaping the \ in the regex
because I see this:

reg = '\[\[(.*)\]\];'
reg

'\\[\\[(.*)\\]\\];'

Now to me that looks like it would match the string - \[\[ ... \]\];

No. Python's escape character is the backslash \; if you want to include a backslash inside a string, you have to double it. For example, these are all single-character strings: 'a' '\n' '\\'

Coincidentally (or not), the backslash has a similar meaning in a regular expression: if you want a string containing \a (two characters) you should write "\\a".

That's rather tedious and error prone. To help with this, Python allows for "raw-string literals", where no escape interpretation is done. Just put an r before the opening quote: r"\(\d+\)" (seven characters; matches numbers inside parentheses).
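A minimal sketch of the points above, written so that it also runs on current Python 3. The sample sentence passed to re.search is made up for illustration; everything else comes from the text.

```python
import re

# Each of these literals is a single character.
single_chars = ['a', '\n', '\\']
lengths = [len(s) for s in single_chars]

# Two spellings of the same two-character string: a backslash followed by 'a'.
doubled = "\\a"
raw = r"\a"

# The raw-string pattern from the text: seven characters long.
pattern = r"\(\d+\)"

# Hypothetical sample text, only here to show the pattern matching.
match = re.search(pattern, "call foo(123) now")
```

Here doubled == raw is true, and match.group() gives back the parenthesized number including the parentheses.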

Also, note that when you *evaluate* an expression in the interpreter (like the lone "reg" above), it prints the "repr" of the result: for a string, it is the escaped contents surrounded by quotes. (That's very handy when debugging, but may be confusing if you don't know how to interpret it.)

Third, Python is very permissive with wrong escape sequences: they just end up in the string instead of being flagged as an error. In your case, \[ is an invalid escape sequence, which is left untouched in the string.
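A small sketch of that permissive behaviour. It goes through eval() with warnings silenced only because newer Python 3 releases emit a warning for unknown escapes such as \[; that wrapper is my own workaround for the demonstration, not something the original code needs on 2.4.

```python
import warnings

# Evaluate the literal '\[' (written without the r prefix) at run time.
# The raw string below only holds the source text of that literal.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # newer Pythons warn about '\['
    lenient = eval(r"'\['")

# The unknown escape is kept as-is: a backslash followed by '['.
same_as_doubled = (lenient == '\\[')
same_as_raw = (lenient == r'\[')
```

All three spellings give the same two-character string, which is why the original non-raw pattern happened to work. Relying on that is fragile, so the raw-string form is the safer habit.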


py> reg = r'\[\[(.*)\]\];'
py> reg
'\\[\\[(.*)\\]\\];'
py> print reg
\[\[(.*)\]\];
py> len(reg)
13

Which obviously doesn't match anything because there are no literal \ in
the above string. Leaving the \ out of the \[\[ above has re.compile
throw an error because [ is a special regex character. Which is why it
needs to be escaped in the first place.


It works in this example:

py> txt = """
... Some text
... and td[ct] = [[ more things ]];
... more text"""
py> import re
py> m = re.search(reg, txt)
py> m
<_sre.SRE_Match object at 0x00AC66A0>
py> m.groups()
(' more things ',)

So maybe your r.e. doesn't match the text (the final ";"? whitespace?)
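One possible cause, offered only as a guess since I haven't seen the real page: if the text between [[ and ]] spans several lines, '.' will not cross the newlines unless re.DOTALL is given. A sketch with made-up page text:

```python
import re

reg = r'\[\[(.*)\]\];'

# Hypothetical page text: the bracketed payload spans two lines.
page = "var x;\r\ntd[ct] = [[ first line\r\nsecond line ]];\r\n"

# By default '.' matches anything except '\n', so this search fails.
plain = re.search(reg, page)

# With re.DOTALL, '.' matches newlines too and the group spans both lines.
dotall = re.search(reg, page, re.DOTALL)
payload = dotall.group(1)
```

Note that (.*) is greedy: if the page contains more than one "]];", the group runs up to the last one. Using (.*?) instead stops at the first.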

For more info, see the Regular Expressions HOWTO at http://docs.python.org/howto/regex.html


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
