Re: escapes in regular expressions

Paul McGuire Sun, 21 May 2006 11:55:49 -0700

"James Thiele" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I was helping a guy at work with regular expressions and found
> something I didn't expect:
>
> >>> re.match('\d', '7').group()
> '7'
> >>> re.match('\\d', '7').group()
> '7'
> >>>
>
> It's not clear to me why these are the same. Could someone please
> explain?
>


This is not a feature of regexps at all, but of Python string literals.  If
a backslash precedes a character that does not form a recognized escape
sequence, Python keeps the backslash in the string as an ordinary character.
Look at this sample from the Python command line:

>>> s = "\d"
>>> s
'\\d'
>>> s = "\t"
>>> s
'\t'
>>>
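
To make the difference visible without relying on how the interpreter echoes
the value, here is a small sketch that lists the characters each literal
actually contains.  The helper name describe() is my own invention for
illustration, and I spell the two-character string as "\\d" rather than "\d"
because more recent Python versions warn about unrecognized escapes:

```python
def describe(s):
    """Return the length of s and the code point of each character in it."""
    return len(s), [ord(c) for c in s]

# "\\d" is two characters: a backslash (92) followed by 'd' (100).
two_chars = describe("\\d")

# "\t" is a recognized escape, so it collapses to one tab character (9).
one_char = describe("\t")

# The raw form r"\t" keeps the backslash: backslash (92) followed by 't' (116).
raw_tab = describe(r"\t")

print(two_chars)
print(one_char)
print(raw_tab)
```

The repr shown at the prompt doubles the backslash only for display; the
string itself still holds a single backslash character.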

This is one reason why Python programmers who use regexps write their
patterns in "raw" notation, r"..." (this is often loosely called a "raw
string", but the resulting string is an ordinary string in every respect;
what is "raw" about it is that backslashes in the literal are kept as they
are instead of starting escape sequences, with the one catch that the literal
cannot end in an odd number of backslashes).  It is painful enough to litter
your regexp with backslashes just because you have the misfortune of having
to match a '.', '+', '?', '*', or brackets or parentheses in your expression,
without also having to double every backslash to get it past the string
parser.  Consider these sample statements:

>>> "\d" == "\\d"
True
>>> "\t" == "\\t"
False
>>> r"\t" == "\\t"
True
>>>
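
Where this really bites in a regexp is when the character after the backslash
*is* a recognized string escape.  The sketch below uses \b, which the re
module treats as a word boundary but which an ordinary string literal turns
into a backspace character before re ever sees it.  The sample text and
variable names are made up for illustration:

```python
import re

text = "the cat sat"

# Ordinary literal: "\b" becomes a backspace (0x08), so this pattern
# searches for backspace-cat-backspace and finds nothing in the text.
plain_match = re.search("\bcat\b", text)

# Raw literal: the backslashes reach the re module intact, so \b is
# interpreted as a word boundary and "cat" is found.
raw_match = re.search(r"\bcat\b", text)

# Matching a literal dot: the raw form and the doubled-backslash form
# build the same pattern, but the raw one is easier to read.
version_raw = re.match(r"\d+\.\d+", "3.14")
version_doubled = re.match("\\d+\\.\\d+", "3.14")

print(plain_match)
print(raw_match.group())
print(version_raw.group(), version_doubled.group())
```

With '\d' you get away with the ordinary literal only because \d means
nothing to the string parser; with '\b' the pattern silently fails to match.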

So your question is really a string question - you just happened to trip
over it while defining a regexp.

-- Paul


