"James Thiele" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > I was helping a guy at work with regular expressions and found > something I didn't expect: > > >>> re.match('\d', '7').group() > '7' > >>> re.match('\\d', '7').group() > '7' > >>> > > It's not clear to me why these are the same. Could someone please > explain? >
This is not a feature of regexps at all, but of Python string literals.
If a backslash precedes a character that does not form a recognized
escape sequence, the backslash is left in the string as an ordinary
backslash. Look at this sample from the Python command line:

>>> s = "\d"
>>> s
'\\d'
>>> s = "\t"
>>> s
'\t'
>>>

This is one reason why Python programmers who use regexps use the "raw"
notation to create strings. (The result is often miscalled a "raw
string", but it is an ordinary string in every respect. What is "raw" is
the literal: backslashes in it are not processed as escapes, with the
one wrinkle that a raw literal cannot end in a single backslash.) It is
painful enough to litter your regexp with backslashes just because you
have the misfortune of having to match a '.', '+', '?', '*', brackets,
or parentheses, without also having to double every backslash for
escaping purposes. Consider these sample statements:

>>> "\d" == "\\d"
True
>>> "\t" == "\\t"
False
>>> r"\t" == "\\t"
True
>>>

So your question is really a string question - you just happened to trip
over it while defining a regexp.

-- Paul
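
The comparisons above can also be checked from a script. What follows is
only a minimal sketch, not anything from the re module itself:
literal_value is a hypothetical helper name made up for this example. It
evaluates the source text of a string literal and shows which characters
end up in the resulting string. The warnings filter is there on the
assumption that the script runs on a recent Python 3, where an
unrecognized escape such as "\d" triggers a warning.

```python
import ast
import re
import warnings


def literal_value(source):
    """Evaluate the source text of a string literal, as the parser would.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings():
        # Recent Python versions warn about unrecognized escapes like \d;
        # silence that here so only the resulting string is shown.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Each entry is the text of a literal, written raw so it appears verbatim.
for source in (r'"\d"', r'"\\d"', r'"\t"', r'"\\t"', r'r"\t"'):
    value = literal_value(source)
    print(f"{source:8} -> {len(value)} chars: {list(value)}")

# Both spellings from the original question produce the same pattern
# string, which is why re.match() behaves identically for them.
digit = re.compile(r"\d")
assert literal_value(r'"\d"') == literal_value(r'"\\d"') == digit.pattern
assert digit.match("7").group() == "7"
```

Writing the pattern as r"\d" in the first place avoids both the doubling
and the warning, which is the usual reason to prefer the raw notation
for regexps.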