Bugs item #1518406, was opened at 2006-07-06 21:26
Message generated for change (Comment added) made by niemeyer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1518406&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Regular Expressions
Group: Python 2.4
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: ollie oldham (ooldham)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: re '\' char interpretation problem

Initial Comment:
I've run across 2 problems having to do with the '\' character in the re module. Problem 1 does not match the re when it should have. Problem 2 matches when it should not have.

There is a short snippet of code attached that shows the problems I'm having, and the output as it occurs on my machine.

I'm running on Windows 2000.
Python versions: 2.4b1 and 2.4.3c1 both act the same way.

Problem 1): why does * work and not + ?

    import re
    rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
    rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
    path1 = r'D:\Logs'
    print rex.match(path1)  # Matches - as it should have.
    print rey.match(path1)  # FAILS to match - should have.

Problem 2): match occurs on nonUncPath when it should not

    import re
    uncPath = r'\\someUNC\path'
    nonUncPath = r'\nonUnc\path'
    rew = re.compile('\\\\.+', re.IGNORECASE)
    print rew.match(uncPath)     # works as it should.
    print rew.match(nonUncPath)  # matches and it should NOT.

----------------------------------------------------------------------

>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2006-07-06 22:55

Message:
Logged In: YES
user_id=7887

Please, use a single way to report issues. Do not message *and* add a comment to the bug.

I think you're missing the behavior of r'' in Python. It changes the way the Python interpreter parses the string, not the way the regular expression compiler/interpreter works. r'\.' is precisely the same as '\\.', and both of them really describe the string |\.|.

>>> r'\.' == '\\.'
True
>>> print r'\.'
\.

Escaping a dot means a real dot.
Please have a look at the re module documentation and perhaps some general regular expression info for more details.

----------------------------------------------------------------------

Comment By: ollie oldham (ooldham)
Date: 2006-07-06 22:46

Message:
Logged In: YES
user_id=649833

I beg to differ on problem 1):
Since 'r' was used in the definition of both the re and the path, the '.' char is not being escaped (not supposed to be, anyway). And even if it is, then

    rex = re.compile('[a-z]:\\.+', re.IGNORECASE)

should get me what I want (in textual form: char a-z, colon, backslash, with 1 or more trailing chars). But that does not work either.

I beg to differ on item 2) as well:
Yes, '\\\\.+' is the equivalent of r'\\.+', BUT I then read this as: 2 backslashes with 1 or more chars, NOT backslash with escaped '.'.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2006-07-06 21:36

Message:
Logged In: YES
user_id=7887

1) r'[a-z]:\.+' should not match r'D:\Logs'. r'\.+' matches one or more dots. There's no dot in this string.

2) '\\\\.+' is the equivalent of r'\\.+', and should match anything that starts with a '\' and has at least one char following it, which includes r'\nonUnc\path'.

----------------------------------------------------------------------
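
The behaviour described in the comments above can be checked with a short script. This is only a sketch, written in Python 3 syntax rather than the Python 2.4 used in the report. The pattern names (star, plus, drive, one_slash, two_slashes) and the two "intended" patterns are assumptions added for illustration and are not part of the original submission.

```python
import re

path = r'D:\Logs'
unc_path = r'\\someUNC\path'
non_unc_path = r'\nonUnc\path'

# The r'' prefix only changes how Python parses the literal:
# both spellings produce the same two-character string, \.
same_string = (r'\.' == '\\.')

# Problem 1: in the regex, \. is an escaped (literal) dot.
star = re.compile(r'[a-z]:\.*', re.IGNORECASE)   # zero or more literal dots
plus = re.compile(r'[a-z]:\.+', re.IGNORECASE)   # one or more literal dots
star_match = star.match(path)    # matches only 'D:' (zero dots)
plus_match = plus.match(path)    # None: no dot follows the colon

# Assumed intent: drive letter, colon, backslash, then one or more chars.
# The regex needs \\ for one literal backslash, so the raw string has two.
drive = re.compile(r'[a-z]:\\.+', re.IGNORECASE)
drive_match = drive.match(path)

# Problem 2: '\\\\.+' is the string \\.+ which means
# one literal backslash followed by one or more arbitrary chars.
one_slash = re.compile('\\\\.+')
# Assumed intent: two literal backslashes, so four in a raw string.
two_slashes = re.compile(r'\\\\.+')

print(same_string)
print(star_match.group(), plus_match)
print(drive_match.group())
print(bool(one_slash.match(unc_path)), bool(one_slash.match(non_unc_path)))
print(bool(two_slashes.match(unc_path)), bool(two_slashes.match(non_unc_path)))
```

Two levels of escaping are involved: the Python string literal and then the regular expression itself. Writing patterns as raw strings removes the first level, so each literal backslash to be matched is written as \\ in the pattern.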