Re: [Tutor] Regex/Raw String confusion
On 08/04/2016 03:27 AM, Alan Gauld via Tutor wrote:
> On 04/08/16 02:54, Jim Byrnes wrote:
>> Is the second example a special case?
>>
>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>
>> I ask because it produces the same results with or without the ' r '.
>
> That's because in this specific case there are no conflicts between the
> regex escape codes and the Python escape codes. In other words, Python
> does not treat '\(' or '\d' as special characters, so it doesn't change
> the string passed to the regex. (It would be a different story if you
> had used, say, a '\x' or '\n' or '\b' in the regex.)
>
> In general you should proceed with caution and assume that there might
> be a Python escape sequence lurking in the regex and use raw just in case.

Ok, thanks again. I understand what is going on now.

Regards, Jim

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Regex/Raw String confusion
On 04/08/16 02:54, Jim Byrnes wrote:
> Is the second example a special case?
>
> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>
> I ask because it produces the same results with or without the ' r '.

That's because in this specific case there are no conflicts between the regex escape codes and the Python escape codes. In other words, Python does not treat '\(' or '\d' as special characters, so it doesn't change the string passed to the regex. (It would be a different story if you had used, say, a '\x' or '\n' or '\b' in the regex.)

In general you should proceed with caution and assume that there might be a Python escape sequence lurking in the regex and use raw just in case.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
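To illustrate the "different story" case mentioned above, here is a minimal sketch using '\b'. The sample text and variable names are made up for illustration. Note also that newer Python 3 versions emit a warning for unrecognised escapes such as '\d' in an ordinary string, which is one more reason to use raw strings.

```python
import re

text = 'A cat sat on the catalog.'

# Raw string: the regex engine receives backslash + 'b',
# which it interprets as a word boundary.
raw_pattern = r'\bcat\b'

# Ordinary string: Python converts '\b' to a single backspace
# character before the regex engine ever sees the pattern.
plain_pattern = '\bcat\b'

print(len(raw_pattern), len(plain_pattern))  # 7 5
print(re.findall(raw_pattern, text))         # ['cat']
print(re.findall(plain_pattern, text))       # []
```

The ordinary-string version silently matches nothing, because the text contains no backspace characters.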
Re: [Tutor] Regex/Raw String confusion
> On Aug 3, 2016, at 20:54, Jim Byrnes wrote:
>
> Is the second example a special case?
>
> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
> mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
> print(mo.group(1))
> print()
> print(mo.group(2))
>
> I ask because it produces the same results with or without the ' r '.

No, it’s not a special case. The backslashes in this case are a way to simplify what could otherwise be very unwieldy. There are several of these character groups (called special sequences in the documentation). For example, \s means any whitespace character, \w means any alphanumeric or underscore, \d means any digit, etc.

You can look them up in the docs:
https://docs.python.org/2/library/re.html

--
David Rock
da...@graniteweb.com
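A small sketch of the three special sequences named above, assuming a made-up sample string (the variable names are illustrative only):

```python
import re

sample = 'Call (415) 555-4242 or text_me at 9am'

digits = re.findall(r'\d+', sample)  # runs of digits
words = re.findall(r'\w+', sample)   # runs of letters, digits, underscore
spaces = re.findall(r'\s', sample)   # individual whitespace characters

print(digits)       # ['415', '555', '4242', '9']
print(words)        # ['Call', '415', '555', '4242', 'or', 'text_me', 'at', '9am']
print(len(spaces))  # 6
```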
Re: [Tutor] Regex/Raw String confusion
On 03/08/16 20:49, Jim Byrnes wrote:
> Regular Expressions he talks about the python escape character being a
> '\' and regex using a lot of backslashes.

In effect there are two levels of escape character: Python and the regex processor. Unfortunately they both use backslash! Python applies its level of escape first, then passes the modified string to the regex engine, which processes the remaining regex escapes. It is confusing, and one reason you should avoid complex regexes if possible.

> by putting an r before the first quote of the string value, you can
> mark the string as a raw string, which does not escape characters.

This avoids Python trying to process the escapes. The raw string is then passed to the regex engine, which will process the backslash escapes that it recognises.

> A couple of pages later he talks about parentheses having special
> meaning in regex and what to do if they are in your text.
>
> In this case, you need to escape the ( and ) characters with a
> backslash. The \( and \) escape characters in the raw string passed to
> re.compile() will match actual parenthesis characters.

These are regex escape characters. If you did not have the r in front, you would double escape them to be safe: \\( and \\)

So by using the raw string you avoid the initial layer of escaping by the Python interpreter and only need to worry about the regex parser, which is more than enough for anyone to worry about!

--
Alan G
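The two layers described above can be sketched as follows. This is only an illustration; the variable names are made up, and the search text is the one from the original question.

```python
import re

# Raw string: backslashes reach the regex engine untouched.
raw = r'\(\d\d\d\)'

# Ordinary string: Python collapses each doubled backslash into
# a single backslash before the regex engine sees the pattern.
doubled = '\\(\\d\\d\\d\\)'

print(raw == doubled)  # True
print(len(raw))        # 10

mo = re.search(raw, 'My phone number is: (415) 555-4242.')
print(mo.group())      # (415)
```

Both spellings hand the regex engine the same ten characters; the raw form is simply easier to read.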
[Tutor] Regex/Raw String confusion
I am reading Automate The Boring Stuff With Python. In the chapter on Regular Expressions he talks about the Python escape character being a '\' and regex using a lot of backslashes. Then he says:

"However, by putting an r before the first quote of the string value, you can mark the string as a raw string, which does not escape characters."

He gives this example:

import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

A couple of pages later he talks about parentheses having special meaning in regex and what to do if they are in your text:

"In this case, you need to escape the ( and ) characters with a backslash. The \( and \) escape characters in the raw string passed to re.compile() will match actual parenthesis characters."

import re

phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
print(mo.group(1))
print()
print(mo.group(2))

Both examples work, but in one place he says raw strings do not escape characters and in the other he says you can escape characters in them. What am I missing here?

Regards, Jim