Re: [Tutor] Regex/Raw String confusion

2016-08-04 Thread Jim Byrnes

On 08/04/2016 03:27 AM, Alan Gauld via Tutor wrote:
> On 04/08/16 02:54, Jim Byrnes wrote:
>
>> Is the second example a special case?
>>
>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>
>> I ask because it produces the same results with or without the ' r '.
>
> That's because in this specific case there are no conflicts between
> the regex escape codes and the Python escape codes. In other
> words Python does not treat '\(' or '\d' as special characters
> so it doesn't change the string passed to the regex.
> (It would be a different story if you had used, say, a
> '\x' or '\n' or '\b' in the regex.)
>
> In general you should proceed with caution and assume that
> there might be a Python escape sequence lurking in the regex
> and use raw just in case.



Ok, thanks again.  I understand what is going on now.

Regards,  Jim

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Regex/Raw String confusion

2016-08-04 Thread Alan Gauld via Tutor
On 04/08/16 02:54, Jim Byrnes wrote:

> Is the second example a special case?
> 
> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
> 
> I ask because it produces the same results with or without the ' r '.

That's because in this specific case there are no conflicts between
the regex escape codes and the Python escape codes. In other
words Python does not treat '\(' or '\d' as special characters
so it doesn't change the string passed to the regex.
(It would be a different story if you had used, say, a
'\x' or '\n' or '\b' in the regex.)

In general you should proceed with caution and assume that
there might be a Python escape sequence lurking in the regex
and use raw just in case.
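
A minimal sketch of the kind of conflict described above, using '\b'
(the sample text and variable names are made up for illustration). In a
normal string Python turns '\b' into a backspace character before the
regex engine ever sees it; in a raw string the backslash survives and
the regex engine reads it as a word boundary.

```python
import re

# Normal string: Python converts each '\b' to one backspace character (0x08).
plain = '\bcat\b'
# Raw string: backslash and 'b' both survive, so the regex engine sees \b.
raw = r'\bcat\b'

print(len(plain), len(raw))            # 5 7

text = 'a cat sat'                     # made-up sample text
plain_match = re.search(plain, text)   # looks for backspace, 'cat', backspace
raw_match = re.search(raw, text)       # looks for the whole word 'cat'

print(plain_match)                     # None
print(raw_match.group())               # cat
```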

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Regex/Raw String confusion

2016-08-03 Thread David Rock

> On Aug 3, 2016, at 20:54, Jim Byrnes wrote:
> 
> Is the second example a special case?
> 
> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
> mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
> print(mo.group(1))
> print()
> print(mo.group(2))
> 
> I ask because it produces the same results with or without the ' r '.

No, it’s not a special case.  The backslashes in this case are a way to
simplify what could otherwise be very unwieldy.  There are several of these
character groups (called special sequences in the documentation).  For example,
\s means any whitespace character, \w means any alphanumeric or underscore,
\d means any digit, etc.

You can look them up in the docs:
https://docs.python.org/2/library/re.html
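
As a rough illustration of those three special sequences (the sample
text and variable names are invented for this sketch):

```python
import re

text = 'Room 42, Floor 7'            # made-up sample text

digits = re.findall(r'\d+', text)    # runs of digits
words = re.findall(r'\w+', text)     # runs of letters, digits or underscore
pieces = re.split(r'\s+', text)      # split on runs of whitespace

print(digits)    # ['42', '7']
print(words)     # ['Room', '42', 'Floor', '7']
print(pieces)    # ['Room', '42,', 'Floor', '7']
```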


-- 
David Rock
da...@graniteweb.com






Re: [Tutor] Regex/Raw String confusion

2016-08-03 Thread Alan Gauld via Tutor
On 03/08/16 20:49, Jim Byrnes wrote:

> Regular Expressions he talks about the python escape character being a 
> '\' and regex using a lot of backslashes. 

In effect there are two levels of escape processing: Python's and
the regex engine's. Unfortunately they both use backslash!
Python applies its level of escape first then passes the
modified string to the regex engine which processes the
remaining regex escapes. It is confusing and one reason
you should avoid complex regexes if possible.
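
The two levels can be sketched with the classic case of matching one
literal backslash. The sample text and the names below are made up for
illustration only.

```python
import re

text = 'C:\\temp'          # made-up sample; the actual text is C:\temp

# Normal string: Python reduces the four backslashes to two, and the regex
# engine then reads those two as "match one literal backslash".
normal_pattern = '\\\\'
# Raw string: Python leaves it alone, so two backslashes are enough.
raw_pattern = r'\\'

same = normal_pattern == raw_pattern
position = re.search(raw_pattern, text).start()

print(same)                  # True
print(len(normal_pattern))   # 2
print(position)              # 2
```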

> by putting an r before the first quote of the string value, you can 
> mark the string as a raw string, which does not escape characters.

This stops Python from trying to process the escapes.
The raw string is then passed to the regex which will
process the backslash escapes that it recognises.

> A couple of pages later he talks about parentheses having special 
> meaning in regex and what to do if they are in your text.
> 
> In this case, you need to escape the ( and )  characters with a 
> backslash. The \( and \) escape characters in the raw string passed to 
> re.compile() will match actual parenthesis characters.

These are regex escape characters. If you did not have the r in front
you would have to double the backslashes to be safe:

\\( and \\)

So by using the raw string you avoid the initial layer of
escaping by the python interpreter and only need to worry
about the regex parser - which is more than enough for anyone
to worry about!
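
A small sketch of that equivalence, reusing the phone-number text from
the original question (the variable names are invented here): the
doubled form in a normal string and the single form in a raw string
give the regex engine exactly the same pattern.

```python
import re

doubled = '\\(\\d\\d\\d\\)'     # normal string, every backslash doubled
raw = r'\(\d\d\d\)'             # raw string, backslashes written once

identical = doubled == raw
print(identical)                # True

mo = re.search(raw, 'My phone number is: (415) 555-4242.')
print(mo.group())               # (415)
```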

-- 
Alan G




[Tutor] Regex/Raw String confusion

2016-08-03 Thread Jim Byrnes
I am reading Automate The Boring Stuff With Python.  In the chapter on 
Regular Expressions he talks about the Python escape character being a 
'\' and regex using a lot of backslashes. Then he says, "However, 
by putting an r before the first quote of the string value, you can 
mark the string as a raw string, which does not escape characters."


He gives this example:

import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())


A couple of pages later he talks about parentheses having special 
meaning in regex and what to do if they are in your text.


"In this case, you need to escape the ( and ) characters with a 
backslash. The \( and \) escape characters in the raw string passed to 
re.compile() will match actual parenthesis characters."


import re

phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is: (415) 555-4242.')
print(mo.group(1))
print()
print(mo.group(2))

Both examples work, but in one place he says you can't escape raw strings 
and in the other he says you can.  What am I missing here?


Regards, Jim
