regular expressions and dollar sign ($) in source string

2009-04-23 Thread Jax
Hi

I encountered a problem with the dollar sign in a source string. It seems
that $ requires special treatment. Below is a copy of a session with Python's
interactive shell:

Python 2.5.2 (r252:60911, Jan  8 2009, 12:17:37)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> a = unicode(r"(instead of $399.99)", "utf8")
>>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"),
...     a).group(1)
9.99
>>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
9.99
>>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
399.99

My question is: Why is only the third regular expression correct?

Please help! It boggles my mind.

Jax
--
http://mail.python.org/mailman/listinfo/python-list


Re: regular expressions and dollar sign ($) in source string

2009-04-23 Thread Peter Otten
Jax wrote:

> I encountered a problem with the dollar sign in a source string. It seems
> that $ requires special treatment. Below is a copy of a session with
> Python's interactive shell:
>
> Python 2.5.2 (r252:60911, Jan  8 2009, 12:17:37)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> a = unicode(r"(instead of $399.99)", "utf8")
> >>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"),
> ...     a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 399.99
>
> My question is: Why is only the third regular expression correct?

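They are all correct; they just don't give what you expect. This has nothing
to do with the $. The .* expression is greedy: it tries to match as many
characters as possible, and gives characters back only until the rest of the
pattern can match. Since \d+ is satisfied by a single digit, .* keeps the
"39" as well. You can see that by adding another group: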

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of(.*)(\d+[.]\d+)\)$", a).groups()
(u' $39', u'9.99')
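
For readers on Python 3, the same experiment can be sketched as follows. This assumes Python 3, where every str is already Unicode, so the unicode() call and the u/ur prefixes from the session above are dropped. The variable names are only illustrative.

```python
import re

text = "(instead of $399.99)"

# Greedy: .* first swallows the rest of the string, then backs off one
# character at a time until (\d+[.]\d+)\)$ can match what is left over.
greedy = re.search(r"^\(instead of(.*)(\d+[.]\d+)\)$", text)

# The first group shows what .* kept; the second shows what was left
# for the number.
print(greedy.groups())  # (' $39', '9.99')
```

The first group makes the "missing" digits visible: they were not lost, they were consumed by .* because a single digit is enough for \d+.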

Fortunately there is also a non-greedy variant .*? which matches as few
characters as possible:

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of.*?(\d+[.]\d+)\)$", a).group(1)
u'399.99'
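
As a rough Python 3 sketch (an assumption; the session above is Python 2.5), the non-greedy pattern and the third pattern from the original question can be compared side by side. The names LAZY, ANCHORED and first_group are hypothetical and exist only for this illustration.

```python
import re

TEXT = "(instead of $399.99)"

# Hypothetical pattern names, chosen for this illustration only.
LAZY = r"^\(instead of.*?(\d+[.]\d+)\)$"  # .*? grows only as far as it must
ANCHORED = r"^\(.*\$(\d+[.]\d+)\)$"       # the literal \$ pins the number's start


def first_group(pattern, text):
    """Return group 1 of the match, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(first_group(LAZY, TEXT))      # 399.99
print(first_group(ANCHORED, TEXT))  # 399.99
```

Both approaches return the full number here. The None check is a design choice: re.search returns None when nothing matches, and calling .group(1) on None raises AttributeError.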

Peter