Jax wrote:
> I encountered a problem with the dollar sign in a source string. It seems
> that $ requires special treatment. Below is a copy of a session in the
> interactive Python shell:
>
> Python 2.5.2 (r252:60911, Jan 8 2009, 12:17:37)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> a = unicode(r"(instead of $399.99)", "utf8")
> >>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 399.99
>
> My question is: why is only the third regular expression correct?
They are all correct; they just don't give what you expect. This has nothing
to do with the $. The .* expression is greedy: it tries to match as
many characters as possible and gives back only as much as the rest of the
pattern needs, so \d+ ends up with a single digit. You can see that by
adding another group:
>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of(.*)(\d+[.]\d+)\)$", a).groups()
(u' $39', u'9.99')
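For anyone trying this on Python 3, here is a minimal sketch of the same experiment. It assumes a Python 3 interpreter, where a plain str is already Unicode and the ur"" prefix is no longer accepted, so the unicode(..., "utf8") wrapper is dropped. It also runs the pattern on a hypothetical price string without the dollar sign, to show that the $ plays no part in where the greedy .* stops:

```python
import re

# Python 3: plain str literals are already Unicode, so no unicode(...) call.
a = "(instead of $399.99)"

# Same pattern as in the session; the greedy .* gets its own group.
pattern = r"^\(instead of(.*)(\d+[.]\d+)\)$"
with_dollar = re.search(pattern, a).groups()
print(with_dollar)

# Hypothetical variant without the dollar sign: .* still swallows all but
# one digit in front of the decimal point, so the $ is not the cause.
without_dollar = re.search(pattern, "(instead of 399.99)").groups()
print(without_dollar)
```

This should print (' $39', '9.99') and then (' 39', '9.99'): in both cases the greedy group keeps "39" and the price group gets only "9.99".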
Fortunately there is also a non-greedy variant .*? which matches as few
characters as possible:
>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of.*?(\d+[.]\d+)\)$", a).group(1)
u'399.99'
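To put the three patterns from the question next to the non-greedy one, here is a small Python 3 sketch (the labels are made up for illustration; the patterns are the ones from this thread). The third pattern only works because the literal \$ pins the point where .* has to stop, while the non-greedy .*? gets the same result without it:

```python
import re

# Subject string from the thread (a Python 3 str is already Unicode).
a = "(instead of $399.99)"

# Hypothetical labels paired with the patterns discussed above.
patterns = [
    ("greedy after 'instead of'", r"^\(instead of.*(\d+[.]\d+)\)$"),
    ("greedy from '('", r"^\(.*(\d+[.]\d+)\)$"),
    ("greedy up to literal $", r"^\(.*\$(\d+[.]\d+)\)$"),
    ("non-greedy after 'instead of'", r"^\(instead of.*?(\d+[.]\d+)\)$"),
]

results = {}
for label, pattern in patterns:
    # group(1) is the captured price for each pattern
    results[label] = re.search(pattern, a).group(1)
    print(f"{label:30} -> {results[label]}")
```

Run as-is, this should report 9.99 for the first two patterns and 399.99 for the last two.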
Peter
--
http://mail.python.org/mailman/listinfo/python-list