On 21.12.2009 12:38, Oltmans wrote:
Hello, everyone.
I've a string that looks something like
----
lksjdfls<div id ='amazon_345343'> kdjff lsdfs</div> sdjfls<div id
= "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
----
From the above string I need the digits within the ID attribute. For
example, the required output from the above string is
- 35343433
- 345343
- 8898
I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)
but I was wondering whether there might be a better regex to do the
same thing. Can you kindly suggest a better/improved regex? Thank you
in advance.
If you filter in two or more sequential steps, the problem becomes a
lot simpler, not least because you can test each step separately:
>>> import re
>>> r1 = re.compile (r'<div id\D*\d+[^>]*') # Add ignore case and variable white space
>>> r2 = re.compile (r'\d+')
>>> [r2.search (item).group () for item in r1.findall (s) if item] # s is your sample
['345343', '35343433', '8898'] # Supposing all ids have digits
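
For what it's worth, here is a rough sketch of the same two-step idea with the ignore-case flag and variable white space from the comment on r1 filled in. The names DIV_ID_TAG, DIGITS and extract_ids are only illustrative, and the first pattern assumes that id is the first attribute of the div and that its value is quoted, as in your sample. Tags whose id has no digits are skipped rather than assumed away.

```python
import re

# Step 1: isolate each opening <div ...> tag whose first attribute is a quoted id.
# Whitespace (including newlines) is allowed around "=", and case is ignored.
DIV_ID_TAG = re.compile(r'<div\s+id\s*=\s*["\'][^"\'>]*["\'][^>]*', re.IGNORECASE)

# Step 2: pull the first run of digits out of a single matched tag.
DIGITS = re.compile(r'\d+')


def extract_ids(text):
    """Return the digit part of every div id in text, in order of appearance."""
    ids = []
    for tag in DIV_ID_TAG.findall(text):
        match = DIGITS.search(tag)
        if match:  # skip ids that contain no digits at all
            ids.append(match.group())
    return ids


sample = """lksjdfls<div id ='amazon_345343'> kdjff lsdfs</div> sdjfls<div id
= "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>"""

print(extract_ids(sample))
```

Because each step is its own compiled pattern, you can run DIV_ID_TAG.findall(sample) alone to check that the tags are picked up correctly before worrying about the digits.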
Frederic