On 21.12.2009 12:38, Oltmans wrote:
Hello, everyone.

I've a string that looks something like
----
lksjdfls<div id ='amazon_345343'>  kdjff lsdfs</div>  sdjfls<div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
----

From the above string I need the digits within the ID attribute. For
example, the required output from the above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was wondering whether there might be a better regex to do the
same thing. Can you kindly suggest a better/improved regex? Thank you
in advance.

If you filter in two or even more sequential steps the problem becomes a lot simpler, not least because you can
test each step separately:

>>> import re
>>> r1 = re.compile (r'<div id\D*\d+[^>]*') # Add ignore case and variable white space
>>> r2 = re.compile (r'\d+')
>>> [r2.search (item).group () for item in r1.findall (s) if item] # s is your sample
['345343', '35343433', '8898']     # Supposing all ids have digits
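
The "ignore case and variable white space" comment above could be followed up roughly as below. This is only a sketch: the function name extract_ids, the constant names, and the assumption that every id is quoted and has the form amazon_<digits> are illustrative, not part of the original post.

```python
import re

# Step 1: isolate each opening <div ...> tag whose id looks like amazon_<digits>.
# Assumptions (not from the original post): the id value is quoted with ' or ",
# and id is the first attribute of the tag, as in the sample string.
DIV_WITH_ID = re.compile(
    r"<div\s+id\s*=\s*['\"]amazon_\d+['\"][^>]*>",
    re.IGNORECASE,
)

# Step 2: pull the first run of digits out of one isolated tag.
DIGITS = re.compile(r"\d+")


def extract_ids(text):
    """Return the numeric part of every amazon_<digits> div id found in text."""
    tags = DIV_WITH_ID.findall(text)  # step 1 can be inspected on its own
    return [DIGITS.search(tag).group() for tag in tags]  # step 2


sample = (
    "lksjdfls<div id ='amazon_345343'>  kdjff lsdfs</div>  sdjfls<div id\n"
    "=   \"amazon_35343433\">sdfsd</div><div id='amazon_8898'>welcome</div>"
)
print(extract_ids(sample))
```

Because \s also matches newlines, the id that is split across two lines in the sample is still found. For HTML that is less regular than this sample, a real parser such as the standard library's html.parser is likely a safer choice than any regex.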

Frederic

--
http://mail.python.org/mailman/listinfo/python-list
