Oltmans wrote:

> I've a string that looks something like
> ----
> lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
> = "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
> ----
>
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
>
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> but I was just wondering that there might be a better RegEx to do that
> same thing. Can you kindly suggest a better/improved Regex. Thank you
> in advance.

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup("""lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
... = "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>""")
>>> [node["id"][7:] for node in bs(id=lambda id: id.startswith("amazon_"))]
[u'345343', u'35343433', u'8898']

I think BeautifulSoup is a better tool for the task since it actually
"understands" HTML.

Peter
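
If installing BeautifulSoup is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This assumes Python 3 (the session above is Python 2 with BeautifulSoup 3); the class name AmazonIdCollector and the helper amazon_ids are made up for illustration and are not part of any library.

```python
from html.parser import HTMLParser


class AmazonIdCollector(HTMLParser):
    """Collect the digits from id attributes of the form 'amazon_<digits>'.

    Hypothetical helper class, written only for this example.
    """

    PREFIX = "amazon_"

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value.
        for name, value in attrs:
            if name == "id" and value and value.startswith(self.PREFIX):
                suffix = value[len(self.PREFIX):]
                # Keep only IDs whose remainder is purely numeric.
                if suffix.isdigit():
                    self.ids.append(suffix)


def amazon_ids(html):
    """Return the numeric parts of all 'amazon_<digits>' ids, in document order."""
    parser = AmazonIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


sample = """lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
= "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>"""

print(amazon_ids(sample))
```

As with the BeautifulSoup version, the parser copes with the whitespace and line break around the "=" and with either quote style, so the pattern does not have to anticipate them. The isdigit() check is an extra assumption on my part: it skips ids such as "amazon_foo", which the slice-based one-liner above would keep.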