<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
> I am new to python regular expression, I would like to use it to get an
> attribute of an html element from an html file?
>
> for example, I was able to read the html file using this:
> req = urllib2.Request(url=acaURL)
> f = urllib2.urlopen(req)
>
> data = f.read()
>
> my question is how can I just get the src attribute value of an img
> tag?
> something like this:
> (.*)<img src="href of the image source">(.*)
>
> I need to get the href of the image source.
>
> Thanks.
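For comparison, a regex along the lines the original poster asked about can work on simple, well-behaved pages. This is a minimal Python 3 sketch, assuming the page has already been read into a string; the `data` markup, the `IMG_SRC` pattern name and the `img_sources` helper are all made up for illustration. It only handles quoted attribute values and will happily match tags inside comments or scripts, which is why a real parser is usually the safer choice.

```python
import re

# Hypothetical sample markup standing in for the page fetched with urllib2.
data = """<html><body>
<p>Logo: <IMG SRC="logo.gif" alt="logo"></p>
<img class="photo" src='photos/cat.jpg' width="100">
<img alt="no source here">
</body></html>"""

# Match an <img ...> tag and capture the quoted value of its src attribute.
# [^>]*? lets other attributes come before src without leaving the tag;
# the backreference \1 makes the closing quote match the opening one (" or ').
# Unquoted values such as src=foo.gif are NOT handled by this pattern.
IMG_SRC = re.compile(r'''<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)

def img_sources(html):
    """Return the src value of every <img> tag that has a quoted src."""
    return [m.group(2) for m in IMG_SRC.finditer(html)]

print(img_sources(data))  # -> ['logo.gif', 'photos/cat.jpg']
```

The third `<img>` has no `src`, so it is skipped; the word "source" in its `alt` text does not match `\bsrc`.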
As Fredrik pointed out, re's are not the only tool out there. Here's a
pyparsing solution.

-- Paul

import pyparsing
import urllib

# define HTML tag format using makeHTMLTags helper
# (we don't really care about the ending </img> tag,
# even though makeHTMLTags returns definitions for both
# starting and ending tag patterns)
imgStartTag, dummy = pyparsing.makeHTMLTags("img")

# get HTML source from some web site
htmlPage = urllib.urlopen("http://www.yahoo.com")
htmlSource = htmlPage.read()
htmlPage.close()

# scan HTML source, printing SRC attribute from each <img> tag
for tokens, start, end in imgStartTag.scanString(htmlSource):
    print tokens.src

Prints:
http://us.i1.yimg.com/us.yimg.com/i/ww/beta/edit_plink.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/125.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/13441.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/136.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/ml.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/my.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/bt1/msgn.gif
http://us.i1.yimg.com/us.yimg.com/i/ww/v5_mail_t2.gif
http://us.i1.yimg.com/us.yimg.com/i/mntl/aut/06q2/hea_0411.gif
http://us.i1.yimg.com/us.yimg.com/i/mntl/aut/06q2/img_0607.jpg
http://us.i1.yimg.com/us.yimg.com/i/ww/news/2006/06/07/0607notorious_big.jpg
http://us.i1.yimg.com/us.yimg.com/i/ww/beta/news/video.gif
http://us.i1.yimg.com/us.yimg.com/i/buzz/2006/06/wholefoodssmall.jpg
http://us.i1.yimg.com/us.yimg.com/i/mntl/msg/06q2/img_im.jpg
http://us.i1.yimg.com/us.yimg.com/i/ww/trfc_bckt.gif
http://us.i1.yimg.com/us.yimg.com/i/mntl/sh/04q2/camera.gif
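If pyparsing isn't installed, the standard library's `html.parser` module is another non-regex way to pull out the same attribute. This is a rough Python 3 sketch, not the pyparsing approach: the `ImgSrcCollector` class, the `img_sources` helper and the sample markup are hypothetical names made up for illustration. On Python 3 the page itself could be fetched with `urllib.request.urlopen(url).read()` and decoded to a string before feeding it in.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased, and attrs is a list of (name, value) pairs
        # with lower-cased names, so <IMG SRC=...> and <img src=...> match alike.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)

    # Self-closing <img ... /> tags go through handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.

def img_sources(html_text):
    """Return the src values of all <img> tags in html_text, in order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources

sample = '<p><IMG SRC="a.gif"><img alt="x"><img src="b.png" /></p>'
print(img_sources(sample))  # -> ['a.gif', 'b.png']
```

Like the pyparsing version, this works from the tag structure rather than a text pattern, so attribute order, case, and quoting style don't need special handling.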