Re: Retrieve url's of all jpegs at a web page URL
On Sep 15, 11:32 pm, Stefan Behnel wrote:
> Also untested:
>
> from lxml import html
>
> doc = html.parse(page_url)
> doc.getroot().make_links_absolute(page_url)
>
> urls = [img.get('src') for img in doc.xpath('//img')]
>
> Then use e.g. urllib2 to save the images.

Looks similar to what a pyparsing approach would look like:

from pyparsing import makeHTMLTags, htmlComment
import urllib

html = urllib.urlopen(url).read()

imgTag = makeHTMLTags("img")[0]
imgTag.ignore(htmlComment)

urls = [img.src for img in imgTag.searchString(html)]

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
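Both parser approaches above collect every img src, while the original question asks for JPEGs specifically. Whatever produces the list, the filtering step is plain Python; a minimal sketch, assuming `urls` already holds the raw src strings (the compat import covers Python 2's `urlparse` and Python 3's `urllib.parse`):

```python
try:
    from urlparse import urljoin        # Python 2
except ImportError:
    from urllib.parse import urljoin    # Python 3

def jpeg_urls(urls, page_url):
    """Resolve each src against the page URL and keep only JPEGs."""
    absolute = [urljoin(page_url, u) for u in urls]
    return [u for u in absolute if u.lower().endswith(('.jpg', '.jpeg'))]

# Hard-coded example data (illustrative only):
srcs = ['/img/photo.JPG', 'logo.gif', 'http://cdn.example.com/a.jpeg']
print(jpeg_urls(srcs, 'http://example.com/page/index.html'))
# -> ['http://example.com/img/photo.JPG', 'http://cdn.example.com/a.jpeg']
```

Note `urljoin` leaves already-absolute URLs unchanged, so the same code handles both relative and absolute src attributes.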
Re: Retrieve url's of all jpegs at a web page URL
Chris Rebert wrote:
> page_url = "http://the.url.here"
>
> f = urllib.urlopen(page_url)
> soup = BeautifulSoup(f.read())
> f.close()
>
> for img_tag in soup.findAll("img"):
>     relative_url = img_tag["src"]
>     img_url = make_absolute(relative_url, page_url)
>     save_image_from_url(img_url)
>
> 2. Write make_absolute() and save_image_from_url()

Note that lxml.html provides a make_links_absolute() function.

Also untested:

from lxml import html

doc = html.parse(page_url)
doc.getroot().make_links_absolute(page_url)

urls = [img.get('src') for img in doc.xpath('//img')]

Then use e.g. urllib2 to save the images.

Stefan
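The "use e.g. urllib2 to save the images" step above can be sketched as follows. The `local_name` helper is hypothetical, not part of any library; the compat imports let the same sketch run on Python 2 (`urllib2`/`urlparse`) and Python 3 (`urllib.request`/`urllib.parse`):

```python
import os

try:
    from urllib2 import urlopen             # Python 2
    from urlparse import urlparse
except ImportError:
    from urllib.request import urlopen      # Python 3
    from urllib.parse import urlparse

def local_name(url):
    """Derive a local filename from the URL path (hypothetical helper)."""
    name = os.path.basename(urlparse(url).path)
    return name or 'unnamed.jpg'

def save_images(urls):
    """Download each URL to the current directory (does real network I/O)."""
    for url in urls:
        with open(local_name(url), 'wb') as out:
            out.write(urlopen(url).read())

print(local_name('http://example.com/photos/cat.jpg'))  # -> cat.jpg
```

Real code would also want error handling (HTTP errors, name collisions), which is omitted here.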
Re: Retrieve url's of all jpegs at a web page URL
On Tue, Sep 15, 2009 at 7:28 AM, grimmus wrote:
> Hi,
>
> I would like to achieve something like Facebook has when you post a
> link. It shows images located at the URL you entered so you can choose
> which one to display as a summary.
>
> I was thinking I could loop through the HTML of a page with a regex
> and store all the JPEG URLs in an array. Then, I could open the
> images one by one and save them as thumbnails with something like
> below.

0. Install BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/)

1.

# untested
from BeautifulSoup import BeautifulSoup
import urllib

page_url = "http://the.url.here"

f = urllib.urlopen(page_url)
soup = BeautifulSoup(f.read())
f.close()

for img_tag in soup.findAll("img"):
    relative_url = img_tag["src"]
    img_url = make_absolute(relative_url, page_url)
    save_image_from_url(img_url)

2. Write make_absolute() and save_image_from_url()
3. Profit.

Cheers,
Chris
--
http://blog.rebertia.com
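Step 2 above leaves `make_absolute()` and `save_image_from_url()` as an exercise; one way to fill them in with only the standard library, sketched here with Python 2/3 compat imports (the function names follow Chris's recipe, the bodies are assumptions):

```python
import os

try:
    from urlparse import urljoin            # Python 2
    from urllib import urlretrieve
except ImportError:
    from urllib.parse import urljoin        # Python 3
    from urllib.request import urlretrieve

def make_absolute(relative_url, page_url):
    """Resolve a (possibly relative) src against the page's URL."""
    # urljoin passes already-absolute URLs through unchanged
    return urljoin(page_url, relative_url)

def save_image_from_url(img_url, directory='.'):
    """Download the image next to the script (does real network I/O)."""
    filename = os.path.basename(img_url.split('?')[0]) or 'image'
    urlretrieve(img_url, os.path.join(directory, filename))

print(make_absolute('pics/a.jpg', 'http://example.com/page/'))
# -> http://example.com/page/pics/a.jpg
```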
Retrieve url's of all jpegs at a web page URL
Hi,

I would like to achieve something like Facebook has when you post a
link. It shows images located at the URL you entered so you can choose
which one to display as a summary.

I was thinking I could loop through the HTML of a page with a regex
and store all the JPEG URLs in an array. Then, I could open the
images one by one and save them as thumbnails with something like
below.

Can anyone provide some advice on how best to achieve this? Sorry for
the noob questions and syntax!

from PIL import Image
import urllib

image_uri = 'http://www.gstatic.com/news/img/bluelogo/en_ie/news.gif'
PILInfo = Image.open(urllib.urlretrieve(image_uri)[0])
PILInfo.save("saved-image.gif")
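The regex idea described above can work for a quick first pass, though an HTML parser (as the replies suggest) is more robust against real-world markup. A minimal sketch over a hard-coded snippet; the pattern is illustrative and will miss unquoted or oddly formatted attributes:

```python
import re

html = '''<img src="/photos/cat.jpg"><IMG SRC='dog.jpeg' alt="dog">
<img src="logo.png">'''

# src attributes of <img> tags ending in .jpg/.jpeg, case-insensitive
pattern = re.compile(r'''<img[^>]+src=["']([^"']+\.jpe?g)["']''',
                     re.IGNORECASE)

jpeg_urls = pattern.findall(html)
print(jpeg_urls)  # -> ['/photos/cat.jpg', 'dog.jpeg']
```

Note the matches can still be relative URLs, so they need to be resolved against the page URL before downloading.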