Re: Retrieve url's of all jpegs at a web page URL

2009-09-15 Thread Paul McGuire
On Sep 15, 11:32 pm, Stefan Behnel wrote:
> Also untested:
>
>         from lxml import html
>
>         doc = html.parse(page_url)
>         doc.make_links_absolute(page_url)
>
>         urls = [ img.get('src') for img in doc.xpath('//img') ]
>
> Then use e.g. urllib2 to save the images.

A pyparsing approach looks similar:

from pyparsing import makeHTMLTags, htmlComment

import urllib
html = urllib.urlopen(url).read()

imgTag = makeHTMLTags("img")[0]
imgTag.ignore(htmlComment)

urls = [ img.src for img in imgTag.searchString(html) ]
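Since the thread is specifically about JPEGs, the collected URLs could then be filtered by extension. A small sketch in Python 3 syntax (urlparse moved into urllib.parse there); the helper name is made up for illustration:

```python
# Hedged sketch: keep only URLs whose path ends in .jpg/.jpeg.
# urlparse().path ignores query strings, so "photo.jpg?s=1" still matches.
from urllib.parse import urlparse  # urlparse.urlparse in Python 2

def jpeg_urls(urls):
    """Filter an iterable of URLs down to the JPEG ones."""
    return [u for u in urls
            if urlparse(u).path.lower().endswith((".jpg", ".jpeg"))]

print(jpeg_urls(["http://x/a.JPG", "http://x/b.png", "http://x/c.jpeg?s=1"]))
# -> ['http://x/a.JPG', 'http://x/c.jpeg?s=1']
```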

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Retrieve url's of all jpegs at a web page URL

2009-09-15 Thread Stefan Behnel
Chris Rebert wrote:
> page_url = "http://the.url.here"
> 
> f = urllib.urlopen(page_url)
> soup = BeautifulSoup(f.read())
> f.close()
> for img_tag in soup.findAll("img"):
>     relative_url = img_tag["src"]
>     img_url = make_absolute(relative_url, page_url)
>     save_image_from_url(img_url)
> 
> 2. Write make_absolute() and save_image_from_url()

Note that lxml.html provides a make_links_absolute() function.

Also untested:

from lxml import html

doc = html.parse(page_url)
doc.make_links_absolute(page_url)

urls = [ img.get('src') for img in doc.xpath('//img') ]

Then use e.g. urllib2 to save the images.
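For that saving step, a hedged sketch (in Python 3 syntax, where urllib.request plays urllib2's role; naming the file after the last path segment is an assumption of this sketch, not anything from the post):

```python
# Sketch: download an image URL into dest_dir, naming the file after the
# last path segment. urllib.request replaces Python 2's urllib2 here.
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def save_image_from_url(img_url, dest_dir="."):
    """Fetch img_url and write it into dest_dir; return the local path."""
    name = os.path.basename(urlparse(img_url).path) or "image"
    dest = os.path.join(dest_dir, name)
    urlretrieve(img_url, dest)  # also handles file:// URLs
    return dest
```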

Stefan


Re: Retrieve url's of all jpegs at a web page URL

2009-09-15 Thread Chris Rebert
On Tue, Sep 15, 2009 at 7:28 AM, grimmus wrote:
> Hi,
>
> I would like to achieve something like Facebook does when you post a
> link: it shows the images located at the URL you entered so you can
> choose which one to display as a summary.
>
> I was thinking I could loop through the HTML of a page with a regex
> and store all the JPEG URLs in a list. Then I could open the images
> one by one and save them as thumbnails with something like the code
> below.

0. Install BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/)

1:
#untested
from BeautifulSoup import BeautifulSoup
import urllib

page_url = "http://the.url.here"

f = urllib.urlopen(page_url)
soup = BeautifulSoup(f.read())
f.close()
for img_tag in soup.findAll("img"):
    relative_url = img_tag["src"]
    img_url = make_absolute(relative_url, page_url)
    save_image_from_url(img_url)

2. Write make_absolute() and save_image_from_url()

3. Profit.
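Of the two helpers in step 2, make_absolute() barely needs writing: the standard library's urljoin already does it (urlparse.urljoin in Python 2, urllib.parse.urljoin in Python 3). A minimal sketch:

```python
from urllib.parse import urljoin  # urlparse.urljoin in Python 2

def make_absolute(relative_url, page_url):
    """Resolve relative_url against the page it was found on.
    Already-absolute URLs pass through unchanged."""
    return urljoin(page_url, relative_url)

print(make_absolute("img/logo.jpg", "http://example.com/news/index.html"))
# -> http://example.com/news/img/logo.jpg
```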

Cheers,
Chris
--
http://blog.rebertia.com


Retrieve url's of all jpegs at a web page URL

2009-09-15 Thread grimmus
Hi,

I would like to achieve something like Facebook does when you post a
link: it shows the images located at the URL you entered so you can
choose which one to display as a summary.

I was thinking I could loop through the HTML of a page with a regex
and store all the JPEG URLs in a list. Then I could open the images
one by one and save them as thumbnails with something like the code
below.

Can anyone provide some advice on how best to achieve this?

Sorry for the noob questions and syntax!!

from PIL import Image
import urllib

image_uri = 'http://www.gstatic.com/news/img/bluelogo/en_ie/news.gif'
PILInfo = Image.open(urllib.urlretrieve(image_uri)[0])
PILInfo.save("saved-image.gif")
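For the thumbnail step mentioned above, PIL's Image.thumbnail shrinks an image in place while preserving its aspect ratio (Pillow, PIL's maintained fork, keeps the same API). A small sketch using an in-memory image as a stand-in for a downloaded one:

```python
from PIL import Image  # pip install Pillow

img = Image.new("RGB", (640, 480), "white")  # stand-in for a downloaded image
img.thumbnail((128, 128))  # resizes in place, keeping the aspect ratio
print(img.size)
# -> (128, 96)
```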