Re: Scraping a web page
On Apr 7, 1:44 pm, Tim Chase wrote:
> > f = urllib.urlopen("http://www.google.com")
> > s = f.read()
> >
> > It is working, but it's returning the source of the page. Is there anyway I
> > can get almost a screen capture of the page?
>
> This is the job of a browser -- to render the source HTML. As
> such, you'd want to look into any of the browser-automation
> libraries to hook into IE, FireFox, Opera, or maybe using the
> WebKit/KHTML control. You may then be able to direct it to
> render the HTML into a canvas you can then treat as an image.
>
> Another alternative might be provided by some web-services that
> will render a page as HTML with various browsers and then send
> you the result. However, these are usually either (1)
> asynchronous or (2) paid services (or both).
>
> -tkc

wxPython can render HTML.

--
http://mail.python.org/mailman/listinfo/python-list
RE: Scraping a web page
Support Desk wrote:
> You could do something like below to get the rendered page.
>
> import os
> site = 'website.com'
> X = os.popen('lynx --dump %s' % site).readlines()

I wonder how easy it would be to get the page image in SVG format? I believe the Gecko HTML engine in Firefox already uses Cairo for its rendering, and Cairo supports SVG as one of its surface types.
Re: Scraping a web page
> Is there anyway I
> can get almost a screen capture of the page?

I'm not sure exactly what you mean by "screen capture", but the webbrowser module in the standard library might be of some help. You can use it to drive a web browser from Python. To load a page in your browser, you can do something like this:

---
#! /usr/bin/env python

import webbrowser

url = 'http://www.google.com'
webbrowser.open(url)
---

-Corey
RE: Scraping a web page
If you're only interested in the images, perhaps you want to use wget like:

wget -r --accept=jpg,jpeg www.xyz.org

or maybe this:

http://www.vex.net/~x/python_stuff.html

BackCrawler <http://www.vex.net/%7Ex/files/backcrawler.zip> 1.1
A crude web spider with only one purpose: mercilessly suck the background images from all web pages it can find. Understands frames and redirects, uses MD5 to eliminate duplicates. Need web page backgrounds? This'll get lots of them. Sadly, most are very tacky, and Backcrawler can't help with that. Requires Threads.

_____

From: Ronn Ross [mailto:ronn.r...@gmail.com]
Sent: Tuesday, April 07, 2009 9:37 AM
To: Support Desk
Subject: Re: Scraping a web page

This works great, but is there a way to do this with firefox or something similar so I can also print the images from the site?

On Tue, Apr 7, 2009 at 9:58 AM, Support Desk wrote:

You could do something like below to get the rendered page.

import os
site = 'website.com'
X = os.popen('lynx --dump %s' % site).readlines()

-----Original Message-----
From: Tim Chase [mailto:python.l...@tim.thechases.com]
Sent: Tuesday, April 07, 2009 7:45 AM
To: Ronn Ross
Cc: python-list@python.org
Subject: Re: Scraping a web page

> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there anyway I
> can get almost a screen capture of the page?

This is the job of a browser -- to render the source HTML. As
such, you'd want to look into any of the browser-automation
libraries to hook into IE, FireFox, Opera, or maybe using the
WebKit/KHTML control. You may then be able to direct it to
render the HTML into a canvas you can then treat as an image.

Another alternative might be provided by some web-services that
will render a page as HTML with various browsers and then send
you the result. However, these are usually either (1)
asynchronous or (2) paid services (or both).

-tkc
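The BackCrawler description above says it uses MD5 to eliminate duplicates. A minimal sketch of that general idea follows; it is not BackCrawler's actual code, and the function name unique_by_md5 is made up for illustration. It assumes the downloaded images are already in memory as byte strings.

```python
import hashlib


def unique_by_md5(blobs):
    """Yield each byte string once, skipping any whose MD5 digest was already seen."""
    seen = set()
    for data in blobs:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield data
```

Comparing digests rather than the raw bytes keeps the "seen" set small even when the images themselves are large.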
RE: Scraping a web page
You could do something like below to get the rendered page.

import os
site = 'website.com'
X = os.popen('lynx --dump %s' % site).readlines()

-----Original Message-----
From: Tim Chase [mailto:python.l...@tim.thechases.com]
Sent: Tuesday, April 07, 2009 7:45 AM
To: Ronn Ross
Cc: python-list@python.org
Subject: Re: Scraping a web page

> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there anyway I
> can get almost a screen capture of the page?

This is the job of a browser -- to render the source HTML. As
such, you'd want to look into any of the browser-automation
libraries to hook into IE, FireFox, Opera, or maybe using the
WebKit/KHTML control. You may then be able to direct it to
render the HTML into a canvas you can then treat as an image.

Another alternative might be provided by some web-services that
will render a page as HTML with various browsers and then send
you the result. However, these are usually either (1)
asynchronous or (2) paid services (or both).

-tkc
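For what it's worth, here is a sketch of the same lynx idea using subprocess instead of os.popen, written for Python 3.7 or later. The helper names lynx_command and dump_page are made up for this example, and it assumes lynx is installed and accepts --dump as in the snippet above. Nothing here has been tried against a particular lynx version.

```python
import shutil
import subprocess


def lynx_command(url):
    """Build the argument list for a plain-text dump of url (hypothetical helper)."""
    return ["lynx", "--dump", url]


def dump_page(url, timeout=30):
    """Return the text lynx renders for url as a list of lines.

    Returns None when no lynx executable is found on PATH.
    """
    if shutil.which("lynx") is None:
        return None
    result = subprocess.run(
        lynx_command(url),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode output to str
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

Passing the command as a list means the URL is never interpreted by a shell, which matters if the site name comes from user input.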
Re: Scraping a web page
> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there anyway I
> can get almost a screen capture of the page?

This is the job of a browser -- to render the source HTML. As
such, you'd want to look into any of the browser-automation
libraries to hook into IE, FireFox, Opera, or maybe using the
WebKit/KHTML control. You may then be able to direct it to
render the HTML into a canvas you can then treat as an image.

Another alternative might be provided by some web-services that
will render a page as HTML with various browsers and then send
you the result. However, these are usually either (1)
asynchronous or (2) paid services (or both).

-tkc
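To illustrate the gap between page source and rendered output, here is a rough Python 3 sketch using only the standard library. In Python 3 the urlopen call from the question lives in urllib.request. The names fetch_source, TextExtractor and html_to_text are invented for this example. The text extraction is only a crude stand-in for a text dump: it does no layout, CSS, images or JavaScript, so it is nowhere near a screen capture.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_source(url):
    """Return the raw HTML source of url as text (what the question's code returns)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.parts.append(text)


def html_to_text(source):
    """Return the text nodes of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.parts)
```

Something like html_to_text(fetch_source("http://www.google.com")) would give the page's text without markup. Anything closer to what a browser shows still needs a real rendering engine, as described above.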