The standard Python install (Windows 2.5, at least) includes a couple of 
example scripts you might find useful:

 

<python>\Tools\webchecker\webchecker.py

Crawls specified URL, checking for broken links.

 

<python>\Tools\webchecker\websucker.py

A variant of the above that archives the specified site locally.  It includes 
images, but you could probably limit it to HTML easily enough.

 

I haven't used either extensively, but they appear to work as advertised.  It 
should be easy to modify one and tie it into the MySQLdb extensions:

http://sourceforge.net/projects/mysql-python
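
If you'd rather start from scratch than modify one of those scripts, the rough shape is: fetch the page, pull out the links with an HTML parser, and hand the HTML to the database. Below is a minimal sketch of that, not code taken from webchecker or websucker. It assumes a current Python 3 (urllib.request and html.parser, rather than the 2.5-era modules) and a hypothetical table named pages with url and html columns. The names LinkCollector, fetch_html and store_page are made up for illustration.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def fetch_html(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def store_page(conn, url, html):
    """Insert one page into the (assumed) pages table.

    conn is any DB-API connection whose driver uses %s placeholders,
    which is the style MySQLdb uses.
    """
    cur = conn.cursor()
    cur.execute("INSERT INTO pages (url, html) VALUES (%s, %s)", (url, html))
    conn.commit()
    cur.close()


# Usage sketch (hypothetical host, credentials and database name):
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="crawler",
#                          passwd="secret", db="crawl")
#   html = fetch_html("http://www.python.org/")
#   store_page(conn, "http://www.python.org/", html)
#   collector = LinkCollector("http://www.python.org/")
#   collector.feed(html)
#   print(collector.links)
```

For a single URL the parser is only needed if you want the links; fetch_html alone gets you the HTML.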

 

--

Adam Pletcher

Technical Art Director

Volition/THQ <http://www.volition-inc.com/> 

 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fabian López
Sent: Monday, November 12, 2007 12:33 PM
To: Python-list@python.org
Subject: crawler in python and mysql

 

Hi,
I would like to write code that crawls a URL and retrieves all of its HTML. 
I have noticed that there are several open-source web crawlers, but they 
are far more extensive than what I need. I only need to crawl one URL, and I 
don't know whether it is as easy as using an HTML parser. Is it? Which 
libraries would you recommend? 
Thanks!!
Fabian

-- 
http://mail.python.org/mailman/listinfo/python-list
