Kirk, 

That's exactly what I needed. Thx!
 

-----Original Message-----
From: Kirk Strauser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 11:42 AM
To: python-list@python.org
Subject: Re: More regex help

At 2008-09-24T16:25:02Z, "Support Desk" <[EMAIL PROTECTED]> writes:

> I am working on a python webcrawler, that will extract all links from an
> html page, and add them to a queue, The problem I am having is building
> absolute links from relative links, as there are so many different types
of
> relative links. If I just append the relative links to the current url,
some
> websites will send it into a never-ending loop. 

>>> import urllib
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
                    '/foo')
'http://www.example.com/foo'
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
                    'http://slashdot.org/foo')
'http://slashdot.org/foo'

-- 
Kirk Strauser
The Day Companies


--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to