On Mar 12, 2009, at 3:57 PM, IanR wrote:

I'm processing RSS content from a number of given sources.  Most of the
time the URL given by the RSS feed redirects to the real URL (I'm
guessing they do this for tracking purposes).

For example:

This is a URL that I get from an RSS feed:
http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512
It redirects to
http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/

I want to record the final URL and not the URL I get from the RSS feed.
(However, sometimes there is no redirect, so I might want the original
URL.)

I've tried sniffing the header and don't see any "Location:"... I
think sites are using different ways to redirect.  Does anyone have
any suggestions on how I might handle this?


Hi Ian,
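Using Firefox's Live HTTP Headers extension, I see a 302 redirect with a Location header (see session log below). Are you aware that urllib2 resolves redirects for you? That might be why you're not seeing what you expect: by the time you look at the response, the redirect has already been followed, and the response's geturl() method gives you the final URL. If you want a record of each URL along the way, you'll have to subclass HTTPRedirectHandler.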
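
A rough sketch of such a handler follows, in case it helps. It assumes Python 3, where urllib2's functionality lives in urllib.request (on Python 2 the same class and method names exist under urllib2). The names RecordingRedirectHandler and resolve are made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every hop (hypothetical name)."""

    def __init__(self):
        # Each entry is (status_code, url_requested, url_redirected_to).
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Called by the stock 301/302/303/307 handling with the already
        # resolved Location URL; record it, then defer to the default logic.
        self.hops.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def resolve(url, timeout=10):
    """Return (final_url, hops) for url; hops is empty if nothing redirected."""
    recorder = RecordingRedirectHandler()
    # An instance of a HTTPRedirectHandler subclass replaces the default one.
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout) as response:
        return response.geturl(), recorder.hops
```

Calling resolve() on the feed URL should give back the final URL plus the list of hops. When there is no redirect, the list is empty and the final URL is simply the original one, which covers the "sometimes there is no redirect" case. This only sees HTTP-level redirects; a redirect done with a meta refresh tag or JavaScript would not show up here.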



http://www.pheedcontent.com/click.phdo?i=d22e9bc7641aab8a0566526f61806512

GET /click.phdo?i=d22e9bc7641aab8a0566526f61806512 HTTP/1.1
Host: www.pheedcontent.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.7) Gecko/2009021906 Firefox/3.0.7
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.7,sv;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 302 Found
Date: Thu, 12 Mar 2009 20:41:29 GMT
Server: Apache
X-Powered-By: PHP/5.2.3-1ubuntu6.3
Pragma: no-cache
Cache-Control: no-cache, must-revalidate
Set-Cookie: phdo=1-tst%7Cv3%3Ac3cbcae440ff783381d0d9fa96f14d05%3Aa8t5sELbkk9oy3pXsrohSnPslqQxQKIhVP%2F8Ots%3D; expires=Fri, 13-Mar-2009 20:41:29 GMT; path=/; domain=pheedo.com
Location: http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 26
Connection: close
Content-Type: text/html
----------------------------------------------------------
http://www.macsimumnews.com/index.php/archive/klipsch_developing_headphones_for_new_ipod_shuffle/


etc. etc.


--
http://mail.python.org/mailman/listinfo/python-list
