On Sun, Feb 27, 2011 at 9:38 PM, monkeys paw <mon...@joemoney.net> wrote:
> I have a working urlopen routine which opens
> a url, parses it for <a> tags and prints out
> the links in the page. On some sites, wikipedia for
> instance, i get a
>
> HTTP error 403, forbidden.
>
> What is the difference in accessing the site through a web browser
> and opening/reading the URL with python urllib2.urlopen?
The User-Agent header (http://en.wikipedia.org/wiki/User_agent).

"By default, the URLopener class sends a User-Agent header of urllib/VVV, where VVV is the urllib version number." – http://docs.python.org/library/urllib.html

Some sites block obvious non-search-engine bots based on the value of their HTTP User-Agent header. You can override the urllib default:
http://docs.python.org/library/urllib.html#urllib.URLopener.version

Sidenote: Wikipedia has a proper API for programmatic browsing, which is likely why it's blocking your program.

Cheers,
Chris
--
http://mail.python.org/mailman/listinfo/python-list
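P.S. Since you're using urllib2 rather than urllib, the usual way to override the header there is to build a Request object with an explicit User-Agent. A minimal sketch (the URL and the User-Agent string are only placeholders; the try/except import also lets it run on Python 3, where the module became urllib.request):

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# Send a browser-like User-Agent instead of the default
# "Python-urllib/VVV", which some sites reject with 403.
req = urllib2.Request(
    'http://example.com/',
    headers={'User-Agent': 'Mozilla/5.0 (compatible; MyLinkChecker/1.0)'},
)

# Pass the Request object to urlopen instead of the bare URL:
# html = urllib2.urlopen(req).read()

# urllib2 normalizes stored header names to "Xxxx-xxxx" form:
print(req.get_header('User-agent'))
```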