karl added the comment: Setting a user agent string should be possible. My guess is that the library's default user agent has been used by an abusive client (by mistake or intent), and the Wikimedia project has decided to blacklist clients based on user-agent string sniffing.
The match is on anything which matches "Python-urllib" in the User-Agent string. See below:

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pythonurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> fobj
<http.client.HTTPResponse object at 0x101275850>

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pyt-honurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urlli')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>>

Being able to change the header might indeed be a good thing.

----------
nosy: +karlcow

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________
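For what it's worth, until robotparser accepts a user agent, a subclass can override read() to send its own User-Agent header when fetching robots.txt. This is only a sketch of a possible workaround, not a proposed patch; the class name and the "MyCrawler/1.0" string are made up:

```python
import urllib.error
import urllib.request
import urllib.robotparser

class CustomUARobotFileParser(urllib.robotparser.RobotFileParser):
    """Hypothetical workaround: fetch robots.txt with a caller-chosen User-Agent."""

    def __init__(self, url='', user_agent='MyCrawler/1.0'):
        super().__init__(url)
        self.user_agent = user_agent

    def read(self):
        # Mirrors RobotFileParser.read(), but sends our own User-Agent
        # header instead of the default Python-urllib/x.y string.
        try:
            req = urllib.request.Request(
                self.url, headers={'User-Agent': self.user_agent})
            f = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            self.parse(f.read().decode('utf-8').splitlines())
```

The parsing and can_fetch() logic is untouched; only the fetch step changes, so the custom string is what shows up in the server's access log and in any user-agent sniffing.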