New submission from Eduardo A. Bustamante López:

I found that http://en.wikipedia.org/robots.txt returns 403 if the provided user agent is on a specific blacklist. Since robotparser provides no way to change the default user agent used by its opener, it is unusable for that site (and for sites with a similar policy). The user should be able to set a specific user agent string, to better identify their bot.

I attach a patch that lets the user replace the opener used by RobotFileParser, in case some specific behavior is needed. I also attach a simple example of how this solves the issue, at least with Wikipedia (see the sketch below).

----------
components: Library (Lib)
files: robotparser.py.diff
keywords: patch
messages: 169718
nosy: Eduardo.A..Bustamante.López
priority: normal
severity: normal
status: open
title: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
type: enhancement
versions: Python 2.7
Added file: http://bugs.python.org/file27100/robotparser.py.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________
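For illustration, here is a minimal sketch (not the attached patch) of the kind of workaround the report describes, under Python 2.7: subclass RobotFileParser and fetch robots.txt through urllib2 so the User-Agent header is under the caller's control. The subclass name, the user_agent parameter, and the ExampleBot agent string are hypothetical, chosen for the example:

import robotparser
import urllib2

class UserAgentRobotFileParser(robotparser.RobotFileParser):
    """RobotFileParser that fetches robots.txt with a custom User-Agent."""

    def __init__(self, url='', user_agent='ExampleBot/0.1'):
        robotparser.RobotFileParser.__init__(self, url)
        self.user_agent = user_agent  # hypothetical knob, not in the stdlib

    def read(self):
        # Fetch robots.txt ourselves so we control the User-Agent header,
        # then hand the lines to the stock parser.
        request = urllib2.Request(self.url,
                                  headers={'User-Agent': self.user_agent})
        try:
            f = urllib2.urlopen(request)
        except urllib2.HTTPError as err:
            # Mirror the stock read(): 401/403 disallow everything,
            # other 4xx codes allow everything.
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
            return
        self.parse(f.read().splitlines())

parser = UserAgentRobotFileParser('http://en.wikipedia.org/robots.txt',
                                  user_agent='ExampleBot/0.1 (+http://example.com/bot)')
parser.read()
print parser.can_fetch('ExampleBot', 'http://en.wikipedia.org/wiki/')

With a descriptive agent string like this, the fetch is no longer caught by a blacklist keyed on the default urllib user agent, which is the behavior the patch aims to make configurable.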
And since robotparser doesn't provide a mechanism to change the default user agent used by the opener, it becomes unusable for that site (and sites that have a similar policy). I think the user should have the possibility to set a specific user agent string, to better identify their bot. I attach a patch that allows the user to change the opener used by RobotFileParser, in case the need of some specific behavior arises. I also attach a simple example of how it solves the issue, at least with wikipedia. ---------- components: Library (Lib) files: robotparser.py.diff keywords: patch messages: 169718 nosy: Eduardo.A..Bustamante.López priority: normal severity: normal status: open title: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default. type: enhancement versions: Python 2.7 Added file: http://bugs.python.org/file27100/robotparser.py.diff _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue15851> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com