New submission from Eduardo A. Bustamante López:

I found that http://en.wikipedia.org/robots.txt returns a 403 response when the 
request's user agent is on a specific blacklist.

Since robotparser doesn't provide a mechanism to change the default user agent 
used by its opener, it becomes unusable for that site (and for sites with a 
similar policy).
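
A minimal sketch of the problem on Python 2.7 (the request goes out with the 
stock "Python-urllib/..." User-Agent, which is what triggers the 403 here):

    import robotparser

    rp = robotparser.RobotFileParser('http://en.wikipedia.org/robots.txt')
    rp.read()  # fetched with urllib's default User-Agent; the server answers 403
    # On a 401/403 response robotparser treats everything as disallowed:
    print rp.disallow_all                                           # True
    print rp.can_fetch('*', 'http://en.wikipedia.org/wiki/Python')  # False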

I think the user should be able to set a specific user agent string, to better 
identify their bot.

I am attaching a patch that allows the user to change the opener used by 
RobotFileParser, in case some specific behavior is needed.

I am also attaching a simple example showing how the patch solves the issue, at 
least with Wikipedia.
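
For reference, usage with such a hook would look roughly like the sketch below; 
set_opener() is only an illustration of the kind of entry point the patch adds, 
not necessarily its exact name, and the user agent string is made up:

    import robotparser
    import urllib

    class BotOpener(urllib.FancyURLopener):
        # urllib uses the class-level "version" attribute as the User-Agent.
        version = 'MyCrawler/1.0 (+http://example.com/bot.html)'

    rp = robotparser.RobotFileParser('http://en.wikipedia.org/robots.txt')
    rp.set_opener(BotOpener())  # hypothetical hook provided by the patch
    rp.read()
    print rp.can_fetch('MyCrawler', 'http://en.wikipedia.org/wiki/Python')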

----------
components: Library (Lib)
files: robotparser.py.diff
keywords: patch
messages: 169718
nosy: Eduardo.A..Bustamante.López
priority: normal
severity: normal
status: open
title: Lib/robotparser.py doesn't accept setting a user agent string, instead 
it uses the default.
type: enhancement
versions: Python 2.7
Added file: http://bugs.python.org/file27100/robotparser.py.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________