New submission from John Nagle:

"robotparser" uses the default Python user agent when reading the "robots.txt" 
file, and there's no parameter for changing that.

Unfortunately, the "mod_security" add-on for Apache web server, when used with 
the standard OWASP rule set, blacklists the default Python USER-AGENT in Rule 
990002, User Agent Identification. It doesn't like certain HTTP USER-AGENT 
values. One of them is "python-httplib2". So any program in Python which 
accesses the web site will trigger this rule and be blocked form access.  

For regular HTTP accesses, this can be worked around by putting a user agent 
string in the Request object, but "robotparser" has no such option.
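
A possible workaround, as a minimal Python 3 sketch (the user agent string 
and URLs below are placeholders): fetch "robots.txt" by hand with a custom 
User-Agent header, then feed the text to the parser through parse() instead 
of calling read():

    import urllib.request
    import urllib.robotparser

    USER_AGENT = "MyCrawler/1.0"   # placeholder; use your crawler's name

    # Fetch robots.txt ourselves, with a User-Agent mod_security accepts.
    req = urllib.request.Request(
        "http://example.com/robots.txt",
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")

    # Bypass read() and hand the text to parse() directly.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.parse(body.splitlines())
    print(rp.can_fetch(USER_AGENT, "http://example.com/some/page"))

A user agent parameter on RobotFileParser itself would avoid this dance.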

Worse, if "robotparser" has its read of "robots.txt" rejected, it interprets 
that as a "deny all" robots.txt file, and returns False for all "can_fetch()" 
requests.
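
For reference, a short sketch of how that failure shows up (the URL is 
hypothetical, standing in for any site whose mod_security rules send 403 to 
the default Python agent):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://blocked.example.com/robots.txt")
    rp.read()   # the 403 is swallowed and treated as "disallow all"

    # Now every can_fetch() call returns False, no matter what the
    # site's real robots.txt would have allowed.
    print(rp.can_fetch("MyCrawler/1.0",
                       "http://blocked.example.com/page"))  # False

So a site that merely dislikes the default user agent ends up looking 
completely off-limits to the crawler.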

----------
components: Library (Lib)
messages: 265900
nosy: nagle
priority: normal
severity: normal
status: open
title: robotparser user agent considered hostile by mod_security rules.
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27065>
_______________________________________