New submission from Ben Mezger: I am trying to parse Google's robots.txt (http://google.com/robots.txt), and it fails when checking whether I can crawl the URL /catalogs/p? (which is allowed): can_fetch() returns False. See my question on Stack Overflow -> http://stackoverflow.com/questions/15344253/robotparser-doesnt-seem-to-parse-correctly
Someone answered that it has to do with the line "urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])" in the robotparser module, since it removes the trailing "?" from the URL. Here is the answer I received -> http://stackoverflow.com/a/15350039/1649067

----------
components: Library (Lib)
messages: 184017
nosy: benmezger
priority: normal
severity: normal
status: open
title: Robotparser fails to parse some robots.txt
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17403>
_______________________________________
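The normalization step described in that answer can be reproduced on its own. A minimal sketch (shown in Python 3 syntax for readability; the Python 2.7 robotparser performs the same transformation with urllib.quote, urlparse.urlparse and urllib.unquote) of why the trailing "?" disappears before rule matching:

```python
from urllib.parse import quote, unquote, urlparse

# The URL being checked against Google's robots.txt rules.
url = "http://www.google.com/catalogs/p?"

# robotparser normalizes the URL roughly like this before matching it
# against Allow/Disallow entries: urlparse() splits off the query part,
# and index [2] keeps only the path component.
path = quote(urlparse(unquote(url))[2])

# The "?" belongs to the (empty) query string, not the path, so it is
# dropped and the rule "Allow: /catalogs/p?" can never match.
print(path)  # -> /catalogs/p
```

Because the normalized path "/catalogs/p" no longer ends in "?", it falls through to the broader "Disallow: /catalogs" rule instead of the more specific Allow entry.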