New submission from Ben Mezger: I am trying to parse Google's robots.txt (http://google.com/robots.txt), and it fails when checking whether I can crawl the URL /catalogs/p? (which is allowed): can_fetch() returns False. See my question on Stack Overflow -> http://stackoverflow.com/questions/15344253/robotparser-doesnt-seem-to-parse-correctly
Someone answered that it has to do with the line "urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])" in the robotparser module, since it removes the trailing "?" from the URL. Here is the answer I received -> http://stackoverflow.com/a/15350039/1649067

----------
components: Library (Lib)
messages: 184017
nosy: benmezger
priority: normal
severity: normal
status: open
title: Robotparser fails to parse some robots.txt
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17403>
_______________________________________
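The normalization step described in that answer can be reproduced on its own. A minimal sketch (shown in Python 3 syntax for readability; the Python 2.7 robotparser performs the same transformation with urllib.quote, urlparse.urlparse and urllib.unquote) of why the trailing "?" disappears before rule matching:

```python
from urllib.parse import quote, unquote, urlparse

# The URL being checked against Google's robots.txt rules.
url = "http://www.google.com/catalogs/p?"

# robotparser normalizes the URL roughly like this before matching it
# against Allow/Disallow entries: urlparse() splits off the query part,
# and index [2] keeps only the path component.
path = quote(urlparse(unquote(url))[2])

# The "?" belongs to the (empty) query string, not the path, so it is
# dropped and the rule "Allow: /catalogs/p?" can never match.
print(path)  # -> /catalogs/p
```

Because the normalized path "/catalogs/p" no longer ends in "?", it falls through to the broader "Disallow: /catalogs" rule instead of the more specific Allow entry.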