Senthil Kumaran added the comment:

Hello Eduardo,
I fail to see the bug here. The robotparser module is for reading and parsing the robots.txt file; the module responsible for fetching it could be urllib. robots.txt is always available from the web server, and you can download it by any means, even by using robotparser's read() after providing the full URL to robots.txt. You do not need to set a User-Agent to read/fetch the robots.txt file.

Once it is fetched, when you crawl the site using your custom-written crawler or using urllib, you can honor the User-Agent requirement by sending the proper headers with your request. That can be done using the urllib module itself, and I believe there is documentation on adding headers. I think this is the way most folks would be (or, I believe, are) using it.

Am I missing something? If my explanation above is okay, then we can close this bug as invalid.

Thanks,
Senthil
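P.S. Here is a minimal sketch of the workflow I mean, written with the Python 3 module names (urllib.robotparser and urllib.request; on Python 2 the equivalents are robotparser and urllib2). The URL and the "MyCrawler/1.0" agent string are placeholders, not anything from your report:

    import urllib.robotparser
    import urllib.request

    # Fetching and parsing robots.txt needs no special User-Agent.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()

    # Ask the parsed rules whether our agent may fetch a page.
    if rp.can_fetch("MyCrawler/1.0", "http://example.com/some/page"):
        # Honor the User-Agent requirement by sending the header
        # ourselves with the actual crawl request.
        req = urllib.request.Request(
            "http://example.com/some/page",
            headers={"User-Agent": "MyCrawler/1.0"},
        )
        with urllib.request.urlopen(req) as resp:
            page = resp.read()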