[issue31661] Issues with request rate in robotparser
Nikolay Bogoychev <nhe...@gmail.com> added the comment: Hey Serhiy, The use of namedtuple was specifically requested during review; I didn't implement it this way initially: https://bugs.python.org/review/16099/#ps6205 I wasn't aware of the performance implications. Could you please explain the type vs. instance distinction in terms of performance (or point me to a resource; a quick search didn't yield anything)? How should I have coded it properly? Cheers, Nick -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue31661> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
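The question above can be illustrated with a small sketch. It assumes the concern is that a namedtuple *type* was being created on every call, instead of creating the type once and then only making *instances* of it. The function names `parse_rate_shared` and `parse_rate_per_call` are hypothetical and are not taken from the patch.

```python
from collections import namedtuple

# Created once at import time; every parsed value is a cheap instance of it.
RequestRate = namedtuple("RequestRate", "requests seconds")


def parse_rate_shared(value):
    """Parse 'requests/seconds' into an instance of the shared type."""
    requests, seconds = value.split("/")
    return RequestRate(int(requests), int(seconds))


def parse_rate_per_call(value):
    """Same result, but builds a brand-new class on every call (assumed anti-pattern)."""
    req_rate = namedtuple("req_rate", "requests seconds")  # new type each time
    requests, seconds = value.split("/")
    return req_rate(int(requests), int(seconds))


a = parse_rate_shared("3/15")
b = parse_rate_shared("1/5")
c = parse_rate_per_call("3/15")
d = parse_rate_per_call("1/5")

print(type(a) is type(b))  # the shared type is reused
print(type(c) is type(d))  # each call produced a distinct class
```

Building a class is much more work than building a tuple instance, so defining the type once at module level is the usual way to avoid the overhead.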
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Hey, Friendly reminder that there has been no activity on this issue for more than a year. Cheers, Nick -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16099> ___
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Hey, Just a friendly reminder that the patch is pending review and there has been no activity for 3 months (:
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Hey, Just a friendly reminder that there has been no activity for a month and a half and v3 is pending review (:
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Updated patch with all comments addressed; sorry for the six-month delay. Please review. -- Added file: http://bugs.python.org/file35377/robotparser_v3.patch
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Hey, Just a friendly reminder that there hasn't been any activity for a month and I have released a v2, pending review (:
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment:

Thank you for the review! I have addressed your comments and released a v2 of the patch.

Highlights:
  No longer crashes when provided with a malformed crawl-delay/robots.txt parameter.
  Returns None when a parameter is missing or its syntax is invalid.
  Simplified several functions.
  Extended tests.

http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser.rst
File Doc/library/urllib.robotparser.rst (right):

http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser
Doc/library/urllib.robotparser.rst:56: .. method:: crawl_delay(useragent)
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is crawl_delay used for search engines? Google recommends you to set crawl speed via Google Webmaster Tools instead. See https://support.google.com/webmasters/answer/48620?hl=en.

Crawl delay and request rate parameters are targeted at the custom crawlers that many people and companies write for specific tasks. Google Webmaster Tools targets only Google's crawler, and web admins typically set different rates for Google/Yahoo/Bing and for all other user agents.

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py
File Lib/urllib/robotparser.py (right):

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py#newco...
Lib/urllib/robotparser.py:168: for entry in self.entries:
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is there a better way to calculate this? (perhaps O(1)?)

I have followed the model of the existing code. An O(1) implementation (probably based on dictionaries) would require a complete rewrite of this library, as all previously implemented functions use the "for entry in self.entries: if entry.applies_to(useragent):" logic. I don't think this matters much here, as these two functions only need to be called once per domain and a robots.txt file seldom contains more than 3 entries. This is why I have simply followed the design laid out by the original developer.

Thanks
Nick

-- Added file: http://bugs.python.org/file33071/robotparser_v2.patch
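The linear-scan pattern described in the reply above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the library's code: the `Entry` class here is a hypothetical stand-in for one robots.txt record, and the free function `crawl_delay` takes the entry list as an argument instead of being a method.

```python
class Entry:
    """Hypothetical stand-in for one robots.txt record."""

    def __init__(self, useragents, delay=None):
        self.useragents = useragents
        self.delay = delay

    def applies_to(self, useragent):
        # Compare against the product token only, case-insensitively.
        useragent = useragent.split("/")[0].lower()
        return any(a == "*" or a.lower() in useragent for a in self.useragents)


def crawl_delay(entries, useragent):
    """Return the delay of the first matching entry, or None if nothing matches."""
    for entry in entries:  # O(n) in the number of entries
        if entry.applies_to(useragent):
            return entry.delay
    return None


entries = [Entry(["slowbot"], delay=10), Entry(["*"])]
print(crawl_delay(entries, "SlowBot/1.0"))
print(crawl_delay(entries, "otherbot"))
```

In this sketch matching is by substring and wildcard, which is one reason a plain dictionary keyed on the user agent would not be a drop-in replacement for the loop.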
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Oh... Sorry for the spam. Could you please verify my documentation link syntax? I'm not entirely sure I got it right.
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Hey, it has been more than a year since the last activity. Is there anything else I should do in order for someone on the Python dev team to review my changes and perhaps give some feedback? Nick
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Okay, here's a proper patch with documentation entry and test cases. Please review and comment -- Added file: http://bugs.python.org/file27476/robotparser.patch
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Reformatted patch -- Added file: http://bugs.python.org/file27477/robotparser_reformatted.patch
[issue16099] robotparser doesn't support request rate and crawl delay parameters
New submission from Nikolay Bogoychev:

Robotparser doesn't support two quite important optional parameters from the robots.txt file. I have implemented them as follows. (Robotparser should be initialized in the usual way:

    rp = robotparser.RobotFileParser()
    rp.set_url(..)
    rp.read()
)

crawl_delay(useragent) - Returns the time in seconds that you need to wait before crawling. If none is specified, or it doesn't apply to this user agent, returns -1.
request_rate(useragent) - Returns a list of the form [request, seconds]. If none is specified, or it doesn't apply to this user agent, returns -1.

--
components: Library (Lib)
files: robotparser.patch
keywords: patch
messages: 171711
nosy: XapaJIaMnu
priority: normal
severity: normal
status: open
title: robotparser doesn't support request rate and crawl delay parameters
type: enhancement
versions: Python 2.7

Added file: http://bugs.python.org/file27373/robotparser.patch
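As a rough illustration of the two methods proposed above, here is how they can be called in the form that exists in `urllib.robotparser` in Python 3.6 and later, not in the form of this original Python 2.7 patch. In that later form a missing or non-applicable value yields None instead of -1, and `request_rate` returns a named tuple instead of a list. The robots.txt content and the user agent names are made up for the example, and `parse()` is used instead of `set_url()`/`read()` so that no network access is needed.

```python
import urllib.robotparser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: slowbot
Crawl-delay: 10
Request-rate: 3/15
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching a URL

delay = rp.crawl_delay("slowbot")    # seconds to wait between requests
rate = rp.request_rate("slowbot")    # named tuple with .requests and .seconds
no_delay = rp.crawl_delay("otherbot")  # no Crawl-delay applies to this agent

print(delay, rate.requests, rate.seconds, no_delay)
```

A crawler would typically call these once per domain after loading robots.txt and then throttle itself accordingly.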
[issue16099] robotparser doesn't support request rate and crawl delay parameters
Nikolay Bogoychev added the comment: Okay, sorry, I didn't know that (: Here's the same patch (same functionality) for Python 3. Feedback is welcome, as always (: -- Added file: http://bugs.python.org/file27374/robotparser.patch