Re: Rate limiting a web crawler

Simon Connah Wed, 26 Dec 2018 11:37:21 -0800

On 26/12/2018 19:04, Terry Reedy wrote:

On 12/26/2018 10:35 AM, Simon Connah wrote:
Hi,
I want to build a simple web crawler. I know how I am going to do it but I have one problem.
Obviously I don't want to negatively impact any of the websites that I am crawling so I want to implement some form of rate limiting of HTTP requests to specific domain names.
What I'd like is some form of timer which calls a piece of code, say every 5 seconds or something, and that code is what goes off and crawls the website.
I'm just not sure on the best way to call code based on a timer.
Could anyone offer some advice on the best way to do this? It will be running on Linux and using the python-daemon library to run it as a service and will be using at least Python 3.6.
You can use asyncio to make repeated non-blocking requests to a web site at timed intervals and to work with multiple websites at once. You can do the same with tkinter except that requests would block until a response unless you implemented your own polling.


Thank you. I'll look into asyncio.
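
For reference, here is a minimal sketch of what the asyncio approach suggested above might look like, assuming one request per domain every few seconds is the goal. The names DomainRateLimiter, fetch, crawl and run_crawl are made up for illustration, and fetch is only a placeholder where a real HTTP request would go. It sticks to calls that exist in Python 3.6 (on 3.7+ asyncio.run() could replace the manual loop handling).

```python
import asyncio
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Hypothetical helper: at most one request per `interval` seconds per domain."""

    def __init__(self, interval=5.0):
        self.interval = interval
        self._locks = {}    # domain -> asyncio.Lock, serialises waiters per domain
        self._next_ok = {}  # domain -> earliest monotonic time of the next request

    async def wait(self, url):
        domain = urlparse(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            delay = self._next_ok.get(domain, now) - now
            if delay > 0:
                # Only tasks for this domain wait here; other domains carry on.
                await asyncio.sleep(delay)
            self._next_ok[domain] = time.monotonic() + self.interval


async def fetch(url):
    # Placeholder (assumption): a real crawler would issue an HTTP request here.
    await asyncio.sleep(0)
    return f"fetched {url}"


async def crawl(urls, limiter):
    async def worker(url):
        await limiter.wait(url)
        return await fetch(url)

    # gather() returns the results in the same order as `urls`.
    return await asyncio.gather(*(worker(u) for u in urls))


def run_crawl(urls, interval=5.0):
    # Explicit loop handling so the sketch also runs on Python 3.6.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(crawl(urls, DomainRateLimiter(interval)))
    finally:
        loop.close()
```

Calling run_crawl(["http://a.example/1", "http://a.example/2", "http://b.example/1"]) would fetch the first a.example URL and the b.example URL straight away and hold the second a.example URL back for the interval. The lock is held while sleeping on purpose, so requests to the same domain queue up one behind the other.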
--
https://mail.python.org/mailman/listinfo/python-list
