On 26/12/2018 19:04, Terry Reedy wrote:
On 12/26/2018 10:35 AM, Simon Connah wrote:
Hi,

I want to build a simple web crawler. I know how I am going to do it but I have one problem.

Obviously I don't want to negatively impact any of the websites that I am crawling so I want to implement some form of rate limiting of HTTP requests to specific domain names.

What I'd like is some form of timer which calls a piece of code say every 5 seconds or something and that code is what goes off and crawls the website.

I'm just not sure on the best way to call code based on a timer.

Could anyone offer some advice on the best way to do this? It will run on Linux as a service, using the python-daemon library, on Python 3.6 or later.

You can use asyncio to make repeated non-blocking requests to a web site at timed intervals and to work with multiple websites at once. You can do the same with tkinter, except that each request would block until its response arrived unless you implemented your own polling.
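
As a rough illustration of the asyncio approach, here is a minimal sketch, not a finished crawler. The names crawl_periodically, crawl_all and fake_fetch are made up for this example, and fake_fetch only pretends to do a request; a real crawler would swap in an async HTTP client there. Each domain gets its own coroutine that waits out the rest of the interval between requests, so several sites are handled at once without any single site being hit more often than the interval allows.

```python
import asyncio
import time


async def fake_fetch(domain):
    """Hypothetical stand-in for a real non-blocking HTTP request."""
    await asyncio.sleep(0.01)  # pretend the network took a moment
    return "fetched " + domain


async def crawl_periodically(domain, fetch, interval=5.0, rounds=3):
    """Await fetch(domain) `rounds` times, starting each call `interval` seconds apart."""
    results = []
    for i in range(rounds):
        started = time.monotonic()
        results.append(await fetch(domain))
        # Sleep only for what is left of the interval, so a slow response
        # does not stretch the gap between requests beyond `interval`.
        remaining = interval - (time.monotonic() - started)
        if i < rounds - 1 and remaining > 0:
            await asyncio.sleep(remaining)
    return results


async def crawl_all(domains, interval=5.0, rounds=3):
    """Run one rate-limited crawl loop per domain, all concurrently."""
    tasks = [crawl_periodically(d, fake_fetch, interval, rounds) for d in domains]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # run_until_complete also works on Python 3.6; asyncio.run() needs 3.7+.
    loop = asyncio.new_event_loop()
    try:
        # Short interval so the demo finishes quickly; the default is 5 seconds.
        print(loop.run_until_complete(
            crawl_all(["example.com", "example.org"], interval=0.5)))
    finally:
        loop.close()
```

Because each loop sleeps with asyncio.sleep rather than time.sleep, waiting on one domain never holds up the others. A long-running service would presumably loop forever instead of stopping after a fixed number of rounds.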


Thank you. I'll look into asyncio.
--
https://mail.python.org/mailman/listinfo/python-list
