Thanks. As I understand it, though, there is only so much you can do with threading; for more scalable solutions you need to go with async programming techniques. See http://www.kegel.com/c10k.html for a summary of the problem. I want to do large-scale web crawling and am not sure whether wget2 is up to the job.
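To make the select-based approach concrete, here is a minimal sketch of the kind of event loop such a crawler would be built around, using Python's selectors module (which wraps select/epoll). Local socket pairs stand in for HTTP connections so the example is self-contained; the function name and echo protocol are illustrative, not anything from wget or wget2.

```python
# Sketch of select-style I/O multiplexing: one loop drives many
# connections concurrently, the core idea behind an async crawler.
import selectors
import socket

def fetch_many(messages):
    """Send each message over its own connection and collect the echoes,
    multiplexing all connections in a single selectors loop."""
    sel = selectors.DefaultSelector()
    results = {}
    pairs = []
    for i, msg in enumerate(messages):
        server, client = socket.socketpair()
        server.setblocking(False)
        client.setblocking(False)
        # The "server" side stands in for a remote host: it echoes input.
        sel.register(server, selectors.EVENT_READ, ("server", i))
        sel.register(client, selectors.EVENT_WRITE, ("client", i, msg))
        pairs.append((server, client))

    pending = len(messages)
    while pending:
        for key, events in sel.select():
            role = key.data[0]
            if role == "client" and events & selectors.EVENT_WRITE:
                _, i, msg = key.data
                key.fileobj.send(msg)
                # Request written; now wait for the response.
                sel.modify(key.fileobj, selectors.EVENT_READ, ("reader", i))
            elif role == "server" and events & selectors.EVENT_READ:
                data = key.fileobj.recv(4096)
                key.fileobj.send(data)  # echo the "request" back
                sel.unregister(key.fileobj)
            elif role == "reader" and events & selectors.EVENT_READ:
                _, i = key.data
                results[i] = key.fileobj.recv(4096)
                sel.unregister(key.fileobj)
                pending -= 1

    for s, c in pairs:
        s.close()
        c.close()
    sel.close()
    return [results[i] for i in range(len(messages))]

print(fetch_many([b"GET /a", b"GET /b", b"GET /c"]))
```

A real crawler would replace the socket pairs with non-blocking TCP connections and layer HTTP parsing and a URL frontier on top, but the single-loop, readiness-driven structure is the same.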
On Tue, Jul 31, 2018 at 6:22 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> On 31.07.2018 18:39, James Read wrote:
> > Hi,
> >
> > how much work would it take to convert wget into a fully fledged
> > asynchronous webcrawler?
> >
> > I was thinking something like using select. Ideally, I want to be able to
> > supply wget with a list of starting point URLs and then for wget to crawl
> > the web from those starting points in an asynchronous fashion.
> >
> > James
>
> Just use wget2. It is already packaged in Debian sid.
> To build from git source, see https://gitlab.com/gnuwget/wget2.
>
> To build from tarball (much easier), download from
> https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
>
> Regards, Tim