Thanks. As I understand it, though, there is only so much you can do with threading; for more scalable solutions you need to go with async programming techniques. See http://www.kegel.com/c10k.html for a summary of the problem. I want to do large-scale web crawling and am not sure whether wget2 is up to the job.
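To make the select-based approach concrete, here is a minimal sketch of the kind of event loop such a crawler would be built around, using Python's selectors module (which wraps select/epoll). Local socket pairs stand in for HTTP connections so the example is self-contained; the function name and echo protocol are illustrative, not anything from wget or wget2.

```python
# Sketch of select-style I/O multiplexing: one loop drives many
# connections concurrently, the core idea behind an async crawler.
import selectors
import socket

def fetch_many(messages):
    """Send each message over its own connection and collect the echoes,
    multiplexing all connections in a single selectors loop."""
    sel = selectors.DefaultSelector()
    results = {}
    pairs = []
    for i, msg in enumerate(messages):
        server, client = socket.socketpair()
        server.setblocking(False)
        client.setblocking(False)
        # The "server" side stands in for a remote host: it echoes input.
        sel.register(server, selectors.EVENT_READ, ("server", i))
        sel.register(client, selectors.EVENT_WRITE, ("client", i, msg))
        pairs.append((server, client))

    pending = len(messages)
    while pending:
        for key, events in sel.select():
            role = key.data[0]
            if role == "client" and events & selectors.EVENT_WRITE:
                _, i, msg = key.data
                key.fileobj.send(msg)
                # Request written; now wait for the response.
                sel.modify(key.fileobj, selectors.EVENT_READ, ("reader", i))
            elif role == "server" and events & selectors.EVENT_READ:
                data = key.fileobj.recv(4096)
                key.fileobj.send(data)  # echo the "request" back
                sel.unregister(key.fileobj)
            elif role == "reader" and events & selectors.EVENT_READ:
                _, i = key.data
                results[i] = key.fileobj.recv(4096)
                sel.unregister(key.fileobj)
                pending -= 1

    for s, c in pairs:
        s.close()
        c.close()
    sel.close()
    return [results[i] for i in range(len(messages))]

print(fetch_many([b"GET /a", b"GET /b", b"GET /c"]))
```

A real crawler would replace the socket pairs with non-blocking TCP connections and layer HTTP parsing and a URL frontier on top, but the single-loop, readiness-driven structure is the same.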
On Tue, Jul 31, 2018 at 6:22 PM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> On 31.07.2018 18:39, James Read wrote:
> > Hi,
> >
> > how much work would it take to convert wget into a fully fledged
> > asynchronous webcrawler?
> >
> > I was thinking something like using select. Ideally, I want to be able to
> > supply wget with a list of starting point URLs and then for wget to crawl
> > the web from those starting points in an asynchronous fashion.
> >
> > James
>
> Just use wget2. It is already packaged in Debian sid.
> To build from git source, see https://gitlab.com/gnuwget/wget2.
>
> To build from tarball (much easier), download from
> https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
>
> Regards, Tim