At 06:25 AM 3/3/2002 -0800, Randal L. Schwartz wrote:
>
>If you need a decent spidering linkchecker with more tweaking configuration
>settings and parallel processing power than is reasonable (:-), check
><http://www.stonehenge.com/merlyn/LinuxMag/col16.html>, downloadable
>from <http://www.stonehenge.com/merlyn/LinuxMag/col16.listing.txt>.
>It retries bad "HEAD" requests as "GET", for example.
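The HEAD-then-GET retry Randal mentions works around servers that mishandle HEAD requests. A minimal sketch of the idea (in Python rather than Perl, with a made-up `fetch` callback standing in for the real HTTP client — none of these names come from Randal's program):

```python
# Sketch of "retry a failed HEAD as a GET". `fetch(method, url)` is an
# assumed callback returning an HTTP status code; in a real checker it
# would wrap an HTTP client library.

def check_link(url, fetch):
    """Return the final status for url.

    Some servers reject or botch HEAD requests, so a failing HEAD is
    retried as a GET before the link is declared broken.
    """
    status = fetch("HEAD", url)
    if status >= 400:
        # Server may simply not support HEAD; give GET a chance.
        status = fetch("GET", url)
    return status
```

With a server that answers 405 to HEAD but 200 to GET, `check_link` reports the link as good.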

Oh Randal, that made me flash back to my comp.lang.perl.misc days, where I
believe we discussed spidering quite a bit.

I've got my own collection of parallel link checking programs that retry
bad links as a GET, retry them again a few hours later, split them up
into sections by who is responsible for each link, and then email that
person a report.  One also compares the directory tree against the web
tree, looking for "orphaned" files.
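That orphan check boils down to a set difference: map every URL the spider reached back to a path under the document root, then subtract from the files actually on disk. A hedged sketch (again Python, with illustrative names — not Bill's actual code):

```python
# Sketch of orphan detection: files present in the document root that
# the spider never reached. `doc_root` and both inputs are assumed to
# use consistent path conventions.

def find_orphans(files_on_disk, url_paths_seen, doc_root="/var/www"):
    """Return disk paths that no spidered URL maps onto, sorted."""
    reached = {doc_root + path for path in url_paths_seen}
    return sorted(set(files_on_disk) - reached)
```

In practice you would also have to account for index files, redirects, and dynamically generated pages before trusting the "orphan" label.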

I think writing a spider might be up there with writing a templating system
as a Perl rite of passage. ;)

Thanks.  I'll peek at your code again.



Bill Moseley
mailto:[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
