According to Geoff Hutchison:
> On Fri, 28 Jul 2000, Jonathan Bartlett wrote:
> > I once wrote a spider program that ran into the same problem. The way I
> > fixed it there was to add an option for the maximum URL size. This should
> > prevent such a loop. The default could be infinite, or just a really huge
> > number.
>
> Nah, max_hop_count is IMHO a more elegant way of doing it. Who knows why
> you might want to have some very long URL, but there's probably no reason
> to be descending beyond some number of hops from your top page.
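
(For what it's worth, max_hop_count is already settable in the conf file
like any other attribute; the value here is just an illustration:

    max_hop_count: 4

That would keep the dig within 4 links of the start_url, so a runaway
listing could only drag the retriever 4 levels deep before it gets cut off.)
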
>
> Of course a duplicate detection scheme (e.g. checksumming the pages) would
> be nice, but it doesn't look like that's going to happen unless someone
> volunteers to do it soon.
I don't know that a checksum would catch this problem, if the "page"
being repeatedly indexed is a dynamically generated directory listing.
Apache would just keep lengthening the path, and that path shows up in
the title and h1 header of the page, so the checksum would be different
each time. I think exclude_urls is the best attribute for dealing with
this particular problem.
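
For example, if the runaway listings all share some repeating piece of the
path, a line along these lines in your conf file should keep htdig away from
them (the /foo/foo/ pattern is only a placeholder; substitute whatever
substring actually shows up in the looping URLs, and I believe the first two
entries are the stock defaults you'd want to keep):

    exclude_urls: /cgi-bin/ .cgi /foo/foo/

exclude_urls is just a space-separated list of substrings; any URL containing
one of them is skipped rather than retrieved.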
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930