Webmaster writes:
> Geoff Hutchison writes:
> >We have lots of links on our website and it's annoying to see duplicates in
> >search results. But the problem with duplicate detection is deciding which
> >duplicate to use! My current thought is to use the document with the lower
> >hopcount.
> shortest hopcount is somewhat reasonable...
> also might want to use the one with the shortest URL (or is this what
> you mean by shortest hopcount?),
> or maybe put in some kind of 'server ranking'...
Why not take the shortest-hopcount URL, _unless_ the machine name to
show is explicitly stated in the 'server_aliases' directive?
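To make the preference order concrete, here is a minimal sketch of such a
tie-breaking rule. Everything in it is illustrative: the `SERVER_ALIASES`
list stands in for a server_aliases directive, and the hostnames are made
up. It prefers an aliased host first, then the lower hopcount, then the
shorter URL.

```python
# Illustrative stand-in for a server_aliases directive.
SERVER_ALIASES = ["www.example.org"]

def pick_duplicate(candidates):
    """candidates: list of (url, hopcount) pairs; return the preferred URL.

    Preference: aliased host first, then lowest hopcount,
    then shortest URL as a final tie-breaker.
    """
    def rank(item):
        url, hopcount = item
        # Crude hostname extraction, good enough for a sketch.
        host = url.split("/")[2] if "//" in url else url
        aliased = 0 if host in SERVER_ALIASES else 1
        return (aliased, hopcount, len(url))
    return min(candidates, key=rank)[0]
```

For example, given the same document reachable on a mirror and on the
aliased host, the aliased copy would win even at equal hopcount.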
And a second thought on document checksums: quite often I see root
documents of more than 100 kb in size. Why not just compute a checksum
of the headers? An HTTP/1.1 HEAD request would be sufficient, and IMHO
this would save a lot of bandwidth and computing time.
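A rough sketch of the idea, under the assumption that headers such as
Content-Length, Last-Modified, and ETag are enough to tell copies apart
(which will not hold for every server). The hashing part is separate from
the fetching part; `head_checksum` and the header selection are my own
illustrative names, not anything from htdig.

```python
import hashlib
import http.client
from urllib.parse import urlparse

# Headers assumed (not guaranteed) to distinguish duplicate documents.
RELEVANT = ("content-length", "last-modified", "etag")

def header_checksum(headers):
    """Checksum a mapping of lowercased header name -> value."""
    digest = hashlib.md5()
    for name in RELEVANT:
        digest.update(f"{name}:{headers.get(name, '')}\n".encode())
    return digest.hexdigest()

def head_checksum(url):
    """Issue a HEAD request and checksum the response headers only."""
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        headers = {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()
    return header_checksum(headers)
```

The trade-off is that two different documents with identical sizes and
timestamps would collide, so a header checksum works better as a cheap
first-pass filter than as a final equality test.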
-Walter
--
Walter Hafner_______________________________ [EMAIL PROTECTED]
<A href=http://www.tum.de/~hafner/>*CLICK*</A>
The best observation I can make is that the BSD Daemon logo
is _much_ cooler than that Penguin :-) (Donald Whiteside)
----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the body of the message.