Webmaster writes:

> Geoff Hutchison writes:

> > We have lots of links on our website and it's annoying to see duplicates in
> > search results. But the problem with duplicate detection is deciding which
> > duplicate to use! My current thought is to use the document with the lower
> > hopcount.

> Shortest hopcount is somewhat reasonable...
> You might also want to use the one with the shortest URL (or is this what
> you mean by shortest hopcount?), or maybe put in some kind of 'server
> ranking'...

Why not take the URL with the shortest hopcount _unless_ the machine name
to show is explicitly stated in the 'server_aliases' directive?
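To make the idea concrete, here is a small sketch of that selection rule.
Everything here is illustrative: `server_aliases` is modelled as a plain
dict mapping alias hostnames to the preferred hostname, and the function
names are made up, not ht://Dig's actual API.

```python
from urllib.parse import urlparse, urlunparse

def pick_canonical(duplicates, server_aliases):
    """Pick which duplicate URL to show in search results.

    `duplicates` is a list of (url, hopcount) pairs for documents with
    identical content; `server_aliases` maps alias hostnames to the
    preferred hostname (a stand-in for the 'server_aliases' directive).
    """
    def canonicalize(url):
        parts = urlparse(url)
        host = server_aliases.get(parts.hostname, parts.hostname)
        return urlunparse(parts._replace(netloc=host))

    def rank(item):
        url, hops = item
        # An explicitly configured hostname wins; otherwise fall back to
        # the lowest hopcount, then the shortest URL as a tie-breaker.
        preferred = urlparse(url).hostname in server_aliases.values()
        return (not preferred, hops, len(url))

    best_url, _ = min(duplicates, key=rank)
    return canonicalize(best_url)
```

So a mirror URL would be rewritten to the configured name even if it was
found first, and only when no alias rule applies does hopcount decide.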

And a second thought on document checksums: quite often I see root
documents more than 100 kB in size. Why not just compute a checksum of
the headers? An HTTP/1.1 'HEAD' request would be sufficient, and IMHO
this would save a lot of bandwidth and computing time.
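A rough sketch of what I mean, split into a pure hashing step and the
HEAD request itself. The function names and the choice of which headers
to treat as volatile are my own assumptions, not anything ht://Dig does
today; in particular, headers like Date change on every request and
would have to be excluded for the checksum to be usable as a duplicate
test at all.

```python
import hashlib
import http.client

def checksum_headers(headers):
    """MD5 over a list of (name, value) header pairs, skipping headers
    that change on every request, so two responses with the same stable
    headers hash identically regardless of order."""
    volatile = {"date", "expires", "set-cookie"}  # assumption: skip these
    stable = sorted(
        (k.lower(), v) for k, v in headers if k.lower() not in volatile
    )
    h = hashlib.md5()
    for k, v in stable:
        h.update(f"{k}: {v}\n".encode())
    return h.hexdigest()

def head_checksum(host, path="/"):
    """Issue an HTTP HEAD request (headers only, no body is transferred)
    and checksum the stable response headers."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return checksum_headers(conn.getresponse().getheaders())
    finally:
        conn.close()
```

Of course this only detects duplicates whose servers report the same
Content-Length, Last-Modified and so on; it is a cheap first-pass
filter, not a replacement for a body checksum.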

-Walter

-- 
Walter Hafner_______________________________ [EMAIL PROTECTED]
       <A href=http://www.tum.de/~hafner/>*CLICK*</A>
 The best observation I can make is that the BSD Daemon logo
 is _much_ cooler than that Penguin :-)   (Donald Whiteside)