According to Ivan C Chang:
> After htdig finished indexing my site, I discovered that some of the URLs
> are duplicated with the following characteristics:
>
> http://www.myschool.edu./
>
> i.e. besides indexing http://www.myschool.edu, there's an extra . at the end
> of the above URL. This happens for every descendant link that follows, and as
> a result there are large numbers of duplicates. I tried to find pages that
> mistakenly contain links of the form http://www.myschool.edu. explicitly,
> but haven't found any yet.
>
> Isn't htdig smart enough to remove the . during the normalization process?
> How could I deal with this problem?
All it takes is one such link to make htdig traverse what it thinks is an
entirely different hierarchy on another server. You could try a server_aliases
attribute setting like this:
server_aliases: www.myschool.edu.:80=www.myschool.edu:80
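
If you would rather track down the offending link, or clean up a list of URLs
outside of htdig, something along these lines might help. It is only a rough
Python sketch and has nothing to do with htdig's own code; the function names
has_trailing_dot_host and strip_trailing_dot are made up for illustration, and
it assumes the URLs carry no user:password part (that would be dropped).

```python
from urllib.parse import urlsplit, urlunsplit


def has_trailing_dot_host(url):
    """Return True if the host name in url ends with a dot."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".")


def strip_trailing_dot(url):
    """Return url with any trailing dot removed from its host name.

    Hypothetical helper, not htdig's normalization. Assumes the URL has
    no userinfo (user:password@) component, which is not preserved.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith("."):
        return url  # nothing to fix, leave the URL untouched
    host = host.rstrip(".")
    # Keep an explicit port if the original URL had one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

Running has_trailing_dot_host over the links extracted from your pages should
point at whichever one starts the duplicate hierarchy.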
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930