Maybe someone has encountered the following problem:
After htdig finished indexing my site, I discovered that some of the URLs
are duplicated, in the following form:
http://www.myschool.edu./
i.e. besides indexing http://www.myschool.edu, htdig also indexes the same
site with an extra . after the hostname, as in the URL above. This happens
for every descendant link that follows, so there are large numbers of
duplicates. I tried to find pages that mistakenly contain explicit links
of the form http://www.myschool.edu. but haven't found any yet.
Isn't htdig smart enough to remove the . during the normalization process?
How could I deal with this problem?
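To make clear what I mean by normalization, here is a rough Python sketch
of the behaviour I expected. This is not htdig code, and the function name
strip_trailing_host_dot is made up for illustration; it only assumes that
a trailing . on the hostname should be dropped before URLs are compared.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_host_dot(url):
    """Return url with any trailing '.' removed from its hostname.

    Hypothetical helper for illustration only; not part of htdig.
    """
    parts = urlsplit(url)
    host = parts.hostname  # lowercased by urlsplit, without port or userinfo
    if not host or not host.endswith("."):
        return url  # nothing to normalize, leave the URL untouched

    # Rebuild the network location: host without the dot, plus port/userinfo.
    netloc = host.rstrip(".")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc

    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(strip_trailing_host_dot("http://www.myschool.edu./"))
```

With something like this applied to every URL, http://www.myschool.edu./
and http://www.myschool.edu/ would map to the same entry instead of being
indexed twice.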
Ivan Chang