According to Eric Bliss:
> Htdig has been acting well for us for some time now, but there is one
> glitch that has been brought to my attention.
>
> We have a number of websites which are updated on a regular basis.
> Because of this, old pages are being unlinked every week from the main
> body of the site. To keep these pages in the search engine database (as
> opposed to being lost forever), I've created a page for each website
> that just consists of the URLs of each of these pages. At the top of
> these pages, I place the meta tags to tell htdig to follow the links,
> but not index the page <META NAME="ROBOTS" CONTENT="NOINDEX">. I use
> these pages as the base documents for htdig to crawl from.
>
> My problem is that although htdig's website says that it follows the
> robot rules, my index documents still show up when a search is done.
> Is there a different tag I should be using, or do you need to specify
> a setting in htdig for it to obey robot rules?
There's a subtle bug in 3.1.5 and earlier versions. The content parameter
of the meta robots tag should be case-insensitive, but htdig in those
versions expects lower-case. You can either change the tag to use a
lower-case value (CONTENT="noindex"), or apply this patch to fix the
code:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/robotsCaseI.0
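
To illustrate what case-insensitive handling of the content parameter
looks like, here is a minimal sketch in Python. It is not htdig's code
(htdig is written in C++) and it is not the patch above; the names
RobotsMetaParser and robots_directives are hypothetical and exist only
for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects noindex/nofollow from <meta name="robots"> tags.

    Hypothetical helper for illustration; not part of htdig.
    """

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names,
        # but attribute values arrive exactly as written in the page.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # Lower-case each directive before comparing, so NOINDEX,
        # NoIndex and noindex are all treated the same way.
        for directive in (attr.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive in ("noindex", "none"):
                self.noindex = True
            if directive in ("nofollow", "none"):
                self.nofollow = True


def robots_directives(html):
    """Return (noindex, nofollow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex, parser.nofollow
```

With this approach the upper-case tag from the question and its
lower-case equivalent give the same result, which is the behaviour the
patch is meant to restore in htdig.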
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930