[ Reposted from [EMAIL PROTECTED] ]
From: Gilles Detillieux <[EMAIL PROTECTED]>
Subject: Re: htdig ignores noindex META-Tag (PR#810)
To: [EMAIL PROTECTED]
Date: Fri, 17 Mar 2000 11:08:31 -0600 (CST)
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
According to David Pruem ([EMAIL PROTECTED]):
> ht://Dig ignored the following directives in a bunch of pages and indexed
> them.
>
> <HTML>
> <HEAD>
> <TITLE>7</TITLE>
> <META NAME="robots" CONTENT="NOINDEX,FOLLOW">
> </HEAD>
>
> Have you any idea what could cause this behaviour?
Oops! The standard clearly says that the name and contents of such tags
should be case insensitive, but when htdig looked at the content parameter,
it looked for words in lower case only! Clearly a bug. I've fixed it in
3.2, but here is the fix for 3.1.5...
--- htdig/HTML.cc.robotsbug Tue Feb 15 14:08:41 2000
+++ htdig/HTML.cc Fri Mar 17 10:59:38 2000
@@ -911,7 +911,7 @@ HTML::do_tag(Retriever &retriever, Strin
             && strlen(conf["content"]) !=0)
           {
             String content_cache = conf["content"];
-
+            content_cache.lowercase();
             if (content_cache.indexOf("noindex") != -1)
               {
                 doindex = 0;
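
For anyone who wants to see the idea outside of htdig's own String class,
here is a minimal sketch in standard C++. The function name has_noindex is
hypothetical and is not part of ht://Dig; it only illustrates the same
approach as the patch, which is to lowercase a copy of the CONTENT value
before searching it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not ht://Dig code): returns true if the CONTENT
// value of a robots META tag contains "noindex" in any letter case.
// The argument is taken by value so the caller's string is not modified.
bool has_noindex(std::string content) {
    // Lowercase the local copy, mirroring content_cache.lowercase() in the
    // patch. The unsigned char parameter avoids undefined behaviour when
    // std::tolower is given a negative char value.
    std::transform(content.begin(), content.end(), content.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    // Same substring test as indexOf("noindex") != -1 in the patch.
    return content.find("noindex") != std::string::npos;
}
```

With this sketch, has_noindex("NOINDEX,FOLLOW") and has_noindex("noindex")
both return true, while has_noindex("INDEX,FOLLOW") returns false.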
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930