A <meta name="robots" content="none"> tag, or any of the other ways of
telling htdig not to follow links from a page, runs into two small bugs.
Neither bug by itself would have caused the problem I saw. The following
patch seems to have fixed it.
*** HTML.orig Mon Nov 2 16:21:51 1998
--- HTML.cc Sat Nov 14 00:40:55 1998
*************** HTML::parse(Retriever &retriever, URL &b
*** 256,262 ****
if (description.length() > max_description_length)
{
description << " ...";
! retriever.got_href(*href, description);
in_ref = 0;
description = 0;
}
--- 256,263 ----
if (description.length() > max_description_length)
{
description << " ...";
! if (dofollow)
! retriever.got_href(*href, description);
in_ref = 0;
description = 0;
}
*************** HTML::do_tag(Retriever &retriever, Strin
*** 512,520 ****
}
case 3: // "/a"
! if (dofollow && in_ref)
{
! retriever.got_href(*href, description);
in_ref = 0;
}
break;
--- 513,522 ----
}
case 3: // "/a"
! if (in_ref)
{
! if (dofollow)
! retriever.got_href(*href, description);
in_ref = 0;
}
break;
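
For anyone who wants to see the patched control flow in isolation, here is a minimal, self-contained sketch. It is not htdig code: LinkRecorder, AnchorState, collect_links and the tiny max_description_length are hypothetical stand-ins, and std::string replaces htdig's own String class. It only mirrors the two changes above: got_href is called only when dofollow is set, and in_ref is cleared at "/a" whether or not links are being followed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for htdig's Retriever: it only records reported links.
struct LinkRecorder {
    std::vector<std::string> links;
    void got_href(const std::string &url) { links.push_back(url); }
};

// Simplified model of the parser state touched by the patch (an assumption,
// not the real HTML class).
struct AnchorState {
    bool dofollow = true;                    // false after robots "none"/"nofollow"
    bool in_ref = false;                     // true while inside <a href=...>
    std::string href;
    std::string description;
    std::size_t max_description_length = 8;  // tiny limit, for the demo only

    void open_anchor(const std::string &url) {
        href = url;
        description.clear();
        in_ref = true;
    }

    // Mirrors the first hunk: report the link on overflow only if dofollow.
    void add_text(LinkRecorder &r, const std::string &text) {
        if (!in_ref) return;
        description += text;
        if (description.length() > max_description_length) {
            description += " ...";
            if (dofollow)
                r.got_href(href);
            in_ref = false;
            description.clear();
        }
    }

    // Mirrors the second hunk: always leave the anchor, report only if dofollow.
    void close_anchor(LinkRecorder &r) {
        if (in_ref) {
            if (dofollow)
                r.got_href(href);
            in_ref = false;
        }
    }
};

// Feeds a short link followed by long body text through the model.
std::vector<std::string> collect_links(bool dofollow) {
    LinkRecorder r;
    AnchorState s;
    s.dofollow = dofollow;
    s.open_anchor("a.html");
    s.add_text(r, "short");
    s.close_anchor(r);
    s.add_text(r, "long body text after the link");
    return r.links;
}
```

In this model, collect_links(false) reports no links, while collect_links(true) reports the single link once, at the closing tag. With the pre-patch logic, in_ref would stay set after "/a" when dofollow is off, so the later body text would overflow the description and hit the unconditional got_href call, which appears to be how the two bugs combined.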