On Thu, 28 Feb 2002, Gilles Detillieux wrote:

> According to Bill Carlson:
> > I ran across a logical problem when handling <META name="robots" 
> > content="noindex"> on a page. The behavior expected is that links on the 
> > page will be followed and indexed. This works fine on the initial index.

> I believe this would be a problem with all 3.1.x and 3.2.0x releases.
> The problem is that when a document is marked as "noindex", it gets
> removed from db.docdb by htmerge or htpurge, so subsequent update runs
> of htdig don't check this file for changes (either in its noindex
> status, or in the links it can harvest from it) - it's off htdig's radar
> entirely at that point.

Yes, further examination showed this is exactly the problem. It is a small
problem once known, with two workarounds that I see: do an initial dig on
every reindex, or make sure every page has a link path to it from the
start_url that does not pass through any noindex document. I found that
ht://check is very handy for the latter option. :)
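
For the second workaround, the check boils down to a reachability test over
the link graph. Here is a minimal sketch in Python, assuming the link graph
and the list of noindex pages have already been pulled out by some other
means. The `links` dict, the `noindex` set, and the function name are all
hypothetical, not part of ht://Dig or ht://check, and the sketch assumes the
start_url itself is always fetched on an update run.

```python
from collections import deque


def pages_hidden_behind_noindex(links, noindex, start_url):
    """Return indexable pages that are only reachable through noindex pages.

    links:     dict mapping a URL to the list of URLs it links to (hypothetical input)
    noindex:   set of URLs whose robots META tag says "noindex"
    start_url: the URL the dig starts from
    """
    # Breadth-first walk from start_url that never passes through a noindex
    # page, mimicking an update run in which those pages have been dropped
    # from the database and are no longer revisited.
    # Assumption: start_url itself is always fetched, so it is always expanded.
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen and target not in noindex:
                seen.add(target)
                queue.append(target)

    # Any known indexable page the walk did not reach depends on a noindex page.
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(all_pages - noindex - seen)


if __name__ == "__main__":
    example_links = {
        "a": ["b", "c"],
        "b": ["d"],
        "c": ["e"],  # "c" is noindex, so "e" is only reachable through it
    }
    print(pages_hidden_behind_noindex(example_links, {"c"}, "a"))
```

Any page this reports needs either an extra link from an indexed page or an
initial dig to stay current.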

Thanks,

Bill Carlson
-- 
Systems Programmer    [EMAIL PROTECTED]         | Anything is possible,
Virtual Hospital      http://www.vh.org/      | given time and money.
University of Iowa Hospitals and Clinics      |       
Opinions are mine, not my employer's.         | 


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
