On Thu, 28 Feb 2002, Gilles Detillieux wrote:

> According to Bill Carlson:
> > I ran across a logical problem when handling <META name="robots"
> > content="noindex"> on a page. The behavior expected is that links on the
> > page will be followed and indexed. This works fine on the initial index.
> I believe this would be a problem with all 3.1.x and 3.2.0x releases.
> The problem is that when a document is marked as "noindex", it gets
> removed from db.docdb by htmerge or htpurge, so subsequent update runs
> of htdig don't check this file for changes (either in its noindex
> status, or in the links it can harvest from it) - it's off htdig's radar
> entirely at that point.

Yes, further examination showed that this is exactly the problem. Once
known, it is a small problem, and I see two workarounds: either do an
initial dig on every reindex, or make sure every page has a link path to
it from the start_url that does not pass through any noindex documents.
I found ht://check very handy for the latter option. :)

Thanks,

Bill Carlson
--
Systems Programmer    [EMAIL PROTECTED]    | Anything is possible,
Virtual Hospital      http://www.vh.org/   | given time and money.
University of Iowa Hospitals and Clinics   | Opinions are mine, not my employer's.

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
