Hey all,

I ran across a logic problem in the handling of <META name="robots" 
content="noindex"> on a page. The expected behavior is that the page 
itself is not indexed, but the links on it are still followed and the 
linked pages indexed. This works fine on the initial index.

Let's call the page that shouldn't be indexed the TOC (Table of Contents, 
a typical application), and call the pages linked from the TOC the content.

If the only link to a content page is on the TOC, a later indexing run 
will not index that page, because the bridging TOC has been dropped from 
the list of documents. (This assumes the pages linking to the TOC have not 
been modified since the last run and hence are not re-fetched.) As a 
result, the content page drops out of the database. It will only be picked 
up again on the next full index, and then dropped again on the next 
partial index.
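
To make the scenario concrete, here is a toy simulation in Python. It is 
not htdig code, only a sketch of my reading of the behavior. The page 
names, the dictionary layout, and the rule that an update run only 
revisits links pointing at documents already in the old list are all 
assumptions made up for illustration.

```python
# Toy model of the noindex/update problem. NOT htdig code: every name and
# rule here is an assumption made up to illustrate the scenario.

def full_index(pages, start):
    """Fetch everything reachable from start; return {url: links} for indexed pages."""
    db, seen, queue = {}, set(), [start]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        page = pages[url]
        if not page["noindex"]:
            db[url] = list(page["links"])
        # Links are followed even when the page itself is noindex.
        queue.extend(page["links"])
    return db


def update_index(pages, start, old_db):
    """Partial run under the assumed rule: an unmodified page already in
    old_db is not re-fetched, and only its links to documents that are
    also in old_db are revisited."""
    new_db, seen, queue = {}, set(), [start]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        page = pages[url]
        if url in old_db and not page["modified"]:
            new_db[url] = old_db[url]
            # The noindex TOC is not in old_db, so it is skipped here.
            queue.extend(link for link in old_db[url] if link in old_db)
        else:
            if not page["noindex"]:
                new_db[url] = list(page["links"])
            queue.extend(page["links"])
    return new_db


# Hypothetical site: index.html -> toc.html (noindex) -> chapter1.html
pages = {
    "index.html": {"links": ["toc.html"], "noindex": False, "modified": False},
    "toc.html": {"links": ["chapter1.html"], "noindex": True, "modified": False},
    "chapter1.html": {"links": [], "noindex": False, "modified": False},
}

full = full_index(pages, "index.html")
update = update_index(pages, "index.html", full)
print(sorted(full))    # ['chapter1.html', 'index.html']
print(sorted(update))  # ['index.html']
```

In this model chapter1.html is indexed by the full run and lost by the 
update run. If index.html is marked as modified, it is re-fetched, the TOC 
is reached again, and chapter1.html survives the update.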

I didn't see that this issue had been discussed before. Would this still 
be an issue for 3.2x?

Later,

Bill Carlson
-- 
Systems Programmer    [EMAIL PROTECTED]         | Anything is possible,
Virtual Hospital      http://www.vh.org/      | given time and money.
University of Iowa Hospitals and Clinics      |       
Opinions are mine, not my employer's.         | 


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
