> since none of those existing files got changed (modified-since),
> they won't be processed and thus those missing files
> can't be seen by htdig.

This is only partly correct. It holds if you have set remove_bad_urls.

From the documentation (http://www.htdig.org/attrs.html#remove_bad_urls):

If TRUE, htmerge will remove any URLs which were marked as unreachable by
htdig from the database. If FALSE, it will not do this. When htdig is run
in initial mode, documents which were referred to but could not be
accessed should probably be removed, and hence this option should then be
set to TRUE, however, if htdig is run to update the database, this may
cause documents on a server which is temporarily unavailable to be
removed. This is probably NOT what was intended, so hence this option
should be set to FALSE in that case.

> should, instead of skipping this file (won't process
> it at all), still parse the file for links. Of course,

In general, the slowest part of the indexing is retrieving the document.
So the update dig saves a *lot* of time by just sending out
If-Modified-Since headers. So if an update dig "reparsed looking for
URLs," it really wouldn't be any faster than the initial dig. In that
case, why bother doing an update dig?
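
To make the saving concrete, here is a rough sketch of a conditional GET in
Python. This is not htdig's own code, just an illustration of the HTTP
mechanism; the function names and the last_dig_epoch parameter are made up
for the example.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_dig_epoch):
    """Build a GET request that asks only for content newer than the last dig.

    last_dig_epoch is a hypothetical name for the time of the previous dig,
    in seconds since the Unix epoch.
    """
    stamp = formatdate(last_dig_epoch, usegmt=True)  # HTTP date format, GMT
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def fetch_if_modified(url, last_dig_epoch):
    """Return the document body, or None if the server answers 304."""
    req = conditional_request(url, last_dig_epoch)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # modified: the full body is transferred
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # not modified: no body was sent, nothing to reparse
        raise
```

When the server answers 304 there is no body at all, so there is nothing to
scan for links unless the document is fetched again in full.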

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
