I wrote:

> I don't understand why it says 'it will cut down on reindexing from such
> servers when doing updates'.
>
> EG
> 1) a doc has 'last modified' unknown (which, as I recall from a previous
> post, actually means 0)
> 2) this, on the first run, gets changed to now (let's say 30/11/1999
> 00.00)
> 3) on the next run the same doc will return 0 again
> 4) then what happens? will it
>
> a) compare 0 to 30/11 and decide that it has not been changed?
>
> or
>
> b) transform 0 to now again (let's say 01/12/1999) and reindex it?
>
> From that phrase in the doc I guess the first, isn't it?

You wrote:

> b. The only way I can see it not being reindexed is if the server
> accepts the Last-Modified header and doesn't send the document back.
> Caveat: This is actually what happens in a specific case and is the
> reason the option is in there.

Someone pointed out that pages that do not return a mod_t are mostly
dynamic ones. So it seems logical that assigning them mod_t = now will
force a reindex anyway, but that phrase ('cutting on reindexing') caused
some confusion...

> If you're indexing from a cache
> (specifically WWWOFFLE), it will see that the date you sent matches
> the date it has in cache and not bother to d/l or send it on to htdig.
>

No. It's real-world indexing. All the $start_url entries are individually
selected ones (max_hop_count: 0). No digging at all is wanted.
Nevertheless, I can assure you that a 9999 dig starts.
(Anyway, wwwoffle seems to preserve the original doc's mod_t.)

I think the 'unwanted 9999 dig' bug is a real one, and I just made a test
to prove it. You can try it too:

1) initial dig:

--------- htdig.conf
common_dir: /home/htdig/common
database_dir: /home/htdig/db/test
start_url: http://www.yahoo.com/
limit_urls_to: $start_url
max_hop_count: 0
create_url_list: yes
modification_time_is_now: true
date_factor: 100
------ commands to execute
/usr/sbin/htdig -v -s -t -i -l -h0 -c htdig.conf>log
/usr/sbin/htmerge -vv -s -c htdig.conf>>log

- This correctly indexes only the start page of yahoo.

2) update dig:

--- htdig-u.conf
common_dir: /home/htdig/common
database_dir: /home/htdig/db/test
start_url: http://www.yahoo.com/
limit_urls_to: $start_url
max_hop_count: 0
create_url_list: yes
modification_time_is_now: false ### only difference
date_factor: 100
---- command executed
/usr/sbin/htdig -v -s -t -l -h0 -c htdig-u.conf>>log

- This unleashes the 'unwanted 9999 dig' on the whole yahoo site :-(

Maybe I'm missing something though...

Gian
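
P.S. To make the two readings of step 4 concrete, here is a toy Python model of cases (a) and (b). It is only a sketch of the question as asked, not ht://Dig's actual code: the function names, the use of 0 for "no Last-Modified", and the plain "newer than stored" comparison are all assumptions of mine.

```python
from datetime import datetime

UNKNOWN = 0  # assumption: a missing Last-Modified is represented as 0


def effective_mod_t(server_mod_t, now, modification_time_is_now=True):
    """Time to record for a document (hypothetical helper, not htdig code)."""
    if server_mod_t == UNKNOWN and modification_time_is_now:
        return now  # step 2: an unknown date is replaced by the current time
    return server_mod_t


def reindex_case_a(stored_mod_t, server_mod_t):
    """Case (a): compare the raw 0 with the stored date."""
    return server_mod_t > stored_mod_t


def reindex_case_b(stored_mod_t, server_mod_t, now):
    """Case (b): turn 0 into 'now' first, then compare."""
    return effective_mod_t(server_mod_t, now) > stored_mod_t


first_run = datetime(1999, 11, 30).timestamp()   # 30/11/1999 00.00
second_run = datetime(1999, 12, 1).timestamp()   # 01/12/1999

# Steps 1-2: the server sends no date, so the first run stores "now".
stored = effective_mod_t(UNKNOWN, first_run)

# Steps 3-4: the update run gets 0 again from the server.
print("case a reindexes:", reindex_case_a(stored, UNKNOWN))
print("case b reindexes:", reindex_case_b(stored, UNKNOWN, second_run))
```

In this model only case (b) reindexes the page on the update run, which agrees with the answer "b" quoted above.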