I wrote:

> I don't understand why it says 'it will cut down on reindexing from such
> servers when doing updates'.
> 
> E.g.:
> 1) a doc has 'last modified' unknown (which, as I recall from a previous
> post, actually means 0)
> 2) on the first run this gets changed to now (let's say 30/11/1999
> 00:00)
> 3) on the next run the same doc will return 0 again
> 4) then what happens? Will it
> 
> a) compare 0 to 30/11 and decide that it has not been changed?
> 
> or 
> 
> b) transform 0 to now again (let's say 01/12/1999) and reindex it?
> 
> From that phrase in the doc I'd guess the first; am I right?
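To make the two readings concrete, here is a toy model in Python. It is not htdig's code: the function names are hypothetical, times are plain epoch seconds, and treating a missing Last-Modified as 0 is the assumption stated in the question above.

```python
import calendar

UNKNOWN = 0  # assumption: a missing Last-Modified header is recorded as 0

def epoch(y, mo, d):
    """Seconds since the epoch (UTC) for midnight of a calendar date."""
    return calendar.timegm((y, mo, d, 0, 0, 0))

def recorded_time(server_time, now, modification_time_is_now):
    """Date the indexer keeps for a document after fetching it (hypothetical)."""
    if server_time == UNKNOWN and modification_time_is_now:
        return now
    return server_time

def reindex_a(server_time, stored_time):
    # Reading (a): compare the raw server value (0) with the stored date.
    return server_time > stored_time

def reindex_b(server_time, stored_time, now):
    # Reading (b): replace 0 with "now" again before comparing.
    return recorded_time(server_time, now, True) > stored_time

first_run = epoch(1999, 11, 30)
second_run = epoch(1999, 12, 1)
stored = recorded_time(UNKNOWN, first_run, True)  # 30/11/1999 00:00

print(reindex_a(UNKNOWN, stored))              # False: looks unchanged
print(reindex_b(UNKNOWN, stored, second_run))  # True: looks newer, reindexed
```

Under reading (b) a page with no Last-Modified always looks newer than its stored copy, so by itself the option cannot save a reindex; the reply below explains where the saving comes from.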



You wrote:

> b. The only way I can see it not being reindexed is if the server 
> accepts the Last-Modified header and doesn't send the document back. 
> Caveat: This is actually what happens in a specific case and is the 
> reason the option is in there. 


Someone pointed out that pages which do not return a mod_t are mostly
dynamic ones.
So it seems logical that assigning them mod_t = now will force a reindex
anyway, but that phrase ('cut down on reindexing') caused some confusion...
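The mechanism behind "doesn't send the document back" is an HTTP conditional GET: the client sends the date it already has in an If-Modified-Since header, and the server (or a proxy cache) answers 304 Not Modified instead of resending the page. Below is a sketch of the responder's side of that decision, assuming the responder knows a modification time for its copy; `conditional_status` is a hypothetical name, and using `None` for "no known modification time" is an assumption standing in for a dynamic page.

```python
from email.utils import formatdate, parsedate_to_datetime

def conditional_status(if_modified_since, resource_mtime):
    """Status a server or cache might return to a conditional GET (toy model).

    if_modified_since: If-Modified-Since header value, or None if not sent.
    resource_mtime: responder's modification time for its copy in epoch
    seconds, or None if it has none (typical of dynamic pages).
    """
    if if_modified_since is None or resource_mtime is None:
        return 200  # nothing to compare: send the full document
    since = parsedate_to_datetime(if_modified_since).timestamp()
    return 304 if resource_mtime <= since else 200

stored = 943920000  # 30/11/1999 00:00 UTC, the date kept from the first run
header = formatdate(stored, usegmt=True)
print(conditional_status(header, None))         # 200: dynamic page
print(conditional_status(header, stored))       # 304: copy unchanged
print(conditional_status(header, stored + 60))  # 200: changed since
```

A dynamic page with no modification time always falls into the first branch, which is consistent with the point that such pages get reindexed anyway.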

> If you're indexing from a cache 
> (specifically WWWOFFLE), it will see that the date you sent matches 
> the date it has in cache and not bother to d/l or send it on to htdig.
> 

No, it's real-world indexing, not through a cache.
All $start_url entries are individually selected pages (max_hop_count: 0).
No digging at all is wanted. Nevertheless, I can assure you that a '9999
dig' starts.
(Anyway, WWWOFFLE seems to preserve the original document's mod_t.)
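For reference, this is the behaviour one would expect from max_hop_count: 0, sketched as a breadth-first crawl that records each URL's hop distance and does not follow links out of a page already at the limit. The link graph is hypothetical (every URL except the start URL is made up), and the sketch only models the hop limit; it says nothing about how htdig itself stores hop counts between runs.

```python
from collections import deque

def crawl(start_urls, links, max_hop_count):
    """Breadth-first crawl over an in-memory link graph.

    links maps a URL to the URLs it links to (a stand-in for fetching and
    parsing pages). Returns {url: hop count} for every URL retrieved.
    """
    hops = {url: 0 for url in start_urls}
    queue = deque(hops)
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hop_count:
            continue  # at the limit: index this page but do not follow its links
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops

# Hypothetical site: only the start URL comes from the test below.
site = {
    "http://www.yahoo.com/": ["http://www.yahoo.com/a", "http://www.yahoo.com/b"],
    "http://www.yahoo.com/a": ["http://www.yahoo.com/c"],
}
print(crawl(["http://www.yahoo.com/"], site, 0))          # only the start page
print(len(crawl(["http://www.yahoo.com/"], site, 9999)))  # 4: whole toy site
```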


I think the 'unwanted 9999 dig' bug is real, and I just made a test
to prove it; you can try it too:


1) initial dig:

---------htdig.conf
common_dir:   /home/htdig/common
database_dir: /home/htdig/db/test
start_url: http://www.yahoo.com/
limit_urls_to: $start_url
max_hop_count: 0
create_url_list: yes
modification_time_is_now: true
date_factor: 100

------ commands to execute
/usr/sbin/htdig -v -s -t -i -l -h0 -c htdig.conf>log 
/usr/sbin/htmerge -vv -s -c htdig.conf>>log    


- This correctly indexes only the Yahoo start page.


2) update dig

--- htdig-u.conf
common_dir:   /home/htdig/common
database_dir: /home/htdig/db/test
start_url: http://www.yahoo.com/
limit_urls_to: $start_url
max_hop_count: 0
create_url_list: yes
modification_time_is_now: false         ### only difference
date_factor: 100

----command executed
/usr/sbin/htdig -v -s -t -l -h0 -c htdig-u.conf>>log 




- This unleashes the 'unwanted 9999 dig' on the whole Yahoo site :-(


Maybe I'm missing something, though...

Gian
