Re: [htdig] modification_time_is_now again

Geoff Hutchison Fri, 3 Dec 1999 12:40:35 -0800
At 2:15 PM -0600 12/3/99, Gilles Detillieux wrote:
>In the 3.2 development code, Geoff hacked it a bit so the initial hopcount
>field is set to 0, instead of -1, when DocumentRef and URLRef objects are
>first constructed.  I don't know if that actually solves this problem or
>not, but in any case it doesn't get to the root of the problem: what is
>happening to those hopcounts in the first place?!

It's not really a hack. First off, a document will only have a 
hopcount >= 0, so making it -1 doesn't make a lot of sense IMHO. 
Furthermore, the database seemed to ignore the -1 listed for 
documents that hadn't been retrieved yet and make up a number. (I kid 
you not, but I can't remember the exact details. Try doing a dig with 
a limited server_max_docs and then do an update...)

But there are simply a *lot* of issues with hopcounts in 3.1. The 
biggest problem is that pages are not indexed by hopcount. On an 
update dig, all the pages that were in the database already are put 
into the queue in *alphabetical* order, ahead of any new pages. Since 
the queue is not ordered by hopcount, it's very difficult to ensure 
the hopcounts are accurate.
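
To make the ordering problem concrete, here is a toy model of an update dig. It is only a sketch and not ht://Dig's actual code: the function name, the sample URLs, and the rule that a new page takes its hopcount from whichever parent is processed first are all assumptions made for illustration.

```python
def update_dig_alphabetical(known, links):
    """Toy update dig: old pages are queued alphabetically, ahead of new ones.

    known -- {url: hopcount} already stored in the database (hypothetical data)
    links -- {url: [urls it links to]}
    ASSUMPTION: a new page gets the hopcount of the first parent that
    happens to be processed, plus one, and is never revised afterwards.
    """
    hopcount = dict(known)
    queue = sorted(known)            # existing pages, alphabetical order
    seen = set(queue)
    position = 0
    while position < len(queue):
        url = queue[position]
        position += 1
        for target in links.get(url, []):
            if target not in seen:   # a page that is new in this update
                seen.add(target)
                hopcount[target] = hopcount[url] + 1
                queue.append(target)
    return hopcount


# "/new" is linked from "/a/b" (2 hops deep) and from "/z" (1 hop deep).
links = {
    "/": ["/a", "/z"],
    "/a": ["/a/b"],
    "/a/b": ["/new"],
    "/z": ["/new"],
}
known = {"/": 0, "/a": 1, "/a/b": 2, "/z": 1}
result = update_dig_alphabetical(known, links)
```

In this model "/a/b" sorts ahead of "/z", so "/new" is recorded at hopcount 3 even though it is only 2 hops from the start page via "/z".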

The indexing queue in 3.2 is ordered by hopcount. This guarantees that 
the first time the digger reaches a page, it got there by the shortest 
path from the start. Furthermore, on updates, any new pages will fall 
into the queue in the proper place.
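
As a rough illustration of a hopcount-ordered queue, here is a minimal sketch using a priority queue. It is not the 3.2 implementation; the function name and the sample link graph are made up for the example.

```python
import heapq


def crawl_by_hopcount(links, start):
    """Return {url: hopcount}, visiting pages in order of increasing hopcount.

    links -- {url: [urls it links to]} (hypothetical sample data)
    """
    hopcount = {}
    queue = [(0, start)]             # (hopcount, url); smallest hopcount pops first
    while queue:
        hops, url = heapq.heappop(queue)
        if url in hopcount:          # already reached by an equal or shorter path
            continue
        hopcount[url] = hops         # first visit is the shortest path
        for target in links.get(url, []):
            if target not in hopcount:
                heapq.heappush(queue, (hops + 1, target))
    return hopcount


# "/c" is reachable via "/a" -> "/b" (3 hops) or via "/z" (2 hops).
links = {
    "/": ["/a", "/z"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/z": ["/c"],
}
counts = crawl_by_hopcount(links, "/")
```

Because entries pop in hopcount order, "/c" is first visited through "/z" and recorded at hopcount 2; the longer route through "/b" arrives later and is skipped.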

I don't know whether this has any influence on the particular bug 
mentioned, but suffice to say that fixing all the problems with 
hopcount in 3.1 is not going to happen--it would require backporting 
too much code. I'll stick by the documentation: using -h or 
max_hop_count is *only* reliable when you're doing an initial dig. 
Other results may vary.

-Geoff

