At 12:14 PM -0500 2/22/00, Walter Addison March wrote:
>When we run the 3.1.3 thus: htdig -i -l -t  it runs from Wed Feb 16
>11:38:58 EST 2000 until Wed Feb 16 13:05:48 EST 2000.
>
>When I run the 3.2.0b1 thus: htdig -i -t -a -v  well... it started at 9am
>and it still isn't done 3 hours later.
>
>The 3.2 htdig actually should be finding even fewer urls to follow (the
>limit_urls_to list for the 3.2 is shorter than the 3.1.3) and pages to
>dig... any ideas on why it is already taking twice as long and it isn't
>near done?

I would not be surprised if, for many people, 3.2.0b1 is slower than
the 3.1.x versions. First off, it essentially does the work of htdig
and htmerge in one step, so you don't need to do any sorting in 3.2.
For now, though, you'll still want to run htmerge, since it weeds out
bogus URLs and so on.

Secondly, indexing in the 3.2 code is somewhat more I/O intensive.
For one, the word database will probably come out a bit larger,
because it stores a record for every single word occurrence rather
than one record for every document that contains a given word. For
another, it splits the excerpts out into a separate database, which
means it's writing to several files at once.
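
To make the size difference concrete, here is a toy sketch. It does not
reflect htdig's real database layout. It simply counts how many records
two hypothetical indexing schemes would write for the same input:
per_document_index (one record per word/document pair, roughly the 3.1.x
approach described above) and per_occurrence_index (one record per word
occurrence, roughly the 3.2 approach). The function names and sample
documents are made up for illustration.

```python
from collections import defaultdict


def per_document_index(docs):
    """Hypothetical 3.1.x-style scheme: one record per (word, document) pair."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)  # repeated words in a document collapse into one record
    return sum(len(ids) for ids in index.values())


def per_occurrence_index(docs):
    """Hypothetical 3.2-style scheme: one record per word occurrence (word, document, position)."""
    records = []
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            records.append((word, doc_id, pos))  # every occurrence gets its own record
    return len(records)


docs = {
    1: "the quick fox jumps over the lazy dog",
    2: "the dog sleeps and the dog dreams",
}
print(per_document_index(docs), per_occurrence_index(docs))  # prints "12 15"
```

Any word that repeats within a document costs one extra record per
repetition under the per-occurrence scheme, so on real pages the gap is
usually larger than in this tiny example.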

Finally, we haven't made much effort to optimize for speed. I think
it can be faster, but without some feedback it's hard to know where
the slow parts are.

In short, don't worry, but any feedback as to performance is most welcome.
-Geoff

