Re: [htdig3-dev] Re: Multiple database (patch)

loic Thu, 10 Feb 2000 03:28:56 -0800
Geoff Hutchison writes:
 > At 3:02 PM -0500 2/9/00, Rajendra Inamdar wrote:
 > >Incidentally, when I moved to 3.2.x, my searches using the "substring"
 > >algorithm seem to be running slower than with 3.1.3. Has anybody had a
 > >similar experience?
 > 
 > I'm not surprised. The new format of the word database (i.e. *every* 
 > word in every document is stored) means the substring algorithm is 
 > going to generate a very large number of possible matches. I have 
 > some suggestions on how to improve the speed of this algorithm using 
 > trigrams, but I don't think I'll have time to work on it for a while.
 > 

 There might be a better solution. The indexer can store word frequencies.
I did not enable this by default because the code does not use it. When it
is enabled, the index also maintains a list of unique words. I use this
heavily in a context other than htdig, so I am confident it works well,
though it does take a bit more space, of course.
 The 'substring' search could browse this list instead of the complete index,
which would produce the list of candidates much more quickly.
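
A minimal sketch of why the unique word list helps, assuming a hypothetical in-memory model rather than htdig's actual word database API: the full index is modelled as one (word, doc_id) entry per occurrence, and the extended list as a mapping from each distinct word to its count. The names `index`, `substring_full_scan`, `build_word_freq` and `substring_unique_scan` are illustrative only. Both scans find the same words, but the second tests each distinct word once instead of every occurrence.

```python
from collections import Counter

# Hypothetical stand-in for the full word index: one (word, doc_id) entry
# per occurrence, which is roughly what storing *every* word implies.
index = [
    ("database", 1), ("multiple", 1), ("database", 2),
    ("data", 2), ("patch", 3), ("database", 3), ("update", 3),
]

def substring_full_scan(index, pattern):
    """Test every occurrence; returns (matching words, comparisons made)."""
    matches = set()
    comparisons = 0
    for word, _doc_id in index:
        comparisons += 1
        if pattern in word:
            matches.add(word)
    return matches, comparisons

def build_word_freq(index):
    """Unique word -> occurrence count (hypothetical model of the extended list)."""
    return Counter(word for word, _doc_id in index)

def substring_unique_scan(word_freq, pattern):
    """Test each distinct word once; returns (matching words, comparisons made)."""
    matches = {word for word in word_freq if pattern in word}
    return matches, len(word_freq)

full, full_cmp = substring_full_scan(index, "dat")
uniq, uniq_cmp = substring_unique_scan(build_word_freq(index), "dat")
print(sorted(uniq), full_cmp, uniq_cmp)  # → ['data', 'database', 'update'] 7 5
```

In a real index the gap grows with the number of documents, since frequent words repeat once per occurrence in the full index but appear only once in the unique list.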

 To enable unique word frequency storage, set this in the configuration file:

    wordlist_extended: true
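
The trigram idea from the quoted message and the unique word list are not mutually exclusive. One possible combination, sketched here as an assumption rather than anything htdig implements: build a map from each 3-character substring to the unique words containing it, intersect those sets for the pattern's trigrams, and then verify each candidate, because containing all trigrams does not guarantee containing the whole pattern. Patterns shorter than three characters fall back to a plain scan. All names are hypothetical.

```python
from collections import defaultdict

def trigrams(s):
    """All overlapping 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(words):
    """Map each trigram to the set of unique words containing it."""
    tri_index = defaultdict(set)
    for word in words:
        for tri in trigrams(word):
            tri_index[tri].add(word)
    return tri_index

def substring_search(tri_index, words, pattern):
    """Return the words containing pattern, using trigrams to prune candidates."""
    grams = trigrams(pattern)
    if not grams:
        # Pattern shorter than 3 characters: no trigrams, so scan every word.
        return {w for w in words if pattern in w}
    # A word can only contain the pattern if it contains all of its trigrams.
    candidates = set.intersection(*(tri_index.get(g, set()) for g in grams))
    # Trigram membership is necessary but not sufficient, so verify each one.
    return {w for w in candidates if pattern in w}

words = ["data", "database", "multiple", "patch", "update"]
tri = build_trigram_index(words)
print(sorted(substring_search(tri, words, "atab")))  # → ['database']
```

The final check matters: "abcxbcd" contains both trigrams of "abcd" ("abc" and "bcd") yet does not contain "abcd" itself.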

    Cheers,

-- 
                Loic Dachary

                24 av Secretan
                75019 Paris
                Tel: 33 1 42 45 09 16
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/

