Geoff Hutchison writes:
> At 3:02 PM -0500 2/9/00, Rajendra Inamdar wrote:
> >Incidentally, when I moved to 3.2.x, my searches using the "substring" algorithm
> >seem to be running slower than with 3.1.3. Has anybody had similar
> >experience?
>
> I'm not surprised. The new format of the word database (i.e. *every*
> word in every document is stored) means the substring algorithm is
> going to generate a very large number of possible matches. I have
> some suggestions on how to improve the speed of this algorithm using
> trigrams, but I don't think I'll have time to work on it for a while.
>
There might be a better solution. The indexer can store word
frequencies. I did not enable this by default because nothing in the
code uses it yet. When it is enabled, a list of unique words is
maintained in the index. I use this heavily in a context other than
htdig, so I'm confident it works well, though it does take a bit more
space, of course.
The 'substring' search could scan this list instead of the complete index,
which would produce a list of candidates much more quickly.
To enable the unique word frequency storage, just set the following
in the configuration:
wordlist_extended: true
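
To illustrate why scanning the unique word list should be cheaper, here
is a rough Python sketch. It is not htdig code: the in-memory
`occurrences` list and the helpers `build_unique_words` and
`substring_candidates` are hypothetical stand-ins for the word database
and the fuzzy algorithm, used only to show the idea.

```python
from collections import defaultdict

# Hypothetical stand-in for the 3.2.x word database: one entry per
# (word, document id) occurrence, so repeated words appear many times.
occurrences = [
    ("search", 1), ("searching", 1), ("research", 2),
    ("search", 2), ("search", 3), ("index", 3), ("research", 3),
]


def build_unique_words(occurrences):
    """Collapse the occurrence list into {word: frequency}."""
    freq = defaultdict(int)
    for word, _doc_id in occurrences:
        freq[word] += 1
    return dict(freq)


def substring_candidates(unique_words, pattern):
    """Scan only the distinct words for the pattern, not every occurrence."""
    return sorted(word for word in unique_words if pattern in word)


unique_words = build_unique_words(occurrences)
candidates = substring_candidates(unique_words, "sear")
```

The scan touches each distinct word once (4 words here) rather than
every occurrence (7 entries here). The candidates would then be looked
up in the full index in the usual way.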
Cheers,
--
Loic Dachary
24 av Secretan
75019 Paris
Tel: 33 1 42 45 09 16
e-mail: [EMAIL PROTECTED]
URL: http://www.senga.org/