Re: Speed of fuzzy searches

mark harwood Thu, 02 Apr 2009 09:49:58 -0700

Try setting the minimum prefix length for fuzzy queries (I think there is a 
setting on QueryParser, or you may need to subclass it).


A prefix length of zero does edit distance comparisons against every unique 
term in the index, e.g. from "aardvark" to "zzzz".
A prefix length of one cuts the search space for "cow~" down to just the terms 
that start with "c", e.g. "car" to "czar".

You should get the picture: CPU usage drops massively with each increment of 
prefix length, but you need to balance that against the inability to match 
"cow" with "kow", because terms that differ inside the prefix are never 
compared.
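
To make the trade-off concrete, here is a rough sketch in plain Java of what a 
prefix length does to the candidate set. It is a toy model of the idea only, 
not Lucene's implementation: the class name PrefixFuzzySketch, the maxEdits 
cut-off and the sample terms are all made up for illustration (Lucene's fuzzy 
queries of this era score by a similarity value rather than a raw edit count).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model: a sorted set of strings stands in for the index's term dictionary.
class PrefixFuzzySketch {

    // Standard dynamic-programming Levenshtein distance, keeping two rows at a time.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Terms sharing the first prefixLength characters with the query.
    // With prefixLength 0 every term is a candidate.
    static SortedSet<String> candidates(SortedSet<String> terms, String query, int prefixLength) {
        if (prefixLength <= 0) {
            return terms;
        }
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        // Range scan from the prefix up to (but excluding) prefix + the largest char value.
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // Runs the expensive edit distance comparison only on the candidate terms.
    static List<String> fuzzyMatch(SortedSet<String> terms, String query, int prefixLength, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : candidates(terms, query, prefixLength)) {
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("aardvark", "car", "cow", "cows", "czar", "kow", "zzzz"));
        for (int prefixLength = 0; prefixLength <= 1; prefixLength++) {
            System.out.println("prefix length " + prefixLength
                    + ": compared " + candidates(terms, "cow", prefixLength).size()
                    + " terms, matched " + fuzzyMatch(terms, "cow", prefixLength, 1));
        }
    }
}
```

With prefix length 0 every term is compared and "kow" is among the matches; 
with prefix length 1 only the "c" terms are compared and "kow" can no longer 
be found.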

Cheers
Mark



----- Original Message ----
From: Matt Schraeder <mschrae...@btsb.com>
To: java-user@lucene.apache.org
Sent: Thursday, 2 April, 2009 17:16:57
Subject: Speed of fuzzy searches

I've got a simple Lucene index and search built for testing purposes. 
So far everything seems great. Most searches take 0.02 seconds or less.
Searches with 4-5 terms take 0.25 seconds or less.  However, once I
began playing with fuzzy searches everything seemed to really slow down.
A fuzzy search seems to take vastly longer: 6 seconds for a single
term such as "cow~" and 24 seconds for fuzzy searches of multiple
terms.

Is there anything I can do to speed up fuzzy searches or are they by
default just simply slow?  

My index is only 6.1M, with ~18000 documents.  Each document has 5
fields, a combination of text and keywords. I'm afraid that when I begin
to scale up to have more fields it will only make the problem worse.



    


