Re: Speed of fuzzy searches

Grant Ingersoll Fri, 03 Apr 2009 07:59:30 -0700

In a really weird "what is old, is new again" sort of thing, I am researching spellchecking, and came across:
http://www.lucidimagination.com/search/document/cc46ac41bd4ee661/ngramspeller_contribution_re_combining_open_office_spellchecker_with_lucene#4f731c4209e3d7d0
which suggests speeding up FuzzyQuery using JaroWinkler, but I don't think it was ever done.

Now, we have an implementation of JaroWinkler in the spell checker (in fact, we have pluggable distance measures there), perhaps it makes sense to think about how FuzzyQuery could leverage this pluggability?
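
To make the pluggability idea concrete, here is a rough, self-contained sketch. It does not use the Lucene API: the TermSimilarity interface, the JaroWinkler class and the similarTerms helper are names invented for this illustration, and the 0.7 boost threshold and 4-character prefix cap are the commonly cited Jaro-Winkler defaults, assumed here rather than taken from the spell checker source.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public final class PluggableFuzzySketch {

    /** Hypothetical plug-in point: returns a similarity in [0, 1], 1 meaning identical. */
    public interface TermSimilarity {
        double similarity(String a, String b);
    }

    /** Plain Jaro-Winkler, written out for illustration only. */
    public static final class JaroWinkler implements TermSimilarity {
        @Override
        public double similarity(String s1, String s2) {
            double jaro = jaro(s1, s2);
            if (jaro < 0.7) {
                return jaro; // assumed boost threshold: no prefix bonus for weak matches
            }
            int prefix = 0;
            int max = Math.min(4, Math.min(s1.length(), s2.length()));
            while (prefix < max && s1.charAt(prefix) == s2.charAt(prefix)) {
                prefix++;
            }
            return jaro + prefix * 0.1 * (1.0 - jaro);
        }

        private static double jaro(String s1, String s2) {
            int len1 = s1.length();
            int len2 = s2.length();
            if (len1 == 0 && len2 == 0) return 1.0;
            if (len1 == 0 || len2 == 0) return 0.0;
            // Characters only count as matching within this sliding window.
            int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
            boolean[] matched1 = new boolean[len1];
            boolean[] matched2 = new boolean[len2];
            int matches = 0;
            for (int i = 0; i < len1; i++) {
                int lo = Math.max(0, i - window);
                int hi = Math.min(len2 - 1, i + window);
                for (int j = lo; j <= hi; j++) {
                    if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                        matched1[i] = true;
                        matched2[j] = true;
                        matches++;
                        break;
                    }
                }
            }
            if (matches == 0) return 0.0;
            // Count matched characters that appear in a different order.
            int k = 0;
            int outOfOrder = 0;
            for (int i = 0; i < len1; i++) {
                if (!matched1[i]) continue;
                while (!matched2[k]) k++;
                if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
                k++;
            }
            double m = matches;
            return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
        }
    }

    /** Keeps the candidates whose similarity to the query meets the threshold. */
    public static List<String> similarTerms(TermSimilarity measure, String query,
                                            Collection<String> candidates,
                                            double minSimilarity) {
        List<String> hits = new ArrayList<>();
        for (String term : candidates) {
            if (measure.similarity(query, term) >= minSimilarity) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

Because the matcher only sees the interface, swapping in an edit-distance or n-gram measure would not touch the matching loop, which is the property FuzzyQuery would need in order to reuse the spell checker's measures.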

Also, Matt, perhaps as an alternative to Fuzzy (which still can be slow even w/ your upgrade), you may look at doing spell checking instead.


-Grant

On Apr 3, 2009, at 11:26 AM, Matt Schraeder wrote:

After doing some research I broke down and just updated my Zend
Framework.  I just installed it not long ago so I didn't think much of
it, but then I realized I'm running version 1.6.1 and that Zend is
currently on 1.7.8.  Upon upgrading, the complex fuzzy search that was
taking 30 seconds now takes 0.067 seconds. I have no idea what changed
in the past few months, and see no mention of performance issues on
their issue tracker, but all seems well now.  Figured I'd post here to
give others a heads up if they run into similar issues like I did.

mschrae...@btsb.com 4/2/2009 3:32:01 PM >>>

erickerick...@gmail.com 4/2/2009 10:24:42 AM >>>

This seems really odd, especially with an index that size. The
first question is usually "Do you open an IndexReader for
each query?"


I'm using the Zend_Search_Lucene implementation so I'm really not sure
how it handles the IndexReader.  At the top of the script I open the
index and do searches on it.  Unless Zend is doing something special in
the background I'm assuming I'm using the IndexReader on a per-page
basis.  I haven't been able to find any information on this yet, but
from all the examples I've been reading none of them say to keep the
index in a session to improve speed.  I'll have to get on the Zend
mailing lists to find out more about best practices.

markrmil...@gmail.com 4/2/2009 10:40:59 AM >>>

You might try setting a longer prefix. Fuzzy queries don't scale, by the
way. By default they enumerate every unique term. How many unique terms
do you have in the index?


I'll look at a longer prefix setting, as the reply above mentioned the
improving search speed article.  Currently my index has 104076 terms in
it.
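
The point about prefixes and term enumeration can be sketched roughly as follows. This is not how Lucene or Zend_Search_Lucene implement it: the class and method names are made up for the example, the term dictionary is modelled as a plain sorted set, and matching uses a maximum edit count as a simplification. The idea it illustrates is that with no prefix every unique term is compared against the query, while a required prefix limits the comparison to the slice of the sorted dictionary that shares it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public final class PrefixFuzzySketch {

    /** Classic two-row Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Scans only the terms that share the first prefixLength characters of the
     * query. With prefixLength == 0 this walks the whole dictionary.
     */
    static List<String> fuzzyMatch(SortedSet<String> terms, String query,
                                   int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (editDistance(term, query) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("lucene");
        terms.add("lucent");
        terms.add("lucerne");
        terms.add("nucene");
        terms.add("search");
        System.out.println(fuzzyMatch(terms, "lucene", 0, 1)); // full scan
        System.out.println(fuzzyMatch(terms, "lucene", 2, 1)); // only "lu..." terms
    }
}
```

The trade-off is visible in the example: with a prefix of 2, "nucene" is no longer found even though it is one edit away, so a longer prefix buys speed at the cost of missing typos in the first characters.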


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
