In a "what is old is new again" sort of way, I am researching spellchecking and came across http://www.lucidimagination.com/search/document/cc46ac41bd4ee661/ngramspeller_contribution_re_combining_open_office_spellchecker_with_lucene#4f731c4209e3d7d0, which suggests speeding up FuzzyQuery using JaroWinkler, but I don't think it was ever done.

Now that we have an implementation of JaroWinkler in the spell checker (in fact, we have pluggable distance measures there), perhaps it makes sense to think about how FuzzyQuery could leverage this pluggability.
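
To make the pluggability idea concrete, here is a rough sketch in Python rather than Lucene's Java. It is not Lucene's code: the names fuzzy_matches, jaro_winkler, ratio and the min_similarity default are hypothetical, and the Jaro-Winkler shown is the textbook form (common prefix capped at 4, scaling 0.1), which may differ in detail from the spell checker's implementation. The only point is that the matching loop takes the measure as a parameter.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple

# A similarity measure maps two strings to a score in [0, 1].
Similarity = Callable[[str, str], float]


def jaro(s: str, t: str) -> float:
    """Textbook Jaro similarity."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        # Only look for a match within the sliding window around i.
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [s[i] for i in range(ls) if s_matched[i]]
    t_seq = [t[j] for j in range(lt) if t_matched[j]]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scaling * (1 - j)


def ratio(s: str, t: str) -> float:
    """An alternative measure from the standard library, to show the swap."""
    return SequenceMatcher(None, s, t).ratio()


def fuzzy_matches(query: str, terms: Iterable[str], similarity: Similarity,
                  min_similarity: float = 0.8) -> List[Tuple[str, float]]:
    """Score every term with the supplied measure and keep the close ones."""
    scored = [(term, similarity(query, term)) for term in terms]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(kept, key=lambda pair: -pair[1])
```

Calling fuzzy_matches("lucine", terms, jaro_winkler) and fuzzy_matches("lucine", terms, ratio) runs the same loop with two different measures, which is roughly the kind of hook FuzzyQuery would need.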

Also, Matt, perhaps as an alternative to Fuzzy (which still can be slow even w/ your upgrade), you may look at doing spell checking instead.

-Grant

On Apr 3, 2009, at 11:26 AM, Matt Schraeder wrote:

After doing some research I broke down and just updated my Zend
Framework.  I just installed it not long ago so I didn't think much of
it, but then I realized I'm running version 1.6.1 and that Zend is
currently on 1.7.8.  Upon upgrading the complex fuzzy search that was
taking 30 seconds now takes 0.067 seconds. I have no idea what changed
in the past few months, and see no mention of performance issues on
their issue tracker, but all seems well now.  Figured I'd post here to
give others a heads up if they run into similar issues like I did.

mschrae...@btsb.com 4/2/2009 3:32:01 PM >>>

erickerick...@gmail.com 4/2/2009 10:24:42 AM >>>
This seems really odd, especially with an index that size. The
first question is usually "Do you open an IndexReader for
each query?"

I'm using the Zend_Search_Lucene implementation so I'm really not sure
how it handles the IndexReader.  At the top of the script I open the
index and do searches on it.  Unless Zend is doing something special in
the background I'm assuming I'm using the IndexReader on a per-page
basis.  I haven't been able to find any information on this yet, but
from all the examples I've been reading none of them say to keep the
index in a session to improve speed.  I'll have to get on the zend
mailing lists to find out more about best practices.

markrmil...@gmail.com 4/2/2009 10:40:59 AM >>>
You might try setting a longer prefix. Fuzzy queries don't scale by the
way. By default they enumerate every unique term. How many unique terms
do you have in the index?

I'll look at a longer prefix setting, as the reply above mentioned the
improving search speed article.  Currently my index has 104076 terms in
it.
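
As a rough illustration of why a longer prefix helps, here is a small Python sketch. It is not Zend's or Lucene's implementation: fuzzy_terms, max_edits and prefix_length are hypothetical names, the term list is made up, and the edit-count cutoff stands in for the similarity threshold a real fuzzy query uses. Without a prefix every unique term is compared against the query; with one, only terms sharing the leading characters are compared.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_terms(query: str, terms: Iterable[str], max_edits: int = 2,
                prefix_length: int = 0) -> Tuple[List[str], int]:
    """Return (matching terms, number of terms actually compared).

    Terms that do not share the first prefix_length characters of the
    query are skipped before the expensive distance computation.
    """
    prefix = query[:prefix_length]
    compared = 0
    hits = []
    for term in terms:
        if not term.startswith(prefix):
            continue
        compared += 1
        if levenshtein(query, term) <= max_edits:
            hits.append(term)
    return hits, compared


if __name__ == "__main__":
    terms = ["lucene", "lucent", "lucid", "solr", "nutch", "mahout", "tika"]
    for n in (0, 3):
        hits, compared = fuzzy_terms("lucine", terms, prefix_length=n)
        print(f"prefix_length={n}: compared {compared} terms, hits={hits}")
```

This sketch still scans the list linearly to apply the prefix filter; with a sorted term dictionary an engine can seek straight to the prefix range, so the saving on an index with around 100,000 terms should be larger than the toy numbers suggest. The trade-off is that a misspelling inside the prefix itself will no longer match.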



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

