I'm baffled, as you probably are too. If all you want is a fuzzy match against a list of strings, Lucene is massive overkill, and you should look elsewhere.
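To make "elsewhere" concrete, here is a minimal sketch of whole-phrase matching against a plain list of strings, with no index and no Analyzer. It scores strings by shared adjacent letter pairs, which is how I understand the Strike a Match article; it is not the code from the article or from LUCENE-2230. The names letter_pairs, strike_a_match and rank are made up for this example, and I'm assuming the list is small enough to scan linearly.

```python
from collections import Counter


def letter_pairs(text):
    """Count adjacent letter pairs, word by word, ignoring case."""
    pairs = Counter()
    for word in text.upper().split():
        for i in range(len(word) - 1):
            pairs[word[i:i + 2]] += 1
    return pairs


def strike_a_match(a, b):
    """Pair-overlap similarity in [0, 1]: 2 * shared pairs / total pairs."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        # Neither string has a word of two or more characters.
        return 0.0
    # Counter & Counter keeps the minimum count of each pair,
    # so a repeated pair is only matched as often as both sides have it.
    shared = sum((pairs_a & pairs_b).values())
    return 2.0 * shared / total


def rank(query, candidates):
    """Return every candidate, best match first, whatever the score."""
    return sorted(candidates, key=lambda c: strike_a_match(query, c), reverse=True)


if __name__ == "__main__":
    names = ["xyz corp", "abc company inc", "acme ltd"]
    for name in rank("abc", names):
        print(f"{strike_a_match('abc', name):.3f}  {name}")
```

Nothing is filtered out here, so every string comes back in score order, which seems to be what you were getting from sorting by strdist().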

2011/5/19 Guilherme Aiolfi <grad...@gmail.com>:
> Well, it was about the implementation of an algorithm that was proposed by a
> user and was implemented in another way. And this list, not the user mailing
> list, was recommended by that developer for asking this question.
> So, not entirely my fault. But I apologize for the inconvenience.
> I just want to clarify that searching for the tokens separately is not what I
> want, since those words can exist but not all in the same doc. I want to
> compare the whole phrase. For that to work, I'm not using any Analyzer.
> As I said, I've got it working, but I don't know how to use the right
> algorithm for the job.
> I'm going to redirect my question to the other mailing list.
> Thanks anyway.
>
> On Wed, May 18, 2011 at 6:32 PM, Earwin Burrfoot <ear...@gmail.com> wrote:
>>
>> You aren't likely to encounter strings like "abc company inc" in a
>> Lucene index, as it will be tokenized into three tokens "abc",
>> "company", "inc" under most Analyzers.
>> So, for this exact example you don't even need fuzzy matching.
>>
>> Also, maybe you should try the 'user' mailing list for questions regarding
>> the use of Lucene.
>>
>> On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi <grad...@gmail.com> wrote:
>> > I'm re-sending my first message because I've just received the
>> > mailing-list confirmation. If it's a duplicate, forget about this one.
>> >
>> > Hi,
>> > I want to do a fuzzy search and always return documents no matter what
>> > the score. So, to do this, I tried sorting by strdist() in Solr 3.1. It
>> > worked great and does ALMOST exactly what I wanted. The problem is that
>> > the supported algorithms (jw, ngram and edit) are not the best fit for
>> > my scenario.
>> > The best results come from StrikeAMatch
>> > (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
>> > So, I've found this link,
>> > https://issues.apache.org/jira/browse/LUCENE-2230
>> > which implemented what I wanted.
>> > But I was told that I should use trunk because there was some
>> > really great news about fuzzy search there.
>> > I read this article explaining some changes:
>> > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
>> > But I still don't think it replaces the StrikeAMatch algo, because that
>> > one can give better results in searches like "abc" compared against
>> > strings like "abc company inc" (distance > 2).
>> > But still, Fuad Efendi told me that StrikeAMatch is toys for kids
>> > compared to the state of Lucene trunk. So here I am; I want to know how
>> > 4.0 will help achieve what I want.
>> > Thanks.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко
>> E-Mail/Jabber: ear...@gmail.com
>> Phone: +7 (495) 683-567-4
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785