Re: Fuzzy search always returning docs sorted by the highest match
You aren't likely to encounter strings like "abc company inc" in a Lucene index, as under most Analyzers it will be tokenized into three tokens: abc, company, inc. So, for this exact example you don't even need fuzzy matching.

Also, maybe you should try the 'user' mailing list for questions regarding the use of Lucene.

On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi grad...@gmail.com wrote:

I'm re-sending my first message because I've just received the mailing-list confirmation. If it's a duplicate, forget about this one.

Hi,

I want to do a fuzzy search and always return documents, no matter what the score. To do this, I tried sorting by strdist() in Solr 3.1. It worked great and does ALMOST exactly what I wanted. The problem is that the supported algorithms (jw, ngram and edit) are not the best fit for my scenario. The best results come from StrikeAMatch (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).

So, I found https://issues.apache.org/jira/browse/LUCENE-2230, which implemented what I wanted. But I was told that I should use trunk because there was some really great news about fuzzy search there. I read this article explaining some of the changes: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html. But I still don't think it replaces the StrikeAMatch algorithm, because that one can give the best results in searches like abc compared against strings like abc company inc (distance 2).

Still, Fuad Efendi told me that StrikeAMatch is toys for kids compared to the state of Lucene trunk. So here I am: I want to know how 4.0 will help me achieve what I want.

Thanks.

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
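The tokenization point can be illustrated with a minimal Python sketch. This is not Lucene code: it assumes an analyzer that only lowercases and splits on whitespace (real Analyzers do more), and the names `analyze` and `build_index` are made up for the illustration.

```python
from collections import defaultdict


def analyze(text):
    """Stand-in for an Analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def build_index(docs):
    """Build a toy inverted index mapping each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


# Hypothetical sample documents.
docs = {1: "abc company inc", 2: "xyz holdings"}
index = build_index(docs)

# A plain term lookup for "abc" already finds document 1; no fuzzy matching is involved.
hits = sorted(index.get("abc", set()))
print(hits)
```

Under these assumptions the phrase is stored as three separate terms, so a query for any one of them matches the document exactly.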
Re: Fuzzy search always returning docs sorted by the highest match
Well, it was about the implementation of an algorithm that was proposed by a user and was implemented in another way. And this list, not the user mailing list, was recommended by that developer for asking this question. So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I want, since those words can exist but not all in the same doc. I want to compare the whole phrase. For that to work, I'm not using any Analyzer.

As I said, I've got it working, but I don't know how to use the right algorithm for the job. I'm going to redirect my question to the other mailing list. Thanks anyway.

On Wed, May 18, 2011 at 6:32 PM, Earwin Burrfoot ear...@gmail.com wrote:

You aren't likely to encounter strings like "abc company inc" in a Lucene index, as under most Analyzers it will be tokenized into three tokens: abc, company, inc. So, for this exact example you don't even need fuzzy matching.

Also, maybe you should try the 'user' mailing list for questions regarding the use of Lucene.
Re: Fuzzy search always returning docs sorted by the highest match
I'm baffled. As you probably are. If all you want is a fuzzy match against a list of strings, Lucene is a huge, fat overkill, and you need to look elsewhere.

2011/5/19 Guilherme Aiolfi grad...@gmail.com:

Well, it was about the implementation of an algorithm that was proposed by a user and was implemented in another way. And this list, not the user mailing list, was recommended by that developer for asking this question. So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I want, since those words can exist but not all in the same doc. I want to compare the whole phrase. For that to work, I'm not using any Analyzer.

As I said, I've got it working, but I don't know how to use the right algorithm for the job. I'm going to redirect my question to the other mailing list. Thanks anyway.
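For the "fuzzy match against a list of strings" case, ranking can be done without a search engine at all. Here is a minimal sketch using Python's standard `difflib`. Its similarity ratio is not StrikeAMatch, and the function name `rank_all` and the sample strings are invented for the illustration. With `cutoff=0.0`, every candidate is returned, ordered from most to least similar.

```python
import difflib


def rank_all(query, candidates):
    """Return every candidate, most similar to the query first.

    cutoff=0.0 disables the score threshold, so nothing is filtered out.
    """
    if not candidates:
        return []
    return difflib.get_close_matches(query, candidates, n=len(candidates), cutoff=0.0)


# Hypothetical sample data.
names = ["xyz holdings", "abc company inc", "foo ltd"]
ranked = rank_all("abc", names)
print(ranked)
```

This only scales to lists small enough to score in full on every query, which is the trade-off against an index.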
Fuzzy search always returning docs sorted by the highest match
I'm re-sending my first message because I've just received the mailing-list confirmation. If it's a duplicate, forget about this one.

Hi,

I want to do a fuzzy search and always return documents, no matter what the score. To do this, I tried sorting by strdist() in Solr 3.1. It worked great and does ALMOST exactly what I wanted. The problem is that the supported algorithms (jw, ngram and edit) are not the best fit for my scenario. The best results come from StrikeAMatch (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).

So, I found https://issues.apache.org/jira/browse/LUCENE-2230, which implemented what I wanted. But I was told that I should use trunk because there was some really great news about fuzzy search there. I read this article explaining some of the changes: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html. But I still don't think it replaces the StrikeAMatch algorithm, because that one can give the best results in searches like abc compared against strings like abc company inc (distance 2).

Still, Fuad Efendi told me that StrikeAMatch is toys for kids compared to the state of Lucene trunk. So here I am: I want to know how 4.0 will help me achieve what I want.

Thanks.
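The strdist() sorting approach mentioned in the message can be sketched as a Solr request. This is a rough illustration only: the host, core path and field name `name_exact` are hypothetical, the helper `strdist_sort_params` is invented here, and it assumes strdist() returns higher values for closer strings, so the sort is descending. The `q=*:*` query matches every document, so nothing is dropped for scoring low.

```python
from urllib.parse import urlencode


def strdist_sort_params(query_text, field, measure="jw"):
    """Build request parameters that match all docs and sort them by strdist().

    measure is one of the names from the message: jw, ngram or edit.
    query_text must not contain double quotes; this sketch does no escaping.
    """
    return {
        "q": "*:*",
        "sort": f'strdist("{query_text}",{field},{measure}) desc',
    }


params = strdist_sort_params("abc", "name_exact")
# Hypothetical Solr endpoint; adjust to the actual deployment.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```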
Fuzzy search always returning docs sorted by the highest match
Hi,

I want to do a fuzzy search and always return documents, no matter what the score. To do this, I tried sorting by strdist() in Solr 3.1. It worked great and does ALMOST exactly what I wanted. The problem is that the supported algorithms (jw, ngram and edit) are not the best fit for my scenario. The best results come from StrikeAMatch (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).

So, I found https://issues.apache.org/jira/browse/LUCENE-2230, which implemented what I wanted. But I was told that I should use trunk because there was some really great news about fuzzy search there. I read this article explaining some of the changes: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html. But I still don't think it replaces the StrikeAMatch algorithm, because that one can give the best results in searches like abc compared against strings like abc company inc (distance 2).

Still, Fuad Efendi told me that StrikeAMatch is toys for kids compared to the state of Lucene trunk. So here I am: I want to know how 4.0 will help me achieve what I want.

Thanks.
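For readers unfamiliar with StrikeAMatch, here is a minimal Python sketch of the idea as described in the linked article: split each string into words, collect adjacent letter pairs per word, and score twice the number of shared pairs over the total number of pairs. It is not the LUCENE-2230 patch, the function names are my own, and returning 0.0 when neither string yields a pair is an assumption made here to avoid dividing by zero.

```python
def letter_pairs(text):
    """Adjacent letter pairs of every whitespace-separated word, uppercased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def strike_a_match(a, b):
    """Similarity in [0, 1]: 2 * shared pairs / total pairs."""
    pairs_a = letter_pairs(a)
    remaining = letter_pairs(b)
    total = len(pairs_a) + len(remaining)
    if total == 0:
        return 0.0  # assumption: no pairs at all counts as no similarity
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            shared += 1
            remaining.remove(pair)  # each pair in b can be matched only once
    return 2.0 * shared / total


def rank_all(query, candidates):
    """Return every candidate, best match first, regardless of score."""
    return sorted(candidates, key=lambda c: strike_a_match(query, c), reverse=True)


# Hypothetical sample data.
print(rank_all("abc", ["xyz holdings", "abc company inc"]))
```

Because the score depends on shared pairs rather than on the number of edits, a short query such as abc still gets a nonzero score against abc company inc, while an edit-distance measure would charge for every extra character.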