Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot
You aren't likely to encounter strings like "abc company inc" in a
Lucene index, as most Analyzers will tokenize it into three tokens: "abc",
"company" and "inc".
So, for this exact example, you don't even need fuzzy matching.
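To make the tokenization point concrete, here is a rough Python sketch. `simple_tokenize` is a hypothetical stand-in for an Analyzer (lowercase, then split on anything that isn't an ASCII letter or digit); real Lucene Analyzers such as StandardAnalyzer do considerably more. The point is only that a query for "abc" then hits the "abc" token exactly.

```python
import re


def simple_tokenize(text):
    # Hypothetical stand-in for an Analyzer: lowercase the input, then split
    # on runs of characters that are not ASCII letters or digits.
    # Empty strings produced by leading/trailing punctuation are dropped.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


print(simple_tokenize("abc company inc"))    # ['abc', 'company', 'inc']
print(simple_tokenize("ABC Company, Inc."))  # ['abc', 'company', 'inc']
```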

Also, maybe you should try the 'user' mailing list for questions regarding
the use of Lucene.

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Guilherme Aiolfi
Well, it was about the implementation of an algorithm that was proposed by a
user and was implemented in another way. And it was this developer who
recommended asking this question here, not on the user mailing list.

So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I
want, since those words can exist but not all in the same doc. I want to
compare the whole phrase. For that to work, I'm not using any Analyzer.

As I said, I've got it working, but I don't know how to use the right
algorithm for the job.

I'm going to redirect my question to the other mailing list.

Thanks anyway.



Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot
I'm baffled, as you probably are too.

If all you want is a fuzzy match against a list of strings, Lucene is
massive overkill, and you need to look elsewhere.
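As a sketch of what "looking elsewhere" could mean for a plain list of strings, the Python below scores every candidate and returns all of them, best first. It uses the standard library's `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp similarity), which is neither StrikeAMatch nor anything Lucene provides; `rank_all` is a hypothetical helper, and any other similarity function could be swapped in.

```python
import difflib


def rank_all(query, candidates):
    # Score every candidate (case-insensitively) and keep all of them,
    # so even poor matches come back, just lower in the list.
    scored = [
        (difflib.SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


for score, name in rank_all("abc", ["xyz", "abc company inc", "abd"]):
    print(f"{score:.3f} {name}")
# 0.667 abd
# 0.333 abc company inc
# 0.000 xyz
```

`difflib.get_close_matches` also exists, but it applies a cutoff (0.6 by default) and returns at most `n` results, which would drop weak matches; sorting the full list keeps the "always return something" behavior asked about in this thread.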



Fuzzy search always returning docs sorted by the highest match

2011-05-17 Thread Guilherme Aiolfi
I'm re-sending my first message because I've just received the mailing-list
confirmation. If it's a duplicate, forget about this one.

Hi,

I want to do a fuzzy search that always returns documents, no matter what the
score. To do this, I tried sorting by strdist() in Solr 3.1. It worked
great and does ALMOST exactly what I want. The problem is that the
supported algorithms (jw, ngram and edit) are not the best fit for my
scenario.
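The idea of sorting by a string distance instead of filtering can be sketched in Python as below: every document gets a score and the whole list is sorted. The similarity here is a Levenshtein distance normalized as `1 - distance / longer_length`; that normalization is an assumption meant to resemble the "edit" option of strdist(), not a copy of Solr's code, and `sort_all_by_similarity` is a hypothetical name.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute),
    # keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    # Assumed normalization to [0, 1]; 1.0 means identical strings.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sort_all_by_similarity(query, docs):
    # Nothing is filtered out: poor matches are returned, just further down.
    return sorted(docs, key=lambda d: edit_similarity(query, d), reverse=True)


print(sort_all_by_similarity("abc", ["abc company inc", "xbc", "zzz"]))
# ['xbc', 'abc company inc', 'zzz']
```

The output shows the weakness described above: with a plain edit measure, a one-letter typo ("xbc", similarity about 0.67) outranks a longer string that actually contains the query word ("abc company inc", similarity 0.2, since 12 insertions are needed over 15 characters).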

The best results come from StrikeAMatch
(http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
So I found LUCENE-2230 (https://issues.apache.org/jira/browse/LUCENE-2230),
which implemented what I wanted. But I was told that I should use trunk
because there was some really great news about fuzzy search there.
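For reference, the StrikeAMatch similarity from the linked article can be sketched in Python as follows: split each string into words, collect the adjacent letter pairs of each word (case-insensitively), and score 2 × (shared pairs) / (total pairs in both strings), matching each pair at most once. This is a sketch based on the article's description, not the LUCENE-2230 patch; the function names are hypothetical, and the fallback for strings with no letter pairs is an added assumption.

```python
def _word_letter_pairs(text):
    # Adjacent letter pairs within each whitespace-separated word, so no
    # pair spans two words: "abc inc" -> ["AB", "BC", "IN", "NC"].
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def strike_a_match(s1, s2):
    pairs1 = _word_letter_pairs(s1)
    pairs2 = _word_letter_pairs(s2)
    union = len(pairs1) + len(pairs2)
    if union == 0:
        # Assumption (not from the article): strings with no letter pairs,
        # e.g. empty or single letters, only match if they are equal.
        return 1.0 if s1.strip().upper() == s2.strip().upper() else 0.0
    remaining = list(pairs2)
    intersection = 0
    for pair in pairs1:
        if pair in remaining:
            remaining.remove(pair)  # each pair of s2 can be matched once
            intersection += 1
    return 2.0 * intersection / union


print(strike_a_match("FRANCE", "FRENCH"))         # 0.4
print(strike_a_match("Healed", "Sealed"))         # 0.8
print(strike_a_match("abc", "abc company inc"))   # 0.3333333333333333
```

Because the score is based on shared letter pairs rather than on the number of edits, "abc" still gets partial credit against "abc company inc" even though the two strings are many edits apart.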

I read this article explaining some of the changes:
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
But I still don't think it replaces the StrikeAMatch algorithm, because that
one gives the best results when searching for "abc" against strings like
"abc company inc" (distance > 2).
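Why the edit distance is the sticking point can be checked with a short Python sketch. The automaton-based FuzzyQuery described in the linked post is, as far as I know, limited to small edit distances (at most 2 in Lucene 4.x releases), while "abc" and "abc company inc" differ by 12 insertions. The hypothetical `within_max_edits` below is a plain dynamic-programming check, not Lucene's Levenshtein automaton; it uses two standard shortcuts: the length difference is a lower bound on the distance, and if every cell of a row already exceeds the budget, the final distance must too.

```python
def within_max_edits(query, term, max_edits=2):
    # Length difference is a lower bound on Levenshtein distance, so
    # "abc" vs "abc company inc" (12 characters apart) is rejected here.
    if abs(len(query) - len(term)) > max_edits:
        return False
    prev = list(range(len(term) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(term)
        for j, tc in enumerate(term, start=1):
            cur[j] = min(prev[j] + 1,                 # deletion
                         cur[j - 1] + 1,              # insertion
                         prev[j - 1] + (qc != tc))    # substitution
        # Every alignment passes through this row, so if its cheapest cell
        # is over budget, the final distance is over budget as well.
        if min(cur) > max_edits:
            return False
        prev = cur
    return prev[-1] <= max_edits


print(within_max_edits("abc", "abc company inc"))  # False
print(within_max_edits("abc", "abd"))              # True
```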

But still, Fuad Efendi told me that StrikeAMatch is a kids' toy compared to
the state of Lucene trunk. So here I am: I want to know how 4.0 will help me
achieve what I want.

Thanks.
