subject:"Fuzzy search always returning docs sorted by the highest match"

Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot

You aren't likely to encounter strings like "abc company inc" in a
Lucene index, as such a string will be tokenized into three tokens ("abc",
"company", "inc") under most Analyzers.
So, for this exact example, you don't even need fuzzy matching.
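
To illustrate the tokenization point, here is a minimal sketch in Python. It is not Lucene code: `analyze` is a hypothetical stand-in for an Analyzer that only lowercases and splits on whitespace, and `build_index` is an assumed toy inverted index.

```python
from collections import defaultdict

def analyze(text):
    # Hypothetical stand-in for an Analyzer: lowercase, then split on whitespace.
    return text.lower().split()

def build_index(docs):
    # Toy inverted index: token -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

docs = {1: "ABC Company Inc", 2: "XYZ Holdings"}
index = build_index(docs)

# The query "abc" is an exact term match against document 1; no fuzziness involved.
print(sorted(index.get("abc", set())))
```

Under these assumptions the query term "abc" matches the indexed token directly, which is why fuzzy matching is not needed for this example.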

Also, maybe you should try the 'user' mailing list for questions regarding
the use of Lucene.


--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Guilherme Aiolfi

Well, it was about the implementation of an algorithm that was proposed by a
user and was implemented in another way. And this list, not the user mailing
list, was the one this developer recommended for asking this question.

So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I
want, since those words can exist in the index without all being in the same
doc. I want to compare the whole phrase. For that to work, I am not using any
Analyzer.

As I said, I've got it working, but I don't know how to use the right
algorithm for the job.

I'm going to redirect my question to the other mailing list.

Thanks anyway.


Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot

I'm baffled. As probably are you.

If all you want is a fuzzy match against a list of strings, Lucene is
a huge fat overkill, and you need to look elsewhere.


Fuzzy search always returning docs sorted by the highest match

2011-05-17 Thread Guilherme Aiolfi

I'm re-sending my first message because I've just received the mailing-list
confirmation. If it's a duplicate, disregard this one.

Hi,

I want to do a fuzzy search and always return documents, no matter what the
score. To do this, I tried sorting by strdist() in Solr 3.1. It worked
great and does ALMOST exactly what I wanted. The problem is that the
supported algorithms (jw, ngram and edit) are not the best fit for my
scenario.
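
For reference, a request of the kind described above might be built like this. It is only a sketch: the base URL and the field name `name_s` are assumptions (any untokenized string field would do), and `build_strdist_query` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode, parse_qs

def build_strdist_query(base_url, field, text, measure="jw"):
    # strdist() accepts the distance measures named in this message.
    if measure not in ("jw", "edit", "ngram"):
        raise ValueError("unsupported distance measure: " + measure)
    # Escape backslashes and double quotes inside the quoted literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": "*:*",  # match every document, so something is always returned
        "sort": 'strdist("%s",%s,%s) desc' % (escaped, field, measure),
    }
    return base_url + "?" + urlencode(params)

# Hypothetical host and field name, for illustration only.
url = build_strdist_query("http://localhost:8983/solr/select", "name_s", "abc")
print(parse_qs(url.split("?", 1)[1])["sort"][0])
```

Matching all documents and sorting on the function value is what makes every document come back, ordered by similarity rather than filtered by score.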

The best results come from StrikeAMatch (
http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
I found https://issues.apache.org/jira/browse/LUCENE-2230, which implements
what I wanted. But I was told that I should use trunk because there was some
really great news about fuzzy search there.

I read this article explaining some of the changes:
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
But I still don't think it replaces the StrikeAMatch algorithm, because that
one can give better results for searches like "abc" compared against strings
like "abc company inc" (distance > 2).
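
To make the comparison concrete, here is a rough sketch of the pair-based similarity from the Strike a Match article next to plain edit distance. It is an illustrative reimplementation, not the LUCENE-2230 patch; the function names are my own.

```python
from collections import Counter

def letter_pairs(text):
    # Adjacent letter pairs of each whitespace-separated word, upper-cased.
    pairs = Counter()
    for word in text.upper().split():
        for i in range(len(word) - 1):
            pairs[word[i:i + 2]] += 1
    return pairs

def strike_a_match(a, b):
    # Similarity = 2 * shared pairs / total pairs, in the range 0.0 to 1.0.
    pa, pb = letter_pairs(a), letter_pairs(b)
    total = sum(pa.values()) + sum(pb.values())
    if total == 0:
        return 0.0
    shared = sum((pa & pb).values())  # multiset intersection
    return 2.0 * shared / total

def edit_distance(a, b):
    # Classic Levenshtein distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(strike_a_match("abc", "abc company inc"))
print(edit_distance("abc", "abc company inc"))
```

For "abc" against "abc company inc", both letter pairs of the query occur in the longer string, so the pair-based score stays well above zero, while the edit distance is 12, far beyond a small fixed edit-distance limit.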

But still, Fuad Efendi told me that StrikeAMatch is a toy for kids compared to
the state of Lucene trunk. So here I am: I want to know how 4.0 will help me
achieve what I want.

Thanks.

