Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot

I'm baffled. As probably are you.

If all you want is a fuzzy match against a list of strings, Lucene is
huge, fat overkill, and you need to look elsewhere.
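
As a rough illustration of matching against a plain list of strings without Lucene, here is a minimal Python sketch. It assumes the standard-library difflib.SequenceMatcher as a stand-in similarity measure (it is not one of the algorithms discussed in this thread), and the function name and sample strings are hypothetical. Every candidate is returned, best score first, with nothing filtered out.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, candidates):
    """Return (score, candidate) pairs for every candidate, best score first."""
    q = query.lower()
    scored = [(SequenceMatcher(None, q, c.lower()).ratio(), c) for c in candidates]
    # Sort by score, descending. No threshold is applied, so even
    # candidates with a score of 0.0 stay in the result.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, only for illustration.
names = ["abc company inc", "xyz holdings", "abd corp"]
for score, name in rank_by_similarity("abc", names):
    print(f"{score:.2f}  {name}")
```

Any other similarity function with the same shape (two strings in, a number out) could be swapped in for SequenceMatcher here.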

2011/5/19 Guilherme Aiolfi :
> Well, it was about the implementation of an algorithm that was proposed by a
> user and was implemented in another way. And this list, not the user mailing
> list, was recommended by this developer for asking this question.
> So, not entirely my fault. But I apologize for the inconvenience.
> I just want to clarify that searching for the tokens separately is not what I
> want, since those words can exist but not all in the same doc. I want to
> compare the whole phrase. For that to work, I am not using any Analyzer.
> As I said, I've got it working, but I don't know how to use the right
> algorithm for the job.
> I'm going to redirect my question to the other mailing list.
> Thanks anyway.
>



-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Guilherme Aiolfi

Well, it was about the implementation of an algorithm that was proposed by a
user and was implemented in another way. And this list, not the user mailing
list, was recommended by this developer for asking this question.

So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I
want, since those words can exist but not all in the same doc. I want to
compare the whole phrase. For that to work, I am not using any Analyzer.

As I said, I've got it working, but I don't know how to use the right
algorithm for the job.

I'm going to redirect my question to the other mailing list.

Thanks anyway.


Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot

You aren't likely to encounter strings like "abc company inc" in a
Lucene index, as they will be split into three tokens, "abc",
"company" and "inc", under most Analyzers.
So, for this exact example you don't even need fuzzy matching.
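
To illustrate the point about tokenization, here is a small Python sketch. It assumes a deliberately crude stand-in for an analyzer (lowercase, split on anything that is not a letter or digit); real Lucene Analyzers differ in many details, and the function name and sample documents are hypothetical.

```python
import re


def simple_analyze(text):
    """Crude analyzer stand-in: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Hypothetical documents, only for illustration.
docs = {1: "ABC Company Inc", 2: "XYZ Holdings"}

# Toy inverted index: token -> set of ids of documents containing it.
index = {}
for doc_id, text in docs.items():
    for tok in simple_analyze(text):
        index.setdefault(tok, set()).add(doc_id)

print(simple_analyze("abc company inc"))  # ['abc', 'company', 'inc']
print(index.get("abc", set()))            # {1}
```

Once the text is indexed as separate tokens, the query "abc" is an exact term match for document 1, which is why no fuzzy matching is needed for this particular example.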

Also, maybe you should try the 'user' mailing list for questions regarding
the use of Lucene.

On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi  wrote:
> I'm re-sending my first message because I've just received the mailing-list
> confirmation. If it's a duplicate, ignore this one.
>
> Hi,
> I want to do a fuzzy search and always return documents no matter what the
> score. So, to do this, I tried sorting by strdist() in Solr 3.1. It worked
> great and does ALMOST exactly what I wanted. The problem is that the
> supported algorithms (jw, ngram and edit) are not the best fit for my
> scenario.
> The best results come from StrikeAMatch
> (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
> So, I found this
> link, https://issues.apache.org/jira/browse/LUCENE-2230, which implemented what
> I wanted. But I was told that I should use trunk because there was some
> really great news about fuzzy search there.
> I read this article explaining some
> changes: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
> But I still don't think it replaces the StrikeAMatch algo, because that one
> can give better results in searches like "abc" compared against strings like "abc
> company inc" (distance > 2).
> But still, Fuad Efendi told me that StrikeAMatch is a kids' toy compared to
> the state of Lucene trunk. So here I am; I want to know how 4.0 will help me
> achieve what I want.
> Thanks.
>
>
>
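
For reference, the contrast drawn in the quoted message between a letter-pair measure and edit distance can be sketched in Python as below. This is a minimal sketch assuming the letter-pair scheme described in the linked "How to Strike a Match" article (adjacent letter pairs per word, score = 2 * shared pairs / total pairs); it is not the LUCENE-2230 patch or Solr's strdist() code, and all function names are hypothetical.

```python
def letter_pairs(text):
    """Adjacent letter pairs of each whitespace-separated word, uppercased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def strike_a_match(a, b):
    """Letter-pair similarity in [0, 1]: 2 * shared pairs / total pairs."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = list(pairs_b)
    shared = 0
    for pair in pairs_a:
        if pair in remaining:
            remaining.remove(pair)  # each pair in b may be matched only once
            shared += 1
    return 2.0 * shared / total


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


for candidate in ["abc company inc", "abd corp"]:
    print(candidate,
          round(strike_a_match("abc", candidate), 3),
          edit_distance("abc", candidate))
```

For these two sample strings the letter-pair score ranks "abc company inc" above "abd corp", while edit distance ranks them the other way round (12 edits versus 5), which is the behaviour the quoted message is describing.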



