>>Could you tell me what's wrong here, please?

There are potentially a number of factors at play here.

Your use of FuzzyLikeThis is fine. I just tried the code on my single-term 
"Paul" query and, as I outlined before, it does a much better job of matching: 
for Paul~, FuzzyLikeThis returns Paul, Paul, Paul ... Phul, whereas FuzzyQuery 
returns Phul, Saul, Paulo, Paul, Paul ... for the same query.
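
The FuzzyQuery ordering is consistent with the IDF issue mentioned in the 
quoted message further down: a rare variant gets a larger IDF weight than the 
common, correctly spelled term. Here is a minimal sketch of that arithmetic, 
assuming the classic Lucene default IDF formula, 1 + ln(numDocs / (docFreq + 1)). 
The document frequencies are made up for illustration; only the 1.5 million 
index size comes from this thread.

```java
public class IdfSketch {
    // Classic Lucene default idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1_500_000; // index size mentioned in the thread
        int paulDocs = 20_000;   // hypothetical: "paul" is a common term
        int phulDocs = 3;        // hypothetical: "phul" is a rare variant

        // The rarer the term, the larger its idf, so the rare variant
        // can outscore the exact match unless idf is evened out.
        System.out.printf("idf(paul) = %.2f%n", idf(paulDocs, numDocs));
        System.out.printf("idf(phul) = %.2f%n", idf(phulDocs, numDocs));
    }
}
```

With numbers like these the rare variant gets the larger weight, which is 
why it can rank ahead of the exact matches in a plain FuzzyQuery.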

Try the query on just the term artist:Coldplay and see the results. Which 
artists does FuzzyLikeThis return vs FuzzyQuery?

If you aren't getting Coldplay as the first result from FuzzyLikeThis, 
double-check that the content is indexed using the same analyzer that you pass 
to FuzzyLikeThisQuery (your code below uses SimpleAnalyzer). If you indexed 
with WhitespaceAnalyzer, for example, or as UN_TOKENIZED, the terms in the 
index and the terms in the query differ in case, so "Coldplay" != "coldplay".
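
To make the mismatch concrete, here is a small plain-Java sketch. It does not 
use Lucene; the two helper methods are hypothetical stand-ins that only 
imitate what SimpleAnalyzer (split on non-letters, lowercase) and 
WhitespaceAnalyzer (split on whitespace, keep case) do to the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerMismatchSketch {
    // Hypothetical stand-in for SimpleAnalyzer: letters only, lowercased.
    static List<String> simpleLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for WhitespaceAnalyzer: split on whitespace, case kept.
    static List<String> whitespaceLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = whitespaceLike("Coldplay"); // what went into the index
        List<String> queried = simpleLike("Coldplay");     // what the query looks for
        System.out.println("indexed terms: " + indexed);
        System.out.println("query terms:   " + queried);
        System.out.println("exact match possible: " + indexed.equals(queried));
    }
}
```

If the two lists differ, the exact term never lines up and only fuzzy 
variants can match, which pushes the expected artist down the results.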

I notice the song title in your original code is treated as a single term in 
your query. Is that how it is indexed? I can see that artist might possibly 
make sense as a single term which gets fuzzy matched, but song titles are 
generally longer, which means song may work better as a tokenized field.
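
A rough plain-Java sketch of why a whole title used as one term matches 
poorly against a tokenized field. It assumes the similarity is one minus the 
edit distance divided by the shorter string's length, which is roughly how 
FuzzyQuery scores candidates, and the 0.5 threshold and the titles are only 
examples.

```java
public class FuzzyTermSketch {
    // Standard Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1 - distance / length of the shorter string.
    static double similarity(String query, String indexed) {
        int shorter = Math.min(query.length(), indexed.length());
        if (shorter == 0) {
            return 0.0;
        }
        return 1.0 - (double) editDistance(query, indexed) / shorter;
    }

    public static void main(String[] args) {
        double minSimilarity = 0.5; // example threshold

        // A misspelled single word against an indexed token.
        System.out.println(similarity("yelow", "yellow") >= minSimilarity);

        // A whole title as one term against the same indexed token.
        System.out.println(similarity("yellow submarine", "yellow") >= minSimilarity);
    }
}
```

So if the song field is tokenized in the index, the title needs to be 
tokenized in the query as well for the per-word fuzzy matching to work.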

Cheers
Mark


----- Original Message ----
From: László Monda <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Monday, 23 June, 2008 1:11:50 PM
Subject: Re: Getting irrelevant results using fuzzy query

Thanks for your reply, Mark.



This was my original code for constructing my query using FuzzyQuery:

BooleanQuery query = new BooleanQuery();
if (artist.length() > 0) {
    FuzzyQuery artist_query = new FuzzyQuery(new Term("artist", artist));
    query.add(artist_query, BooleanClause.Occur.MUST);
}
if (song.length() > 0) {
    FuzzyQuery song_query = new FuzzyQuery(new Term("song", song));
    query.add(song_query, BooleanClause.Occur.MUST);
}



This is my first attempt to use FuzzyLikeThisQuery (with no success):

FuzzyLikeThisQuery query = new FuzzyLikeThisQuery(2, new SimpleAnalyzer());
if (artist.length() > 0) {
    query.addTerms(artist, "artist", 0.5f, 0);
}
if (song.length() > 0) {
    query.addTerms(song, "song", 0.5f, 0);
}



This is my second attempt to use FuzzyLikeThisQuery (with no success):

BooleanQuery query = new BooleanQuery();
if (artist.length() > 0) {
    FuzzyLikeThisQuery artist_query =
        new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
    artist_query.addTerms(artist, "artist", 0.5f, 0);
    query.add(artist_query, BooleanClause.Occur.MUST);
}
if (song.length() > 0) {
    FuzzyLikeThisQuery song_query =
        new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
    song_query.addTerms(song, "song", 0.5f, 0);
    query.add(song_query, BooleanClause.Occur.MUST);
}



I think it's my lack of understanding of how to use FuzzyLikeThisQuery
that is giving me irrelevant results.

Could you tell me what's wrong here, please?

Thank you.

On Mon, 2008-06-23 at 11:28 +0000, mark harwood wrote:
> >>I do have serious problems with the relevance of the results with fuzzy 
> >>queries.
> 
> Please take the time to read my response here:
> 
>      http://www.gossamer-threads.com/lists/lucene/java-user/62050#62050
> 
> I had a work colleague come up with exactly the same problem this week and 
> the solution is the same.
> 
> Just tested my index with a standard Lucene FuzzyQuery for "Paul~" - this 
> gives "Phul", "Saul", and "Paulo" before ANY "Paul" records due to IDF issues.
> Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
> 
> 
> 
> ----- Original Message ----
> From: László Monda <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Monday, 23 June, 2008 12:10:05 PM
> Subject: Re: Getting irrelevant results using fuzzy query
> 
> On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> > On Wednesday, 18 June 2008, László Monda wrote:
> > 
> > > Additional info: Lucene seems to do the right thing when only few
> > > documents are present, but goes crazy when there is about 1.5 million
> > > documents in the index.
> > 
> > Lucene works well with more documents (currently using it with 9 million), 
> > but the fuzzy query requires iteration over all terms, which makes this 
> > query slow. This can be avoided by setting the prefixLength parameter of 
> > the FuzzyQuery constructor to 1 or 2. Or maybe you should use an n-gram 
> > index; see the spellchecker in the contrib area.
> 
> Thanks for the suggestion. I don't have any performance problems
> yet, but I do have serious problems with the relevance of the results
> with fuzzy queries.
> 
-- 
Laci  <http://monda.hu>

