This could be down to IDF ie "Lucane" is ranked higher because it is rarer 
despite having worse edit distance.
This is arguably a bug.
See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this. You 
could try subclass QueryParser and override newFuzzyQuery to return 
FuzzyLikeThisQuery (found in "contrib/queries")

Cheers
Mark



----- Original Message ----
From: stefcl <stefatw...@gmail.com>
To: java-user@lucene.apache.org
Sent: Mon, 15 February, 2010 14:13:52
Subject: Strange Fuzzyquery results scoring when using a low minimal distance


Hello,

I'm using Lucene v3. 
Please consider the following spellings 

Lucene
Lucéne
lucéne
Lucane
Lucen

When searching for "lucéne" among those words using a FuzzyQuery (with 0.5
edit distance), results show :

1. Lucene 1.0259752
2. Lucane 1.0259752
3. Lucéne 0.95660806
4. lucéne 0.95660806
5. Lucen 0.30779266

#4 is an exact match, why does it receive a lower score than "Lucane" which
contains one incorrect letter?

Also, if you raise min similarity a bit higher (0.6 of above), everything
becomes normal :

1. Lucéne 1.0438477
2. lucéne 1.0438477
3. Lucene 0.97959816
4. Lucane 0.97959816


Any idea?
Thanks in advance...


The code I use :

   /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException,
ParseException
    {

        StandardAnalyzer analyzer = new
StandardAnalyzer(Version.LUCENE_CURRENT);

        // TODO code application logic here
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true,
IndexWriter.MaxFieldLength.UNLIMITED);

        addDoc(w, "Lucene");
        addDoc(w, "Lucéne");
        addDoc(w, "lucéne");
        addDoc(w, "Lucane");
        addDoc(w, "Lucen");

        w.close();

        FuzzyQuery q =  new FuzzyQuery( new Term("title", "lucéne") , 0.5f
);
        
        // 3. search
        IndexSearcher searcher = new IndexSearcher(index);
        
        TopDocs collector = searcher.search(q, 10);
        ScoreDoc[] hits = collector.scoreDocs;

        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for(int i = 0 ; i < hits.length; i++)
        {
              Document d = searcher.doc(hits[i].doc);
              System.out.println((i + 1) + ". " + d.get("title") + " " + 
hits[i].score );
        }

        // searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
    }


    private static void addDoc(IndexWriter w, String value) throws
IOException
    {
        Document doc = new Document();
        doc.add(new Field("title", value, Field.Store.YES,
Field.Index.ANALYZED));
        w.addDocument(doc);
    }
-- 
View this message in context: 
http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to