Thanksa lot, But I still don't understand why raising a little bit the min similarity change the ordering...
markharw00d wrote: > > This could be down to IDF ie "Lucane" is ranked higher because it is rarer > despite having worse edit distance. > This is arguably a bug. > See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this. > You could try subclass QueryParser and override newFuzzyQuery to return > FuzzyLikeThisQuery (found in "contrib/queries") > > Cheers > Mark > > > > ----- Original Message ---- > From: stefcl <stefatw...@gmail.com> > To: java-user@lucene.apache.org > Sent: Mon, 15 February, 2010 14:13:52 > Subject: Strange Fuzzyquery results scoring when using a low minimal > distance > > > Hello, > > I'm using Lucene v3. > Please consider the following spellings > > Lucene > Lucéne > lucéne > Lucane > Lucen > > When searching for "lucéne" among those words using a FuzzyQuery (with 0.5 > edit distance), results show : > > 1. Lucene 1.0259752 > 2. Lucane 1.0259752 > 3. Lucéne 0.95660806 > 4. lucéne 0.95660806 > 5. Lucen 0.30779266 > > #4 is an exact match, why does it receive a lower score than "Lucane" > which > contains one incorrect letter? > > Also, if you raise min similarity a bit higher (0.6 of above), everything > becomes normal : > > 1. Lucéne 1.0438477 > 2. lucéne 1.0438477 > 3. Lucene 0.97959816 > 4. Lucane 0.97959816 > > > Any idea? > Thanks in advance... > > > The code I use : > > /** > * @param args the command line arguments > */ > public static void main(String[] args) throws IOException, > ParseException > { > > StandardAnalyzer analyzer = new > StandardAnalyzer(Version.LUCENE_CURRENT); > > // TODO code application logic here > Directory index = new RAMDirectory(); > IndexWriter w = new IndexWriter(index, analyzer, true, > IndexWriter.MaxFieldLength.UNLIMITED); > > addDoc(w, "Lucene"); > addDoc(w, "Lucéne"); > addDoc(w, "lucéne"); > addDoc(w, "Lucane"); > addDoc(w, "Lucen"); > > w.close(); > > FuzzyQuery q = new FuzzyQuery( new Term("title", "lucéne") , 0.5f > ); > > // 3. search > IndexSearcher searcher = new IndexSearcher(index); > > TopDocs collector = searcher.search(q, 10); > ScoreDoc[] hits = collector.scoreDocs; > > // 4. display results > System.out.println("Found " + hits.length + " hits."); > for(int i = 0 ; i < hits.length; i++) > { > Document d = searcher.doc(hits[i].doc); > System.out.println((i + 1) + ". " + d.get("title") + " " + > hits[i].score ); > } > > // searcher can only be closed when there > // is no need to access the documents any more. > searcher.close(); > } > > > private static void addDoc(IndexWriter w, String value) throws > IOException > { > Document doc = new Document(); > doc.add(new Field("title", value, Field.Store.YES, > Field.Index.ANALYZED)); > w.addDocument(doc); > } > -- > View this message in context: > http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > -- View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27605395.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org