Thanks for the very detailed answer. Using FuzzyLikeThisQuery solves the problem.
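For anyone finding this thread in the archive, the docFreq effect described in the quoted explanation can be checked with a little arithmetic. The sketch below is only an approximation under stated assumptions: it assumes the default idf of 1 + ln(numDocs / (docFreq + 1)), a fuzzy similarity of 1 - editDistance / min(length of query, length of term) with no prefix, and a boost of (similarity - minSimilarity) / (1 - minSimilarity). It ignores queryNorm, coord and field norms, and the class and method names are hypothetical, not part of Lucene.

```java
public class FuzzyIdfSketch {

    // Assumed default idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Assumed fuzzy similarity with no prefix:
    // 1 - editDistance / min(queryLen, termLen).
    static double similarity(int editDistance, int queryLen, int termLen) {
        return 1.0 - editDistance / (double) Math.min(queryLen, termLen);
    }

    // Assumed boost of each rewritten TermQuery, scaled so that an
    // exact match (similarity 1.0) gets a boost of 1.0.
    static double boost(double similarity, double minSimilarity) {
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    // Rough proxy for the relative score of one matching term: boost * idf^2.
    // This is not the full Lucene score, only the part that differs per term.
    static double proxyScore(int editDistance, int queryLen, int termLen,
                             int docFreq, int numDocs, double minSimilarity) {
        double idf = idf(docFreq, numDocs);
        double sim = similarity(editDistance, queryLen, termLen);
        return boost(sim, minSimilarity) * idf * idf;
    }

    public static void main(String[] args) {
        int numDocs = 5; // five documents were indexed in the example
        for (double minSim : new double[] {0.5, 0.6}) {
            // "lucéne": exact match, docFreq 2 (Lucéne and lucéne both lowercase to it)
            double exact = proxyScore(0, 6, 6, 2, numDocs, minSim);
            // "lucene" / "lucane": one edit away, docFreq 1 each
            double oneEdit = proxyScore(1, 6, 6, 1, numDocs, minSim);
            System.out.println("minSim=" + minSim
                    + " exact=" + exact + " oneEdit=" + oneEdit);
        }
    }
}
```

Under these assumptions the one-edit terms come out roughly 7% ahead of the exact match at a minimum similarity of 0.5 and roughly 6% behind it at 0.6, which is close to the ratios between the scores in my original post.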

Uwe Schindler wrote:
>
> The problem is the following:
> The docFreq of the term "lucéne" is 2, all other terms have 1 (because
> StandardAnalyzer lowercases everything). What happens is that terms with
> a lower docFreq get a higher score in TermQuery. This score outweighs the
> boosting done by FuzzyQuery (because your index is so small).
>
> If you raise the minSimilarity a little bit, your query matches fewer terms
> and the rewritten BooleanQuery contains fewer clauses. At some point the
> extra weight of the less frequent terms is no longer relevant for the
> final score.
>
> By the way, you can always look at the explain() results, which inform you
> about the scoring done.
>
> The fix is (applies only to trunk, see issue
> https://issues.apache.org/jira/browse/LUCENE-124) to ignore scoring of the
> TermQueries generated by Fuzzy and only look at the edit distance
> (implemented by another MTQ.RewriteMode), which can be set with
> FuzzyQuery.setRewriteMode().
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: stefcl [mailto:stefatw...@gmail.com]
>> Sent: Tuesday, February 16, 2010 10:11 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Strange Fuzzyquery results scoring when using a low
>> minimal distance
>>
>>
>> Thanks a lot,
>> But I still don't understand why raising the min similarity a little
>> bit changes the ordering...
>>
>>
>> markharw00d wrote:
>> >
>> > This could be down to IDF, i.e. "Lucane" is ranked higher because it is
>> > rarer despite having a worse edit distance.
>> > This is arguably a bug.
>> > See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this.
>> > You could try subclassing QueryParser and overriding newFuzzyQuery to
>> > return FuzzyLikeThisQuery (found in "contrib/queries").
>> >
>> > Cheers
>> > Mark
>> >
>> >
>> > ----- Original Message ----
>> > From: stefcl <stefatw...@gmail.com>
>> > To: java-user@lucene.apache.org
>> > Sent: Mon, 15 February, 2010 14:13:52
>> > Subject: Strange Fuzzyquery results scoring when using a low minimal
>> > distance
>> >
>> >
>> > Hello,
>> >
>> > I'm using Lucene v3.
>> > Please consider the following spellings:
>> >
>> > Lucene
>> > Lucéne
>> > lucéne
>> > Lucane
>> > Lucen
>> >
>> > When searching for "lucéne" among those words using a FuzzyQuery
>> > (with 0.5 edit distance), the results show:
>> >
>> > 1. Lucene 1.0259752
>> > 2. Lucane 1.0259752
>> > 3. Lucéne 0.95660806
>> > 4. lucéne 0.95660806
>> > 5. Lucen 0.30779266
>> >
>> > #4 is an exact match, so why does it receive a lower score than
>> > "Lucane", which contains one incorrect letter?
>> >
>> > Also, if you raise the min similarity a bit higher (0.6 or above),
>> > everything becomes normal:
>> >
>> > 1. Lucéne 1.0438477
>> > 2. lucéne 1.0438477
>> > 3. Lucene 0.97959816
>> > 4. Lucane 0.97959816
>> >
>> >
>> > Any idea?
>> > Thanks in advance...
>> >
>> >
>> > The code I use:
>> >
>> > /**
>> >  * @param args the command line arguments
>> >  */
>> > public static void main(String[] args) throws IOException, ParseException
>> > {
>> >     StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>> >
>> >     // 1. build a small in-memory index
>> >     Directory index = new RAMDirectory();
>> >     IndexWriter w = new IndexWriter(index, analyzer, true,
>> >             IndexWriter.MaxFieldLength.UNLIMITED);
>> >
>> >     addDoc(w, "Lucene");
>> >     addDoc(w, "Lucéne");
>> >     addDoc(w, "lucéne");
>> >     addDoc(w, "Lucane");
>> >     addDoc(w, "Lucen");
>> >
>> >     w.close();
>> >
>> >     // 2. query
>> >     FuzzyQuery q = new FuzzyQuery(new Term("title", "lucéne"), 0.5f);
>> >
>> >     // 3. search
>> >     IndexSearcher searcher = new IndexSearcher(index);
>> >
>> >     TopDocs collector = searcher.search(q, 10);
>> >     ScoreDoc[] hits = collector.scoreDocs;
>> >
>> >     // 4. display results
>> >     System.out.println("Found " + hits.length + " hits.");
>> >     for (int i = 0; i < hits.length; i++)
>> >     {
>> >         Document d = searcher.doc(hits[i].doc);
>> >         System.out.println((i + 1) + ". " + d.get("title") + " "
>> >                 + hits[i].score);
>> >     }
>> >
>> >     // searcher can only be closed when there
>> >     // is no need to access the documents any more.
>> >     searcher.close();
>> > }
>> >
>> >
>> > private static void addDoc(IndexWriter w, String value) throws IOException
>> > {
>> >     Document doc = new Document();
>> >     doc.add(new Field("title", value, Field.Store.YES,
>> >             Field.Index.ANALYZED));
>> >     w.addDocument(doc);
>> > }
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27605395.html
>
>

--
View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27702921.html