Dear Matthias,

First, you need to understand how Lucene ranks documents. Lucene's default ranking function is BM25 (https://en.wikipedia.org/wiki/Okapi_BM25), and the scores it produces depend on the statistics of your index.

There can be many reasons for the ranking you see. One guess is that the term 'rozi' occurs in many documents of your index, so a match on it contributes little to the score. This effect is called IDF (Inverse Document Frequency). In any case, if you want fine-grained control over the ranking, you need to understand the BM25 formula.
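
To make the BM25 formula a bit more concrete, here is a minimal standalone sketch (plain Java, no Lucene dependency). It uses the textbook per-term formula from the Wikipedia article above, with the always-positive IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)). The class name, the method names, and all numbers in main() are made up for illustration, and the formula may differ in constant factors from what your Lucene version computes, so please treat it as an approximation and not as Lucene's actual implementation.

```java
// Illustrative sketch only: textbook BM25 for a single query term.
// Class name, method names, and sample numbers are hypothetical.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of length normalization

    // IDF: the more documents contain the term (docFreq), the lower the weight.
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Contribution of one matching term to the score of one document.
    // docLen and avgDocLen are field lengths measured in tokens.
    public static double termScore(double idf, double tf, double docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf * (tf * (K1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        long docCount = 1000; // hypothetical index size

        // A term found in 900 of 1000 documents versus one found in only 5.
        System.out.println("idf(common) = " + idf(docCount, 900));
        System.out.println("idf(rare)   = " + idf(docCount, 5));

        // Same term, same tf, but a 2-token field versus a 3-token field.
        double w = idf(docCount, 900);
        System.out.println("2-token field: " + termScore(w, 1, 2, 3));
        System.out.println("3-token field: " + termScore(w, 1, 3, 3));
    }
}
```

With these made-up numbers, the rare term gets a larger IDF than the common one, and the same match is worth more in the shorter field. Both effects come straight out of the formula, so either of them could play a role in your ranking; the real values from your index will show which one it is.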

Lucene can also tell you how each score was computed: IndexSearcher.explain() returns an Explanation for a given query and document. Here is some sample code.

======================================
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Assumes you already have an IndexReader `reader`, a Query `q`,
// and an int `hitsPerPage`.
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs docs = searcher.search(q, hitsPerPage);
ScoreDoc[] hits = docs.scoreDocs;
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    // The explanation shows how the score of this hit was calculated.
    Explanation explanation = searcher.explain(q, docId);
    System.out.println("Explanation : " + explanation.toString());
}
======================================

I hope it helps :D

Best regards,
Namgyu Kim

P.S. For BM25, the default values in Lucene are k1 = 1.2 and b = 0.75.

On Fri, Jun 14, 2019 at 12:54 AM, <baris.ka...@oracle.com> wrote:

> I would suggest trying (indexing and searching) without the ' characters
> and seeing whether you can find it first.
>
> Thanks
>
>
> On 6/13/19 11:25 AM, Matthias Müller wrote:
> > I am currently matching botanic names (with possible misspellings)
> > against an indexed reference list with Lucene. After quick progress in
> > the beginning, I am struggling with the proper query design to achieve
> > the ranking I want.
> >
> > Here is an example:
> >
> > Search term: Acer campestre 'Rozi'
> >
> > Tokenized (decomposed) representation:
> > acer
> > campestre
> > rozi
> >
> > Top 10 hits:
> > {value=Acer campestre, score=12.288989}
> > {value=Acer campestre 'Rozi', score=11.955223} // <- why is it 2nd?
> > {value=Acer campestre 'Arends', score=10.640412}
> > {value=Acer campestre subsp. leiocarpon, score=10.640412}
> > {value=Acer campestre 'Carnival', score=10.640412}
> > {value=Acer campestre 'Commodore', score=10.640412}
> > {value=Acer campestre 'Nanum', score=10.640412}
> > {value=Acer campestre 'Elsrijk', score=10.640412}
> > {value=Acer campestre 'Fastigiatum', score=10.640412}
> > {value=Acer campestre 'Geessink', score=10.640412}]
> >
> >
> > And here is how I create my queries:
> >
> >     final BooleanQuery.Builder builder = new BooleanQuery.Builder();
> >     // add individual tokens to query
> >     for (String token : fuzzyTokens) {
> >         final Term term = new Term(NAME_TOKENS.name(), token);
> >         final FuzzyQuery fq = new FuzzyQuery(term);
> >         builder.add(fq, BooleanClause.Occur.SHOULD);
> >     }
> >     return builder.build();
> > }
> >
> >
> > Input names are analyzed with a StandardTokenizer and a lowercase filter
> > when they are added to the IndexWriter.
> >
> >
> > My question: How can I get a ranking that scores
> > "Acer campestre 'Rozi'" higher than "Acer campestre"?
> > I am sure there is an obvious way to achieve this that I have so far
> > failed to find.
> >
> >
> > -Matthias
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org