I am currently matching botanical names (with possible misspellings) against an indexed reference list with Lucene. After quick progress in the beginning, I am struggling to design a query that produces the ranking I want.
Here is an example:

Search term: Acer campestre 'Rozi'
Tokenized (decomposed) representation: acer campestre rozi

Top 10 hits:

    {value=Acer campestre, score=12.288989}
    {value=Acer campestre 'Rozi', score=11.955223}   // <- why is it 2nd?
    {value=Acer campestre 'Arends', score=10.640412}
    {value=Acer campestre subsp. leiocarpon, score=10.640412}
    {value=Acer campestre 'Carnival', score=10.640412}
    {value=Acer campestre 'Commodore', score=10.640412}
    {value=Acer campestre 'Nanum', score=10.640412}
    {value=Acer campestre 'Elsrijk', score=10.640412}
    {value=Acer campestre 'Fastigiatum', score=10.640412}
    {value=Acer campestre 'Geessink', score=10.640412}

And here is how I create my queries:

    final BooleanQuery.Builder builder = new BooleanQuery.Builder();

    // add individual tokens to query
    for (String token : fuzzyTokens) {
        final Term term = new Term(NAME_TOKENS.name(), token);
        final FuzzyQuery fq = new FuzzyQuery(term);
        builder.add(fq, BooleanClause.Occur.SHOULD);
    }

    return builder.build();

Input names are analyzed with a StandardTokenizer and a LowerCaseFilter when they are added to the IndexWriter.

My question: How can I get a ranking that scores "Acer campestre 'Rozi'" higher than "Acer campestre"? I am sure there is an obvious way to achieve this that I have so far failed to find.

-Matthias