[ https://issues.apache.org/jira/browse/LUCENE-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-6789:
--------------------------------
    Attachment: LUCENE-6789.patch

> change IndexSearcher default similarity to BM25
> -----------------------------------------------
>
>                 Key: LUCENE-6789
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6789
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 6.0
>
>         Attachments: LUCENE-6789.patch
>
>
> Since Lucene 4.0, the statistics needed for BM25 are always present, so we
> can make the change without any degradation.
> I think the change should be a 6.0 change only: it will prevent any
> surprises. DefaultSimilarity is renamed to ClassicSimilarity to prevent
> confusion. No indexing change is needed as we use the same norm format; it's
> just a runtime switch. Users can just do IndexSearcher.setSimilarity(new
> ClassicSimilarity()) to get the old behavior. I did not change Solr's
> default here. I think that should be a separate issue, since it has more
> concerns: e.g. factories in configuration files and so on.
> One issue was the generation of synonym queries (posinc=0) by QueryBuilder
> (used by parsers). This is kind of a corner case (query-time synonyms), but
> we should make it nicer. The current code in trunk disables coord, which
> makes no sense for anything but the vector space impl. Instead, this patch
> adds a SynonymQuery which treats occurrences of any term as a single
> pseudoterm. With English WordNet as a query-time synonym dict, this query
> gives 12% improvement in MAP for title queries on BM25, and 2% with Classic
> (not significant). So it's a better generic approach for synonyms that works
> with all scoring models.
> I wanted to use BlendedTermQuery, but at a glance it seems to have problems:
> it tries to "take on the world", and it does not work with distributed
> scoring (it doesn't consult IndexSearcher for stats). Anyway, this one is a
> different, simpler approach, which only works for a single field, and which
> calls tf(sum) a single time.
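
The pseudoterm idea in the description (sum the frequencies of all synonyms
in a document, then call tf once on that sum) can be sketched in standalone
Java. This is only an illustration under stated assumptions, not the code in
LUCENE-6789.patch: the class and method names are hypothetical, no Lucene
classes are used, and the tf function is the textbook BM25 term-frequency
component with k1 = 1.2 and b = 0.75 assumed.

```java
import java.util.Arrays;

/** Standalone illustration; hypothetical names, no Lucene classes involved. */
final class SynonymScoringSketch {

    // Assumed BM25 parameters (commonly used values).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Textbook BM25 tf component with document-length normalization. */
    static double bm25Tf(double freq, double docLen, double avgDocLen) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return freq * (K1 + 1.0) / (freq + norm);
    }

    /** Pseudoterm approach: sum the synonym frequencies, then apply tf once. */
    static double pseudotermTf(int[] synonymFreqs, double docLen, double avgDocLen) {
        int sum = Arrays.stream(synonymFreqs).sum();
        return bm25Tf(sum, docLen, avgDocLen);
    }

    /** Naive approach: apply tf to each synonym separately and add the results. */
    static double perTermTf(int[] synonymFreqs, double docLen, double avgDocLen) {
        double total = 0.0;
        for (int freq : synonymFreqs) {
            total += bm25Tf(freq, docLen, avgDocLen);
        }
        return total;
    }

    public static void main(String[] args) {
        // One document in which two synonyms occur 3 and 2 times.
        int[] freqs = {3, 2};
        double docLen = 100.0;
        double avgDocLen = 100.0;

        System.out.println("tf(sum)        = " + pseudotermTf(freqs, docLen, avgDocLen));
        System.out.println("sum of tf(each) = " + perTermTf(freqs, docLen, avgDocLen));
        System.out.println("single-term cap = " + (K1 + 1.0));
    }
}
```

Because the BM25 tf component saturates, applying it once to the summed
frequency keeps the contribution below k1 + 1, exactly as for a single real
term. Adding per-synonym tf values can exceed that cap, so a document is
rewarded simply for mixing synonyms.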