Michael McCandless created LUCENE-5288:
------------------------------------------
Summary: Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur "close" together (in proximity) in each document
Key: LUCENE-5288
URL: https://issues.apache.org/jira/browse/LUCENE-5288
Project: Lucene - Core
Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.6, 5.0

This is very much a work in progress, tons of nocommits... It adds two classes:

  * ProxBooleanTermQuery: essentially a BooleanQuery (same matching/scoring; currently all clauses must be TermQuery, and only Occur.SHOULD is supported), except that for each matching doc the positions are merge-sorted and scored to "boost" the document's score.

  * QueryRescorer: simple API to re-score top hits using a different query. Because ProxBooleanTermQuery is so costly, apps would normally run an "ordinary" BooleanQuery across the full index to get the top few hundred hits, and then rescore those using the more costly ProxBooleanTermQuery (or other costly queries).

I'm not sure how to actually compute the appropriate prox boost (this is the hard part!!) and I've completely punted on that in the current patch (it's just a hack now), but the patch does all the "mechanics" to merge/visit all the positions in order per hit.

Maybe we could do scoring similar to what SpanNearQuery or sloppy PhraseQuery would do, or maybe follow this paper:

    http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf

which Rob also used in LUCENE-4909 to add proximity scoring to PostingsHighlighter.

Maybe we need to make it (how the prox boost is computed/folded in) somehow pluggable ...
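To make the "mechanics" described above concrete, here is a minimal, self-contained sketch of the two steps: merging each term's sorted positions into one position-ordered stream per hit, then re-ranking first-pass hits with a proximity boost. It does not use the Lucene API and is not taken from the patch. The names `ProxBoostSketch`, `Hit`, `mergePositions`, `proxBoost`, `rescore` and the `weight` parameter are all hypothetical. The boost formula (sum of inverse squared distances between adjacent occurrences of different terms) is only a placeholder assumption, since the issue leaves the real formula open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: none of these names come from the LUCENE-5288 patch. */
class ProxBoostSketch {

  /** One occurrence of a query term in a document. */
  static final class Hit {
    final int termId;
    final int position;
    Hit(int termId, int position) { this.termId = termId; this.position = position; }
  }

  /** K-way merge of per-term sorted position arrays into one position-ordered list. */
  static List<Hit> mergePositions(int[][] positionsPerTerm) {
    // Each queue entry is {termId, index into that term's position array}.
    PriorityQueue<int[]> pq = new PriorityQueue<>(
        (a, b) -> Integer.compare(positionsPerTerm[a[0]][a[1]], positionsPerTerm[b[0]][b[1]]));
    for (int t = 0; t < positionsPerTerm.length; t++) {
      if (positionsPerTerm[t].length > 0) pq.add(new int[] {t, 0});
    }
    List<Hit> merged = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] top = pq.poll();
      merged.add(new Hit(top[0], positionsPerTerm[top[0]][top[1]]));
      if (top[1] + 1 < positionsPerTerm[top[0]].length) {
        pq.add(new int[] {top[0], top[1] + 1}); // advance this term to its next position
      }
    }
    return merged;
  }

  /** Placeholder boost (an assumption): sum of 1/d^2 over adjacent hits of different terms. */
  static double proxBoost(List<Hit> merged) {
    double boost = 0.0;
    for (int i = 1; i < merged.size(); i++) {
      Hit prev = merged.get(i - 1);
      Hit cur = merged.get(i);
      int d = cur.position - prev.position;
      if (prev.termId != cur.termId && d > 0) boost += 1.0 / ((double) d * d);
    }
    return boost;
  }

  /** Second pass: order first-pass hits by firstPassScore + weight * proxBoost, best first. */
  static Integer[] rescore(double[] firstPassScores, int[][][] positionsPerHit, double weight) {
    double[] combined = new double[firstPassScores.length];
    Integer[] order = new Integer[firstPassScores.length];
    for (int i = 0; i < combined.length; i++) {
      combined[i] = firstPassScores[i] + weight * proxBoost(mergePositions(positionsPerHit[i]));
      order[i] = i;
    }
    Arrays.sort(order, (a, b) -> Double.compare(combined[b], combined[a]));
    return order;
  }

  public static void main(String[] args) {
    int[][] positions = { {1, 10}, {2, 20} }; // term 0 at 1 and 10, term 1 at 2 and 20
    System.out.println("boost = " + proxBoost(mergePositions(positions)));
  }
}
```

Folding the boost in additively with a tunable `weight` is just one choice. A pluggable design, as suggested above, would put `proxBoost` and the combination step behind an interface so that other formulas can be swapped in.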