[jira] [Commented] (LUCENE-5288) Add ProxBooleanTermQuery, like BooleanQuery but boosting when term occur "close" together (in proximity) in each document

Robert Muir (JIRA) Wed, 16 Oct 2013 09:44:10 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796958#comment-13796958
 ]


Robert Muir commented on LUCENE-5288:
-------------------------------------

Can we avoid the 'disableCoord'? I don't think we should add negatives like 
omitXXX or disableXXX anymore. Why does the one in BQ need to be protected?

Why is the heap modification code manually inlined into proxscorer? Is this for 
some performance gain? If so, what is the improvement?

Why does proxscorer score as S + P (where P is some proximity boost) if 
QueryRescorer is already scoring as S + (W*S2)? This seems to make the entire 
query unnecessary: its booleanness is not needed and, in fact, not wanted, as 
it will just double-count normal scoring. Do we just need something more like 
a phrase query with different scoring?
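
To make the double-counting concern concrete, here is a minimal arithmetic 
sketch. It is not code from the patch: the class name, method names, and 
sample values are hypothetical. It only assumes what is stated above, namely 
that the rescoring query returns S + P and that QueryRescorer combines scores 
as S + (W*S2).

```java
// Hypothetical illustration only; not code from the LUCENE-5288 patch.
public class RescoreArithmetic {

    // Combination described in the comment above: S + (W * S2).
    static double rescore(double s, double w, double s2) {
        return s + w * s2;
    }

    // Second-pass query is boolean-like and scores S + P:
    // the ordinary score S ends up counted twice.
    static double withBooleanProx(double s, double w, double p) {
        return rescore(s, w, s + p);
    }

    // Second-pass query scores only the proximity part P:
    // the ordinary score S is counted once.
    static double withProxOnly(double s, double w, double p) {
        return rescore(s, w, p);
    }

    public static void main(String[] args) {
        double s = 2.0, p = 0.5, w = 1.0; // made-up sample values
        System.out.println(withBooleanProx(s, w, p)); // S + W*(S + P)
        System.out.println(withProxOnly(s, w, p));    // S + W*P
    }
}
```

With W = 1, the first variant reduces to 2S + P and the second to S + P, 
which is the difference the question above is pointing at.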

> Add ProxBooleanTermQuery, like BooleanQuery but boosting when term occur 
> "close" together (in proximity) in each document
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5288
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5288
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.6, 5.0
>
>         Attachments: LUCENE-5288.patch
>
>
> This is very much a work in progress, tons of nocommits...  It adds two 
> classes:
>   * ProxBooleanTermQuery: like BooleanQuery (currently, all clauses
>     must be TermQuery, and only Occur.SHOULD is supported), which is
>     essentially a BooleanQuery (same matching/scoring) except that for
>     each matching doc the positions are merge-sorted and scored to
>     "boost" the document's score
>   * QueryRescorer: simple API to re-score top hits using a different
>     query.  Because ProxBooleanTermQuery is so costly, apps would
>     normally run an "ordinary" BooleanQuery across the full index, to
>     get the top few hundred hits, and then rescore using the more
>     costly ProxBooleanTermQuery (or other costly queries).
> I'm not sure how to actually compute the appropriate prox boost (this
> is the hard part!!) and I've completely punted on that in the current
> patch (it's just a hack now), but the patch does all the "mechanics"
> to merge/visit all the positions in order per hit.
> Maybe we could do scoring similar to what SpanNearQuery or sloppy
> PhraseQuery would do, or maybe this paper:
>   http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
> which Rob also used in LUCENE-4909 to add proximity scoring to
> PostingsHighlighter.  Maybe we need to make it (how the prox boost is
> computed/folded in) somehow pluggable ...
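
The per-hit position merge mentioned in the quoted description can be 
pictured as a k-way merge over each term's sorted position list. The sketch 
below is an assumption-laden illustration, not the patch's code: the class 
and method names are hypothetical, positions are plain int arrays rather 
than Lucene postings, and it uses java.util.PriorityQueue instead of a 
hand-inlined heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not code from the LUCENE-5288 patch.
public class PositionMerge {

    /**
     * Merges per-term sorted position arrays for one document.
     * Returns {position, termIndex} pairs in ascending position order.
     */
    static List<int[]> merge(int[][] positionsPerTerm) {
        // Heap entries: {position, termIndex, offset into that term's array}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int t = 0; t < positionsPerTerm.length; t++) {
            if (positionsPerTerm[t].length > 0) {
                heap.add(new int[] {positionsPerTerm[t][0], t, 0});
            }
        }
        List<int[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(new int[] {top[0], top[1]});
            // Advance the term whose position was just consumed.
            int next = top[2] + 1;
            int[] src = positionsPerTerm[top[1]];
            if (next < src.length) {
                heap.add(new int[] {src[next], top[1], next});
            }
        }
        return out;
    }
}
```

A proximity scorer would then walk the merged list and look at gaps between 
adjacent positions of different terms; how those gaps turn into a boost is 
the open question in the description above.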



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
