[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018417#comment-14018417 ]
Da Huang commented on LUCENE-4396:
----------------------------------

About the score differences between BS and BS2 (the same applies to BNS and BS2):

Currently, BS and BS2 produce different scores when executing a query like "+a b c d ...".

I have been told that the reason is indicated by the TODO in ReqOptSumScorer.score(), which says:
{code}
// TODO: sum into a double and cast to float if we ever send required clauses to BS1
{code}

However, I don't think so, because the score difference comes from the different order in which the clause scores are summed.

Suppose a doc matches the query "+a b c d". The score calculated by BS is
{code}
BS.score(doc) = ((a.score() + b.score()) + c.score()) + d.score()
{code}
while the score calculated by BS2 is
{code}
BS2.score(doc) = a.score() + (float)(b.score() + c.score() + d.score())
{code}
Notice that, in BS2, we can only get the float value of (b.score() + c.score() + d.score()), via optScorer.score().

Furthermore, I have noticed that we can actually control BS's summation order, so that
{code}
BS.score(doc) = a.score() + ((b.score() + c.score()) + d.score())
{code}
However, for BS2, we do not know the order in which (b.score() + c.score() + d.score()) is summed, as that order is determined by each scorer's position in a heap. I still think this matters little.

I will rearrange the summation order of BS.score() in the next patch to see whether it works.

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
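
The summation-order effect described in the comment can be reproduced outside Lucene. The following is only a minimal sketch: the class and method names (ScoreOrderDemo, bsScore, bs2Score, bsReorderedScore) are hypothetical and are not Lucene APIs, the optional sum in the BS2-style helper is assumed to be accumulated in a double before the cast, and the input values are picked purely to make the rounding visible.

```java
// Illustration only: these helpers are hypothetical and are not Lucene APIs.
final class ScoreOrderDemo {

    // BS-style order: accumulate every clause score left to right in float.
    static float bsScore(float a, float b, float c, float d) {
        return ((a + b) + c) + d;
    }

    // BS2-style order: the optional clauses are summed first (assumed here to be
    // summed in double and then cast to float), and only that float value is
    // added to the required clause's score.
    static float bs2Score(float a, float b, float c, float d) {
        float optional = (float) ((double) b + c + d);
        return a + optional;
    }

    // BS with the rearranged order: optional clauses first, everything in float.
    static float bsReorderedScore(float a, float b, float c, float d) {
        return a + ((b + c) + d);
    }

    public static void main(String[] args) {
        float a = 1.0f;
        // A quarter of the spacing between 1.0f and the next larger float.
        float tiny = Math.ulp(1.0f) / 4;

        float bs = bsScore(a, tiny, tiny, tiny);
        float bs2 = bs2Score(a, tiny, tiny, tiny);
        float reordered = bsReorderedScore(a, tiny, tiny, tiny);

        System.out.println("BS            = " + bs);
        System.out.println("BS2           = " + bs2);
        System.out.println("BS reordered  = " + reordered);
        System.out.println("BS == BS2?           " + (bs == bs2));
        System.out.println("BS reordered == BS2? " + (reordered == bs2));
    }
}
```

With these inputs, each tiny addend on its own is rounded away when added to 1.0f, so the left-to-right sum stays at 1.0f, while the three addends summed together are large enough to move the result to the next float above 1.0f. The rearranged order agrees with the BS2-style result here only because the three-term optional sum happens to be exact in both float and double; with more optional clauses, or a heap-determined order, that agreement is not guaranteed.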