[ https://issues.apache.org/jira/browse/LUCENE-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125350#comment-13125350 ]
Doug Cutting commented on LUCENE-3510:
--------------------------------------
Using a single bit to track prohibited terms, plus a count for required
terms, seems reasonable.

I don't recall the exact history of the original implementation. I think it
may have been meant to support more complex boolean expressions. Any
boolean expression can be rewritten in disjunctive normal form, which can then
be evaluated with a set of required/prohibited mask pairs, one per conjunctive
clause. This is something I'd implemented previously and probably had in mind
when implementing BooleanScorer. A Lucene boolean query is effectively a
single such conjunctive clause, since the optional terms can be ignored when
evaluating the boolean expression, so it would reduce to a single pair of
masks. But, as you observe, this single-clause DNF case can be further
simplified to a boolean and a count of required terms. Does that make sense?
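
To make the mask-pair idea concrete, here is a minimal sketch of evaluating a
DNF expression with one required/prohibited mask pair per conjunctive clause.
It is an illustration only, not the BooleanScorer code: the class and method
names (DnfMaskSketch, Clause, matches) are made up for this example, and it
assumes each term has been assigned one bit of an int.

```java
import java.util.List;

// Illustrative sketch only; these names are hypothetical and do not appear in Lucene.
final class DnfMaskSketch {

    /** One conjunctive clause: every required bit must be set, no prohibited bit may be set. */
    static final class Clause {
        final int required;
        final int prohibited;

        Clause(int required, int prohibited) {
            this.required = required;
            this.prohibited = prohibited;
        }

        boolean matches(int docBits) {
            return (docBits & required) == required && (docBits & prohibited) == 0;
        }
    }

    /** A DNF expression matches when at least one of its conjunctive clauses matches. */
    static boolean matches(List<Clause> dnf, int docBits) {
        for (Clause clause : dnf) {
            if (clause.matches(docBits)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed bit assignment: bit 0 = a, bit 1 = b, bit 2 = c.
        // Expression: (a AND b AND NOT c) OR (c AND NOT a)
        List<Clause> dnf = List.of(new Clause(0b011, 0b100), new Clause(0b100, 0b001));

        System.out.println(matches(dnf, 0b011)); // doc matched a, b    -> true
        System.out.println(matches(dnf, 0b111)); // doc matched a, b, c -> false
        System.out.println(matches(dnf, 0b110)); // doc matched b, c    -> true
    }
}
```

With one bit per term, an int mask caps the expression at 32 distinct terms,
which is where the limit discussed in this issue comes from.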
> BooleanScorer should not limit number of prohibited clauses
> -----------------------------------------------------------
>
> Key: LUCENE-3510
> URL: https://issues.apache.org/jira/browse/LUCENE-3510
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3510.patch
>
>
> Today it's limited to 32, because it uses a separate bit in the mask
> for each clause.
>
> But I don't understand why it does this; I think all prohibited
> clauses can share a single boolean/bit? Any match on a prohibited
> clause sets this bit and the doc is not collected; we don't need each
> prohibited clause to have a dedicated bit?
>
> We also use the mask for required clauses, but this code is now
> commented out (we always use BS2, i.e. BooleanScorer2, if there are any
> required clauses). If we re-enable this code (and I think we should, at
> least in certain cases: I suspect it'd be faster than BS2 in many
> cases), I think we can cut over to an int count instead of bit masks,
> and then have no limit on the required clauses sent to BooleanScorer
> either.
>
> Separately, I cleaned up a few things in BooleanScorer: all of the
> embedded scorer methods (nextDoc, docID, advance, score) now throw
> UnsupportedOperationException, and the buckets are now pre-allocated
> instead of being allocated lazily per-sub-collect.
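
The shared prohibited bit and the int count of required clauses proposed in
the quoted description could look roughly like the sketch below. This is a
hypothetical illustration, not the attached patch: Kind, Bucket, onMatch and
shouldCollect are invented names, and the sketch assumes each clause reports
at most one match per document.

```java
// Hypothetical per-document accumulator; not Lucene's actual bucket class.
final class BucketSketch {

    /** Assumed clause kinds, loosely mirroring required/optional/prohibited clauses. */
    enum Kind { REQUIRED, OPTIONAL, PROHIBITED }

    static final class Bucket {
        boolean prohibited;   // set by a match on ANY prohibited clause
        int requiredMatched;  // how many required clauses matched this doc

        /** Record that one clause of the given kind matched this document. */
        void onMatch(Kind kind) {
            switch (kind) {
                case PROHIBITED:
                    prohibited = true;
                    break;
                case REQUIRED:
                    requiredMatched++;
                    break;
                default:
                    break; // optional clauses do not affect the boolean outcome
            }
        }

        /** Collect only if no prohibited clause matched and every required clause did. */
        boolean shouldCollect(int requiredClauseCount) {
            return !prohibited && requiredMatched == requiredClauseCount;
        }
    }

    public static void main(String[] args) {
        Bucket bucket = new Bucket();
        bucket.onMatch(Kind.REQUIRED);
        bucket.onMatch(Kind.REQUIRED);
        bucket.onMatch(Kind.OPTIONAL);
        System.out.println(bucket.shouldCollect(2)); // true

        bucket.onMatch(Kind.PROHIBITED);
        System.out.println(bucket.shouldCollect(2)); // false
    }
}
```

Because neither field depends on how many clauses of each kind exist, this
shape places no 32-clause limit on prohibited or required clauses.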