Adrien Grand created LUCENE-6553:
------------------------------------
Summary: Simplify how we handle deleted docs in read APIs
Key: LUCENE-6553
URL: https://issues.apache.org/jira/browse/LUCENE-6553
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: Trunk
Today, all scorers and postings formats need to be able to handle deleted
documents.
I suspect that the reason is that we want to make sure we do not perform
costly operations on deleted documents. For instance, if you run a phrase
query, reading positions on a deleted document is useless. I suspect this is
also a source of inefficiency, since in some cases we apply deleted docs
several times: for instance, conjunctions apply deleted docs to every sub
scorer.
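To make the "several times" point concrete, here is a toy Java model, not Lucene code. The LiveDocs, Postings and run names are hypothetical, a boolean[] stands in for Bits acceptDocs, and postings advance by linear scan. It counts how many times the deleted-docs check runs when each clause of a two-clause conjunction filters deleted docs itself, compared with checking once per match on top of the intersection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model (not Lucene's actual classes) of a
// two-clause conjunction, used only to count deleted-docs checks.
final class ConjunctionDeletesSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Stand-in for a deleted-docs bitset that counts how often it is consulted. */
    static final class LiveDocs {
        private final boolean[] deleted;
        int checks = 0;

        LiveDocs(int maxDoc, int... deletedDocs) {
            deleted = new boolean[maxDoc];
            for (int d : deletedDocs) {
                deleted[d] = true;
            }
        }

        boolean isLive(int doc) {
            checks++;
            return !deleted[doc];
        }
    }

    /** Postings over a sorted doc ID array; skips deleted docs itself when liveDocs != null. */
    static final class Postings {
        private final int[] docs;
        private final LiveDocs liveDocs;
        private int idx = -1;
        int doc = -1;

        Postings(int[] docs, LiveDocs liveDocs) {
            this.docs = docs;
            this.liveDocs = liveDocs;
        }

        /** Moves to the first doc >= target (and live, if filtering); linear scan for simplicity. */
        int advance(int target) {
            while (++idx < docs.length) {
                int d = docs[idx];
                if (d < target) {
                    continue;
                }
                if (liveDocs != null && !liveDocs.isLive(d)) {
                    continue; // deleted docs are applied inside every clause
                }
                return doc = d;
            }
            return doc = NO_MORE_DOCS;
        }
    }

    /** Leapfrog intersection of two postings iterators. */
    static List<Integer> intersect(Postings a, Postings b) {
        List<Integer> matches = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != NO_MORE_DOCS) {
            if (b.doc < doc) {
                b.advance(doc);
            }
            if (b.doc == doc) {
                matches.add(doc);
                doc = a.advance(doc + 1);
            } else {
                doc = a.advance(b.doc); // b is ahead: move a up to it
            }
        }
        return matches;
    }

    /**
     * filterInClauses = true models each clause applying deleted docs;
     * false models a single deleted-docs check on top of the intersection.
     */
    static List<Integer> run(int[] a, int[] b, LiveDocs liveDocs, boolean filterInClauses) {
        LiveDocs perClause = filterInClauses ? liveDocs : null;
        List<Integer> hits = new ArrayList<>();
        for (int doc : intersect(new Postings(a, perClause), new Postings(b, perClause))) {
            if (filterInClauses || liveDocs.isLive(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With postings {1, 2, 4, 6, 8} and {2, 3, 4, 8, 9} and doc 4 deleted, both variants return [2, 8], but the per-clause variant performs 7 checks (docs 2 and 8 are each checked by both clauses) against 3 for the check-on-top variant. The numbers only reflect this toy input, not real Lucene performance.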
However, with the new two-phase iteration API, we have a way to make sure that
we never run expensive operations on deleted documents: we could first iterate
over the approximation, then check that the document is not deleted, and
finally confirm the match. Since approximations are cheap, applying deleted
docs after them would not be an issue.
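A minimal sketch of that ordering, assuming hypothetical simplified types rather than Lucene's TwoPhaseIterator: an int[] plays the approximation, an IntPredicate named confirm plays the expensive matches() step (think reading positions for a phrase), and another IntPredicate stands in for the deleted-docs Bits. The counter shows that confirmation never runs on a deleted candidate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical, simplified stand-ins for two-phase iteration; not Lucene's TwoPhaseIterator.
final class TwoPhaseDeletesSketch {

    static final class TwoPhase {
        private final int[] approximation; // cheap: sorted candidate doc IDs
        private final IntPredicate confirm; // expensive: e.g. reading positions for a phrase
        int matchesCalls = 0;

        TwoPhase(int[] approximation, IntPredicate confirm) {
            this.approximation = approximation;
            this.confirm = confirm;
        }

        boolean matches(int doc) {
            matchesCalls++;
            return confirm.test(doc);
        }
    }

    /** Iterate the approximation, drop deleted docs, and only then confirm the match. */
    static List<Integer> collect(TwoPhase twoPhase, IntPredicate isLive) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : twoPhase.approximation) {
            if (!isLive.test(doc)) {
                continue; // deleted: never reaches matches()
            }
            if (twoPhase.matches(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With candidates {1, 3, 5, 7}, docs 3 and 7 deleted, and confirm rejecting doc 1, collect returns [5] and matches() is called only twice, for docs 1 and 5.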
I would like to explore removing the "Bits acceptDocs" parameter from
TermsEnum.postings, Weight.scorer, SpanWeight.getSpans and Weight.bulkScorer,
and adding it to BulkScorer.score. This way, bulk scorers would be the only API
that needs to know how to apply deleted docs, which I think would be more
manageable since we only have 3 or 4 impls. DefaultBulkScorer would then be
implemented as described above: first advance the approximation, then check
deleted docs, then confirm the match, then collect. Of course, that only
applies if the scorer supports approximations; if it does not, it is cheap, so
we can directly iterate the scorer and check deleted docs on top.
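The proposed API shape could look roughly like the sketch below. Everything here is a hypothetical simplification, not Lucene's real signatures: IntPredicate stands in for Bits acceptDocs (with null assumed to mean "no deleted docs"), IntConsumer stands in for the collector, and the hypothetical Scorer.twoPhase field is null when the scorer exposes no approximation. The point is that acceptDocs appears only in BulkScorer.score, and DefaultBulkScorer covers both cases with one loop: check deleted docs first, then confirm if there is a two-phase step, then collect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

// Hypothetical sketch of the proposed API shape; names and types are simplified
// stand-ins, not Lucene's actual classes or signatures.
final class ProposedBulkScorerSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal doc ID iterator over a sorted array. */
    static final class DocIterator {
        private final int[] docs;
        private int idx = -1;

        DocIterator(int[] docs) {
            this.docs = docs;
        }

        int nextDoc() {
            return ++idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** A scorer that no longer receives acceptDocs. */
    static final class Scorer {
        final DocIterator iterator;  // the approximation if twoPhase != null, exact matches otherwise
        final IntPredicate twoPhase; // confirms a candidate; null if no approximation is exposed

        Scorer(DocIterator iterator, IntPredicate twoPhase) {
            this.iterator = iterator;
            this.twoPhase = twoPhase;
        }
    }

    /** acceptDocs moves here: the only API that applies deleted docs. */
    interface BulkScorer {
        void score(IntConsumer collector, IntPredicate acceptDocs);
    }

    static final class DefaultBulkScorer implements BulkScorer {
        private final Scorer scorer;

        DefaultBulkScorer(Scorer scorer) {
            this.scorer = scorer;
        }

        @Override
        public void score(IntConsumer collector, IntPredicate acceptDocs) {
            DocIterator it = scorer.iterator;
            for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
                // 1. Cheap: skip deleted docs before any expensive work.
                if (acceptDocs != null && !acceptDocs.test(doc)) {
                    continue;
                }
                // 2. Expensive, only when an approximation is used: confirm the match.
                if (scorer.twoPhase != null && !scorer.twoPhase.test(doc)) {
                    continue;
                }
                // 3. Collect.
                collector.accept(doc);
            }
        }
    }

    /** Convenience helper: runs DefaultBulkScorer and returns the collected docs. */
    static List<Integer> collectAll(Scorer scorer, IntPredicate acceptDocs) {
        List<Integer> hits = new ArrayList<>();
        new DefaultBulkScorer(scorer).score(doc -> hits.add(doc), acceptDocs);
        return hits;
    }
}
```

Because the deleted-docs check sits before the confirmation step, the expensive check is skipped for deleted candidates, while a scorer without an approximation simply gets deleted docs checked on top of its own iteration.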