[ 
https://issues.apache.org/jira/browse/LUCENE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601528#comment-14601528
 ] 

ASF subversion and git services commented on LUCENE-6553:
---------------------------------------------------------

Commit 1687580 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1687580 ]

LUCENE-6553: Fix how DrillSidewaysScorer handles deleted docs.

> Simplify how we handle deleted docs in read APIs
> ------------------------------------------------
>
>                 Key: LUCENE-6553
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6553
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.3
>
>         Attachments: LUCENE-6553.patch
>
>
> Today, all scorers and postings formats need to be able to handle deleted 
> documents.
> I suspect that the reason is that we want to avoid performing costly 
> operations on documents that are deleted. For instance, if you run a phrase 
> query, reading positions on a deleted document is useless. I suspect this is 
> also a source of inefficiencies, since in some cases we apply deleted 
> documents several times: for instance, conjunctions apply deleted docs to 
> every sub scorer.
> However, with the new two-phase iteration API, we have a way to make sure 
> that we never run expensive operations on deleted documents: we could first 
> iterate over the approximation, then check that the document is not deleted, 
> and finally confirm the match. Since approximations are cheap, applying 
> deleted docs after them would not be an issue.
> I would like to explore removing the "Bits acceptDocs" parameter from 
> TermsEnum.postings, Weight.scorer, SpanWeight.getSpans and Weight.bulkScorer, 
> and adding it to BulkScorer.score. This way, bulk scorers would be the only 
> API that needs to know how to apply deleted docs, which I think would be 
> more manageable since we only have 3 or 4 impls. DefaultBulkScorer would 
> be implemented the way described above: first advance the approximation, then 
> check deleted docs, then confirm the match, then collect. Of course, that 
> applies only when the scorer supports approximations; if it does not, the 
> scorer is cheap, so we can iterate it directly and check deleted docs on 
> top.
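
The iteration order proposed for DefaultBulkScorer (advance the approximation,
check deleted docs, confirm the match, collect) can be sketched as below. This
is only an illustration and does not use the Lucene API: the TwoPhase
interface, the ArrayTwoPhase helper and the score method are hypothetical
stand-ins, and treating a null liveDocs as "no deletions" is an assumption of
this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a scorer that exposes a cheap approximation. */
    interface TwoPhase {
        int nextApproximation(); // next candidate doc ID, or NO_MORE_DOCS
        boolean matches();       // costly confirmation of the current candidate
    }

    /** Approximate, then check deletions, then confirm, then collect. */
    static List<Integer> score(TwoPhase scorer, IntPredicate liveDocs) {
        List<Integer> collected = new ArrayList<>();
        for (int doc = scorer.nextApproximation();
             doc != NO_MORE_DOCS;
             doc = scorer.nextApproximation()) {
            // Assumption: a null liveDocs means the segment has no deletions.
            if (liveDocs != null && !liveDocs.test(doc)) {
                continue; // deleted: skipped before the costly confirmation
            }
            if (scorer.matches()) {
                collected.add(doc);
            }
        }
        return collected;
    }

    /** Hypothetical scorer over a sorted candidate array, for demonstration. */
    static final class ArrayTwoPhase implements TwoPhase {
        private final int[] candidates;
        private final IntPredicate confirm;
        private int pos = -1;
        int confirmations = 0; // how many times the costly check actually ran

        ArrayTwoPhase(int[] candidates, IntPredicate confirm) {
            this.candidates = candidates;
            this.confirm = confirm;
        }

        @Override
        public int nextApproximation() {
            return ++pos < candidates.length ? candidates[pos] : NO_MORE_DOCS;
        }

        @Override
        public boolean matches() {
            confirmations++;
            return confirm.test(candidates[pos]);
        }
    }

    public static void main(String[] args) {
        // Candidates 1, 3, 4, 7, 9; doc 4 fails confirmation; docs 3 and 7 are deleted.
        ArrayTwoPhase scorer = new ArrayTwoPhase(new int[] {1, 3, 4, 7, 9}, doc -> doc != 4);
        List<Integer> hits = score(scorer, doc -> doc != 3 && doc != 7);
        System.out.println(hits);                 // prints [1, 9]
        System.out.println(scorer.confirmations); // prints 3
    }
}
```

In this sketch the costly check runs three times for five candidates, because
the two deleted documents are filtered out between the approximation and the
confirmation.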



