[ 
https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005828#comment-14005828
 ] 

Michael McCandless commented on LUCENE-5693:
--------------------------------------------

bq. I really wonder if this issue matters. 

I suspect it's uncommon for docs to be "born deleted".  But it does 
happen, and it seems silly to waste IO/CPU if we can help it.

bq. I think we should just stick with the corner case and not complicate the 
code if possible?

The patch really does not complicate the code.  It adds a check against the 
liveDocs in the Docs/AndPositionsEnum passed to the codec during flush.  The 
only "complexity" was fixing a test that made the invalid assumption that 
deleted docs must be present in postings.
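
As a rough illustration of that kind of check (this is not the attached patch, 
and it does not use Lucene's classes): the sketch below wraps a plain iterator 
of doc IDs and skips any ID whose bit is clear in a live-docs bit set, so the 
consumer never sees "born deleted" docs.  The names LiveDocsFilterSketch, 
LiveDocsIterator and docsSeenByCodec are hypothetical, and java.util.BitSet 
stands in for the liveDocs computed on flush.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these types are Lucene classes. */
public class LiveDocsFilterSketch {

    /** Wraps a doc-ID iterator and skips IDs whose live bit is clear. */
    static final class LiveDocsIterator implements Iterator<Integer> {
        private final Iterator<Integer> in;
        private final BitSet liveDocs; // null is treated as "no deletions"
        private Integer next;          // next live doc ID, or null when exhausted

        LiveDocsIterator(Iterator<Integer> in, BitSet liveDocs) {
            this.in = in;
            this.liveDocs = liveDocs;
            advance();
        }

        /** Moves to the next live doc ID, dropping deleted ones. */
        private void advance() {
            next = null;
            while (in.hasNext()) {
                int doc = in.next();
                if (liveDocs == null || liveDocs.get(doc)) {
                    next = doc;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Integer next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            int doc = next;
            advance();
            return doc;
        }
    }

    /** Collects the doc IDs a consumer would see for one term after filtering. */
    public static List<Integer> docsSeenByCodec(List<Integer> postings, BitSet liveDocs) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = new LiveDocsIterator(postings.iterator(), liveDocs);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(5);
        live.set(0, 5); // docs 0..4 start out live
        live.clear(2);  // doc 2 was "born deleted"
        System.out.println(docsSeenByCodec(Arrays.asList(0, 2, 4), live)); // prints [0, 4]
    }
}
```

The filtering lives entirely in the wrapper, so in this sketch the consumer 
side needs no changes; whether the real patch is structured the same way is 
an assumption.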

I guess what bothers me here is this apparent precedent that deleted docs are 
in fact required to be present everywhere in a segment.  Yes, this is the case 
today, but I think it's an impl detail and should not be required, e.g. 
enforced by CheckIndex or by tests asserting that it's the case.

But I'll resolve this as WONTFIX ... looks like I'm just outvoted.

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5693.patch
>
>
> When we flush a new segment, sometimes some documents are "born deleted", 
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed 
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully) 
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not 
> writing "born deleted" docs won't be much of a win for most apps.  Still it 
> seems silly to write them, consuming IO/CPU in the process, only to consume 
> more IO/CPU later for merging to re-delete them.



