[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004906#comment-14004906 ]

Michael McCandless commented on LUCENE-5693:
--------------------------------------------

bq. This only makes sense for postings though.

Right, postings are much easier than doc values.  But postings are also the most
costly to merge.
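
One way to see why postings are the easy case: a term's postings list only the documents that contain that term, so a born-deleted doc can simply be left out at flush time while every docID stays unchanged. The following is a minimal Java sketch under that assumption. The class and method names (BornDeletedFilter, filterLiveDocs) are hypothetical, java.util.BitSet stands in for the segment's liveDocs, and this is not Lucene's actual flush code.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch (not Lucene's real flush path): drop "born deleted"
 * documents from one term's postings before handing them to the codec.
 */
public class BornDeletedFilter {

    /**
     * Returns only the docIDs whose bit is set in liveDocs.
     * DocIDs are NOT renumbered: the segment's maxDoc still counts the
     * deleted docs, they just never show up in the postings.
     */
    static int[] filterLiveDocs(int[] postings, BitSet liveDocs) {
        int[] out = new int[postings.length];
        int upto = 0;
        for (int docID : postings) {
            if (liveDocs.get(docID)) {
                out[upto++] = docID;
            }
        }
        return Arrays.copyOf(out, upto);
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc); // all buffered docs start out live
        liveDocs.clear(2);       // e.g. matched by IW.deleteDocuments before flush
        liveDocs.clear(4);

        int[] postings = {0, 2, 3, 4, 5}; // docs containing some term
        System.out.println(Arrays.toString(filterLiveDocs(postings, liveDocs)));
    }
}
```

Running main prints [0, 3, 5]. Because the docIDs are kept as-is, nothing has to be remapped; a per-document structure such as doc values would still have to account for the deleted slots (for example by renumbering or by sparse encoding), which is part of why postings are simpler.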

bq. By writing them some places and not writing them other places, we open the 
possibility of extremely confusing corner cases and bugs.

I disagree: I think we'd discover the places that are "relying" on deleted docs
behavior, i.e. test bugs.  When I did this on LUCENE-5675 there were only a few
places that relied on deleted docs.

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>
> When we flush a new segment, sometimes some documents are "born deleted",
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully) 
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect the number of born-deleted docs is typically 0, or small, so not
> writing them won't be much of a win for most apps.  Still it
> seems silly to write them, consuming IO/CPU in the process, only to consume 
> more IO/CPU later for merging to re-delete them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

