[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

Marvin Humphrey (JIRA) Wed, 07 Jan 2009 18:39:08 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661829#action_12661829
 ]


Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Jason Rutherglen:

> For realtime search where a new transaction may only have a handful of 
> deletes the tombstones may not be optimal 

The whole tombstone idea arose out of the need for (close to) realtime search!  
It's intended to improve write speed.

When you make deletes with the BitSet model, you have to rewrite files whose 
size scales with segment size, regardless of how few deletions you make. 
Deleting a single document in a large segment may necessitate writing out a 
substantial bit vector file. 

In contrast, the I/O cost of writing out a tombstone file scales with the 
number of tombstones.
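
To make the scaling argument concrete, here is a rough back-of-the-envelope 
sketch. It is not Lucene or Lucy code: the class name, the method names, and 
the assumption that each tombstone is stored as an uncompressed 4-byte doc id 
are all hypothetical, chosen only for illustration.

```java
// Hypothetical cost model for illustration only; not Lucene or Lucy code.
final class DeletionWriteCost {

    // A bit vector stores one bit per document in the segment, so its size
    // depends on segment size (maxDoc), not on how many docs were deleted.
    static long bitVectorBytes(int maxDoc) {
        return (maxDoc + 7L) / 8L; // round up to whole bytes
    }

    // Assumption: each tombstone is written as an uncompressed 4-byte doc id,
    // so the file grows only with the number of deletions.
    static long tombstoneBytes(int numDeletes) {
        return 4L * numDeletes;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000; // a large segment
        int numDeletes = 1;      // a single deletion in this commit
        System.out.println("bit vector: " + bitVectorBytes(maxDoc) + " bytes");
        System.out.println("tombstones: " + tombstoneBytes(numDeletes) + " bytes");
    }
}
```

Under these assumptions, one deletion in a ten-million-document segment means 
rewriting about 1.25 MB of bit vector versus writing a few bytes of 
tombstone data. A real format would add headers and might compress either 
representation, so treat the numbers as orders of magnitude only.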

> because too many tombstones would accumulate (I believe).

Say that you make a string of commits, each of which does nothing but delete a 
single document, thus adding a new segment each time that contains nothing but 
a single tombstone.  Those are going to be cheap to merge, so it seems unlikely 
that we'll end up with an unwieldy number of tombstone streams to interleave at 
search-time.
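
As a minimal sketch of why such merges are cheap, the following collapses 
several small tombstone streams into one. It assumes, purely for illustration, 
that each stream is an ascending array of deleted doc ids and that all streams 
target the same segment; TombstoneMerge and its merge method are hypothetical 
names, not part of Lucene or Lucy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Lucene or Lucy code.
final class TombstoneMerge {

    // K-way merge of ascending doc id arrays into one ascending array,
    // dropping duplicate doc ids along the way.
    static int[] merge(List<int[]> streams) {
        // Queue entries are {docId, streamIndex, positionInStream},
        // ordered by docId so the smallest pending id comes out first.
        PriorityQueue<int[]> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < streams.size(); i++) {
            if (streams.get(i).length > 0) {
                queue.add(new int[] {streams.get(i)[0], i, 0});
            }
        }

        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            // Skip a doc id that an earlier stream already deleted.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != head[0]) {
                merged.add(head[0]);
            }
            // Advance the stream the head came from.
            int[] stream = streams.get(head[1]);
            int next = head[2] + 1;
            if (next < stream.length) {
                queue.add(new int[] {stream[next], head[1], next});
            }
        }
        return merged.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Three commits, each deleting one or two documents.
        List<int[]> streams = Arrays.asList(
            new int[] {3, 9}, new int[] {1}, new int[] {3, 7});
        System.out.println(Arrays.toString(merge(streams)));
    }
}
```

The work done is proportional to the total number of tombstones (times a 
logarithmic factor in the number of streams), not to the size of the segment 
they apply to. The same interleaving could be done lazily at search-time 
instead of materializing the merged array.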

The more likely problem is the one McCandless articulated regarding a large 
segment accumulating a lot of tombstone streams against it.  But I agree with 
him that it only gets truly serious if your merge policy neglects such segments 
and allows them to deteriorate for too long.

> For this scenario rolling bitsets may be better. Meaning pool bit sets and 
> throw away unused readers. 

I don't think I understand.  Is this the "combination index reader/writer" 
model, where the writer prepares a data structure that then gets handed off to 
the reader?

> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet.  This is for making 
> SegmentReader.deletedDocs pluggable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


