Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

robert engels Wed, 07 Jan 2009 19:28:53 -0800

Why not just write the first byte as 0 for a bitset, and 1 for a sparse bit set (compressed), and make the determination when writing based on the segment size and/or number of set bits?
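A minimal sketch of that idea in Java, under stated assumptions: the class name, the method names, the plain list of 4-byte doc ids used as the "sparse" form, and the size-based threshold are all hypothetical and are not taken from Lucene or from the attached patch. It only illustrates tagging the file with a leading format byte and picking the encoding at write time.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;

class DeletionsFormat {
    static final byte DENSE = 0;   // first byte 0: raw bit set, one bit per document
    static final byte SPARSE = 1;  // first byte 1: list of deleted doc ids (assumed encoding)

    // Chooses the encoding when writing, based on segment size (maxDoc) and
    // the number of set bits. Assumes every set bit is below maxDoc.
    static byte[] write(BitSet deleted, int maxDoc) {
        int count = deleted.cardinality();
        int denseBytes = (maxDoc + 7) / 8;
        long sparseBytes = 4L * count;
        if (sparseBytes < denseBytes) {
            ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 * count);
            buf.put(SPARSE).putInt(count);
            for (int doc = deleted.nextSetBit(0); doc >= 0; doc = deleted.nextSetBit(doc + 1)) {
                buf.putInt(doc);
            }
            return buf.array();
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + denseBytes);
        buf.put(DENSE).putInt(maxDoc);
        // toByteArray() drops trailing zero bytes, so pad to the full segment width.
        buf.put(Arrays.copyOf(deleted.toByteArray(), denseBytes));
        return buf.array();
    }

    // Reads either encoding back by dispatching on the leading format byte.
    static BitSet read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte tag = buf.get();
        if (tag == SPARSE) {
            int count = buf.getInt();
            BitSet deleted = new BitSet();
            for (int i = 0; i < count; i++) {
                deleted.set(buf.getInt());
            }
            return deleted;
        }
        if (tag == DENSE) {
            int maxDoc = buf.getInt();
            byte[] raw = new byte[(maxDoc + 7) / 8];
            buf.get(raw);
            return BitSet.valueOf(raw);
        }
        throw new IllegalArgumentException("unknown format tag: " + tag);
    }
}
```

With this threshold, a single deletion in a million-document segment is written in the sparse form, while a heavily deleted small segment falls back to the dense form.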


On Jan 7, 2009, at 8:38 PM, Marvin Humphrey (JIRA) wrote:

[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661829#action_12661829 ]
Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Jason Rutherglen:
For realtime search where a new transaction may only have a handful of
deletes the tombstones may not be optimal
The whole tombstone idea arose out of the need for (close to) realtime search! It's intended to improve write speed.
When you make deletes with the BitSet model, you have to rewrite files that scale with segment size, regardless of how few deletions you make. Deletion of a single document in a large segment may necessitate writing out a substantial bit vector file.
In contrast, i/o throughput for writing out a tombstone file scales with the number of tombstones.
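To put rough numbers on that comparison, here is a small Java sketch. The one-bit-per-document figure follows from the bit vector model described above; the 4 bytes per tombstone is purely an assumption for illustration, since the thread does not specify a tombstone file format, and the class and method names are hypothetical.

```java
class DeletionWriteCost {
    // Bytes rewritten for a bit vector file: one bit per document in the segment,
    // no matter how few documents were deleted.
    static long bitVectorBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Bytes written for a tombstone file, assuming (hypothetically) one
    // 4-byte doc id per tombstone.
    static long tombstoneBytes(long numDeletes) {
        return 4 * numDeletes;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;  // a large segment
        long numDeletes = 1;        // a single deletion
        System.out.println("bit vector bytes: " + bitVectorBytes(maxDoc));
        System.out.println("tombstone bytes:  " + tombstoneBytes(numDeletes));
    }
}
```

Under these assumptions, one deletion against a ten-million-document segment means rewriting 1,250,000 bytes of bit vector versus writing 4 bytes of tombstone data.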
because too many tombstones would accumulate (I believe).
Say that you make a string of commits that are nothing but deleting a single document -- thus adding a new segment each time that contains nothing but a single tombstone. Those are going to be cheap to merge, so it seems unlikely that we'll end up with an unwieldy number of tombstone streams to interleave at search-time.
The more likely problem is the one McCandless articulated regarding a large segment accumulating a lot of tombstone streams against it. But I agree with him that it only gets truly serious if your merge policy neglects such segments and allows them to deteriorate for too long.
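As a rough illustration of what interleaving tombstone streams at search-time could involve, the Java sketch below merges several sorted streams of deleted doc ids into one. Representing each stream as a sorted int[] and the names TombstoneMerger and interleave are assumptions made for this example only; they are not from Lucene or from the proposal under discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class TombstoneMerger {
    // Merges sorted doc-id streams into one sorted list without duplicates.
    // Each step costs O(log k) for k streams, so the work grows with both the
    // number of tombstones and the number of streams piled up against a segment.
    static List<Integer> interleave(List<int[]> streams) {
        // Queue entries are {docId, streamIndex, positionInStream}, ordered by docId.
        PriorityQueue<int[]> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < streams.size(); s++) {
            if (streams.get(s).length > 0) {
                queue.add(new int[] {streams.get(s)[0], s, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            // Skip a doc id that another stream already contributed.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != head[0]) {
                merged.add(head[0]);
            }
            // Advance the stream the smallest doc id came from.
            int[] stream = streams.get(head[1]);
            int next = head[2] + 1;
            if (next < stream.length) {
                queue.add(new int[] {stream[next], head[1], next});
            }
        }
        return merged;
    }
}
```

Merging the streams down to one, as a merge policy would, removes the per-stream overhead from every subsequent search.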
For this scenario rolling bitsets may be better. Meaning pool bitsets and
throw away unused readers.
I don't think I understand. Is this the "combination index reader/writer" model, where the writer prepares a data structure that then gets handed off to the reader?
BitVector implement DocIdSet
----------------------------

                Key: LUCENE-1476
URL: https://issues.apache.org/jira/browse/LUCENE-1476
            Project: Lucene - Java
         Issue Type: Improvement
         Components: Index
   Affects Versions: 2.4
           Reporter: Jason Rutherglen
           Priority: Trivial
        Attachments: LUCENE-1476.patch

  Original Estimate: 12h
 Remaining Estimate: 12h
BitVector can implement DocIdSet. This is for making SegmentReader.deletedDocs pluggable.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



