Tombstone deletions in IndexReader
----------------------------------

                 Key: LUCENE-1526
                 URL: https://issues.apache.org/jira/browse/LUCENE-1526
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.4
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: 2.9


SegmentReader currently uses a BitVector to represent deleted docs.
When performing rapid clone (see LUCENE-1314) and delete operations,
performing a copy on write of the BitVector can become costly because
the entire underlying byte array must be created and copied. One way
to make this clone-and-delete process faster is to implement
tombstones, a term coined by Marvin Humphrey. Tombstones represent new deletions
plus the incremental deletions from previously reopened readers in
the current reader. 

The proposed implementation of tombstones is to accumulate deletions
into an int array represented as a DocIdSet. With LUCENE-1476,
SegmentTermDocs iterates over deleted docs using a DocIdSet rather
than accessing the BitVector by calling get. This allows a BitVector
and a set of tombstones to be ORed together as the current reader's
deleted docs.
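
As a rough illustration of the idea above, the following sketch keeps a
shared base bit set plus a small sorted int array of tombstones, so that
a clone followed by a delete copies only the tombstone array. It is only
a sketch under stated assumptions: TombstoneDeletes, withDelete,
isDeleted and nextDeleted are hypothetical names, java.util.BitSet
stands in for Lucene's BitVector, and the sketch treats a document as
deleted if either source marks it.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; this is not Lucene's SegmentReader or BitVector API.
final class TombstoneDeletes {
    private final BitSet base;      // stand-in for the shared BitVector, never mutated here
    private final int[] tombstones; // sorted doc IDs deleted since the base was built

    TombstoneDeletes(BitSet base, int[] tombstones) {
        this.base = base;
        this.tombstones = tombstones;
    }

    // Returns a new view with one more deletion. Only the small tombstone
    // array is copied; the base bit set is shared between the two views.
    TombstoneDeletes withDelete(int docId) {
        if (isDeleted(docId)) {
            return this;
        }
        // Not present, so binarySearch returns -(insertionPoint) - 1.
        int pos = -(Arrays.binarySearch(tombstones, docId) + 1);
        int[] next = new int[tombstones.length + 1];
        System.arraycopy(tombstones, 0, next, 0, pos);
        next[pos] = docId;
        System.arraycopy(tombstones, pos, next, pos + 1, tombstones.length - pos);
        return new TombstoneDeletes(base, next);
    }

    // A document is deleted if either the base or the tombstones mark it.
    boolean isDeleted(int docId) {
        return base.get(docId) || Arrays.binarySearch(tombstones, docId) >= 0;
    }

    // Merged iteration over both sorted sources: returns the smallest
    // deleted doc ID >= target (target must be >= 0), or -1 if none.
    int nextDeleted(int target) {
        int fromBase = base.nextSetBit(target);
        int idx = Arrays.binarySearch(tombstones, target);
        if (idx < 0) {
            idx = -(idx + 1);
        }
        int fromTomb = idx < tombstones.length ? tombstones[idx] : -1;
        if (fromBase < 0) {
            return fromTomb;
        }
        if (fromTomb < 0) {
            return fromBase;
        }
        return Math.min(fromBase, fromTomb);
    }
}
```

A caller would walk the deleted docs by calling nextDeleted(0) and then
nextDeleted(previous + 1) until it returns -1, which is roughly the
access pattern an iterator-based SegmentTermDocs would need.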

A tombstone merge policy needs to be defined to determine when to
merge tombstone DocIdSets into a new deleted docs BitVector as too
many tombstones would eventually be detrimental to performance. A
probable implementation will merge tombstones based on the number of
tombstones and the total number of documents in the tombstones. The
merge policy may be set in the clone/reopen methods or on the
IndexReader. 
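
One possible shape for such a policy is sketched below. It triggers a
merge when either the number of tombstone sets or the total number of
documents they hold passes a threshold. The class name
TombstoneMergePolicy, both thresholds, and the use of java.util.BitSet
in place of BitVector are assumptions made for illustration; the issue
does not specify any of them.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical policy; names and thresholds are assumptions, not from the issue.
final class TombstoneMergePolicy {
    private final int maxSets;        // merge once more than this many tombstone sets exist
    private final double maxDocRatio; // or once tombstoned docs exceed this fraction of maxDoc

    TombstoneMergePolicy(int maxSets, double maxDocRatio) {
        this.maxSets = maxSets;
        this.maxDocRatio = maxDocRatio;
    }

    // Decides based on the number of tombstone sets and the total
    // number of documents contained in them.
    boolean shouldMerge(List<int[]> tombstoneSets, int maxDoc) {
        long total = 0;
        for (int[] set : tombstoneSets) {
            total += set.length;
        }
        return tombstoneSets.size() > maxSets || total > maxDocRatio * maxDoc;
    }

    // Folds all tombstones into a fresh copy of the base deleted-docs bits.
    // The base itself is left untouched so older readers can keep using it.
    static BitSet merge(BitSet base, List<int[]> tombstoneSets) {
        BitSet merged = (BitSet) base.clone();
        for (int[] set : tombstoneSets) {
            for (int docId : set) {
                merged.set(docId);
            }
        }
        return merged;
    }
}
```

With this shape, a clone or reopen call could consult shouldMerge and,
when it returns true, pay the full copy cost once instead of carrying a
growing list of tombstone sets.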

