[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

Michael McCandless (JIRA) Tue, 10 Nov 2009 02:46:56 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775371#action_12775371
 ]


Michael McCandless commented on LUCENE-1526:
--------------------------------------------

Thanks for running these tests, John.

The micro-benchmark of BitVector vs IntAccelerator is nice, but we
need to see it in the real-world context of running actual worst-case
queries.

Zoie aims for super fast reopen time, at the expense of slower query
time, since it must double-check the deletions.

Lucene NRT makes the opposite tradeoff.

The tests so far make it clear that Zoie's reopen time is much faster
than Lucene's, but they don't yet measure (as far as I can see) what
cost the double-check for deletions is adding to Zoie for the
worst-case queries.

So if you really need to reopen 100s of times per second, and can
accept that your worst-case queries will run slower (we're still not
sure just how much slower), the Zoie approach is best.

If you want full speed query performance, and can instead reopen once
per second or once every 2 seconds, Lucene's approach will be better
(though we still have important fixes to make -- LUCENE-2047,
LUCENE-1313).

Can you describe the setup of the "indexing only" test?  Are you doing
any reopening at all?


> For near real-time search, use paged copy-on-write BitVector impl
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1526
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader. 
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to be ANDed together as the current reader's
> deleted docs. 
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader. 
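
The tombstone idea in the description above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
attached patch and not Lucene's API: java.util.BitSet stands in for
Lucene's BitVector, and the class name TombstoneDeletes, its methods,
and the maxTombstones threshold are all hypothetical. The sketch treats
a doc as deleted if either the base bit set or the tombstones mark it,
and it uses a plain count threshold as the merge policy, which is
simpler than the policy the description proposes.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of tombstone-style deletes; not Lucene's API. */
final class TombstoneDeletes {
    private final BitSet base;       // shared across clones, never mutated
    private final int[] tombstones;  // sorted doc IDs deleted since base was built
    private final int maxTombstones; // assumed merge threshold (hypothetical)

    private TombstoneDeletes(BitSet base, int[] tombstones, int maxTombstones) {
        this.base = base;
        this.tombstones = tombstones;
        this.maxTombstones = maxTombstones;
    }

    /** Starts with no deletions at all. */
    static TombstoneDeletes empty(int maxTombstones) {
        return new TombstoneDeletes(new BitSet(), new int[0], maxTombstones);
    }

    /** A doc is deleted if the base bits or the tombstones say so. */
    boolean isDeleted(int doc) {
        return base.get(doc) || Arrays.binarySearch(tombstones, doc) >= 0;
    }

    /** Returns a new instance; this one is left unchanged for older readers. */
    TombstoneDeletes delete(int doc) {
        if (isDeleted(doc)) {
            return this;
        }
        // Not found, so binarySearch returns -(insertionPoint) - 1.
        int pos = -(Arrays.binarySearch(tombstones, doc) + 1);
        int[] next = new int[tombstones.length + 1];
        System.arraycopy(tombstones, 0, next, 0, pos);
        next[pos] = doc;
        System.arraycopy(tombstones, pos, next, pos + 1, tombstones.length - pos);

        if (next.length > maxTombstones) {
            // Too many tombstones: pay for one full copy and fold them in.
            BitSet merged = (BitSet) base.clone();
            for (int d : next) {
                merged.set(d);
            }
            return new TombstoneDeletes(merged, new int[0], maxTombstones);
        }
        // Cheap path: only the small int array is copied, not the bit set.
        return new TombstoneDeletes(base, next, maxTombstones);
    }

    int tombstoneCount() {
        return tombstones.length;
    }
}
```

Each delete copies only the small tombstone array until the threshold
is crossed, at which point the full bit set is copied once. The price
is an extra binary search on every isDeleted call, which is the kind of
per-query cost that would need to be measured on worst-case queries.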

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


