[ http://issues.apache.org/jira/browse/LUCENE-528?page=comments#action_12443911 ]

Ning Li commented on LUCENE-528:
--------------------------------

> I think you need to ensure that no segments from the source index "S" remain 
> after the call, right?

Correct. And thanks!

So in step 4, when the invariants already hold for the last < M segments
whose levels are <= h, any of those segments that still come from S (because
they were not merged in step 3) must be properly copied over to T.
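
To make the copy-over concrete, here is a rough sketch in plain Java. It does not use Lucene classes: Segment, copyLeftovers and anyFromSource are hypothetical names invented for illustration, and the "copy" is only modeled by creating a new entry owned by T. It shows just the post-condition from the comment above, that no entry from S is left after the call.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyLeftoverSegments {
    // Hypothetical stand-in for a segment entry; not Lucene's SegmentInfo.
    static final class Segment {
        final String name;
        final String dir; // "S" = source index, "T" = target index
        Segment(String name, String dir) { this.name = name; this.dir = dir; }
    }

    // Replace every entry that still lives in S with a copy owned by T.
    // The file copy itself is only modeled by a new entry with dir "T".
    static List<Segment> copyLeftovers(List<Segment> infos) {
        List<Segment> result = new ArrayList<>();
        for (Segment seg : infos) {
            if ("S".equals(seg.dir)) {
                result.add(new Segment(seg.name + "_copied", "T"));
            } else {
                result.add(seg);
            }
        }
        return result;
    }

    // True if at least one entry still refers to the source index S.
    static boolean anyFromSource(List<Segment> infos) {
        for (Segment seg : infos) {
            if ("S".equals(seg.dir)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Segment> infos = new ArrayList<>();
        infos.add(new Segment("t0", "T"));
        infos.add(new Segment("merged0", "T"));
        infos.add(new Segment("s3", "S")); // left unmerged by step 3
        List<Segment> after = copyLeftovers(infos);
        System.out.println(anyFromSource(infos) + " -> " + anyFromSource(after));
    }
}
```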

Does the algorithm look good?

This made me notice a bug in the current addIndexes(Directory[]). There, the
segment infos in S are added to T's "segmentInfos" upfront. The segments in S
are then merged into T several at a time, and every merge is committed with
T's "segmentInfos". So if a reader is opened on T while addIndexes() is in
progress, it could see an inconsistent index.
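
A toy simulation of that sequence, in plain Java rather than against Lucene itself. Segment, upfrontAdd and referencesSource are made-up names, and treating "inconsistent" as "a committed list still names segments that live in S" is my reading of the problem, not something taken from the IndexWriter code.

```java
import java.util.ArrayList;
import java.util.List;

public class AddIndexesVisibility {
    // Hypothetical stand-in for a segment entry; not Lucene's SegmentInfo.
    static final class Segment {
        final String name;
        final String dir; // "S" = source index, "T" = target index
        Segment(String name, String dir) { this.name = name; this.dir = dir; }
    }

    // Models the upfront-add behavior: S's entries join T's list first,
    // then are merged mergeFactor at a time, with a commit after each merge.
    // Returns the list a reader would see at each commit.
    static List<List<Segment>> upfrontAdd(List<Segment> t, List<Segment> s, int mergeFactor) {
        List<List<Segment>> commits = new ArrayList<>();
        List<Segment> infos = new ArrayList<>(t);
        infos.addAll(s); // S's segment infos are added before any merge
        int start = t.size();
        int merged = 0;
        while (start < infos.size()) {
            int end = Math.min(start + mergeFactor, infos.size());
            infos.subList(start, end).clear();               // drop the merged inputs
            infos.add(start, new Segment("merged" + merged++, "T"));
            commits.add(new ArrayList<>(infos));             // commit after every merge
            start++;
        }
        return commits;
    }

    // True if a committed list still names a segment living in S.
    static boolean referencesSource(List<Segment> snapshot) {
        for (Segment seg : snapshot) {
            if ("S".equals(seg.dir)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Segment> t = new ArrayList<>();
        t.add(new Segment("t0", "T"));
        List<Segment> s = new ArrayList<>();
        for (int i = 0; i < 4; i++) s.add(new Segment("s" + i, "S"));
        for (List<Segment> commit : upfrontAdd(t, s, 2)) {
            System.out.println(commit.size() + " segments, names S: " + referencesSource(commit));
        }
    }
}
```

In this model, every commit except the last one exposes a list that still points at segments in S, which is the window in which a reader opened on T could go wrong.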

> Optimization for IndexWriter.addIndexes()
> -----------------------------------------
>
>                 Key: LUCENE-528
>                 URL: http://issues.apache.org/jira/browse/LUCENE-528
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Tamm
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>         Attachments: AddIndexes.patch
>
>
> One big performance problem with IndexWriter.addIndexes() is that it has to 
> optimize the index both before and after adding the segments.  When you have 
> a very large index, to which you are adding batches of small updates, these 
> calls to optimize make using addIndexes() impossible.  It makes parallel 
> updates very frustrating.
> Here is an optimized function that helps out by calling mergeSegments only on 
> the newly added documents.  It will try to avoid calling mergeSegments until 
> the end, unless you're adding a lot of documents at once.
> I also have an extensive unit test that verifies that this function works 
> correctly if people are interested.  I gave it a different name because it 
> has very different performance characteristics which can make querying take 
> longer.


        
