[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

Shai Erera (JIRA) Fri, 14 May 2010 21:14:11 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867793#action_12867793
 ]


Shai Erera commented on LUCENE-2455:
------------------------------------

In fact, I've create newAddIndexes (just for the review) which works like that:
{code}
  public void newAddIndexes(IndexReader... readers) throws 
CorruptIndexException, IOException {

    ensureOpen();

    try {
      String mergedName = newSegmentName();
      SegmentMerger merger = new SegmentMerger(this, mergedName, null);
      
      for (IndexReader reader : readers)      // add new indexes
        merger.add(reader);
      
      int docCount = merger.merge();                // merge 'em
      
      SegmentInfo info = null;
      synchronized(this) {
        info = new SegmentInfo(mergedName, docCount, directory, false, true,
            -1, null, false, merger.hasProx());
        setDiagnostics(info, "addIndexes(IndexReader...)");
        segmentInfos.add(info);
      }
      
      // Notify DocumentsWriter that the flushed count just increased
      docWriter.updateFlushedDocCount(docCount);
      
      // Now create the compound file if needed
      if (mergePolicy instanceof LogMergePolicy && getUseCompoundFile()) {

        List<String> files = null;

        synchronized(this) {
          // Must incRef our files so that if another thread
          // is running merge/optimize, it doesn't delete our
          // segment's files before we have a chance to
          // finish making the compound file.
          if (segmentInfos.contains(info)) {
            files = info.files();
            deleter.incRef(files);
          }
        }

        if (files != null) {
          try {
            merger.createCompoundFile(mergedName + ".cfs");
            synchronized(this) {
              info.setUseCompoundFile(true);
            }
          } finally {
            deleter.decRef(files);
          }
        }
      }
    } catch (OutOfMemoryError oom) {
      handleOOM(oom, "addIndexes(IndexReader...)");
    } finally {
      if (docWriter != null) {
        docWriter.resumeAllThreads();
      }
    }
  }
{code}

Question: if we've just added the new SI to segmentInfos, why do we sync on 
_this_ and check if it exists (when we create the compound file)? Is it because 
there could be a running merge which will merge it into a new segment before we 
reach that point?

What do you think? Is that what you had in mind about merging on the side and 
committing in the end?

> Some house cleaning in addIndexes*
> ----------------------------------
>
>                 Key: LUCENE-2455
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2455
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially on when to invoke each. Also, addIndexes calls optimize() in 
> the beginning, but only on the target index. It also includes the 
> following jdoc statement, which from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called in the beginning and not in the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   before on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addDirectories is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) does not use neither the MergeScheduler 
> nor MergePolicy, but rather call SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

Reply via email to