[jira] Commented: (LUCENE-1076) Allow MergePolicy to select non-contiguous merges

Shai Erera (JIRA) Mon, 27 Jul 2009 20:48:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735905#action_12735905
 ]


Shai Erera commented on LUCENE-1076:
------------------------------------

Can someone please help me understand what's going on here? After I applied the 
patch to trunk, TestIndexWriter.testOptimizeMaxNumSegments2() fails. The 
failure happens only with CMS (ConcurrentMergeScheduler), not with SMS 
(SerialMergeScheduler). I dug deeper into the test: it calls 
optimize(maxNumSegments) and expects that either (1) if the number of segments 
was < maxNumSegments, then the resulting number of segments is exactly what it 
was before, or (2) otherwise, it is exactly maxNumSegments.

First, the javadocs of optimize(maxNumSegments) say that it will result in <= 
maxNumSegments segments, but as I understand it, LogMergePolicy ensures that if 
you ask for maxNumSegments, that's exactly the number of segments you'll get.
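
To make the two expectations concrete, here is a minimal sketch of the strict 
count the test asserts next to the looser bound the javadocs state. The class 
and method names are hypothetical, made up for illustration, and nothing here 
calls into Lucene.

```java
// Hypothetical helper for illustration only; not part of Lucene or its tests.
class SegmentCountExpectation {

    // What the test asserts: an index with fewer segments than the limit is
    // left as it was, otherwise it ends up with exactly maxNumSegments.
    static int expectedByTest(int segCount, int maxNumSegments) {
        return Math.min(segCount, maxNumSegments);
    }

    // What the javadocs state: the result has at most maxNumSegments segments.
    static boolean allowedByJavadocs(int optSegCount, int maxNumSegments) {
        return optSegCount <= maxNumSegments;
    }

    public static void main(String[] args) {
        int maxNumSegments = 3;
        for (int segCount : new int[] {1, 2, 3, 7}) {
            System.out.println(segCount + " segments -> test expects "
                + expectedByTest(segCount, maxNumSegments));
        }
    }
}
```

Any count the test accepts also satisfies the javadocs bound, but not the other 
way around: ending with 2 segments when starting from 7 would be within the 
javadocs bound, yet the test would reject it.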

While trying to debug what's wrong w/ the change so far, I managed to reduce 
the test to this code:

{code}
public void test1() throws Exception {
    MockRAMDirectory dir = new MockRAMDirectory();

    final Document doc = new Document();
    doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED));

    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true,
        IndexWriter.MaxFieldLength.LIMITED);
//    writer.setMergeScheduler(new SerialMergeScheduler());
    LogDocMergePolicy ldmp = new LogDocMergePolicy();
    ldmp.setMinMergeDocs(1);
    writer.setMergePolicy(ldmp);
    writer.setMergeFactor(3);
    writer.setMaxBufferedDocs(2);

    MergeScheduler ms = writer.getMergeScheduler();
//  writer.setInfoStream(System.out);
    
    // Add enough documents to create several segments (uncommitted) and kick off
    // some merge threads.
    for (int i = 0; i < 20; i++) {
      writer.addDocument(doc);
    }
    writer.commit();
    
    if (ms instanceof ConcurrentMergeScheduler) {
      // Wait for all merges to complete
      ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
    }
    
    SegmentInfos sis = new SegmentInfos();
    sis.read(dir);
    
    System.out.println("numSegments after add + commit ==> " + sis.size());
    
    final int segCount = sis.size();
    
    int maxNumSegments = 3;
    writer.optimize(maxNumSegments);
    writer.commit();
    
    if (ms instanceof ConcurrentMergeScheduler) {
      // Wait for all merges to complete
      ((ConcurrentMergeScheduler) writer.getMergeScheduler()).sync();
    }
    
    sis = new SegmentInfos();
    sis.read(dir);
    final int optSegCount = sis.size();
    
    System.out.println("numSegments after optimize (" + maxNumSegments
        + ") + commit ==> " + sis.size());
    
    if (segCount < maxNumSegments)
      Assert.assertEquals(segCount, optSegCount);
    else
      Assert.assertEquals(maxNumSegments, optSegCount);
}
{code}

This fails almost every time I run it, but not always, so if you try it, make 
sure to run it a couple of times. I then switched to trunk, and it fails almost 
as consistently there as well!?

Can someone please have a look and tell me what's wrong? Is it the test, or did 
I hit a true bug in the code?

> Allow MergePolicy to select non-contiguous merges
> -------------------------------------------------
>
>                 Key: LUCENE-1076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1076
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1076.patch
>
>
> I started work on this but with LUCENE-1044 I won't make much progress
> on it for a while, so I want to checkpoint my current state/patch.
> For backwards compatibility we must leave the default MergePolicy as
> selecting contiguous merges.  This is necessary because some
> applications rely on "temporal monotonicity" of doc IDs, which means
> even though merges can re-number documents, the renumbering will
> always reflect the order in which the documents were added to the
> index.
> Still, for those apps that do not rely on this, we should offer a
> MergePolicy that is free to select the best merges regardless of
> whether they are contiguous.  This requires fixing IndexWriter to
> accept such a merge, and, fixing LogMergePolicy to optionally allow
> it the freedom to do so.
