In my experiments with mergeFactor I did find a point of diminishing
(or no) returns.  If I remember correctly, raising mergeFactor beyond
50 no longer made indexing any faster.
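
One way to get a feel for why the returns flatten out is to count how
many documents get rewritten by merges as mergeFactor grows.  The
sketch below is only a toy model, not Lucene code: it assumes every
added document starts as a one-document segment and that the trailing
segments are merged whenever together they reach the current target
size (minMergeDocs first, then multiplied by mergeFactor per level).
The names MergeModel and docsRewritten are made up for illustration,
and real I/O costs are ignored, so treat the numbers as a trend at
best.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeModel {

  /**
   * Toy model: returns how many documents are rewritten by merges while
   * adding numDocs documents one at a time. Assumed policy, not Lucene code.
   */
  public static long docsRewritten(int numDocs, int mergeFactor,
                                   int minMergeDocs, int maxMergeDocs) {
    if (mergeFactor < 2 || minMergeDocs < 1) {
      throw new IllegalArgumentException("mergeFactor >= 2, minMergeDocs >= 1");
    }
    List<Integer> segments = new ArrayList<Integer>(); // doc count per segment
    long rewritten = 0;
    for (int i = 0; i < numDocs; i++) {
      segments.add(1);                       // each new doc is its own segment
      long target = minMergeDocs;            // long, so target *= factor cannot overflow int
      while (target <= maxMergeDocs) {
        // collect trailing segments that are smaller than the target size
        int first = segments.size();
        long mergeDocs = 0;
        while (first > 0 && segments.get(first - 1) < target) {
          first--;
          mergeDocs += segments.get(first);
        }
        if (mergeDocs < target) {
          break;                             // not enough docs to merge yet
        }
        // replace the trailing segments with one merged segment
        segments.subList(first, segments.size()).clear();
        segments.add((int) mergeDocs);
        rewritten += mergeDocs;
        target *= mergeFactor;               // move up one level
      }
    }
    return rewritten;
  }

  public static void main(String[] args) {
    int numDocs = 30000;
    int[] mergeFactors = {10, 50, 100, 1000};
    for (int i = 0; i < mergeFactors.length; i++) {
      System.out.println("mergeFactor " + mergeFactors[i] + ": "
          + docsRewritten(numDocs, mergeFactors[i], 10, Integer.MAX_VALUE)
          + " docs rewritten");
    }
  }
}
```

In this model a larger mergeFactor means fewer merge levels, so each
document is rewritten fewer times; once the number of levels stops
shrinking, there is little left to gain.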

Here is something from Lucene in Action that you can use to play with
the various index tuning factors and see their effect on indexing
performance.  It's simple, so you will have to modify it if you want
to test all 3 of your scenarios.

package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

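/**
 * Indexes a number of small documents and reports how long it took,
 * so that different tuning settings can be compared.
 *
 * Usage: java lia.indexing.IndexTuningDemo
 *          <docsInIndex> <mergeFactor> <maxMergeDocs> <minMergeDocs>
 */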
public class IndexTuningDemo {

  public static void main(String[] args) throws Exception {
    int docsInIndex  = Integer.parseInt(args[0]);

    // create an index called 'index-dir' in a temp directory
    Directory dir = FSDirectory.getDirectory(
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir", true);
    Analyzer analyzer = new SimpleAnalyzer();
    IndexWriter writer = new IndexWriter(dir, analyzer, true);

    // set variables that affect speed of indexing
    writer.mergeFactor   = Integer.parseInt(args[1]);
    writer.maxMergeDocs  = Integer.parseInt(args[2]);
    writer.minMergeDocs  = Integer.parseInt(args[3]);
    writer.infoStream    = System.out;

    System.out.println("Merge factor:   " + writer.mergeFactor);
    System.out.println("Max merge docs: " + writer.maxMergeDocs);
    System.out.println("Min merge docs: " + writer.minMergeDocs);

    long start = System.currentTimeMillis();
    for (int i = 0; i < docsInIndex; i++) {
      Document doc = new Document();
      doc.add(Field.Text("fieldname", "Bibamus"));
      writer.addDocument(doc);
    }
    writer.close();
    long stop = System.currentTimeMillis();
    System.out.println("Time: " + (stop - start) + " ms");
  }
}
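
If you want to compare several settings in one go rather than
rerunning the class by hand, a small timing loop along these lines
might help.  It is only a sketch: TuningSweep and IndexingRun are
hypothetical names, and the body of the run is a placeholder where you
would build the index with the given mergeFactor, as in the demo
above.

```java
public class TuningSweep {

  /** Hypothetical hook: performs one complete indexing run. */
  public interface IndexingRun {
    void run(int mergeFactor) throws Exception;
  }

  /** Times one run per mergeFactor; returns elapsed milliseconds, in order. */
  public static long[] timeRuns(int[] mergeFactors, IndexingRun run)
      throws Exception {
    long[] elapsedMs = new long[mergeFactors.length];
    for (int i = 0; i < mergeFactors.length; i++) {
      long start = System.currentTimeMillis();
      run.run(mergeFactors[i]);
      elapsedMs[i] = System.currentTimeMillis() - start;
    }
    return elapsedMs;
  }

  public static void main(String[] args) throws Exception {
    int[] mergeFactors = {10, 20, 50, 100};
    long[] elapsedMs = timeRuns(mergeFactors, new IndexingRun() {
      public void run(int mergeFactor) {
        // Placeholder: create the IndexWriter, set its mergeFactor,
        // add the documents and close the writer, as in the demo.
      }
    });
    for (int i = 0; i < mergeFactors.length; i++) {
      System.out.println("mergeFactor " + mergeFactors[i] + ": "
          + elapsedMs[i] + " ms");
    }
  }
}
```

Each run should start from a fresh index directory, otherwise later
runs are not measuring the same work as the earlier ones.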


Otis


--- Chuck Williams <[EMAIL PROTECTED]> wrote:

> I'm wondering what values of mergeFactor, minMergeDocs and
> maxMergeDocs people have found to yield the best performance for
> different configurations.  Is there a repository of this information
> anywhere?
>
> I've got about 30k documents and have 3 indexing scenarios:
>
> 1.  Full indexing and optimize
> 2.  Incremental indexing and optimize
> 3.  Parallel incremental indexing without optimize
>
> Search performance is critical.  For both cases 1 and 2, I'd like the
> fastest possible indexing time.  For case 3, I'd like minimal pauses
> and no noticeable degradation in search performance.
>
> Based on reading the code (including the javadocs comments), I'm
> thinking of values along these lines:
>
> mergeFactor:  1000 during full indexing, and during optimize (for
> both cases 1 and 2); 10 during incremental indexing (cases 2 and 3)
>
> minMergeDocs:  1000 during full indexing, 10 during incremental
> indexing
>
> maxMergeDocs:  Integer.MAX_VALUE during full indexing, 1000 during
> incremental indexing
>
> Do these values seem reasonable?  Are there better settings before I
> start experimenting?
>
> Since mergeFactor is used in both addDocument() and optimize(), I'm
> thinking of using two different values in case 2:  10 during the
> incremental indexing, and then 1000 during the optimize.  Is changing
> the value like this going to cause a problem?
>
> Thanks for any advice,
>
> Chuck

