On Tue, Jul 06, 2004 at 10:44:40PM -0700, Kevin A. Burton wrote:
> I'm trying to burn an index of 14M documents.
>
> I have two problems.
>
> 1. I have to run optimize() every 50k documents or I run out of file
> handles. This takes TIME, and of course it is linear in the size of
> the index, so it just gets slower the further I get. It starts to
> crawl at about 3M documents.
Recently I indexed roughly this many documents. I first split the whole
job into 100 pieces (we happen to have that many machines in the
cluster :-), each indexing its share into its own index. I used
mergeFactor=100 and optimized only once, just before closing each
index. Then I merged them all into one index simply with:

  writer.mergeFactor = 150;
  writer.addIndexes(dirs);

I was surprised myself that it went through easily, in under two hours
for each of the 101 indexes. The documents, however, have only three
fields.

Maybe this helps,
Harald.

--
Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593
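
P.S. In code, the two phases look roughly like the sketch below. It
assumes the Lucene 1.4-era API (mergeFactor is still a public field
there, and addIndexes(Directory[]) optimizes as part of the merge);
the class name, paths, and the "contents" field are made up purely
for illustration.

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedIndexing {

      // Phase 1: each job writes its own shard. optimize() runs
      // exactly once, just before close(), not every 50k documents.
      public static void indexShard(String shardPath, String[] texts)
          throws IOException {
        IndexWriter writer =
            new IndexWriter(shardPath, new StandardAnalyzer(), true);
        writer.mergeFactor = 100;  // fewer merge pauses while adding
        for (int i = 0; i < texts.length; i++) {
          Document doc = new Document();
          doc.add(Field.Text("contents", texts[i]));
          writer.addDocument(doc);
        }
        writer.optimize();         // once, at the very end
        writer.close();
      }

      // Phase 2: merge the finished shards into a single index.
      public static void mergeShards(String targetPath,
                                     String[] shardPaths)
          throws IOException {
        Directory[] dirs = new Directory[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
          dirs[i] = FSDirectory.getDirectory(shardPaths[i], false);
        }
        IndexWriter writer =
            new IndexWriter(targetPath, new StandardAnalyzer(), true);
        writer.mergeFactor = 150;
        writer.addIndexes(dirs);   // merges and optimizes the shards
        writer.close();
      }
    }

Since every shard is already optimized, each contributes a single
segment, so with mergeFactor=150 the final addIndexes() can fold all
of them into the target index in one pass.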