Hi there, Im about to embark on a Lucene project of massive scale (between 500 million and 2 billion documents). I am currently working on parallellizing the construction of the Index(es).
Rough summary of my plan: I have many, many physical machines, each with multiple processors that I wish to dedicate to the construction of a single index. I plan on having each machine gather its documents from a central sychronized source (network, JMS, whatever). Within each machine I will have multiple threads each responsible for construcing an index slice. When all machines and all threads are finished, I should have a slew of index slices that I want to combine together to create one index. My question is this: Will it be more efficient to call addIndexes(Directory[] dirs) on all the slices all at once? Or might it be better to continually merge small indexes into a larger index, i.e. once an index slice reaches a particular size, merge it into the main index and start building a new slice... Any help would be appreciated.. Ryan Aslett --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]