I _think_ you'd be better off doing it all at once, but I wouldn't trust myself on this and would instead construct a small 3-index set and test, looking at a) maximal disk usage, b) time, and c) RAM usage. :)
Otis --- Ryan Aslett <[EMAIL PROTECTED]> wrote: > > Hi there, Im about to embark on a Lucene project of massive scale > (between 500 million and 2 billion documents). I am currently > working > on parallellizing the construction of the Index(es). > > Rough summary of my plan: > I have many, many physical machines, each with multiple processors > that > I wish to dedicate to the construction of a single index. > I plan on having each machine gather its documents from a central > sychronized source (network, JMS, whatever). > Within each machine I will have multiple threads each responsible for > construcing an index slice. > > When all machines and all threads are finished, I should have a slew > of > index slices that I want to combine together to create one index. > > My question is this: Will it be more efficient to call > addIndexes(Directory[] dirs) on all the slices all at once? > > Or might it be better to continually merge small indexes into a > larger > index, i.e. once an index slice reaches a particular size, merge it > into > the main index and start building a new slice... > > Any help would be appreciated.. > > Ryan Aslett > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]