Hi there, I'm about to embark on a Lucene project of massive scale
(between 500 million and 2 billion documents). I am currently working
on parallelizing the construction of the index(es).

Rough summary of my plan:
I have many, many physical machines, each with multiple processors that
I wish to dedicate to the construction of a single index. 
I plan on having each machine gather its documents from a central
synchronized source (network, JMS, whatever).
Within each machine I will have multiple threads, each responsible for
constructing an index slice.
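
A rough sketch of the per-machine part of this plan, using only the
Java standard library. Everything named here is an assumption made for
illustration: an in-memory BlockingQueue stands in for the central
synchronized source, a List<String> stands in for each index slice
(in a real build each worker would own a Lucene writer on its own
directory), and the END marker is a hypothetical "no more documents"
signal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

class SliceBuilderSketch {
    // Hypothetical marker telling a worker that no more documents will arrive.
    static final String END = "__END__";

    // Each worker thread drains the shared source into its own private slice,
    // so no two threads ever write to the same slice.
    static List<List<String>> buildSlices(List<String> docs, int threads) {
        // Stand-in for the central synchronized source (network, JMS, ...).
        BlockingQueue<String> source = new LinkedBlockingQueue<>(docs);
        for (int i = 0; i < threads; i++) {
            source.add(END); // one end marker per worker
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    // Stand-in for one index slice owned by this thread.
                    List<String> slice = new ArrayList<>();
                    for (String doc = source.take(); !doc.equals(END); doc = source.take()) {
                        slice.add(doc);
                    }
                    return slice;
                }));
            }
            // Wait for every worker and collect the finished slices.
            List<List<String>> slices = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                slices.add(f.get());
            }
            return slices;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("slice construction failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        List<List<String>> slices = buildSlices(docs, 4);
        int total = 0;
        for (List<String> slice : slices) {
            total += slice.size();
        }
        System.out.println(slices.size() + " slices, " + total + " documents in total");
    }
}
```

How evenly the documents spread across slices depends on thread
scheduling; the sketch only guarantees that every document lands in
exactly one slice.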

When all machines and all threads are finished, I should have a slew of
index slices that I want to combine together to create one index.

My question is this:  Will it be more efficient to call
addIndexes(Directory[] dirs) on all the slices all at once? 

Or might it be better to continually merge small indexes into a larger
index, i.e. once an index slice reaches a particular size, merge it into
the main index and start building a new slice...
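
To make the trade-off concrete, here is a back-of-the-envelope cost
model rather than a measurement. It counts how many documents get
written under each strategy, under the loudly stated assumption that
every merge into the main index rewrites the whole main index. Whether
that assumption holds depends on the Lucene version and merge settings,
and the slice count and slice size below are made-up numbers.

```java
class MergeCostSketch {
    // One-shot merge: every slice is copied into the final index exactly once.
    static long oneShotCost(long slices, long docsPerSlice) {
        return slices * docsPerSlice;
    }

    // Incremental merge, ASSUMING each merge rewrites the entire main index:
    // after the k-th slice is merged the main index holds k * docsPerSlice
    // documents, and all of them were written during that merge.
    static long incrementalCost(long slices, long docsPerSlice) {
        long total = 0;
        for (long k = 1; k <= slices; k++) {
            total += k * docsPerSlice;
        }
        return total;
    }

    public static void main(String[] args) {
        long slices = 100;               // hypothetical number of slices
        long docsPerSlice = 10_000_000L; // hypothetical slice size (1 billion docs total)
        System.out.println("one-shot documents written:    " + oneShotCost(slices, docsPerSlice));
        System.out.println("incremental documents written: " + incrementalCost(slices, docsPerSlice));
    }
}
```

Under that assumption the incremental approach grows quadratically with
the number of slices, while the one-shot merge stays linear. The model
ignores that incremental merging can overlap with slice construction.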

Any help would be appreciated.

Ryan Aslett

