I _think_ you'd be better off doing it all at once, but I wouldn't
trust myself on this and would instead construct a small 3-index set
and test, looking at a) maximal disk usage, b) time, and c) RAM usage.
:)
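
For what it's worth, here is a rough sketch of the kind of harness I mean for such a test. It is plain Java with no Lucene calls in it; the class name MergeTrial, the measure() method, and the polling interval are all made up for illustration. You would pass in a Runnable that performs whichever merge strategy you are testing (for example, one that wraps the addIndexes(Directory[] dirs) call you mention). Heap usage is only sampled, so treat the RAM number as an approximation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

/** Hypothetical helper: times a merge step and samples disk and heap usage while it runs. */
public class MergeTrial {

    /** Measurements from one trial run. */
    public static final class Result {
        public final long elapsedMillis;
        public final long peakDiskBytes;
        public final long peakHeapBytes;

        Result(long elapsedMillis, long peakDiskBytes, long peakHeapBytes) {
            this.elapsedMillis = elapsedMillis;
            this.peakDiskBytes = peakDiskBytes;
            this.peakHeapBytes = peakHeapBytes;
        }
    }

    /** Total size in bytes of all regular files under dir (0 if it cannot be read). */
    static long diskUsage(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    return 0L; // file was deleted while we were looking at it
                }
            }).sum();
        } catch (IOException | UncheckedIOException e) {
            return 0L; // directory changed underneath us; skip this sample
        }
    }

    /** Heap currently in use by this JVM, in bytes. */
    static long heapUsed() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs mergeStep once, polling indexDir and the heap every pollMillis
     * milliseconds on a background thread to record the observed peaks.
     */
    public static Result measure(Path indexDir, Runnable mergeStep, long pollMillis)
            throws InterruptedException {
        AtomicLong peakDisk = new AtomicLong(diskUsage(indexDir));
        AtomicLong peakHeap = new AtomicLong(heapUsed());

        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                peakDisk.accumulateAndGet(diskUsage(indexDir), Math::max);
                peakHeap.accumulateAndGet(heapUsed(), Math::max);
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    return; // asked to stop
                }
            }
        });
        sampler.setDaemon(true);
        sampler.start();

        long start = System.nanoTime();
        long elapsedNanos;
        try {
            mergeStep.run();
        } finally {
            elapsedNanos = System.nanoTime() - start;
            sampler.interrupt();
            sampler.join();
        }

        // One last sample so the final on-disk state is always counted.
        peakDisk.accumulateAndGet(diskUsage(indexDir), Math::max);
        peakHeap.accumulateAndGet(heapUsed(), Math::max);
        return new Result(elapsedNanos / 1_000_000, peakDisk.get(), peakHeap.get());
    }
}
```

Run it once per strategy against the same small set of slices and compare the three numbers; a short polling interval gives a better chance of catching the temporary disk spike during a merge.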

Otis

--- Ryan Aslett <[EMAIL PROTECTED]> wrote:

>  
> Hi there, I'm about to embark on a Lucene project of massive scale
> (between 500 million and 2 billion documents). I am currently working
> on parallelizing the construction of the index(es).
> 
> Rough summary of my plan:
> I have many, many physical machines, each with multiple processors,
> that I wish to dedicate to the construction of a single index.
> I plan on having each machine gather its documents from a central
> synchronized source (network, JMS, whatever).
> Within each machine I will have multiple threads, each responsible
> for constructing an index slice.
> 
> When all machines and all threads are finished, I should have a slew
> of index slices that I want to combine together to create one index.
> 
> My question is this:  Will it be more efficient to call
> addIndexes(Directory[] dirs) on all the slices all at once? 
> 
> Or might it be better to continually merge small indexes into a
> larger index, i.e. once an index slice reaches a particular size,
> merge it into the main index and start building a new slice...
> 
> Any help would be appreciated.
> 
> Ryan Aslett
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


