I think you should change your plans a little, and consider that your goal is to
create a fast search engine, not a fast indexing engine.
If you index a lot of documents, you are likely to create a lot of segments (if you don't optimize the index),
and searching will be very slow compared with searching an optimized index.
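
To make the segment argument concrete, here is a toy sketch in plain Java. It is not Lucene code: it assumes a "segment" can be modelled as a sorted term-to-document-count map, and the class and method names (SegmentLookupSketch, search, optimize) are made up for illustration. It only shows that a lookup has to consult every segment's dictionary, while a single merged segment is consulted once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: a "segment" is a sorted term -> document-count map. */
final class SegmentLookupSketch {

    /** Returns {matching documents, term dictionaries probed} for one term. */
    static int[] search(List<TreeMap<String, Integer>> segments, String term) {
        int hits = 0;
        int probes = 0;
        for (TreeMap<String, Integer> segment : segments) {
            probes++; // every segment has its own dictionary to consult
            hits += segment.getOrDefault(term, 0);
        }
        return new int[] {hits, probes};
    }

    /** Stand-in for optimizing: fold all segments into a single one. */
    static List<TreeMap<String, Integer>> optimize(List<TreeMap<String, Integer>> segments) {
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (TreeMap<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<TreeMap<String, Integer>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    public static void main(String[] args) {
        List<TreeMap<String, Integer>> segments = new ArrayList<>();
        for (int i = 0; i < 50; i++) { // 50 hypothetical unmerged segments
            TreeMap<String, Integer> segment = new TreeMap<>();
            segment.put("lucene", 2);
            segments.add(segment);
        }
        int[] before = search(segments, "lucene");
        int[] after = search(optimize(segments), "lucene");
        System.out.println("unoptimized: hits=" + before[0] + " probes=" + before[1]);
        System.out.println("optimized:   hits=" + after[0] + " probes=" + after[1]);
    }
}
```

In this model the same hits are found either way, but the unoptimized case probes 50 dictionaries and the optimized case probes one. A real index adds file handles and disk seeks per segment, which this sketch does not try to measure.
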
The problem is that optimizing big indexes is a time-consuming operation, and I think
addIndexes(Directory[] dirs) is also a time-consuming operation.

Therefore I suggest that you first think about how to design the indices for fast search, and then design an offline indexing process.

That is my suggestion ... maybe it doesn't fit your requirements, maybe it does ...

 All the best,

 Sergiu


Ryan Aslett wrote:


Hi there, I'm about to embark on a Lucene project of massive scale
(between 500 million and 2 billion documents). I am currently working
on parallelizing the construction of the index(es).


Rough summary of my plan:
I have many, many physical machines, each with multiple processors, that
I wish to dedicate to the construction of a single index. I plan on having each machine gather its documents from a central
synchronized source (network, JMS, whatever). Within each machine I will have multiple threads, each responsible for
constructing an index slice.


When all machines and all threads are finished, I should have a slew of
index slices that I want to combine together to create one index.

My question is this: Will it be more efficient to call
addIndexes(Directory[] dirs) on all the slices all at once?


Or might it be better to continually merge small indexes into a larger
index, i.e. once an index slice reaches a particular size, merge it into
the main index and start building a new slice...
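
As a rough way to compare the two options, the sketch below counts how many documents each strategy would write. It is a toy cost model in plain Java, not a measurement of Lucene: it assumes merge cost is proportional to the number of documents rewritten, and that folding a slice into the main index rewrites the whole main index each time. The class and method names (MergeCostSketch, allAtOnce, mergeEachSliceIntoMain) are hypothetical.

```java
import java.util.Arrays;

/** Toy cost model: counts how many documents each merge strategy rewrites. */
final class MergeCostSketch {

    /** One merge of all slices at the end: every document is written once. */
    static long allAtOnce(long[] sliceSizes) {
        return Arrays.stream(sliceSizes).sum();
    }

    /**
     * Fold each finished slice into the main index right away, assuming
     * (pessimistically) that every merge rewrites the whole main index.
     */
    static long mergeEachSliceIntoMain(long[] sliceSizes) {
        long mainSize = 0;
        long written = 0;
        for (long slice : sliceSizes) {
            mainSize += slice;   // main index now holds this slice too
            written += mainSize; // and is rewritten in full
        }
        return written;
    }

    public static void main(String[] args) {
        long[] slices = new long[100];
        Arrays.fill(slices, 1_000_000L); // 100 hypothetical slices of 1M docs
        System.out.println("all at once:    " + allAtOnce(slices));
        System.out.println("slice by slice: " + mergeEachSliceIntoMain(slices));
    }
}
```

Under these assumptions, 100 slices of one million documents cost 100,000,000 document writes when merged once at the end, and 5,050,000,000 when each slice is merged into the main index as it completes. If merges do not rewrite the whole main index, the gap is smaller, so the model is an upper bound on the penalty rather than a prediction.
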

Any help would be appreciated..

Ryan Aslett


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]








