Hi Everybody,
We are building a search infrastructure using Lucene that needs to scale up to
500 million documents with search latency under 500 ms.

Here is my rough math on the size of the content & index:
Total documents = 500 million
Size per document = 10 KB
Index size = 2 GB per million documents
Total index size = 500 x 2 GB = ~1000 GB, i.e. roughly 1 TB

We are planning to partition this 1 TB index into 25 partitions, each holding
around 20 million documents at roughly 40 GB.
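
To make the partitioning concrete, the routing I have in mind is roughly the
sketch below. The hash-mod scheme, the "id" field name and the PartitionRouter
class are just illustrative assumptions on my side, not a settled design:

    import org.apache.lucene.document.Document;

    public class PartitionRouter {
        static final int NUM_PARTITIONS = 25; // ~20 million docs / ~40 GB each

        // Route a document to one of the 25 partitions by hashing a stable key;
        // the "id" field is only an example of such a key.
        static int partitionFor(Document doc) {
            String key = doc.get("id");
            return (key.hashCode() & 0x7fffffff) % NUM_PARTITIONS;
        }
    }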

Since 1 TB doesn't seem like that much, we are debating whether we should keep
the whole 1 TB in RAM. We checked RAM prices (64 GB / 8-CPU boxes) and they are
very competitive.

Now the question is: can we use RAMDirectory for all of this 1 TB, or is
FSDirectory better, with a separate spindle for each CPU?
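
For reference, here is a minimal sketch of the two options as I understand
them, assuming the Lucene 2.x APIs (FSDirectory.getDirectory and the
RAMDirectory(Directory) constructor); the DirectoryChoice class is just my own
scratch name:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class DirectoryChoice {
        // Option 1: search straight off disk; the OS file cache keeps hot
        // blocks of the 40 GB partition in memory.
        static IndexSearcher diskSearcher(String indexPath) throws IOException {
            Directory dir = FSDirectory.getDirectory(indexPath);
            return new IndexSearcher(dir);
        }

        // Option 2: copy the on-disk index into the JVM heap at startup; the
        // whole 40 GB partition has to fit inside -Xmx, which is the part we
        // are unsure about.
        static IndexSearcher ramSearcher(String indexPath) throws IOException {
            Directory ram = new RAMDirectory(FSDirectory.getDirectory(indexPath));
            return new IndexSearcher(ram);
        }
    }

In either case the searcher for a box would be opened once at startup and
shared across query threads.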

We are considering 25 boxes (8 CPUs / 64 GB each), one per partition, with
separate brokers to merge the results.
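
As a sketch of what such a broker would do, assuming the per-partition
searchers are exposed as Lucene Searchables (e.g. over RMI via RemoteSearchable
or a custom RPC layer); connectTo(), the "body" field and the example query are
placeholders, not our actual setup:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;

    public class Broker {
        public static void main(String[] args) throws Exception {
            // One Searchable per partition box; connectTo() stands in for
            // however the remote searchers are obtained (RMI, custom RPC, ...).
            Searchable[] partitions = new Searchable[25];
            for (int i = 0; i < partitions.length; i++) {
                partitions[i] = connectTo("partition" + i);
            }

            // ParallelMultiSearcher fans the query out to all partitions
            // concurrently and merges the hits into a single ranked list.
            ParallelMultiSearcher broker = new ParallelMultiSearcher(partitions);
            Query query = new QueryParser("body", new StandardAnalyzer())
                    .parse("lucene scaling");
            Hits hits = broker.search(query);
            System.out.println("total hits: " + hits.length());
        }

        private static Searchable connectTo(String name) {
            throw new UnsupportedOperationException(
                    "placeholder for remote lookup of " + name);
        }
    }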

Has anybody done something like this in the past? We would appreciate it if
you could share your experiences.

thanks
Murali V 