I am trying to index a huge set of documents in batches. The batch size is
a parameter of the application, say X docs; that is, the application holds
X docs in RAM before I flush them to the file system using the
IndexWriter.addIndexes(Directory[]) method.
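
To make the setup concrete, here is a rough sketch of the batching logic only.
It is plain Java with no Lucene calls: BatchBuffer and its method names are
made up for illustration, and the flush callback stands in for the step where
my application hands the in-memory batch to IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not a Lucene class: collects items in memory and
// passes each full batch of X items to a flush callback.
final class BatchBuffer<T> {
    private final int batchSize;               // the "X" from the description
    private final Consumer<List<T>> flusher;   // stand-in for the write-to-disk step
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffer one item; flush automatically once X items are held.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Hand a copy of the buffered items to the callback and empty the buffer.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        BatchBuffer<String> buffer = new BatchBuffer<>(
                3, batch -> System.out.println("flushing " + batch.size() + " docs"));
        for (int i = 0; i < 7; i++) {
            buffer.add("doc" + i);
        }
        buffer.flush(); // write out the final, partial batch
    }
}
```

The final explicit flush() matters: without it, the last partial batch would
never reach the disk.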

 

My question is:

Do I need to set mergeFactor? Will the IndexWriter hold up to the default
mergeFactor number of docs in memory before writing them to disk as a segment?

(But my application will call IndexWriter.addIndexes only after X
docs are in memory.)

 

If the indexes are big, at some point there might be an out-of-memory
exception (yes, I could check the available memory before another
RAMDirectory is created). But what would be the best solution? Is
FSDirectory a better option than RAMDirectory for indexing huge amounts of
text? I have roughly 50 GB of full text to index.
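
The memory check I have in mind would look roughly like the sketch below. It
uses only java.lang.Runtime; HeapCheck, its method names, and the 64 MB batch
estimate are assumptions for illustration, and the numbers it reports are only
estimates because the garbage collector can change them at any time.

```java
// Hypothetical helper: estimates how much heap is still available before
// another in-memory batch is started.
final class HeapCheck {
    private HeapCheck() {}

    // Heap in use: what the JVM has claimed minus the free part of it.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // maxMemory() is the most the JVM will try to use (the -Xmx ceiling),
    // so the gap to usedBytes() approximates what can still be allocated.
    static long availableBytes() {
        return Runtime.getRuntime().maxMemory() - usedBytes();
    }

    // True if the estimated batch size fits into the remaining heap.
    static boolean hasRoomFor(long estimatedBatchBytes) {
        return availableBytes() > estimatedBatchBytes;
    }

    public static void main(String[] args) {
        long assumedBatchBytes = 64L * 1024 * 1024; // assumed size of one batch
        System.out.println("available bytes: " + availableBytes());
        System.out.println("room for next batch: " + hasRoomFor(assumedBatchBytes));
    }
}
```

Even with such a check, I would still have to guess the size of a batch in
advance, which is why I am asking whether this is the right approach at all.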

 

 

Thanks in advance.
