I am trying to index a huge documents on batches . Batch size is parameterized to the application say X docs , that means it will hold X no. of
Docs in the RAM before I flush to file system using IndexWriter.addIndexes(Directory[]) method My question is : Do I need to set mergefactor ? , will it hold default mergefactor docs in memory before it is written to disk as segment . (But my application will call indexwriter.addindexes function only after X no of documents are in memory) If the index sizes are big , at some point of time there might be a out of memory exceptions , ( yes I could check a memory before another ramdirectory is being created) But what would be the best solution ? Is FSDirectory is better option than Ramdirectory for huge text indexing ? I have roughly 50 GB of fulltext to index? Thks in advance.