Howdy All,

I am interested in several ways to improve my indexing speed. First, I'd like to find out whether (and how) it is possible to merge Lucene indexes of similarly structured documents (same number and type of fields), or to coordinate several machines updating the same index. For my application (an estimated 360M Lucene documents across 30k physical files), I'd like to parallelize the indexing across as many CPUs as I can and then merge the results back together, or use a MultiSearcher across all the individual indexes if merging is not an option.
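
To make the parallelize-then-merge idea concrete, here is a rough sketch of its shape. It is plain Java and deliberately not Lucene code: each "index" is only a map from term to document ids, and the names ParallelIndexSketch, indexBatch, merge and indexInParallel are made up for illustration. Whether Lucene's own merge behaves like this is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy stand-in for the plan above, NOT Lucene code. Each "index" is a map
// from a term to the sorted set of ids of the documents containing it.
class ParallelIndexSketch {

    // Index one batch of documents on its own; doc ids start at firstDocId.
    static Map<String, SortedSet<Integer>> indexBatch(List<String> docs, int firstDocId) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue; // split can yield an empty leading token
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(firstDocId + i);
            }
        }
        return index;
    }

    // Merge partial indexes by unioning the document-id sets of each term.
    static Map<String, SortedSet<Integer>> merge(List<Map<String, SortedSet<Integer>>> parts) {
        Map<String, SortedSet<Integer>> merged = new TreeMap<>();
        for (Map<String, SortedSet<Integer>> part : parts) {
            for (Map.Entry<String, SortedSet<Integer>> e : part.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    // Index every batch on a worker thread, then merge the partial results.
    static Map<String, SortedSet<Integer>> indexInParallel(List<List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, SortedSet<Integer>>>> futures = new ArrayList<>();
            int nextDocId = 0;
            for (List<String> batch : batches) {
                final int firstDocId = nextDocId; // disjoint id range per batch
                futures.add(pool.submit(() -> indexBatch(batch, firstDocId)));
                nextDocId += batch.size();
            }
            List<Map<String, SortedSet<Integer>>> parts = new ArrayList<>();
            for (Future<Map<String, SortedSet<Integer>>> f : futures) {
                parts.add(f.get());
            }
            return merge(parts);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("lucene index merge", "ram directory"),
                Arrays.asList("merge ram index"));
        System.out.println(indexInParallel(batches, 2));
    }
}
```

The design point is that each worker owns a disjoint range of document ids, so the final merge is a plain union with no coordination between workers while they index.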

Second, I'd like to know more about indexing into a RAMDirectory and flushing those indexes back out to an FSDirectory. In some tests on a Solaris-based machine, my indexing speed went up by a factor of 3 when I pointed my indexing program at a tmpfs (RAM-based) filesystem for its index rather than a physical disk. I would expect a similar speedup with a RAMDirectory, and it would be portable to non-Solaris machines as well. Would it be as simple as getting a list() from the RAMDir, then calling openFile() on each file and writing that stream out to disk?
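
The copy loop asked about here would look roughly like the sketch below. Again this is plain Java rather than Lucene code: a map from file name to bytes stands in for the RAM directory, and RamToDiskSketch, flushToDisk and loadFromDisk are hypothetical names. It only shows the "list the files, write each one out" shape, and says nothing about whether Lucene needs anything more than that.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in, NOT Lucene code: the "RAM directory" is a map from file name
// to file contents, mirroring the list()/openFile() idea in the question.
class RamToDiskSketch {

    // Write every in-memory file into targetDir, one disk file per entry.
    static void flushToDisk(Map<String, byte[]> ramFiles, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            for (Map.Entry<String, byte[]> file : ramFiles.entrySet()) {
                Path target = targetDir.resolve(file.getKey());
                try (OutputStream out = Files.newOutputStream(target)) {
                    out.write(file.getValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the regular files of a directory back into memory, for example
    // to check that the flushed copy is complete.
    static Map<String, byte[]> loadFromDisk(Path dir) {
        Map<String, byte[]> files = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.put(p.getFileName().toString(), Files.readAllBytes(p));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> ram = new TreeMap<>();
        ram.put("segments", new byte[] {1, 2, 3}); // made-up file name and contents
        Path dir = Files.createTempDirectory("ram-flush-");
        flushToDisk(ram, dir);
        System.out.println(loadFromDisk(dir).keySet());
    }
}
```

A real flush would also have to make sure nothing is still writing to the in-memory files while they are being copied.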

Thanks,

Vince Taluskie

