Vincent Le Maout wrote:
I have to index a huge, huge amount of data: about 10 million documents
making up about 300 GB. Is there any technical limitation in Lucene that
could prevent me from processing such an amount (I mean, of course, apart
from the external limits induced by the hardware: RAM, disks, the system,
whatever)?

In theory, Lucene can support up to about 2 billion documents in a single index. Folks have successfully built indexes with several hundred million documents. 10 million should not be a problem.
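
As a rough sanity check on that limit, here is a minimal sketch of the headroom calculation. It assumes the ceiling comes from document numbers being signed 32-bit integers, which is my reading of the "2B" figure rather than something stated above. The names MAX_DOCS_PER_INDEX and headroom are made up for illustration.

```python
# Assumption: the per-index ceiling is the largest signed 32-bit integer.
MAX_DOCS_PER_INDEX = 2**31 - 1


def headroom(doc_count: int, limit: int = MAX_DOCS_PER_INDEX) -> float:
    """Return how many times doc_count fits under the per-index limit."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return limit / doc_count


if __name__ == "__main__":
    planned_docs = 10_000_000  # the collection size from the question
    print(f"{headroom(planned_docs):.0f}x headroom under the limit")
```

By this estimate a 10 million document collection sits more than two orders of magnitude below the theoretical ceiling.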

If possible, does anyone have an idea of the amount of resources
needed: RAM, CPU time, size of indexes, access time on such a collection?
If not, is it possible to extrapolate an estimate from previous benchmarks?

For simple 2-3 term queries over average-sized documents (~10 KB of text), you should get decent performance (about 1 second per query) on a 10M document index. An index typically requires around 35% of the plain text size.
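
Applying the 35% rule of thumb to the numbers in the question gives a ballpark index size. This is only a sketch: the function name estimate_index is made up, decimal units (1 GB = 1,000,000 KB) are assumed, and the ratio will vary with what is stored and indexed.

```python
def estimate_index(total_text_gb: float, doc_count: int,
                   index_ratio: float = 0.35) -> tuple[float, float]:
    """Return (average document size in KB, estimated index size in GB).

    index_ratio is the rule-of-thumb fraction of plain text size quoted
    above; treat it as an assumption, not a measured value.
    """
    avg_doc_kb = total_text_gb * 1_000_000 / doc_count
    index_gb = total_text_gb * index_ratio
    return avg_doc_kb, index_gb


if __name__ == "__main__":
    avg_kb, index_gb = estimate_index(300, 10_000_000)
    print(f"average document: {avg_kb:.0f} KB")
    print(f"estimated index:  {index_gb:.0f} GB")
```

For 300 GB over 10 million documents this works out to roughly 30 KB per document and an index of about 105 GB. Note that 30 KB is larger than the ~10 KB average assumed for the query-time figure, so the 1 second estimate may not carry over unchanged.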


