In practice, you will more than likely have to distribute your index across multiple nodes once you get somewhere in the range of tens of millions of documents, but the exact threshold depends on your hardware, your documents, your throughput needs, and so on.
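If you do split things up, the usual pattern is to route each document to one of N smaller indexes by a stable key. A rough sketch (the shard count, paths, and class name are all made up; the constructors are the Lucene 2.x-era ones):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class ShardedWriter {
        private final IndexWriter[] shards;

        public ShardedWriter(int numShards) throws IOException {
            shards = new IndexWriter[numShards];
            for (int i = 0; i < numShards; i++) {
                // paths are hypothetical; 'true' creates a fresh index
                shards[i] = new IndexWriter("/indexes/shard-" + i,
                                            new StandardAnalyzer(), true);
            }
        }

        // Route by a stable key so the same record always lands in
        // the same shard.
        public void add(Document doc, String key) throws IOException {
            shards[Math.abs(key.hashCode() % shards.length)].addDocument(doc);
        }
    }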

On May 8, 2008, at 2:13 PM, Michael Siu wrote:

The # of documents that we are going to index could potentially be more than 2G. So I guess I have to split the index into multiple indices, each containing up to 2G documents. Any other suggestions?

Thanks.

-----Original Message-----
From: Karl Wettin [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 08, 2008 11:00 AM
To: java-user@lucene.apache.org
Subject: Re: Limit of Lucene

Michael Siu wrote:
What is the limit of Lucene on the # of docs per index?

Integer.MAX_VALUE

Multiple indices joined under a single MultiReader or MultiSearcher are
still limited to that number.
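For example, something like this still bottoms out at ints (a minimal sketch; the paths are made up and the calls are the 2.x-era API):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;

    public class SearchAllParts {
        public static void main(String[] args) throws Exception {
            // hypothetical index paths
            IndexReader a = IndexReader.open("/indexes/part-a");
            IndexReader b = IndexReader.open("/indexes/part-b");

            // One logical view over both parts; document numbers are
            // still ints, so combined.maxDoc() can never exceed
            // Integer.MAX_VALUE.
            IndexReader combined = new MultiReader(new IndexReader[] { a, b });
            IndexSearcher searcher = new IndexSearcher(combined);
            // ... search as usual, then searcher.close();
        }
    }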


In RangeFilter.bits(), for example, a BitSet is initialized to the size of maxDoc from the IndexReader. I wonder what happens if the # of docs is
huge,
say Integer.MAX_VALUE (about 2^31, i.e. roughly 2G)?

ArrayIndexOutOfBoundsException ?

It should not be technically difficult to upgrade the ints to longs, but
it would be a rather large job.
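To make the constraint concrete, here is roughly the kind of allocation RangeFilter.bits() does (a simplified sketch, not the actual implementation):

    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;

    public class FilterSketch {
        // One bit per document, like the Filter.bits() implementations.
        // BitSet's constructor and IndexReader.maxDoc() both use int,
        // so nothing past Integer.MAX_VALUE documents can be addressed.
        public static BitSet bits(IndexReader reader) {
            return new BitSet(reader.maxDoc());
        }
    }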

How many documents do you have? You might want to consider alternative
ways to represent your corpus in the index so that it takes fewer documents.
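For instance, if your 2G+ documents are really tiny records, you could fold several of them into a single Lucene Document by adding the same field multiple times (a rough sketch; the field name and class are made up, constants are the 2.x-era ones):

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class RecordFolding {
        // "record" is a hypothetical multi-valued field; one Document
        // holds many small records.
        public static void addMerged(IndexWriter writer, List<String> records)
                throws IOException {
            Document doc = new Document();
            for (String record : records) {
                doc.add(new Field("record", record,
                                  Field.Store.YES, Field.Index.TOKENIZED));
            }
            writer.addDocument(doc); // one document instead of records.size()
        }
    }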


          karl

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ