In practice, you will more than likely have to distribute your index
across multiple nodes once you get somewhere in the range of tens of
millions of documents, but it all depends on your hardware, documents,
throughput needs, etc.
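As a rough sketch of that kind of setup, assuming Lucene 2.x-era APIs and hypothetical shard paths (exact constructors vary a bit between versions), each node's index can be opened separately and searched as one via MultiSearcher:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TermQuery;

public class ShardedSearch {
  public static void main(String[] args) throws Exception {
    // Hypothetical shard locations; in practice each could live on its own node.
    String[] shardPaths = { "/data/index-0", "/data/index-1", "/data/index-2" };

    Searchable[] searchers = new Searchable[shardPaths.length];
    for (int i = 0; i < shardPaths.length; i++) {
      searchers[i] = new IndexSearcher(shardPaths[i]);
    }

    // MultiSearcher merges the per-shard results into one ranked hit list.
    MultiSearcher searcher = new MultiSearcher(searchers);
    Hits hits = searcher.search(new TermQuery(new Term("body", "lucene")));
    System.out.println("total hits: " + hits.length());
    searcher.close();
  }
}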
On May 8, 2008, at 2:13 PM, Michael Siu wrote:
The # of documents that we are going to index could potentially be more
than 2G. So I guess I have to split the index into multiple indexes,
each containing up to 2G documents. Any other suggestions?
Thanks.
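As a rough sketch of that kind of split, documents could be routed to one of several sub-indexes at write time, for example by hashing an ID field (2.x-era API; field names and paths are hypothetical):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ShardedIndexer {
  private final IndexWriter[] writers;

  public ShardedIndexer(String[] indexPaths) throws Exception {
    writers = new IndexWriter[indexPaths.length];
    for (int i = 0; i < indexPaths.length; i++) {
      // true = create a fresh index at this path
      writers[i] = new IndexWriter(indexPaths[i], new StandardAnalyzer(), true);
    }
  }

  // Route each document to one shard so no single index approaches the int doc limit.
  public void add(String id, String body) throws Exception {
    Document doc = new Document();
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
    int shard = (id.hashCode() & 0x7fffffff) % writers.length;
    writers[shard].addDocument(doc);
  }

  public void close() throws Exception {
    for (int i = 0; i < writers.length; i++) {
      writers[i].close();
    }
  }
}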
-----Original Message-----
From: Karl Wettin [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 08, 2008 11:00 AM
To: java-user@lucene.apache.org
Subject: Re: Limit of Lucene
Michael Siu wrote:
What is the limit of Lucene: # of docs per index?
Integer.MAX_VALUE
Multiple indices joined in a single MultiReader or MultiSearcher are
still limited to that number.
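For example, two indexes opened under one MultiReader still expose a single int-addressed docid space (a small sketch assuming 2.x-era APIs and hypothetical index paths):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

public class CombinedReaderDemo {
  public static void main(String[] args) throws Exception {
    // Open two existing indexes and view them as one.
    IndexReader[] subReaders = new IndexReader[] {
        IndexReader.open("/data/index-0"),
        IndexReader.open("/data/index-1")
    };
    MultiReader reader = new MultiReader(subReaders);

    // maxDoc() is an int, so the combined view is still capped at Integer.MAX_VALUE docs.
    System.out.println("combined maxDoc: " + reader.maxDoc());
    reader.close();
  }
}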
RangeFilter.bits(), for example, initializes a bitset to the size of
maxDoc from the IndexReader. I wonder what happens if the # of docs is
huge, say MaxInt (4G in 32-bit or 2^63 in 64-bit)?
ArrayIndexOutOfBoundsException?
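For context, a Filter in 2.x-era Lucene builds a java.util.BitSet sized to the reader's maxDoc(), roughly along the lines of the sketch below (a simplified illustration of the pattern, not RangeFilter's actual source; the field and term are hypothetical):

import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

// Toy filter that marks every document containing a given term.
// Both BitSet and Lucene docids are indexed with ints, so an index
// cannot address more than Integer.MAX_VALUE documents.
public class TermBitsFilter extends Filter {
  private final Term term;

  public TermBitsFilter(Term term) {
    this.term = term;
  }

  public BitSet bits(IndexReader reader) throws IOException {
    BitSet bits = new BitSet(reader.maxDoc()); // maxDoc() returns an int
    TermDocs td = reader.termDocs(term);
    try {
      while (td.next()) {
        bits.set(td.doc());
      }
    } finally {
      td.close();
    }
    return bits;
  }
}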
It should not be that difficult to upgrade the ints to longs, but it is
a rather large job.

How many documents do you have? You might want to consider alternative
ways to represent your corpus in the index so that it takes fewer documents.
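One way to do that, sketched below under the assumption that the corpus consists of many small records that are always searched and retrieved together, is to pack a whole group of records into a single Document with a multi-valued field (class and field names are hypothetical):

import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class RecordPacker {
  // Instead of one Document per record, group a session's records into one
  // Document by adding the "body" field repeatedly (a multi-valued field),
  // which cuts the total document count.
  public static Document pack(String sessionId, List lines) {
    Document doc = new Document();
    doc.add(new Field("session", sessionId, Field.Store.YES, Field.Index.UN_TOKENIZED));
    for (int i = 0; i < lines.size(); i++) {
      doc.add(new Field("body", (String) lines.get(i), Field.Store.NO, Field.Index.TOKENIZED));
    }
    return doc;
  }
}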
karl
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]