The limit is much less than Integer.MAX_VALUE (2,147,483,647), unless you have a VM which can run with more than 1G of heap. A 1G heap limits you to a theoretical maximum of 256M (268,435,456) documents at 4 bytes per array element. In practice it will be somewhat less, because other things need heap too.
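
To make that arithmetic concrete, here is a rough sketch in Java. The class and method names (OrderArrayBudget, maxDocs) are made up for illustration and are not part of Lucene. It also assumes the entire heap is available to the array, which is never true in a real VM, so treat the result as a ceiling rather than a usable figure.

```java
// Hypothetical helper, not a Lucene API: estimates how many documents an
// int[] sort-order array could cover for a given heap budget.
final class OrderArrayBudget {

    /** An int[] element costs 32 bits, i.e. 4 bytes, per document. */
    static final int BYTES_PER_INT = 4;

    /**
     * Theoretical ceiling on the number of documents, assuming (unrealistically)
     * that the whole heap budget is spent on the order array.
     */
    static long maxDocs(long heapBytes) {
        long byHeap = heapBytes / BYTES_PER_INT;
        // Java arrays are indexed by int, so Integer.MAX_VALUE caps the result.
        return Math.min(byHeap, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long oneGig = 1L << 30; // 1G of heap, in bytes
        System.out.println("1G heap ceiling:  " + maxDocs(oneGig));
        // Same estimate for whatever maximum heap this JVM was started with.
        System.out.println("This VM ceiling:  " + maxDocs(Runtime.getRuntime().maxMemory()));
    }
}
```

With a 1G budget the first line works out to 1,073,741,824 / 4 = 268,435,456, the figure quoted above. The second line depends on the VM's heap setting (for example the -Xmx option).
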
We're going to need to maintain a set of sort indexes for documents in a large index too, and I'm interested in suggestions for the best/easiest way to maintain a non-RAM-based (or not entirely RAM-based) sort index which is external to Lucene. Would using MySQL for sort indexing be "a sledgehammer to crack a nut", I wonder? I've not really thought through the RAMifications (sorry!) of this approach. Has anyone else here attempted to integrate an external sort using a database?

On Sat, 2006-07-29 at 22:42 +0200, karl wettin wrote:
> On Sat, 2006-07-29 at 12:39 -0700, Jason Calabrese wrote:
> > One fast way to make an alphabetic sort very fast is to presort your
> > docs before adding them to the index. If you do this you can then
> > just sort by index order. We are using this for a large index (1
> > million+ docs) and it works very well, and seems even slightly faster
> > than relevance sorting.
> >
> > Using this approach may create some maintenance issues since you
> > can't add a new doc to the index at a specified position. Instead you
> > will need to re-index everything.
>
> Instead of above I would probably choose an int[index size] where each
> position in the array represents the global order of that document. It's
> much easier to re-order that than re-indexing the whole corpus every
> time you want to insert something.
>
> It limits your corpus to 2 billion items (Integer.MAX_VALUE). And it
> will consume 32 bits of RAM per document.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]