On 29 Jan 2007, at 23:21, Daniel Noll wrote:

karl wettin wrote:
The maximum number of documents in an index is Integer.MAX_VALUE (2 147 483 647), but it is possible to combine multiple indices.

It's true that you can combine multiple indexes, but don't make assumptions that this lets you break the limitation. MultiReader still extends from IndexReader, and IndexReader still uses ints for the document IDs so there's no way you'll be increasing the maximum number of documents even if you combine indexes.
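To make the int limit concrete, here is a minimal sketch of how a combined reader could give each sub-index a starting offset in a shared int ID space. The class and method names are hypothetical and this is not Lucene's actual code; it only assumes that global document IDs are plain ints, as described above.

```java
public class DocIdOffsets {
    // Hypothetical helper, not part of Lucene: computes the first global
    // document ID of each sub-index when IDs are plain ints.
    // The last element holds the total document count.
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            // Math.addExact throws ArithmeticException on int overflow
            // instead of silently wrapping to a negative number.
            starts[i + 1] = Math.addExact(starts[i], maxDocs[i]);
        }
        return starts;
    }

    public static void main(String[] args) {
        int[] small = starts(new int[] {1000, 2500});
        System.out.println("total documents: " + small[2]);
        try {
            starts(new int[] {Integer.MAX_VALUE, 1});
        } catch (ArithmeticException e) {
            System.out.println("combined index exceeds the int range");
        }
    }
}
```

However the sub-indexes are combined, the running total has to fit in an int, so the sum of all documents is still capped at Integer.MAX_VALUE.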

I must have been thinking of MultiSearcher rather than MultiReader, sorry about that. Any year now I'll take a better look at scoring. I presume scoring is the problem with a MultiSearcher (unless it is some really silly similarity issue or so?).

I started digging into MultiReader to understand what's going on, but mostly in a desperate attempt to save face.

My first attempt was to mess with MultiReader.maxDoc, to make it start at Integer.MIN_VALUE rather than 0 and so allow four billion rather than two billion documents. A quick hack resulted in a lot of negative array sizes and similar mess.
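A minimal sketch of why that hack falls over, assuming (as is common) that some code sizes a per-document array directly from maxDoc. The names here are made up for illustration and are not taken from the Lucene source.

```java
public class NegativeMaxDoc {
    // Hypothetical stand-in for any code that allocates one slot per document.
    static byte[] allocatePerDocArray(int maxDoc) {
        // Java throws NegativeArraySizeException when maxDoc < 0.
        return new byte[maxDoc];
    }

    public static void main(String[] args) {
        // With IDs counted from Integer.MIN_VALUE, an index holding
        // 1000 documents reports a negative maxDoc.
        int shiftedMaxDoc = Integer.MIN_VALUE + 1000;
        try {
            allocatePerDocArray(shiftedMaxDoc);
        } catch (NegativeArraySizeException e) {
            System.out.println("negative array size for maxDoc " + shiftedMaxDoc);
        }
    }
}
```

Java arrays are themselves indexed by int, so even without the negative numbers a single per-document array could not hold four billion entries.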

Then it hit me that perhaps the integer limitation should be in the store (Directory) and not in the IndexReader? If not now, perhaps in the future when everybody is running on 64-bit JVMs. I don't think it would be a very expensive thing to implement. But has anyone needed that yet?

Two billion documents is quite an amount, but it is not an insane and unthinkable number. I doubt that I'll hit something like that anytime soon, but it should not be that hard or expensive to refactor Lucene to support the roughly 9 billion billion (9 followed by 18 zeros) documents a long can count to.
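For reference, the ranges involved can be printed directly. This is only arithmetic on the Java primitive types, with a made-up class name: Long.MAX_VALUE is about 9.2 × 10^18, and the full signed range from Long.MIN_VALUE to Long.MAX_VALUE holds 2^64 (about 1.8 × 10^19) distinct values.

```java
import java.math.BigInteger;

public class DocIdRanges {
    // Number of distinct values a 64-bit long can take: 2^64.
    static BigInteger distinctLongValues() {
        return BigInteger.ONE.shiftLeft(64);
    }

    public static void main(String[] args) {
        System.out.println("Integer.MAX_VALUE    = " + Integer.MAX_VALUE);
        System.out.println("Long.MAX_VALUE       = " + Long.MAX_VALUE);
        System.out.println("distinct long values = " + distinctLongValues());
    }
}
```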

I'll stop trolling now. :)

--
karl
