29 jan 2007 kl. 23.21 skrev Daniel Noll:
karl wettin wrote:
The maximum number of documents in an index is Integer.MAX_VALUE
(2 147 483 647), but it it possible to combine multiple indices.
It's true that you can combine multiple indexes, but don't make
assumptions that this lets you break the limitation. MultiReader
still extends from IndexReader, and IndexReader still uses ints for
the document IDs so there's no way you'll be increasing the maximum
number of documents even if you combine indexes.
Must have thought of it as a MultiSearcher working rather than the
MultiReader, sorry about that. Any year now I'll take a better look
at scoring. I presume that is the problem with a MultiSearcher
(unless some really silly similarity or so?).
I started digging in to MultiReader to understand whats going on, but
mostly in a desperate attempt to save my face.
My first attempt was to mess with MultiReader.maxDoc, to make it
start on Integer.MIN_VALUE rather than 0 to allow four billion rather
than two billion documents. An immediate hack resulted in a lot of
negative array sized and similar mess.
Then it hit me that perheps the integer limitation should be in the
store (Directory) and not the IndexReader? If not now, perhaps in the
future when everybody is running on 64bit JVMs. I don't think it will
be a very expensive thing to implement. But did anyone need that yet?
Two billion documents is quite an amount, but it is not an insane and
unthinkable number. I doubt that I'll hit something like that anytime
soon, but it should not be that hard or expensive to refactor Lucene
to support the 18 billion billion (18 followed by 19 zeros) documents
a long can count to.
I'll stop trolling now. :)
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]