OK, I'm a little out of my league here, but I'll plow on anyway...

bq: There are use cases out there where >2^31 does make sense in a single index

OK, let's put some definition to this and define the use-case specifically rather than be vague. I've just run an experiment, for instance, where I had 200M docs in a single shard (very small docs) and tried to sort by a date on all of them. Performance was on the order of 5 seconds. Scaling that linearly, 3B is what, 75 seconds? Does the use-case involve sorting? Faceting? If so, the performance will probably be poor.

This would be huge surgery, I believe, and there hasn't been a compelling use-case in the search world for it. Unless and until that case is made, I suspect this idea will meet with a lot of resistance.

That said, I do understand that this is somewhat akin to "Nobody will ever need more than 64K of RAM", meaning that some limits are assumed and eventually become outmoded. But given Java's issues with memory and GC, I suspect that it'll be really hard to justify the work this would take.

FWIW,
Erick

On Thu, Aug 18, 2016 at 6:31 PM, Trejkaz <trej...@trypticon.org> wrote:
> On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand <jpou...@gmail.com> wrote:
>> No, IndexWriter enforces that the number of documents cannot go over
>> IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
>> BaseCompositeReader computes the number of documents in a long variable and
>> ensures it is less than 2^31, so you cannot have indexes that contain more
>> than 2^31 documents.
>>
>> Larger collections should be written to multiple shards and use
>> TopDocs.merge to merge results.
>
> But hang on:
> * TopDocs#merge still returns a TopDocs.
> * TopDocs still uses an array of ScoreDoc.
> * ScoreDoc still uses an int doc ID.
>
> Looks like you're still screwed.
>
> I wish IndexReader would use long IDs too, because one IndexReader can
> be across multiple shards too - it doesn't make much sense to me that
> this is restricted, although "it's hard to fix in a
> backwards-compatible way" is certainly a good reason.
> :D
>
> TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
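
On the quoted point about int doc IDs surviving TopDocs.merge, here is a rough sketch of how a shard number plus a shard-local int doc ID can be turned into a global long ID on the application side. As far as I know, TopDocs.merge records the originating shard in ScoreDoc.shardIndex, which is what makes this kind of mapping possible. The sketch is plain Java and deliberately uses no Lucene classes: `ShardedTopHits`, `Hit`, `docBases`, `globalId` and `mergeTopN` are hypothetical names invented for illustration, and the shard sizes are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these types are Lucene API.
class ShardedTopHits {

    // Hypothetical stand-in for a search hit: shard number, shard-local int doc ID, score.
    static final class Hit {
        final int shardIndex;
        final int doc;
        final float score;

        Hit(int shardIndex, int doc, float score) {
            this.shardIndex = shardIndex;
            this.doc = doc;
            this.score = score;
        }
    }

    // Global ID at which each shard starts: running sum of shard sizes, kept in longs.
    static long[] docBases(int[] docsPerShard) {
        long[] bases = new long[docsPerShard.length];
        long next = 0;
        for (int i = 0; i < docsPerShard.length; i++) {
            bases[i] = next;
            next += docsPerShard[i];
        }
        return bases;
    }

    // Global long ID = start of the hit's shard + its shard-local int doc ID.
    static long globalId(long[] bases, Hit hit) {
        return bases[hit.shardIndex] + hit.doc;
    }

    // Naive merge: pool the per-shard hits, sort by descending score, keep the top n.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        // Two assumed shards of 2,000,000,000 docs each, both under the per-index limit.
        long[] bases = docBases(new int[] {2_000_000_000, 2_000_000_000});
        Hit hit = new Hit(1, 1_500_000_000, 0.9f);
        System.out.println(globalId(bases, hit)); // prints 3500000000
    }
}
```

The doc ID inside each hit stays an int, so nothing here lifts the per-index limit. It only shows that the (shard, doc) pair is enough to address more than 2^31 documents across shards, and it says nothing about the sorting and faceting cost raised above.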