AFAICT, there is no clear documentation of the maximum number of documents that 
can be stored in a Lucene or Solr index (single core/shard). It appears to be 
about 2^31, since a Lucene document number and the value returned from 
IW.maxDoc are both a Java “int”, whose maximum is 2^31 - 1 (2,147,483,647). 
Lucene users have that “hint” to guide them, but that hint is never 
surfaced for Solr users, AFAICT. A few years ago nobody in their right mind 
would have imagined indexing 2 billion documents on a single machine/core, but 
now people are at least tempted to try. So it is now more important for people 
to know about the limit up front, rather than having it hidden down in the 
fine print of the Lucene file formats.
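
To make the arithmetic concrete, here is a minimal sketch in plain Java. It 
uses no Lucene classes at all; the class and method names are made up for 
illustration. It only shows what an unchecked “int” counter does when it is 
pushed past its maximum.

```java
// Illustration only: plain Java int arithmetic, no Lucene or Solr classes.
// IntDocLimitDemo and its members are hypothetical names for this sketch.
final class IntDocLimitDemo {
    /** Largest value a Java int can hold: 2^31 - 1. */
    static final int MAX_INT_DOCS = Integer.MAX_VALUE;

    /** Adds one to a hypothetical int document counter with no overflow check. */
    static int uncheckedIncrement(int docCount) {
        return docCount + 1; // wraps silently once docCount is Integer.MAX_VALUE
    }

    public static void main(String[] args) {
        System.out.println(MAX_INT_DOCS);                     // prints 2147483647
        System.out.println(uncheckedIncrement(MAX_INT_DOCS)); // prints -2147483648
    }
}
```

The wrap to a negative number is ordinary Java int behavior, which is why an 
unchecked count can show up as a negative value instead of an error.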

I want to file a Jira on this, but first wanted to check whether anybody knows 
of an existing Jira for it that was perhaps worded in a way that escaped my 
semi-diligent searches.

I was also thinking of filing it as two Jiras, one for Lucene and one for Solr, 
since the doc would be in different places. Or should there be one combined 
“Lucene/Solr Capacity Limits/Planning” wiki? Unless somebody objects, I’ll file 
these as two separate (but linked) issues.

I was also thinking of filing two more Jiras, one each for Lucene and Solr, to 
add a robust check for exceeding the underlying Lucene limit and to report the 
condition in a well-defined manner rather than letting “numFound” or “maxDoc” 
go negative. But this is separate from the documentation issue, I think. Unless 
somebody objects, I’ll file these as two separate issues.
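
As a rough idea of what I mean by a robust check, here is a minimal sketch in 
plain Java. It is not Lucene or Solr code: the class, the method, and the 
ASSUMED_MAX_DOCS constant are all hypothetical, and I am assuming the limit is 
Integer.MAX_VALUE purely for illustration (the real enforceable limit may well 
be lower).

```java
// Sketch only: a hypothetical guard, not an existing Lucene or Solr API.
final class DocLimitCheck {
    /** Assumed limit for illustration; the real limit may differ. */
    static final long ASSUMED_MAX_DOCS = Integer.MAX_VALUE;

    /**
     * Returns currentDocs + docsToAdd, or throws a descriptive exception
     * if the total would exceed the assumed limit.
     */
    static int checkedTotal(int currentDocs, int docsToAdd) {
        if (currentDocs < 0 || docsToAdd < 0) {
            throw new IllegalArgumentException("document counts must be non-negative");
        }
        // Do the addition in long so the sum itself cannot wrap around.
        long total = (long) currentDocs + docsToAdd;
        if (total > ASSUMED_MAX_DOCS) {
            throw new IllegalStateException(
                "document count " + total + " exceeds limit of " + ASSUMED_MAX_DOCS);
        }
        return (int) total;
    }
}
```

The point is only that the failure becomes an explicit exception with a 
message naming the limit, instead of a count that quietly goes negative.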

Any objection to me filing these four issues?

-- Jack Krupansky
