Lucene has a limit of 2^31-1-128 (2,147,483,519) documents per index; see IndexWriter.MAX_DOCS. Users don't often run into this limit, but I've seen it happen multiple times.
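To make the limit concrete, here is a small sketch of the arithmetic. It uses a local constant equal to the value given above (2^31-1-128) rather than reading IndexWriter.MAX_DOCS from Lucene. The class and method names (DocLimit, fitsInOneIndex, minShards) are hypothetical, and minShards assumes documents could be spread perfectly evenly across indexes, which real sharding schemes rarely achieve.

```java
// Sketch: arithmetic around the per-index document limit.
// MAX_DOCS is a local constant mirroring the value quoted in this thread
// (2^31 - 1 - 128); it is NOT read from Lucene's IndexWriter.
final class DocLimit {
    static final int MAX_DOCS = Integer.MAX_VALUE - 128; // 2,147,483,519

    // True if a single index could hold this many documents.
    static boolean fitsInOneIndex(long totalDocs) {
        return totalDocs >= 0 && totalDocs <= MAX_DOCS;
    }

    // Lower bound on the number of indexes (shards) needed so that none
    // exceeds MAX_DOCS, assuming a perfectly even split (hypothetical helper).
    static long minShards(long totalDocs) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("negative document count: " + totalDocs);
        }
        // Ceiling division in long arithmetic; always at least one index.
        return Math.max(1L, (totalDocs + MAX_DOCS - 1) / MAX_DOCS);
    }

    public static void main(String[] args) {
        System.out.println(MAX_DOCS);                       // 2147483519
        System.out.println(fitsInOneIndex(1_500_000_000L)); // true
        System.out.println(fitsInOneIndex(3_000_000_000L)); // false
        System.out.println(minShards(5_000_000_000L));      // 3
    }
}
```

With these numbers, an index of 1.5 billion documents still fits, while 5 billion documents need at least three separate indexes.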
I think that it's unlikely that Lucene will ever remove this limit on a per-segment basis; however, there have been some discussions about having the ability to go over this limit across multiple segments: https://issues.apache.org/jira/browse/LUCENE-8321.

On Sun, Feb 9, 2020 at 2:29 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Also, given how people use search, they hit performance issues long before
> running out of document IDs. Usually. Although that said, I do know of one
> user who’s running in the 1.0-1.5B range per replica, so 2B is just around
> the corner. Of course they have to be _very_ careful how they use Solr.
>
> And that said, there’s just not a lot of pressure to go to longs, and as
> Tim says it’d be a very significant effort. And there would be memory
> implications for everyone to balance.
>
> Best,
> Erick
>
> > On Feb 8, 2020, at 9:59 PM, Tim Casey <tca...@gmail.com> wrote:
> >
> > Hi Doug,
> >
> > I don't know the specific limits. But the document limits are going to
> > be around an int, probably signed. This works out to about 2 billion
> > documents per Lucene index. This is fairly embedded into the Lucene code.
> > The way we, collectively, have solved this is through forms of sharding.
> >
> > tim
> >
> > On Fri, Feb 7, 2020 at 11:27 AM Doug Tarr <doug.t...@mongodb.com.invalid> wrote:
> > Hi!
> >
> > I'm working on a team that is building a Lucene-based search platform.
> > I've been lurking on this list for a while as we are spooling up on
> > learning the various components of Lucene. Thank you all for your amazing
> > work!
> >
> > I'm interested in learning more about what work has been done around
> > document count limitations in the Lucene 8 codec (as described here)
> > related to using int32 vs VInt or Int64:
> >
> > "Lucene uses a Java int to refer to document numbers, and the index file
> > format uses an Int32 on-disk to store document numbers.
> > This is a limitation of both the index file format and the current
> > implementation. Eventually these should be replaced with either UInt64
> > values, or better yet, VInt values which have no limit."
> >
> > I've looked through JIRA and couldn't find any discussions about it,
> > trade-offs, difficulties, etc. If there's any information about this, I'd
> > appreciate any links or info that you might have.
> >
> > Thanks!
> > - Doug
> > --
> >
> > { name : "Doug Tarr",
> >   title : "Director of Engineering, Search",
> >   location : "San Francisco, CA",
> >   company : "MongoDB",
> >   email : "doug.t...@mongodb.com",
> >   linkedin : "douglastarr",
> >   twitter : "@doug_tarr" }

--
Adrien
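The file-format note quoted in Doug's message suggests VInt values as a way around a fixed Int32 doc number. As a rough illustration of why a variable-length encoding has no fixed ceiling, here is a minimal sketch of a 7-bits-per-byte encoding for non-negative longs. The class and method names (VarLong, encode, decode) are hypothetical; the byte layout follows the VInt scheme described in Lucene's file-format documentation (low-order 7 bits first, high bit set when more bytes follow), but this is a standalone illustration, not Lucene's DataOutput/DataInput code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch of a VInt-style variable-length encoding for non-negative longs:
// 7 payload bits per byte, least significant group first, high bit = "more bytes follow".
// Standalone illustration only; not Lucene's DataOutput/DataInput classes.
final class VarLong {

    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values not supported: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // 7 payload bits + continuation flag
            value >>>= 7;
        }
        out.write((int) value); // final byte: continuation flag clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift; // append the next 7 bits
            if ((b & 0x80) == 0) {
                return result; // no continuation flag: done
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated encoding: " + Arrays.toString(bytes));
    }

    public static void main(String[] args) {
        long[] samples = {0L, 127L, 128L, 2_147_483_519L, 5_000_000_000L};
        for (long v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), round-trips: " + (decode(enc) == v));
        }
    }
}
```

Small values stay small on disk, and a document number above 2^31 such as 5,000,000,000 still takes 5 bytes, the same as the largest int. This only touches the on-disk side; as Tim and Erick point out, the harder part is that int doc IDs are embedded throughout the code and switching to longs has memory implications.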