Also, given how people use search, they usually hit performance issues long before running out of document IDs. That said, I do know of one user who’s running in the range of 1.0-1.5B documents per replica, so 2B is just around the corner. Of course they have to be _very_ careful how they use Solr.
And that said, there’s just not a lot of pressure to go to longs, and as Tim says it’d be a very significant effort. And there would be memory implications for everyone to balance.

Best,
Erick

> On Feb 8, 2020, at 9:59 PM, Tim Casey <[email protected]> wrote:
>
> Hi Doug,
>
> I don't know the specific limits. But the document limits are going to be
> around an int, probably signed. This comes out to mean about 2 billion
> documents per lucene index. This is fairly embedded into the lucene code.
> The way the collective we have solved this is through forms of sharding.
>
> tim
>
> On Fri, Feb 7, 2020 at 11:27 AM Doug Tarr <[email protected]>
> wrote:
> Hi!
>
> I'm working on a team that is building a lucene based search platform. I've
> been lurking on this list for a while as we are spooling up on learning the
> various components of Lucene. Thank you all for your amazing work!
>
> I'm interested in learning more about what work has been done around document
> count limitations in the Lucene 8 codec (as described here) related to using
> int32 vs VInt or Int64:
>
> "Lucene uses a Java int to refer to document numbers, and the index file
> format uses an Int32 on-disk to store document numbers. This is a limitation
> of both the index file format and the current implementation. Eventually
> these should be replaced with either UInt64 values, or better yet, VInt
> values which have no limit."
>
> I've looked through JIRA and couldn't find any discussions about it,
> trade-offs, difficulties, etc. If there's any information about this, I'd
> appreciate any links or info that you might have.
>
> Thanks!
> - Doug
> --
>
> { name : "Doug Tarr",
> title : "Director of Engineering, Search",
> location : "San Francisco, CA",
> company : "MongoDB",
> email: : "[email protected]",
> linkedin : "douglastarr",
> twitter : "@doug_tarr" }
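
For reference, here is a rough sketch of the arithmetic behind the "about 2 billion" figure and the sharding workaround mentioned in the thread. It is only an illustration: the class name, the helper methods, and the 1B-per-shard budget are all made up for this example and are not Lucene or Solr APIs. It uses the plain Java int ceiling; if I remember right, Lucene's own hard cap (IndexWriter.MAX_DOCS) sits slightly below that.

```java
public class DocIdHeadroom {
    // Largest value a signed 32-bit Java int can hold (2,147,483,647).
    // Used here as a stand-in for the per-index document ceiling.
    static final long INT_DOC_ID_CEILING = Integer.MAX_VALUE;

    // Minimum number of shards so that no shard holds more than
    // maxDocsPerShard documents (ceiling division).
    static long shardsNeeded(long totalDocs, long maxDocsPerShard) {
        if (totalDocs < 0 || maxDocsPerShard <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return (totalDocs + maxDocsPerShard - 1) / maxDocsPerShard;
    }

    // Fraction of the int range that a single index has already consumed.
    static double fractionUsed(long docsInIndex) {
        return (double) docsInIndex / INT_DOC_ID_CEILING;
    }

    public static void main(String[] args) {
        // Hypothetical corpus of 10B documents with an assumed
        // budget of 1B documents per shard.
        long totalDocs = 10_000_000_000L;
        long perShardBudget = 1_000_000_000L;

        System.out.println("int ceiling:   " + INT_DOC_ID_CEILING);
        System.out.println("shards needed: " + shardsNeeded(totalDocs, perShardBudget));
        System.out.printf("range used at 1.5B docs: %.2f%n", fractionUsed(1_500_000_000L));
    }
}
```

A per-shard budget well under the ceiling is probably the more realistic planning number, since performance tends to become the constraint before the ID range does.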
