Re: Info on document number limitations

Erick Erickson Sun, 09 Feb 2020 05:29:55 -0800

Also, given how people use search, they hit performance issues long before 
running out of document IDs. Usually. Although that said I do know of one user 
who’s running in the 1.0-1.5B range per replica so 2B is just around the 
corner. Of course they have to be _very_ careful how they use Solr.


And that said, there’s just not a lot of pressure to go to longs, and as Tim 
says it’d be a very significant effort. And there would be memory implications 
for everyone to balance.
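
To put rough numbers on the above, here is a minimal back-of-the-envelope sketch in plain Java (no Lucene calls). The class name, the method names, and the 10B-document corpus are hypothetical and only for illustration. It assumes the ceiling is simply the largest signed 32-bit int and that the memory cost in question is one ID-sized value per document; real structures vary, so treat the output as an order-of-magnitude estimate.

```java
public final class DocIdBudget {
    // Largest value a signed 32-bit Java int can hold (about 2.1 billion).
    static final long INT_CEILING = Integer.MAX_VALUE;

    /** Minimum shard count so that no shard holds more than perShardCap documents. */
    static long shardsNeeded(long totalDocs, long perShardCap) {
        if (totalDocs < 0 || perShardCap <= 0 || perShardCap > INT_CEILING) {
            throw new IllegalArgumentException("cap must be in 1.." + INT_CEILING);
        }
        // Ceiling division without floating point.
        return (totalDocs + perShardCap - 1) / perShardCap;
    }

    /** Bytes needed to hold one ID-sized value per document. */
    static long bytesPerDocArray(long docs, int bytesPerId) {
        return docs * bytesPerId;
    }

    public static void main(String[] args) {
        long totalDocs = 10_000_000_000L;   // hypothetical corpus size
        long perShardCap = 1_500_000_000L;  // upper end of the per-replica range mentioned above

        System.out.println("int ceiling:   " + INT_CEILING);
        System.out.println("shards needed: " + shardsNeeded(totalDocs, perShardCap));
        // Same per-document structure with 4-byte ints versus 8-byte longs.
        System.out.println("int bytes:     " + bytesPerDocArray(perShardCap, Integer.BYTES));
        System.out.println("long bytes:    " + bytesPerDocArray(perShardCap, Long.BYTES));
    }
}
```

The point is only that any per-document structure keyed by ID doubles in size when the ID goes from 4 to 8 bytes, and everyone would pay that cost, including the large majority of indexes that are nowhere near the limit.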

Best,
Erick

> On Feb 8, 2020, at 9:59 PM, Tim Casey <[email protected]> wrote:
> 
> 
> Hi Doug,
> 
> I don't know the specific limits.  But the document limits are going to be 
> around an int, probably signed.  This comes out to mean about 2 billion 
> documents per Lucene index.  This is fairly embedded into the Lucene code.  
> The way we have collectively solved this is through forms of sharding.
> 
> tim
> 
> On Fri, Feb 7, 2020 at 11:27 AM Doug Tarr <[email protected]> 
> wrote:
> Hi!
> 
> I'm working on a team that is building a lucene based search platform.   I've 
> been lurking on this list for a while as we are spooling up on learning the 
> various components of Lucene.  Thank you all for your amazing work!
> 
> I'm interested in learning more about what work has been done around document 
> count limitations in the Lucene 8 codec (as described here) related to using 
> int32 vs VInt or Int64:
> 
> "Lucene uses a Java int to refer to document numbers, and the index file 
> format uses an Int32 on-disk to store document numbers. This is a limitation 
> of both the index file format and the current implementation. Eventually 
> these should be replaced with either UInt64 values, or better yet, VInt 
> values which have no limit."
> 
> I've looked through JIRA and couldn't find any discussions about it, 
> trade-offs, difficulties, etc.  If there's any information about this, I'd 
> appreciate any links or info that you might have.
> 
> Thanks!
> - Doug
> -- 
> 
> { name     : "Doug Tarr",
>   title    : "Director of Engineering, Search",
>   location : "San Francisco, CA", 
>   company  : "MongoDB",
>   email    : "[email protected]",
>   linkedin : "douglastarr",
>   twitter  : "@doug_tarr" }

