Info on document number limitations

Doug Tarr Fri, 07 Feb 2020 11:28:01 -0800

Hi!

I'm working on a team that is building a Lucene-based search platform.
I've been lurking on this list for a while as we are spooling up on
learning the various components of Lucene.  Thank you all for your amazing
work!


I'm interested in learning more about what work has been done around
document count limitations in the Lucene 8 codec (as described here
<http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/codecs/lucene80/package-summary.html>),
specifically the use of Int32 rather than VInt or Int64:

"Lucene uses a Java int to refer to document numbers, and the index file
format uses an Int32 on-disk to store document numbers. This is a
limitation of both the index file format and the current implementation.
Eventually these should be replaced with either UInt64 values, or better
yet, VInt
<http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/store/DataOutput.html#writeVInt-int->
values which have no limit."
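
For anyone unfamiliar with the VInt format mentioned in the quote, here is a
minimal sketch of the idea, widened to 64-bit values. The class name
VarLongSketch and its encode/decode methods are hypothetical and are not
Lucene APIs. The byte layout (seven payload bits per byte, high bit set when
more bytes follow) is based on the VInt description in the DataOutput
documentation linked above. It only illustrates that the encoding itself does
not cap values at the int range; it does not address the in-memory use of a
Java int for document numbers.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of Lucene.
final class VarLongSketch {

    // Encodes a non-negative value, low-order 7-bit groups first.
    // The high bit of each byte marks "more bytes follow".
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a value produced by encode().
    static long decode(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= ((long) (b & 0x7F)) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated input");
    }

    public static void main(String[] args) {
        // One past the largest value a Java int can hold.
        long beyondInt = Integer.MAX_VALUE + 1L;
        byte[] encoded = encode(beyondInt);
        System.out.println(encoded.length + " bytes, decodes to " + decode(encoded));
    }
}
```

Small values stay small on disk (anything below 128 takes one byte), while a
value past Integer.MAX_VALUE simply takes more bytes.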

I've looked through JIRA and couldn't find any discussions about it,
trade-offs, difficulties, etc.  If there's any information about this, I'd
appreciate any links or info that you might have.

Thanks!
- Doug
-- 


{ name     : "Doug Tarr",
  title    : "Director of Engineering, Search",
  location : "San Francisco, CA",
  company  : "MongoDB <http://www.mongodb.com>",
  email    : "[email protected]",
  linkedin : "douglastarr <https://www.linkedin.com/in/douglastarr/>",
  twitter  : "@doug_tarr <https://twitter.com/doug_tarr>" }
