Hi,

The Lucene internal DocId is not a unique identifier; it is not even stable! It is just a temporary property that identifies a document within an index segment / shard, and it is only valid for the lifetime of an IndexReader.
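To make the point about stability concrete, here is a minimal toy sketch. It is not Lucene code and uses no Lucene API: the class DocIdDemo and the method positionOf are made up for illustration, and a plain list stands in for a segment. It assumes only that a position-based id behaves like a list index, so it shifts once deleted entries are compacted away, while a key assigned by the application (which can be a long) stays the same.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Lucene code. A list index plays the role of the
// internal, position-based id; the Long values are application-assigned keys.
public class DocIdDemo {

    // Hypothetical helper: position of the given key in the current view,
    // or -1 if the key is not present.
    static int positionOf(List<Long> view, long key) {
        return view.indexOf(key);
    }

    public static void main(String[] args) {
        List<Long> segment = new ArrayList<>();
        segment.add(5_000_000_000L); // application keys may exceed the int range
        segment.add(5_000_000_001L);
        segment.add(5_000_000_002L);

        int before = positionOf(segment, 5_000_000_002L);

        // Simulate a delete followed by compaction: positions shift down.
        segment.remove(0);

        int after = positionOf(segment, 5_000_000_002L);

        // The key is unchanged, but its position is not.
        System.out.println(before + " -> " + after); // prints "2 -> 1"
    }
}
```

The practical consequence is the usual advice: store your own identifier as a field in the document and look documents up by that, rather than remembering an internal id across reader instances.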
Lucene (and Solr / Elasticsearch) can hold "indexes" with far more than 2 billion documents, because they shard internally (as a database also does). Direct Lucene users simply work at a lower level than "application" / "database" users. Would you care how MySQL internally addresses the rows in its tables?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Cristian Lorenzetto [mailto:cristian.lorenze...@gmail.com]
> Sent: Thursday, August 18, 2016 5:58 PM
> To: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: docid is just a signed int32
>
> normally databases supports at least long primary key.
> try to ask to twitter application , for example increasing every year more
> than 4 petabytes :) Maybe they use big storage devices bigger than a pc
> storage:)
> However If you offer a possibility to use shards ... it is a possibility
> anyway :)
> For this reason, my suggestion was different ... was not related to size of
> repository , but size of research result :):):)
> > " A suggestion for possible changes in future is to not use java array but
> > Iterator. Iterator is a ADT more scalable , not sucking memory for
> > returning documents."
>
> it is just a suggestion anyway for my loved lucene :):)
>
> 2016-08-18 17:43 GMT+02:00 Greg Bowyer <gbow...@fastmail.co.uk>:
>
> > What are you trying to index that has more than 3 billion documents per
> > shard / index and can not be split as Adrien suggests?
> >
> > On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote:
> > > Maybe lucene has maxsize 2^31 because result set are java array where
> > > length is a int type.
> > > A suggestion for possible changes in future is to not use java array but
> > > Iterator. Iterator is a ADT more scalable , not sucking memory for
> > > returning documents.
> > >
> > > 2016-08-18 16:03 GMT+02:00 Glen Newton <glen.new...@gmail.com>:
> > >
> > > > Or maybe it is time Lucene re-examined this limit.
> > > >
> > > > There are use cases out there where >2^31 does make sense in a single
> > > > index (huge number of tiny docs).
> > > >
> > > > Also, I think the underlying hardware and the JDK have advanced to make
> > > > this more defendable.
> > > >
> > > > Constructively,
> > > > Glen
> > > >
> > > > On Thu, Aug 18, 2016 at 9:55 AM, Adrien Grand <jpou...@gmail.com> wrote:
> > > >
> > > > > No, IndexWriter enforces that the number of documents cannot go over
> > > > > IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
> > > > > BaseCompositeReader computes the number of documents in a long variable
> > > > > and ensures it is less than 2^31, so you cannot have indexes that
> > > > > contain more than 2^31 documents.
> > > > >
> > > > > Larger collections should be written to multiple shards and use
> > > > > TopDocs.merge to merge results.
> > > > >
> > > > > On Thu, Aug 18, 2016 at 15:38, Cristian Lorenzetto
> > > > > <cristian.lorenze...@gmail.com> wrote:
> > > > >
> > > > > > docid is a signed int32 so it is not so big, but really docid seams
> > > > > > not a primary key unmodifiable but a temporary id for the view
> > > > > > related to a specific search.
> > > > > >
> > > > > > So repository can contains more than 2^31 documents.
> > > > > >
> > > > > > My deduction is correct ? is there a maximum size for lucene index?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org