I ran into this issue before, and after some digging I don't think there is an easy way to accommodate long IDs in Lucene. So I decided to shard the documents across multiple indexes. It turned out to be a good decision in my case because I would have had to shard the index anyway for performance reasons. (There are queries that require collecting and scoring a large portion of the index.)
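
To illustrate the sharding idea, here is a minimal sketch of how a 64-bit "global" document ID could be mapped onto a (shard, local docID) pair, using per-shard start offsets held in a long[]. The class LongDocIdMapper and its methods are hypothetical names made up for this example, not part of Lucene, and the sketch assumes each shard is a separate index whose document count fits in an int.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Lucene class): maps long global IDs to (shard, local docID). */
final class LongDocIdMapper {
    // starts[i] is the global ID of the first document in shard i;
    // the final entry holds the total number of documents.
    private final long[] starts;

    /** Assumption: shardMaxDocs[i] is the document count of shard i and is positive. */
    LongDocIdMapper(int[] shardMaxDocs) {
        starts = new long[shardMaxDocs.length + 1];
        for (int i = 0; i < shardMaxDocs.length; i++) {
            if (shardMaxDocs[i] <= 0) {
                throw new IllegalArgumentException("shard " + i + " must contain documents");
            }
            starts[i + 1] = starts[i] + shardMaxDocs[i];
        }
    }

    long totalDocs() {
        return starts[starts.length - 1];
    }

    /** Combines a shard number and that shard's int docID into one long ID. */
    long toGlobal(int shard, int localDocId) {
        return starts[shard] + localDocId;
    }

    /** Finds the shard that owns a global ID by binary search over the start offsets. */
    int shardOf(long globalId) {
        if (globalId < 0 || globalId >= totalDocs()) {
            throw new IllegalArgumentException("global ID out of range: " + globalId);
        }
        int pos = Arrays.binarySearch(starts, 0, starts.length - 1, globalId);
        // Exact hit: globalId is the first doc of shard pos.
        // Otherwise binarySearch returns -(insertionPoint) - 1; the owner is the shard before it.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the int docID inside the owning shard. */
    int localOf(long globalId) {
        return (int) (globalId - starts[shardOf(globalId)]);
    }

    public static void main(String[] args) {
        LongDocIdMapper mapper =
            new LongDocIdMapper(new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE, 1000});
        long global = mapper.toGlobal(1, 5);
        System.out.println("global=" + global
            + " shard=" + mapper.shardOf(global)
            + " local=" + mapper.localOf(global));
    }
}
```

Each shard would still be searched with its own int docIDs; only the bookkeeping that combines results across shards needs the long IDs. Note that docIDs inside an index can change after merges, so a mapping like this is only meaningful against a fixed set of readers.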

On Mar 21, 2014, at 09:41 AM, Artem Gayardo-Matrosov <ar...@gayardo.com> wrote:

Hi Oli,

Thanks for your reply,

I thought about this, but it feels like making a crude, inefficient
implementation of what's already in lucene -- CompositeReader, isn't it? It
would involve writing my CompositeCompositeReader which would forward the
requests to the underlying CompositeReader...

Is there a better way?

Thanks,
Artem.




On Fri, Mar 21, 2014 at 6:33 PM, Oliver Christ <ochr...@ebsco.com> wrote:

> Can you split your corpus across multiple Lucene instances?
>
> Cheers, Oli
>
> -----Original Message-----
> From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com]
> Sent: Friday, March 21, 2014 12:29 PM
> To: java-user@lucene.apache.org
> Subject: maxDoc/numDocs int fields
>
> Hi all,
>
> I am using lucene to index a large corpus of text, with every word being a
> separate document (this is something I cannot change), and I am hitting a
> limitation of the CompositeReader only supporting Integer.MAX_VALUE
> documents.
>
> Is there any way to work around this limitation? For the moment I have
> implemented my own DirectoryReader and BaseCompositeReader to at least make
> them support documents from Integer.MIN_VALUE to -1 (for twice more
> documents supported), the problem is that all the APIs are restricted to
> use the int type and after the docID value wraps back to 0, I have no way
> to restore the original docID.
>
> --
> Thanks in advance,
> Artem.
>
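
The wrap-around described in the quoted message can be shown with plain Java arithmetic. This is only an illustrative sketch: UnsignedDocId is a hypothetical class, not Lucene code and not the poster's actual implementation. It treats the 32 bits of an int as an unsigned value, which doubles the usable range, and shows that any ID past 2^32 - 1 cannot be represented at all.

```java
/** Hypothetical illustration (not Lucene code): reading an int docID as an unsigned value. */
final class UnsignedDocId {
    static final long MAX_UNSIGNED = 0xFFFFFFFFL; // 2^32 - 1

    /** Interprets the 32 bits of a raw int docID as a value in 0 .. 2^32 - 1. */
    static long asUnsigned(int rawDocId) {
        return Integer.toUnsignedLong(rawDocId);
    }

    /** Packs a value in 0 .. 2^32 - 1 back into an int; larger values do not fit. */
    static int toRaw(long docId) {
        if (docId < 0 || docId > MAX_UNSIGNED) {
            throw new IllegalArgumentException("does not fit in 32 bits: " + docId);
        }
        return (int) docId;
    }

    public static void main(String[] args) {
        // Negative ints cover the upper half of the unsigned range.
        System.out.println(asUnsigned(Integer.MIN_VALUE));
        System.out.println(asUnsigned(-1));
        // One past the unsigned maximum truncates to 0 when narrowed to int,
        // so the original ID is lost.
        System.out.println((int) (MAX_UNSIGNED + 1));
    }
}
```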



--

Artem.
