Can you split your corpus across multiple Lucene instances?
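
To make the idea concrete, here is a minimal sketch of how a 64-bit document number could be mapped onto several separate indexes, each staying within the int range. This is plain application-level arithmetic, not a Lucene API: the class name ShardedDocId, the DOCS_PER_SHARD capacity, and the helper methods are all hypothetical, and the "local" number is assumed to be an ID you assign yourself rather than Lucene's internal docID.

```java
public final class ShardedDocId {
    // Hypothetical per-index capacity, chosen to stay below Integer.MAX_VALUE.
    static final long DOCS_PER_SHARD = 2_000_000_000L;

    // Which index (shard) holds the document with this global number.
    static int shardOf(long globalId) {
        return (int) (globalId / DOCS_PER_SHARD);
    }

    // The document's number inside its shard; always fits in an int.
    static int localOf(long globalId) {
        return (int) (globalId % DOCS_PER_SHARD);
    }

    // Rebuild the 64-bit global number from a (shard, local) pair.
    static long globalOf(int shard, int localId) {
        return shard * DOCS_PER_SHARD + localId;
    }

    public static void main(String[] args) {
        long globalId = 5_000_000_123L;
        int shard = shardOf(globalId);
        int local = localOf(globalId);
        // Prints: 2 1000000123 5000000123
        System.out.println(shard + " " + local + " " + globalOf(shard, local));
    }
}
```

Each shard would then be searched on its own and the hits translated back with globalOf, so no single reader ever has to address more than DOCS_PER_SHARD documents.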

Cheers, Oli

-----Original Message-----
From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com] 
Sent: Friday, March 21, 2014 12:29 PM
To: java-user@lucene.apache.org
Subject: maxDoc/numDocs int fields

Hi all,

I am using Lucene to index a large corpus of text, with every word being a 
separate document (this is something I cannot change), and I am hitting the 
limitation that CompositeReader supports only Integer.MAX_VALUE documents.

Is there any way to work around this limitation? For the moment I have 
implemented my own DirectoryReader and BaseCompositeReader so that they at 
least also accept docIDs from Integer.MIN_VALUE to -1 (doubling the number of 
supported documents). The problem is that all the APIs are restricted to the 
int type, and once the docID value wraps back to 0, I have no way to restore 
the original docID.

--
Thanks in advance,
Artem.
