Every word occurrence, or every unique word? Integer.MAX_VALUE is about 2
billion, and even the OED only has 600,000 words defined. If you mean every
occurrence, that doesn't sound like a good use case for Lucene as it exists
today. Lucene indexes "documents", not "words".
I'm sure some day Lucene will switch from int to long, but not in the very
near future (maybe Lucene 6.0??), especially since it probably isn't a good
match for existing hardware. Maybe when Lucene moves a lot more stuff off
heap, then it might make more sense.
Sure, you could maintain your own Lucene branch that literally makes that
switch now, but otherwise, that's the limit for now.
-- Jack Krupansky
-----Original Message-----
From: Artem Gayardo-Matrosov
Sent: Friday, March 21, 2014 12:41 PM
To: java-user@lucene.apache.org
Subject: Re: maxDoc/numDocs int fields
Hi Oli,
Thanks for your reply,
I thought about this, but it feels like making a crude, inefficient
implementation of what's already in lucene -- CompositeReader, isn't it? It
would involve writing my CompositeCompositeReader which would forward the
requests to the underlying CompositeReader...
Is there a better way?
Thanks,
Artem.
On Fri, Mar 21, 2014 at 6:33 PM, Oliver Christ <ochr...@ebsco.com> wrote:
Can you split your corpus across multiple Lucene instances?
Cheers, Oli
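
A minimal sketch of the bookkeeping that splitting the corpus would need,
assuming each Lucene instance keeps its own int docIDs and the application
addresses documents by a long "global" ID. The class LongDocIdMap and its
method names are hypothetical and are not part of Lucene; it only applies
the docBase idea of a composite reader with long offsets.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: maps a long global ID to
// (instance index, int docID local to that instance) and back.
final class LongDocIdMap {
    // starts[i] is the global ID of the first document in instance i;
    // the last entry holds the total number of documents.
    private final long[] starts;

    LongDocIdMap(int[] maxDocPerInstance) {
        starts = new long[maxDocPerInstance.length + 1];
        for (int i = 0; i < maxDocPerInstance.length; i++) {
            starts[i + 1] = starts[i] + maxDocPerInstance[i];
        }
    }

    long totalDocs() {
        return starts[starts.length - 1];
    }

    // Index of the instance that holds the given global ID.
    int instanceOf(long globalId) {
        if (globalId < 0 || globalId >= totalDocs()) {
            throw new IllegalArgumentException("global ID out of range: " + globalId);
        }
        int pos = Arrays.binarySearch(starts, globalId);
        if (pos < 0) {
            // Not an exact start: the owner precedes the insertion point.
            return -pos - 2;
        }
        // Exact hit: skip empty instances that share the same start.
        while (starts[pos + 1] == globalId) {
            pos++;
        }
        return pos;
    }

    // docID to pass to the owning instance; always fits in an int.
    int localDocOf(long globalId) {
        return (int) (globalId - starts[instanceOf(globalId)]);
    }

    long globalIdOf(int instance, int localDoc) {
        return starts[instance] + localDoc;
    }

    public static void main(String[] args) {
        LongDocIdMap map = new LongDocIdMap(
                new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE, 1000});
        long id = 3_000_000_000L;
        System.out.println(map.instanceOf(id) + " / " + map.localDocOf(id));
    }
}
```

Searches would still have to be run against each instance and merged by the
caller; the sketch covers only the ID translation.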
-----Original Message-----
From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com]
Sent: Friday, March 21, 2014 12:29 PM
To: java-user@lucene.apache.org
Subject: maxDoc/numDocs int fields
Hi all,
I am using lucene to index a large corpus of text, with every word being a
separate document (this is something I cannot change), and I am hitting a
limitation of the CompositeReader only supporting Integer.MAX_VALUE
documents.
Is there any way to work around this limitation? For the moment I have
implemented my own DirectoryReader and BaseCompositeReader so that they
also support docIDs from Integer.MIN_VALUE to -1 (doubling the number of
documents supported). The problem is that all the APIs are restricted to
the int type, and after the docID value wraps back to 0, I have no way
to restore the original docID.
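
To illustrate the wraparound described above, here is a small sketch in
plain Java with no Lucene calls. It assumes the negative docIDs are read
as unsigned 32-bit values; DocIdWrap is a hypothetical name, and
Integer.toUnsignedLong requires Java 8 or later.

```java
// Hypothetical illustration of squeezing a long document counter into an int.
final class DocIdWrap {
    // Narrowing keeps only the low 32 bits of the global counter.
    static int toInt(long globalId) {
        return (int) globalId;
    }

    // Reads the int as an unsigned 32-bit value, so Integer.MIN_VALUE..-1
    // map to 2^31..2^32-1.
    static long toLong(int docId) {
        return Integer.toUnsignedLong(docId);
    }

    public static void main(String[] args) {
        long recoverable = 3_000_000_000L;   // below 2^32: round-trips
        long lost = (1L << 32) + 5;          // at or above 2^32: collides with 5
        System.out.println(toLong(toInt(recoverable)));
        System.out.println(toLong(toInt(lost)));
    }
}
```

IDs below 2^32 survive the round trip, which is the doubling mentioned
above. From 2^32 onwards the int value repeats, so two different documents
would share a docID and the original cannot be recovered.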
--
Thanks in advance,
Artem.