[ https://issues.apache.org/jira/browse/LUCENE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862528#action_12862528 ]

Shai Erera commented on LUCENE-2420:
------------------------------------

That documentation discusses the limit on the number of unique terms Lucene 
can handle, which comes to ~274 billion: "_which means the maximum number of 
*unique terms* in any single index segment is ~2.1 billion times the term 
index interval (default 128) = ~274 billion_".
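
For reference, here is that arithmetic worked out (my own illustration, not 
taken from the docs): the term index is addressed with a Java int, and each 
index entry covers termIndexInterval terms, so:

    // Max term-index entries is Integer.MAX_VALUE (2^31 - 1 = 2147483647);
    // each entry covers termIndexInterval terms (default 128).
    long maxUniqueTerms = (long) Integer.MAX_VALUE * 128;
    System.out.println(maxUniqueTerms); // 274877906816, i.e. ~274 billion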

One line below that, the documentation says: "_Similarly, Lucene uses a Java 
int to refer to document numbers, and the index file format uses an Int32 
on-disk to store document numbers_", which points at the Integer.MAX_VALUE 
limitation ... but I agree it could have been spelled out more clearly.
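
Assuming the fdx layout that the size check implies (a 4-byte header followed 
by 8 bytes per document; an assumption on my part), the numbers in the stack 
trace below are consistent with a plain int wraparound:

    // Sketch: back out the real doc count from the fdx length in the trace.
    long fdxLength = 30257618564L;          // reported size of _0.fdx
    long actualDocs = (fdxLength - 4) / 8;  // = 3782202320 documents
    // Counting 3782202320 documents in a 32-bit int wraps past 2^31 - 1:
    int wrapped = (int) actualDocs;         // = -512764976, as in the trace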

If numDocsInStore is negative, StoredFieldsWriter.flush() doesn't do anything 
... do you mean that code should throw the exception?
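
If the intent is to fail fast instead, something along these lines (a sketch 
only; the guard method and where to hook it are my assumptions, not existing 
code) could be called before the doc counter is incremented:

    // Hypothetical fail-fast guard: reject a new document before the
    // internal int document counter can overflow.
    static void checkDocCount(int numDocsInStore) {
      if (numDocsInStore == Integer.MAX_VALUE) {
        throw new IllegalStateException(
            "number of documents in the index cannot exceed "
                + Integer.MAX_VALUE);
      }
    }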

> "fdx size mismatch" overflow causes RuntimeException
> ----------------------------------------------------
>
>                 Key: LUCENE-2420
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2420
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1
>         Environment: CentOS 5.4
>            Reporter: Steven Bethard
>
> I just saw the following error:
> java.lang.RuntimeException: after flush: fdx size mismatch: -512764976 docs vs 30257618564 length in bytes of _0.fdx file exists?=true
>         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:97)
>         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:51)
>         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:371)
>         at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1724)
>         at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3565)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3491)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3482)
>         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1658)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1621)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1585)
> Note the negative SegmentWriteState.numDocsInStore. I assume this is because 
> Lucene has a limit of 2^31 - 1 = 2147483647 (the maximum value of a Java 
> int) documents per index, though I couldn't find this documented clearly 
> anywhere. It would have been nice to get this error earlier, back when I 
> exceeded the limit, rather than now, after a bunch of indexing that was 
> apparently doomed to fail.
> Hence, two suggestions:
> * State clearly somewhere that the maximum number of documents in a Lucene 
> index is Integer.MAX_VALUE.
> * Throw an exception when an IndexWriter first exceeds this number rather 
> than only on close.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

