[ https://issues.apache.org/jira/browse/LUCENE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862312#action_12862312 ]

Shai Erera commented on LUCENE-2420:
------------------------------------

I remember that the Integer.MAX_VALUE limit is documented somewhere; I can try
to look it up later. But many places in the API use int as the doc ID
(IndexReader, ScoreDoc, even IndexWriter.maxDoc()/numDocs()), so I think
there's a strong hint about that limitation.
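
To illustrate, here is a minimal sketch against the 3.0.x API showing that doc
IDs and doc counts are plain Java ints throughout (the index path is a
hypothetical placeholder):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.FSDirectory;

    public class IntDocIds {
        public static void main(String[] args) throws Exception {
            // Hypothetical index location, for illustration only.
            IndexReader reader =
                    IndexReader.open(FSDirectory.open(new File("/tmp/index")));
            int maxDoc = reader.maxDoc();   // doc count is an int, so capped at Integer.MAX_VALUE
            int numDocs = reader.numDocs(); // likewise an int
            // ScoreDoc.doc, the document ID of a search hit, is an int field too.
            ScoreDoc hit = new ScoreDoc(0, 1.0f);
            int docId = hit.doc;
            System.out.println(maxDoc + " " + numDocs + " " + docId);
            reader.close();
        }
    }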

As for throwing the exception sooner, I don't think that would be correct.
IndexWriter implements transaction semantics: until you call commit() or
close(), whatever operations you've made are not *officially* in the index
yet, and if your JVM dies before that, they are lost. Throwing the exception
earlier would therefore be wrong. Also, suppose you intend to index 1,000 docs
and delete 100,000. Would you want to get the exception while adding the docs,
knowing that you are about to delete far more soon?
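
To make the transaction semantics concrete, a minimal sketch (the RAMDirectory
and field names are just for illustration; nothing is durable until commit()
or close()):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class TransactionSemantics {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            Document doc = new Document();
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);                      // buffered, not yet in the index
            writer.deleteDocuments(new Term("id", "1"));  // buffered as well

            // If the JVM dies here, both buffered operations are simply lost;
            // only commit() (or close()) makes them part of the index.
            writer.commit();
            writer.close();
        }
    }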



> "fdx size mismatch" overflow causes RuntimeException
> ----------------------------------------------------
>
>                 Key: LUCENE-2420
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2420
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1
>         Environment: CentOS 5.4
>            Reporter: Steven Bethard
>
> I just saw the following error:
> java.lang.RuntimeException: after flush: fdx size mismatch: -512764976 docs vs 30257618564 length in bytes of _0.fdx file exists?=true
>         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:97)
>         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:51)
>         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:371)
>         at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1724)
>         at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3565)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3491)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3482)
>         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1658)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1621)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1585)
> Note the negative SegmentWriteState.numDocsInStore. I assume this is because
> Lucene has a limit of 2^31 - 1 = 2,147,483,647 (Integer.MAX_VALUE) documents
> per index, though I couldn't find this limit clearly documented anywhere. It
> would have been nice to get this error earlier, back when I exceeded the
> limit, rather than now, after a bunch of indexing that was apparently doomed
> to fail. Hence, two suggestions:
> * State clearly somewhere that the maximum number of documents in a Lucene
> index is Integer.MAX_VALUE.
> * Throw an exception when an IndexWriter first exceeds this number, rather
> than only on close.
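
For reference, the numbers in the error above are consistent with plain 32-bit
wraparound. A quick sketch, assuming a 3.0-era .fdx layout of a 4-byte format
header followed by one 8-byte pointer per document (that layout is my
assumption here):

    public class FdxOverflow {
        public static void main(String[] args) {
            // 30257618564 is the .fdx length from the error message.
            // Assuming a 4-byte header plus 8 bytes per document:
            long fdxLength = 30257618564L;
            long realDocCount = (fdxLength - 4) / 8;  // 3782202320 documents
            int wrapped = (int) realDocCount;         // truncated to 32 bits
            System.out.println(realDocCount + " wraps to " + wrapped);
            // Prints: 3782202320 wraps to -512764976, the "docs" value in the error.
        }
    }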

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

