[ https://issues.apache.org/jira/browse/LUCENE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862463#action_12862463 ]

Steven Bethard commented on LUCENE-2420:
----------------------------------------

I finally found the documentation saying that the maximum number of documents 
is ~274 billion:
  http://lucene.apache.org/java/3_0_1/fileformats.html

Google queries that failed to find this:
  lucene index maximum documents
  lucene document limit
  lucene max docs

Maybe a bullet could be added to the FAQ (which does turn up for most of these 
queries)?
  http://wiki.apache.org/lucene-java/LuceneFAQ

As far as the exception goes, regardless of the transaction semantics, I really 
don't think the code works correctly after numeric overflow. Once 
SegmentWriteState.numDocsInStore is negative, I would expect code like 
StoredFieldsWriter.flush to fail:

  synchronized public void flush(SegmentWriteState state) throws IOException {
    if (state.numDocsInStore > 0) {
      // ... all of the flushing work happens inside this branch, so a
      // wrapped-negative count skips it silently

Perhaps I'm wrong, but it seems like this will do the wrong thing when 
SegmentWriteState.numDocsInStore is negative: the guard is never entered, so 
the flush is silently skipped. If so, it seems sensible to raise an exception 
at the point of numeric overflow rather than letting the corrupted count 
propagate.
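For concreteness, here is a minimal sketch of the kind of fail-fast guard I 
have in mind (the names are illustrative only, not Lucene's actual internals; 
the real counter lives in SegmentWriteState):

  // Hypothetical guard: reject the add at the moment the int counter
  // would wrap, instead of letting a negative count surface much later
  // during flush/close.
  private int numDocsInStore;

  void incrementDocCount() {
    if (numDocsInStore == Integer.MAX_VALUE) {
      throw new IllegalStateException(
          "cannot add document: index already holds Integer.MAX_VALUE ("
          + Integer.MAX_VALUE + ") documents");
    }
    numDocsInStore++;
  }

The extra comparison per added document should be negligible next to the rest 
of the indexing work.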

> "fdx size mismatch" overflow causes RuntimeException
> ----------------------------------------------------
>
>                 Key: LUCENE-2420
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2420
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1
>         Environment: CentOS 5.4
>            Reporter: Steven Bethard
>
> I just saw the following error:
> java.lang.RuntimeException: after flush: fdx size mismatch: -512764976 docs vs 30257618564 length in bytes of _0.fdx file exists?=true
>         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:97)
>         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:51)
>         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:371)
>         at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1724)
>         at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3565)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3491)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3482)
>         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1658)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1621)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1585)
> Note the negative SegmentWriteState.numDocsInStore. I assume this is because 
> Lucene has a limit of 2^31 - 1 = 2147483647 documents per index (the maximum 
> value of a signed 32-bit Java int), though I couldn't find this documented 
> clearly anywhere. It would have been nice to get this error earlier, back 
> when I first exceeded the limit, rather than now, after a bunch of indexing 
> that was apparently doomed to fail.
> Hence, two suggestions:
> * State clearly somewhere that the maximum number of documents in a Lucene 
> index is 2^31 - 1 (Integer.MAX_VALUE).
> * Throw an exception when an IndexWriter first exceeds this number rather 
> than only on close.
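
As a sanity check on the quoted numbers: assuming the fdx layout that the size 
check implies (a 4-byte header plus one 8-byte pointer per document), the 
reported file length corresponds to (30257618564 - 4) / 8 = 3782202320 
documents actually written, and that value cast to a 32-bit int wraps to 
exactly the -512764976 in the message:

  // My arithmetic on the reported figures, not values from the report:
  long fdxLength = 30257618564L;            // reported _0.fdx length
  long trueDocCount = (fdxLength - 4) / 8;  // = 3782202320 docs written
  int wrapped = (int) trueDocCount;         // = -512764976, as reported

So the reporter blew past the 2^31 - 1 limit by roughly 1.6 billion documents 
before anything complained.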
