I'm having a problem with Lucene 4.5.1. Whenever I attempt to index a file
larger than 2GB, it dies with the following exception:

java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-2147483648,endOffset=-2147483647

Essentially, I'm doing this:

Directory directory = new MMapDirectory(indexPath);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_45, analyzer);
IndexWriter iw = new IndexWriter(directory, iwc);

InputStream is = <my input stream>;
InputStreamReader reader = new InputStreamReader(is);

Document doc = new Document();
doc.add(new StoredField("fileid", fileid));
doc.add(new StoredField("pathname", pathname));
doc.add(new TextField("content", reader));

iw.addDocument(doc);

It's the IndexWriter addDocument method that throws the exception. Looking at the Lucene source code, the offsets used internally appear to be ints, which would explain why this happens once a token's position passes Integer.MAX_VALUE (2147483647).
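To illustrate what I think is going on, here is a minimal sketch in plain Java (no Lucene involved). The class OffsetOverflowDemo and the helper toIntOffset are hypothetical names I made up for this example. I am assuming the tokenizer's position simply wraps around when it is narrowed to an int; I have not confirmed that this is exactly how Lucene computes it.

```java
public class OffsetOverflowDemo {

    // Hypothetical helper: narrows a 64-bit character position to an int,
    // the way an int-typed offset would wrap around past Integer.MAX_VALUE.
    static int toIntOffset(long charPosition) {
        return (int) charPosition;
    }

    public static void main(String[] args) {
        long tokenStart = 1L << 31;      // first position past Integer.MAX_VALUE
        long tokenEnd = tokenStart + 1;  // a one-character token

        // Both values wrap to large negative numbers.
        System.out.println("startOffset=" + toIntOffset(tokenStart)
                + ",endOffset=" + toIntOffset(tokenEnd));
    }
}
```

This prints startOffset=-2147483648,endOffset=-2147483647, the same values reported in the exception, which is consistent with the first token past the 2^31 mark being the one that fails.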

This issue never happened with Lucene 3.6.0, which was perfectly capable of handling a file over 2GB in this manner. What has changed, and how do I get around this? Is Lucene no longer capable of handling files this large, or is there some other way I should be doing this?

Here's the full stack trace sans my code:

java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-2147483648,endOffset=-2147483647
        at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45)
        at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:183)
        at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
        at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
        at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
        at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:254)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1551)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1221)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1202)

Thanks,
John

--
John Cecere
Principal Engineer - Oracle Corporation
732-987-4317 / john.cec...@oracle.com
