[ http://issues.apache.org/jira/browse/LUCENE-415?page=comments#action_12357882 ]

Andy Hind commented on LUCENE-415:
----------------------------------

And I can reproduce it on 1.4.3.

When FSDirectory.createFile creates an FSOutputStream, the random access file
may already exist and contain data. The existing content is not cleared out.
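
To illustrate the point above, here is a minimal standalone sketch using plain
java.io rather than Lucene's own classes. The class name StaleFileDemo and the
helper lengthAfterRewrite are hypothetical, and it uses modern Java syntax
(try-with-resources), not what 1.4.3 was built with. It only shows that opening
a RandomAccessFile in "rw" mode leaves any existing bytes in place:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Lucene.
class StaleFileDemo {

    /**
     * Simulates an interrupted write of staleBytes bytes, then reopens the
     * same file in "rw" mode and writes freshBytes bytes from offset 0.
     * Returns the file length seen after the second write.
     */
    static long lengthAfterRewrite(int staleBytes, int freshBytes) {
        try {
            File f = File.createTempFile("segment", ".tmp");
            f.deleteOnExit();

            // First writer: leaves data behind, as a killed merge might.
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(new byte[staleBytes]);
            }

            // Second writer: "rw" opens the existing file without truncating it.
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(new byte[freshBytes]);
                return raf.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The old tail survives when the new content is shorter than the old.
        System.out.println(lengthAfterRewrite(100, 10));
    }
}
```

If the second write is shorter than the leftover data, the old tail stays in
the file, so a reader of the new "segment" can run into bytes from the old one.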

So if a merge is writing to a new segment and has already written data to this
file when the machine crashes or the app is terminated, you can end up with a
partial or full segment file that the segment infos knows nothing about. If you
restart, the next merge will try to reuse the same file name, along with the
content the file already contains.

To reproduce the issue, I created the next segment file by copying one that
already exists, and bang: the next merge fails.

I suggest that FSOutputStream set the file length to 0 on initialisation
(as well as opening the channel to the file, which can also produce some nasty
deferred IO errors, on Windows XP at least).
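
A rough sketch of the idea, assuming the output stream sits on top of a
java.io.RandomAccessFile. The class TruncatingOpen and both of its methods are
hypothetical names for illustration and are not the actual FSOutputStream code;
the only real API used is RandomAccessFile.setLength:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper; not the actual Lucene implementation.
class TruncatingOpen {

    /** Opens f for read/write and discards any content it already holds. */
    static RandomAccessFile openTruncated(File f) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            raf.setLength(0); // drop leftover bytes from an earlier, interrupted write
        } catch (IOException e) {
            raf.close();
            throw e;
        }
        return raf;
    }

    /**
     * Leaves staleBytes bytes in a temp file, reopens it through
     * openTruncated, writes freshBytes bytes, and returns the final length.
     */
    static long lengthAfterTruncatedRewrite(int staleBytes, int freshBytes) {
        try {
            File f = File.createTempFile("segment", ".tmp");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(new byte[staleBytes]);
            }
            try (RandomAccessFile raf = openTruncated(f)) {
                raf.write(new byte[freshBytes]);
                return raf.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the freshly written bytes remain after a truncating open.
        System.out.println(lengthAfterTruncatedRewrite(100, 10));
    }
}
```

With the truncation in place, the file holds only what the new writer wrote,
whatever was left behind by the earlier run.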

I am not sure of any side effects of this, but I will test it.

We are seeing this 2-3 times a day under heavy load, or when running single
threaded and killing the app at random, which may be in the process of a
segment write.


> Merge error during add to index (IndexOutOfBoundsException)
> -----------------------------------------------------------
>
>          Key: LUCENE-415
>          URL: http://issues.apache.org/jira/browse/LUCENE-415
>      Project: Lucene - Java
>         Type: Bug
>   Components: Index
>     Versions: 1.4
>  Environment: Operating System: Linux
> Platform: Other
>     Reporter: Daniel Quaroni
>     Assignee: Lucene Developers
>
> I've been batch-building indexes, and I've built a couple hundred indexes with
> a total of around 150 million records.  This only happened once, so it's
> probably impossible to reproduce, but anyway... I was building an index with
> around 9.6 million records, and towards the end I got this:
> java.lang.IndexOutOfBoundsException: Index: 54, Size: 24
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
>         at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:151)
>         at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:149)
>         at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
>         at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:52)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:294)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:254)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:93)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
>         at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:458)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:310)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:294)


