Hi,

We have seen Lucene segments become corrupt in the following situation.

During merging of segments, the following sequence of operations takes place:

(1) The index is locked.
(2) A new segment name is obtained by calling newSegmentName(), which basically calls segmentInfos.counter++.
(3) Data is written to the new segment.
(4) The segments file is rewritten.
(5) Old segments are deleted or marked for deletion.

Corruption is possible when an exception occurs in step (3), preventing the commit to the segments file: for example, no disk space, a lost network share, or a bad segment being merged.

Because the existing segment files are not replaced, there is no immediate corruption. However, on the next merge operation, the index will become corrupt. [There is a scenario where the corruption may not occur: if the new segment is bigger than the failed one.] I am not sure what effect this has on the Compound File Store.

The cause of this issue can be traced to segmentInfos.counter. Because the counter is not changed in the segments file, the next merge operation will use the same segment name as the failed one, and if you are using any standard Directory implementation, it will probably write the segment to the same file location. Note that the merge operation opens the segment files in read-write mode, and therefore we start with a non-empty file.

Some options are:

(1) Commit the counter after the newSegmentName() call. This way we never reuse the segment name.
(2) Add a callback API to the Directory interface for new segment creation, allowing the Directory implementation to clean up on a new segment write.
(3) Provide a rollback mechanism in the event of merge failure (using the deletable functionality).
(4) For the Compound File Store, require that the file be empty. (Possibly, it can use the callback in (2) to clean up.)

We should apply as many of them as possible to make the merge code robust to potential failures. I think with the increasing adoption of Lucene, we need to think about data corruption and recovery issues.

More later,
Arvind.
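
A minimal sketch of the failure mode described above, in plain Java. It is a toy model under stated assumptions, not Lucene code: the class SegmentNameReuseSketch, the methods writeSegment and runScenario, the local committedCounter variable, and the "_1"/"_2" file names are all hypothetical stand-ins. The only thing it tries to show is that reusing a segment name with a file opened in read-write mode leaves the tail of the failed write in place, and that committing the counter before writing (option (1)) avoids the reuse.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy model only: none of these names are Lucene classes or methods.
class SegmentNameReuseSketch {

    // Stand-in for a directory that opens segment files read-write
    // and writes from offset 0 without truncating existing content.
    static void writeSegment(Path dir, String name, String data) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(dir.resolve(name).toFile(), "rw")) {
            f.write(data.getBytes(StandardCharsets.US_ASCII));
        }
    }

    // Runs a failed merge followed by a successful, smaller merge and
    // returns the bytes of the segment produced by the second merge.
    static String runScenario(boolean commitCounterFirst) {
        try {
            Path dir = Files.createTempDirectory("segsketch");
            int committedCounter = 0; // stand-in for the counter stored in the segments file

            // First merge: writes 10 bytes, then "fails" before the commit.
            int next = committedCounter + 1;
            if (commitCounterFirst) {
                committedCounter = next; // option (1): persist the counter before writing
            }
            writeSegment(dir, "_" + next, "AAAAAAAAAA");
            // Simulated exception here: the commit step is skipped.

            // Second merge: succeeds and writes only 4 bytes.
            next = committedCounter + 1;
            String name = "_" + next;
            writeSegment(dir, name, "BBBB");
            committedCounter = next; // the commit step

            return new String(Files.readAllBytes(dir.resolve(name)), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Name reused: the old tail survives behind the new data.
        System.out.println(runScenario(false)); // prints BBBBAAAAAA
        // Counter committed first: the second merge gets a fresh name.
        System.out.println(runScenario(true));  // prints BBBB
    }
}
```

If the second write were at least as long as the failed one, the stale bytes would be fully overwritten, which matches the bracketed exception noted in the message.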