[ 
https://issues.apache.org/jira/browse/LUCENE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308510#comment-17308510
 ] 

Robert Muir commented on LUCENE-9867:
-------------------------------------

[~sqshq] The general problem is that files are getting deleted that should 
not be.

A number of things could cause this: stale directory metadata from the 
filesystem, two writers open on the same index at the same time, etc.

Is XFS accessed via a qemu disk image, or via some other feature such as 
virtio-fs?
Is there a chance of the two indexing patterns overlapping with each other at 
the same time? Are you using any special LockFactory configuration?
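
To illustrate why the "two writers" and LockFactory questions matter, here is 
a minimal sketch of the mechanism, not Lucene's own code. As far as I know, 
FSDirectory.open defaults to NativeFSLockFactory, which guards the index with 
an OS-level lock on a write.lock file in the index directory. The sketch 
reproduces that idea with plain java.nio only; the class WriteLockSketch and 
the method secondLockIsRefused are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockSketch {

    /**
     * Hypothetical helper: takes a native lock on "write.lock" inside indexDir,
     * then tries to take it again through a second channel, the way a second
     * writer would. Returns true if the second attempt is refused.
     */
    public static boolean secondLockIsRefused(Path indexDir) throws IOException {
        Path lockFile = indexDir.resolve("write.lock");
        try (FileChannel first = FileChannel.open(
                 lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock held = first.tryLock()) {
            if (held == null) {
                // Some other process already holds the lock; the demo cannot proceed.
                throw new IllegalStateException("lock already held: " + lockFile);
            }
            try (FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                FileLock again = second.tryLock();
                if (again != null) {
                    // Locking is not being enforced: both "writers" hold the lock.
                    again.release();
                    return false;
                }
                return true; // refused because another process holds the lock
            } catch (OverlappingFileLockException e) {
                return true; // refused because this JVM already holds the lock
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a throwaway temp directory stands in for the index directory.
        Path dir = Files.createTempDirectory("lock-sketch");
        System.out.println("second writer refused: " + secondLockIsRefused(dir));
    }
}
```

If locking is disabled (for example with NoLockFactory) or the underlying 
filesystem does not honor native locks, nothing like this refusal happens, and 
a second writer can end up deleting files that the first one still references.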


> CorruptIndexException after failed segment merge caused by No space left on 
> device
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-9867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9867
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/store
>    Affects Versions: 8.5
>            Reporter: Alexander L
>            Priority: Major
>
> A failed segment merge caused by "No space left on device" can't be recovered 
> from, and Lucene fails with CorruptIndexException after restart. The 
> expectation is that Lucene will be able to restart automatically without 
> manual intervention.
> We have 2 indexing patterns:
>  * Create and commit an empty index, then start a long initial indexing 
> process (might take hours) and perform a second commit at the end
>  * Using existing index, add no more than 4k documents and commit after that
> Right now we don't have evidence to suggest which pattern caused this issue, 
> but we definitely witnessed a similar situation with the second pattern. 
> That case was a bit different: it was caused by {{OutOfMemoryError: Java Heap 
> Space}}, and the missing {{_q.cfe}} file produced only a 
> {{NoSuchFileException}}, not a {{CorruptIndexException}}. Please let me know if 
> we need a separate ticket for that.
> Lucene version: 8.5.0
> Java version: OpenJDK 11
> OS: CentOS Linux 7
> Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
> Virtualization: kvm
> Filesystem: xfs
> Failed merge stacktrace:
> {code:java}
> 2021-02-02T08:51:51.679+0000
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
>       at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
> Caused by: java.io.IOException: No space left on device
>       at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>       at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
>       at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
>       at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
>       at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
>       at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
>       at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
>       at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
>       at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
>       at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
>       at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
>       at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
>       at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
>       at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
>       at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
>       at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
>       at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
>       at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
>       at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
>       at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
>       at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
>       at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
>       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
>       at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
> {code}
>  Followed by failed startup:
> {code:java}
> 2021-02-02T08:52:07.926+0000
> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
>       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
>       at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>       at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>       at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>       at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
>       at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
>       at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
>       at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
>       at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
>       at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
>       at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
>       ... 33 common frames omitted
> {code}


