Simon Willnauer created LUCENE-9477: ---------------------------------------
Summary: IndexWriter might leave broken segments file behind on exception during rollback Key: LUCENE-9477 URL: https://issues.apache.org/jira/browse/LUCENE-9477 Project: Lucene - Core Issue Type: Bug Reporter: Simon Willnauer Mike ran some beasty tests while I was working on LUCENE-8962. This test caused some headaches since it only rarely also fails on master: {noformat} org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of) ByteBuffersIndexInput (file=pending_segments_2, buffers\ =258 bytes, block size: 1, blocks: 1, position: 0)))) at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521) at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836) at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89) at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41 at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748) at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157) at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044) at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298) ... 41 more .... 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/l/sim\ on/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 2> NOTE: leaving temporary files on disk at: /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003 2> NOTE: test params are: codec=Asserting(Lucene86): {text_payloads=BlockTreeOrds(blocksize=128), text_vectors=PostingsFormat(name=Asserting), text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)}, docValu\ es:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting), dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting), dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696, maxMBSortInH\ eap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false): {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0), text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}), locale=zh-CN, \ timezone=SystemV/MST7MDT 2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=241525696,total=268435456 2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError] {noformat} The test reproduces on master also without the huge line docs file using this: {noformat} ant test -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 {noformat} the reason is that we fail to delete the already renamed pending segments file when the metadata sync on the directory fails. The subsequent rollback also crashes while it's trying to delete unrefed files and that will cause subsequent CheckIndex calls to fail with FNF exceptions since the commit was written but not fully removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org