Simon Willnauer created LUCENE-9477:
---------------------------------------

             Summary: IndexWriter might leave broken segments file behind on exception during rollback
                 Key: LUCENE-9477
                 URL: https://issues.apache.org/jira/browse/LUCENE-9477
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Simon Willnauer


Mike ran some beasty tests while I was working on LUCENE-8962. This test caused 
some headaches because it also fails on master, though only rarely:

{noformat}
org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED
    org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of) ByteBuffersIndexInput (file=pending_segments_2, buffers=258 bytes, block size: 1, blocks: 1, position: 0))))
        at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521)
        at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301)
        at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836)
        at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89)
        at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
        at java.base/java.lang.Thread.run(Thread.java:834)

        Caused by:
        java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41
            at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748)
            at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
            at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044)
            at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91)
            at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364)
            at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298)
            ... 41 more
        ....

  2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/l/simon/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
  2> NOTE: leaving temporary files on disk at: /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003
  2> NOTE: test params are: codec=Asserting(Lucene86): {text_payloads=BlockTreeOrds(blocksize=128), text_vectors=PostingsFormat(name=Asserting), text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)}, docValues:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting), dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting), dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696, maxMBSortInHeap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false): {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0), text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}), locale=zh-CN, timezone=SystemV/MST7MDT
  2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=241525696,total=268435456
  2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError]
{noformat}

The test also reproduces on master without the huge line docs file, using:

{noformat}
ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{noformat}

The reason is that we fail to delete the already renamed pending segments file 
when the metadata sync on the directory fails. The subsequent rollback also 
crashes while it is trying to delete unreferenced files. As a result, the commit 
was written but not fully removed, so subsequent CheckIndex calls fail with a 
FileNotFoundException (here for _0.si) when they try to read that commit.
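
To illustrate the failure mode, here is a minimal sketch using plain java.nio 
rather than Lucene's Directory API. The class CommitCleanupSketch, the method 
finishCommit and the MetadataSync hook are hypothetical names made up for this 
example; this is not the actual IndexWriter code or the eventual fix, only the 
general pattern of removing the renamed file when the directory sync fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitCleanupSketch {

  /** Hypothetical hook standing in for the directory metadata sync. */
  interface MetadataSync {
    void sync(Path dir) throws IOException;
  }

  /**
   * Renames pendingName to finalName, then syncs the directory metadata.
   * If the sync fails, the already renamed file is deleted (best effort)
   * so that no half-published commit point stays behind.
   */
  static void finishCommit(Path dir, String pendingName, String finalName, MetadataSync sync)
      throws IOException {
    Path pending = dir.resolve(pendingName);
    Path committed = dir.resolve(finalName);
    Files.move(pending, committed, StandardCopyOption.ATOMIC_MOVE);
    try {
      sync.sync(dir);
    } catch (Throwable t) {
      try {
        // Without this delete, the renamed file would survive the failed commit.
        Files.deleteIfExists(committed);
      } catch (Throwable cleanupFailure) {
        // Keep the original failure as the primary exception.
        t.addSuppressed(cleanupFailure);
      }
      throw t;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("commit-sketch");
    Files.write(dir.resolve("pending_segments_2"), new byte[] {1, 2, 3});
    try {
      // Simulate a metadata sync that fails after the rename has happened.
      finishCommit(dir, "pending_segments_2", "segments_2", d -> {
        throw new IOException("simulated metadata sync failure");
      });
    } catch (IOException e) {
      System.out.println("commit failed: " + e.getMessage());
    }
    System.out.println("segments_2 exists: " + Files.exists(dir.resolve("segments_2")));
  }
}
```

In this sketch the simulated sync failure still propagates to the caller, but 
the renamed segments_2 file should be gone afterwards, so a later reader would 
not pick up a commit whose referenced files are missing.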



