[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249708#comment-14249708 ]
Shalin Shekhar Mangar commented on SOLR-6640:
---------------------------------------------

I am looking at this failure too and I see another bug. I was wondering why the replica had these writes in the first place, considering that recovery on startup wasn't complete yet.

# RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode, which is why the leader sends updates to this replica that affect the index.
# The test itself doesn't wait for a steady state, e.g. by calling waitForRecovery or waitForThingsToLevelOut, before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem.
# Shouldn't the peersync also be done while the update log is set to buffering mode?

{quote}
So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx

I am yet to figure out whether these files should have been removed by IW.rollback() or not?
{quote}

These files hang around because an IndexReader that was opened from the IndexWriter, due to soft commit(s), is still open.

> ChaosMonkeySafeLeaderTest failure with CorruptIndexException
> ------------------------------------------------------------
>
>                 Key: SOLR-6640
>                 URL: https://issues.apache.org/jira/browse/SOLR-6640
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 5.0
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 5.0
>
>         Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch
>
>
> Test failure found on jenkins:
> http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/
> {code}
> 1 tests failed.
> REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch
> Error Message:
> shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1
> Stack Trace:
> java.lang.AssertionError: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1
>         at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0)
>         at org.junit.Assert.fail(Assert.java:93)
>         at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255)
>         at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234)
>         at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162)
>         at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869)
> {code}
> Cause of inconsistency is:
> {code}
> Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv")))
>    [junit4]   2>        at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259)
>    [junit4]   2>        at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88)
>    [junit4]   2>        at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64)
>    [junit4]   2>        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:102)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
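
As a rough illustration of the ordering problem raised in point 1 of the comment, here is a toy model in plain Java. It is not Solr code: the `Replica` class, the `LogMode` enum, and the `leaderForward` helper are all hypothetical names invented for this sketch, and it assumes only what the comment states, namely that the leader starts forwarding updates once the replica has published 'recovering', and that updates are buffered rather than applied only while the update log is in buffering mode.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the publish/buffer ordering; every name here is hypothetical.
public class RecoveryOrderingSketch {
    enum LogMode { ACTIVE, BUFFERING }

    static class Replica {
        boolean publishedRecovering = false;   // state visible to the leader
        LogMode logMode = LogMode.ACTIVE;      // update log mode
        final List<String> index = new ArrayList<>();     // updates applied to the index
        final List<String> buffered = new ArrayList<>();  // updates held in the log

        void receive(String doc) {
            if (logMode == LogMode.BUFFERING) {
                buffered.add(doc);
            } else {
                index.add(doc);
            }
        }
    }

    // Assumption: the leader only forwards to replicas that have published a state.
    static void leaderForward(Replica r, String doc) {
        if (r.publishedRecovering) {
            r.receive(doc);
        }
    }

    // Ordering described in the comment: publish first, switch to buffering later.
    static Replica publishThenBuffer() {
        Replica r = new Replica();
        r.publishedRecovering = true;
        leaderForward(r, "doc1");              // arrives inside the window
        r.logMode = LogMode.BUFFERING;
        leaderForward(r, "doc2");
        return r;
    }

    // Alternative ordering: switch to buffering before publishing.
    static Replica bufferThenPublish() {
        Replica r = new Replica();
        r.logMode = LogMode.BUFFERING;
        r.publishedRecovering = true;
        leaderForward(r, "doc1");
        leaderForward(r, "doc2");
        return r;
    }

    public static void main(String[] args) {
        Replica racy = publishThenBuffer();
        Replica safe = bufferThenPublish();
        System.out.println("publish-then-buffer: index=" + racy.index + " buffered=" + racy.buffered);
        System.out.println("buffer-then-publish: index=" + safe.index + " buffered=" + safe.buffered);
    }
}
```

In this model the first ordering lets one update reach the index before buffering starts, while the second ordering buffers every forwarded update. Whether reordering the two steps is the right fix in RecoveryStrategy itself is a separate question that the sketch does not answer.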