[ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-3180:
-------------------------------

    Attachment: fail.inconsistent.txt

Uploading fail.inconsistent.txt - I had to truncate the start of the log file to get it under the limit for JIRA.

Analysis:
{code}
2> ASYNC NEW_CORE C6 name=collection1 org.apache.solr.core.SolrCore@eaecb09 url=http://127.0.0.1:58270/collection1 node=127.0.0.1:58270_ C6_STATE=coll:control_collection core:collection1 props:{shard=shard1, roles=null, state=active, core=collection1, collection=control_collection, node_name=127.0.0.1:58270_, base_url=http://127.0.0.1:58270, leader=true}
2> ASYNC NEW_CORE C5 name=collection1 org.apache.solr.core.SolrCore@54eeabe8 url=http://127.0.0.1:37198/collection1 node=127.0.0.1:37198_ C5_STATE=coll:collection1 core:collection1 props:{shard=shard3, roles=null, state=active, core=collection1, collection=collection1, node_name=127.0.0.1:37198_, base_url=http://127.0.0.1:37198, leader=true}
2> 25510 T80 C5 P37198 REQ /get {distrib=false&qt=/get&wt=javabin&version=2&getVersions=100} status=0 QTime=0
2> 187637 T669 C21 P39620 oasu.PeerSync.sync PeerSync: core=collection1 url=http://127.0.0.1:39620 DONE. sync succeeded
2> 188653 T669 C21 P39620 oasc.RecoveryStrategy.doRecovery PeerSync Recovery was successful - registering as Active. core=collection1
#
# C21 (the replica) is recovering around the same time that the update for id:52720 comes in (we only
# see when the update finishes below, not when it starts).
#
# update control finished
2> 187923 T24 C6 P58270 /update {wt=javabin&version=2} {add=[52720 (1422073056556220416)]} 0 1
# update leader for shard3 finished
2> 187927 T77 C5 P37198 /update {wt=javabin&version=2} {add=[52720 (1422073056559366144)]} 0 1
# these are the only adds for id:52720 in the logs...
# TODO: verify that there was no replica for C5 to forward to?
--------------------
2> 225993 T77 C5 P37198 REQ /select {tests=checkShardConsistency&q=*:*&distrib=false&wt=javabin&rows=0&version=2} hits=835 status=0 QTime=1
# Note that C5 is still the leader - this means that C21 recovered from it at some point?
2> 225997 T658 C21 P39620 REQ /select {tests=checkShardConsistency&q=*:*&distrib=false&wt=javabin&rows=0&version=2} hits=833 status=0 QTime=1
2> live:true
2> num:833
2>
2> ######shard3 is not consistent. Got 835 from http://127.0.0.1:37198/collection1lastClient and got 833 from http://127.0.0.1:39620/collection1
2> ###### sizes=835,833
2> ###### Only in http://127.0.0.1:37198/collection1: [{id=52720, _version_=1422073056559366144}, {id=52710, _version_=1422073056325533696}, {id=52717, _version_=1422073056485965825}, {id=2225, _version_=1422073056602357760}, {id=52709, _version_=1422073056298270720}, {id=2226, _version_=1422073056612843520}, {id=2219, _version_=1422073056477577216}, {id=52723, _version_=1422073056605503488}]
2> ###### Only in http://127.0.0.1:39620/collection1: [{id=52680, _version_=1422073042480136192}, {id=52669, _version_=1422073042276712448}, {id=52676, _version_=1422073042420367360}, {id=2204, _version_=1422073042912149504}, {id=2198, _version_=1422073042778980352}, {id=2207, _version_=1422073053454532608}]
{code}

So what looks to be the case is that the replica (C21) came up and did a peersync with the leader (C5) and succeeded, but the leader hadn't noticed the replica yet and so didn't forward it a new update that arrived in the meantime.

If we don't already, we need to do some of the same things that we do with replication recovery: the replica needs to ensure that the leader sees it (and hence will forward future updates) before it peersyncs with the leader.
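The ordering problem described above can be illustrated with a toy model. This is only a sketch and not Solr code: `Leader`, `Replica`, `register`, `peerSync` and `missingAfterRecovery` are hypothetical names made up for the example, and it assumes the leader forwards each update only to the replicas it knows about at that moment. One update arrives between the two recovery steps, and the method reports which versions the replica ends up missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the recovery ordering; every name here is hypothetical, not a Solr API. */
class RecoveryOrderingSketch {

    /** A leader that forwards each update only to replicas it currently knows about. */
    static class Leader {
        final List<Long> log = new ArrayList<>();              // versions applied on the leader
        final Set<Replica> knownReplicas = new LinkedHashSet<>();

        void register(Replica r) {
            knownReplicas.add(r);
        }

        void add(long version) {
            log.add(version);
            for (Replica r : knownReplicas) {
                r.apply(version);                              // forward to visible replicas only
            }
        }
    }

    static class Replica {
        final Set<Long> versions = new LinkedHashSet<>();

        void apply(long version) {
            versions.add(version);
        }

        /** Stand-in for a peer sync: copy whatever the leader has at this instant. */
        void peerSync(Leader leader) {
            versions.addAll(leader.log);
        }
    }

    /**
     * Runs one recovery with an update arriving between the two recovery steps
     * and returns the versions the replica is missing afterwards.
     */
    static List<Long> missingAfterRecovery(boolean registerFirst) {
        Leader leader = new Leader();
        Replica replica = new Replica();
        leader.add(1L);                                        // update from before the replica came up

        if (registerFirst) {
            leader.register(replica);                          // leader sees the replica first
            leader.add(2L);                                    // concurrent update is forwarded
            replica.peerSync(leader);
        } else {
            replica.peerSync(leader);                          // sync succeeds, replica has version 1
            leader.add(2L);                                    // leader has not noticed the replica: not forwarded
            leader.register(replica);
        }

        List<Long> missing = new ArrayList<>(leader.log);
        missing.removeAll(replica.versions);
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("sync before leader sees replica, missing: " + missingAfterRecovery(false));
        System.out.println("leader sees replica before sync, missing: " + missingAfterRecovery(true));
    }
}
```

In this model, syncing before the leader knows about the replica loses version 2, while making the replica visible first loses nothing, because the update is either forwarded or picked up by the later sync.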
> ChaosMonkey test failures
> -------------------------
>
> Key: SOLR-3180
> URL: https://issues.apache.org/jira/browse/SOLR-3180
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Yonik Seeley
> Attachments: fail.inconsistent.txt, test_report_1.txt
>
>
> Handle intermittent failures in the ChaosMonkey tests.