[ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-3180:
-------------------------------
Attachment: fail.inconsistent.txt
Uploading fail.inconsistent.txt. I had to truncate the start of the log file
to get it under JIRA's attachment size limit.
Analysis:
{code}
2> ASYNC NEW_CORE C6 name=collection1 org.apache.solr.core.SolrCore@eaecb09 url=http://127.0.0.1:58270/collection1 node=127.0.0.1:58270_ C6_STATE=coll:control_collection core:collection1 props:{shard=shard1, roles=null, state=active, core=collection1, collection=control_collection, node_name=127.0.0.1:58270_, base_url=http://127.0.0.1:58270, leader=true}
2> ASYNC NEW_CORE C5 name=collection1 org.apache.solr.core.SolrCore@54eeabe8 url=http://127.0.0.1:37198/collection1 node=127.0.0.1:37198_ C5_STATE=coll:collection1 core:collection1 props:{shard=shard3, roles=null, state=active, core=collection1, collection=collection1, node_name=127.0.0.1:37198_, base_url=http://127.0.0.1:37198, leader=true}
2> 25510 T80 C5 P37198 REQ /get {distrib=false&qt=/get&wt=javabin&version=2&getVersions=100} status=0 QTime=0
2> 187637 T669 C21 P39620 oasu.PeerSync.sync PeerSync: core=collection1 url=http://127.0.0.1:39620 DONE. sync succeeded
2> 188653 T669 C21 P39620 oasc.RecoveryStrategy.doRecovery PeerSync Recovery was successful - registering as Active. core=collection1
#
# C21 (the replica) is recovering around the same time that the update for
# id:52720 comes in (we only see when the update finishes below, not when it
# starts).
#
# update control finished
2> 187923 T24 C6 P58270 /update {wt=javabin&version=2} {add=[52720 (1422073056556220416)]} 0 1
# update leader for shard3 finished
2> 187927 T77 C5 P37198 /update {wt=javabin&version=2} {add=[52720 (1422073056559366144)]} 0 1
# these are the only adds for id:52720 in the logs...
# TODO: verify that there was no replica for C5 to forward to?
--------------------
2> 225993 T77 C5 P37198 REQ /select {tests=checkShardConsistency&q=*:*&distrib=false&wt=javabin&rows=0&version=2} hits=835 status=0 QTime=1
# Note that C5 is still the leader - this means that C21 recovered from it at
# some point?
2> 225997 T658 C21 P39620 REQ /select {tests=checkShardConsistency&q=*:*&distrib=false&wt=javabin&rows=0&version=2} hits=833 status=0 QTime=1
2> live:true
2> num:833
2>
2> ######shard3 is not consistent. Got 835 from http://127.0.0.1:37198/collection1lastClient and got 833 from http://127.0.0.1:39620/collection1
2> ###### sizes=835,833
2> ###### Only in http://127.0.0.1:37198/collection1: [{id=52720, _version_=1422073056559366144}, {id=52710, _version_=1422073056325533696}, {id=52717, _version_=1422073056485965825}, {id=2225, _version_=1422073056602357760}, {id=52709, _version_=1422073056298270720}, {id=2226, _version_=1422073056612843520}, {id=2219, _version_=1422073056477577216}, {id=52723, _version_=1422073056605503488}]
2> ###### Only in http://127.0.0.1:39620/collection1: [{id=52680, _version_=1422073042480136192}, {id=52669, _version_=1422073042276712448}, {id=52676, _version_=1422073042420367360}, {id=2204, _version_=1422073042912149504}, {id=2198, _version_=1422073042778980352}, {id=2207, _version_=1422073053454532608}]
{code}
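The "Only in" lines above list documents that one core returned and the other did not. A minimal sketch of that kind of comparison, assuming each core's results have been collected into a map from id to _version_ and that documents are matched by id alone (ShardDiff and onlyIn are hypothetical names, not the test's actual checkShardConsistency code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; NOT Solr's checkShardConsistency implementation. */
public final class ShardDiff {

    /** Entries of {@code a} whose id does not appear in {@code b} (assumption: match by id only). */
    public static Map<String, Long> onlyIn(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> result = new TreeMap<>(); // sorted by id for stable output
        for (Map.Entry<String, Long> e : a.entrySet()) {
            if (!b.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few (id, _version_) pairs taken from the log above.
        Map<String, Long> leader = new HashMap<>();
        leader.put("52720", 1422073056559366144L);
        leader.put("52710", 1422073056325533696L);
        Map<String, Long> replica = new HashMap<>();
        replica.put("52680", 1422073042480136192L);
        // Hypothetical document present on both cores, so it shows up in neither list.
        leader.put("1", 1L);
        replica.put("1", 1L);

        System.out.println("Only in leader: " + onlyIn(leader, replica));
        System.out.println("Only in replica: " + onlyIn(replica, leader));
    }
}
```

Note that, as in the log, the difference runs in both directions: each core holds documents the other lacks, which a simple hit-count comparison (835 vs. 833) would not reveal.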
So it looks like we (C21) came up and successfully peersynced with the leader
(C5), but the leader hadn't noticed us yet, so it didn't forward us new updates
(e.g. id:52720) that arrived in the meantime.
If we don't already, we need to do the same kind of thing we do for replication
recovery: the replica needs to ensure that the leader sees it (and hence will
forward future updates) before it peersyncs with the leader.
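To make the ordering concrete, here is a toy, single-threaded model of the race. All class and method names are hypothetical; this is not Solr's RecoveryStrategy or PeerSync code. The leader forwards an add only to replicas it has registered, and "peer sync" is simplified to copying the leader's current documents. In the naive order the replica syncs, then the update arrives, then the leader notices the replica, so the replica never gets id:52720. In the fixed order the replica starts buffering and becomes visible to the leader first, so an update arriving during or after the sync is forwarded into the buffer and replayed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the recovery race; all names are hypothetical, not Solr APIs. */
public final class RecoveryRace {

    /** A replica that can buffer forwarded updates while it recovers. */
    public static final class Replica {
        public final Map<String, Long> docs = new HashMap<>(); // id -> _version_
        private final List<Map.Entry<String, Long>> buffer = new ArrayList<>();
        private boolean buffering = false;

        void startBuffering() { buffering = true; }

        /** Called by the leader when it forwards an update. */
        void receive(String id, long version) {
            if (buffering) {
                buffer.add(Map.entry(id, version));
            } else {
                docs.merge(id, version, Long::max); // keep the higher _version_
            }
        }

        /** Simplified "peer sync": take whatever the leader has right now. */
        void peerSync(Leader leader) {
            leader.docs.forEach((id, v) -> docs.merge(id, v, Long::max));
        }

        /** Apply buffered updates, then stop buffering. */
        void replayBuffer() {
            for (Map.Entry<String, Long> e : buffer) {
                docs.merge(e.getKey(), e.getValue(), Long::max);
            }
            buffer.clear();
            buffering = false;
        }
    }

    /** A leader that only forwards updates to replicas it knows about. */
    public static final class Leader {
        final Map<String, Long> docs = new HashMap<>();
        final List<Replica> replicas = new ArrayList<>();

        void register(Replica r) { replicas.add(r); }

        void add(String id, long version) {
            docs.put(id, version);
            for (Replica r : replicas) {
                r.receive(id, version);
            }
        }
    }

    /** Naive order: sync first, and only afterwards does the leader see the replica. */
    public static Replica naiveRecovery() {
        Leader leader = new Leader();
        Replica replica = new Replica();
        replica.peerSync(leader);
        leader.add("52720", 1422073056559366144L); // leader doesn't know the replica yet
        leader.register(replica);
        return replica;
    }

    /** Fixed order: buffer and become visible to the leader before syncing. */
    public static Replica fixedRecovery() {
        Leader leader = new Leader();
        Replica replica = new Replica();
        replica.startBuffering();
        leader.register(replica);
        replica.peerSync(leader);
        leader.add("52720", 1422073056559366144L); // forwarded into the buffer
        replica.replayBuffer();
        return replica;
    }

    public static void main(String[] args) {
        System.out.println("naive has 52720: " + naiveRecovery().docs.containsKey("52720"));
        System.out.println("fixed has 52720: " + fixedRecovery().docs.containsKey("52720"));
    }
}
```

Running main should print false for the naive order and true for the fixed order. Buffering is an assumed design choice of the sketch: updates forwarded while the sync is in progress are held and replayed afterwards (keeping the higher _version_) instead of racing with the sync itself.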
> ChaosMonkey test failures
> -------------------------
>
> Key: SOLR-3180
> URL: https://issues.apache.org/jira/browse/SOLR-3180
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Yonik Seeley
> Attachments: fail.inconsistent.txt, test_report_1.txt
>
>
> Handle intermittent failures in the ChaosMonkey tests.