[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856052#comment-13856052 ]
Timothy Potter commented on SOLR-4260:
--------------------------------------

Thanks Mark, I suspected my test case was a little cherry-picked ... something interesting happened when I also severed the connection between the replica and ZK (i.e., the same test as above, but with the ZK connection on the replica dropped as well).

2013-12-23 15:39:57,170 [main-EventThread] INFO common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@4f857c62 name:ZooKeeperConnection Watcher:ec2-54-197-0-103.compute-1.amazonaws.com:2181 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
2013-12-23 15:39:57,170 [main-EventThread] INFO common.cloud.ConnectionManager - zkClient has disconnected

>>> fixed the connection between replica and ZK here <<<

2013-12-23 15:40:45,579 [main-EventThread] INFO common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@4f857c62 name:ZooKeeperConnection Watcher:ec2-54-197-0-103.compute-1.amazonaws.com:2181 got event WatchedEvent state:Expired type:None path:null path:null type:None
2013-12-23 15:40:45,579 [main-EventThread] INFO common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
2013-12-23 15:40:45,580 [main-EventThread] INFO common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...
2013-12-23 15:40:45,586 [main-EventThread] INFO common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
2013-12-23 15:40:45,595 [main-EventThread] INFO common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@4f857c62 name:ZooKeeperConnection Watcher:ec2-54-197-0-103.compute-1.amazonaws.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
2013-12-23 15:40:45,595 [main-EventThread] INFO common.cloud.ConnectionManager - Client is connected to ZooKeeper
2013-12-23 15:40:45,595 [main-EventThread] INFO common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.
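Side note on what those watcher events mean: a Disconnected event is transient and the session can survive it, but once the session timeout elapses server-side the next event is Expired and the client must build a brand-new ZooKeeper handle. That matches the sequence above - the network came back, but by then the session had already expired. A minimal standalone sketch of that handling against the plain ZooKeeper client API (not Solr's actual ConnectionManager code; the host string is hypothetical):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch: how a ZooKeeper client sees the events shown in the log above.
public class SessionWatcherSketch implements Watcher {

  private volatile ZooKeeper zk;
  private final String zkHost;          // hypothetical, e.g. "zkhost:2181"
  private final int sessionTimeoutMs;

  public SessionWatcherSketch(String zkHost, int sessionTimeoutMs) throws Exception {
    this.zkHost = zkHost;
    this.sessionTimeoutMs = sessionTimeoutMs;
    connect();
  }

  private void connect() throws Exception {
    zk = new ZooKeeper(zkHost, sessionTimeoutMs, this);
  }

  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
        // "Client is connected to ZooKeeper" - resume normal work.
        break;
      case Disconnected:
        // "zkClient has disconnected" - transient; the same session can
        // still recover if the network returns within the session timeout.
        break;
      case Expired:
        // "Our previous ZooKeeper session was expired" - ephemeral nodes
        // and watches are gone; a brand-new handle must be created and all
        // session-scoped state re-registered.
        try {
          connect();
        } catch (Exception e) {
          throw new RuntimeException("Reconnect after session expiry failed", e);
        }
        break;
      default:
        break;
    }
  }
}

Back to the log: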
2013-12-23 15:40:45,596 [main-EventThread] WARN solr.cloud.RecoveryStrategy - Stopping recovery for zkNodeName=core_node3core=cloud_shard1_replica3
2013-12-23 15:40:45,597 [main-EventThread] INFO solr.cloud.ZkController - publishing core=cloud_shard1_replica3 state=down
2013-12-23 15:40:45,597 [main-EventThread] INFO solr.cloud.ZkController - numShards not found on descriptor - reading it from system property
2013-12-23 15:40:45,905 [qtp2124890785-14] INFO handler.admin.CoreAdminHandler - It has been requested that we recover
2013-12-23 15:40:45,906 [qtp2124890785-14] INFO solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/cores params={action=REQUESTRECOVERY&core=cloud_shard1_replica3&wt=javabin&version=2} status=0 QTime=2
2013-12-23 15:40:45,909 [Thread-17] INFO solr.cloud.ZkController - publishing core=cloud_shard1_replica3 state=recovering
2013-12-23 15:40:45,909 [Thread-17] INFO solr.cloud.ZkController - numShards not found on descriptor - reading it from system property
2013-12-23 15:40:45,920 [Thread-17] INFO solr.update.DefaultSolrCoreState - Running recovery - first canceling any ongoing recovery
2013-12-23 15:40:45,921 [RecoveryThread] INFO solr.cloud.RecoveryStrategy - Starting recovery process. core=cloud_shard1_replica3 recoveringAfterStartup=false
2013-12-23 15:40:45,924 [RecoveryThread] INFO solr.cloud.ZkController - publishing core=cloud_shard1_replica3 state=recovering
2013-12-23 15:40:45,924 [RecoveryThread] INFO solr.cloud.ZkController - numShards not found on descriptor - reading it from system property
2013-12-23 15:40:48,613 [qtp2124890785-15] INFO solr.core.SolrCore - [cloud_shard1_replica3] webapp=/solr path=/select params={q=foo_s:bar&distrib=false&wt=json&rows=0} hits=0 status=0 QTime=1
2013-12-23 15:42:42,770 [qtp2124890785-13] INFO solr.core.SolrCore - [cloud_shard1_replica3] webapp=/solr path=/select params={q=foo_s:bar&distrib=false&wt=json&rows=0} hits=0 status=0 QTime=1
2013-12-23 15:42:45,650 [main-EventThread] ERROR solr.cloud.ZkController - There was a problem making a request to the leader:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state down for cloud86:8986_solr but I still do not see the requested state. I see state: recovering live:false
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1434)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:347)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:85)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:225)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:118)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:56)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:93)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-23 15:42:45,963 [RecoveryThread] ERROR solr.cloud.RecoveryStrategy - Error while trying to recover. core=cloud_shard1_replica3:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was asked to wait on state recovering for cloud86:8986_solr but I still do not see the requested state. I see state: recovering live:false
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:224)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
2013-12-23 15:42:45,964 [RecoveryThread] ERROR solr.cloud.RecoveryStrategy - Recovery failed - trying again... (0) core=cloud_shard1_replica3
2013-12-23 15:42:45,964 [RecoveryThread] INFO solr.cloud.RecoveryStrategy - Wait 2.0 seconds before trying to recover again (1)
2013-12-23 15:42:47,964 [RecoveryThread] INFO solr.cloud.ZkController - publishing core=cloud_shard1_replica3 state=recovering
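For reference, the consistency check in the test above is just a non-distributed count against each core (the /select?distrib=false&rows=0 queries in the log). A minimal SolrJ sketch of that check - the core URLs below are hypothetical; point them at the leader and the replica of the same shard:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: ask each replica of a shard for its local document count and
// compare the numFound values; distrib=false keeps the query on that core.
public class ReplicaCountCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URLs - one per replica of the same shard.
    List<String> coreUrls = Arrays.asList(
        "http://cloud85:8985/solr/cloud_shard1_replica1",
        "http://cloud86:8986/solr/cloud_shard1_replica3");

    SolrQuery q = new SolrQuery("foo_s:bar");
    q.setRows(0);               // only the count matters
    q.set("distrib", "false");  // do not fan out to other shards/replicas

    for (String url : coreUrls) {
      HttpSolrServer server = new HttpSolrServer(url);
      try {
        QueryResponse rsp = server.query(q);
        System.out.println(url + " numFound=" + rsp.getResults().getNumFound());
      } finally {
        server.shutdown();
      }
    }
  }
}

If the counts diverge while no indexing is happening, the replica is out of sync with its leader.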
> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer, we see inconsistencies between the leader and replica for some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents; the leader and replica deviate by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my attention: there were small IDF differences for exactly the same record, causing it to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numDocs.
> We're running a 10-node test cluster with 10 shards and a replication factor of two, and frequently reindex using a fresh build from trunk. I had not seen this issue for quite some time until a few days ago.
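For context on the indexing path the report describes, a minimal CloudSolrServer sketch - the ZK hosts, collection name, and field values here are assumptions, not taken from the report:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: index through CloudSolrServer, which reads cluster state from
// ZooKeeper and distributes updates across the SolrCloud cluster.
public class CloudIndexSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZK ensemble and collection name.
    CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181,zkhost3:2181");
    server.setDefaultCollection("cloud");
    try {
      for (int i = 0; i < 1000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("foo_s", "bar");
        server.add(doc);
      }
      server.commit();
    } finally {
      server.shutdown();
    }
  }
}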