[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824295#comment-13824295 ]
Jessica Cheng commented on SOLR-4260:
-------------------------------------

{quote}
This shouldn't be the case, because those updates will only have been ack'd if each replica received them.
{quote}

That's what I thought too, but it doesn't seem to be the case in the code. If you take a look at DistributedUpdateProcessor.doFinish():

{quote}
// if its a forward, any fail is a problem -
// otherwise we assume things are fine if we got it locally
// until we start allowing min replication param
if (errors.size() > 0) {
  // if one node is a RetryNode, this was a forward request
  if (errors.get(0).req.node instanceof RetryNode) {
    rsp.setException(errors.get(0).e);
  }
  // else
  // for now we don't error - we assume if it was added locally, we
  // succeeded
}
{quote}

It then starts a thread to urge the replica to recover, but if that fails, it just completely gives up.

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 5.0
>
>         Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer, we see inconsistencies between the leader and replica for some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and replica deviate by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my attention: there were small IDF differences for exactly the same record, causing it to shift positions in the result set. During those tests no records were indexed.
> Consecutive catch-all queries also return a different numDocs.
> We're running a 10 node test cluster with 10 shards and a replication factor of two, and we frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
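To make the behavior under discussion concrete, here is a minimal standalone sketch of the decision the quoted doFinish() code makes. This is not Solr's actual code: AckPolicy, Node, RetryNode, and Error are hypothetical stand-ins for Solr's SolrCmdDistributor node and error types, reduced to just the branch in question. The point it illustrates is that a failure on a forwarded request (RetryNode) is surfaced to the client, while a failure on a replica is swallowed when the leader indexed the document locally, so the update is ack'd anyway.

```java
import java.util.List;

class AckPolicy {
    // Hypothetical stand-in for a node an update was distributed to.
    static class Node {}

    // Marker for "this request was forwarded to another node" (stands in
    // for Solr's SolrCmdDistributor.RetryNode).
    static class RetryNode extends Node {}

    // A distribution failure: which node failed, and with what exception.
    static class Error {
        final Node node;
        final Exception e;
        Error(Node node, Exception e) { this.node = node; this.e = e; }
    }

    /**
     * Mirrors the quoted doFinish() branch: returns the exception to report
     * to the client, or null if the update is ack'd despite replica
     * failures (because the leader succeeded locally). In the real code the
     * null/"succeeded" path is where a background thread asks the failed
     * replica to recover.
     */
    static Exception decide(List<Error> errors) {
        if (!errors.isEmpty()) {
            if (errors.get(0).node instanceof RetryNode) {
                // Forwarded request failed: any fail is a problem.
                return errors.get(0).e;
            }
            // Replica failure only: swallowed; update is still ack'd.
        }
        return null;
    }
}
```

Under this reading, an update can be ack'd to the client while a replica silently missed it, which is consistent with the small leader/replica numDocs deviations reported above.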