[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841870#comment-13841870
 ] 

Yago Riveiro commented on SOLR-4260:
------------------------------------

Replicas are still losing docs in Solr 4.6 :(.

I'm wondering if we can't have a pair (version, numDocs) to track the 
increments of docs between versions. Also we can save the last 10 tlogs in each 
replica as backups after be commited and make a diff to see what is missing in 
case the replicas are out of sync, replay the transaction and avoid a not 
synchronized replica and a full-recovery that probably will be heaviest that 
make the diff.

It's only and idea and of course find the bug must be the priority.

This issue compromisse Solr to be "the main" storage. If re-index data is not 
possible, we can't guarantee that no data is missing,  and worse, we lost the 
data forever :(.


> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer we see inconsistencies between the leader and replica for 
> some shards.
> Each core hold about 3.3k documents. For some reason 5 out of 10 shards have 
> a small deviation in then number of documents. The leader and slave deviate 
> for roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my 
> attention, there were small IDF differences for exactly the same record 
> causing a record to shift positions in the result set. During those tests no 
> records were indexed. Consecutive catch all queries also return different 
> number of numDocs.
> We're running a 10 node test cluster with 10 shards and a replication factor 
> of two and frequently reindex using a fresh build from trunk. I've not seen 
> this issue for quite some time until a few days ago.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to