[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142215#comment-15142215
 ] 

Yonik Seeley commented on SOLR-8586:
------------------------------------

bq. Yep, I've been looping a custom version of the HDFS-nothing-safe test that, 
among other things, only does adds, no deletes.

Update: when I reverted my custom changes to the chaos test (so that it also 
did deletes), I got a high number of shard-out-of-sync errors, seemingly even 
more than before, so I've been trying to track those down.  What I saw did not 
look related to PeerSync: documents were missing from a shard that had 
replicated from the leader while buffering documents, yet I saw those same 
documents come in and get buffered, which points to transaction log 
buffering or replay issues.

Then I realized that I had tested "adds only" before committing, and tested the 
normal test after committing and doing a "git pull".  Between those two runs 
came SOLR-8575, which was a fix to the HDFS tlog!  I've been looping the test 
for a number of hours with the SOLR-8575 changes reverted, and I haven't seen a 
shards-out-of-sync failure so far.  I've also done a quick review of SOLR-8575, 
but didn't see anything obviously incorrect.

I've also been running the non-HDFS version of the test for over a day, and 
have seen no inconsistent-shard failures there either.

> Implement hash over all documents to check for shard synchronization
> --------------------------------------------------------------------
>
>                 Key: SOLR-8586
>                 URL: https://issues.apache.org/jira/browse/SOLR-8586
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>             Fix For: 5.5, master
>
>         Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch, 
> SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)
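
To make the "order-independent hash" idea in the description concrete, here is a minimal sketch. It assumes each document's version is available as a Java long; the class and method names are hypothetical and are not taken from the attached patches. Each version is scrambled with the 64-bit MurmurHash3 finalizer and the results are summed, so the total is the same no matter what order the versions are visited in.

```java
// Hypothetical sketch; names are illustrative and not taken from the SOLR-8586 patch.
final class VersionHashSketch {

    // 64-bit finalizer from MurmurHash3: scrambles the bits of one version
    // so that nearby version numbers contribute unrelated values.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Addition is commutative and associative (overflow simply wraps),
    // so the result does not depend on iteration order.
    static long hashVersions(long[] versions) {
        long hash = 0;
        for (long v : versions) {
            hash += mix(v);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Made-up version numbers, used only for illustration.
        long[] leader  = {1525000000000000001L, 1525000000000000002L, 1525000000000000003L};
        long[] replica = {1525000000000000003L, 1525000000000000001L, 1525000000000000002L};
        long[] stale   = {1525000000000000001L, 1525000000000000002L};

        System.out.println("same versions, different order, hashes equal: "
                + (hashVersions(leader) == hashVersions(replica)));
        System.out.println("one version missing, hashes equal: "
                + (hashVersions(leader) == hashVersions(stale)));
    }
}
```

Summing rather than XOR-ing is a design choice in this sketch: with XOR, a version that appeared twice would cancel itself out. Where and when to run such a check is the open question raised in the description and is not addressed here.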



