[
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142215#comment-15142215
]
Yonik Seeley commented on SOLR-8586:
------------------------------------
bq. Yep, I've been looping a custom version of the HDFS-nothing-safe test that
among other things, only does adds, no deletes.
Update: when I reverted my custom changes to the chaos test (so that it also
did deletes), I got a high number of shard-out-of-sync errors... seemingly even
more than before, so I've been trying to track those down. What I saw did not
look related to PeerSync... documents were missing from a shard that had
replicated from the leader while buffering documents, yet I saw those missing
documents come in and get buffered, which points to transaction log buffering
or replay issues.
Then I realized that I had tested "adds only" before committing, and ran the
normal test after committing and doing a "git pull". In between those times,
SOLR-8575 went in, which was a fix to the HDFS tlog! I've been looping the test
for a number of hours with those changes reverted, and I haven't seen a
shards-out-of-sync failure so far. I've also done a quick review of SOLR-8575,
but didn't see anything obviously incorrect.
I've also been running the non-HDFS version of the test for over a day, and
have seen no inconsistent-shard failures there either.
> Implement hash over all documents to check for shard synchronization
> --------------------------------------------------------------------
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Yonik Seeley
> Fix For: 5.5, master
>
> Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch,
> SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should
> suffice. The hash itself is pretty easy, but we need to figure out
> when/where to do this check (for example, I think PeerSync is currently used
> in multiple contexts and this check would perhaps not be appropriate for all
> PeerSync calls?)
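
For illustration, here is a minimal sketch of what an order-independent hash
over document versions could look like. It is not taken from the attached
patches: the class name VersionHash and the methods mix and hash are
hypothetical, and the use of the MurmurHash3 64-bit finalizer as the
per-version mixer is an assumption. Each version is mixed individually and the
results are summed, so the order in which versions are visited does not matter.

```java
// Hypothetical sketch, not the SOLR-8586 patch: an order-independent hash
// over the set of document versions held by a shard.
final class VersionHash {

    // MurmurHash3 64-bit finalizer, used here (as an assumption) to scramble
    // each version so that nearby version numbers do not cancel out easily.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Sum of mixed versions. Wrapping long addition is commutative and
    // associative, so any iteration order yields the same result.
    static long hash(long[] versions) {
        long h = 0;
        for (long v : versions) {
            h += mix(v);
        }
        return h;
    }

    public static void main(String[] args) {
        long[] leader = {1001L, 1002L, 1003L};
        long[] replicaInSync = {1003L, 1001L, 1002L};   // same versions, different order
        long[] replicaMissingDoc = {1001L, 1003L};      // version 1002 never arrived

        System.out.println(hash(leader) == hash(replicaInSync));     // true
        System.out.println(hash(leader) == hash(replicaMissingDoc)); // false
    }
}
```

Two shards holding the same set of versions produce the same value, while a
single missing version changes it. A real check would probably also compare
something like the document count, since a bare sum can in principle collide.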