[ https://issues.apache.org/jira/browse/HDDS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195579#comment-17195579 ]
Rui Wang commented on HDDS-4237:
--------------------------------

Other potential ideas:
https://github.com/apache/hadoop-ozone/tree/master/hadoop-ozone/fault-injection-test/network-tests/src/test
https://chaos-mesh.org/
https://jepsen.io/

> Testing Infrastructure for network partitioning
> -----------------------------------------------
>
>                 Key: HDDS-4237
>                 URL: https://issues.apache.org/jira/browse/HDDS-4237
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Priority: Major
>
> Network partitioning can cause a split-brain case in which two leaders
> exist at the same time. We need some sort of testing infrastructure/framework
> to simulate such a case and verify whether our SCM HA implementation can
> achieve strong consistency under a partitioned network.
> There might be two ways, suggested by Mukul Kumar Singh:
> a) Blockade tests: Blockade is a Docker-based framework where the
> network for one DN can be isolated from the others.
> b) MiniOzoneChaosCluster: this is a unit-test-based test where a
> random datanode is killed; this helped in finding issues with
> consistency.
> We might need a similar solution for SCM: block the SCM leader's network and
> also increase the timeout so that the old leader does not turn into a
> candidate.
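
To make the scenario in the description concrete, here is a minimal sketch of the check such a test could make. It is a toy in-memory model in Python, not Ozone, Ratis, or Blockade code. Every name in it (`Node`, `Cluster`, `partition`, `elect`, `write`, `run_scenario`) is hypothetical, and it assumes a Raft-style rule that a leader may only commit with a majority and must step down when it sees a higher term. The old leader is isolated and keeps believing it is the leader, which mirrors the "increase timeout" idea above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    term: int = 0
    is_leader: bool = False
    log: list = field(default_factory=list)


class Cluster:
    """Toy model (hypothetical): 'links' are just a set of blocked node pairs."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()

    def partition(self, group_a, group_b):
        # Cut every link between the two groups.
        for a in group_a:
            for b in group_b:
                self.blocked.add(frozenset((a, b)))

    def heal(self):
        self.blocked.clear()

    def reachable(self, src):
        # A node always reaches itself, plus every peer whose link is not cut.
        return [n for n in self.nodes
                if n == src or frozenset((src, n)) not in self.blocked]

    def has_quorum(self, src):
        return len(self.reachable(src)) > len(self.nodes) // 2

    def elect(self, candidate):
        # Assumed rule: an election only succeeds with a majority reachable.
        if not self.has_quorum(candidate):
            return False
        peers = self.reachable(candidate)
        new_term = max(self.nodes[p].term for p in peers) + 1
        for p in peers:
            self.nodes[p].term = new_term
            self.nodes[p].is_leader = (p == candidate)
        return True

    def write(self, leader, value):
        node = self.nodes[leader]
        peers = self.reachable(leader)
        # A stale leader that sees a higher term steps down.
        if any(self.nodes[p].term > node.term for p in peers):
            node.is_leader = False
        if not node.is_leader or not self.has_quorum(leader):
            return False
        for p in peers:
            self.nodes[p].log.append((node.term, value))
        return True


def run_scenario():
    c = Cluster(["scm1", "scm2", "scm3"])
    c.elect("scm1")
    c.write("scm1", "a")
    # Isolate the current leader; it still believes it is the leader.
    c.partition(["scm1"], ["scm2", "scm3"])
    c.elect("scm2")
    leaders = sorted(n for n, node in c.nodes.items() if node.is_leader)
    stale_ok = c.write("scm1", "b")   # write attempted on the isolated leader
    new_ok = c.write("scm2", "c")     # write attempted on the majority side
    c.heal()
    healed_ok = c.write("scm1", "d")  # old leader retries after the heal
    return leaders, stale_ok, new_ok, healed_ok, c
```

In this model two nodes can claim leadership during the partition, but only the majority side can commit, so the logs never diverge. A real test would replace the in-memory `partition` with actual network isolation (for example via one of the tools listed above) and compare SCM state across nodes after the heal.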