[ https://issues.apache.org/jira/browse/HDDS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Wang updated HDDS-4237:
---
Description:
Network partitioning can cause a split-brain case where two leaders exist
at the same time. We need some sort of testing infrastructure/framework to
simulate such a case and verify whether our SCM HA implementation can
achieve strong consistency under a partitioned network.
Mukul Kumar Singh suggested two possible ways:
a) Blockade tests: Blockade is a Docker-based framework where the
network for one DN can be isolated from the others
b) MiniOzoneChaosCluster: this is a unit-test-based test where a
random datanode was killed, which helped in finding consistency
issues.
We might need a similar solution for SCM: block the SCM leader's network and
also increase the timeout so that the old leader does not turn into a candidate.
was:
Network partitioning can cause a split-brain case where two leaders exist
at the same time. We need some sort of testing infrastructure/framework to
simulate such a case and verify whether our SCM HA implementation can
achieve strong consistency.
Mukul Kumar Singh suggested two possible ways:
a) Blockade tests: Blockade is a Docker-based framework where the
network for one DN can be isolated from the others
b) MiniOzoneChaosCluster: this is a unit-test-based test where a
random datanode was killed, which helped in finding consistency
issues.
We might need a similar solution for SCM: block the SCM leader's network and
also increase the timeout so that the old leader does not turn into a candidate.
> Testing Infrastructure for network partitioning
> ---
>
> Key: HDDS-4237
> URL: https://issues.apache.org/jira/browse/HDDS-4237
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Rui Wang
> Priority: Major
>
> Network partitioning can cause a split-brain case where two leaders exist
> at the same time. We need some sort of testing infrastructure/framework to
> simulate such a case and verify whether our SCM HA implementation can
> achieve strong consistency under a partitioned network.
> Mukul Kumar Singh suggested two possible ways:
> a) Blockade tests: Blockade is a Docker-based framework where the
> network for one DN can be isolated from the others
> b) MiniOzoneChaosCluster: this is a unit-test-based test where a
> random datanode was killed, which helped in finding consistency
> issues.
> We might need a similar solution for SCM: block the SCM leader's network and
> also increase the timeout so that the old leader does not turn into a candidate.
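
The scenario described above (isolate the leader, keep it from stepping
down, then check consistency) can be illustrated with a toy model. This is
only a sketch under stated assumptions: ToyCluster, Node, and run_scenario
are hypothetical names, not Ozone, Ratis, or Blockade APIs, and the quorum
logic is a simplified stand-in for a real consensus protocol. It shows the
property such a test would check: two nodes may both believe they are
leader, but only the one that can reach a majority may commit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one SCM replica; not an Ozone class."""
    name: str
    term: int = 0
    is_leader: bool = False
    log: list = field(default_factory=list)


class ToyCluster:
    """Toy quorum model with a controllable network partition."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.blocked = set()  # names of nodes cut off from the rest

    def peers(self, name):
        # Nodes are reachable only on the same side of the partition.
        side = name in self.blocked
        return [n for n in self.nodes if (n in self.blocked) == side]

    def elect(self, name):
        """Make `name` leader if it can reach a majority of all nodes."""
        voters = self.peers(name)
        if len(voters) * 2 <= len(self.nodes):
            return False
        new_term = max(self.nodes[v].term for v in voters) + 1
        for v in voters:
            self.nodes[v].term = new_term
            self.nodes[v].is_leader = (v == name)
        # An unreachable old leader is untouched: it still believes it leads,
        # which models a timeout long enough that it never becomes candidate.
        return True

    def write(self, name, value):
        """Commit `value` through `name`; requires acks from a majority."""
        leader = self.nodes[name]
        if not leader.is_leader:
            return False
        # Peers that have seen a newer term reject the stale leader.
        acks = [v for v in self.peers(name)
                if self.nodes[v].term <= leader.term]
        if len(acks) * 2 <= len(self.nodes):
            return False
        for v in acks:
            self.nodes[v].log.append(value)
        return True

    def leaders(self):
        return sorted(n for n, node in self.nodes.items() if node.is_leader)


def run_scenario():
    cluster = ToyCluster(["scm1", "scm2", "scm3"])
    cluster.elect("scm1")
    cluster.blocked = {"scm1"}          # isolate the current leader
    cluster.elect("scm2")               # majority side elects a new leader
    return {
        "leaders": cluster.leaders(),   # both nodes claim leadership
        "old_write": cluster.write("scm1", "from-old-leader"),
        "new_write": cluster.write("scm2", "from-new-leader"),
        "logs": {n: node.log for n, node in cluster.nodes.items()},
    }
```

In a real test the partition would be injected at the network level (for
example with Blockade) rather than with an in-memory flag, and the
assertion would be made against the actual SCM state.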