Hi All,

In previous MB versions, the global state of the MB cluster becomes inconsistent when the network is partitioned (split brain). As a solution we propose the following:

1) An MB cluster cannot go below a defined number of nodes (a.k.a. the minimum cluster size).
2) During a network partition, if the node count (size) of a particular partition is less than the 'minimum cluster size', then that partition:
   2.1) will stop accepting incoming traffic/connections
   2.2) will disconnect all active connections (publishers/subscribers)
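To make the rule concrete, here is a rough sketch in Python. It is not MB code: `PartitionGuard`, `visible_nodes` and the action names are made up for illustration, and it assumes each node can count the members it can still see in its own partition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionGuard:
    """Hypothetical helper (not an MB API) that applies the proposed rule."""

    min_cluster_size: int  # the configured 'minimum cluster size'

    def may_serve(self, visible_nodes: int) -> bool:
        # A partition keeps working only if it has at least the minimum size.
        return visible_nodes >= self.min_cluster_size

    def actions(self, visible_nodes: int) -> List[str]:
        # Below the minimum: refuse new connections and drop existing ones
        # (steps 2.1 and 2.2). The action names are placeholders.
        if self.may_serve(visible_nodes):
            return []
        return ["stop_accepting_connections", "disconnect_active_connections"]


# Example values only: minimum cluster size of 2, partition split into 2 + 1.
guard = PartitionGuard(min_cluster_size=2)
print(guard.actions(visible_nodes=2))  # the larger side keeps serving
print(guard.actions(visible_nodes=1))  # the isolated node stops serving
```

With these example values, the side that still sees 2 nodes carries on, while the isolated node stops accepting traffic and disconnects its publishers/subscribers.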
So the idea is to let only a single partition (the one whose size >= minimum cluster size) keep working while the others stop working. Therefore, choosing the 'minimum cluster size' is important when deploying MB. Otherwise the user will have multiple network partitions (each with size >= minimum cluster size) working in parallel, creating the very problem we are trying to solve here. So here's the way to pick the number:

| Cluster size | Minimum Node Count |
|--------------|--------------------|
| 2 | 2 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| N | (N / 2) + 1, with N / 2 rounded down |

This has a direct effect on the minimum HA deployment for MB, which used to be 2 nodes. Why? Suppose users now deploy a 2-node MB cluster with this feature enabled. Then during a network partition both nodes will stop working. This may be fine, since it keeps the MB cluster reliable, but from the user's point of view it's a complete outage (since none of the nodes accept traffic). Therefore the minimum HA node count for MB now becomes 3. When the cluster size is 3, it will be able to withstand 1 node being cut off by a network partition (the other 2 nodes will keep working).

Thoughts?

Jira: https://wso2.org/jira/browse/MB-1664

--
Ramith Jayasinghe
Technical Lead
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

E: ram...@wso2.com
P: +94 772534930
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture