Hi All,
I would like to propose using XS HA to switch the XS master host when the 
current XS master host is down.

Reason:
We recently found the issue below:
https://issues.apache.org/jira/browse/CLOUDSTACK-6177

When the XS master is down, CS uses the pool-emergency-transition-to-master 
and pool-recover-slaves APIs to choose a new master. These APIs are not safe 
and should only be used in an emergency. They may cause XS to use a slightly 
stale (about 5 seconds old) version of the XS DB. Some objects may be missing 
from the stale XS DB, which can cause odd behavior; for example, you may not 
be able to start a VM.


Short term solution

CS stops doing the XS master switch, which avoids this issue.

Impact:

1.      When the master host is down, CS loses its connection to the whole XS 
pool (CS cluster). CS cannot get VM info for this cluster, and the whole 
cluster is not operable.

2.      The admin has to recover the XS master host manually. If recovering 
the XS master host is not possible, the admin can use 
pool-emergency-transition-to-master and pool-recover-slaves to recover the 
pool. Per the issue I mentioned above, this should be the last resort.
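
For reference, the last-resort manual recovery in item 2 could be scripted 
roughly as below. This is only a sketch: it assumes the xe CLI is available on 
a surviving slave that the admin has picked as the new master, and the helper 
names (manual_recovery_commands, run_recovery) are made up for illustration. 
By default it only prints the commands and does not run them.

```python
import subprocess


def manual_recovery_commands():
    """Return the xe commands for the last-resort recovery, in order.

    Assumption: both are run on the surviving slave picked as new master.
    """
    return [
        # Promote this slave to pool master.
        ["xe", "pool-emergency-transition-to-master"],
        # Tell the remaining slaves to follow the new master.
        ["xe", "pool-recover-slaves"],
    ]


def run_recovery(dry_run=True):
    """Print the commands, and execute them only when dry_run is False."""
    handled = []
    for cmd in manual_recovery_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    run_recovery(dry_run=True)
```

Keeping dry_run as the default is deliberate, since these commands can leave 
the pool on a slightly stale XS DB as described above.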

Long term solution

Integrate XS HA, and let XS HA do the XS master switch.

1.      It might take some time to integrate XS HA.

2.      Older free versions of XS don't have the XS HA feature, so users 
might need to upgrade to XS 6.2 (which is free) to get it.


I think we can fix this issue in two steps.

1.      Since this issue is critical, CS should stop doing the XS master 
switch right away to avoid it.

2.      Integrate XS HA.
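
To make the intended end state concrete, here is a rough sketch of how CS 
could react to a master-down event once both steps are done. This is not 
existing CS code: the Action enum, on_master_check, and the two boolean inputs 
are hypothetical names, and the exact reactions are my assumption. The point 
is only that CS never does the master switch itself.

```python
from enum import Enum


class Action(Enum):
    NONE = "master is up, nothing to do"
    WAIT_FOR_XS_HA = "wait for XS HA to pick a new master, then reconnect"
    ALERT_ADMIN = "mark cluster unavailable and ask admin to recover master"


def on_master_check(master_reachable, xs_ha_enabled):
    """Pick the CS reaction; CS itself never promotes a new master."""
    if master_reachable:
        return Action.NONE
    if xs_ha_enabled:
        # Step 2: XS HA owns the master switch.
        return Action.WAIT_FOR_XS_HA
    # Step 1 only: no automatic switch, so the admin recovers manually.
    return Action.ALERT_ADMIN


if __name__ == "__main__":
    print(on_master_check(master_reachable=False, xs_ha_enabled=True).value)
```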


Comments and suggestions are highly appreciated!

Best Regards.
Anthony
