Status: New
Owner: ----

New issue 60 by barneydesmond: gnt-cluster masterfailover cannot work when master is down
http://code.google.com/p/ganeti/issues/detail?id=60
I'm reporting this on behalf of my colleague, who is now on leave. It was originally raised here:
http://groups.google.com/group/ganeti/browse_thread/thread/5ff4b1d3e05da465

What steps will reproduce the problem?
1. Configure a two-node cluster, nodes=[A,B]. A is the master and B is a master candidate. The intent is to use Ganeti to simplify the management of a high-availability setup.
2. Create an instance using DRBD. For the sake of argument, its primary node is A and its secondary node is B.
3. The instance can be failed over between the nodes without problems (NB: this is done by issuing commands on the cluster master, not by forcefully taking any nodes down).
4. Take down the cluster master (node A), e.g. by removing power or disconnecting the network.
5. If the instance happens to be running on node A, it will go down too, as expected.
6. Attempt to fail over the cluster master role by running "gnt-cluster masterfailover" on node B. This is refused because the master cannot be contacted.

What is the expected output? What do you see instead?
The expected outcome is that the masterfailover proceeds and B becomes the cluster master. Instead, something like the following is seen:

  node2:~# gnt-cluster masterfailover
  Failure: prerequisites not met for this operation:
  Cluster is inconsistent, most nodes did not respond.

What version of the product are you using? On what operating system?
OS: Debian stable
Ganeti version: the hosts have since been repurposed, but residual evidence suggests v2.0.1

Please provide any additional information below.
Judging from the thread mentioned at the top, this appears to be a mismatch between the documented behaviour (http://ganeti-doc.googlecode.com/svn/ganeti-2.0/admin.html#failing-over-the-master-node) and the behaviour of the code.
Guido suggested there was a --force option, but this isn't evident in the HEAD code in the repo (http://git.ganeti.org/?p=ganeti.git;a=blob_plain;f=scripts/gnt-cluster;hb=HEAD).

There was some further discussion in that thread of what the "correct" behaviour is in such a situation (a 2-node cluster being a special case once only one node is left running). My lack of experience with Ganeti on full-sized clusters means I have no idea how this "should" be handled. Presumably it needs some sort of quorum across the remaining nodes to agree that the master is well and truly dead, which would also ease the situation when dealing with split-brain.
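To illustrate why the two-node case is special, here is a minimal sketch of a strict-majority check of the kind discussed in the thread. It is an assumption about how such a vote might work, not Ganeti's actual code: `has_quorum` and `check_failover` are hypothetical names, and only the error message is copied from the failure quoted in this report.

```python
from typing import AbstractSet


def has_quorum(responding: AbstractSet[str], all_nodes: AbstractSet[str]) -> bool:
    """Return True if a strict majority of the configured nodes responded.

    Hypothetical helper for illustration only; not taken from Ganeti.
    """
    # Ignore replies from hosts that are not cluster members.
    answered = len(set(responding) & set(all_nodes))
    # Strict majority: more than half of *all* configured nodes.
    return 2 * answered > len(all_nodes)


def check_failover(responding: AbstractSet[str], all_nodes: AbstractSet[str]) -> None:
    """Refuse a (hypothetical) master failover without a majority."""
    if not has_quorum(responding, all_nodes):
        # Message copied from the failure quoted in this report.
        raise RuntimeError("Cluster is inconsistent, most nodes did not respond.")


if __name__ == "__main__":
    # The reported scenario: two nodes, A (master) is down, only B answers.
    two_nodes = {"A", "B"}
    print(has_quorum({"B"}, two_nodes))         # False: 1 of 2 is not a majority
    # With a third node, losing the master still leaves a majority.
    three_nodes = {"A", "B", "C"}
    print(has_quorum({"B", "C"}, three_nodes))  # True: 2 of 3
```

Under a rule like this, losing either node of a two-node cluster leaves exactly half of the nodes, never a strict majority, so the survivor can never take over on its own; a third voting node avoids that, at the cost of extra hardware.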
