Status: New
Owner: ----

New issue 60 by barneydesmond: gnt-cluster masterfailover cannot work when  
master is down
http://code.google.com/p/ganeti/issues/detail?id=60

I'm reporting this on behalf of a colleague who is now on leave. It was
originally raised here:
http://groups.google.com/group/ganeti/browse_thread/thread/5ff4b1d3e05da465

What steps will reproduce the problem?
1. Configure a two-node cluster, nodes=[A,B]. A is the master, B is a
master-candidate. The intent is to use Ganeti to simplify the management of
a high-availability setup.
2. An instance is created, using DRBD. For the sake of argument, its
primary node is A, and B is its secondary node.
3. The instance can be failed-over between the nodes without problem (NB.
this is done by issuing commands on the cluster-master, not by forcefully
taking any nodes down).
4. Take down the cluster-master (node A), either by removing power, or
disconnecting the network, etc.
5. If the instance happens to be running on node A, it will go down too,
as expected.
6. Attempt to fail over the cluster-master role by running "gnt-cluster
masterfailover" on node B. The command is refused because the master cannot
be contacted.


What is the expected output? What do you see instead?

Expected output is that the masterfailover proceeds and B becomes the
cluster-master. Instead, something like the following is seen:

# node2:~# gnt-cluster masterfailover
# Failure: prerequisites not met for this operation:
# Cluster is inconsistent, most nodes did not respond.


What version of the product are you using? On what operating system?
OS: Debian stable
Ganeti version: the hosts have since been retasked, but residual evidence
suggests v2.0.1


Please provide any additional information below.

From the thread mentioned at the top, this appears to be a disconnect
between the documented behaviour
(http://ganeti-doc.googlecode.com/svn/ganeti-2.0/admin.html#failing-over-the-master-node)
and the coded behaviour. Guido suggested there was a --force option, but
this isn't evident in the HEAD code in the repo
(http://git.ganeti.org/?p=ganeti.git;a=blob_plain;f=scripts/gnt-cluster;hb=HEAD).

There was some further discussion in that thread of what the "correct"
behaviour is in such a situation (a 2-node cluster being a special case when
only a single node is left running). My lack of experience with Ganeti on
full-sized clusters means I have no idea how this "should" be handled.
Presumably it needs some sort of quorum across the remaining nodes to agree
that the master is well and truly dead, which would also ease the handling
of split-brain.
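
To make the quorum idea concrete, here is a minimal sketch of a
strict-majority check. It is NOT Ganeti's code: the function name
has_failover_quorum and the shape of the reports mapping are hypothetical,
made up purely for illustration. Each node reports whether it can reach the
current master (True or False); a node that does not answer at all is
recorded as None.

```python
from typing import Dict, Optional


def has_failover_quorum(reports: Dict[str, Optional[bool]]) -> bool:
    """Decide whether enough nodes agree that the master is unreachable.

    Hypothetical helper, not part of Ganeti. `reports` maps every node in
    the cluster (including the old master) to:
      True  -> the node can still reach the master
      False -> the node cannot reach the master
      None  -> the node itself did not answer
    """
    cluster_size = len(reports)
    dead_votes = sum(1 for seen in reports.values() if seen is False)
    # Require a strict majority of ALL nodes, not just of those that
    # answered, so that a partitioned minority cannot elect its own master.
    return dead_votes * 2 > cluster_size


# Two-node cluster: master A is down, only B votes -> 1 of 2, no majority.
two_node = has_failover_quorum({"A": None, "B": False})

# Three-node cluster: master A is down, B and C agree -> 2 of 3, majority.
three_node = has_failover_quorum({"A": None, "B": False, "C": False})
```

Under this rule the surviving node of a two-node cluster can never reach a
majority on its own, which is why that case would need either an explicit
operator override or some external tie-breaker.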
