On Tue, Jul 07, 2009 at 12:24:00PM +0200, Guido Trotter wrote:
> 
> This allows failing over in certain corner cases, such as a 2 node
> cluster with one node down. The man page is also updated to document the
> shortcomings of this option (we cannot pass --no-voting ourselves to the
> master, because that requires user interaction) and how to make the
> cluster consistent again.
> 
> Signed-off-by: Guido Trotter <[email protected]>
> ---
>  lib/bootstrap.py     |   32 +++++++++++++++++++-------------
>  man/gnt-cluster.sgml |   14 ++++++++++++++
>  scripts/gnt-cluster  |   19 +++++++++++++++++--
>  3 files changed, 50 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/bootstrap.py b/lib/bootstrap.py
> index 0308484..496a017 100644
> --- a/lib/bootstrap.py
> +++ b/lib/bootstrap.py
> @@ -373,13 +373,17 @@ def SetupNodeDaemon(cluster_name, node, ssh_key_check):
>                               (node, result.fail_reason, result.output))
>  
>  
> -def MasterFailover():
> +def MasterFailover(skip_voting=False):
>    """Failover the master node.
>  
>    This checks that we are not already the master, and will cause the
>    current master to cease being master, and the non-master to become
>    new master.
>  
> +  @type skip_voting: boolean
> +  @param skip_voting: force the operation without remote nodes agreement
> +                      (dangerous)
> +
>    """
>    sstore = ssconf.SimpleStore()
>  
> @@ -401,18 +405,20 @@ def MasterFailover():
>                                 " master candidates is:\n"
>                                 "%s" % ('\n'.join(mc_no_master)))
>  
> -  vote_list = GatherMasterVotes(node_list)
> -
> -  if vote_list:
> -    voted_master = vote_list[0][0]
> -    if voted_master is None:
> -      raise errors.OpPrereqError("Cluster is inconsistent, most nodes did not"
> -                                 " respond.")
> -    elif voted_master != old_master:
> -      raise errors.OpPrereqError("I have wrong configuration, I believe the"
> -                                 " master is %s but the other nodes voted for"
> -                                 " %s. Please resync the configuration of"
> -                                 " this node." % (old_master, voted_master))
> +  if not skip_voting:
> +    vote_list = GatherMasterVotes(node_list)
> +
> +    if vote_list:
> +      voted_master = vote_list[0][0]
> +      if voted_master is None:
> +        raise errors.OpPrereqError("Cluster is inconsistent, most nodes did"
> +                                   " not respond.")
> +      elif voted_master != old_master:
> +        raise errors.OpPrereqError("I have a wrong configuration, I believe"
> +                                   " the master is %s but the other nodes"
> +                                   " voted %s. Please resync the configuration"
> +                                   " of this node." %
> +                                   (old_master, voted_master))
>    # end checks
>  
>    rcode = 0
> diff --git a/man/gnt-cluster.sgml b/man/gnt-cluster.sgml
> index e3fecbf..9467c00 100644
> --- a/man/gnt-cluster.sgml
> +++ b/man/gnt-cluster.sgml
> @@ -442,11 +442,25 @@
>  
>        <cmdsynopsis>
>          <command>masterfailover</command>
> +        <arg>--no-voting</arg>
>        </cmdsynopsis>
>  
>        <para>
>          Failover the master role to the current node.
>        </para>
> +
> +      <para>
> +        The <option>--no-voting</option> option skips the remote node agreement
> +        checks. This is dangerous, but necessary in some cases (for example
> +        failing over the master role in a 2 node cluster with the second node

s/second node/original master/ ?

> +        down). After a failover performed this way the master daemon will most
> +        probably not start, and you will need to start it manually passing the
> +        --no-voting option to ganeti-masterd as well.

Hmm, I think we should modify the start-master RPC call to allow
forcing the start, so that masterfailover --no-voting alone is enough.

> Be careful because the
> +        second node will still believe to be the master, so when it comes up
> +        you'll need to start just ganeti-noded there, and perform a gnt-cluster
> +        redist-conf on the new master to make the cluster consistent again.
> +      </para>

IIRC, the masterd won't start on the original master since the vote from
the current master will conflict. We should check this.
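
For reference, here is a rough, self-contained model of the vote check
that skip_voting bypasses in the patch above. It is only a sketch and
not the Ganeti code: PrereqError, tally_votes and check_failover are
made-up stand-ins (for errors.OpPrereqError, GatherMasterVotes and the
check inside MasterFailover), and each vote is assumed to be either a
node name or None for a node that did not answer.

```python
from collections import Counter


class PrereqError(Exception):
    """Hypothetical stand-in for errors.OpPrereqError."""


def tally_votes(votes):
    """Return (candidate, count) pairs, most-voted candidate first.

    A vote of None models a node that did not respond (assumption).
    """
    return Counter(votes).most_common()


def check_failover(old_master, votes, skip_voting=False):
    """Mirror the shape of the patched check: skip it entirely if asked."""
    if skip_voting:
        return
    vote_list = tally_votes(votes)
    if vote_list:
        voted_master = vote_list[0][0]
        if voted_master is None:
            raise PrereqError("Cluster is inconsistent, most nodes did"
                              " not respond.")
        elif voted_master != old_master:
            raise PrereqError("I have a wrong configuration, I believe"
                              " the master is %s but the other nodes"
                              " voted %s." % (old_master, voted_master))


# Two-node cluster, original master down: the only remote vote is None.
try:
    check_failover("node1", [None])
    blocked = False
except PrereqError:
    blocked = True

# With skip_voting=True the same situation is allowed through.
check_failover("node1", [None], skip_voting=True)
```

In this model the two-node case is refused unless skip_voting is set,
which is the corner case the patch description mentions.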

iustin
