On Fri, Feb 18, 2011 at 5:29 PM, Michael Hanselmann <[email protected]> wrote:
> Am 18. Februar 2011 15:14 schrieb René Nussbaumer <[email protected]>:
>> This is a convinience command to do an automated EPO in the possible limits
>> of
>
> s/convinience/convenience/
Done.
>
>> Ganeti.
>
>> +def _InstanceStart(opts, inst_list, start):
>> […]
>> + bad_cnt = len([row for row in results if not row[0]])
>
> What is row[0]? (hint: explicit unpacking)
(success, _) now
>
>> +class _RunWhenNodesReachableHelper:
>> + """Helper class to make shared internal state sharing easier.
>> +
>> + @ivar success: Indicates if all action_cb calls were successful
>> + """
>
> Wrong indentation for docstring, missing empty line.
Indeed, must have missed that one. Nice catch, thanks :).
>
>> + def Wait(self, secs):
>> + start = time.time()
>> + for node in self.down:
>> + if self._ping_fn(self.node2ip[node], self.port,
>> + timeout=self._EPO_PING_TIMEOUT,
>> live_port_needed=True):
>
> You didn't test or pylint this code, did you? Hint: there's no self._EPO_*.
I did, and it's actually fixed already locally :(
>> +def _RunWhenNodesReachable(node_list, action_cb, interval):
>> + client = GetClient()
>> + cluster_info = client.QueryClusterInfo()
>> + if cluster_info["primary_ip_version"] == constants.IP4_VERSION:
>> + family = netutils.IPAddress.family
>> + else:
>> + family = netutils.IP6Address.family
>> +
>> + node2ip = dict((node, netutils.GetHostname(node, family=family).ip)
>> + for node in node_list)
>> + for node in node_list:
>> + host = netutils.GetHostname(node, family=family)
>> + node2ip[node] = host.ip
>
> \o/ No, you don't need that loop, the code before it does everything.
Meh, forgot to remove :(, thanks.
>> + helper = _RunWhenNodesReachableHelper(node_list, action_cb, node2ip)
>> +
>> + try:
>> + return utils.Retry(helper, interval, _EPO_REACHABLE_TIMEOUT,
>> + wait_fn=helper.Wait)
>> + except utils.RetryTimeout:
>> + ToStderr("Time exceeded while waiting for nodes to become reachable"
>> + " again:\n - %s", " - ".join(helper.down))
>> + return False
>
> Very nice!
>
>> +def _EpoOn(opts, full_node_list, node_list, inst_map):
>> + """Does the actual power on.
>> +
>> + @param opts: The command line options selected by the user
>> + @param full_node_list: All nodes to operate on (includes nodes not
>> supporting
>> + OOB)
>> + @param node_list: The list of nodes to operate on (all need to support
>> OOB)
>> + @param inst_map: A dict of inst -> nodes mapping
>> + @return: The desired exit status
>> +
>> + """
>> + if node_list and not _OobPower(opts, node_list, False):
>> + ToStderr("Not all nodes seems to get back up, investigate and start"
>
> s/seems/seem/
Done.
>
>> + " manually if needed")
>> +
>> + # Wait for the nodes to be back up
>> + action_cb = compat.partial(_MaybeInstanceStartup, opts, dict(inst_map))
>> +
>> + ToStdout("Waiting until all nodes are available again")
>> + if not _RunWhenNodesReachable(full_node_list, action_cb,
>> _EPO_PING_INTERVAL):
>> + ToStderr("Please investigate and start not running instances manually")
>
> s/not running/stopped/
I found that a little bit more confusing, but changed.
>
>> + return constants.EXIT_FAILURE
>> +
>> + return constants.EXIT_FAILURE
>> +
>> +
>> +def _EpoOff(opts, node_list, inst_map):
>> + if not _InstanceStart(opts, inst_map.keys(), False):
>> + ToStderr("Please investigate and stop instances manually before
>> continue")
>
> s/continue/continuing/
Done.
>
>> + return constants.EXIT_FAILURE
>> +
>> + if node_list and _OobPower(opts, node_list, False):
>> + return constants.EXIT_SUCCESS
>> + else:
>> + return constants.EXIT_FAILURE
>
>> +def Epo(opts, args):
>> […]
>> + for (idx, (node, master, pinsts, sinsts, powered,
>> + offline)) in enumerate(result):
>> + # Normalize the node_query_list as well
>> + node_query_list[idx] = node
>> + if not offline:
>> + for inst in (pinsts + sinsts):
>> + if inst not in inst_map:
>
> Please swap the two branches, so that you don't need “not” here.
Done.
>
>> + if master:
>> + inst_map[inst] = set()
>> + else:
>> + inst_map[inst] = set([node])
>> + else:
>> + if not master:
>
> elif
>
>> + inst_map[inst].add(node)
>> +
>> + if master and opts.on:
>> + # We ignore the master for turning on the machines, in fact we are
>> + # already operating on the master at this point :)
>> + continue
>> + elif master and not opts.show_all:
>> + ToStderr("%s is the master node, please do a master-failover to
>> another"
>> + " node not affected by the EPO or use --all if you intend to"
>> + " shutdown the whole cluster", node)
>> + return constants.EXIT_FAILURE
>> + elif powered is None:
>> + ToStdout("Node %s does not support out-of-band handling, it can not
>> be"
>> + " handled in a fully automated manner", node)
>> + elif powered == opts.on:
>> + ToStdout("Node %s is already in desired power state, skipping", node)
>> + else:
>> + if not offline or (offline and powered):
>
> elif
Done to both.
>
>> + node_list.append(node)
>
>> + "epo": (
>> + Epo, [ArgUnknown()],
>> + [FORCE_OPT, ON_OPT, GROUPS_OPT, ALL_OPT, OOB_TIMEOUT_OPT],
>> + "[opts...] [args]",
>> + "Performs an EPO on given args"),
>
> s/EPO/emergency power-off/ (maybe also in other places, don't assume
> everyone knows abbreviations)
I guess it's the same arguable discussion as with OOB. I change it
here and in the man-page.
>> --- a/man/gnt-cluster.rst
>> +++ b/man/gnt-cluster.rst
>> @@ -93,6 +93,23 @@ Remove all configuration files related to the cluster, so
>> that a
>> Since this is a dangerous command, you are required to pass the
>> argument *--yes-do-it.*
>>
>> +EPO
>> +~~~
>> +
>> +**epo** [--on] [--groups|--all] *arguments*
>> +
>> +Performans an EPO on nodes given as arguments. If ``--groups`` is given,
>
> Here again.
Done.
>> +arguments are node groups. If ``--all`` is provided, the whole cluster will
>> be
>> +shut down.
>> +
>> +The ``--on`` flag says, that the cluster should be recovered from a EPO.
>
> “… flag restarts the cluster after an emergency power-off”
I personally prefer recover over restart here
>> +Please note that the master node will not be turned down or up
>> automatically.
>> +It will just be left in a state, where you can manully perform the shutdown
>> of
>
> s/manully/manually/
Typo, fixed.
>
>> +that one node. If the master is in the list of affected nodes and this is
>> not
>
> “is not _a_”
done.
>
>> +complete cluster EPO (e.g. using ``--all``), you're required to do a master
>
> emergency power-off
done.
René