On Fri, Feb 18, 2011 at 2:42 PM, Michael Hanselmann <[email protected]> wrote:
> Am 18. Februar 2011 14:15 schrieb René Nussbaumer <[email protected]>:
>> --- a/lib/client/gnt_cluster.py
>> +++ b/lib/client/gnt_cluster.py
>> +_PING_INTERVAL = 30 # 30 seconds between pings
>> +_REACHABLE_TIMEOUT = 15 * 60 # 15 minutes
>
> Prefix with “_EPO”, please.
Done.
>
>> +def _InstanceStart(opts, inst_list, start):
>> + if start:
>> + opcls = opcodes.OpInstanceStartup
>> + text_addition = ("startup", "started", "starting")
>> + else:
>> + opcls = opcodes.OpInstanceShutdown
>> + text_addition = ("shutdown", "stopped", "stopping")
>
> Make these separate variables, please.
Done.
>
>> + jex = JobExecutor(opts=opts)
>> +
>> + for inst in inst_list:
>> + ToStdout("Submit %s of instance %s", text_addition[0], inst)
>> + op = opcls(instance_name=inst)
>> + jex.QueueJob(inst, op)
>> +
>> + results = jex.GetResults()
>> + bad_cnt = len([row for row in results if not row[0]])
>> +
>> + if bad_cnt == 0:
>> + ToStdout("All instances has been %s successfully", text_addition[1])
>
> s/has/have/
Done.
>
>> + else:
>> + ToStderr("There were errors while %s instances:\n"
>> + "%d error(s) out of %d instance(s)", text_addition[2], bad_cnt,
>> + len(results))
>> + return False
>> +
>> + return True
>> +
>> +
>> +class _RunWhenNodesReachableHelper:
>> + """Helper class to make shared internal state sharing easier.
>> +
>> + """
>> + _PING_TIMEOUT = 1
>
> Please put this together with the other constants. :)
This is just class local, but well … I places it on top now
>
>> + def Wait(self, secs):
>> + start = time.time()
>> + for node in self.down:
>> + if self._ping_fn(self.node2ip[node], constants.DEFAULT_NODED_PORT,
>
> You must look up the port using netutils.GetDaemonPort, see other
> places where it's used.
I planned to do that, but forgot, thanks :).
>> + timeout=self._PING_TIMEOUT, live_port_needed=True):
>> + ToStdout("Node %s became available", node)
>> + self.up.add(node)
>> + self.down -= self.up
>
> Please add a comment here describing why you're doing this inside the
> loop (because you're returning right away).
Done.
>
>> + return
>> +
>> + self._sleep_fn(max(0.0, start + secs - time.time()))
>
> Are you sure this calculation is correct?
>
>>>> start = 5; now = 3; secs = 30
>>>> start + secs - now
> 32
I'm pretty confidence that this calculation is right, you have an
issue in your calculation: Now is small than start which means the
time shifted backwards which doesn't make sense or if it happens
(broken hardware clock?) not in the hand of our code. I think it's
safe to assume that time is moving forward. Also the calculation is
still right, let's say start is 5 and now is > 5 let's say 12 and the
timeout is 30 seconds to go we calculate:
5 + 30 - 12 = 23 -> 23 seconds to sleep
Also this calculation is done in the RunningTimeout class this way
>> +def _RunWhenNodesReachable(node_list, action_cb, interval):
>> + client = GetClient()
>> + cluster_info = client.QueryClusterInfo()
>> + if cluster_info["primary_ip_version"] == constants.IP4_VERSION:
>> + family = netutils.IPAddress.family
>> + else:
>> + family = netutils.IP6Address.family
>> +
>> + node2ip = {}
>> + for node in node_list:
>> + host = netutils.GetHostname(node, family=family)
>> + node2ip[node] = host.ip
>
> node2ip = dict((node, netutils.GetHostname(node, family=family).ip)
> for node in node_list)
Done.
>> + helper = _RunWhenNodesReachableHelper(node_list, action_cb, node2ip)
>> +
>> + try:
>> + utils.Retry(helper, interval, _REACHABLE_TIMEOUT, wait_fn=helper.Wait)
>> + return helper.success
>
> Actually you can return success as Retry's return value, so this
> becomes just “return utils.Retry(…)”.
Done.
René