On Fri, May 28, 2010 at 2:59 AM, Tom Limoncelli <[email protected]> wrote:
>
> Signed-off-by: Tom Limoncelli <[email protected]>
> ---
> daemons/ganeti-watcher | 25 ++++++++++++++++++++++++-
> lib/utils.py | 13 +++++++++++++
> 2 files changed, 37 insertions(+), 1 deletions(-)
>
> diff --git a/daemons/ganeti-watcher b/daemons/ganeti-watcher
> index 1f82db8..82bd24b 100755
> --- a/daemons/ganeti-watcher
> +++ b/daemons/ganeti-watcher
> @@ -48,6 +48,7 @@ from ganeti import ssconf
> from ganeti import bdev
> from ganeti import hypervisor
> from ganeti.confd import client as confd_client
> +from ganeti.rapi import client as rapi_client
>
>
> MAXTRIES = 5
> @@ -666,7 +667,29 @@ def main():
> client = cli.GetClient()
>
> # we are on master now
> - utils.EnsureDaemon(constants.RAPI)
> +
> + # Restart RAPI if it isn't responding to queries.
> + # Only kill/restart RAPI once. Otherwise just give up.
> + rapi_restarted = False
> + while True:
Can we avoid the while True loop here? Restarting something at most
once should be doable without an infinite loop, which (as iustin
showed) is error-prone, especially with a codepath this branched.
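A rough sketch of what I mean, not a drop-in replacement. The three
callables (ensure_daemon, probe, stop_daemon) and the max_restarts
parameter are hypothetical stand-ins for the EnsureDaemon / GetVersion /
StopDaemon steps in the patch, not existing Ganeti functions:

```python
import logging


def ensure_rapi_responding(ensure_daemon, probe, stop_daemon, max_restarts=1):
    """Return True once probe() succeeds, restarting at most max_restarts times.

    All three callables are hypothetical stand-ins for the patch's
    EnsureDaemon / GetVersion / StopDaemon steps, not Ganeti APIs.
    """
    # One initial attempt plus max_restarts retries, so the loop is bounded.
    for attempt in range(max_restarts + 1):
        ensure_daemon()
        if probe():
            return True
        if attempt < max_restarts:
            logging.debug("RAPI is running but did not answer, restarting it")
            stop_daemon()
    logging.error("RAPI still not answering after %d restart(s)", max_restarts)
    return False
```

With a bounded for loop there is exactly one exit per outcome, and no
flag or break/continue to keep track of.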
> + utils.EnsureDaemon(constants.RAPI)
> + logging.debug("Attempting to talk with RAPI")
> + master_rapi = rapi_client.GanetiRapiClient("localhost",
> + ssl_cert_file=constants.RAPI_CERT_FILE)
> + try:
> + master_version = master_rapi.GetVersion()
> + except:
A bare except: here probably catches too much. Doesn't the rapi client
raise something more specific when it can't talk to the daemon, which
we could catch more safely? As written, any bug in the rapi client,
a signal, a memory error or any other condition will be caught here
too.
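To illustrate the difference, here is a minimal sketch. RapiClientError
is a hypothetical exception class I made up for the example (I haven't
checked what the rapi client actually raises), and get_version is a
stand-in callable:

```python
import logging


class RapiClientError(Exception):
    """Hypothetical stand-in for whatever the rapi client raises."""


def probe(get_version):
    """Return the version reported by get_version(), or None if the call fails.

    Only RapiClientError is handled. KeyboardInterrupt, MemoryError, or a
    TypeError caused by a client bug propagate to the caller.
    """
    try:
        return get_version()
    except RapiClientError as err:
        # The expected "cannot talk to the daemon" case.
        logging.error("Could not talk to RAPI: %s", err)
        return None
```

A programming error inside the client then shows up as a traceback
instead of being misread as "RAPI is down" and triggering a restart.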
> + logging.error("Could not open connection to RAPI")
> + if rapi_restarted:
> + break
> + else:
> + logging.debug("RAPI is running but did not speak. Killing RAPI")
> + utils.StopDaemon(constants.RAPI)
> + continue
> + if master_version == 2:
> + break
> + else:
> + logging.fatal("RAPI version said %s, expecting 2" % master_version)
Is this really such a terrible condition? :) We're only checking that
the rapi daemon is alive, and nothing in this codepath assumes a
particular version. It's conceivable that we'll change the client to
support version 3 (if version 3 ever exists) transparently, so there's
no point in enforcing the version here.
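Something along these lines would be enough for a liveness check. This
is only a sketch: get_version is a hypothetical callable standing in
for the client's version query, and I'm assuming for the example that a
connection failure surfaces as OSError, which may not match the real
client:

```python
import logging


def rapi_is_alive(get_version):
    """Treat any successful reply as 'alive'; the version is only logged.

    get_version is a hypothetical stand-in for the client's version query,
    assumed here to raise OSError when the daemon cannot be reached.
    """
    try:
        version = get_version()
    except OSError as err:
        logging.error("Could not open connection to RAPI: %s", err)
        return False
    # No comparison against a fixed number: any answer means the daemon is up.
    logging.debug("RAPI answered, reported version %s", version)
    return True
```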
Thanks,
Guido