I'm still learning "git" and for some reason I was able to send that last patch even though it didn't include some of the code changes I had made in response to past comments. Sorry if it looked like I was ignoring your comments. I'm still getting used to git's strangeness.
On Sun, May 30, 2010 at 7:40 AM, Guido Trotter <[email protected]> wrote: > On Sat, May 29, 2010 at 9:27 PM, Tom Limoncelli <[email protected]> wrote: >> >> Signed-off-by: Tom Limoncelli <[email protected]> >> --- >> daemons/ganeti-watcher | 41 +++++++++++++++++++++++++++++++++++++++++ >> lib/utils.py | 13 +++++++++++++ >> 2 files changed, 54 insertions(+), 0 deletions(-) >> >> diff --git a/daemons/ganeti-watcher b/daemons/ganeti-watcher >> index 1f82db8..3665f03 100755 >> --- a/daemons/ganeti-watcher >> +++ b/daemons/ganeti-watcher >> @@ -36,6 +36,7 @@ import sys >> import time >> import logging >> from optparse import OptionParser >> +import urllib2 > > I don't see anything in this patch using this import. > Is it needed? Does the commit pass "make lint"? This was needed for one of the exceptions (see below). >> from ganeti import utils >> from ganeti import constants >> @@ -48,6 +49,7 @@ from ganeti import ssconf >> from ganeti import bdev >> from ganeti import hypervisor >> from ganeti.confd import client as confd_client >> +from ganeti.rapi import client as rapi_client >> >> >> MAXTRIES = 5 >> @@ -595,6 +597,30 @@ def OpenStateFile(path): >> return os.fdopen(statefile_fd, "w+") >> >> >> +def IsRAPIResponding(cluster): >> + """Connects to RAPI port and does a simple test. >> + >> + �...@type cluster: string >> + �...@param cluster: hostname of the cluster to connect to. >> + > > Should we call it "hostname" or "host" rather than cluster? > In the end we're not passing a "cluster" at all. :) Fixed. >> + Returns: >> + True: basic tests passed. >> + False: basic tests failed. > > s/tests/test/ (you're doing only one) Fixed. >> + """ >> + ssl_config = rapi_client.CertAuthorityVerify(constants.RAPI_CERT_FILE) >> + try: >> + master_version = rapi_client.GanetiRapiClient( >> + cluster, >> + port=constants.DEFAULT_RAPI_PORT, >> + config_ssl_verification=ssl_config, >> + username="", password="").GetVersion() >> + except: > > Michael suggested to catch more specific exceptions. > I did as well. :) I swear I made this change! (see above git PBKAC) Fixed. I was able to find 3 different exceptions that could be generated. >> + logging.warning("Could not open connection to RAPI") >> + return False >> + logging.debug("RAPI master_version is %s", master_version) >> + return master_version == constants.RAPI_VERSION >> + > > Fair enough, but should we really check? Do we care? > It's not like it can answer any other version. And in the future, if > it can, perhaps this code can work anyway. I want to make sure there is some response. If we ever add a version 3, it could check for either. >> + >> def ParseOptions(): >> """Parse the command line options. >> >> @@ -668,6 +694,21 @@ def main(): >> # we are on master now >> utils.EnsureDaemon(constants.RAPI) >> >> + # If RAPI isn't responding to queries, try one restart. >> + logging.debug("Attempting to talk with RAPI") >> + rapi_test_result = IsRAPIResponding("localhost") >> + logging.debug("RAPI Result = %s", rapi_test_result) >> + if rapi_test_result == None: > >>>> False == None > False > You want if not rapi_test_result: > Please stop rapi and test your code on both paths :) Fixed. (and tested... with EnsureDaemon commented out to find real failure modes). >> + logging.warning("RAPI is running but did not speak. Killing RAPI") > > "Couldn't get an answer from the Ganeti RAPI daemon" perhaps? > (we're not *sure* it's running. we hope so, might have crashed?) Fixed. >> + utils.StopDaemon(constants.RAPI) >> + utils.EnsureDaemon(constants.RAPI) >> + logging.debug("Attempting to talk with RAPI") >> + rapi_test_result = IsRAPIResponding("localhost") >> + logging.debug("RAPI Result = %s", rapi_test_result) >> + if rapi_test_result is None: > > if not rapi_...: Fixed. > Also, these three lines are the same, above. > Should we put the debugging line in IsRAPIResponding and just do > > if IsRAPIResponding(...): Fixed. >> + logging.fatal("RAPI is not responding. Please investigate.") >> + logging.debug("RAPI works. Result = %s", rapi_test_result) >> + >> try: >> watcher = Watcher(options, notepad) >> except errors.ConfigurationError: >> diff --git a/lib/utils.py b/lib/utils.py >> index 7b93870..9b327d3 100644 >> --- a/lib/utils.py >> +++ b/lib/utils.py >> @@ -2166,6 +2166,19 @@ def EnsureDaemon(name): >> return True >> >> >> +def StopDaemon(name): >> + """Stop a daemon. >> + >> + """ >> + result = RunCmd([constants.DAEMON_UTIL, "stop", name]) >> + if result.failed: >> + logging.error("Can't stop daemon '%s', failure %s, output: %s", >> + name, result.fail_reason, result.output) >> + return False >> + >> + return True >> + >> + >> def WritePidFile(name): >> """Write the current process pidfile. >> >> -- >> 1.7.0.1 >> >> > > Thanks, > > Guido > Thanks for the feedback!
