I'm still learning "git" and for some reason I was able to send that
last patch even though it didn't include some of the code changes I
had made in response to past comments.  Sorry if it looked like I was
ignoring your comments.  I'm still getting used to git's strangeness.

On Sun, May 30, 2010 at 7:40 AM, Guido Trotter <[email protected]> wrote:
> On Sat, May 29, 2010 at 9:27 PM, Tom Limoncelli <[email protected]> wrote:
>>
>> Signed-off-by: Tom Limoncelli <[email protected]>
>> ---
>>  daemons/ganeti-watcher |   41 +++++++++++++++++++++++++++++++++++++++++
>>  lib/utils.py           |   13 +++++++++++++
>>  2 files changed, 54 insertions(+), 0 deletions(-)
>>
>> diff --git a/daemons/ganeti-watcher b/daemons/ganeti-watcher
>> index 1f82db8..3665f03 100755
>> --- a/daemons/ganeti-watcher
>> +++ b/daemons/ganeti-watcher
>> @@ -36,6 +36,7 @@ import sys
>>  import time
>>  import logging
>>  from optparse import OptionParser
>> +import urllib2
>
> I don't see anything in this patch using this import.
> Is it needed? Does the commit pass "make lint"?

This was needed for one of the exceptions (see below).

>>  from ganeti import utils
>>  from ganeti import constants
>> @@ -48,6 +49,7 @@ from ganeti import ssconf
>>  from ganeti import bdev
>>  from ganeti import hypervisor
>>  from ganeti.confd import client as confd_client
>> +from ganeti.rapi import client as rapi_client
>>
>>
>>  MAXTRIES = 5
>> @@ -595,6 +597,30 @@ def OpenStateFile(path):
>>   return os.fdopen(statefile_fd, "w+")
>>
>>
>> +def IsRAPIResponding(cluster):
>> +  """Connects to RAPI port and does a simple test.
>> +
>> + �...@type cluster: string
>> + �...@param cluster: hostname of the cluster to connect to.
>> +
>
> Should we call it "hostname" or "host" rather than cluster?
> In the end we're not passing a "cluster" at all. :)

Fixed.

>> +  Returns:
>> +    True: basic tests passed.
>> +    False: basic tests failed.
>
> s/tests/test/ (you're doing only one)

Fixed.

>> +  """
>> +  ssl_config = rapi_client.CertAuthorityVerify(constants.RAPI_CERT_FILE)
>> +  try:
>> +    master_version = rapi_client.GanetiRapiClient(
>> +        cluster,
>> +        port=constants.DEFAULT_RAPI_PORT,
>> +        config_ssl_verification=ssl_config,
>> +        username="", password="").GetVersion()
>> +  except:
>
> Michael suggested to catch more specific exceptions.
> I did as well. :)

I swear I made this change! (see above git PBKAC)

Fixed.  I was able to find 3 different exceptions that could be generated.

>> +    logging.warning("Could not open connection to RAPI")
>> +    return False
>> +  logging.debug("RAPI master_version is %s", master_version)
>> +  return master_version == constants.RAPI_VERSION
>> +
>
> Fair enough, but should we really check? Do we care?
> It's not like it can answer any other version. And in the future, if
> it can, perhaps this code can work anyway.

I want to make sure there is some response.  If we ever add a version
3, it could check for either.

>> +
>>  def ParseOptions():
>>   """Parse the command line options.
>>
>> @@ -668,6 +694,21 @@ def main():
>>       # we are on master now
>>       utils.EnsureDaemon(constants.RAPI)
>>
>> +      # If RAPI isn't responding to queries, try one restart.
>> +      logging.debug("Attempting to talk with RAPI")
>> +      rapi_test_result = IsRAPIResponding("localhost")
>> +      logging.debug("RAPI Result = %s", rapi_test_result)
>> +      if rapi_test_result == None:
>
>>>> False == None
> False
>  You want if not rapi_test_result:
> Please stop rapi and test your code on both paths :)

Fixed.

(and tested... with EnsureDaemon commented out to find real failure modes).

>> +        logging.warning("RAPI is running but did not speak.  Killing RAPI")
>
> "Couldn't get an answer from the Ganeti RAPI daemon" perhaps?
> (we're not *sure* it's running. we hope so, might have crashed?)

Fixed.

>> +        utils.StopDaemon(constants.RAPI)
>> +        utils.EnsureDaemon(constants.RAPI)
>> +        logging.debug("Attempting to talk with RAPI")
>> +        rapi_test_result = IsRAPIResponding("localhost")
>> +        logging.debug("RAPI Result = %s", rapi_test_result)
>> +        if rapi_test_result is None:
>
> if not rapi_...:

Fixed.

> Also, these three lines are the same, above.
> Should we put the debugging line in IsRAPIResponding and just do
>
> if IsRAPIResponding(...):

Fixed.

>> +          logging.fatal("RAPI is not responding. Please investigate.")
>> +      logging.debug("RAPI works. Result = %s", rapi_test_result)
>> +
>>       try:
>>         watcher = Watcher(options, notepad)
>>       except errors.ConfigurationError:
>> diff --git a/lib/utils.py b/lib/utils.py
>> index 7b93870..9b327d3 100644
>> --- a/lib/utils.py
>> +++ b/lib/utils.py
>> @@ -2166,6 +2166,19 @@ def EnsureDaemon(name):
>>   return True
>>
>>
>> +def StopDaemon(name):
>> +  """Stop a daemon.
>> +
>> +  """
>> +  result = RunCmd([constants.DAEMON_UTIL, "stop", name])
>> +  if result.failed:
>> +    logging.error("Can't stop daemon '%s', failure %s, output: %s",
>> +                  name, result.fail_reason, result.output)
>> +    return False
>> +
>> +  return True
>> +
>> +
>>  def WritePidFile(name):
>>   """Write the current process pidfile.
>>
>> --
>> 1.7.0.1
>>
>>
>
> Thanks,
>
> Guido
>

Thanks for the feedback!

Reply via email to