I've hit my tolerance level for stale librarian processes and am
looking to address this in the test environment. I want to make sure I
preserve existing use cases - please tell me if I have missed any.

1) Run an external process for the life of *a* test
* create a working config for it
* start it
* run test
* kill it
* clean up any overhead left in the work environment [that we care about]

2) For many tests
* create working config
* start it
-run tests-
 * check it's still OK and do per-test isolation
 * run a test
* kill it
* clean up

3) For 'make run'
* Use a small command-line tool that
 * creates working config
 * starts it
 * waits for SIGINT
 * stops it
 * cleans up

4) for concurrent testing
* As for many tests, but the creation of a working config needs to be
safe in the presence of concurrent activity.
* The created config needs to be sharable with *other* external
processes (e.g. the buildmaster may want to talk to the librarian)

5) For low-overhead iteration
* Find an existing external process
* Must 'know' its config a priori
-run tests-
 * check the process is running, do per-test isolation
 * run a test

6) Start a particular server in production
 * I think we should probably -not- have this as a use case: server
management, rotation, graceful setup and tear down are much more
complex than in a testing environment. Instead we may need some
supporting logic around this, in the server bring up/tear down code,
but at least for now that should be considered a separate problem.


If the above set is complete, then I am proposing to combine things in
the following way.
Firstly, because it's a good building block, the 'make run' use case.
Note that the current code duplicates and overlaps with the test
helper code - I'm proposing to consolidate the two. This shouldn't
look too different to our runlaunchpad Service today, except that
we'll have more entry points (to do cleanup etc.).
 - the small helper will do the following for a given service:
   start up the instance
   optionally print a bash snippet with variables (like ssh-agent
   does), including the helper's pid - an example is sketched below
     - this is useful for running up isolated copies
 - main process runs
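
By analogy with ssh-agent's output, the printed snippet might look
something like this (the pid and path are made up; the variable names
follow the per-service convention used further down):

  LP_LIBRARIAN_CONTROL_PID=4242; export LP_LIBRARIAN_CONTROL_PID;
  LP_LIBRARIAN_CONFIGFILE=/tmp/lp-librarian-4242/librarian.conf; export LP_LIBRARIAN_CONFIGFILE;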

This lets us capture useful things from starting it off without
needing a well-known location a priori.

We can even 'run' postgresql in this fashion, and have it return the
DB name to use.
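
A minimal sketch of what such a service object might look like
(PostgresService, the lp_test naming and the config fragment are all
made up; createdb/dropdb are the stock postgres client tools):

import os
import subprocess


class PostgresService(object):
    """Hypothetical service matching the fixture interface sketched below."""

    service_label = 'PGSQL'

    def setUp(self):
        # One throwaway database per instance; its name is the thing
        # other processes need to know.
        self.dbname = 'lp_test_%d' % os.getpid()
        subprocess.check_call(['createdb', self.dbname])
        self.service_config = 'dbname: %s\n' % self.dbname

    def cleanUp(self):
        subprocess.check_call(['dropdb', self.dbname])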


Now, the cheap test iteration case can be addressed:
 - devs run eval `bin/run -i test --daemonise`
   - this outputs all the variables for all the servers started.
 - test code looks for *per-service* information about pid files etc.
   e.g. LP_LIBRARIAN_PIDFILE and LP_LIBRARIAN_CONFIGFILE rather than
LP_PERSISTENT_TEST_SERVICES
 - to kill, eval `bin/test-servers --stop`
   (Which will kill the daemonised wrapper, and unset the environment
variables).
 - If LP_PERSISTENT_TEST_SERVICES is set and a service isn't running,
I propose to error, because I think it usefully indicates a bug in
that external process, and this is easier than detecting both 'not
started yet' and 'started but crashed' - especially given the test
runner's tendency to fork sub-runners.
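
A minimal sketch of that check (check_service_running is a made-up
helper; the LP_*_PIDFILE name follows the per-service convention
above):

import errno
import os


def check_service_running(label):
    """Error loudly if a persistent service isn't actually up."""
    pidfile = os.environ['LP_%s_PIDFILE' % label]
    pid = int(open(pidfile).read())
    try:
        os.kill(pid, 0)  # signal 0 probes for existence without touching it
    except OSError as e:
        if e.errno == errno.ESRCH:
            raise RuntimeError(
                '%s gone: never started, or started and crashed' % label)
        raise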


Concurrent testing then is easy: as long as all the fixtures meet
this contract, and the *default* behaviour is to bring up a unique
instance, everything will come up fine.
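
Something like the following would do for the config-creation side
(make_unique_config and the config contents are illustrative only):

import os
import tempfile


def make_unique_config(service_label):
    """Create a per-instance scratch config, safely under concurrency."""
    # mkdtemp never hands out the same path twice, so parallel test
    # runs (or a buildmaster talking to the librarian) cannot collide.
    root = tempfile.mkdtemp(prefix='lp-%s-' % service_label.lower())
    config_path = os.path.join(root, 'service.conf')
    with open(config_path, 'w') as config:
        config.write('root: %s\n' % root)
    return config_path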


Note that in this model there is never a need to do more than 'kill
helper-pid' to shut something down: the helper pid encapsulates all
the cleanup logic - kill -9'ing, dropdb of temporary DBs and so on -
and the helper code should be simple and robust. This will help give
us a simple, robust interface.
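
So the stop side of any driver reduces to a couple of lines
(stop_service is made up; SIGINT is one reasonable choice of signal
for the helper to catch):

import os
import signal


def stop_service(label):
    # One signal to the helper; everything past this point is its problem.
    control_pid = int(os.environ['LP_%s_CONTROL_PID' % label])
    os.kill(control_pid, signal.SIGINT)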


In the python code, I think something like the following will do:

class ExternalService(Fixture):
    """An external service used by Launchpad.

    :ivar service_config: A config fragment with the variables for this
        service.
    :ivar service_label: The label for the service, e.g. LIBRARIAN. Used
        in generating environment variable names.
    """

    def setUp(self, use_existing=False, unique_instance=True):
        """Set up an external service.

        :param use_existing: If True, look for and use an existing instance.
        :param unique_instance: If False, use the LP_CONFIG service
            definition. Otherwise, create a new service definition for
            this instance, which can be found on self.service_config.
        """

    def reset(self):
        """Ensure the service is running and ready for another client.

        Any state accumulated since setUp is discarded or ignored
(which is up to the service implementation).
        """

    def cleanUp(self):
        """Shut down the service and remove any state created as part
        of setUp."""
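
In a test, usage would then be the usual fixture dance - something
like this, assuming a testtools-style useFixture helper and a
hypothetical LibrarianService implementing the interface:

from testtools import TestCase


class TestWithLibrarian(TestCase):

    def setUp(self):
        super(TestWithLibrarian, self).setUp()
        # useFixture calls setUp now and schedules cleanUp for teardown;
        # the defaults give us a fresh, unique instance.
        self.librarian = self.useFixture(LibrarianService())

For the many-tests case, reset() slots in between tests via whatever
suite-level optimisation calls it.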


The wrapper helper becomes something like the following (but with
private stdout/stderr to avoid race conditions and console spew).

import os
import signal
import sys


def wrapper_process(service):
    pid = os.fork()
    if pid:
        # Parent: wait until the child reports the service is up, tell
        # the caller the helper's pid, then get out of the way.
        wait_for_ok(..)
        print "LP_%s_CONTROL_PID %d" % (service.service_label, pid)
        sys.exit(0)
    # Child: own the service for its whole lifetime.
    service.setUp()
    try:
        print "LP_%s_CONFIGFILE %s" % (service.service_label,
            stash_config(service.service_config))
        # Sleep until signalled; SIGINT unwinds through the finally.
        signal.pause()
    finally:
        service.cleanUp()
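
Driving that for a given service would then be a one-liner
(LibrarianService again being hypothetical):

if __name__ == '__main__':
    # Prints the LP_* lines for the shell to eval; the forked child
    # then holds the service open until signalled.
    wrapper_process(LibrarianService())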


Note that reset() with persistent test services is obviously a little
more limited.

What do you think?

(Note that atexit is equivalent to the above code: it's not at all
useful except when a normal unwind occurs.)

-Rob
