On Mon, Oct 4, 2010 at 11:40 AM, Michael Hudson <[email protected]> wrote:
> On Mon, 27 Sep 2010 10:41:51 +1200, Robert Collins
> <[email protected]> wrote:
>> I've hit my tolerance level for stale librarian processes and am
>> looking to address this in the test environment. I want to make sure I
>> preserve existing use cases - please tell me if I have missed any.
>
> I freely admit that for whatever reason I'm not thinking very clearly
> today, so I may be being dense and not understanding things. But I hope
> I'm making some kind of sense at least. I also started to reply before
> reading the whole mail, sorry about that...
Thanks for digging into this in detail.

>> 1) Run an external process for the life of *a* test
>
> I think that there are two subcases here, one for a server-type process
> where you start the process, get it to do some things and then shut it
> down (the librarian would be a prototypical example, I guess) and the
> case where you want to invoke a script to do something (e.g. the script
> that performs a code import).
>
> I also think there are two cases when it comes to intent: one where the
> external process is part of the test fixture and one where it's the
> system under test.
>
> For all that, I don't think the implementation needs to take this into
> account very much -- I think the problems each case faces are about the
> same.
>
>> * create a working config for it
>> * start it
>
> It's a trivial point, but one that AIUI requires some engineering time:
> we need to start it, *using the previously created config*. I guess we
> can do this using the LPCONFIG variable -- maybe we establish a
> convention that if LPCONFIG starts with a / it is interpreted as an
> absolute path to a directory containing config files. We also need to
> make sure that lazr.config can serialize itself out to a config file
> that it can read in again.

We don't need lazr.config serialization, as long as we can hand-write
(per helper) the fragment as a string template. What we need is the
ability to combine multiple fragments, e.g. inheriting (for a config on
disk) / push() (for an in-memory fragment). In the simplest fashion we
can create a config named after the working dir we create for the
helper, if absolute paths don't Just Work - that will occasionally
leave grot around on disk in the tree, but it's easily cleaned, and
bugs can be filed to improve.
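Roughly what I'm picturing, as an untested sketch - the fragment
template and the function name are made up, and the bare 'extends:'
line follows the example you give further down:

import os.path
import tempfile

# Hypothetical fragment owned by the librarian helper; the helper
# fills in the values it allocated at startup.
LIBRARIAN_FRAGMENT = """\
[librarian]
root: %(root)s
port: %(port)d
"""

def write_config(basedir, parent_config, *fragments):
    # One config dir per helper working dir: 'extends:' gives us the
    # on-disk inheritance, and concatenating fragments combines them.
    confdir = tempfile.mkdtemp(dir=basedir)
    config_file = open(os.path.join(confdir, 'launchpad.conf'), 'w')
    try:
        config_file.write('extends: %s/launchpad.conf\n' % parent_config)
        for fragment in fragments:
            config_file.write(fragment)
    finally:
        config_file.close()
    return confdir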
> These may both be already implemented for all I know. But I don't think
> so, and I'm pretty sure that they're not routinely utilized.
>
> Using a hidden parameter like an environment variable is somewhat gross
> I guess, but I can't think of a better way. We *could* be super regular
> about having each of our scripts take a --config option and similarly
> use a super regular API in all our tests to invoke each subprocess with
> the appropriate value for this option, but that doesn't seem likely to
> come to pass somehow.

We need the main appserver and *everything* to take this option, and to
apply it to any child processes they start. So I'm pretty sure an
option isn't anywhere near as easy as an environment variable.

>> * run test
>> * kill it
>> * clean up any overhead left in the work environment [that we care about]
>>
>> 2) For many tests
>> * create working config
>> * start it
>> -run tests-
>> * check it's still ok and do per-test isolation
>> * run a test
>> * kill it
>> * clean up
>
> In some sense, this doesn't seem interestingly different to the above
> problem, you just start and stop the process in different places.

The main difference is needing a per-test step, which in layers is
'setUpTest' and 'tearDownTest'. I'm not convinced that having two
separate calls is needed; I think one call (reset) is sufficient.

> Again, there's the issue of propagating 'which config to use' around.
>
>> 3) For 'make run'
>> * Use a small commandline tool that
>>   * creates working config
>>   * starts it
>>   * waits for SIGINT
>>   * stops it
>>   * cleans up
>>
>> 4) for concurrent testing
>> * As for many tests, but the creation of a working config needs to be
>>   safe in the presence of concurrent activity.
>> * The created config needs to be sharable with *other* external
>>   processes (e.g. the buildmaster may want to talk to the librarian)
>
> Putting this bullet here doesn't really make sense to me, although I
> agree it's necessary. We already have many tests that invoke scripts
> that need to talk to the librarian, for example.

We don't have any helpers that depend on a tree of dynamically created
config fragments at the moment, and we have to make sure that we can
glue them together. For instance, without this point we might:
- start unique helpers by running an external process which returns a
  config
- not give helpers a special config, and just apply the config they
  spit out in-process.
So yes, it's necessary, and we could easily implement this in a way
that fails that requirement if we don't state it.

>> 5) For low-overhead iteration
>> * Find an existing external process
>> * Must 'know' its config a-priori
>> -run tests-
>> * check the process is running, do per-test isolation
>> * run a test
>
> Hm. This feels a bit tricky, but I guess you don't need this at the
> same time as concurrent testing...

Ideally starting things like appservers and the librarian wouldn't take
6-7 seconds. Removing ZCML will probably fix that (and then we can drop
the low-overhead iteration story completely).

>> If the above set is complete, then I am proposing to combine things in
>> the following way.
>> Firstly, because it's a good building block: the make run use case.
>> Note that the current code is duplicative/overlapping with the test
>> helper code - I'm proposing to consolidate this. This shouldn't look
>> too different to our runlaunchpad Service today, except that we'll
>> have more entry points (to do cleanup etc).
>> - the small helper will do the following for a given service:
>>   start up the instance
>>   optionally print a bash snippet with variables (like ssh-agent
>>   does), including the helper's pid
>>   - this is useful for running up isolated copies
>> - main process runs
>
> I think I agree that the code we use in the 'make run' case is nicer and
> has a nicer API than the code we use in the tests, but I'm not sure that
> the make run use case is very much like the concurrent testing case --
> partly because of state that's embedded in things like the apache config
> and partly because the way you interact with things in the 'make run'
> case doesn't have a convenient parent process/child process relationship
> to know which running instance we should be talking to.

I don't quite follow you here. I'm saying that there is a DAG (nothing
has mutual dependencies - this is true today, and if it stops being
true we could separate 'allocate' and 'execute' to get back to a DAG);
given that, we have a parent->child structure we can create. It won't
be process based; rather, we'd bring up helpers and inject each one's
returned config into the next, as per the DAG.

>> This lets us capture useful things from starting it off without
>> needing a well known location a-priori.
>
> To state my above point again: in this case, sure you can start it off
> without needing a well-known location, but how do you find it again?

It returns it as shell variables - 'bash snippets'.

>> We can even 'run' postgresql in this fashion, and have it return the
>> DB name to use.
>
> Oh, I see, maybe you're thinking of this more at the level of how the
> 'make run' code builds up the various interacting processes it needs to
> start?

Yes.
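Something like this, as a sketch only - nothing here exists yet, and
the helper attributes (.name, .deps, .start) are invented:

# Bring helpers up in dependency order, injecting the config returned
# by each helper into the helpers that depend on it.
def start_services(helpers):
    # helpers is already topologically sorted - it's a DAG.
    configs = {}
    for helper in helpers:
        inherited = [configs[name] for name in helper.deps]
        configs[helper.name] = helper.start(extends=inherited)
    return configs

# e.g. start_services([librarian, buildmaster]), where buildmaster has
# deps = ['librarian'] and so gets told where the librarian ended up.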
>> Now, the cheap test iteration case can be addressed:
>> - devs run eval `bin/run -i test --daemonise`
>> - this outputs all the variables for all the servers started.
>> - test code looks for *per-service* information about pid files etc.,
>>   e.g. LP_LIBRARIAN_PIDFILE and LP_LIBRARIAN_CONFIGFILE rather than
>>   LP_PERSISTENT_TEST_SERVICES
>> - to kill, eval `bin/test-servers --stop`
>>   (Which will kill the daemonised wrapper, and unset the environment
>>   variables).
>> - If LP_PERSISTENT_TEST_SERVICES is set and a service isn't running,
>>   I propose to error, because I think it usefully indicates a bug in
>>   that external process, and this is easier than detecting both 'not
>>   started yet' and 'started but crashed' - especially given the test
>>   runner's tendency to fork sub runners.
>>
>> Concurrent testing then is easy: as long as all the fixtures are
>> meeting this contract, if the *default* behaviour is to bring up a
>> unique instance, everything will come up fine.
>
> Well, "easy": I think we'll find we have a lot of absolute paths in our
> testrunner config currently, and these will need to be shaken out before
> we can reliably run tests in parallel on the same machine. I think that
> if we can solve the external process problem, the other problems will be
> shallow even if they might be numerous, but maybe you are focusing a bit
> too much on the process case here.
>
> To try to be super clear, here's an example/use case: the acceptance
> tests for the scanner currently create branches in a known location
> (/var/tmp/bazaar.launchpad.net/mirrors I think), invoke the scanner
> script and check that the database has been updated. In the new world,
> I guess they would create some branches in a temporary directory and
> then use some api to set the supermirror.hosted_branches_by_id (iirc)
> config key to point at this temporary directory and invoke the scanner
> script and check the database as before.

Yes. Specifically I would probably layer it like this:

branches = BranchHostingFixture()
scanner = ScannerFixture(branches)
branches.setUp()
scanner.setUp()
# make branches in branches.mirror_dir
scanner.trigger_scan()

> This mythical API would have to work in a way that crosses process
> boundaries, possibly by creating yet another temporary directory,
> creating a file in it called launchpad.conf that looks like this:
>
> extends: $previous-value-of-LPCONFIG/launchpad.conf
>
> [supermirror]
> hosted_branches_by_id: file:///path/to/tmp/dir

Created by branches.setUp(); found by scanner.setUp().

> and update the LPCONFIG environment variable to point at this directory
> (as well as updating the in-process in-memory config). It would
> presumably be organized as a fixture to make undoing all this easy.
>
> Does this make sense?

Yes.
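And the config-pushing part could be a fixture along these lines (a
hand-waving sketch: ConfigFixture is a made-up name, and I'm assuming
the Fixture/addCleanup API from the fixtures package):

import os
import shutil
import tempfile

from fixtures import Fixture

class ConfigFixture(Fixture):
    """Push a config fragment for the lifetime of the fixture.

    Writes a config dir extending the current LPCONFIG and points
    LPCONFIG at it, so child processes pick the fragment up too.
    """

    def __init__(self, fragment):
        self.fragment = fragment

    def setUp(self):
        super(ConfigFixture, self).setUp()
        confdir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, confdir)
        previous = os.environ.get('LPCONFIG', 'testrunner')
        config_file = open(os.path.join(confdir, 'launchpad.conf'), 'w')
        try:
            config_file.write('extends: %s/launchpad.conf\n' % previous)
            config_file.write(self.fragment)
        finally:
            config_file.close()
        os.environ['LPCONFIG'] = confdir
        self.addCleanup(os.environ.__setitem__, 'LPCONFIG', previous)
        # Updating the in-process config (push()/pop()) would go here
        # as well.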
>> Note that in this model there is never a need to do more than 'kill
>> helper-pid' to shut something down: that helper pid will encapsulate
>> all the cleanup logic, kill-9ing, dropdb of temporary dbs etc, and the
>> helper code should be simple and robust. This will help give us a
>> simple, robust interface.
>
> Right, that bit sounds nice. We should of course work on fixing it, but
> I don't think we can rely on our processes shutting down nicely when
> asked. Having a wrapper that does that sounds good.
>
>> In the python code, I think something like the following will do:
>>
>> class ExternalService(Fixture):
>>     """An external service used by Launchpad.
>>
>>     :ivar service_config: A config fragment with the variables for
>>         this service.
>>     :ivar service_label: the label for the service, e.g. LIBRARIAN.
>>         Used in generating environment variable names.
>>     """
>>
>>     def setUp(self, use_existing=False, unique_instance=True):
>>         """Set up an external service.
>>
>>         :param use_existing: If True, look for and use an existing
>>             instance.
>>         :param unique_instance: If False use the LP_CONFIG service
>>             definition. Otherwise, create a new service definition
>>             for this instance, which can be found on
>>             self.service_config.
>>         """
>>
>>     def reset(self):
>>         """Ensure the service is running and ready for another client.
>>
>>         Any state accumulated since setUp is discarded or ignored
>>         (which is up to the service implementation).
>>         """
>>
>>     def cleanUp(self):
>>         """Shutdown the service and remove any state created as part
>>         of setUp."""
>>
>> The wrapper helper becomes something like the following (but with
>> private stdout/stderr to avoid race conditions and console spew).
>>
>> def wrapper_process(service):
>>     pid = os.fork()
>>     if pid:
>>         wait_for_ok(..)
>>         print "LP_%s_CONTROL_PID %d" % (service.service_label, pid)
>>         exit()
>>     service.setUp()
>>     try:
>>         print "LP_%s_CONFIGFILE %s" % (service.service_label,
>>             stash_config(service.service_config))
>>         signal.pause()
>>     finally:
>>         service.cleanUp()
>>
>> Note that reset() with persistent test services is obviously a little
>> more limited.
>>
>> What do you think?
>
> I think it by and large makes sense. The only part that I think might
> have to change is that you talk about environment variables like
> LP_LIBRARIAN_CONFIGFILE, but in fact in Launchpad all services are
> configured from one config file. So I think we'll end up managing a
> stack of config files, and pushing and popping from the stack will be a
> matter of updating where LPCONFIG points.
>
> Apologies for the slightly rambling reply, I hope it's useful.

It is indeed useful. One misconception that this has shown up is that
we have 'one config file'. We -don't-. We have many config files,
smushed into one big file for testing, but in production we have
*mutually exclusive* values in the configs for 'librarianX' and
'appserverY', due to the difference between 'how a service runs' and
'how a service is located'. As such, I think it's best to model that
directly, and glue the 'how to locate a service' bits together
just-in-time for each service that we're starting. It's really just
dependency injection writ large.
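To make that split concrete (another invented sketch - the section and
key names are illustrative, and start_process stands in for whatever
actually launches the process):

# A helper owns both halves of its config. The 'run' half is private
# to the process we start; the 'locate' half is what we hand to
# anything that depends on the service.
RUN_TEMPLATE = """\
[librarian_server]
root: %(root)s
port: %(port)d
"""

LOCATE_TEMPLATE = """\
[librarian]
download_url: http://localhost:%(port)d/
"""

def start_librarian(workdir, port, start_process):
    params = {'root': workdir, 'port': port}
    start_process(RUN_TEMPLATE % params)  # how the service runs
    return LOCATE_TEMPLATE % params       # how the service is located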
-Rob
