Hello guys,

This is a v2 of the multi tests RFC, previously known as multi-host RFC.


    v2: Rewritten from scratch
    v2: Added examples for the demonstration to avoid confusion
    v2: Removed the mht format (which was there to demonstrate manual execution)
    v2: Added 2 solutions for multi-tests
    v2: Described ways to support synchronization

The problem

A user wants to run netperf on 2 machines, which requires the following manual steps:

    machine1: netserver -D
    machine1: # Wait till netserver is initialized
    machine2: netperf -H $machine1 -l 60
    machine2: # Wait till it finishes and store the results
    machine1: # stop the netserver and report possible failures

Other use cases might be:

1. triggering several unrelated tests in parallel
2. triggering several tests in parallel with synchronization
3. spreading several tests across multiple machines
4. triggering several different tests on multiple machines

The problem is not only about running tests on multiple machines, but more generally about ways to trigger tests or sets of tests in whatever way the user needs.

Running the tests

In v1 we rejected the idea of running custom code from inside the tests in the background, as it requires implementing the remote tests again and again. We decided that executing full tests or sets of tests, with support for remote synchronization and data exchange, is the way to go. There are two or three bigger categories of solutions, so let's describe each of them so we can pick the most suitable one (at this moment).

For demonstration purposes I'll write a very simple multi-host test which triggers "/usr/bin/wget example.org" on 3 machines to simulate a very basic stress test.

Synchronization and parametrization will not be covered in this section. Synchronization is described in the next chapter and is the same for all solutions, and parametrization is a standard avocado feature.

Internal API

One of the ways to allow people to trigger tests and sets of tests (jobs) from inside a test is to pick the minimal required set of internal API which handles remote job execution, make it public (and supported) and refactor it so it can realistically be called from inside a test.

Example (pseudocode)

    class WgetExample(avocado.Test):
        def test(self):
            # Addresses of the 3 machines (left empty here)
            machines = ["", "", ""]
            jobs = []
            for machine in machines:
                # "remote" is a hypothetical argument selecting the machine
                jobs.append(avocado.Job(urls=["/usr/bin/wget example.org"],
                                        remote=machine))
            for job in jobs:
                job.start()     # hypothetical: run the job in background
            errors = []
            for i, job in enumerate(jobs):
                result = job.wait()     # returns json results
                if result["pass"] != result["total"]:
                    errors.append("Tests on worker %s (%s) failed"
                                  % (i, machines[i]))
            if errors:
                self.fail("Some workers failed:\n%s" % "\n".join(errors))

Alternatively we could even require the user to define the whole workflow:

1. discover test (loader)
2. add params/variants
3. setup remote execution (RemoteTestRunner)
4. setup results (RemoteResults)

which would require even more internal API to be turned public.

+ easy to develop, we simply identify a set of classes and make them public
- hard to maintain as the API would have to stay stable, therefore it realistically requires a big cleanup before taking this step

Multi-tests API

To avoid the need to make the API which drives testing public, we can instead introduce an API to trigger jobs or sets of jobs. It would be a sort of proxy between the internal API, which can and does change more often, and the public multi-test API, which would be supported and kept stable.

I see two basic backends supporting this API, but they both share the same public API.

Example (pseudocode)

    class WgetExample(avocado.MultiTest):
        def test(self):
            # Addresses of the 3 machines (left empty here)
            for machine in ["", "", ""]:
                self.add_worker(machine)
            for worker in self.workers:
                worker.add_test("/usr/bin/wget example.org")
            #results = self.wait()
            #if results["failures"]:
            #    self.fail(results["failures"])
            self.run()  # does the above

The basic set of API should contain:

* MultiTest.workers - list of defined workers
* MultiTest.add_worker(machine="localhost") - to add new sub-job
* MultiTest.run(timeout=None) - to start all workers, wait for results and fail the current test if any of the workers reported failure
* MultiTest.start() - start testing in background (allow this test to monitor or interact with the workers)
* MultiTest.wait(timeout=None) - wait till all workers finish
* Worker.add_test(url) - add test to be executed
* Worker.add_tests(urls) - add list of tests to be executed
* Worker.abort() - abort the execution
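
To make the shape of this API a bit more concrete, here is a minimal sketch in plain Python. It is not avocado code: the "backend" callable (which receives a worker and returns a list of failure strings) is an assumption standing in for whichever backend gets chosen, and the WorkerFailure exception is a hypothetical replacement for failing the current test.

```python
import threading


class WorkerFailure(Exception):
    """Hypothetical stand-in for failing the current test."""


class Worker:
    def __init__(self, machine):
        self.machine = machine
        self.urls = []
        self.aborted = False

    def add_test(self, url):
        self.urls.append(url)

    def add_tests(self, urls):
        self.urls.extend(urls)

    def abort(self):
        # Only records the request; a real backend would have to act on it
        self.aborted = True


class MultiTest:
    def __init__(self, backend):
        # backend(worker) -> list of failure strings (assumed interface)
        self.backend = backend
        self.workers = []
        self._threads = []
        self._failures = []
        self._lock = threading.Lock()

    def add_worker(self, machine="localhost"):
        worker = Worker(machine)
        self.workers.append(worker)
        return worker

    def _run_worker(self, worker):
        failures = self.backend(worker)
        with self._lock:
            self._failures.extend(failures)

    def start(self):
        """Start all workers in the background."""
        for worker in self.workers:
            thread = threading.Thread(target=self._run_worker, args=(worker,))
            thread.start()
            self._threads.append(thread)

    def wait(self, timeout=None):
        """Wait for the workers (timeout applies per worker) and return results."""
        for thread in self._threads:
            thread.join(timeout)
        with self._lock:
            return {"failures": list(self._failures)}

    def run(self, timeout=None):
        """start() + wait(), raising if any worker reported a failure."""
        self.start()
        results = self.wait(timeout)
        if results["failures"]:
            raise WorkerFailure("\n".join(results["failures"]))
```

Keeping the backend behind a single callable is what would let both backends described below share the same public API.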

I didn't want to talk about params but they are essential for multi-tests. I think we should allow passing default params for all tests:

* Worker.params(params) - where params should be in any supported format by Test class (currently AvocadoParams or dict)

or per test during "add_test":

* Worker.add_test(url, params=None) - again, params should be any supported format (currently only possible via internal API, but even without multi-tests I'm fighting for such support on the command line)

Another option could be to allow supplying all "test" arguments using **kwargs inside the "add_test":

* Worker.add_test(url, **kwargs) -> discover_url and override test arguments if provided (currently only possible via internal API, probably never possible on the command line, but the arguments are methodName, name, params, base_logdir, tag, job, runner_queue and I don't see a value in overriding any of them but the params)
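
The interplay of worker-wide defaults and per-test params could be sketched like this. Plain dicts stand in for AvocadoParams, the setter is called set_params (the name used in the netperf example further down), and the rule that per-test values override the worker defaults is my assumption, not something this RFC settles.

```python
class Worker:
    def __init__(self):
        self.default_params = {}
        self.tests = []     # list of (url, params) pairs

    def set_params(self, params):
        """Set default params for all tests of this worker."""
        self.default_params = dict(params)

    def add_test(self, url, params=None):
        """Add a test, optionally with params specific to this test."""
        self.tests.append((url, dict(params or {})))

    def resolved_tests(self):
        """Return (url, params) pairs with defaults merged in."""
        resolved = []
        for url, params in self.tests:
            merged = dict(self.default_params)
            merged.update(params)   # assumed: per-test values win
            resolved.append((url, merged))
        return resolved
```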

API backed by internal API

This would implement the multi-test API using the internal API (from avocado.core).

+ runs native python
+ easy interaction and development
+ easily extensible by either using internal API (and risk changes) or by inheriting and extending the features.
- lots of internal API will be involved, thus with almost every change of internal API we'd have to adjust this code to keep the MultiTest working
- fabric/paramiko is not thread/parallel process safe and fails badly so first we'd have to rewrite our remote execution code (use autotest's worker, or aexpect+ssh)

API backed by cmdline

This would implement the multi-test API by translating it into "avocado run" commands during "self.start()".

+ easy to debug as users are used to the "avocado run" syntax and issues
+ allows manual mode where users trigger the "avocado run" manually
+ cmdline args are part of public API so they should stay stable
+ no issues with fabric/paramiko as each process is separate
+ even easier extensible as one just needs to implement the feature for "avocado run" and then can use it as extra_params in the worker, or send PR to support it in the stable environment.
- only features available on the cmdline can be supported (currently not limiting)
- rely on stdout parsing (but avocado supports machine readable output)
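
As a rough illustration of this backend, the translation of one worker into a command could look like the sketch below. The function names are hypothetical, reaching remote machines by wrapping the command in ssh is only one possible assumption, and the "pass"/"total" keys are the ones used in the internal API example above rather than a verified description of avocado's output.

```python
import json
import shlex


def build_command(machine, urls, extra_params=()):
    """Translate one worker into an argv list for subprocess."""
    cmd = ["avocado", "run"] + list(urls) + list(extra_params)
    if machine in (None, "localhost"):
        return cmd
    # Assumption: remote workers are reached by wrapping the command in ssh,
    # which takes the remote command as a single, shell-quoted string
    return ["ssh", machine, " ".join(shlex.quote(arg) for arg in cmd)]


def worker_failed(json_text):
    """Decide pass/fail from machine readable results (assumed keys)."""
    result = json.loads(json_text)
    return result["pass"] != result["total"]
```

Because the output is just an argv list, the same command can be printed for the user to run manually, which is what makes this backend easy to debug.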


Synchronization

Some tests do not need any synchronization, users just need to run them. But some multi-tests need to be synchronized or need to exchange data. For synchronization, "barriers" are usually used, where a barrier requires a "name" and a "number of clients". When a client requests entry into a barrier-guarded section, it is blocked until "number of clients" clients are waiting for it (or the timeout is reached).
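
The barrier semantics can be tried out locally with Python 3's threading.Barrier. This is only a single-process illustration: the two threads stand in for the two machines from the netperf example, the variable names stand in for the barrier "name", and nothing here goes over the network.

```python
import threading

events = []
setup = threading.Barrier(2)        # "number of clients" is 2
finished = threading.Barrier(2)


def server():
    events.append("server started")
    setup.wait(timeout=5)           # blocks until the client arrives too
    finished.wait(timeout=5)        # keep serving until the client is done
    events.append("server stopped")


def client():
    setup.wait(timeout=5)           # do not start before the server is up
    events.append("client measured")
    finished.wait(timeout=5)


threads = [threading.Thread(target=server), threading.Thread(target=client)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(events)   # ['server started', 'client measured', 'server stopped']
```

Whichever thread gets scheduled first, the two barriers force the same order of events, which is exactly what the manual "wait till ..." steps in the introduction achieve by hand.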

To do so the test needs an IP address+port where the synchronization server is listening. We can start this server from the multi-test and only support it this way:

    self.sync_server.start(addr=None, port=None)  # start listening
    self.sync_server.stop()    # stop listening
    self.sync_server.details   # contact information to be used by workers

Alternatively we might even support this on the command line to allow manual execution:

    --sync-server [addr[:port]] - listen on addr:port (pick one by default)
    --sync addr:port - when barrier/data exchange is used, use addr:port to contact sync server.

The cmdline argument would allow manual executions, for example for testing purposes or execution inside custom build systems (jenkins, beaker, ...) without the multi-test support.

The result is the same, avocado listens on some port and the spawned workers connect to this port, identify themselves and ask for barriers/data exchange, with the support for re-connection. To do so we have various possibilities:

Standard multiprocess API

Python's standard multiprocessing library supports synchronization over TCP. The only problem is that "barriers" were introduced in python3, so we'd have to backport them, and the library does not fit all our needs, so we'd have to tweak it a bit.

Autotest's syncdata

Python 2.4 friendly, supports barriers and data synchronization. On the other hand it's quite hackish and full of shortcuts.

Custom code

We can take inspiration from the above and create a simple human-readable (easy to debug or interact with manually) protocol to support barriers and data exchange via pickling. IMO that would be easier to maintain than backporting and adjusting multiprocessing or fixing the autotest syncdata. A proof-of-concept can be found here:


It modifies the "passtest" so that it only proceeds when it is executed by 2 workers at the same time. It does not support the multi-tests yet, so one has to run "avocado run passtest" twice using the same sync server (once with "--sync-server" and once with "--sync").
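
The proof-of-concept itself is not reproduced here. Purely to illustrate what a human-readable barrier protocol of this kind could look like, here is a small sketch: the BARRIER/OK/TIMEOUT words and all class and function names are made up for this example, and data exchange and re-connection are left out.

```python
import socket
import socketserver
import threading


class SyncHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One text command per line: "BARRIER <name> <number of clients>"
        for raw in self.rfile:
            parts = raw.decode().split()
            if (len(parts) == 3 and parts[0] == "BARRIER"
                    and parts[2].isdigit()):
                ok = self.server.enter(parts[1], int(parts[2]))
                self.wfile.write(b"OK\n" if ok else b"TIMEOUT\n")
            else:
                self.wfile.write(b"ERROR unknown command\n")


class SyncServer(socketserver.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True

    def __init__(self, addr=("127.0.0.1", 0), barrier_timeout=30):
        super().__init__(addr, SyncHandler)    # port 0: pick a free port
        self.barrier_timeout = barrier_timeout
        self._cond = threading.Condition()
        self._waiting = {}      # name -> clients currently waiting
        self._released = {}     # name -> how many times it was released

    def enter(self, name, clients):
        """Block the calling handler until `clients` clients entered `name`."""
        with self._cond:
            generation = self._released.get(name, 0)
            self._waiting[name] = self._waiting.get(name, 0) + 1
            if self._waiting[name] >= clients:
                # Last client to arrive releases everybody
                self._waiting[name] = 0
                self._released[name] = generation + 1
                self._cond.notify_all()
                return True
            released = self._cond.wait_for(
                lambda: self._released.get(name, 0) != generation,
                self.barrier_timeout)
            if not released:
                self._waiting[name] -= 1    # give up our slot on timeout
            return released


def barrier(address, name, clients):
    """Client side: returns True once the server releases the barrier."""
    with socket.create_connection(address) as sock:
        sock.sendall(("BARRIER %s %d\n" % (name, clients)).encode())
        with sock.makefile() as reply:
            return reply.readline().strip() == "OK"
```

Since the protocol is plain text, a stuck barrier could be entered by hand (for example with telnet or nc) by typing "BARRIER setup 2".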


Conclusion

Given the reasons above, I like the idea of the "API backed by cmdline", as all cmdline options are stable and the output is machine readable and known to users, so it is easy to debug manually.

For synchronization, that requires the "--sync" and "--sync-server" arguments to be present, although users of the multi-test do not necessarily have to use them directly (the multi-test can start the server if it is not already started and add "--sync" for each worker if not provided).

The netperf example from introduction would look like this:

The client tests are ordinary "avocado.Test" tests that can even be executed manually without any synchronization (by providing no_clients=1)

    class NetServer(avocado.Test):
        def setUp(self):
            # Assumed: start the (daemonizing) server before the barrier
            process.run("netserver")
            self.barrier("setup", self.params.get("no_clients"))
        def test(self):
            pass    # nothing to run here, the clients do the measuring
        def tearDown(self):
            self.barrier("finished", self.params.get("no_clients"))
            process.run("killall netserver")

    class NetPerf(avocado.Test):
        def setUp(self):
            self.barrier("setup", self.params.get("no_clients"))
        def test(self):
            process.run("netperf -H %s -l 60"
                        % self.params.get("server_ip"))
            self.barrier("finished", self.params.get("no_clients"))

One would be able to run this manually (or from build systems) using:

    avocado run NetServer --sync-server $IP:12345 &
    avocado run NetPerf --sync $IP:12345 &

(one would have to hardcode or provide the "no_clients" and "server_ip" params on the cmdline)

The NetPerf test would then wait till NetServer is initialized and run the test, while NetServer would wait till it finishes. For some users this is sufficient, but let's add the multi-test test to get a single result (pseudocode):

    class MultiNetperf(avocado.MultiTest):
        def test(self):
            machines = self.params.get("machines")
            assert len(machines) > 1
            for machine in machines:
                self.add_worker(machine, sync=True)     # enable sync server
            # The first machine is the server, the rest are clients
            self.workers[0].add_test("NetServer")
            self.workers[0].set_params({"no_clients": len(self.workers)})
            for worker in self.workers[1:]:
                worker.add_test("NetPerf")
                worker.set_params({"no_clients": len(self.workers),
                                   "server_ip": machines[0]})
            self.run()

Running:

    avocado run MultiNetperf

would run a single test which, based on the params given to the test, would run on several machines, using the first machine as the server and the rest as clients, and all of them would start at the same time.

It'd produce a single result with one test id and the following structure (example):

    $ tree $RESULTDIR
      └── test-results
          └── simple.mht
              ├── job.log
              ├── 1
              │   └── job.log
              └── 2
                  └── job.log

where 1 and 2 are the results of worker 1 and worker 2. With all of the proposed solutions, those directories would give users the standard results as they know them from normal avocado executions, each with a unique id, which should help with analyzing and debugging the results.

Avocado-devel mailing list
