On 4.4.2016 at 07:14, Cleber Rosa wrote:
On 03/31/2016 12:55 PM, Lukáš Doktor wrote:
Hello guys,
This is a v2 of the multi tests RFC, previously known as multi-host RFC.
Changes:
v2: Rewritten from scratch
v2: Added examples for the demonstration to avoid confusion
v2: Removed the mht format (which was there to demonstrate manual
execution)
v2: Added 2 solutions for multi-tests
v2: Described ways to support synchronization
The problem
===========
I believe a formal definition of the problem may help us keep the
possible solutions in closer sight. I would describe the problem as:
"Allow tests to have some of its blocks of code run in separate
stream(s)[1]".
Streams sounds good.
"Blocks of code", for now, is a rather abstract concept, to be discussed
later.
A user wants to run netperf on 2 machines, which requires the following
manual steps:
machine1: netserver -D
machine1: # Wait till netserver is initialized
machine2: netperf -H $machine1 -l 60
machine2: # Wait till it finishes and store the results
machine1: # stop the netserver and report possible failures
Using the definition given above, all code prefixed with "machine1:"
would be one execution stream and all code prefixed with "machine2:"
would be a second stream.
The test itself would be a single entity, composed of its own code in
addition to the code to be run on machine1 and machine2, as covered before.
yep
Other use-cases might be:
Using the same definition, these use cases would become:
1. triggering several un-related tests in parallel
"Running blocks of code in parallel".
2. triggering several tests in parallel with synchronization
"Running blocks of code in parallel with synchronization".
3. spreading several tests into multiple machines
"Running blocks of code in multiple external machines". A point here:
this could mean either sequentially or in parallel.
4. triggering several various tests on multiple machines
"Running varied blocks of code in multiple external machines".
The problem is not only about running tests on multiple machines, but
generally about ways to trigger tests/sets of tests in whatever way the
user needs.
Based on the definition given, "running tests on multiple machines" is
not the *direct* scope of this RFC. Running *tests* on multiple machines
(either sequentially or in parallel) could be the scope of a "multi host
*job*" RFC, that is, a Job that encompasses tests that would be run on
multiple different machines. In such a (multi host) job, there would be
a 1:N relationship between a job and machines, and a 1:1 relationship
between a test and a machine.
Probably yes. For clarification, the relation of job:machines:streams
would be 1:1:N for multi-test tests and 1:N:M (N<=M) for multi-host tests.
Running the tests
=================
In v1 we rejected the idea of running custom code from inside the tests
in the background, as it requires re-implementing the remote tests again
and again, and we decided that executing full tests or sets of tests with
support for remote synchronization/data exchange is the way to go. There
were two or three bigger categories, so let's describe each so we can
pick the most suitable one (at this moment).
My current understanding is that the approach of implementing the remote
execution of code based on the execution of multiple "avocado"
instances, with different command line options to reflect the multiple
executions was abandoned.
I understood we rejected the idea of running functions/methods as the
executed blocks, because it does not allow easy sharing of those segments.
For demonstration purposes I'll be writing a very simple multi-host test
which triggers "/usr/bin/wget example.org" on 3 machines to simulate a
very basic stress test.
Again revising all statements in light of the definition given before,
this clearly means one single test, with the same "block of code"
(/usr/bin/wget example.org) to be executed on 3 machines.
Synchronization and parametrization will not be covered in this section:
synchronization will be described in the next chapter and is the same
for all solutions, and parametrization is a standard avocado feature.
Internal API
------------
One of the ways to allow people to trigger tests and sets of tests (jobs)
from inside a test is to pick the minimal required set of internal API
which handles remote job execution, make it public (and supported) and
refactor it so it can realistically be called from inside a test.
Example (pseudocode)

    class WgetExample(avocado.Test):
        def test(self):
            machines = ["127.0.0.1", "192.168.122.2", "192.168.122.3"]
            jobs = []
            for i, machine in enumerate(machines):
                jobs.append(avocado.Job(urls=["/usr/bin/wget example.org"],
                                        remote_machine=machine,
                                        logdir=os.path.join(self.logdir,
                                                            str(i))))
The example given may lead a reader into thinking that the problem being
solved here is one of remote execution of commands. So let's just remind
ourselves that the problem at stake, IMHO, is:
"Allow tests to have some of its blocks of code run in separate stream(s)".
Yep, we could use ["127.0.0.1", "127.0.0.1", "127.0.0.1"]. This example
is only trying to be generic enough to describe the minimal API.
            for job in jobs:
                job.run_background()
            errors = []
            for i, job in enumerate(jobs):
                result = job.wait()    # returns json results
                if result["pass"] != result["total"]:
                    errors.append("Tests on worker %s (%s) failed"
                                  % (i, machines[i]))
            if errors:
                self.fail("Some workers failed:\n%s" % "\n".join(errors))
This example defines a "code block" unit as an Avocado Job. So,
essentially, using the previous definition I gave, the suggestion would
translate to:
"Allow Avocado tests to have Avocado Jobs run in separate stream(s)"
The most striking aspect of this example is of course the use of an
Avocado Job inside an Avocado Test. An Avocado Job, by definition and
implementation, is a "logical container" for tests. Having a *test*
fire *jobs* as part of the official solution crosses the layers we
defined and designed ourselves.
Given that an Avocado Job includes most of the functionality of Avocado
(as a whole), too many questions can be raised with regards to what
aspects of these (intra test) Jobs are to be supported.
To summarize it, I'm skeptical that an Avocado Job should be the "code
block" unit for the problem at hand.
Yes, this is correct. As mentioned below, we can avoid using a Job by
using Loader+RemoteTestRunner+RemoteResults+Multiplexer to achieve this
(only for single tests).
As I wrote, and you supported me in this, solving it this way makes it
hard to distinguish which of the features avocado supports are also
supported in the "nested" job.
Alternatively, we could even require the user to define the whole workflow:
1. discover test (loader)
2. add params/variants
3. setup remote execution (RemoteTestRunner)
4. setup results (RemoteResults)
which would require even more internal API to be turned public.
+ easy to develop, we simply identify a set of classes and make them public
- hard to maintain, as the API would have to stay stable, therefore
realistically it requires a big cleanup before taking this step
Multi-tests API
---------------
To avoid the need to make the API which drives testing public, we can
instead introduce an API to trigger jobs/sets of jobs. It would be a sort
of proxy between the internal API, which can and does change more often,
and the public multi-host API, which would be supported and kept stable.
I see two basic backends supporting this API, but they both share the
same public API.
Example (pseudocode)

    class WgetExample(avocado.MultiTest):
        def test(self):
            for machine in ["127.0.0.1", "192.168.122.2", "192.168.122.3"]:
                self.add_worker(machine)
            for worker in self.workers:
                worker.add_test("/usr/bin/wget example.org")
            #self.start()
            #results = self.wait()
            #if results["failures"]:
            #    self.fail(results["failures"])
            self.run()  # does the above
I have hopes, maybe naive ones, that regular Avocado tests can have some
of their code blocks run on different streams with the aid of some APIs.
What I mean is that Avocado would not *require* a specialized class
for that.
Currently we're talking about ~5-10 methods. I don't like polluting the
`avocado.Test` class, that's why I chose `avocado.MultiTest` instead.
But we can talk about the options:
* `avocado.MultiTest.*` - inherited from `avocado.Test`, adding some
helpers to create and feed the streams (my favorite)
* `avocado.Test.avocado.*` - If you remember I proposed moving
non-critical methods from `Test` to `Test.avocado`. This could be a 1st
class citizen there.
* `avocado.Test.avocado.multi.*` - the same as above but the multi-API
would be inside `multi` object.
* `avocado.Test.multi` - the multi-API would be part of the main Test,
but inside the `multi` object
* `avocado.Test.*` - I'd not support this as we'd be extending the main
interface with yet another bunch of methods many people are not
interested in at all.
Note: imagine whatever keyword you like instead of `multi` (streams,
workers, multihost, nested, ...)
The basic set of API should contain:
* MultiTest.workers - list of defined workers
* MultiTest.add_worker(machine="localhost") - to add a new sub-job (worker)
* MultiTest.run(timeout=None) - to start all workers, wait for the results
and fail the current test if any of the workers reported a failure
* MultiTest.start() - start testing in the background (allowing this test
to monitor or interact with the workers)
* MultiTest.wait(timeout=None) - wait till all workers finish
* Worker.add_test(url) - add a test to be executed
* Worker.add_tests(urls) - add a list of tests to be executed
* Worker.abort() - abort the execution
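
To make the shape of this API more tangible, here is a rough sketch of
how the `MultiTest`/`Worker` surface could be stubbed out. Only the
method signatures come from the list above; the bodies, the `results`
attribute and the background dispatch are assumptions for illustration,
not a proposed implementation:

    import avocado

    class Worker(object):
        def __init__(self, machine="localhost"):
            self.machine = machine
            self.test_urls = []    # tests queued via add_test()/add_tests()
            self.results = []      # assumed: one json result per executed test

        def add_test(self, url):
            self.test_urls.append(url)

        def add_tests(self, urls):
            self.test_urls.extend(urls)

        def abort(self):
            pass    # assumed: stop the remote execution of this worker

    class MultiTest(avocado.Test):
        def __init__(self, *args, **kwargs):
            super(MultiTest, self).__init__(*args, **kwargs)
            self.workers = []

        def add_worker(self, machine="localhost"):
            worker = Worker(machine)
            self.workers.append(worker)
            return worker

        def start(self):
            pass    # assumed: run each worker's queued tests in the background

        def wait(self, timeout=None):
            return {"failures": []}    # assumed: combined results of all workers

        def run(self, timeout=None):
            self.start()
            results = self.wait(timeout)
            if results["failures"]:
                self.fail(results["failures"])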
I didn't want to talk about params, but they are essential for
multi-tests. I think we should allow passing default params for all
tests:
* Worker.params(params) - where params can be in any format supported by
the Test class (currently AvocadoParams or dict)
or per test during "add_test":
* Worker.add_test(url, params=None) - again, params can be in any
supported format (currently only possible via the internal API, but even
without multi-tests I'm fighting for such support on the command line)
Another option could be to allow supplying all "test" arguments using
**kwargs inside "add_test":
* Worker.add_test(url, **kwargs) -> discover the url and override test
arguments if provided (currently only possible via the internal API,
probably never possible on the command line; the arguments are
methodName, name, params, base_logdir, tag, job, runner_queue and I
don't see value in overriding any of them but the params)
This example now suggests an Avocado Test as the "code block" unit. That
is, the problem definition would translate roughly to:
"Allow Avocado tests to have Avocado tests run in separate stream(s)"
We now have a smaller "code block" unit, but still one that can make the
design and definitions a little confusing at first. Questions that
immediately arise:
Q) Is a test the smaller "code block" unit we could use?
A) Definitely not. We could use single code statements, a function or a
class (inheriting from a "builtin" object).
Q) Is it common to have "existing tests" run as part of "multi host" tests?
A) Pretty common (think of benchmark tests).
Q) Is there value in letting developers keep the same development flow
and use the same test APIs for the "code blocks"?
A) IMHO, yes.
Q) Should a test (which is not a container) hold test results?
A) IMHO, no.
I disagree with that. In case of failure it's sometimes better to see
the combined results, but sometimes it's better to dig deeper and see
each test separately (people designed some tests to be executed alone
and only then combined them; they are used to their results).
My understanding is that it is still possible to keep the design
predictable by setting an "Avocado Test" as the code block. One more aspect
that supports this view is that there's consensus and ongoing work to
make an Avocado Test slimmer, and remove from it the machinery needed to
make/help tests run.
Yep, I agree with that, and as I mentioned during our mini-meeting I
think something like standalone execution could be the way to go (not
really standalone, but a very stripped-down execution of the test with
machine-readable results).
This way, from the Avocado design and results perspective we'd still have:
[Job [test #1] [test #2] [...] [test #n]]
And from the developer perspective, we'd have:
    from avocado_misc_tests.perf.stress import Stress

    class StressVMOnHost(avocado.Test):
        def test(self):
            ...
            worker1.add(Stress)
            worker2.add(Stress)
            ...
            if not require_pass(worker1, worker2):
                self.fail(reason)
Worth mentioning what the worker (stream) supports and reports. In my
head it allows execution of multiple tests (in sequence), so in the same
way it should report a list of results. Also I think PASS/FAIL is not
really a sufficient result; it should report a list of json results of
all added tests.
The function `require_pass` would then go through the list and check if
all statuses are `PASS/WARN`, but some users might add custom logic.
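
For illustration, a minimal sketch of such a `require_pass` helper,
assuming each worker exposes its list of json results via a `results`
attribute with a "status" field (both the attribute name and the result
format are assumptions here):

    def require_pass(*workers):
        """Return True only if every test on every worker ended PASS or WARN."""
        for worker in workers:
            for result in worker.results:   # assumed: list of json results
                if result.get("status") not in ("PASS", "WARN"):
                    return False
        return True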
One alternative approach would be to formally introduce the concept of
"code blocks", and allow Avocado Tests to be used as such:
    from avocado_misc_tests.perf.stress import Stress

    class StressVMOnHost(avocado.Test):
        def test(self):
            code_block = code_block_from_test(Stress)
            ...
            worker1.add(code_block)
            worker2.add(code_block)
            ...
            if not require_success(worker1, worker2):
                self.fail(reason)
I'm not sure what `Stress` is. In my vision we should support test
`urls` (so the API discovers them using loaders). The problem is if a
user provides a url which resolves into multiple tests. Should we run
them sequentially? Should we fail? What do we report?
API backed by internal API
~~~~~~~~~~~~~~~~~~~~~~~~~~
This would implement the multi-test API using the internal API (from
avocado.core).
+ runs native python
+ easy interaction and development
+ easily extensible, either by using the internal API (and risking
changes) or by inheriting and extending the features
- lots of internal API would be involved, thus with almost every change
of the internal API we'd have to adjust this code to keep MultiTest
working
- fabric/paramiko is not thread/parallel-process safe and fails badly, so
first we'd have to rewrite our remote execution code (use autotest's
worker, or aexpect+ssh)
Even with the listed challenges (and certainly more to come), this is
the way to go.
I got to this point too, although IMO it requires more work than the
cmdline-backed API. The deal-breaker for me is the support for adding
tests during execution and more flow control.
API backed by cmdline
~~~~~~~~~~~~~~~~~~~~~
This would implement the multi-test API by translating it into "avocado
run" commands during "self.start()".
+ easy to debug as users are used to the "avocado run" syntax and its issues
+ allows a manual mode where users trigger the "avocado run" commands manually
+ cmdline args are part of the public API so they should stay stable
+ no issues with fabric/paramiko as each process is separate
+ even more easily extensible, as one just needs to implement the feature
for "avocado run" and then can use it as extra_params in the worker, or
send a PR to support it in the stable environment
- only features available on the cmdline can be supported (currently not
limiting)
- relies on stdout parsing (but avocado supports machine-readable output)
I wholeheartedly disagree with this implementation suggestion. Some
reasons were given in the response to the previous RFC version.
Yes, I know, but I'm still a bit fond of this version (although as
mentioned earlier I'm more inclined to the internal-API-backed one). The
reason is that __ALL__ cmdline options are supported and should be
stable. That means users could actually pass any "extra_params" matching
their custom plugins and have them supported across versions. Doing the
same for the internal API would require further modifications, as the
internals of the multi API would be non-public API and therefore could
be changing all the time.
Synchronization
===============
Some tests do not need any synchronization; users just need to run
them. But some multi-tests need to be synchronized or need to
exchange data. For synchronization, "barriers" are usually used, where a
barrier requires a "name" and a "number of clients". A client requests
entry into the barrier-guarded section and is blocked until "number of
clients" are waiting for it (or a timeout is reached).
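
In essence, the synchronization server just keeps one barrier per name,
created on first use with the declared number of clients. A minimal
local sketch of those semantics (using python3's threading.Barrier only
for brevity; the name-keyed mapping is my assumption of how it would be
modeled):

    import threading

    _barriers = {}
    _barriers_lock = threading.Lock()

    def barrier(name, no_clients, timeout=None):
        """Block until `no_clients` callers have entered the barrier `name`."""
        with _barriers_lock:
            if name not in _barriers:
                _barriers[name] = threading.Barrier(no_clients)
        _barriers[name].wait(timeout)   # raises BrokenBarrierError on timeout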
To do so the test needs an IP address+port where the synchronization
server is listening. We can start this from the multi-test and only
support it this way:
    self.sync_server.start(addr=None, port=None)  # start listening
    self.sync_server.stop()                       # stop listening
    self.sync_server.details  # contact information to be used by workers
Alternatively we might even support this on the command line to allow
manual execution:
    --sync-server [addr[:port]] - listen on addr:port (pick one by default)
    --sync addr:port            - when barrier/data exchange is used, use
                                  addr:port to contact the sync server
The cmdline argument would allow manual executions, for example for
testing purposes or execution inside custom build systems (jenkins,
beaker, ...) without the multi-test support.
The result is the same: avocado listens on some port and the spawned
workers connect to this port, identify themselves and ask for
barriers/data exchange, with support for re-connection. To implement
this we have several possibilities:
Standard multiprocess API
-------------------------
Python's standard multiprocessing library supports synchronization over
TCP. The only problem is that "barriers" were introduced in python3, so
we'd have to backport them, and it does not fit all our needs, so
we'd have to tweak it a bit.
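
For illustration, one way the stock library could be bent into this
shape is to serve named barriers through a multiprocessing manager (a
sketch assuming python3, where threading.Barrier exists; the name-keyed
lookup is exactly the kind of tweak mentioned above):

    import threading
    from multiprocessing.managers import BaseManager

    _barriers = {}

    def get_barrier(name, no_clients):
        # create the named barrier on first request, reuse it afterwards
        return _barriers.setdefault(name, threading.Barrier(no_clients))

    class BarrierManager(BaseManager):
        pass

    BarrierManager.register("get_barrier", callable=get_barrier)

    # server side (e.g. started by the multi-test):
    #     BarrierManager(address=("", 6001), authkey=b"avocado").start()
    # worker side:
    #     manager = BarrierManager(address=(server_ip, 6001), authkey=b"avocado")
    #     manager.connect()
    #     manager.get_barrier("setup", 2).wait(timeout=60)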
Autotest's syncdata
-------------------
Python 2.4 friendly, supports barriers and data synchronization. On the
other hand, it's quite hackish and full of shortcuts.
Custom code
-----------
We can take inspiration from the above and create a simple human-readable
(easy to debug or interact with manually) protocol to support barriers
and data exchange via pickling. IMO that would be easier to maintain than
backporting and adjusting multiprocessing or fixing the autotest
syncdata. A proof-of-concept can be found here:
https://github.com/avocado-framework/avocado/pull/1019
It modifies the "passtest" so it only runs when it's executed by 2
workers at the same time. It does not support the multi-tests yet, so
one has to run "avocado run passtest" twice against the same sync server
(once with --sync-server and once with --sync).
Conclusion
==========
Given the reasons above, I like the idea of "API backed by cmdline", as
all cmdline options are stable, and the output is machine-readable and
known to users, so it's easy to debug manually.
For synchronization this requires the "--sync" and "--sync-server"
arguments to be present, though not necessarily used when the user uses
the multi-test (the multi-test can start the server if not already
started and add "--sync" for each worker if not provided).
The netperf example from the introduction would look like this:
The client tests are ordinary "avocado.Test" tests that can even be
executed manually without any synchronization (by providing no_clients=1):
    import avocado
    from avocado.utils import process

    class NetServer(avocado.Test):
        def setUp(self):
            process.run("netserver")
            self.barrier("setup", self.params.get("no_clients"))

        def test(self):
            pass

        def tearDown(self):
            self.barrier("finished", self.params.get("no_clients"))
            process.run("killall netserver")

    class NetPerf(avocado.Test):
        def setUp(self):
            self.barrier("setup", self.params.get("no_clients"))

        def test(self):
            process.run("netperf -H %s -l 60"
                        % self.params.get("server_ip"))
            self.barrier("finished", self.params.get("no_clients"))
One would be able to run this manually (or from build systems) using:
avocado run NetServer --sync-server $IP:12345 &
avocado run NetPerf --sync $IP:12345 &
(one would have to hardcode or provide the "no_clients" and "server_ip"
params on the cmdline)
and the NetPerf would wait till NetServer is initialized, then it'd run
the test while NetServer would wait till it finishes. For some users
this is sufficient, but let's add the multi-test test to get a single
result (pseudo code):
    class MultiNetperf(avocado.MultiTest):
        def test(self):
            machines = self.params.get("machines")
            assert len(machines) > 1
            for machine in machines:
                self.add_worker(machine, sync=True)  # enable sync server
            self.workers[0].add_test("NetServer")
            self.workers[0].set_params({"no_clients": len(self.workers)})
            for worker in self.workers[1:]:
                worker.add_test("NetPerf")
                worker.set_params({"no_clients": len(self.workers),
                                   "server_ip": machines[0]})
            self.run()
Running:
avocado run MultiNetperf
would run a single test which, based on the params given to it, would
run on several machines, using the first machine as the server and the
rest as clients, all of them starting at the same time.
It'd produce a single result with one test id and the following
structure (example):
$ tree $RESULTDIR
└── test-results
└── simple.mht
As you pointed out during our chat, the suffix ".mht" was not intended
here.
I'm sorry, it was a copy&paste mistake. It's just a test name, so imagine
"MultiNetperf" instead.
├── job.log
...
├── 1
│ └── job.log
...
└── 2
└── job.log
...
Getting back to the definitions that were laid out, I revised my
understanding and now I believe/suggest that we should have a single
"job.log" per job.
As mentioned earlier, I disagree with this. I think we need to include
per-stream results too, not necessarily with all the job info. So an
updated example for the MultiNetperf would be:
job-2016-04-01T13.19-795dad3
├── job.log
└── test-results
└── netperf.NetPerf.test
├── debug.log
├── stream1
│ ├── 000_SystemInfo
│ └── 001_NetServer.log
└── stream2
├── 000_SystemInfo
└── 001_NetPerf.log
Where:
* job.log contains the job log
* debug.log contains logs from the MultiNetperf as well as the outputs of
stream1 and stream2 as they happen (if possible)
* 000_SystemInfo contains the system info of the worker (could be either
a directory as we know it from tests, or simplified sys-info)
* \d+_$name contains the output of each individually executed "code
block"
We could even allow people to name the streams, but that's just a detail.
where 1 and 2 are the results of worker 1 and worker 2. For all of the
solutions proposed, these would give the user the standard results as
they know them from normal avocado executions, each with a unique id,
which should help with analyzing and debugging the results.
[1] - Using "streams" instead of "threads" to reduce confusion with the
classical multi-processing pattern of threaded programming and the OS
features that support the same pattern. That being said, "threads" could
be one type of execution "stream" supported by Avocado, albeit it's not
a primary development target for various reasons, including the good
support for threads already present in the underlying Python standard
library.