On 25.5.2016 at 23:36, Ademar Reis wrote:
On Wed, May 25, 2016 at 04:18:38PM -0300, Cleber Rosa wrote:


On 05/24/2016 11:53 AM, Lukáš Doktor wrote:
Hello guys,

this version returns to the roots and tries to clearly define the single
solution I find appealing for multi-host and other complex tests.

Changes:

    v2: Rewritten from scratch
    v2: Added examples for the demonstration to avoid confusion
    v2: Removed the mht format (which was there to demonstrate manual
        execution)
    v2: Added 2 solutions for multi-tests
    v2: Described ways to support synchronization
    v3: Renamed to multi-stream as it befits the purpose
    v3: Improved introduction
    v3: Workers are renamed to streams
    v3: Added example which uses library, instead of new test
    v3: Multi-test renamed to nested tests
    v3: Added section regarding Job API RFC
    v3: Better description of the Synchronization section
    v3: Improved conclusion
    v3: Removed the "Internal API" section (it was a transition between
        no support and "nested test API", not a "real" solution)
    v3: Using per-test granularity in nested tests (requires plugins
        refactor from Job API, but allows greater flexibility)
    v4: Removed "Standard python libraries" section (rejected)
    v4: Removed "API backed by cmdline" (rejected)
    v4: Simplified "Synchronization" section (only describes the
        purpose)
    v4: Refined all sections
    v4: Improved the complex example and added comments
    v4: Formulated the problem of multiple tasks in one stream
    v4: Rejected the idea of binding it inside a MultiTest class
        inherited from avocado.Test, in favour of a library-only approach
    v5: Avoid mapping ideas to the multi-stream definition and clearly
        define the idea I have in mind for test building blocks
        called nested tests.


Motivation
==========

Allow building complex tests out of existing tests, producing a single
result according to the complex test's requirements. The important thing
is that the complex test might run those tests on the same machine, but
also on different machines, allowing simple development of multi-host
tests. Note that the existing tests should stay (mostly) unchanged and
remain executable as simple scenarios, or be invoked by those complex
tests.

Examples of what could be implemented using this feature:

1. Adding background (stress) tasks to an existing test, producing
real-world scenarios.
   * cpu stress test + cpu hotplug test
   * memory stress test + migration
   * network+cpu+memory test on host, memory test on guest while
     running migration
   * running several migration tests (of the same and different type)

2. Multi-host tests implemented by splitting them into components and
leveraging them from the main test.
   * multi-host migration
   * stressing a service from different machines


Nested tests
============

Test
----

A test is a receipt explaining the prerequisites, the steps to check how
the unit under test behaves, and the cleanup after successful or
unsuccessful execution.


You probably meant "recipe" instead of "receipt".  OK, so this is an
abstract definition...

The Test class itself contains lots of neat features that evolved to
simplify testing: logging, results analysis and error handling.


... while this describes concrete conveniences and utilities that users of
the Avocado Test class can expect.

Test runner
-----------

Is responsible for driving the execution of the test(s), which includes
the standard test workflow (setUp/test/tearDown), handling plugin hooks
(results/pre/post) as well as safe interruption.


OK.

Nested test
-----------

Is a test invoked by another test. It can be executed either in the foreground

I got from this proposal that a nested test always has a parent.  Basic
question is: does this parent have to be a regular (that is, non-nested)
test?

Then, depending on the answer, the following question would also apply: do
you believe a nesting level limit should be enforced?

Let's not introduce yet another concept here. I don't think there
would be a need for a "non-nested parent" rule.

Ditto for nesting level limits, I see no reason to break the
current abstraction. If we introduce enforcement, then the test
will have to be run in an environment that knows if it's nested
or not (and at which level). Proper error handling will require
this information and/or even worse: test writers may have access
to this variable and start using it.


(while the main test is waiting) or in the background along with the
main test (and other background tests). It should follow the default
test workflow (setUp/test/tearDown), it should keep all the neat test
features like logging and error handling, and the results should also go
into the main test's output, with the nested test's id as a prefix. All
the files produced by the nested test should be located in a new
directory inside the main test's results dir, in order to be able to
browse either the overall results (main test + nested tests) or just the
nested tests' ones.


Based on the example given later, you're attributing to the NestedRunner the
responsibility to put the nested test results "in the right" location.  It
sounds appropriate.  The tricky questions are really how they show up in the
overall job/test result structure, because that reflects how much the
NestedRunner looks like a "Job".

Since it's a test, which kind of visibility does the job have
about it? Is the result interpretation entirely up to the parent
test? Would nested-tests results be considered arbitrary data?
(think of test-results storage, or a database).

Basically nested test results are similar to the `whiteboard`. It's extra 
information one can use and find in the results directory. I'd consider listing 
the directory in the `json` output in case there are any nested results.


Resolver
--------

The resolver is an avocado component that resolves a test reference into
a list of test templates composed of the test name, params and other
`avocado.Test.__init__` arguments.
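
A minimal sketch of what this could look like from a test's perspective;
the `avocado.resolver.resolve()` call is the API proposed later in this
RFC, while the exact shape of a test template (the attribute names used
below) is an assumption made purely for illustration:

    import avocado

    # Hypothetical usage of the proposed resolver; each template is
    # assumed to carry the test name plus the params and other
    # avocado.Test.__init__ arguments needed to instantiate the test.
    templates = avocado.resolver.resolve("/usr/bin/wget example.org")
    for template in templates:
        print(template.name, template.params)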

Very simple example
-------------------

This example demonstrates how to use an existing test (the SimpleTest
"/usr/bin/wget example.org") in order to create a complex scenario
(downloading the main page from example.org from multiple computers
almost concurrently), without any modification of the `SimpleTest`.

This won't come as a surprise to you, but although this is a
valid use-case, I don't think nested tests are the proper solution
for it. :-)

It's just the simplest demonstration example I could come up with.


    import avocado

    class WgetExample(avocado.Test):
        def test(self):
            # Initialize nested test runner
            self.runner = avocado.NestedRunner(self)
            # This is what one calls on "avocado run"
            test_reference = "/usr/bin/wget example.org"
            # This is the resolved list of templates
            tests = avocado.resolver.resolve(test_reference)
            # We could support a list of results, but for simplicity
            # allow only a single test.
            assert len(tests) == 1, ("Resolver produced multiple test "
                                     "names: %s\n%s" % (test_reference,
                                                        tests))
            test = tests[0]
            for machine in self.params.get("machines"):
                # Queue a background task on the machine (local or
                # remote); it returns the test id, which can be used to
                # query for particular results, interrupt the task, ...
                self.runner.run_bg(machine, test)

Here we're missing something: you've described what a nested-test
is, but now you're introducing another concept: the ability to
run (some of these) sub-tests on different machines or
environments.

Which is precisely where it gets ugly, as it brings to the layer
of tests concepts which belong to a job. You should write at
least one section in your RFC to describe what you have in
mind in this case.


Yes, a job defines what tests should be executed, in what order and where,
but it only defines this and then hands the tasks over to the runner. The
runner itself is responsible for running local tasks locally, remote tasks
remotely and parallel tasks in parallel. It is only a current avocado
limitation and implementation detail that the runner does not support
parallel execution and that it runs tests remotely by running the full job
remotely and reporting the results.

If we want parallel/remote per-test granularity in the Job API, then we
have to change this anyway. The job would then define those relations, but
the runner would be the one responsible for fulfilling them.

When we come back to nested tests, the situation is exactly the same. We
have a test, and the test decided to split some tasks into several nested
tests. So the test itself knows what tests it needs to run, when and where.
It then involves the nested runner to do the hard work and report the
results, and the test gets to choose what to do with those results.

So yes, they are similar and connected, and there is some overlap, but it's
the same overlap as everywhere else. Basically the Job API and nested tests
differ only in their point of view. The Job API is focused on describing
what tests should be executed, test dependencies and producing unified
results; the nested tests API is focused on composing tests out of building
blocks (tests). Whether those blocks are executed on the same machine or in
parallel is not important to the job at all.


            # Wait for all background tasks to finish and raise an
            # exception if any of them fails.
            self.runner.wait(ignore_errors=False)

You also didn't say anything about synchronization, although I'm
sure you do have something in mind. Do you expect nested-tests to
communicate with, or depend on, each other?

You mentioned in previous versions that it's not really important and is a 
detail. Yes, it'd be useful for most of the tests, but not essential; at 
least a barrier mechanism would be needed for most of them. (I mentioned the 
usage in my response to Vincent Matossian.)



Just for accounting purposes at this point, and not for applying judgment,
let's take note that this approach requires the following sets of APIs to
become "Test APIs":

* avocado.NestedRunner
* avocado.resolver

Now, doing a bit of judgment. If I were an Avocado newcomer, looking at the
Test API docs, I'd be intrigued at how these belong to the same very select
group that includes only:

* avocado.Test
* avocado.fail_on
* avocado.main
* avocado.VERSION

I'm not proposing a different approach or a different architecture.  If the
proposed architecture included something like a NestedTest class, then the
feeling is that it would probably indeed naturally belong to the same group.
I hope I managed to express my feeling, which may just be an overreaction.
If others share the same feeling, then it may be a red flag.

Now, considering my feeling is not an overreaction, this is how such an
example could be written so that it does not put NestedRunner and resolver
in the Test API namespace:

    from avocado import Test
    from avocado.utils import nested

    class WgetExample(Test):
        def test(self):
            reference = "/usr/bin/wget example.org"
            tests = []
            for machine in self.params.get("machines"):
                tests.append(nested.run_test_reference(self, reference,
                                                       machine))
            nested.wait(tests, ignore_errors=False)

This would solve the crossing (or pollution) of the Test API namespace, but
it has a catch: the test reference resolution is either included in
`run_test_reference` (which is a similar problem) or delegated to the remote
machine.  Having the reference delegated sounds nice, until you need to
identify backing files for the tests and copy them over to the remote
machine.  So, take this as food for thought, and not as a foolproof
solution.

When nothing fails, this usage has no benefit over simply logging into
a machine and firing up the command. The difference shows when
something does not work as expected. With nested tests, one gets a runner
exception if the machine is unreachable, and on a test error one gets not
only the overall log, but also the per-nested-test results, simplifying
the error analysis. For 1, 2 or 3 machines this makes no difference, but
imagine you want to run this on hundreds of machines. Try finding the
exception there.


I agree that it's nice to have the nested tests' logs.  What you're
proposing is *core* (as in Test API) convenience, over something like:

    import os

    from avocado import Test
    from avocado.utils import nested

    class WgetExample(Test):
        def test(self):
            reference = "/usr/bin/wget example.org"
            tests = []
            for machine in self.params.get("machines"):
                tests.append(nested.run_test_reference(self, reference,
                                                       machine))
            nested.wait(tests, ignore_errors=False)
            nested.save_results(tests,
                                os.path.join(self.resultsdir, "nested"))

I agree it should not be part of the core API.


Yes, you can implement the above without nested tests, but it requires a
lot of boilerplate code to establish the connection, or to raise an
exception explaining why it was not possible (and I'm not talking about
"unable to establish connection", but granularity like "Invalid
password", "Host is down", ...). Then you'd have to set up the output
logging for that particular task, add the prefix, run the task (handling
all possible exceptions) and interpret the results. All of this to get
the same benefits a very simple avocado test already provides you.


Having boilerplate code repeatedly written by users is indeed not a good
thing.  And a well thought out API for users is the way to prevent
boilerplate code from spreading around in tests.

The exception handling, that is, raising exceptions to flag failures in the
nested tests' execution, is also a given IMHO.

Advanced example
----------------

Imagine a very complex scenario, for example a cloud with several
services. One could write a big fat test tailored just to this scenario
and keep adding sub-scenarios, producing unreadable source code.

With nested tests one could split this task into tests (a sketch of how
they could be composed follows the list):

 * Setup a fake network
 * Setup cloud service
 * Setup in-cloud service A/B/C/D/...
 * Test in-cloud service A/B/C/D/...
 * Stress network
 * Migrate nodes
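
A rough sketch of how the main test might glue these pieces together,
assuming the `NestedRunner`/resolver API from the simple example above;
the test references, machine names, params and the foreground `run()`
method are made up for illustration (only `run_bg`/`wait` appear in this
RFC):

    import avocado

    class CloudScenario(avocado.Test):
        def test(self):
            runner = avocado.NestedRunner(self)
            resolve = avocado.resolver.resolve
            # Foreground setup steps, one after another ("run" is a
            # hypothetical foreground counterpart of run_bg).
            runner.run("controller", resolve("setup-fake-network")[0])
            runner.run("controller", resolve("setup-cloud-service")[0])
            for node in self.params.get("nodes"):
                runner.run(node, resolve("setup-in-cloud-service")[0])
            # Background stress while the real checks run.
            runner.run_bg("controller", resolve("stress-network")[0])
            for node in self.params.get("nodes"):
                runner.run_bg(node, resolve("test-in-cloud-service")[0])
            runner.run("controller", resolve("migrate-nodes")[0])
            # Fail the whole scenario if any nested test failed.
            runner.wait(ignore_errors=False)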

I don't understand your motivation here. Do you mean that setting
up a fake network as a (sub-)test would be a positive thing?

Yes, by fake network I mean added latency, additional hops, ... to emulate 
usage over the internet. This is something you can reuse in many tests. You 
can implement it as a library, but then you have to set up the logging, 
carefully handle all exceptions, ... Writing tests, and more importantly 
analyzing test results, is way easier this way.


New variants could be easily added, for example a DDoS attack on some
nodes, node hotplug/unplug, ... by invoking those existing tests and
combining them into a complex test.

Additionally, note that some of the tests, e.g. the setup-cloud-service
and setup-in-cloud-service ones, are quite generic tests which could be
reused many times in different tests. Yes, one could write a library to
do that, but in that library one would have to handle all exceptions and
provide nice logging, while not cluttering the main output with
unnecessary information.

Or one could create a job that runs the individual tests as
needed.

For this particular use-case, a custom job has many advantages.
To mention just one: the multiplexer.

That depends. You can pass params to tests too, you can multiplex the full 
test too, and you can expect some failures. For example, you can have a 
variant where you run the `migrate-node` test but, due to some setting, you 
expect it to fail. When using the Job API, that would produce a failure in 
the results and an overall job failure.

So the only benefit of implementing this as a job is that you get the results 
of all steps, which is useful for the test-in-cloud-service steps, but not 
for all the tests before and after. Those are just background/setup tasks 
created in the form of tests in order to simplify failure analysis.

Basically, as everywhere else in this RFC: if you want a single result from 
your scenario, it's nested-test material; if you want per-test results, you 
want the Job API.


Job results
-----------

Combine (multiple) test results into an understandable format. There are
several formats; the most generic one is the file format:

.
├── id  -- id of this job
├── job.log  -- overall job log
└── test-results  -- per-test-directories with test results
    ├── 1-passtest.py:PassTest.test  -- first test's results
    └── 2-failtest.py:FailTest.test  -- second test's results

Additionally it contains other files and directories produced by avocado
plugins like json, xunit, html results, sysinfo gathering and info
regarding the replay feature.


OK, this is pretty much a review.

Test results
------------

In the end, every test produces results, which is what we're interested
in. The results must clearly define the test status, should provide a
record of what was executed and, in case of failure, they should provide
all the information needed to find the cause and understand the failure.

Standard tests do that by providing the test log (debug, info, warning,
error, critical), stdout and stderr, by allowing the test to write to the
whiteboard and by attaching files in the results directory. Additionally,
due to the structure of the test, one knows what stage(s) of the test
failed and can pinpoint the exact location of the failure (traceback in
the log).

.
├── data  -- place for other files produced by a test
├── debug.log  -- debug, info, warn, error log
├── remote.log  -- additional log regarding remote session
├── stderr  -- standard error
├── stdout  -- standard output
├── sysinfo  -- provided by sysinfo plugin
│   ├── post
│   ├── pre
│   └── profile
└── whiteboard  -- file for arbitrary test data

I'd like to extend this structure with either a directory "subtests", or
a naming convention (`r"\d+-.*"`) for directories intended for nested
test results.


Having them in a separate sub-directory is less intrusive IMHO.  I'd even
argue that `data/nested` is the way to go.

+1.


The `r"\d+-.*"` reflects the current test-id notation, which nested
tests should also respect, replacing the serialized-id by
in-test-serialized-id. That way we easily identify which of the nested
tests was executed first (which does not necessarily mean it finished as
first).
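
For illustration, finding and ordering the nested result directories by
that convention could look like this; the `main_test_results` variable and
the `nested` sub-directory name are placeholders, not a final layout:

    import os
    import re

    # "main_test_results" stands for one main test's results directory,
    # "nested" for the sub-directory favoured in the discussion above.
    nested_dir = os.path.join(main_test_results, "nested")
    nested = [d for d in os.listdir(nested_dir) if re.match(r"\d+-.*", d)]
    # Order by the in-test serial id, i.e. which nested test started first.
    nested.sort(key=lambda name: int(name.split("-", 1)[0]))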

So the nested-tests will have "In-Test-Test-IDs", which are different
than "Test-IDs".

Yes


In the end, each nested test should be assigned a directory inside the
main test's results (or the main test's results/subtests) and it should
produce its data/debug.log/stdout/stderr/whiteboard in there, as well as
propagate the debug.log, with a prefix, to the main test's debug.log (and
to the job.log).

└── 1-parallel_wget.py:WgetExample.test  -- main test
    ├── data
    ├── debug.log  -- contains main log + nested logs with prefixes
    ├── remote.log
    ├── stderr
    ├── stdout
    ├── sysinfo
    │   ├── post
    │   ├── pre
    │   └── profile
    ├── whiteboard
    ├── 1-_usr_bin_wget\ example.org  -- first nested test
    │   ├── data
    │   ├── debug.log  -- contains only this nested test log
    │   ├── remote.log
    │   ├── stderr
    │   ├── stdout
    │   └── whiteboard
    ├── 2-_usr_bin_wget\ example.org  -- second nested test
...
    └── 3-_usr_bin_wget\ example.org  -- third nested test
...

And with the above, a test-ID is not unique anymore in logs and
in the results directory. For example, when looking for
"1-foobar.py", I may find:

  - foobar.py, the first test run inside the job
  AND
  - multiple foobar.py, run as a nested test inside an arbitrary
    parent test.

That's why I said you would need "In-Test-Test-IDs" (or
"Nested-Test-IDs").

Yes, when using `find ...` you could get multiple directories, but not 
directly inside `test-results`; they would be in `test-results/*/nested/*` 
(or in nested-nested ones). For me that is convenient, because you are used 
to "parsing" the test-ids and you know where to expect them. So when you 
want to look for them, you don't have to think about whether they should 
have this form or another.


Note that nested tests can finish with any result, and it's up to the
main test to evaluate that. This means that theoretically you could find
nested tests whose final state is `FAIL` or `ERROR`. That might be
confusing, so I think the `NestedRunner` should append a last line to the
test's log saying `Expected FAILURE`, to avoid confusion while looking at
the results.


This special injection, and special handling for that matter, actually makes
me more confused.

Agree. This is something to add to the parent log (which is
waiting for the nested-test result).

No problem with that.


Note 2: It might be impossible to pass messages in real time across
multiple machines, so I think that at the end the main job.log should be
copied to `raw_job.log` and the `job.log` should be reordered according
to the date-time of the messages (alternatively, we could just add a
contrib script to do that).

You probably mean debug.log (parent test), not job.log.

I'm assuming the nested tests would run in "jobless" mode (is
that the case? If yes, you need to specify what it means).

Both, actually. So the contrib script would be the best solution, as one 
can point it at the file one is interested in.
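
A sketch of what such a contrib script could do; the "HH:MM:SS" timestamp
prefix is an assumption about the log format, so treat this purely as an
illustration of the idea:

    #!/usr/bin/env python
    # Reorder log lines by their leading timestamp; lines without a
    # timestamp (tracebacks, ...) stay attached to the previous entry.
    import re
    import sys

    TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}")   # assumed prefix

    entries = []
    with open(sys.argv[1]) as log:
        for line in log:
            if TIMESTAMP.match(line) or not entries:
                entries.append(line)
            else:
                entries[-1] += line
    for entry in sorted(entries, key=lambda e: e[:8]):
        sys.stdout.write(entry)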



Definitely no to another special handling.  Definitely yes to a post-job
contrib script that can reorder the log lines.

+1



Conclusion
==========

I believe nested tests would help people cover very complex scenarios
by splitting them into pieces, similarly to Lego. It allows easier
per-component development and consistent results which are easy to
analyze, as one can see both the overall picture and the specific pieces,
and it allows fixing bugs in all tests by fixing the single piece (nested
test).


It's pretty clear that running other tests from tests is *useful*; that's
why it's such a hot topic and we've been devoting so much energy to
discussing possible solutions.  NestedTests is one way to do it, but I'm
not sure we have enough confidence to make it *the* way to do it. The
feeling that I have at this point is that maybe we should prototype it as
utilities to:

 * give Avocado a kickstart on this niche/feature set
 * avoid as much as possible user-written boilerplate code
 * avoid introducing *core* test APIs that would be set in stone

The gotchas that we have identified so far are, IMHO, enough to restrain us
from forcing this kind of feature into the core test API, which we are, in
fact, trying to clean up.

With user exposure and feedback, this, a modified version or a completely
different solution can evolve into *the* core (and supported) way to do it.


I tend to disagree. I think it should be the other way around:
maybe, once we have a Job API, we can consider the possibilities
of supporting nested-tests, reusing some of the other concepts.

Yes, the work on the Job API is shared with nested tests, so we can postpone 
this till then and see whether building blocks inside tests would be useful. 
I think they would be.


Nested tests (as in: "simply running tests inside tests") are
relatively OK to digest. Not that I like it, but it's relatively
simple.

But what Lukas is proposing involves at least three more features
or APIs, all of which relate to a Job and should be implemented
there before being considered in the context of a test:

 - API and mechanism for running tests on different machines or
   environments (at least at first, a Job API)
IMO this is the runner's task, not the job's task (the job only demands 
that its tests run on a specific machine/environment)

 - API and mechanism for running tests in parallel (ditto)
That's the same. The job specifies requirements and the runner fulfills 
them. If a test has certain demands, why shouldn't the nested runner 
fulfill them? The job should have no relation to, nor interest in, this.

 - API and mechanism to allow tests to synchronize and wait for
   barriers (which might be useful once we can run tests in
   parallel).
Yep, basically the synchronization is shared and useful even for simple 
tests; only for a multi-host environment might we want to add some utils 
to simplify this.


To me the idea of "nested tests that can be run on multiple
machines, under different configurations and with synchronization
between them" is fundamentally flawed. It's a huge layer
violation that brings all kinds of architectural problems.

To me it's a useful and clean solution using a nested runner inherited from 
the job runner. It is easy to learn and gives expected results while 
learning just one API (the Job API and the nested test API should be 
designed similarly). But I agree we should focus on the Job API first and 
then reconsider the usage inside tests, as inside tests the focus is not on 
triggering the tests, but on utilizing the task under test the way we need 
to.


Thanks.
   - Ademar


After reading this, I have another suggestion. We could actually completely 
shred this RFC (well, hopefully some ideas from this RFC would survive inside 
the Job API) if we consider allowing expected failures (skips, warns, 
nonskips, errors, ...) in the Job API. Then the only missing piece would be 
to allow running several workflows (job definitions) in a single job, but 
that is, I think, too much to ask (avocado run boot-test job-cloud 
job-parallel-tests; where boot-test is a test, job-cloud is a Job API 
scenario and job-parallel-tests is a Job API scenario that runs some tests 
in parallel; the result should be a list of all triggered test results, so 
boot-test cloud-setup cloud-test parallel1 parallel2 parallel3 parallel4).

Thank you for the feedback,
Lukáš
