Re: [Avocado-devel] RFC: Avocado Job API

Cleber Rosa Tue, 12 Apr 2016 21:45:39 -0700


On 04/12/2016 06:22 AM, Lukáš Doktor wrote:

Dne 12.4.2016 v 02:31 Ademar Reis napsal(a):

On Mon, Apr 11, 2016 at 09:09:58AM -0300, Cleber Rosa wrote:

Note: the same content of this message is available at:

https://github.com/clebergnu/avocado/blob/rfc_job_api/docs/rfcs/job-api.rst


Some users may find it easier to read with a prettier formatting.

Problem statement
=================

An Avocado job is created by running the command line ``avocado``
application with the ``run`` command, such as::

   $ avocado run passtest.py

But most of Avocado's power is activated by additional command line
arguments, such as::

   $ avocado run passtest.py --vm-domain=vm1
   $ avocado run passtest.py --remote-hostname=machine1

Even though Avocado supports many features, such as running tests
locally, on a Virtual Machine and on a remote host, only one of those can
be used on a given job.

The observed limitations are:

* Job creation is limited by the expressiveness of command line
   arguments; this causes mutual exclusion of some features
* Mapping features to a subset of tests or conditions is not possible
* Once created, and while running, a job can not have its status
   queried and can not be manipulated

Even though Avocado is a young project, its current feature set
already exceeds its flexibility.  Unfortunately, advanced users are
not always free to mix and match those features at will.

Reviewing and Evaluating Avocado
================================

In light of the given problem, let's take a look at what Avocado is,
both by definition and based on its real world, day to day, usage.

Avocado By Definition
---------------------

Avocado is, by definition, "a set of tools and libraries to help with
automated testing".  Here, some points can be made about the two
components that Avocado is made of:

1. Libraries are commonly flexible enough and expose the right
    features in a consistent way.  Libraries that provide good APIs
    allow users to solve their own problems, not always anticipated by
    the library authors.

2. The majority of the Avocado library code falls into two categories:
    utility and test APIs.  Avocado's core libraries are, so far, not
    intended to be consumed by third party code and their use is not
    supported in any way.

3. Tools (as in command line applications), are commonly a lot less
    flexible than libraries.  Even the ones driven by command line
    arguments, configuration files and environment variables fall
    short in flexibility when compared to libraries.  That is true even
    when respecting the basic UNIX principles and features that help to
    reuse and combine different tools in a single shell session.

How Avocado is used
-------------------

The vast majority of the observed Avocado use cases, present and
future, includes running tests.  Given the Avocado architecture and
its core concepts, this means running a job.

Avocado, with regards to its real world usage, is pretty much a job
(and test) runner, and there's no escaping that.  It's probable that,
for every one hundred ``avocado run`` commands, a different
``avocado <subcommand>`` is executed.

Proposed solution & RFC goal
----------------------------

By now, the title of this document may seem a little less
misleading. Still, let's attempt to make it even more clear.

Since Avocado is mostly a job runner that needs to be more flexible,
the most natural approach is to turn more of it into a library.  This
would lead to the creation of a new set of user consumable APIs,
albeit for a different set of users.  Those APIs should allow the
creation of custom job executions, in ways that the Avocado authors
have not yet anticipated.

Having settled on this solution to the stated problem, the primary
goal of this RFC is to propose how such a "Job API" can be
implemented.


So in theory, given a comprehensive enough API it should be
possible to rewrite the entire "Avocado Test Runner" using the
Job API.

Actually, in the future we could have multiple Test Runners (for
example in contrib/) with different feature sets or approaches at
creating and managing jobs.

(in practice we will approach the problem incrementally, so this
should be a very long term goal)

Exactly, for example run on several machines, or run in parallel.


Agreed.


Analysis of a Job Environment
=============================

To properly implement a Job API, it's necessary to review what
influences the creation and execution of a job.  Currently, a Job
execution based on the current command line is driven by, at least,
the following factors:

* Configuration state
* Command line parameters
* Active plugins

The following subsections examine how these would behave in an API
based approach to Job execution.

Configuration state
-------------------

Even though Avocado has a well defined `settings`_ module, it only
provides support for `getting the value`_ of configuration keys. It
lacks the ability to set configuration values at run time.

If the configuration state allowed modifications at run time (in a
well defined and supported way), users could then create many types of
custom jobs with that "tool" alone.
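
To make the idea of a run-time writable configuration state more
concrete, here is a minimal sketch built on the standard library
``configparser`` module. The ``Settings`` class and its ``get``/``set``
methods are hypothetical stand-ins for illustration, not the existing
Avocado settings module:

```python
import configparser


class Settings:
    """Hypothetical settings object that also accepts run-time writes."""

    def __init__(self, text=""):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

    def get(self, section, key, default=None):
        # Return the stored string value, or the default when missing.
        return self._parser.get(section, key, fallback=default)

    def set(self, section, key, value):
        # Create the section on demand, then store the value as a string.
        if not self._parser.has_section(section):
            self._parser.add_section(section)
        self._parser.set(section, key, str(value))


# Start from a configuration file fragment, then change it at run time.
settings = Settings("[sysinfo.collect]\nprofiler = False\n")
before = settings.get("sysinfo.collect", "profiler")
settings.set("sysinfo.collect", "profiler", True)
after = settings.get("sysinfo.collect", "profiler")
```

Values are kept as strings here, as ``configparser`` does; a supported
API would probably also need typed access and some form of validation.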

Command line parameters
-----------------------

The need for a strong and predictable correlation between application
builtin defaults, configuration keys and command line parameters is
also a MUST for the implementation of the Job API.

Users writing a custom job will very often need to set a given
behavior that may influence different parts of the Job execution.

Not only that, many use cases may be implemented simply by changing
those defaults in the midst of the job execution.

If users know how to map command line parameters into their
programmable counterparts, advanced custom jobs will be created much
more naturally.
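
One possible way to get a predictable correlation between builtin
defaults, configuration keys and command line parameters is a layered
lookup. The sketch below is only an assumption about how that could
look: ``BUILTIN_DEFAULTS``, ``build_state`` and the ``--profiler``
option are made up for the example, while ``--remote-hostname`` is the
option mentioned earlier in this document:

```python
import argparse
from collections import ChainMap

# Hypothetical application defaults, the lowest priority layer.
BUILTIN_DEFAULTS = {"remote_hostname": None, "profiler": False}


def build_state(config_values, argv):
    """Layer command line values over config values over defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote-hostname")
    parser.add_argument("--profiler", action="store_true", default=None)
    parsed = parser.parse_args(argv)
    # Keep only the options the user actually passed on the command line.
    cli_values = {k: v for k, v in vars(parsed).items() if v is not None}
    # ChainMap returns the value from the first mapping that has the key.
    return ChainMap(cli_values, config_values, BUILTIN_DEFAULTS)


state = build_state({"profiler": True}, ["--remote-hostname=machine1"])
```

With a single lookup object, every knob has exactly one name, and a
custom job could add its own layer on top at run time.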


So if I understand it correctly, the configuration state
(configuration keys/values) and command line parameters are two
high-level approaches to the same thing, which we could probably
define as the job environment, job configuration, or job state.

Currently the mapping is:

     config -> args -> job.args


Right.

but we discussed that this is wrong, as some functions use `job.args`,
some rely on `config`. So we need to rework this and either create yet
another abstract entity which contains all, or update the `config`
values and use that one everywhere, or accept only related params for
each level (job accepts, test accepts, ...)

From this RFC I understand that Cleber wants to use `config` inside
plugins/job/test, so the existing argument parser would have to be
modified to map arguments not to processed-arguments, but rather to
update the config values. Do I understand it correctly, Cleber?

My plan is to indeed consolidate the various sources of "knob twisters"
into a unified "state database". What I mean is, a command line
parameter changes a knob, and so does a configuration key/value; all
those changes should go into a single place. For now, it looks like the
"job environment" is a good candidate, but that can still prove me wrong.

So, with regards to consolidating those values, we can either have a
push or pull approach. The examples we worked on earlier had a "pull"
approach from the "state database" PoV, as the job environment was
reading from the parsed parameters. These parsed parameters (job.args
in your example) would not play any role in the app. But, they would be
used as food for the job state.


So, for example, there could be multiple ways to configure a job
environment:

  1. Via configuration file (at least some parameters, as of today)
  2. Via command line options available in the Job API (at least
  some parameters, as of today)
  3. Via fine-grained APIs (future)

  In the case of (2), we could have an opaque configuration API
  available as part of the Job API which would allow avocado
  command line options to be processed at runtime.

  For example, by having methods such as:

     job.process_args(argv) --> process "argv" and configure the
     job environment according to the options provided by user.

     job.args_help() and job.args_usage() --> show command line
     help and usage messages

Could be (after we finish with everything else). I had a different idea
in my head, please see the other email. (sorry I did not want to
provide my feedback before reading other responses)

As I also put in another response, I'm not too fond of this, that is,
command line parameter handling living in the job itself.


  In this case all the Job API "knows" is that
  job.process_args(sys.argv) was called. Internally it could
  implement things such as --multiplexer, --profilers, --wrappers,
  --gdb, etc.

  Using job.process_args() would change the *job configuration* at
  runtime.

That's what the env.config.set() does, right? It prepares the env
(parsed args) and it's instantiated during `run_test`.

The env is not the parsed args in my original proposal. I don't think I
understand what you mean by "it's instantiated during `run_test`".


Plugins
-------

Avocado currently relies exclusively on setuptools `entry points`_ to
define the active plugins.  It may be beneficial to add a secondary
activation and deactivation mechanism, one that is locally
configurable.  This is a rather common pattern, and well supported by
the underlying stevedore library.

Given that all pluggable components of Avocado are updated to adhere to
the "new plugin" standard, some use cases could be implemented simply
by enabling/disabling plugins (think of "driver" style plugins).  This
can be exclusively or in addition to setting the plugin's own
configuration.

Also, depending on the type of plugin, it may be useful to activate,
deactivate and configure those plugins per job.  Thus, as part of the
Job state, APIs would allow for querying/setting plugins.
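
As an illustration of per job plugin activation, the following sketch
keeps the set of enabled plugins as part of a job-level state object.
``PluginState`` and its methods are hypothetical names; the dictionary
passed to it stands in for whatever entry point discovery would return:

```python
class PluginState:
    """Hypothetical per-job view of which plugins are active."""

    def __init__(self, available):
        # "available" stands in for the plugins found via entry points.
        self._available = dict(available)
        # Everything that was discovered starts out enabled.
        self._enabled = set(self._available)

    def disable(self, name):
        self._enabled.discard(name)

    def enable(self, name):
        if name not in self._available:
            raise KeyError("unknown plugin: %s" % name)
        self._enabled.add(name)

    def active(self):
        # Sorted so that callers get a stable, predictable order.
        return sorted(self._enabled)


plugins = PluginState({"sysinfo": object, "json": object, "gdb": object})
plugins.disable("gdb")
active_now = plugins.active()
```

Because the state lives on an object rather than in the installed
entry points, two jobs in the same process could hold different sets.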


The plugin interfaces in Avocado are not very mature yet and I
anticipate many discussions about what plugins are and which kind
of interfaces should be supported during the creation of the Job
API.

My impression is that there are two different levels of APIs in
the context of plugins:

  1. The abstract APIs available *to* plugins
  2. The specific APIs provided *by* plugins

+1


Avocado should properly expose (1) via the Job API when
necessary, while (2) should be available in a generic plugin
configuration API.

Let me explain with an example using the multiplexer API (this is
hypothetical, because in practice that's not how the current
Avocado implementation works):

I think multiplexer is not the best example here as it's not a plugin.
It needs to be extracted and the API should be (finally) defined.

The other plugins define the interface by abstract class. Still
we have 3 ways of invoking the plugins:

1. Stevedore: registers all plugins and skips the execution based on
job.environment variable (subcommands, arguments, pre/post job plugin)
2. Proxy: We have a proxy and we manually add plugins to be used there
(TestResultProxy, TestLoaders, ...)
3. We set one plugin to handle the execution - TestRunner

The (1) can be tweaked to actually invoke only plugins enabled by
job.environment variable by default (currently it's always executed).

Right, this is my original proposal... something like "Plugin Management
API should allow plugins to be enabled/disabled...".

The (2) is currently hardcoded inside job (loaders use config) and
allows multiple instances of the same plugin and I like that.

There's no limitation on approach #1, "aka new style plugins" that would
prohibit that. IMHO, these have not yet been ported to #1 just because
of lack of time.

And the (3) is handled by arguments.

As I mentioned earlier, we should identify types of plugin, and the
test-related ones should probably be either passed to the test by the
user, or be instantiated by the test based on the env variables.

Anyway back to the multiplexer:


   - The multiplexer is a plugin which turns a yaml file into a
     set of variants (a variant is basically a dictionary). It has
     a complex filtering mechanism to control how the yaml file
     is processed and generates variants with combinations from a
     tree structure. That's what we're all familiar with in
     Avocado (`run --multiplexer` and `avocado multiplexer`)

It's iterative


   - The multiplexer plugin should use a "Variant API", which is
     way more abstract and generic: it simply provides a set of
     variants (dictionaries) identified by a "Variant ID" (please
     check the "Test ID" RFC for details on how it relates to
     everything else in Avocado).

Not dictionaries (that'd be way too strict). It allows an `AvocadoParams`
object. `AvocadoParams` is "a better dictionary" and is not related to
multiplexer and is the API (currently inside `avocado.core`).

Really low level discussion here... Not that I don't like it, but I
think it can get the discussion away from the original goal of the RFC.


     There could be multiple plugins that use this "Variant API"
     to deliver functionality similar to what the multiplexer
     does. Which one is being used and how it's configured will
     depend on the "job configuration" (or state, whatever you
     prefer to call it).

There are still missing pieces, but some are already there.


So Avocado should provide the "Variant API" in the "Job API", but
the more complex and plugin-specific operations (like filtering
in this case), should be made available via a generic plugin API.

At runtime, the "Variant API" will behave in ways dictated by how
the job environment is *configured*.

Some pseudo-code examples are provided further down, when you
mention this specific multiplex use-case.
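
To picture what a generic "Variant API" could hand to a job, here is a
small sketch that yields (Variant ID, parameters) pairs. It is only an
assumption for illustration: the ``variants`` function is hypothetical,
plain dictionaries are used for brevity, and a simple cartesian product
replaces the tree processing and filtering the multiplexer really does:

```python
import itertools


def variants(tree):
    """Yield (variant_id, params) pairs from a dict of option lists."""
    keys = sorted(tree)
    combos = itertools.product(*(tree[key] for key in keys))
    # Variant IDs are simple 1-based counters in this sketch.
    for variant_id, combo in enumerate(combos, 1):
        yield variant_id, dict(zip(keys, combo))


tree = {"os": ["linux", "windows"], "disk": ["virtio", "scsi"]}
all_variants = list(variants(tree))
```

Any plugin able to produce such pairs could be swapped in, which is the
point of keeping the core interface this small.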


Use cases
=========

To aid in the design of an API that solves unforeseen needs, let's
think about a couple of use cases.  Most of these use cases are based
on feedback already received and/or features already requested.

Ordered and conditional test execution
--------------------------------------

A user wants to create a custom job that only runs a benchmark test on
a VM if the VM installation test succeeds.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pseudo code::

   #!/usr/bin/env python
   from avocado import Job
   from avocado.resolver import resolve

   job = Job()

   vm_install = resolve(
       'io-github-autotest-qemu.unattended_install.cdrom.http_ks.default_install.aio_native')

   vm_disk_benchmark = resolve('io-github-autotest-qemu.autotest.bonnie')

   if job.run_test(vm_install).result == 'PASS':
       job.run_test(vm_disk_benchmark)

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Single test execution API
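
The control flow of this use case can be tried out without Avocado by
using stand-ins. In the sketch below, ``Job``, ``TestResult`` and the
two test functions are hypothetical; "tests" are plain callables and a
raised exception counts as a failure:

```python
from collections import namedtuple

TestResult = namedtuple("TestResult", ["name", "result"])


class Job:
    """Hypothetical stand-in: "tests" are plain callables here."""

    def __init__(self):
        self.results = []

    def run_test(self, test):
        # A test passes unless it raises an exception.
        try:
            test()
            outcome = "PASS"
        except Exception:
            outcome = "FAIL"
        result = TestResult(test.__name__, outcome)
        self.results.append(result)
        return result


def vm_install():
    pass  # pretend the installation succeeded


def vm_disk_benchmark():
    raise RuntimeError("benchmark failed")


job = Job()
# Only run the benchmark when the installation test passed.
if job.run_test(vm_install).result == "PASS":
    job.run_test(vm_disk_benchmark)
```

Returning a result object from the single test execution call is what
makes the conditional possible in ordinary Python.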

Run profilers on a single test
------------------------------

A user wants to create a custom job that only runs profilers for the
very first test.  Running the same profilers for all other tests may
be useless to the user, or maybe consume too much I/O resources that
would influence the remaining tests.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Avocado has a configuration key that controls profilers::

   [sysinfo.collect]
   ...
   profiler = False
   ...

By exposing the configuration state, the ``profiler`` key of the
``sysinfo.collect`` section could be enabled for one test, and
disabled for all others. Pseudo code::

   #!/usr/bin/env python
   from avocado import Job
   from avocado.resolver import resolve

   job = Job()
   env = job.environment # property

   env.config.set('sysinfo.collect', 'profiler', True)
   job.run_test(resolve('build'))

   env.config.set('sysinfo.collect', 'profiler', False)
   job.run_test(resolve('benchmark'))
   job.run_test(resolve('stress'))
   ...
   job.run_test(resolve('netperf'))

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Configuration API
4. Single test execution API

Multi-host test execution
-------------------------

Use case description
~~~~~~~~~~~~~~~~~~~~

User needs to run the same test on different platforms.  User has
hosts with the different platforms already setup and remotely
accessible.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Avocado currently runs all tests in a job with a single runner.  The
`default runner`_ implementation is a local test runner.  Other test
runners include the `remote runner`_ and the `vm runner`_.

Pseudo code such as the following could implement the (serial, for
simplicity) test execution on multiple different hosts::

   from avocado import Job
   from avocado.plugin_manager import require
   from avocado.resolver import resolve

   job = Job()
   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)

   runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
   require(runner_plugin)

   env = job.environment # property
   env.config.set('plugin.runner', 'default', runner_plugin)
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')

   test = resolve('hardware_validation.py:RHEL.test')

   host_list = ['rhel6.x86_64.internal',
                ...
                'rhel7.ppc64.internal']

   for host in host_list:
       env.config.set('plugin.runner.RemoteTestRunner', 'host', host)
       job.run_test(test)

   print('JOB STATUS: %s' % job.status)

It's actually quite simple to move from a custom Job execution to a
custom Job runner, for example::

   #!/usr/bin/env python
   import sys
   from avocado import Job
   from avocado.plugin_manager import require
   from avocado.resolver import resolve

   test = resolve(sys.argv[1])
   host_list = sys.argv[2:]

   runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
   require(runner_plugin)

   job = Job()
   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)
   env = job.environment # property
   env.config.set('plugin.runner', 'default', runner_plugin)
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')

   for host in host_list:
       env.config.set('plugin.runner.RemoteTestRunner', 'host', host)
       job.run_test(test)

   print('JOB STATUS: %s' % job.status)

Which could be run as::

   $ multi hardware_validation.py:RHEL.test rhel{6,7}.{x86_64,ppc64}.internal
   JOB ID: 54cacfb42f3fa9566b6307ad540fbe594f4a5fa2
   JOB LOG: /home/<user>/avocado/job-results/job-2016-04-07T16.46-54cacfb/job.log
   JOB STATUS: AVOCADO_ALL_OK

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Configuration API
4. Plugin Management API
5. Single test execution API

Current shortcomings
~~~~~~~~~~~~~~~~~~~~

1. The current Avocado runner implementations do not follow the "new
    style" plugin standard.

2. There's no concept of job environment

3. Lack of a uniform definition of plugin implementation for "driver"
    style plugins.

4. Lack of automatic ownership of configuration namespace by plugin
    name.


Other use cases
===============

The following is a list of other valid use cases which can be
discussed at a later time:

* Use the multiplexer only for some tests.


The example I promised while discussing how to handle the core
and plugin APIs with the example of the multiplexer:

Using only the Variant API (a core concept):

   import sys

   from avocado import Job
   from avocado import resolver

   job = Job()
   job.process_args(sys.argv)
   env = job.environment # property
   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)

   test = resolver("passtest.py")

   for v in env.variants():
       ...
       print("%s;%s" % (test.name, v.id))
       test.variants.append(v)

   job.run_test(test)

   job.run_test(resolver("foo.py"))
   job.run_test(resolver("bar.py"))

   print('JOB STATUS: %s' % job.status)
   exit()


Now using the Multiplexer API (from the plugin):

   import sys

   from avocado import Job
   from avocado import resolver
   from avocado.plugin_manager import require

   job = Job()
   job.process_args(sys.argv)

   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)
   env = job.environment # property

   require('avocado.plugins.runner:Multiplexer')

   env.config.set("plugin.runner:Multiplexer", "reset")
   env.config.set("plugin.runner:Multiplexer", "yaml-file", "pass.yaml")
   env.config.set("plugin.runner:Multiplexer", "filter-out", "hw/")
   env.config.set("plugin.runner:Multiplexer", "filter-only", "linux/")

   test = resolver("passtest.py")

   for v in env.variants():
       ...
       print("%s;%s" % (test.name, v.id))
       test.variants.append(v)

   job.run_test(test)

   env.config.set("plugin.runner:Multiplexer", "reset")
   env.config.set("plugin.runner:Multiplexer", "yaml-file", "foo.yaml")

   test = resolver("foo.py")

   for v in env.variants():
       ...
       print("%s;%s" % (test.name, v.id))
       test.variants.append(v)

   job.run_test(test)

   job.run_test(resolver("bar.py"))

Btw this raised another concern in my head. Until now I thought
`job.run_test` runs one test, but it's true that the resolver can return
several tests (eg. resolve("virtio_console")). So shouldn't the API be:

     tests = resolver(...)

and then:

     for test in tests:
         job.run_test(test)

alternatively it'd have to be `job.run_tests(tests)`, which would return
a list of results including multiplexed test results. I could live with
`run_tests` but I don't like to trigger all variants of the test. That
would make:

     passtest.1 failtest.1 passtest2 failtest2

harder to define. IMO params belongs to test, not all combinations. (see
the modified example below)

This is a discussion of the "Run test API", "Resolver API", etc. IMHO,
better suited to a later stage (and maybe even a more focused RFC,
experimentation PRs, etc).


   print('JOB STATUS: %s' % job.status)
   exit()

Please take a look at my plugins description in the other email. I think
`multiplexer` (and other variants generating) plugins are a bit
different and should be handled separately and even allow multiple
instances of it and maybe it should be like that with other plugins too:

We can certainly handle different patterns for plugins. This is far from
being a big issue, and should not make us think of some plugins as
"special" beings, IMHO.

Stevedore itself, for instance, brings the concept of "extensions",
"drivers", etc. The runner, for instance, would be a "driver" kind of
plugin, that is, you'd have to set one and only one for a given test.

     job = Job()
     job.add(avocado.plugins.sysinfo.SysInfo())
     # Uses the defaults
     job.add(avocado.plugins.json.JsonResults(filename="myfile"))
     # Uses defaults, but overrides the "filename"
     resolver.cleanup()  # remove previously defined resolvers
     resolver.add(my.plugin())
     test = resolver("passtest")
     test_gdb = resolver("passtest")
     test_gdb.add(avocado.plugins.gdb.Gdb())
     mux1 = avocado.plugins.multiplexer.Multiplexer()
     # Gets the value from "--multiplex", or config value []
     mux2 = avocado.plugins.multiplexer.Multiplexer("MyFile.yaml")
     for variant in mux1:
         test.params = variant
         test_gdb.params = variant
         job.run_test(test)
         job.run_test(test_gdb)
     for variant in mux2:
         test.params = variant
         job.run_test(test)

Eventually we could simplify things and allow some assignments directly
on call, for example `job.run_test(test, params=None, ...)`.

IMHO, this is again another "Test runner API" kind of discussion...
We'll eventually *have* to get there, but let's converge on the more
general view first.


Note: By default avocado would load all plugins enabled in config.


Right, I also think I mentioned I think like this.

If I had not said it before, thanks for the feedback!
 - Cleber.


* Use the gdb or wrapper feature only for some tests.

* Run Avocado tests and external-runner tests in the same job.


The same applies to these two use-cases: Runtime job
configuration will make gdb or wrappers be used when
job.run(test) is used.

I think the above makes a lot of sense, although the specifics of
what the actual API will look like still need to be properly
defined. Please let me know if we're headed in the same
direction.

Thanks.
    - Ademar


* Run tests in parallel.

* Take actions based on test results (for example, run or skip other
   tests)

* Post-process the logs or test results before the job is done

Development Milestones
======================

Since it's clear that Avocado demands many changes to be able to
completely fulfill all mentioned use cases, it seems like a good idea
to define milestones.  Those milestones are not intended to set the
pace of development, but to allow as many real world use cases as
possible to be fulfilled as soon as possible.

Milestone 1
-----------

Includes the delivery of the following APIs:

* Job creation API
* Test resolution API
* Single test execution API

Milestone 2
-----------

Adds to the previous milestone:

* Configuration API

Milestone 3
-----------

Adds to the previous milestone:

* Plugin management API

Milestone 4
-----------

Introduces proper interfaces where previously Configuration and Plugin
management APIs were being used.  For instance, where the following
pseudo code was being used to set the current test runner::

   env = job.environment
   env.config.set('plugin.runner', 'default',
                  'avocado.plugins.runner:RemoteTestRunner')
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')

APIs would be introduced that would allow for the following pseudo
code::

   job.load_runner_by_name('RemoteTestRunner')
   if job.runner.accepts_credentials():
       job.runner.set_credentials(username='root', password='123456')
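
A minimal sketch of what could sit behind such interfaces follows. All
class and method bodies here are assumptions made for illustration; in
particular the ``_runners`` dictionary is a hypothetical registry that
stands in for real plugin discovery:

```python
class Runner:
    """Hypothetical base class for "driver" style runner plugins."""

    def accepts_credentials(self):
        return False


class RemoteTestRunner(Runner):
    """Hypothetical runner that needs credentials to reach a host."""

    def __init__(self):
        self.username = None
        self.password = None

    def accepts_credentials(self):
        return True

    def set_credentials(self, username, password):
        self.username = username
        self.password = password


class Job:
    # Hypothetical registry; real discovery would go through plugins.
    _runners = {"Runner": Runner, "RemoteTestRunner": RemoteTestRunner}

    def __init__(self):
        # A job has exactly one runner at a time ("driver" style).
        self.runner = Runner()

    def load_runner_by_name(self, name):
        self.runner = self._runners[name]()


job = Job()
job.load_runner_by_name("RemoteTestRunner")
if job.runner.accepts_credentials():
    job.runner.set_credentials(username="root", password="123456")
```

Compared to setting configuration keys by name, the capability check
lets a custom job adapt to whichever runner happens to be loaded.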

.. _settings: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py

.. _getting the value: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py#L221

.. _default runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/runner.py#L193

.. _remote runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L37

.. _vm runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L263

.. _entry points: https://pythonhosted.org/setuptools/pkg_resources.html#entry-points

--
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]

_______________________________________________
Avocado-devel mailing list
Avocado-devel@redhat.com
https://www.redhat.com/mailman/listinfo/avocado-devel

