Re: [Avocado-devel] RFC: Avocado multiplexer plugin

Lukáš Doktor Tue, 19 Jul 2016 02:40:39 -0700

Dne 18.7.2016 v 22:41 Ademar Reis napsal(a):

On Mon, Jul 18, 2016 at 07:33:43PM +0200, Lukáš Doktor wrote:

Dne 18.7.2016 v 12:46 Cleber Rosa napsal(a):


On 07/07/2016 10:44 AM, Lukáš Doktor wrote:

0Dne 5.7.2016 v 16:10 Ademar Reis napsal(a):

On Fri, Jul 01, 2016 at 03:57:31PM +0200, Lukáš Doktor wrote:

Dne 30.6.2016 v 22:57 Ademar Reis napsal(a):

On Thu, Jun 30, 2016 at 06:59:39PM +0200, Lukáš Doktor wrote:

Hello guys,

the purpose of this RFC is to properly split and define the way test
parameters are processed. There are several ways to split them
apart, each
with some benefits and drawbacks.


Current params process
======================

`tree.TreeNode` -> Object allowing to store things (environment,
filters,
...) in tree-structure
`multiplexer.Mux` -> Interface between job and multiplexer. Reports
number
of tests and yields modified test_suite templates


There has been ideas and suggestions about using the multiplexer for
purposes other than multiplexing tests.  While those other use cases are
only ideas, `Mux` and its `itertests` method are a good enough name, but
there may be the opportunity here to provide a common and more standard
interface for different "multiplexations".  Example:

1) For tests, a class TestMux, with an standard Python iterable
interface[1].
2) For test execution location (say different hosts), a HostMux class,
with the same standard Python interface.

Even if we find no real use for other Mux* classes, having a default
iterable implementation is a good idea.  So moving the `tests` from the
`itertests` method to the `Mux` class name, from `Mux` to `TestsMux`,
feels right to me.

UPDATE: then I looked at PR #1293, and noticed that it intends to take
the custom variant responsibility out of the `multiplexer.Mux()` class.
It feels right because it moves the variant processing into its own
domain, kind of what I had in mind by naming it `TestsMux()`.  Still,
having a standard iterable interface instead of `itertests()` feels
right to me.

Well without that PR it's impossible to use the python standard __iter__
method as it requires argument. With that cleanup sure, `__iter__` method is
better.

As for the multiple classes I don't see a reason for it. Multiplexer (as the
variants generator) is independent on anything, it simply produces all
possible variants. So let me just turn the `iter_tests` into `__iter__` and
we're done.

`multiplexer.MuxTree` -> Object representing part of the tree from
the root
to leaves or another multiplex domain. Recursively it creates
multiplexed
variants of the full tree.
`multiplexer.AvocadoParams` -> Params object used to retrieve
params from
given path, allows defining several domains for relative paths
matching
defined as `mux_path`s.
`multiplexer.AvocadoParam` -> Slice of the `AvocadoParams` which
handles
given `mux_path`.
`test.Test.default_params` -> Dictionary which can define test's
default
values, it's intended for removal for some time.


Creating variants
-----------------

1. app
2. parser -> creates the root tree `args.default_avocado_params =
TreeNode()`


This shows the broken link between other forms of configuration (other
than command line parameters) and the parameter mechanism.  I hope to
address some of it in the Configuration Management RFC.

3. plugins.* -> inject key/value into `args.default_avocado_params`
(or it's
children). One example is `plugins.run`'s --mux-inject, the other is
`avocado_virt`'s default values.
4. job -> creates multiplexer.Mux() object
    a. If "-m" specified, parses and filters the yaml file(s),
otherwise
creates an empty TreeNode() called `mux_tree`
    b. If `args.default_avocado_params` exists, it merges it into the
`mux_tree` (no filtering of the default params)
    c. Initializes `multiplexer.MuxTree` object using the `mux_tree`


`mux_tree` is an internal variable name and not part of any interface,
right?

It's defined in step `4b`, it's basically the combination of all trees.

  Also, the checking for multiplex files (and other arguments)
happen in the context of the `multiplexer.Mux()` class and the job is
only supposed to use the outcome of `multiplexer.Mux()`, say via
`get_number_of_tests()` or `itertests()`, right?

Yes, this was a prep work for the future to be able to tweak the parameters
by each possible plugin.


I'm not familiar with the implementation details and I'm probably
missing something here... I focused on the higher-level
architecture and APIs, and to me, the multiplexer and the Avocado
Params API(s) should be at the test level. Please see my comments
further down.


Finally, is `multiplexer.Mux().variants` something that should be
considered public in your opinion?

It's only public because the replay plugin accesses it and logs the list of
variants. When we introduce the 
https://trello.com/c/fEnQFYRE/601-record-the-mux-tree-in-yaml-format-instead-of-pickle
we can make it private.

5. job -> asks the Mux() object for number of tests
    a. Mux iterates all MuxTree variants and reports `no_variants *
no_tests`
6. runner -> iterates through test_suite
    a. runner -> iterates through Mux:
        i.  multiplexer.Mux -> iterates through MuxTree
            * multiplexer.MuxTree -> yields list of leaves of the
`mux_tree`
        ii, yields the modified test template
    b. runs the test template:
        i. Test.__init__: |
            if isinstance(params, dict):
                # update test's default params
            elif params is None:
                # New empty multiplexer.AvocadoParams are created
            elif isinstance(params, tuple):
                # multiplexer.AvocadoParams are created from params
7. exit


This overview is pretty nice.  I really feel a lot less ignorant about
this subject now! :)

AvocadoParams initialization
----------------------------

    def __init__(self, leaves, test_id, tag, mux_path,
default_params):
        """
        :param leaves: List of TreeNode leaves defining current
variant
        :param test_id: test id
        :param tag: test tag
        :param mux_path: list of entry points
        :param default_params: dict of params used when no matches
found
        """

1. Iterates through `mux_path` and creates `AvocadoParam` slices
containing
params from only matching nodes, storing them in `self._rel_paths`
2. Creates `AvocadoParam` slice containing the remaining params,
storing
them in `self._abs_path`

Test params
-----------

    def get(self, key, path=None, default=None):
        """
        Retrieve value associated with key from params
        :param key: Key you're looking for
        :param path: namespace ['*']
        :param default: default value when not found
        :raise KeyError: In case of multiple different values
(params clash)
        """

1. Tries to obtain the (key,  path, default) from cache
2. Tries to get the (key, path) from `self._rel_paths`
3. When the `path` is abs_path it tries to get the param from
`self._abs_path`
4. Looks into the Test's default params dictionary
5. Returns `default`

Overview
--------

Basically now there are 3 components:

1. params parser (yaml => tree)
2. variants generator (tree => iterator)
3. test parameters (key => value from variant values)

and we need to decide how do we want to split it and what should be
part of
the plugin API and what core API.


Plugin parser->tree
===================

We can say the `tree` is the actual params API and we could only allow
producing custom parsers to construct the tree. This would be the
simplest
method as it only requires us to move the yaml parsing into the
module and
the `AvocadoParams` would stay as they are. The cons is that the
plugin
writers would only be able to produce params compatible with the tree
structure (flat, or tree-like).

If we decide to chose this method, we can keep the current avocado
arguments
and only allow replacing the parser plugin, eg. by `--multiplex-plugin
NAME`. Alternatively we might even detect the suitable plugin based
on the
multiplex file and even allow combining them (`.cfg vs. .yaml, ...)

The plugin would have to support:

* parse_file(FILE) => tree_node
* check_file(FILE) => BOOL    // in case we want automatic
detection of
file->plugin


I really think that, at this point, this is sufficient.  This delivers
what most users would like to have, that is a flat structure of
parameters.  Having good documentation on the tree-like capabilities can
allow users to write, say, database backed implementations that map
record keys to tree nodes.

Glad to hear that :-)

Plugin parser->variant
======================

This would require deeper changes, but allow greater flexibility.
We'd also
have to chose whether we want to allow combinations, or whether the
plugin
should handle the whole workflow. I don't think we should allow
combinations
as that would imply another convention for storing the parsed results.


If I understand this correctly, you mean that the plugin would have a
broader scope, and not only stop at the tree creation, but continue all
the way to the production of variants, right?

If that's the case, I'd avoid this at this moment.  IMHO, feeding the
tree from different sources sounds more like what users may need than
tweaking with the variants generation itself.

In the future, if it is really needed, we may introduce a pluggable
`multiplexer.TestsMux()` interface, so that users may stick in their own
variant creation code.

Yep, that was the intention and I also prefer to allow just the "feeding"
and not the variatns generation itself. And if needed we should be prepared
as it already lives inside a partially defined structure `multiplexer.Mux`.

The user would configure in config or on cmdline which plugin he
wants to
use and the arguments would stay the same (optionally extended by the
plugin's arguments)

The plugin would have to support:

* configure(parser)    // optional, add extended options like
--mux-version
* parse_file(FILE)    // does not return as it's up to plugin to
store the
results
* inject_value(key, value, path)    // used by avocado-virt to
inject default
values
* __len__()            // Return number of variants (we might want
to extend this to
accept TEMPLATE and allow different number of variants per
TEMPLATE. That is
currently not supported, but it might be handy
* itervariants(TEMPLATE)    // yields modified TEMPLATE with params
set in
AvocadoParams understandable format


Plugin AvocadoParams
====================

I don't think we should make the AvocadoParams replaceable, but if
we want
to we should strictly require `params.get` compatibility so all
tests can
run seamlessly with all params. Anyway if we decided to make
AvocadoParams
replaceable, then we can create a proxy between the params and the
plugin.


I also don't think it's needed *at this point*.  But our code should be
clear enough that, given requests, we can add this interface for
pluggable parameter handling later with minimal changes.

yep

Conclusion
==========

I'm looking forward to cleaner multiplexer API. I don't think
people would
like to invest much time in developing fancy multiplexer plugins so
I'd go
with the `parser->tree` variant, which allows easy extensibility
with some
level of flexibility. The flexibility is for example sufficient to
implement
cartesian_config parser.

+1.


As for the automatic detection, I donẗ like the idea as people
might want to
use the same format with different custom tags.


From the Zen of Python: "explicit is better than implicit".  It's so
important it's the 2nd line.


Hi Lukáš.

I believe we're in sync, but I miss the high level overview, or
at least review, of how params, variants and the multiplexer or
other plugins are all related to each other.

Please check the definitions/examples below to see if we're in
sync:

Params
------

    A dictionary of key/values, with an optional path (we could
    simply call it prefix), which is used to identify the key
    when there are multiple versions of it. The path is
    interpreted from right to left to find a match.

    The Params data structure can be populated by multiple
    sources.

    Example:
    (implementation and API details are not discussed here)

    key: var1=a
    path: /foo/bar/baz

    key: var1=b
    path: /foo/bar

    key: var2=c
    path: NULL (empty)

    get(key=var1, path=/foo/) ==> error ("/foo/var1" not found)
    get(key=var1, path=/foo/*) ==> error (multiple var1)
    get(key=var1, path=/foo/bar/baz/weeee/) ==> error
    get(key=var1, path=/foo/bar/weeee/) ==> error

    get(key=var2) ==> c
    get(key=var2, path=foobar) ==> error ("foobar/var2" not found)

    get(key=var1, path=/foo/bar/baz/) ==> a
    (unique match for "/foo/bar/baz/var1")

    get(key=var1, path=/foo/bar/) ==> b
    (unique match for "/foo/bar/var1/")

    get(key=var1, path=baz) ==> a
    (unique match for "baz/var1")

    get(key=var1, path=bar) ==> b
    (unique match for "bar/var1")

    This kind of "get" API is exposed in the Test API.


Variants
--------

    Multiple sets of params, all with the same set of keys and
    paths, but potentially different values. Each variant is
    identified by a "Variant ID" (see the "Test ID RFC").

    The test runner is responsible for the association of tests
    and variants. That is, the component creating the
    variants has absolutely no visibility on which tests are
    going to be associated with variants.

    This is also completely abstract to tests: they don't have
    any visibility about which variant they're using, or which
    variants exist.

Hello Ademar,

Thank you for the overview, I probably should have included it. I
omitted it
as it's described in the documentation, so I only mentioned in the
`Plugin
AvocadoParams` that I don't think we should turn that part into a
plugin.

The variant, as described earlier, is the method which modifies the
`test_template` and as you pointed out it compounds of `Variant ID` and
`Params`. The way it works now it can go even further and alter all the
test's arguments (name, methodName, params, base_logdir, tag, job,
runner_queue) but it's not documented and might change in the future.


OK, so I think we should change this. The layers should have
clear responsibilities and abstractions, with variants being
restricted to params only, as defined above.

I don't think the component responsible for creating variants
needs any visibility or knowledge about tests.

Yes, there is no need for that, it was only simplification:

    https://github.com/avocado-framework/avocado/pull/1293


BTW, why was this PR closed?  Inteded to be sent again with other work?

I was also wondering. Probably just an accidental mouse click... Let me
reopen it... (I thought it was avocado-vt, but it works well with it...)

Given the above, the multiplexer (or any other component, like a
"cartesian config" implementation from Autotest) would be bound
to these APIs.

The cartesian config is not related to params at all. Avocado-vt uses a
hybrid mode and it replaces the params with their custom object, while
keeping the `avocado` params in `test.avocado_params`. So in
`avocado_vt`
tests you don't have `self.params`, but you have `test.params` and
`test.avocado_params`, where `test.params` is a dictionary and
`test.avocado_params` the avocado params interface with path/key/value.
Cartesian config produces variants not by creating test variants, but by
adding multiple tests with different parameters to the test_suite.


What I mean is that we probably could, in theory at least,
implement a plugin that parses a "cartesian config" and provides
the data as needed to fill the variants and param APIs I
described above. I'm not saying we should do that, much less that
it would be useful as a replacement for the current cartesian
config implementation in avocado-vt. I'm simply stating that once
we have a clear plugin API for Params and Variants, we should be
able to replace the multiplexer with other mechanisms that
provide a similar functionality.

Thanks.
   - Ademar


In that case yes. You can see it in the conclusion that even the simpler
version (parser->tree) is capable of using cartesian_config as source of
params.


So, to make sure we're on the same page:  we intend to allow users to
write and choose their own tree producers (it's pluggable).  With a
given tree producer active, the multiplex mechanism is going to be, at
this point, a single, non-pluggable one.

Right?

Yes and the switching is going to be done explicitly by `--multiplex-plugin`
cmdline argument with the default. Also the second (a bit hidden) change is,
that `multiplexer` is going to be only the variants generation code. Maybe
we should go agead and change the options as we'd have following parts:

1. ?? tree-producer ?? - to parse user input to tree (currently unnamed)


To me, tree is an implementation detail, a characteristic of the
multiplexer. I would even rename the "path" to "prefix", as I
suggested in my comment above (or maybe "namespace").

The API should be about Avocado Params. Plugins (multiplexer or
whatever else) should provide sets of Avocado Params, identified
by IDs (by default, only one set).

Params could be defined as special dictionaries where the key
contains a prefix (namespace) that gets resolved from right to
left.  How the prefix is implemented (or if it even exists) is an
implementation detail.

That's what I meant by my comment above, with two sections
(Params and Variants). Please review them if the above is not
clear, as we should not be talking about trees outside of the
multiplexer component.

So, again:

  - Avocado runner: knows about tests (each identified by a Test
    Name) and sets of Avocado Params, each identified by a
    Variant ID. This is done at test creation time, after the job
    is configured.

    The runner is responsible for matching Test Names to Variant
    IDs, creating Test IDs.

  - The multiplexer: parses a yaml file and creates multiple sets
    of Avocado Params, each identified by a Variant ID.

  - The Variant API is the abstraction layer between both. The
    API is provided by the test runner and fulfilled by the
    Multiplexer (or other component).

This means the multiplexer (anything implementing the API) can't
be used for things which are outside of the test layer, such as
job runtime configuration.

If we want to do that, we would have at least two options:

 1. We extend the multiplexer, from a provider of Avocado (Test)
 Params, to a provider of other things, such as "Avocado Job
 Params" and/or "Avocado Configuration Params". This would
 require new APIs (Job APIs, Configuration APIs) and changes in
 the multiplexer to be aware of these APIs. The multiplexer would
 then be invoked and used in multiple places in Avocado (avocado
 configuration, job creation time, test creation time).

 In other words, here we would be using the .yaml file from the
 multiplexer as a general configuration file for jobs and the
 avocado runtime (probably using special namespaces). We've been
 doing some of this already, IIRC.

 Hopefully those of you who know me understand by now that I
 don't like this approach as stated, as it's inelegant and
 hackish. But it can be done properly, which brings us to option
 2:

2. We introduce the new APIs (same as above: avocado
configuration, job configuration) but also a different mechanism
or component for avocado and job level configuration.  Maybe even
using yaml as well, maybe also with variants, but a different
component, with new semantics and syntax.

It could be, for example, a yaml file that contains multiple
sections. Multiplexation would optionally occur inside each
section to create multiple variants, but never across:

    section 1: avocado configuration params, used with the
    Configuration API. A hypothetical test runner could use this
    to run Avocado multiple times, each under a different
    configuration via multiplexation.

    section 2: job configuration params. Used with the Job
    Configuration API, again a hypothetical test runner could use
    this to run multiple jobs in Avocado, each configured in a
    different way via multiplexation of these params.

    section 3: test params. Used with the Test
    Configuration API (currently "Avocado Params"). This is our
    current multiplexer.

Again, this is one possible implementation. The APIs should be
abstract enough to introduce proper concepts that could be
implemented in different ways. Words such as Mux, Multiplexer and
Multiplexation should never be present inside Avocado core or
these APIs.

But I guess I'm getting ahead of myself here... I think all we
need right now is a proper and clean Test Params API with an
abstract multiplexer.

I think we are more or less on the same page. I understood from Cleber, that he wants to use the Mux API to produce different set of params and create one Mux object for test params, other object for configuration and so on. The multiplexer is universal so it's possible to use it as a "database" with variants. Cleber please correct me if I got it wrong...


To summarize it here (once again):

1. The tree is a "database" which allows injecting values under certain keys with given path (namespace would be also acceptable by me, I don't like the tag, though). So far we agreed that we should turn the yaml=>tree pseudoplugin to real plugin and we are looking for a name (I'd call it yaml2params). 2. Mux is the variant generation mechanism, which is responsible for walking the database and producing variants (which are lists of path along with their values) 3. AvocadoParams is a per-variant database used inside tests to query for values

So (1) is actually a core API (independent on yaml parser) which feeds the Mux, the (2) is used by the runner (or job API?) to produce multiple variants of the same test and the (3) is the API available inside test to query for the current test parameters.

The (1), (2) and (3) are essential parts of avocado, independent on yaml and currently should stay in `core`, even though that in the future we might want to allow replacing the (1) and maybe even (2) with plugins. Currently this is not important and we are not aware of anyone demanding it. We take (3) as definite for now, so it should not change apart from optimizations.


The workflow is:

1. params-feeder (eg. `yaml2params`, or `--mux-inject`) -> inject records into `tree` (in the future it could be any arbitrary database, currently hardcoded as tree)

2. `tree` merges all records and applies filters
3. `Mux` object is created on top of the tree

4. test runner uses `Mux` to produce variants and runs tests+variant combinations

5. each test creates AvocadoParams from the variant

Currently it works a bit differently (inherited from the old days) because the `yaml2param`, currently `core.tree.create_from_yaml`, actually creates the tree database directly. This is not a problem now, it would be no problem even after turning it into a proper plugin, but it'd become a problem if we chose to replace `tree` with any arbitrary database, therefor I think it's time to extract the `create_from_yaml` and instead of creating the tree we should only ask the `tree` (or database) to inject/update/remove variables/filters/flags. Then we have a clear interface.

So the multiplexer is core and IMO should stay there. Only the params-feeders should be extracted, which means the PyYAML would be optional. Also multiplexer is just a database with combinations so it could be used for anything, also we want to keep using one database for test variants and not mixed settings/config_files/test_params.


I hope everything is clear, because if not I'd have to draw a chart :-D

Regards,
Lukáš

Thanks.
   - Ademar

2. multiplexer - to produce variants
3. avocado params - to obtain params from variant

We should probably find a suitable name to ambiguously identify each part as
of today we only have multiplexer and avocado params, which can lead to
confusions. I can't come up with any good name so unless you have a good
name we maybe end up with the same name. The situation should be better,
though, because those parts will be really separated.

Thank you for the feedback, let's get my hands dirty. :-)

Regards,
Lukáš

Regards,
Lukáš


[1] - https://docs.python.org/2.6/glossary.html#term-iterable

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Avocado-devel mailing list
Avocado-devel@redhat.com
https://www.redhat.com/mailman/listinfo/avocado-devel

Re: [Avocado-devel] RFC: Avocado multiplexer plugin

Reply via email to