On Fri, May 22, 2020 at 05:45:23PM -0300, Willian Rampazzo wrote:
> Hello Cleber,
>
> Thanks for this RFC, it is appreciated. I see you have different
> points for discussion in this RFC, so it would be better to discuss
> them in different places/ways. I will try to give my contribution to
> those that I can, but I will hold my comments about the task
> scheduler. The blueprint format, stating the motivation and dividing
> the content into sections, would work better for my understanding of
> this kind of architecture-related discussion.
>
Hi Willian,

Ack.  The individual smaller issues have been turned into "GitHub
issues", so we can move the discussion about the scheduler to its
blueprint.

> On Wed, May 20, 2020 at 8:33 PM Cleber Rosa <cr...@redhat.com> wrote:
> >
> > Intro
> > =====
> >
> > This is a more technical follow-up to the points given in a previous
> > thread.  Because that thread and the current N(ext) Runner
> > documentation form a good context for this proposal, I encourage
> > everyone to read them first:
> >
> >   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> >
> > The N(ext) Runner allows for greater flexibility than the current
> > runner, so to be effective in delivering the N(ext) Runner for
> > general usage, we must define the bare minimum that still needs to
> > be implemented.
> >
> > Basic Job and Task execution
> > ============================
> >
> > A Task, within the context of the N(ext) Runner, is described as "one
> > specific instance/occurrence of the execution of a runnable with its
> > respective runner".
> >
> > A Task is a very important building block for an Avocado Job, and
> > running an Avocado Job means, to a large extent, running a number of
> > Tasks.  The Tasks that need to be executed in a Job are created
> > during the ``create_test_suite()`` phase:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> >
> > And are kept in the Job's ``test_suite`` attribute:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite
> >
> > Running the tests, then, happens during the ``run_tests()`` phase:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> >
> > During the ``run_tests()`` phase, a plugin that runs test suites on
> > a job is chosen, based on the ``run.test_runner`` configuration.
> > The current "work in progress" implementation of the N(ext) Runner
> > can be activated by setting that configuration key to ``nrunner``,
> > which can easily be done on the command line too::
> >
> >   avocado run --test-runner=nrunner /bin/true
> >
> > A general rule for measuring the quality and completeness of the
> > ``nrunner`` implementation is to run the same jobs with the current
> > runner, and compare its behavior and output with that of the
> > ``nrunner``.  From here on, we'll call this simply the "nrunner
> > plugin".
> >
> > Known issues and limitations of the current implementation
> > ==========================================================
> >
> > Different Test IDs
> > ------------------
> >
> > When running the same tests with the current runner and with the
> > nrunner plugin, the Test IDs are different::
> >
> >   $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >           "id": "1-/bin/true",
> >           "id": "2-/bin/false",
> >           "id": "3-/bin/uname",
> >
> >   $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >           "id": "1-1-/bin/true",
> >           "id": "2-2-/bin/false",
> >           "id": "3-3-/bin/uname",
> >
> > The goal is to make the IDs the same.
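> >
> > As an illustration of the goal (a sketch only, not the actual
> > implementation), the desired ID is simply the task's position in the
> > test suite, a dash, and the test reference::
> >
> >   def make_test_id(index, reference):
> >       """Compose a Test ID in the current runner's format."""
> >       return "%s-%s" % (index, reference)
> >
> >   references = ["/bin/true", "/bin/false", "/bin/uname"]
> >   ids = [make_test_id(index, reference)
> >          for index, reference in enumerate(references, start=1)]
> >   assert ids == ["1-/bin/true", "2-/bin/false", "3-/bin/uname"]
> >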
> In my opinion, this seems to be a simple issue that is easily tracked
> on GitHub. If we are going to keep the output of the nrunner just
> like the current runner, there is not much to discuss, only
> implement.
>

Ack, and it's done now.

> > Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> > ------------------------------------------------------------------------------
> >
> > The current implementation of the nrunner plugin is based on the
> > fact that Tasks are already present in the job's ``test_suite``
> > attribute, and that running Tasks can be (but shouldn't always be) a
> > matter of iterating over the result of their ``run()`` method.  This
> > is part of the actual code::
> >
> >   for status in task.run():
> >       result_dispatcher.map_method('test_progress', False)
> >       statuses.append(status)
> >
> > The problem here is that only the Python classes implemented in the
> > core ``avocado.core.nrunner`` module are supported, that is, the
> > ones registered at:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> >
> > The goal is to have all other Python classes that inherit from
> > ``avocado.core.nrunner.BaseRunner`` available in such a registry.
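> >
> > As a rough sketch of what such a registry could look like (all names
> > below are made up for illustration, not Avocado's actual API), a
> > mapping from runnable kinds to runner classes, filled in by a
> > registration decorator, would allow runners implemented outside the
> > core module to be found::
> >
> >   RUNNERS_REGISTRY = {}
> >
> >   def register_runner(kind):
> >       """Class decorator registering a runner class for a kind."""
> >       def decorator(klass):
> >           RUNNERS_REGISTRY[kind] = klass
> >           return klass
> >       return decorator
> >
> >   class BaseRunner:
> >       """Stand-in for avocado.core.nrunner.BaseRunner."""
> >       def __init__(self, runnable):
> >           self.runnable = runnable
> >
> >   @register_runner("noop")
> >   class NoOpRunner(BaseRunner):
> >       def run(self):
> >           yield {"status": "finished", "result": "pass"}
> >
> >   # a scheduler can then look up a runner by a task's kind
> >   assert RUNNERS_REGISTRY["noop"] is NoOpRunner
> >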
> Agreed, we need to find a way to centralize supported runners, not
> only the ones implemented in the Avocado core. A registration method
> like the one we are using for the new avocado parameters is an
> option. Another option is to do it the same way we register plugins
> today, utilizing the setup.py. The problem I see with both solutions
> is breaking the "standalone" effect of nrunner.py. Right now, I don't
> have a better solution for it.
>

Exactly.  In fact, I got to work on something which I believe matches
your comment:

  https://github.com/avocado-framework/avocado/pull/3908

> > Inability to run Tasks with Spawners
> > ------------------------------------
> >
> > While the "avocado nrun" command makes use of the Spawners, the
> > current implementation of the nrunner plugin described earlier calls
> > a Task's ``run()`` method directly, and clearly doesn't use
> > spawners.
> >
> > The goal here is to leverage spawners so that other isolation models
> > (or execution environments, depending on how you look at processes,
> > containers, etc.) are supported.
> >

> Agreed! If tasks are the default way to run a test on nrunner,
> Spawners should be the default "way of transportation" to achieve it.
> This discussion and its issues can be tracked as an epic on GitHub.
>

Ack, issue is here:

  https://github.com/avocado-framework/avocado/issues/3866

> > Unoptimized execution of Tasks (extra serialization/deserialization)
> > ---------------------------------------------------------------------
> >
> > At this time, the nrunner plugin runs a Task directly through its
> > ``run()`` method.  Besides the earlier point of not supporting other
> > isolation models/execution environments (that means not using
> > spawners), an extra, and most often unnecessary, layer of work
> > happens when running a task: turning a Task instance into a command
> > line, and, within its execution, turning it into a Task instance
> > again.
> >
> > The goal is to support an optimized execution of the tasks, without
> > having to turn them into command lines, and back into Task
> > instances.  The idea is already present in the spawning method
> > definitions:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> >
> > And a PoC on top of the ``nrun`` command was implemented here:
> >
> >   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef
> >

> If I understood correctly, starting here, you would discuss the
> architecture for a task scheduler. I understood the phases, but this
> discussion should be self-contained, decoupled from the previous
> content, better if it is in the blueprint format with motivation, and
> divided into sections.
>

Ack.

> > Proposal
> > ========
> >
> > Besides the known limitations listed previously, there are others
> > that will appear along the way, and certainly some new challenges as
> > we solve them.
> >
> > The goal of this proposal is to attempt to identify those
> > challenges, and lay out a plan that can be tackled by the Avocado
> > team/community and not by a single person.
> >
> > Task execution coordination goals
> > ---------------------------------
> >
> > As stated earlier, to run a job, tasks must be executed.  Unlike the
> > current runner, the N(ext) Runner architecture allows those to be
> > executed in a much more decoupled way.  This characteristic will be
> > maintained, but it needs to be adapted into the current Job
> > execution.
> >
> > From a high level view, the nrunner plugin needs to:
> >
> >  1. Break apart from the "one at a time" Task execution model that
> >     it currently employs;
> >
> >  2. Check if a Task can be executed, that is, if its requirements
> >     can be fulfilled (the most basic requirement for a task is a
> >     matching runner);
> >
> >  3. Prepare for the execution of a task, such as the fulfillment of
> >     extra task requirements.  The requirements resolver is one
> >     component, if not the only one, that should be given a chance to
> >     act here;
> >
> >  4. Execute a task in the prepared environment;
> >
> >  5. Monitor the execution of a task (from an external PoV);
> >
> >  6. Collect the status messages that tasks will send:
> >
> >     a. Forward the status messages to the appropriate job
> >        components, such as the result plugins.
> >
> >     b. Depending on the content of messages, such as the ones
> >        containing "status: started" or "status: finished", interfere
> >        in the Task execution status, and consequently, in the Job
> >        execution status.
> >
> >  7. Verify, warn the user about, and attempt to clean up stray
> >     tasks.  This may be necessary, for instance, if a Task on a
> >     container seems to be stuck and the container can not be
> >     destroyed.  The same applies to processes in some kind of
> >     uninterruptible sleep.
> >
> > Parallelization
> > ---------------
> >
> > Because the N(ext) Runner features allow for parallel execution of
> > tasks, all other aspects of task execution coordination (fulfilling
> > requirements, collecting results, etc.) should not block each other.
> >
> > There are a number of strategies for concurrent programming in
> > Python these days, and the "avocado nrun" command currently makes
> > use of asyncio to have coroutines that spawn tasks and collect
> > results concurrently (in a cooperative, non-preemptive model).  The
> > actual language or library features used are, IMO, less important
> > than the end result.
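> >
> > As a minimal sketch of that model (the coroutine names and the
> > queue-based status collection are assumptions for illustration, not
> > the actual ``nrun`` code), spawning tasks and collecting their
> > statuses can make progress concurrently::
> >
> >   import asyncio
> >   import random
> >
> >   async def spawn_task(task_id, status_queue):
> >       # pretend to spawn a task and emit its lifecycle messages
> >       await status_queue.put({"id": task_id, "status": "started"})
> >       await asyncio.sleep(random.random())  # simulated test payload
> >       await status_queue.put({"id": task_id, "status": "finished",
> >                               "result": "pass"})
> >
> >   async def collect(status_queue, expected):
> >       # collect status messages without blocking the spawning side
> >       return [await status_queue.get() for _ in range(expected)]
> >
> >   async def main():
> >       queue = asyncio.Queue()
> >       collected, _, _ = await asyncio.gather(
> >           collect(queue, 4),
> >           spawn_task("1-test.py:Test.test_1", queue),
> >           spawn_task("2-test.py:Test.test_2", queue))
> >       for message in collected:
> >           print(message)
> >
> >   asyncio.run(main())
> >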
> >
> > Suggested terminology
> > ---------------------
> >
> > Task execution has been requested
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > A Task whose execution was requested by the user.  All of the tasks
> > on a Job's ``test_suite`` attribute are requested tasks.
> >
> > If a software component deals with this type of task, it's advisable
> > that it refers to ``TASK_REQUESTED`` or ``requested_tasks``, or a
> > similar name that links to this definition.
> >
> > Task is being triaged
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > The details of the task are being analyzed, including, and most
> > importantly, the ability of the system to *attempt* to fulfill its
> > requirements.  A task leaves triage and is either considered
> > "discarded" or proceeds to be prepared and then executed.
> >
> > If a software component deals with this type of task, for instance
> > if a "task scheduler" is looking for runners matching the Task's
> > kind, it should keep it under a ``tasks_under_triage`` structure, or
> > mark the task as ``UNDER_TRIAGE`` or ``TRIAGING``, or use a similar
> > name that links to this definition.
> >
> > Task is being prepared
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > The task has left triage and has not been discarded, that is, it's a
> > candidate to be set up and, if that goes well, executed.
> >
> > The requirements for the task are being prepared in its respective
> > isolation model/execution environment, that is, the spawner it'll
> > be executed with is known, and the setup actions will be visible to
> > the task.
> >
> > If a software component deals with this type of task, for instance
> > the implementation of resolution of specific requirements, it should
> > keep it under a ``tasks_preparing`` structure, or mark the task as
> > ``PREPARING``, or use a similar name that links to this definition.
> >
> > Task is ready to be started
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The task has been prepared successfully, and can now be executed.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_ready`` structure, or mark the task as ``READY``,
> > or use a similar name that links to this definition.
> >
> > Task is being started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A hopefully short-lived state, in which a task that is ready to be
> > started (see the previous point) will be given to the respective
> > spawner to be started.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_starting`` structure, or mark the task as
> > ``STARTING``, or use a similar name that links to this definition.
> >
> > The spawner should know if the starting of the task succeeded or
> > failed, and the task should be categorized accordingly.
> >
> > Task has been started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A task was successfully started by a spawner.
> >
> > Note that it does *not* mean that the test that the task runner
> > (say, an "avocado-runner-$kind task-run" command) will run has
> > already been started.  That will be signalled by a "status: started"
> > kind of message.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_started`` structure, or mark the task as
> > ``STARTED``, or use a similar name that links to this definition.
> >
> > Task has failed to start
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Quite self-explanatory.  If the spawner failed to start a task, it
> > should be kept under a ``tasks_failed_to_start`` structure, or be
> > marked as ``FAILED_TO_START``, or a similar name that links to this
> > definition.
> >
> > Task is finished
> > ~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, and is now finished.  There's
> > no associated meaning here about the pass/fail outcome of the test
> > payload executed by the task.
> >
> > It should be kept under a ``tasks_finished`` structure, or be marked
> > as ``FINISHED``, or a similar name that links to this definition.
> >
> > Task has been interrupted
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, but has not finished and is
> > past due.
> >
> > It should be kept under a ``tasks_interrupted`` structure, or be
> > marked as ``INTERRUPTED``, or a similar name that links to this
> > definition.
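> >
> > Putting the terminology above together, a sketch of how those states
> > could be represented in code (an illustration, not a committed
> > API)::
> >
> >   import enum
> >
> >   class TaskStatus(enum.Enum):
> >       """Lifecycle states from the terminology above."""
> >       REQUESTED = "requested"            # execution requested by user
> >       TRIAGING = "triaging"              # details being analyzed
> >       PREPARING = "preparing"            # requirements being fulfilled
> >       READY = "ready"                    # prepared, can be started
> >       STARTING = "starting"              # handed over to a spawner
> >       STARTED = "started"                # successfully started
> >       FAILED_TO_START = "failed_to_start"
> >       FINISHED = "finished"              # started, and now finished
> >       INTERRUPTED = "interrupted"        # started, past due, unfinished
> >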
> >
> > Task workflow
> > -------------
> >
> > A task will usually be created from a Runnable.  A Runnable will, in
> > turn, almost always be created as part of the
> > ``avocado.core.resolver`` module.  Let's consider the following
> > output of a resolution::
> >
> >   +--------------------------------------+
> >   | ReferenceResolution #1               |
> >   +--------------------------------------+
> >   | Reference: test.py                   |
> >   | Result: SUCCESS                      |
> >   | +----------------------------------+ |
> >   | | Resolution #1 (Runnable):        | |
> >   | |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      | |
> >   | |  - requirements:                 | |
> >   | |    + file: mylib.py              | |
> >   | |    + package: gcc                | |
> >   | |    + package: libc-devel         | |
> >   | +----------------------------------+ |
> >   | +----------------------------------+ |
> >   | | Resolution #2 (Runnable):        | |
> >   | |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 | |
> >   | |    + file: mylib.py              | |
> >   | +----------------------------------+ |
> >   +--------------------------------------+
> >
> > The two Runnables here will be transformed into Tasks.  The process
> > usually includes adding an identification (I) and a status URI
> > (II)::
> >
> >   +----------------------------------+  +----------------------------------+
> >   | Resolution #1 (Runnable):        |  | Resolution #2 (Runnable):        |
> >   |  - kind: python-unittest         |  |  - kind: python-unittest         |
> >   |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      |
> >   |  - requirements:                 |  |  - requirements:                 |
> >   |    + file: mylib.py              |  |    + file: mylib.py              |
> >   |    + package: gcc                |  +----------------------------------+
> >   |    + package: libc-devel         |
> >   +----------------------------------+
> >                   ||                                    ||
> >                   \/                                    \/
> >   +----------------------------------+  +----------------------------------+
> >   | Task #1:                         |  | Task #2:                         |
> >   |  - id: 1-test.py:Test.test_1 (I) |  |  - id: 2-test.py:Test.test_2 (I) |
> >   |  - kind: python-unittest         |  |  - kind: python-unittest         |
> >   |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      |
> >   |  - requirements:                 |  |  - requirements:                 |
> >   |    + file: mylib.py              |  |    + file: mylib.py              |
> >   |    + package: gcc                |  |  - status uris:                  |
> >   |    + package: libc-devel         |  |    + 127.0.0.1:8080 (II)         |
> >   |  - status uris:                  |  +----------------------------------+
> >   |    + 127.0.0.1:8080 (II)         |
> >   +----------------------------------+
> >
> > In the end, a job will contain a ``test_suite`` with "Task #1" and
> > "Task #2".
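> >
> > A sketch of that transformation, with minimal stand-in classes for
> > the actual ``avocado.core.nrunner`` ones (attribute names and the
> > requirement format here are illustrative)::
> >
> >   class Runnable:
> >       def __init__(self, kind, uri, requirements=None):
> >           self.kind = kind
> >           self.uri = uri
> >           self.requirements = requirements or []
> >
> >   class Task:
> >       def __init__(self, identifier, runnable, status_uris):
> >           self.identifier = identifier    # (I) added identification
> >           self.runnable = runnable
> >           self.status_uris = status_uris  # (II) where to report status
> >
> >   runnables = [Runnable("python-unittest", "test.py:Test.test_1",
> >                         [{"type": "file", "name": "mylib.py"},
> >                          {"type": "package", "name": "gcc"},
> >                          {"type": "package", "name": "libc-devel"}]),
> >                Runnable("python-unittest", "test.py:Test.test_2",
> >                         [{"type": "file", "name": "mylib.py"}])]
> >   test_suite = [Task("%s-%s" % (index, runnable.uri), runnable,
> >                      ["127.0.0.1:8080"])
> >                 for index, runnable in enumerate(runnables, start=1)]
> >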
> > Having both tasks in the ``test_suite`` means that their execution
> > was requested by the Job owner::
> >
> >   +----------------------------------------------------------------------------+
> >   | REQUESTED                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > These tasks will now be triaged.  A suitable implementation will
> > move those tasks to a ``tasks_under_triage`` queue, mark them as
> > ``UNDER_TRIAGE``, or use some other strategy to differentiate the
> > tasks at this stage::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration I
> > ~~~~~~~~~~~
> >
> > Task #1 is selected on the first iteration, and it's found that:
> >
> >  1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> >  2. The ``mylib.py`` requirement is already present in the current
> >     environment
> >
> >  3. The ``gcc`` and ``libc-devel`` packages are not installed in the
> >     current environment
> >
> >  4. The system is capable of *attempting* to fulfill "package" types
> >     of requirements
> >
> > Task #1 will then be prepared.
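> >
> > A sketch of such a triage check (the helper and the requirement
> > format are made up for illustration)::
> >
> >   FULFILLABLE_TYPES = ("package",)  # types we can *attempt* to fulfill
> >
> >   def is_already_fulfilled(requirement):
> >       """Stub: pretend only mylib.py is present in the environment."""
> >       return requirement["name"] == "mylib.py"
> >
> >   def triage(task, registry):
> >       """Return requirements still needing fulfillment, or None if
> >       the task has to be discarded."""
> >       if task.runnable.kind not in registry:      # finding 1
> >           return None
> >       missing = []
> >       for requirement in task.runnable.requirements:
> >           if is_already_fulfilled(requirement):   # finding 2
> >               continue
> >           if requirement["type"] not in FULFILLABLE_TYPES:
> >               return None                         # finding 4 failed
> >           missing.append(requirement)             # finding 3
> >       return missing
> >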
> > No further action is performed on the first iteration, because no
> > other relevant state exists (Task #2, the only other requested task,
> > has not progressed beyond its initial stage)::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | PREPARING                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration II
> > ~~~~~~~~~~~~
> >
> > On the second iteration, Task #2 is selected, and it's found that:
> >
> >  1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> >  2. The ``mylib.py`` requirement is already present in the current
> >     environment
> >
> > Task #2 is now ready to be started.
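> >
> > Meanwhile, fulfilling Task #1's pending "package" requirements could
> > look roughly like this (a sketch only: a real implementation would
> > go through the requirements resolver and act inside the environment
> > the task's spawner will use; the choice of package manager here is
> > just an assumption)::
> >
> >   import subprocess
> >
> >   def prepare(missing_requirements):
> >       """Attempt to fulfill pending requirements; True means READY."""
> >       for requirement in missing_requirements:
> >           if requirement["type"] == "package":
> >               result = subprocess.run(["dnf", "install", "-y",
> >                                        requirement["name"]])
> >               if result.returncode != 0:
> >                   return False
> >       return True
> >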
> > Possibly concurrently, the setup of Task #1, selected as the single
> > entry being prepared, is having its requirements prepared::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                                                            |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | READY                                                                      |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | PREPARING                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration III
> > ~~~~~~~~~~~~~
> >
> > On the third iteration, there are no tasks left under triage, so the
> > action is now limited to tasks being prepared and ready to be
> > started.
> >
> > Supposing that the "status uri" 127.0.0.1:8080 was set by the job as
> > its internal status server, it must be started before any task, so
> > that no status message is lost.
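> >
> > A minimal sketch of such a status server, assuming (only for
> > illustration) that tasks write one JSON document per line over a TCP
> > connection::
> >
> >   import asyncio
> >   import json
> >
> >   STATUS_MESSAGES = []
> >
> >   async def handle_task_connection(reader, writer):
> >       # collect every status message a task sends on its connection
> >       async for line in reader:
> >           STATUS_MESSAGES.append(json.loads(line))
> >       writer.close()
> >
> >   async def serve():
> >       # must be listening before any task is started
> >       server = await asyncio.start_server(handle_task_connection,
> >                                           host="127.0.0.1", port=8080)
> >       async with server:
> >           await server.serve_forever()
> >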
> >
> > At this stage, Task #2 is started, and Task #1 is now ready::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                                                            |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | READY                                                                      |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages: []                                                        |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration IV
> > ~~~~~~~~~~~~
> >
> > On the fourth iteration, Task #1 is started::
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   +----------------------------------------------------------------------------+
> >
> > Note: the ideal level of parallelization is still to be defined,
> > that is, it may be that triaging, preparing and starting tasks all
> > run concurrently.  An initial implementation that, on each
> > iteration, looks at all Task states and attempts to advance them
> > further, blocking other Tasks as little as possible, should be
> > acceptable.
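> >
> > One way to read the note above, as a sketch (states and transitions
> > simplified to a plain mapping for illustration)::
> >
> >   NEXT_STATE = {"REQUESTED": "TRIAGING",
> >                 "TRIAGING": "PREPARING",
> >                 "PREPARING": "READY",
> >                 "READY": "STARTING",
> >                 "STARTING": "STARTED"}
> >
> >   def run_iteration(tasks):
> >       """Look at every task and attempt to advance it one state,
> >       blocking the other tasks as little as possible."""
> >       for task in tasks:
> >           new_state = NEXT_STATE.get(task["state"])
> >           if new_state is not None:
> >               task["state"] = new_state
> >
> >   tasks = [{"id": "1-test.py:Test.test_1", "state": "REQUESTED"},
> >            {"id": "2-test.py:Test.test_2", "state": "REQUESTED"}]
> >   while any(task["state"] != "STARTED" for task in tasks):
> >       run_iteration(tasks)
> >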
> > Iteration V
> > ~~~~~~~~~~~
> >
> > On the fifth iteration, the spawner reports that Task #2 is not
> > alive anymore, and the status server has received a message about it
> > (and also a message about Task #1 having started)::
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   |  - {id: 1-test.py:Test.test_1, status: started}                            |
> >   |  - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
> >   +----------------------------------------------------------------------------+
> >
> > Because of that, Task #2 is now considered ``FINISHED``::
> >
> >   +----------------------------------------------------------------------------+
> >   | FINISHED                                                                   |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> > And Task #1 is still a ``STARTED`` task::
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Final Iteration
> > ~~~~~~~~~~~~~~~
> >
> > After a number of iterations with no status changes, and because of
> > a timeout implementation at the job level, it's decided that Task #1
> > is not to be waited on.
> >
> > The spawner continues to inform that Task #1 is alive (from its
> > PoV), but no further status message has been received.  Provided the
> > spawner has support for that, it may attempt to clean up the task
> > (such as destroying a container or killing a process).
> > In the end, it's left with::
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   |  - {id: 1-test.py:Test.test_1, status: started}                            |
> >   |  - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | FINISHED                                                                   |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | INTERRUPTED                                                                |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >

> I have attached a diagram with the phases of your proposal and the
> example you gave, for those that like diagrams.
>

> > Tallying results
> > ~~~~~~~~~~~~~~~~
> >
> > The nrunner plugin should be able to provide meaningful results to
> > the Job, and consequently to the user, based on the resulting
> > information on the final iteration.
> >
> > Notice that some information, such as the ``PASS`` for the first
> > test, will come from the "result" given in a status message from the
> > task itself.  Some other status, such as the ``INTERRUPTED`` status
> > for the second test, will not come from a received status message,
> > but from a realization of the actual management of the task
> > execution.  It's expected that other information will also have to
> > be inferred, and "filled in", by the nrunner plugin implementation.
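> >
> > A sketch of how that tallying could combine received status messages
> > with the scheduler's own knowledge (names illustrative)::
> >
> >   def tally(task, status_messages):
> >       """Determine the test result to be reported for a task."""
> >       for message in status_messages:
> >           if (message["id"] == task["id"] and
> >                   message["status"] == "finished"):
> >               # reported by the task itself, e.g. "pass" -> "PASS"
> >               return message["result"].upper()
> >       if task["state"] == "INTERRUPTED":
> >           # inferred from the management of the task execution
> >           return "INTERRUPT"
> >       return "ERROR"  # nothing reported: "filled in" by the plugin
> >
> >   messages = [{"id": "2-test.py:Test.test_2", "status": "started"},
> >               {"id": "1-test.py:Test.test_1", "status": "started"},
> >               {"id": "2-test.py:Test.test_2", "status": "finished",
> >                "result": "pass"}]
> >   assert tally({"id": "2-test.py:Test.test_2", "state": "FINISHED"},
> >                messages) == "PASS"
> >   assert tally({"id": "1-test.py:Test.test_1", "state": "INTERRUPTED"},
> >                messages) == "INTERRUPT"
> >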
> > In the end, it's expected that results similar to these would be
> > presented::
> >
> >   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
> >   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
> >    (1/2) tests.py:Test.test_2: PASS (2.56 s)
> >    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
> >   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
> >   JOB TIME   : 0.19 s
> >   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> >
> > Notice how Task #2 shows up before Task #1, because it was both
> > started first and finished earlier.  There may be issues with the
> > current UI to be dealt with regarding out-of-order task status
> > updates.
> >
> > Summary
> > =======
> >
> > This proposal contains a number of items that can become GitHub
> > issues at this stage.  It also contains a general explanation of
> > what I believe are the crucial missing features to make the N(ext)
> > Runner implementation available to the general public.
> >
> > Feedback is highly appreciated, and it's expected that this document
> > will evolve into a better version, and possibly become a formal Blue
> > Print.
> >
> > Thanks,
> > - Cleber.

> I think the idea for the task scheduler is promising. I have some
> suggestions, but, as I said before, if the text is structured in a
> self-contained blueprint way, it will be better for the discussion
> and documentation.
>

Cool, and thanks for providing the blueprint "kickstart" PR.  I'll
work on top of that.

> Thanks,
>
> Willian

Thanks,
- Cleber.