On Fri, May 22, 2020 at 05:45:23PM -0300, Willian Rampazzo wrote:
> Hello Cleber,
>
> Thanks for this RFC, it is appreciated. I see you have different
> points for discussion in this RFC, so it would be better to discuss
> them in different places/ways. I will try to give my contribution to
> those that I can, but I will hold my comments about the task
> scheduler. The blueprint format, stating the motivation and dividing
> the content into sections, would work better for my understanding of
> this kind of architecture-related discussion.
>
Hi Willian,

Ack.  The individual smaller issues have been turned into "GitHub
issues", so we can move the discussion about the scheduler to its
blueprint.

> On Wed, May 20, 2020 at 8:33 PM Cleber Rosa <cr...@redhat.com> wrote:
> >
> > Intro
> > =====
> >
> > This is a more technical follow-up to the points given in a previous
> > thread.  Because that thread and the current N(ext) Runner
> > documentation form a good context for this proposal, I encourage
> > everyone to read them first:
> >
> >   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> >
> > The N(ext) Runner allows for greater flexibility than the current
> > runner, so to be effective in delivering the N(ext) Runner for
> > general usage, we must define the bare minimum that still needs to
> > be implemented.
> >
> > Basic Job and Task execution
> > ============================
> >
> > A Task, within the context of the N(ext) Runner, is described as "one
> > specific instance/occurrence of the execution of a runnable with its
> > respective runner".
> >
> > A Task is a very important building block for an Avocado Job, and
> > running an Avocado Job means, to a large extent, running a number of
> > Tasks.  The Tasks that need to be executed in a Job are created
> > during the ``create_test_suite()`` phase:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> >
> > And are kept in the Job's ``test_suite`` attribute:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite
> >
> > Running the tests, then, happens during the ``run_tests()`` phase:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> >
> > During the ``run_tests()`` phase, a plugin that runs test suites on
> > a job is chosen, based on the ``run.test_runner`` configuration.
> > The current "work in progress" implementation of the N(ext) Runner
> > can be activated by setting that configuration key to ``nrunner``,
> > which can easily be done on the command line too::
> >
> >   avocado run --test-runner=nrunner /bin/true
> >
> > A general rule for measuring the quality and completeness of the
> > ``nrunner`` implementation is to run the same jobs with the current
> > runner, and compare its behavior and output with that of the
> > ``nrunner``.  From here on, we'll call this simply the "nrunner
> > plugin".
> >
> > Known issues and limitations of the current implementation
> > ==========================================================
> >
> > Different Test IDs
> > ------------------
> >
> > When running the same tests with the current runner and with the
> > nrunner plugin, the Test IDs are different::
> >
> >   $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >           "id": "1-/bin/true",
> >           "id": "2-/bin/false",
> >           "id": "3-/bin/uname",
> >
> >   $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
> >           "id": "1-1-/bin/true",
> >           "id": "2-2-/bin/false",
> >           "id": "3-3-/bin/uname",
> >
> > The goal is to make the IDs the same.
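> >
> > As an illustration of the goal (a sketch only, not the actual
> > implementation), the desired ID is simply the task's position in the
> > test suite, a dash, and the test reference::
> >
> >   def make_test_id(index, reference):
> >       """Compose a Test ID in the current runner's format."""
> >       return "%s-%s" % (index, reference)
> >
> >   references = ["/bin/true", "/bin/false", "/bin/uname"]
> >   ids = [make_test_id(index, reference)
> >          for index, reference in enumerate(references, start=1)]
> >   assert ids == ["1-/bin/true", "2-/bin/false", "3-/bin/uname"]
> >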
> In my opinion, this seems to be a simple issue that is easily tracked
> on GitHub. If we are going to keep the output of the nrunner just
> like the current runner, there is not much to discuss, only
> implement.
>

Ack, and it's done now.

> > Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> > ------------------------------------------------------------------------------
> >
> > The current implementation of the nrunner plugin is based on the
> > fact that Tasks are already present in the job's ``test_suite``
> > attribute, and that running Tasks can be (but shouldn't always be) a
> > matter of iterating over the result of their ``run()`` method.  This
> > is part of the actual code::
> >
> >   for status in task.run():
> >       result_dispatcher.map_method('test_progress', False)
> >       statuses.append(status)
> >
> > The problem here is that only the Python classes implemented in the
> > core ``avocado.core.nrunner`` module are supported, that is, the
> > ones registered at:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> >
> > The goal is to have all other Python classes that inherit from
> > ``avocado.core.nrunner.BaseRunner`` available in such a registry.
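> >
> > As a rough sketch of what such a registry could look like (all names
> > below are made up for illustration, not Avocado's actual API), a
> > mapping from runnable kinds to runner classes, filled in by a
> > registration decorator, would allow runners implemented outside the
> > core module to be found::
> >
> >   RUNNERS_REGISTRY = {}
> >
> >   def register_runner(kind):
> >       """Class decorator registering a runner class for a kind."""
> >       def decorator(klass):
> >           RUNNERS_REGISTRY[kind] = klass
> >           return klass
> >       return decorator
> >
> >   class BaseRunner:
> >       """Stand-in for avocado.core.nrunner.BaseRunner."""
> >       def __init__(self, runnable):
> >           self.runnable = runnable
> >
> >   @register_runner("noop")
> >   class NoOpRunner(BaseRunner):
> >       def run(self):
> >           yield {"status": "finished", "result": "pass"}
> >
> >   # a scheduler can then look up a runner by a task's kind
> >   assert RUNNERS_REGISTRY["noop"] is NoOpRunner
> >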
> Agreed, we need to find a way to centralize supported runners, not
> only the ones implemented in the Avocado core. A registration method
> like the one we are using for the new avocado parameters is an
> option. Another option is to do it the same way we register plugins
> today, utilizing the setup.py. The problem I see with both solutions
> is breaking the "standalone" effect of nrunner.py. Right now, I don't
> have a better solution for it.
>

Exactly.  In fact, I got to work on something which I believe matches
your comment:

  https://github.com/avocado-framework/avocado/pull/3908

> > Inability to run Tasks with Spawners
> > ------------------------------------
> >
> > While the "avocado nrun" command makes use of the Spawners, the
> > current implementation of the nrunner plugin described earlier calls
> > a Task's ``run()`` method directly, and clearly doesn't use
> > spawners.
> >
> > The goal here is to leverage spawners so that other isolation models
> > (or execution environments, depending on how you look at processes,
> > containers, etc.) are supported.
> >

> Agreed! If tasks are the default way to run a test on nrunner,
> Spawners should be the default "way of transportation" to achieve it.
> This discussion and its issues can be tracked as an epic on GitHub.
>

Ack, issue is here:

  https://github.com/avocado-framework/avocado/issues/3866

> > Unoptimized execution of Tasks (extra serialization/deserialization)
> > ---------------------------------------------------------------------
> >
> > At this time, the nrunner plugin runs a Task directly through its
> > ``run()`` method.  Besides the earlier point of not supporting other
> > isolation models/execution environments (that means not using
> > spawners), an extra, and most often unnecessary, layer of work
> > happens when running a task: turning a Task instance into a command
> > line, and, within its execution, turning it into a Task instance
> > again.
> >
> > The goal is to support an optimized execution of the tasks, without
> > having to turn them into command lines, and back into Task
> > instances.  The idea is already present in the spawning method
> > definitions:
> >
> >   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> >
> > And a PoC on top of the ``nrun`` command was implemented here:
> >
> >   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef
> >

> If I understood correctly, starting here, you would discuss the
> architecture for a task scheduler. I understood the phases, but this
> discussion should be self-contained, decoupled from the previous
> content, better if it is in the blueprint format with motivation, and
> divided into sections.
>

Ack.

> > Proposal
> > ========
> >
> > Besides the known limitations listed previously, there are others
> > that will appear along the way, and certainly some new challenges as
> > we solve them.
> >
> > The goal of this proposal is to attempt to identify those
> > challenges, and lay out a plan that can be tackled by the Avocado
> > team/community and not by a single person.
> >
> > Task execution coordination goals
> > ---------------------------------
> >
> > As stated earlier, to run a job, tasks must be executed.  Unlike the
> > current runner, the N(ext) Runner architecture allows those to be
> > executed in a much more decoupled way.  This characteristic will be
> > maintained, but it needs to be adapted into the current Job
> > execution.
> >
> > From a high level view, the nrunner plugin needs to:
> >
> >  1. Break apart from the "one at a time" Task execution model that
> >     it currently employs;
> >
> >  2. Check if a Task can be executed, that is, if its requirements
> >     can be fulfilled (the most basic requirement for a task is a
> >     matching runner);
> >
> >  3. Prepare for the execution of a task, such as the fulfillment of
> >     extra task requirements.  The requirements resolver is one
> >     component, if not the only one, that should be given a chance to
> >     act here;
> >
> >  4. Execute a task in the prepared environment;
> >
> >  5. Monitor the execution of a task (from an external PoV);
> >
> >  6. Collect the status messages that tasks will send:
> >
> >     a. Forward the status messages to the appropriate job
> >        components, such as the result plugins.
> >
> >     b. Depending on the content of messages, such as the ones
> >        containing "status: started" or "status: finished", interfere
> >        in the Task execution status, and consequently, in the Job
> >        execution status.
> >
> >  7. Verify, warn the user about, and attempt to clean up stray
> >     tasks.  This may be necessary, for instance, if a Task on a
> >     container seems to be stuck and the container can not be
> >     destroyed.  The same applies to processes in some kind of
> >     uninterruptible sleep.
> >
> > Parallelization
> > ---------------
> >
> > Because the N(ext) Runner features allow for parallel execution of
> > tasks, all other aspects of task execution coordination (fulfilling
> > requirements, collecting results, etc.) should not block each other.
> >
> > There are a number of strategies for concurrent programming in
> > Python these days, and the "avocado nrun" command currently makes
> > use of asyncio to have coroutines that spawn tasks and collect
> > results concurrently (in a cooperative, non-preemptive model).  The
> > actual language or library features used are, IMO, less important
> > than the end result.
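> >
> > As a minimal sketch of that model (the coroutine names and the
> > queue-based status collection are assumptions for illustration, not
> > the actual ``nrun`` code), spawning tasks and collecting their
> > statuses can make progress concurrently::
> >
> >   import asyncio
> >   import random
> >
> >   async def spawn_task(task_id, status_queue):
> >       # pretend to spawn a task and emit its lifecycle messages
> >       await status_queue.put({"id": task_id, "status": "started"})
> >       await asyncio.sleep(random.random())  # simulated test payload
> >       await status_queue.put({"id": task_id, "status": "finished",
> >                               "result": "pass"})
> >
> >   async def collect(status_queue, expected):
> >       # collect status messages without blocking the spawning side
> >       return [await status_queue.get() for _ in range(expected)]
> >
> >   async def main():
> >       queue = asyncio.Queue()
> >       collected, _, _ = await asyncio.gather(
> >           collect(queue, 4),
> >           spawn_task("1-test.py:Test.test_1", queue),
> >           spawn_task("2-test.py:Test.test_2", queue))
> >       for message in collected:
> >           print(message)
> >
> >   asyncio.run(main())
> >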
> >
> > Suggested terminology
> > ---------------------
> >
> > Task execution has been requested
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > A Task whose execution was requested by the user.  All of the tasks
> > on a Job's ``test_suite`` attribute are requested tasks.
> >
> > If a software component deals with this type of task, it's advisable
> > that it refers to ``TASK_REQUESTED`` or ``requested_tasks``, or a
> > similar name that links to this definition.
> >
> > Task is being triaged
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > The details of the task are being analyzed, including, and most
> > importantly, the ability of the system to *attempt* to fulfill its
> > requirements.  A task leaves triage and is either considered
> > "discarded" or proceeds to be prepared and then executed.
> >
> > If a software component deals with this type of task, for instance
> > if a "task scheduler" is looking for runners matching the Task's
> > kind, it should keep it under a ``tasks_under_triage`` structure, or
> > mark the task as ``UNDER_TRIAGE`` or ``TRIAGING``, or use a similar
> > name that links to this definition.
> >
> > Task is being prepared
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > The task has left triage and has not been discarded, that is, it's a
> > candidate to be set up and, if that goes well, executed.
> >
> > The requirements for the task are being prepared in its respective
> > isolation model/execution environment, that is, the spawner it'll
> > be executed with is known, and the setup actions will be visible to
> > the task.
> >
> > If a software component deals with this type of task, for instance
> > the implementation of resolution of specific requirements, it should
> > keep it under a ``tasks_preparing`` structure, or mark the task as
> > ``PREPARING``, or use a similar name that links to this definition.
> >
> > Task is ready to be started
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The task has been prepared successfully, and can now be executed.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_ready`` structure, or mark the task as ``READY``,
> > or use a similar name that links to this definition.
> >
> > Task is being started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A hopefully short-lived state, in which a task that is ready to be
> > started (see the previous point) will be given to the respective
> > spawner to be started.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_starting`` structure, or mark the task as
> > ``STARTING``, or use a similar name that links to this definition.
> >
> > The spawner should know if the starting of the task succeeded or
> > failed, and the task should be categorized accordingly.
> >
> > Task has been started
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > A task was successfully started by a spawner.
> >
> > Note that it does *not* mean that the test that the task runner
> > (say, an "avocado-runner-$kind task-run" command) will run has
> > already been started.  That will be signalled by a "status: started"
> > kind of message.
> >
> > If a software component deals with this type of task, it should keep
> > it under a ``tasks_started`` structure, or mark the task as
> > ``STARTED``, or use a similar name that links to this definition.
> >
> > Task has failed to start
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Quite self-explanatory.  If the spawner failed to start a task, it
> > should be kept under a ``tasks_failed_to_start`` structure, or be
> > marked as ``FAILED_TO_START``, or a similar name that links to this
> > definition.
> >
> > Task is finished
> > ~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, and is now finished.  There's
> > no associated meaning here about the pass/fail outcome of the test
> > payload executed by the task.
> >
> > It should be kept under a ``tasks_finished`` structure, or be marked
> > as ``FINISHED``, or a similar name that links to this definition.
> >
> > Task has been interrupted
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > This means that the task has started, but has not finished and is
> > past due.
> >
> > It should be kept under a ``tasks_interrupted`` structure, or be
> > marked as ``INTERRUPTED``, or a similar name that links to this
> > definition.
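> >
> > Putting the terminology above together, a sketch of how those states
> > could be represented in code (an illustration, not a committed
> > API)::
> >
> >   import enum
> >
> >   class TaskStatus(enum.Enum):
> >       """Lifecycle states from the terminology above."""
> >       REQUESTED = "requested"            # execution requested by user
> >       TRIAGING = "triaging"              # details being analyzed
> >       PREPARING = "preparing"            # requirements being fulfilled
> >       READY = "ready"                    # prepared, can be started
> >       STARTING = "starting"              # handed over to a spawner
> >       STARTED = "started"                # successfully started
> >       FAILED_TO_START = "failed_to_start"
> >       FINISHED = "finished"              # started, and now finished
> >       INTERRUPTED = "interrupted"        # started, past due, unfinished
> >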
> >
> > Task workflow
> > -------------
> >
> > A task will usually be created from a Runnable.  A Runnable will, in
> > turn, almost always be created as part of the
> > ``avocado.core.resolver`` module.  Let's consider the following
> > output of a resolution::
> >
> >   +--------------------------------------+
> >   | ReferenceResolution #1               |
> >   +--------------------------------------+
> >   | Reference: test.py                   |
> >   | Result: SUCCESS                      |
> >   | +----------------------------------+ |
> >   | | Resolution #1 (Runnable):        | |
> >   | |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      | |
> >   | |  - requirements:                 | |
> >   | |    + file: mylib.py              | |
> >   | |    + package: gcc                | |
> >   | |    + package: libc-devel         | |
> >   | +----------------------------------+ |
> >   | +----------------------------------+ |
> >   | | Resolution #2 (Runnable):        | |
> >   | |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 | |
> >   | |    + file: mylib.py              | |
> >   | +----------------------------------+ |
> >   +--------------------------------------+
> >
> > The two Runnables here will be transformed into Tasks.  The process
> > usually includes adding an identification (I) and a status URI
> > (II)::
> >
> >   +----------------------------------+  +----------------------------------+
> >   | Resolution #1 (Runnable):        |  | Resolution #2 (Runnable):        |
> >   |  - kind: python-unittest         |  |  - kind: python-unittest         |
> >   |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      |
> >   |  - requirements:                 |  |  - requirements:                 |
> >   |    + file: mylib.py              |  |    + file: mylib.py              |
> >   |    + package: gcc                |  +----------------------------------+
> >   |    + package: libc-devel         |
> >   +----------------------------------+
> >                   ||                                    ||
> >                   \/                                    \/
> >   +----------------------------------+  +----------------------------------+
> >   | Task #1:                         |  | Task #2:                         |
> >   |  - id: 1-test.py:Test.test_1 (I) |  |  - id: 2-test.py:Test.test_2 (I) |
> >   |  - kind: python-unittest         |  |  - kind: python-unittest         |
> >   |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      |
> >   |  - requirements:                 |  |  - requirements:                 |
> >   |    + file: mylib.py              |  |    + file: mylib.py              |
> >   |    + package: gcc                |  |  - status uris:                  |
> >   |    + package: libc-devel         |  |    + 127.0.0.1:8080 (II)         |
> >   |  - status uris:                  |  +----------------------------------+
> >   |    + 127.0.0.1:8080 (II)         |
> >   +----------------------------------+
> >
> > In the end, a job will contain a ``test_suite`` with "Task #1" and
> > "Task #2".
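> >
> > A sketch of that transformation, with minimal stand-in classes for
> > the actual ``avocado.core.nrunner`` ones (attribute names and the
> > requirement format here are illustrative)::
> >
> >   class Runnable:
> >       def __init__(self, kind, uri, requirements=None):
> >           self.kind = kind
> >           self.uri = uri
> >           self.requirements = requirements or []
> >
> >   class Task:
> >       def __init__(self, identifier, runnable, status_uris):
> >           self.identifier = identifier    # (I) added identification
> >           self.runnable = runnable
> >           self.status_uris = status_uris  # (II) where to report status
> >
> >   runnables = [Runnable("python-unittest", "test.py:Test.test_1",
> >                         [{"type": "file", "name": "mylib.py"},
> >                          {"type": "package", "name": "gcc"},
> >                          {"type": "package", "name": "libc-devel"}]),
> >                Runnable("python-unittest", "test.py:Test.test_2",
> >                         [{"type": "file", "name": "mylib.py"}])]
> >   test_suite = [Task("%s-%s" % (index, runnable.uri), runnable,
> >                      ["127.0.0.1:8080"])
> >                 for index, runnable in enumerate(runnables, start=1)]
> >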
> > Having both tasks in the ``test_suite`` means that their execution
> > was requested by the Job owner::
> >
> >   +----------------------------------------------------------------------------+
> >   | REQUESTED                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > These tasks will now be triaged.  A suitable implementation will
> > move those tasks to a ``tasks_under_triage`` queue, mark them as
> > ``UNDER_TRIAGE``, or use some other strategy to differentiate the
> > tasks at this stage::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration I
> > ~~~~~~~~~~~
> >
> > Task #1 is selected on the first iteration, and it's found that:
> >
> >  1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> >  2. The ``mylib.py`` requirement is already present in the current
> >     environment
> >
> >  3. The ``gcc`` and ``libc-devel`` packages are not installed in the
> >     current environment
> >
> >  4. The system is capable of *attempting* to fulfill "package" types
> >     of requirements
> >
> > Task #1 will then be prepared.
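> >
> > A sketch of such a triage check (the helper and the requirement
> > format are made up for illustration)::
> >
> >   FULFILLABLE_TYPES = ("package",)  # types we can *attempt* to fulfill
> >
> >   def is_already_fulfilled(requirement):
> >       """Stub: pretend only mylib.py is present in the environment."""
> >       return requirement["name"] == "mylib.py"
> >
> >   def triage(task, registry):
> >       """Return requirements still needing fulfillment, or None if
> >       the task has to be discarded."""
> >       if task.runnable.kind not in registry:      # finding 1
> >           return None
> >       missing = []
> >       for requirement in task.runnable.requirements:
> >           if is_already_fulfilled(requirement):   # finding 2
> >               continue
> >           if requirement["type"] not in FULFILLABLE_TYPES:
> >               return None                         # finding 4 failed
> >           missing.append(requirement)             # finding 3
> >       return missing
> >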
> > No further action is performed on the first iteration, because no
> > other relevant state exists (Task #2, the only other requested task,
> > has not progressed beyond its initial stage)::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | PREPARING                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration II
> > ~~~~~~~~~~~~
> >
> > On the second iteration, Task #2 is selected, and it's found that:
> >
> >  1. A suitable runner for tasks of kind ``python-unittest`` exists
> >
> >  2. The ``mylib.py`` requirement is already present in the current
> >     environment
> >
> > Task #2 is now ready to be started.
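> >
> > Meanwhile, fulfilling Task #1's pending "package" requirements could
> > look roughly like this (a sketch only: a real implementation would
> > go through the requirements resolver and act inside the environment
> > the task's spawner will use; the choice of package manager here is
> > just an assumption)::
> >
> >   import subprocess
> >
> >   def prepare(missing_requirements):
> >       """Attempt to fulfill pending requirements; True means READY."""
> >       for requirement in missing_requirements:
> >           if requirement["type"] == "package":
> >               result = subprocess.run(["dnf", "install", "-y",
> >                                        requirement["name"]])
> >               if result.returncode != 0:
> >                   return False
> >       return True
> >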
> > Possibly concurrently, the setup of Task #1, selected as the single
> > entry being prepared, is having its requirements prepared::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                                                            |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | READY                                                                      |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | PREPARING                                                                  |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration III
> > ~~~~~~~~~~~~~
> >
> > On the third iteration, there are no tasks left under triage, so the
> > action is now limited to tasks being prepared and ready to be
> > started.
> >
> > Supposing that the "status uri" 127.0.0.1:8080 was set by the job as
> > its internal status server, it must be started before any task, so
> > that no status message is lost.
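> >
> > A minimal sketch of such a status server, assuming (only for
> > illustration) that tasks write one JSON document per line over a TCP
> > connection::
> >
> >   import asyncio
> >   import json
> >
> >   STATUS_MESSAGES = []
> >
> >   async def handle_task_connection(reader, writer):
> >       # collect every status message a task sends on its connection
> >       async for line in reader:
> >           STATUS_MESSAGES.append(json.loads(line))
> >       writer.close()
> >
> >   async def serve():
> >       # must be listening before any task is started
> >       server = await asyncio.start_server(handle_task_connection,
> >                                           host="127.0.0.1", port=8080)
> >       async with server:
> >           await server.serve_forever()
> >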
> >
> > At this stage, Task #2 is started, and Task #1 is now ready::
> >
> >   +----------------------------------------------------------------------------+
> >   | UNDER_TRIAGE                                                               |
> >   +----------------------------------------------------------------------------+
> >   |                                                                            |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | READY                                                                      |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages: []                                                        |
> >   +----------------------------------------------------------------------------+
> >
> > Iteration IV
> > ~~~~~~~~~~~~
> >
> > On the fourth iteration, Task #1 is started::
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+  +----------------------------------+ |
> >   | | Task #1:                         |  | Task #2:                         | |
> >   | |  - id: 1-test.py:Test.test_1     |  |  - id: 2-test.py:Test.test_2     | |
> >   | |  - kind: python-unittest         |  |  - kind: python-unittest         | |
> >   | |  - uri: test.py:Test.test_1      |  |  - uri: test.py:Test.test_2      | |
> >   | |  - requirements:                 |  |  - requirements:                 | |
> >   | |    + file: mylib.py              |  |    + file: mylib.py              | |
> >   | |    + package: gcc                |  |  - status uris:                  | |
> >   | |    + package: libc-devel         |  |    + 127.0.0.1:8080              | |
> >   | |  - status uris:                  |  +----------------------------------+ |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   +----------------------------------------------------------------------------+
> >
> > Note: the ideal level of parallelization is still to be defined,
> > that is, it may be that triaging, preparing and starting tasks all
> > run concurrently.  An initial implementation that, on each
> > iteration, looks at all Task states and attempts to advance them
> > further, blocking other Tasks as little as possible, should be
> > acceptable.
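> >
> > One way to read the note above, as a sketch (states and transitions
> > simplified to a plain mapping for illustration)::
> >
> >   NEXT_STATE = {"REQUESTED": "TRIAGING",
> >                 "TRIAGING": "PREPARING",
> >                 "PREPARING": "READY",
> >                 "READY": "STARTING",
> >                 "STARTING": "STARTED"}
> >
> >   def run_iteration(tasks):
> >       """Look at every task and attempt to advance it one state,
> >       blocking the other tasks as little as possible."""
> >       for task in tasks:
> >           new_state = NEXT_STATE.get(task["state"])
> >           if new_state is not None:
> >               task["state"] = new_state
> >
> >   tasks = [{"id": "1-test.py:Test.test_1", "state": "REQUESTED"},
> >            {"id": "2-test.py:Test.test_2", "state": "REQUESTED"}]
> >   while any(task["state"] != "STARTED" for task in tasks):
> >       run_iteration(tasks)
> >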
> > Iteration V
> > ~~~~~~~~~~~
> >
> > On the fifth iteration, the spawner reports that Task #2 is not
> > alive anymore, and the status server has received a message about it
> > (and also a message about Task #1 having started)::
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   |  - {id: 1-test.py:Test.test_1, status: started}                            |
> >   |  - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
> >   +----------------------------------------------------------------------------+
> >
> > Because of that, Task #2 is now considered ``FINISHED``::
> >
> >   +----------------------------------------------------------------------------+
> >   | FINISHED                                                                   |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> > And Task #1 is still a ``STARTED`` task::
> >
> >   +----------------------------------------------------------------------------+
> >   | STARTED                                                                    |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >
> > Final Iteration
> > ~~~~~~~~~~~~~~~
> >
> > After a number of iterations with no status changes, and because of
> > a timeout implementation at the job level, it's decided that Task #1
> > is not to be waited on.
> >
> > The spawner continues to inform that Task #1 is alive (from its
> > PoV), but no further status message has been received.  Provided the
> > spawner has support for that, it may attempt to clean up the task
> > (such as destroying a container or killing a process).
> > In the end, it's left with::
> >
> >   +----------------------------------------------------------------------------+
> >   | STATUS SERVER "127.0.0.1:8080"                                             |
> >   +----------------------------------------------------------------------------+
> >   | Status Messages:                                                           |
> >   |  - {id: 2-test.py:Test.test_2, status: started}                            |
> >   |  - {id: 1-test.py:Test.test_1, status: started}                            |
> >   |  - {id: 2-test.py:Test.test_2, status: finished, result: pass}             |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | FINISHED                                                                   |
> >   +----------------------------------------------------------------------------+
> >   |                                        +----------------------------------+ |
> >   |                                        | Task #2:                         | |
> >   |                                        |  - id: 2-test.py:Test.test_2     | |
> >   |                                        |  - kind: python-unittest         | |
> >   |                                        |  - uri: test.py:Test.test_2      | |
> >   |                                        |  - requirements:                 | |
> >   |                                        |    + file: mylib.py              | |
> >   |                                        |  - status uris:                  | |
> >   |                                        |    + 127.0.0.1:8080              | |
> >   |                                        +----------------------------------+ |
> >   +----------------------------------------------------------------------------+
> >
> >   +----------------------------------------------------------------------------+
> >   | INTERRUPTED                                                                |
> >   +----------------------------------------------------------------------------+
> >   | +----------------------------------+                                       |
> >   | | Task #1:                         |                                       |
> >   | |  - id: 1-test.py:Test.test_1     |                                       |
> >   | |  - kind: python-unittest         |                                       |
> >   | |  - uri: test.py:Test.test_1      |                                       |
> >   | |  - requirements:                 |                                       |
> >   | |    + file: mylib.py              |                                       |
> >   | |    + package: gcc                |                                       |
> >   | |    + package: libc-devel         |                                       |
> >   | |  - status uris:                  |                                       |
> >   | |    + 127.0.0.1:8080              |                                       |
> >   | +----------------------------------+                                       |
> >   +----------------------------------------------------------------------------+
> >

> I have attached a diagram with the phases of your proposal and the
> example you gave, for those that like diagrams.
>

> > Tallying results
> > ~~~~~~~~~~~~~~~~
> >
> > The nrunner plugin should be able to provide meaningful results to
> > the Job, and consequently to the user, based on the resulting
> > information on the final iteration.
> >
> > Notice that some information, such as the ``PASS`` for the first
> > test, will come from the "result" given in a status message from the
> > task itself.  Some other status, such as the ``INTERRUPTED`` status
> > for the second test, will not come from a received status message,
> > but from a realization of the actual management of the task
> > execution.  It's expected that other information will also have to
> > be inferred, and "filled in", by the nrunner plugin implementation.
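> >
> > A sketch of how that tallying could combine received status messages
> > with the scheduler's own knowledge (names illustrative)::
> >
> >   def tally(task, status_messages):
> >       """Determine the test result to be reported for a task."""
> >       for message in status_messages:
> >           if (message["id"] == task["id"] and
> >                   message["status"] == "finished"):
> >               # reported by the task itself, e.g. "pass" -> "PASS"
> >               return message["result"].upper()
> >       if task["state"] == "INTERRUPTED":
> >           # inferred from the management of the task execution
> >           return "INTERRUPT"
> >       return "ERROR"  # nothing reported: "filled in" by the plugin
> >
> >   messages = [{"id": "2-test.py:Test.test_2", "status": "started"},
> >               {"id": "1-test.py:Test.test_1", "status": "started"},
> >               {"id": "2-test.py:Test.test_2", "status": "finished",
> >                "result": "pass"}]
> >   assert tally({"id": "2-test.py:Test.test_2", "state": "FINISHED"},
> >                messages) == "PASS"
> >   assert tally({"id": "1-test.py:Test.test_1", "state": "INTERRUPTED"},
> >                messages) == "INTERRUPT"
> >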
> > In the end, it's expected that results similar to these would be
> > presented::
> >
> >   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
> >   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
> >    (1/2) tests.py:Test.test_2: PASS (2.56 s)
> >    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
> >   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
> >   JOB TIME   : 0.19 s
> >   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> >
> > Notice how Task #2 shows up before Task #1, because it was both
> > started first and finished earlier.  There may be issues with the
> > current UI to be dealt with regarding out-of-order task status
> > updates.
> >
> > Summary
> > =======
> >
> > This proposal contains a number of items that can become GitHub
> > issues at this stage.  It also contains a general explanation of
> > what I believe are the crucial missing features to make the N(ext)
> > Runner implementation available to the general public.
> >
> > Feedback is highly appreciated, and it's expected that this document
> > will evolve into a better version, and possibly become a formal Blue
> > Print.
> >
> > Thanks,
> > - Cleber.

> I think the idea for the task scheduler is promising. I have some
> suggestions, but, as I said before, if the text is structured in a
> self-contained blueprint way, it will be better for the discussion
> and documentation.
>

Cool, and thanks for providing the blueprint "kickstart" PR.  I'll
work on top of that.

> Thanks,
>
> Willian

Thanks,
- Cleber.