CamilleTeruel commented on PR #4896:
URL:
https://github.com/apache/incubator-devlake/pull/4896#issuecomment-1508529515
> not sure if we need this kind of mechanism, if so, is it appropriate to
sit it on the task level (organize subtasks in a plugin only), or should we do
it on the framework level (organize tasks and their subtasks across all
plugins). I would like to hear your thoughts on this.
It is currently the responsibility of the plugin to order its subtasks while
taking into account their dependencies.
I think it is valuable to let the plugin declare those dependencies and let
the framework order subtasks accordingly. I find it less error-prone because it
declares something that was tacit.
Declaring those dependencies might also be a first step to enable other
improvements.
Currently we have a hierarchy of different task types: plans, stages, tasks
and subtasks.
This has the merit of imposing a structure on task definition but it also
comes with costs:
- Dependencies are implicit:
- a subtask *might* depend on a subtask listed before in the same task
or on a subtask from an earlier stage. Since we don't know many of the implicit
dependencies are artificial and the execution could be parallelized more
efficiently.
- The plugin developer has to know and remember the execution rules:
stages and subtasks are executed sequentially and tasks in parallel. Those
execution details could be hidden from the plugins.
- This make the code to generate pipelines plan complex (~100 lines of
procedural code for each plugin)
- This introduces different protocols for data source plugins and metric
plugins. Everything could be declared with dependencies instead.
- Non-uniformity: there is a possible logic duplication between different
types of tasks for concerns like: execution order, resource management,
monitoring, caching, error handling, retries, ...
Adding dependency declaration at the subtask level does not solves all those
point but is a step in the good direction IMO.
Now regarding implementation, adding a `Dependencies` field in SubtaskMeta
looks like a better fit to me than calling a register function in `init`
functions. I understand that the purpose of the subtask center is to handle
subtask registration, and that calling a function to register each subtask is
needed for that. But I think that registration should be handled separately
with auto-discovery of subtasks by the framework.
For how to express dependencies, I think we should not force to declare that
a subtask depends on another specific subtask but instead declare that the task
require that some type of entity has been collected. I agree that it poses no
problem for dependencies between subtasks defined by the same plugin: an
extractor will always depend on the same convertor for example. But this
doesn't work for dependencies between subtasks from different different
plugins. For example, the DORA plugin requires some data to be present, like
CICDPipelineCommit or PullRequest, but doesn't and mustn't care which plugin
collected that data. So the declaration of dependencies should say "I need pull
requests" not "I depend on gitlab plugin". I think that declaring dependencies
that way would allow us to simplify plugin code and the plugin model of the
framework.
To sum up, here is what I think we sould do:
- Add `Requirements` and `Provisions` slice fields to SubtaskMeta
- The type of each item is not clear as it should be a supertype of raw
data, tool models and domain models. So some refactorings are needed to subsume
all those type if we don't want to end up with `[]interface{}` or `[]string`
type
- domain types may be deduced from provisions, so we can later remove
the `DomainTypes` field
- Let the framework order the subtasks of a plugin according to their
requirements and provisions
- Add auto-discovery of a plugin subtasks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]