CamilleTeruel commented on PR #4896:
URL: 
https://github.com/apache/incubator-devlake/pull/4896#issuecomment-1508529515

   > not sure if we need this kind of mechanism, if so, is it appropriate to 
sit it on the task level (organize subtasks in a plugin only), or should we do 
it on the framework level (organize tasks and their subtasks across all 
plugins). I would like to hear your thoughts on this.
   
   It is currently the responsibility of the plugin to order its subtasks while 
taking into account their dependencies.
   I think it is valuable to let the plugin declare those dependencies and let 
the framework order subtasks accordingly. I find it less error-prone because it 
declares something that was tacit.
   
   Declaring those dependencies might also be a first step to enable other 
improvements.
   Currently we have a hierarchy of different task types: plans, stages, tasks 
and subtasks. 
   This has the merit of imposing a structure on task definition but it also 
comes with costs:
   - Dependencies are implicit: 
       - a subtask *might* depend on a subtask listed before in the same task 
or on a subtask from an earlier stage. Since we don't know many of the implicit 
dependencies are artificial and the execution could be parallelized more 
efficiently. 
       - The plugin developer has to know and remember the execution rules: 
stages and subtasks are executed sequentially and tasks in parallel. Those 
execution details could be hidden from the plugins.
   - This make the code to generate pipelines plan complex (~100 lines of 
procedural code for each plugin)
   - This introduces different protocols for data source plugins and metric 
plugins. Everything could be declared with dependencies instead.
   - Non-uniformity: there is a possible logic duplication between different 
types of tasks for concerns like: execution order, resource management, 
monitoring, caching, error handling, retries, ...
   
   Adding dependency declaration at the subtask level does not solves all those 
point but is a step in the good direction IMO.
   
   Now regarding implementation, adding a `Dependencies` field in SubtaskMeta 
looks like a better fit to me than calling a register function in `init` 
functions. I understand that the purpose of the subtask center is to handle 
subtask registration, and that calling a function to register each subtask is 
needed for that. But I think that registration should be handled separately 
with auto-discovery of subtasks by the framework.
   
   For how to express dependencies, I think we should not force to declare that 
a subtask depends on another specific subtask but instead declare that the task 
require that some type of entity has been collected. I agree that it poses no 
problem for dependencies between subtasks defined by the same plugin: an 
extractor will always depend on the same convertor for example. But this 
doesn't work for dependencies between subtasks from different different 
plugins. For example, the DORA plugin requires some data to be present, like 
CICDPipelineCommit or PullRequest, but doesn't and mustn't care which plugin 
collected that data. So the declaration of dependencies should say "I need pull 
requests" not "I depend on gitlab plugin". I think that declaring dependencies 
that way would allow us to simplify plugin code and the plugin model of the 
framework.
   
   To sum up, here is what I think we sould do:
   - Add `Requirements` and `Provisions` slice fields to SubtaskMeta
       - The type of each item is not clear as it should be a supertype of raw 
data, tool models and domain models. So some refactorings are needed to subsume 
all those type if we don't want to end up with `[]interface{}` or `[]string` 
type
       - domain types may be deduced from provisions, so we can later remove 
the `DomainTypes` field
   - Let the framework order the subtasks of a plugin according to their 
requirements and provisions
   - Add auto-discovery of a plugin subtasks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to