Hi Marcus and Sudhakar,

Thank you for the detailed answers, but I still have a few issues. Let me explain a little more about the architecture of the Airavata Workflow implementation.

[image: workflow.png]

Currently, the Orchestrator is only capable of submitting single-application Experiments. To support different types of workflows, we need more control over the processes (it is a little more complicated than submitting a set of Processes). For that, we decided to use *Helix* at the Orchestrator level. Since the current experiment implementation cannot be used in such a situation, we decided to use a separate set of models and APIs which enable submitting and launching workflows. [1]

Workflow execution is also managed by *Helix*, at the *Orchestrator* level. These workflows contain *Helix tasks* which are responsible for handling the workflow:

*i. Flow Starter Task*
This task is responsible for starting a specific branch of the Airavata Workflow. A single Airavata Workflow can have multiple starting points. The flow starter is the only component which can accept input in the standard InputDataObjectType.

*ii. Flow Terminator Task*
This task is responsible for terminating a specific branch of the Airavata Workflow. A single workflow can have multiple terminating points. The flow terminator is the only component which can produce output in the standard OutputDataObjectType.

*iii. Flow Barrier Task*
This task works as a waiting component in the middle of a workflow. For example, if two applications are running and the results of both are required to continue the workflow, the barrier waits for both applications to complete before continuing.

*iv. Flow Divider Task*
This task opens up new branches in the middle of a workflow.

*v. Condition Handler Task*
This task is the path selection component of the workflow. It works similarly to an if statement.

*vi. Foreach Loop Task*
This task divides the input into specified portions and executes the task loop in parallel for those input portions.

*vii. Do While Loop Task*
This task re-runs a specified loop of tasks until the result meets a specified condition.

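To make the intended semantics of some of these flow handler tasks a bit more concrete, here is a minimal Java sketch of the barrier, condition handler, foreach and do-while behaviour. It is only an illustration under my own assumptions: it does not use the Helix or Airavata APIs, the class and method names (FlowHandlerSketch, barrier, conditionHandler, forEachLoop, doWhileLoop) are hypothetical, and plain CompletableFuture stands in for whatever scheduling Helix would actually do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Illustrative only: hypothetical names, not the Helix or Airavata API. */
final class FlowHandlerSketch {

    /** Flow Barrier: continue only once every upstream application has finished. */
    static List<String> barrier(List<CompletableFuture<String>> upstream) {
        CompletableFuture.allOf(upstream.toArray(new CompletableFuture[0])).join();
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : upstream) {
            results.add(f.join()); // already complete, so this does not block
        }
        return results;
    }

    /** Condition Handler: select one of two branches, like an if statement. */
    static String conditionHandler(String input, Predicate<String> condition,
                                   UnaryOperator<String> ifBranch,
                                   UnaryOperator<String> elseBranch) {
        return condition.test(input) ? ifBranch.apply(input) : elseBranch.apply(input);
    }

    /** Foreach Loop: run the loop body in parallel for every input portion. */
    static List<String> forEachLoop(List<String> portions, Function<String, String> body) {
        List<CompletableFuture<String>> running = new ArrayList<>();
        for (String portion : portions) {
            running.add(CompletableFuture.supplyAsync(() -> body.apply(portion)));
        }
        return barrier(running); // results come back in input order
    }

    /** Do While Loop: re-run the loop body until the result meets the condition. */
    static int doWhileLoop(int input, IntUnaryOperator body, IntPredicate done) {
        int result = input;
        do {
            result = body.applyAsInt(result);
        } while (!done.test(result));
        return result;
    }

    public static void main(String[] args) {
        // prints [A, B, C]
        System.out.println(forEachLoop(Arrays.asList("a", "b", "c"), String::toUpperCase));
        // prints long:42
        System.out.println(conditionHandler("42", s -> s.length() > 1,
                s -> "long:" + s, s -> "short:" + s));
        // 1 -> 2 -> 4 -> 8 -> 16, prints 16
        System.out.println(doWhileLoop(1, x -> x * 2, x -> x >= 10));
    }
}
```

In the real implementation each of these would be a Helix task and the data passed between them would go through the workflow's own data model rather than plain strings; the sketch only shows the control flow.
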
Apart from these flow handler tasks, a workflow contains a type of task called *ApplicationTask*, which is responsible for executing an application within the workflow (a workflow consists of multiple *application tasks* connected by *flow handler tasks*). Each ApplicationTask has to perform the same operation that the *Orchestrator* currently performs for a single *Experiment*: creating a Process (which has a set of tasks to be executed) and submitting it for execution.

I had previously planned to reuse, within the *ApplicationTask*, the approach the Orchestrator currently follows when launching an experiment, but later realized that this cannot be done because Process execution performs many experiment-specific activities. That is why I raised this issue and proposed making Process execution independent.

Output data staging (*saving output files*) is planned to happen within the *ApplicationTask* after the Process completes its execution (after the Process completion message is received). This has to be done at the Orchestrator level since outputs are used as inputs to other *application tasks* within the workflow. (Outputs are persisted using the DataBlock table; DataBlock is responsible for maintaining the data flow within the workflow.)

I hope the exact issue is clear now, and I look forward to hearing from you again. Thank you again for the continuous support.

Regards

[1] https://github.com/apache/airavata/pull/203

On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron <[email protected]> wrote:

> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne <[email protected]> wrote:
>
> > In the beginning, I tried to use the current ExperimentModel to implement workflows since it has workflow related characteristics as you have mentioned. It seemed to be designed at first keeping the workflow as a primary focus including even ExperimentType.WORKFLOW.
> > But, apart from that and the database level one-to-many relationship with processes, there is no significant support provided for workflows.
> >
> > I believe processes should be capable of executing independently at their level of abstraction. But, in the current architecture processes execute some experiment related parts going beyond their scope. For example, saving experiment output along with process output after completing the process, which is not required for workflows. Here, submitting a message to indicate the process status should be enough.
>
> I think Sudhakar addressed a lot of your questions, but here are some additional thoughts:
>
> Processes just execute a set of tasks, which are specified by the Orchestrator. For workflows I would expect the Orchestrator to create a list of processes that each have a set of tasks that make sense for the running of the workflow. For example, regarding saving experiment output, the Orchestrator could either create a process to save the experiment output or have the terminal process in the workflow have a final task to save the experiment output.
>
> > If processes can execute independently, it doesn't need to keep experiment_id within itself in the table. Isn't it the responsibility of whatever the outer layer (Experiment/Workflow) to keep this mapping? WDYT? :)
>
> Possibly. I wonder how this relates to the recent data parsing efforts. It does make sense that we might want processes to execute independently because we do have the use case of running task dags separate from any experiment-like context.
>
> > As you have mentioned we can keep an additional Experiment within Workflow Application to keeping the current Process execution unchanged. (Here the experiment is still executing a single application.) Is that what you meant?
>
> Not quite.
> I was suggesting that the Experiment is the workflow instance, having a list of processes where each process executes an application (corresponding roughly to nodes in the workflow dag).
>
> Thanks,
>
> Marcus

--
*Yasas Gunarathne*
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa Sri Lanka
LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub <https://github.com/yasgun> | Mobile : +94 77 4893616
