Re: User Defined Workflow Execution Framework

Yasas Gunarathne Fri, 21 Sep 2018 09:54:43 -0700

Hi Marcus and Sudhakar,

Thank you for the detailed answers, but I still have a few issues. Let me
explain a little more about the architecture of the Airavata Workflow
implementation.

[image: workflow.png]

Currently, the Orchestrator is only capable of submitting single-application
Experiments. But to support different types of workflows, we need to have
more control over the processes (it is a little more complicated than
submitting a set of Processes). For that, we decided to use *Helix* at the
Orchestrator level.

Since the current experiment implementation cannot be used in such a
situation, we decided to use a separate set of models and APIs that
enable submitting and launching workflows. [1]

Workflow execution is also managed by *Helix* and it is at the
*Orchestrator* level. These workflows contain *Helix tasks* which are
responsible for handling workflows.

*i. Flow Starter Task*

This task is responsible for starting a specific branch of the Airavata
Workflow. In a single Airavata Workflow, there can be multiple starting
points. Flow starter is the only component which can accept input in the
standard InputDataObjectType.

*ii. Flow Terminator Task*

This task is responsible for terminating a specific branch of the Airavata
workflow. In a single workflow, there can be multiple terminating points.
Flow terminator is the only component which can output in the standard
OutputDataObjectType.

*iii. Flow Barrier Task*

This task works as a waiting component in the middle of a workflow. For
example, if there are two applications running and the results of both
applications are required to continue the workflow, the barrier waits for
both applications to complete before continuing.

*iv. Flow Divider Task*

This task opens up new branches in the middle of a workflow.

*v. Condition Handler Task*

This task is the path selection component of the workflow. It works
similarly to an if statement.

*vi. Foreach Loop Task*

This task divides the input into specified portions and executes the task
loop in parallel for those input portions.

*vii. Do While Loop Task*

This task is capable of re-running a specified loop of tasks until the
result meets a specified condition.
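
To make the intended semantics of the barrier and the two loop tasks a bit
more concrete, here is a rough sketch in plain Python. It does not use the
Helix API or any Airavata class; the function names (barrier, foreach_loop,
do_while_loop) and their parameters are hypothetical and only illustrate the
behaviour described above.

```python
from concurrent.futures import ThreadPoolExecutor


def barrier(futures):
    """Flow Barrier: block until every upstream result is available."""
    return [f.result() for f in futures]


def foreach_loop(items, portion_size, body):
    """Foreach Loop: split the input into portions, run the body on each in parallel."""
    portions = [items[i:i + portion_size] for i in range(0, len(items), portion_size)]
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the results in the same order as the portions
        return list(pool.map(body, portions))


def do_while_loop(body, state, is_done):
    """Do While Loop: run the body at least once, re-run until the result meets the condition."""
    while True:
        state = body(state)
        if is_done(state):
            return state


# Example: two "applications" run concurrently and the barrier waits for both.
with ThreadPoolExecutor() as pool:
    both = barrier([pool.submit(lambda: 2), pool.submit(lambda: 3)])

sums = foreach_loop([1, 2, 3, 4, 5], 2, sum)
final = do_while_loop(lambda x: x * 2, 1, lambda x: x >= 10)
```

In the real implementation each of these would be a Helix task at the
Orchestrator level rather than an in-process function call.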

In addition to these flow handler tasks, a workflow contains a type of task
called *ApplicationTask*, which is responsible for executing an application
within the workflow (a workflow contains multiple *application tasks*
connected by *flow handler tasks*).
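
As an illustration of how application tasks and flow handler tasks fit
together, a workflow can be viewed as a directed acyclic graph. The sketch
below is only an assumption about the shape of such a graph: the node names
and the dictionary layout are hypothetical, and it uses Python's standard
graphlib module (Python 3.9+) instead of Helix to derive a valid execution
order.

```python
from graphlib import TopologicalSorter

# node name -> (task kind, names of upstream nodes); every name here is hypothetical
workflow = {
    "start": ("FlowStarter", []),
    "divide": ("FlowDivider", ["start"]),
    "app_a": ("ApplicationTask", ["divide"]),
    "app_b": ("ApplicationTask", ["divide"]),
    "barrier": ("FlowBarrier", ["app_a", "app_b"]),
    "app_c": ("ApplicationTask", ["barrier"]),
    "end": ("FlowTerminator", ["app_c"]),
}


def execution_order(wf):
    """Return one order in which every node runs after all of its upstream nodes."""
    graph = {name: set(upstream) for name, (_, upstream) in wf.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
```

Here app_a and app_b may appear in either order, but both always come before
the barrier, and app_c only runs after the barrier.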

Within these ApplicationTasks, we need to perform an operation similar to
what the *Orchestrator* currently does for a single *Experiment*. That is,
creating a Process (which has a set of tasks to be executed) and submitting
it for execution.

I had previously planned to use, within the *ApplicationTask*, the same
approach that the Orchestrator currently follows when launching an
experiment, but later realized that it cannot be done since Process
execution performs many experiment-specific activities. That is the reason
why I raised this issue and proposed to make Process execution independent.

Output data staging (*saving output files*) is planned to be done within the
*ApplicationTask* after the Process completes its execution (after
receiving the Process completion message). This needs to be done at the
Orchestrator level since outputs are used as inputs to other *application
tasks* within a workflow. (Outputs are persisted using the DataBlock table;
DataBlock is responsible for maintaining the data flow within the
workflow.)
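
A rough sketch of this data flow, under stated assumptions: DataBlockStore
below is an in-memory stand-in for the DataBlock table, and the
ApplicationTask class, its constructor parameters, and the run_process
callable are hypothetical names, not the classes in the pull request.
run_process stands in for creating a Process, submitting it, and waiting for
its completion message.

```python
class DataBlockStore:
    """In-memory stand-in for the DataBlock table (hypothetical)."""

    def __init__(self):
        self._blocks = {}

    def save(self, block_id, value):
        self._blocks[block_id] = value

    def load(self, block_id):
        return self._blocks[block_id]


class ApplicationTask:
    """Runs one application and stages its output for downstream tasks."""

    def __init__(self, run_process, input_ids, output_id):
        self.run_process = run_process  # placeholder for Process creation and submission
        self.input_ids = input_ids
        self.output_id = output_id

    def execute(self, store):
        inputs = [store.load(block_id) for block_id in self.input_ids]
        output = self.run_process(inputs)   # returns once the Process has completed
        store.save(self.output_id, output)  # output staging at the Orchestrator level
        return output


# Two chained application tasks: the output of the first is the input of the second.
store = DataBlockStore()
store.save("in", 3)
ApplicationTask(lambda xs: xs[0] * 2, ["in"], "mid").execute(store)
ApplicationTask(lambda xs: xs[0] + 1, ["mid"], "out").execute(store)
```

The point of the sketch is that the Process itself only reports completion,
while the ApplicationTask decides where the output goes.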

I think I have now explained the exact issue clearly enough, and I am
waiting to hear from you again. Thank you again for the continuous support.

Regards

[1] https://github.com/apache/airavata/pull/203

On Fri, Sep 21, 2018 at 9:03 PM Christie, Marcus Aaron <[email protected]>
wrote:

>
>
> On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne <[email protected]>
> wrote:
>
> In the beginning, I tried to use the current ExperimentModel to implement
> workflows since it has workflow related characteristics as you have
> mentioned. It seemed to be designed at first keeping the workflow as a
> primary focus including even ExperimentType.WORKFLOW. But, apart from that
> and the database level one-to-many relationship with processes, there is no
> significant support provided for workflows.
>
> I believe processes should be capable of executing independently at their
> level of abstraction. But, in the current architecture processes execute
> some experiment related parts going beyond their scope. For example, saving
> experiment output along with process output after completing the process,
> which is not required for workflows. Here, submitting a message to indicate
> the process status should be enough.
>
>
> I think Sudhakar addressed a lot of your questions, but here are some
> additional thoughts:
>
> Processes just execute a set of tasks, which are specified by the
> Orchestrator. For workflows I would expect the Orchestrator to create a
> list of processes that each have a set of tasks that make sense for the
> running of the workflow.  For example, regarding saving experiment output,
> the Orchestrator could either create a process to save the experiment
> output or have the terminal process in the workflow have a final task to
> save the experiment output.
>
> If processes can execute independently, they don't need to keep
> experiment_id within themselves in the table. Isn't it the responsibility
> of the outer layer (Experiment/Workflow) to keep this mapping? WDYT?
> :)
>
> Possibly. I wonder how this relates to the recent data parsing efforts.
> It does make sense that we might want processes to execute independently
> because we do have the use case of running task dags separate from any
> experiment-like context.
>
> As you have mentioned, we can keep an additional Experiment within the
> Workflow Application to keep the current Process execution unchanged. (Here
> the experiment is still executing a single application.) Is that what you meant?
>
>
> Not quite. I was suggesting that the Experiment is the workflow instance,
> having a list of processes where each process executes an application
> (corresponding roughly to nodes in the workflow dag).
>
> Thanks,
>
> Marcus
>

-- 
*Yasas Gunarathne*
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa Sri Lanka
LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
<https://github.com/yasgun> | Mobile : +94 77 4893616
