On Sep 20, 2018, at 12:15 AM, Yasas Gunarathne <[email protected]> wrote:
> In the beginning, I tried to use the current ExperimentModel to implement workflows, since it has workflow-related characteristics, as you have mentioned. It seems to have been designed with workflows as a primary focus, even including ExperimentType.WORKFLOW. But apart from that and the database-level one-to-many relationship with processes, there is no significant support for workflows. I believe processes should be capable of executing independently at their level of abstraction. But in the current architecture, processes execute some experiment-related parts that go beyond their scope. For example, they save the experiment output along with the process output after the process completes, which is not required for workflows. Here, submitting a message to indicate the process status should be enough.

I think Sudhakar addressed a lot of your questions, but here are some additional thoughts:

Processes just execute a set of tasks, which are specified by the Orchestrator. For workflows, I would expect the Orchestrator to create a list of processes, each with a set of tasks that make sense for running the workflow. For example, regarding saving experiment output, the Orchestrator could either create a process to save the experiment output or give the terminal process in the workflow a final task that saves the experiment output.

> If processes can execute independently, they don't need to keep experiment_id within themselves in the table. Isn't it the responsibility of whatever the outer layer is (Experiment/Workflow) to keep this mapping? WDYT? :)

Possibly. I wonder how this relates to the recent data parsing efforts. It does make sense that we might want processes to execute independently, because we do have the use case of running task DAGs separate from any experiment-like context.

> As you have mentioned, we can keep an additional Experiment within the Workflow Application to keep the current Process execution unchanged. (Here the experiment is still executing a single application.) Is that what you meant?

Not quite. I was suggesting that the Experiment is the workflow instance, having a list of processes where each process executes an application (corresponding roughly to nodes in the workflow DAG).

Thanks,
Marcus
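
For illustration only, the model described in the final reply above (the Experiment as the workflow instance, holding a list of processes that each run one application, with the terminal process saving the experiment output) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Airavata's actual data model or Orchestrator: the `Task`, `Process`, and `Experiment` classes, the `orchestrate` function, and all task names are hypothetical, and the workflow is simplified to a list of applications already in execution order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    # Hypothetical task: identified only by a name in this sketch.
    name: str


@dataclass
class Process:
    # Hypothetical process: knows its application and its tasks,
    # and deliberately holds no experiment_id of its own.
    process_id: str
    application: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Experiment:
    # Hypothetical experiment: acts as the workflow instance and owns
    # the experiment-to-process mapping.
    experiment_id: str
    processes: List[Process] = field(default_factory=list)


def orchestrate(experiment_id: str, applications: List[str]) -> Experiment:
    """Create one process per application (assumed to be in execution order)."""
    experiment = Experiment(experiment_id)
    for index, application in enumerate(applications):
        experiment.processes.append(
            Process(
                process_id=f"{experiment_id}-p{index}",
                application=application,
                # Placeholder task names, not real Airavata task types.
                tasks=[Task("run_application"), Task("publish_process_status")],
            )
        )
    # Only the terminal process gets the final task that saves experiment output.
    if experiment.processes:
        experiment.processes[-1].tasks.append(Task("save_experiment_output"))
    return experiment
```

Calling `orchestrate("exp1", ["preprocess", "simulate", "analyze"])` would yield three processes, with only the last one carrying the `save_experiment_output` task. The alternative mentioned in the thread, a separate process dedicated to saving the experiment output, would just append one more `Process` instead of one more `Task`.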
