[openstack-dev] [mistral] [taskflow] Mistral TaskFlow integration summary

Kirill Izotov Thu, 10 Apr 2014 21:29:22 -0700

Hi everyone,

This is a summary of the prototype integration we did not long ago:
http://github.com/enykeev/mistral/pull/1. I hope it sheds some light on the
aspects of the integration we are struggling with.


It is possible to build Mistral on top of TaskFlow as a library, but in order
to meet the requirements dictated by Mistral users and use cases, both Mistral
and TaskFlow need to change.

There are two main sides to the story. One is the engine. The other is the
flow control capabilities.

1) THE ENGINE 
The current TaskFlow engine implementation doesn't fit Mistral's needs: it is
synchronous and blocks the thread, it requires us to keep a reference to the
particular engine instance in order to get its status or suspend execution,
and it lacks support for long-running tasks. To fix this in a solid and
maintainable way, we need to split the engine into synchronous and
asynchronous counterparts.

The lazy engine should be asynchronous and atomic. It should not keep any state
of its own; instead it should rely on some kind of global state (a database or
in-memory, depending on the type of application). It should have at least two
methods: run and task_complete. The run method should calculate the first
batch of tasks and schedule them for execution (either by putting them in a
queue or by spawning threads). The task_complete method should mark a certain
task as completed and then schedule the next batch of tasks that became
available due to the resolution of this one.
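A minimal sketch of such a lazy engine, assuming the workflow is given as a mapping from each task name to the set of task names it depends on, and with a plain dict standing in for the global state store (a database in a real deployment). The class name LazyEngine, the schedule callback and the store layout are hypothetical illustrations, not TaskFlow's or Mistral's API, and atomicity (transactions or locking around the store) is left out for brevity.

```python
from typing import Callable, Dict, List, Set


class LazyEngine:
    """Hypothetical lazy engine: it holds no execution state of its own."""

    def __init__(self, graph: Dict[str, Set[str]], store: dict,
                 schedule: Callable[[str], None]) -> None:
        self.graph = graph        # task name -> names of tasks it depends on
        self.store = store        # shared state: a db in practice, a dict here
        self.schedule = schedule  # e.g. put the task on a worker queue

    def _schedule_ready(self) -> List[str]:
        done, scheduled = self.store["completed"], self.store["scheduled"]
        batch = sorted(t for t, deps in self.graph.items()
                       if t not in scheduled and deps <= done)
        for task in batch:
            scheduled.add(task)
            self.schedule(task)
        return batch

    def run(self) -> List[str]:
        """Initialize the execution state and schedule the first batch."""
        self.store["completed"] = set()
        self.store["scheduled"] = set()
        return self._schedule_ready()

    def task_complete(self, task: str) -> List[str]:
        """Mark `task` as completed and schedule the tasks it unblocked."""
        self.store["completed"].add(task)
        return self._schedule_ready()


# Usage: a diamond-shaped workflow a -> (b, c) -> d.
graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
store: dict = {}
work_queue: List[str] = []

LazyEngine(graph, store, work_queue.append).run()  # schedules ["a"]
# A brand-new engine instance (e.g. after a restart) continues from the store.
LazyEngine(graph, store, work_queue.append).task_complete("a")  # ["b", "c"]
engine = LazyEngine(graph, store, work_queue.append)
engine.task_complete("b")  # nothing new: d still waits for c
engine.task_complete("c")  # schedules ["d"]
```

Because every call reads and writes only the store, it does not matter which engine instance handles a given completion.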

The desired use of the lazy engine in Mistral is illustrated here:
https://wiki.openstack.org/wiki/Mistral/Blueprints/ActionsDesign#Big_Picture.
It should support long-running tasks and survive an engine process restart
without losing the state of the running actions. So it must be passive (lazy)
and persistent.

On the Mistral side, we use the lazy engine by wiring async.run directly to the
API (or the engine queue) and async.task_complete to the result channel of the
worker queue (and to the API, for long-running tasks). We still share the same
graph_analyzer, but instead of relying on a loop and Futures, we handle the
execution ourselves in a scalable and robust way.
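To illustrate that wiring, here is a hypothetical, single-process simulation: the "API" calls run, workers take tasks from one queue and report results on another, and two engine instances (standing in for two engine processes) take turns consuming results and calling task_complete, which works only because all state lives in the shared store. The Engine class, both queues and the task names are assumptions for illustration, not Mistral code.

```python
import queue
from typing import Dict, List, Set


class Engine:
    """Compact lazy engine; all execution state is kept in the shared store."""

    def __init__(self, graph: Dict[str, Set[str]], store: dict,
                 tasks: queue.Queue) -> None:
        self.graph, self.store, self.tasks = graph, store, tasks

    def _schedule_ready(self) -> None:
        done, scheduled = self.store["completed"], self.store["scheduled"]
        for task in sorted(self.graph):
            if task not in scheduled and self.graph[task] <= done:
                scheduled.add(task)
                self.tasks.put(task)  # hand the task over to the workers

    def run(self) -> None:
        self.store.update(completed=set(), scheduled=set())
        self._schedule_ready()

    def task_complete(self, task: str) -> None:
        self.store["completed"].add(task)
        self._schedule_ready()


graph = {"fetch": set(), "resize": {"fetch"}, "upload": {"resize"}}
store: dict = {}
task_queue: queue.Queue = queue.Queue()    # engine -> workers
result_queue: queue.Queue = queue.Queue()  # workers -> engine

# Two engine instances standing in for two engine processes.
engines = [Engine(graph, store, task_queue), Engine(graph, store, task_queue)]
engines[0].run()  # triggered by the API

executed: List[str] = []
handled_by: List[int] = []
turn = 0
while not task_queue.empty() or not result_queue.empty():
    while not task_queue.empty():  # simulated worker executes queued tasks
        task = task_queue.get()
        executed.append(task)
        result_queue.put(task)
    task = result_queue.get()  # whichever engine is free takes the result
    engines[turn % 2].task_complete(task)
    handled_by.append(turn % 2)
    turn += 1
```

No engine instance waits on a Future here; results simply arrive as messages, so adding engine processes does not require any of them to own a particular execution.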

Then, on top of it, you can build a sync engine by introducing Futures. You use
async.run() to schedule tasks by turning them into Futures, and then start a
loop that checks the Futures for completion and sends their results to
async.task_complete(), which in turn produces more Futures to check. This is
just the way TaskFlow does it right now.
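A minimal sketch of that layering, assuming Python's concurrent.futures and a compact lazy core whose run and task_complete return the tasks to schedule (its state is kept in a local dict only for brevity). The names LazyEngine and run_sync are hypothetical; this is not TaskFlow's actual engine code.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


class LazyEngine:
    """Compact lazy core: run/task_complete return the tasks to schedule."""

    def __init__(self, graph: Dict[str, Set[str]]) -> None:
        self.graph = graph
        self.store = {"completed": set(), "scheduled": set()}

    def _ready(self) -> List[str]:
        done, scheduled = self.store["completed"], self.store["scheduled"]
        batch = sorted(t for t, deps in self.graph.items()
                       if t not in scheduled and deps <= done)
        scheduled.update(batch)
        return batch

    def run(self) -> List[str]:
        return self._ready()

    def task_complete(self, task: str) -> List[str]:
        self.store["completed"].add(task)
        return self._ready()


def run_sync(graph: Dict[str, Set[str]],
             actions: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Blocking engine: wrap scheduled tasks in Futures and loop until done."""
    lazy = LazyEngine(graph)
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending: Dict[Future, str] = {
            pool.submit(actions[t]): t for t in lazy.run()}
        while pending:
            finished, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in finished:
                task = pending.pop(fut)
                results[task] = fut.result()
                # Feeding the result back yields more tasks, i.e. more Futures.
                for nxt in lazy.task_complete(task):
                    pending[pool.submit(actions[nxt])] = nxt
    return results


graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
actions = {name: (lambda n=name: n.upper()) for name in graph}
results = run_sync(graph, actions)  # the caller blocks until "d" has finished
```

The Futures and the polling loop live entirely in run_sync; the lazy core never sees them, which is what allows it to be reused without Futures elsewhere.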

The reason I'm proposing to extract Futures from the async engine is that they
won't work when multiple engine processes need to handle task results
concurrently (and without that there will be no scalability).

2) THE FLOW CONTROL CAPABILITIES

Since we treat TaskFlow as a library, we expect it to provide us with a number
of primitives to build our workflows with. The most important of them for us
at the moment are Direct Transitions and Conditional Transitions.

The current implementation of flow transitions in TaskFlow is built on top of
data flow dependencies, where each task provides some data to the flow and
requires some data to be present prior to being executed. In other words, you
build your flow tree starting from the last task and working back to the first
one by adding their requirements to the tree. All the tasks of a successfully
finished flow must have finished successfully too. If one of the tasks
finishes with an error, the whole flow is reverted back to its initial state
unconditionally.
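To make the data-flow model concrete, here is a simplified, hypothetical model of it (not TaskFlow's engine or API): each task declares the data names it requires and provides, execution order follows from those names rather than from declaration order, and any failure reverts every completed task, newest first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    requires: Tuple[str, ...]  # data names needed before execution
    provides: str              # data name this task contributes to the flow
    execute: Callable[..., object]
    revert: Optional[Callable[[], None]] = None


def run_dataflow(tasks: List[Task], log: List[str]) -> Optional[Dict[str, object]]:
    """Run each task once its required data exists; on any error revert all."""
    data: Dict[str, object] = {}
    done: List[Task] = []
    remaining = list(tasks)
    while remaining:
        ready = [t for t in remaining if all(r in data for r in t.requires)]
        if not ready:
            raise ValueError("some requirements can never be satisfied")
        task = ready[0]
        remaining.remove(task)
        try:
            data[task.provides] = task.execute(*[data[r] for r in task.requires])
        except Exception:
            # A single failure reverts every completed task, newest first.
            for t in reversed(done):
                if t.revert is not None:
                    t.revert()
                log.append("revert " + t.name)
            return None
        done.append(task)
        log.append("execute " + task.name)
    return data


# Declared last-to-first: the order is derived from the data requirements.
log: List[str] = []
flow = [
    Task("total", ("price", "qty"), "total", lambda p, q: p * q),
    Task("price", (), "price", lambda: 10),
    Task("qty", (), "qty", lambda: 3),
]
result = run_dataflow(flow, log)  # {"price": 10, "qty": 3, "total": 30}
```

Note that the flow succeeds only if every task runs, and that an error has no outcome other than a full revert.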

At the same time, Mistral use cases require direct control over the order of
task execution, with a top-to-bottom scheme where the next task is determined
based on the results of executing the current one. This way, to successfully
finish a flow you don't have to execute all the tasks in it. Besides, an error
in the execution of a particular task may trigger the execution of another
one. The workflow examples (in pseudo-DSL) are here:
https://github.com/dzimine/mistral-workflows/tree/add-usecases
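For contrast, a hypothetical sketch of the top-to-bottom model: each task names its successor on success and on error, and a callable in place of the successor name acts as a conditional transition on the task's result. The dict format (action, on_success, on_error) and the task names are invented for illustration and do not reproduce the Mistral DSL linked above.

```python
from typing import Dict, List, Optional


def run_direct(workflow: Dict[str, dict], start: str) -> List[str]:
    """Follow explicit transitions from `start`; return the tasks executed."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        spec = workflow[current]
        trace.append(current)
        try:
            result = spec["action"]()
        except Exception:
            # An error does not revert the flow; it selects another task.
            current = spec.get("on_error")
            continue
        nxt = spec.get("on_success")
        if callable(nxt):
            # Conditional transition: the next task depends on the result.
            nxt = nxt(result)
        current = nxt
    return trace


def create_vm() -> str:
    raise RuntimeError("VM creation failed")


workflow = {
    "check_quota": {
        "action": lambda: 5,
        "on_success": lambda free: "create_vm" if free > 0 else "notify_admin",
    },
    "create_vm": {"action": create_vm, "on_success": "attach_volume",
                  "on_error": "cleanup"},
    "attach_volume": {"action": lambda: "attached"},
    "cleanup": {"action": lambda: "cleaned"},
    "notify_admin": {"action": lambda: "notified"},
}

trace = run_direct(workflow, "check_quota")
# trace == ["check_quota", "create_vm", "cleanup"]; two tasks never run.
```

The flow finishes successfully even though attach_volume and notify_admin are never executed, and the failure of create_vm leads to cleanup instead of a full revert.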

There is also a handful of smaller issues, but these two differences touch the
most basic parts of TaskFlow, thus blocking us from integrating, and they
require substantial changes to the TaskFlow engine design. If such changes
cannot be made, Mistral will not be able to meet its requirements, which would
render the whole project useless for its users.

-- 
Kirill Izotov

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
