Re: [openstack-dev] [Neutron] Introducing task oriented workflows

Andrew Laski Tue, 03 Jun 2014 07:37:30 -0700


On 05/22/2014 08:16 PM, Nachi Ueno wrote:

Hi Salvatore


Thank you for posting this.

IMO, this topic shouldn't be limited to Neutron only.
Users want a consistent API across OpenStack projects, right?

In Nova, a server has task_state, so Neutron should do it the same way.

We're moving away from the simple task_state field in Nova towards a more comprehensive task model. See https://review.openstack.org/#/c/86938/ for the nova-spec around this.



2014-05-22 15:34 GMT-07:00 Salvatore Orlando <sorla...@nicira.com>:

As most of you probably know already, this is one of the topics discussed
during the Juno summit [1].
I would like to kick off the discussion in order to move towards a concrete
design.

Preamble: Considering the meat that's already on the plate for Juno, I'm not
advocating that whatever comes out of this discussion should be put on the
Juno roadmap. However, preparation (or yak shaving) activities that should
be identified as pre-requisite might happen during the Juno time frame
assuming that they won't interfere with other critical or high priority
activities.
This is also a very long post; the TL;DR summary is that I would like to
explore task-oriented communication with the backend and how it should be
reflected in the API - gauging how the community feels about this, and
collecting feedback regarding design, constructs, and related
tools/techniques/technologies.

At the summit a broad range of items were discussed during the session, and
most of them have been reported in the etherpad [1].

First, I think it would be good to clarify whether we're advocating a
task-based API, a workflow-oriented operation processing, or both.

--> About a task-based API

In a task-based API, most PUT/POST API operations would return tasks rather
than neutron resources, and users of the API will interact directly with
tasks.
I put an example in [2] to avoid cluttering this post with too much text.
As the API operation simply launches a task - the database state won't be
updated until the task is completed.

Needless to say, this would be a radical change to Neutron's API; it should
be carefully evaluated and not considered for the v2 API.
Even if it is easily recognisable that this approach has a few benefits, I
don't think this will improve usability of the API at all. Indeed this will
limit the ability to operate on a resource while a task is in execution on
it, and will also require neutron API users to change the paradigm they use
to interact with the API; not to mention the fact that it would look
weird if neutron is the only API endpoint in OpenStack operating in this
way.
For the Neutron API, I think that its operations should still be
manipulating the database state, and possibly return immediately after that
(*) - a task, or better said a workflow, will then be started, executed
asynchronously, and update the resource status on completion.
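
To make the pattern above concrete, here is a minimal sketch of an API
operation that records the resource in the database, returns immediately,
and lets a background workflow update the status on completion. The
in-memory DB dict, create_port, backend_configure and the BUILD/ACTIVE/ERROR
status values are illustrative assumptions, not Neutron code.

```python
import threading
import uuid

# Hypothetical in-memory stand-in for the Neutron database.
DB = {}


def backend_configure(port):
    """Placeholder for the (slow) backend request; assumed to succeed."""
    return True


def _run_workflow(port_id):
    """Runs asynchronously and records the outcome in the resource status."""
    port = DB[port_id]
    try:
        ok = backend_configure(port)
    except Exception:
        ok = False
    port["status"] = "ACTIVE" if ok else "ERROR"


def create_port(name):
    """Update the DB state, then return without waiting for the backend."""
    port_id = str(uuid.uuid4())
    DB[port_id] = {"id": port_id, "name": name, "status": "BUILD"}
    # Snapshot what the API would return before the workflow can touch it.
    response = dict(DB[port_id])
    worker = threading.Thread(target=_run_workflow, args=(port_id,))
    worker.start()
    return response, worker
```

The caller sees the resource right away in a transitional status; the
final status only shows up once the workflow has finished.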

--> On workflow-oriented operations

The benefits of it when it comes to easily controlling operations and
ensuring consistency in case of failures are obvious. For what it's worth, I
have been experimenting with introducing this kind of capability in the NSX
plugin in the past few months. I've been using celery as a task queue, and
writing the task management code from scratch - only to realize that the
same features I was implementing are already supported by taskflow.

I think that all parts of Neutron API can greatly benefit from introducing a
flow-based approach.
Some examples:
- pre/post commit operations in the ML2 plugin can be orchestrated a lot
better as a workflow, articulating operations on the various drivers in a
graph
- operations spanning multiple plugins (eg: add router interface) could be
simplified using clearly defined tasks for the L2 and L3 parts
- it would be finally possible to properly manage resources' "operational
status", as well as knowing whether the actual configuration of the backend
matches the database configuration
- synchronous plugins might be converted into asynchronous thus improving
their API throughput

Now, the caveats:
- during the sessions it was correctly pointed out that special care is
required with multiple producers (ie: api servers) as workflows should be
always executed in the correct order
- it is probably advisable to serialize workflows operating on the same
resource; this might lead to unexpected situations (potentially to
deadlocks) with workflows operating on multiple resources
- if the API is asynchronous, and multiple workflows might be queued or in
execution at a given time, rolling back the DB operation on failures is
probably not advisable (it would not be advisable anyway in any asynchronous
framework). If the API instead stays synchronous the revert action for a
failed task might also restore the db state for a resource; but I think that
keeping the API synchronous misses the point of this whole work a bit - feel
free to show your disagreement here!
- some neutron workflows are actually initiated by agents; this is the case,
for instance, of the workflow for doing initial L2 and security group
configuration for a port.
- it's going to be a lot of work, and we need to devise a strategy to either
roll these changes into the existing plugins or just decide that future v3
plugins will use it.
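
On the serialization caveat above, one common way to avoid deadlocks
between workflows that touch several resources is to always acquire the
per-resource locks in a fixed order. The sketch below does this with
in-process locks only; with multiple API servers a distributed lock or a
per-resource queue would be needed instead, which is not shown here. The
resources_locked helper and the resource ids are hypothetical.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# One lock per resource id, created on first use (single process only).
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


@contextmanager
def resources_locked(*resource_ids):
    """Serialize workflows on the given resources without deadlocking."""
    # Sorting gives every workflow the same acquisition order.
    ordered = sorted(set(resource_ids))
    with _locks_guard:
        locks = [_locks[rid] for rid in ordered]
    for lock in locks:
        lock.acquire()
    try:
        yield ordered
    finally:
        for lock in reversed(locks):
            lock.release()
```

Two workflows asking for ("port-1", "router-1") and ("router-1", "port-1")
both end up locking port-1 first, so neither can hold one lock while
waiting for the other.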

From the implementation side, I've done a bit of research and task queues
like celery only implement half of what is needed; conversely I have not
been able to find a workflow manager, at least in the python world, as
complete and suitable as taskflow.
So my preference will be obviously to use it, and contribute to it should we
realize Neutron needs some changes to suit its needs. Growing something
neutron-specific in tree is something I'd rule out.

(*) This is a bit different from what many plugins do, as they execute
requests synchronously and return only once the backend request is
completed.

--> Tasks and the API

The etherpad [1] contains a lot of interesting notes on this topic.
One important item is to understand how tasks affect the resource's status
to indicate their completion or failure. So far Neutron resource status
pretty much expresses its "fabric" status. For instance a port is "UP" if
it's been wired by the OVS agent; it often does not tell us whether the
actual resource configuration is exactly the desired one in the database.
For instance, if the ovs agent fails to apply security groups to a port, the
port stays "ACTIVE" and the user might never know there was an error and the
actual state diverged from the desired one.

It is therefore important to allow users to know whether the backend state
is in sync with the db; tools like taskflow will be really helpful to this
aim.
However, how should this be represented? The main options are to either have
a new attribute describing the resource sync state, or to extend the
semantics of the current status attribute to include also resource sync
state. I've put some ramblings on the subject in the etherpad [3].
Still, it has been correctly pointed out that it might not be enough to know
that a resource is out of sync, but it is good to know which operation
exactly failed; this is where exposing somehow tasks through the API might
come in handy.

For instance one could do something like:

GET /tasks?resource_id=<res_id>&task_state=FAILED

to get failure details for a given resource.
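
A minimal sketch of how such a query could be served, assuming tasks are
stored as simple records with resource_id and task_state fields. The record
layout, the sample data and list_tasks are hypothetical; no such endpoint
exists in the v2 API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical task records; field names mirror the query parameters.
TASKS = [
    {"id": "t1", "resource_id": "port-1", "task_state": "FAILED",
     "detail": "security group rules not applied"},
    {"id": "t2", "resource_id": "port-1", "task_state": "COMPLETED",
     "detail": ""},
    {"id": "t3", "resource_id": "port-2", "task_state": "FAILED",
     "detail": "agent timeout"},
]


def list_tasks(url, tasks):
    """Return the tasks matching every filter in the URL's query string."""
    query = parse_qs(urlsplit(url).query)
    # Keep only the first value given for each filter.
    filters = {key: values[0] for key, values in query.items()}
    return [
        task for task in tasks
        if all(str(task.get(key)) == value for key, value in filters.items())
    ]


failed = list_tasks("/tasks?resource_id=port-1&task_state=FAILED", TASKS)
```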

--> How to proceed

This is where I really don't know... and I will therefore be brief.
We'll probably need some more brainstorming to flesh out all the details.
Once that is done, it might be a matter of evaluating what needs to be done and
whether it is better to target this work onto existing plugins, or moving it
out to v3 plugins (and hence do the actual work once the "core refactoring"
activities are complete).

Regards,
Salvatore


[1] https://etherpad.openstack.org/p/integrating-task-into-neutron
[2] http://paste.openstack.org/show/81184/
[3] https://etherpad.openstack.org/p/sillythings




_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

