Can you explain a little how using celery achieves workflow reliability and 
avoids races (or mitigates spaghetti code)?

To me celery acts as a way to distribute tasks, but it does not actually 
provide an easily understandable way of knowing that a piece of code you 
design is going to go through the various state transitions (or states & 
workflows) that you expect (this is a higher-level mechanism that you can 
build on top of a distribution system). So this means that NVP (or neutron, 
or other?) must be maintaining an orchestration/engine layer on top of celery 
to add on this additional set of code that 'drives' celery to accomplish a 
given workflow in a reliable manner.

This starts to sound pretty similar to what taskflow is doing: not being a 
direct competitor to a distributed task queue such as celery, but providing 
this higher-level mechanism which adds on these benefits, since they are 
needed anyway.

To me these benefits currently are (the list may grow in the future):

1. A way to define a workflow (in a way that is not tied to celery, since 
celery's '@task' decorator ties you to celery's internal implementation); a 
minimal sketch follows this list.
     - This includes ongoing work to determine how to easily define a 
state machine in a way that is relevant to cinder (and other projects).
2. A way to keep track of the state that the workflow goes through (this brings 
along resumption and progress information, when you track at the right level).
3. A way to execute that workflow reliably (potentially using celery, RPC, 
local threads, or other future hotness).
     - This becomes important when you ask yourself how you planned on testing 
celery in the gate/jenkins/CI.
4. A way to guarantee that the workflow, upon failure, is *automatically* 
resumed by some other entity.
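
To illustrate points 1 and 3, here is a minimal sketch using taskflow's 
task/flow primitives (the task name, flow name, and store contents are made 
up for this example):

import taskflow.engines
from taskflow.patterns import linear_flow
from taskflow import task

class CallBackend(task.Task):
    def execute(self, port_db):
        # The actual call out to a backend/controller would go here; the
        # return value is saved by the engine under 'backend_id'.
        return 'fake-backend-id'

    def revert(self, port_db, **kwargs):
        # Run automatically if this (or a later) task fails, so rollback
        # logic lives next to the task instead of in the caller.
        pass

# The workflow definition is plain taskflow, not tied to any queue.
flow = linear_flow.Flow('create-port').add(CallBackend(provides='backend_id'))

# The engine that runs the flow (local threads, workers, ...) is chosen
# here at run time, not baked into the task code itself.
result = taskflow.engines.run(flow, store={'port_db': {'id': 'fake'}})

The same flow can later be run by a different engine, or resumed from 
persisted state, without touching the task code; that is where benefits 2 
and 4 come in.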

More details @ http://www.slideshare.net/harlowja/taskflow-27820295

From: Salvatore Orlando <sorla...@nicira.com>
Date: Tuesday, November 19, 2013 2:22 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Cc: Joshua Harlow <harlo...@yahoo-inc.com>, Isaku Yamahata 
<isaku.yamah...@gmail.com>, Robert Kukura <rkuk...@redhat.com>
Subject: Re: [openstack-dev] [Neutron] Race condition between DB layer and 
plugin back-end implementation

For what it's worth, we have considered this aspect from the perspective of 
the Neutron plugin my team maintains (NVP) during the past release cycle.

The synchronous model that most plugins with a controller on the backend 
currently implement is simple and convenient, but has some flaws:

- reliability: the current approach, where the plugin orchestrates the 
backend, is not really optimal when it comes to ensuring that your running 
configuration (backend/control plane) is in sync with your desired 
configuration (neutron/mgmt plane); moreover, in some cases, due to neutron 
internals, API calls to the backend are wrapped in a transaction too, leading 
to very long SQL transactions, which are quite dangerous indeed (sketched 
below). It is not easy to recover from a failure due to an eventlet thread 
deadlocking with a MySQL transaction, where by 'recover' I mean ensuring 
neutron and backend state are in sync.

- maintainability: since handling rollback in case of failures on the backend 
and/or the db is cumbersome, this often leads to spaghetti code which is very 
hard to maintain regardless of the effort (ok, I agree here that this also 
depends on how good the devs are - most of the guys in my team are very good, 
but unfortunately they have me too...).

- performance & scalability:
    - roundtrips to the backend take a non-negligible toll on the duration of 
an API call, whereas most Neutron API calls should probably just terminate at 
the DB, just like a nova boot call does not wait for the VM to be ACTIVE 
before returning.
    - we need to keep some operations serialized in order to avoid the race 
issues mentioned above.
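
To make the reliability point concrete, here is a simplified sketch of the 
pattern we are moving away from versus the one we are moving toward (the 
plugin and helper names are illustrative, not our actual code):

# Synchronous model: the backend call happens inside the DB transaction,
# so a slow or wedged backend keeps the SQL transaction (and its locks)
# open for the whole duration of the roundtrip.
def create_port(self, context, port):
    with context.session.begin(subtransactions=True):
        port_db = super(MyPlugin, self).create_port(context, port)
        self._backend.create_port(port_db)  # network I/O in transaction
    return port_db

# Asynchronous model: commit first with a PENDING status and return to
# the caller, then let a separate task reconcile the backend afterwards.
def create_port(self, context, port):
    with context.session.begin(subtransactions=True):
        port_db = super(MyPlugin, self).create_port(context, port)
        port_db['status'] = 'PENDING_CREATE'
    self._queue_backend_sync('create_port', port_db['id'])  # hypothetical
    return port_db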

For this reason we're progressively moving toward a change in the NVP plugin 
with a series of patches under this umbrella-blueprint [1].

To answer the issues mentioned by Isaku, we've been looking at a task 
management library with an efficient and reliable set of abstractions for 
ensuring operations are properly ordered, thus avoiding those races (I agree 
with the observation on the pre/post-commit solution).
We are currently looking at using celery [2] rather than taskflow, mostly 
because we already have expertise in using it in our applications, and it 
has very easy abstractions for workflow design, as well as for handling 
task failures.
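
For instance, the kind of celery abstractions I have in mind, in a minimal 
sketch (the broker URL, ids, and task bodies are placeholders):

from celery import Celery, chain

app = Celery('nvp', broker='amqp://localhost')  # placeholder broker

@app.task(bind=True, max_retries=3, default_retry_delay=5)
def push_to_backend(self, port_id):
    try:
        pass  # the REST call to the backend controller would go here
    except IOError as exc:
        # Failure handling: re-queue this task with a delay.
        raise self.retry(exc=exc)

@app.task
def mark_active(prev_result, port_id):
    pass  # flip the neutron status from PENDING to ACTIVE

# Workflow design: run the two tasks in order via a chain.
chain(push_to_backend.s('port-uuid'), mark_active.s('port-uuid'))()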
That said, I think we're still open to switching to taskflow should we become 
aware of some very good reason to use it.

Regards,
Salvatore

[1] 
https://blueprints.launchpad.net/neutron/+spec/nvp-async-backend-communication
[2] http://docs.celeryproject.org/en/master/index.html



On 19 November 2013 19:42, Joshua Harlow <harlo...@yahoo-inc.com> wrote:
And also of course, nearly forgot a similar situation/review in heat.

https://review.openstack.org/#/c/49440/

Except there it was/is dealing with stack locking (a heat concept).

On 11/19/13 10:33 AM, "Joshua Harlow" <harlo...@yahoo-inc.com> wrote:

>If you start adding these states you might really want to consider the
>following work that is going on in other projects.
>
>It surely appears that everyone is starting to hit the same problem (and
>joining efforts would produce a more beneficial result).
>
>Relevant icehouse etherpads:
>- https://etherpad.openstack.org/p/CinderTaskFlowFSM
>- https://etherpad.openstack.org/p/icehouse-oslo-service-synchronization
>
>And of course my obvious plug for taskflow (which is designed to be a
>useful library to help in all these usages).
>
>- https://wiki.openstack.org/wiki/TaskFlow
>
>The states you just mentioned start to line up with
>https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow
>
>If this sounds like a useful way to go (joining efforts) then let's see how
>we can make it possible.
>
>IRC: #openstack-state-management is where I am usually at.
>
>On 11/19/13 3:57 AM, "Isaku Yamahata" <isaku.yamah...@gmail.com> wrote:
>
>>On Mon, Nov 18, 2013 at 03:55:49PM -0500,
>>Robert Kukura <rkuk...@redhat.com> wrote:
>>
>>> On 11/18/2013 03:25 PM, Edgar Magana wrote:
>>> > Developers,
>>> >
>>> > This topic has been discussed before but I do not remember if we have a
>>> > good solution or not.
>>>
>>> The ML2 plugin addresses this by calling each MechanismDriver twice. The
>>> create_network_precommit() method is called as part of the DB
>>> transaction, and the create_network_postcommit() method is called after
>>> the transaction has been committed. Interactions with devices or
>>> controllers are done in the postcommit methods. If the postcommit method
>>> raises an exception, the plugin deletes that partially-created resource
>>> and returns the exception to the client. You might consider a similar
>>> approach in your plugin.
>>
>>Splitting the work into two phases, pre/post, is a good approach.
>>But there still remains a race window.
>>Once the transaction is committed, the result is visible to the outside,
>>so concurrent requests to the same resource will be racy.
>>There is a window after pre_xxx_yyy() and before post_xxx_yyy() where
>>other requests can be handled.
>>
>>The state machine needs to be enhanced, I think (plugins need
>>modification). For example, adding more states like
>>pending_{create, delete, update}.
>>Also we would like to consider serializing between operations on ports
>>and subnets, or between operations on subnets and networks, depending on
>>performance requirements.
>>(Or carefully audit complex status changes, i.e.
>>changing a port during subnet/network update/deletion.)
>>
>>I think it would be useful to establish a reference locking policy
>>for the ML2 plugin for SDN controllers.
>>Thoughts or comments? If this is considered useful and acceptable,
>>I'm willing to help.
>>
>>thanks,
>>Isaku Yamahata
>>
>>> -Bob
>>>
>>> > Basically, if concurrent API calls are sent to Neutron, all of them
>>> > are sent to the plug-in level, where two actions have to be made:
>>> >
>>> > 1. DB transaction - not just for data persistence but also to collect
>>> > the information needed for the next action
>>> > 2. Plug-in back-end implementation - in our case a call to the python
>>> > library that consequently calls the PLUMgrid REST GW (soon SAL)
>>> >
>>> > For instance:
>>> >
>>> > def create_port(self, context, port):
>>> >     with context.session.begin(subtransactions=True):
>>> >         # Plugin DB - Port Create and Return port
>>> >         port_db = super(NeutronPluginPLUMgridV2,
>>> >                         self).create_port(context, port)
>>> >         device_id = port_db["device_id"]
>>> >         if port_db["device_owner"] == "network:router_gateway":
>>> >             router_db = self._get_router(context, device_id)
>>> >         else:
>>> >             router_db = None
>>> >         try:
>>> >             LOG.debug(_("PLUMgrid Library: create_port() called"))
>>> >             # Back-end implementation
>>> >             self._plumlib.create_port(port_db, router_db)
>>> >         except Exception:
>>> >             ...
>>> >
>>> > The way we have implemented this at the plugin level in Havana (and
>>> > even in Grizzly) is that both actions are wrapped in the same
>>> > "transaction", which automatically rolls back any operation to its
>>> > original state, mostly protecting the DB from being left in an
>>> > inconsistent state or with leftover data if the back-end part fails.
>>> > The problem that we are experiencing is that when concurrent calls to
>>> > the same API are sent, the number of operations at the plug-in
>>> > back-end is large enough to make the next concurrent API call get
>>> > stuck at the DB transaction level, which creates a hung state for the
>>> > Neutron server to the point that all concurrent API calls will fail.
>>> >
>>> > This can be fixed if we include some "locking" system, such as
>>> > calling:
>>> >
>>> > from neutron.common import utils
>>> > ...
>>> >
>>> > @utils.synchronized('any-name', external=True)
>>> > def create_port(self, context, port):
>>> >     ...
>>> >
>>> > Obviously, this will create a serialization of all concurrent calls,
>>> > which will end up causing really bad performance. Does anyone have a
>>> > better solution?
>>> >
>>> > Thanks,
>>> >
>>> > Edgar
>>> >
>>
>>--
>>Isaku Yamahata <isaku.yamah...@gmail.com>

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
