On Mon, Jun 29, 2015 at 03:45:56PM +0300, Duncan Thomas wrote:
> On 29 June 2015 at 15:23, Dulko, Michal <michal.du...@intel.com> wrote:
>
> > There are also some similar situations where we don't actually lock on
> > resources. For example, a cgsnapshot may get deleted while a
> > consistencygroup is being created from it.
> >
> > From my perspective it seems best to have atomic state changes and
> > state-based exclusion in the API. We would need some kind of
> > currently_used_to_create_snapshot/volumes/consistencygroups states to
> > achieve that. Then we would also be able to return VolumeIsBusy
> > exceptions, so retrying a request would be on the user's side.
>
> I'd agree, except that gives quite a big behaviour change in the
> tenant-facing API, which will break clients and scripts. Not sure how to
> square that circle... I'd say V3 API, except Mike might kill me...
I'd prefer not to add another item to the list of things needed to get HA,
much less one on the scale of a new API version.

As far as I can see, we have 3 cases where we use or need to use locks:

1- Locking multiple write accesses to a resource
2- Preventing modification of a resource being used for reading
3- Backend drivers

1- Locking multiple write accesses to a resource

These locks can most likely be avoided if we implement atomic state changes
(with compare-and-swap) and use the current state to prevent multiple writes
on the same resource, since writes change the status of the resource.
There's already a spec proposing this [1].

2- Preventing modification of a resource in read use

I only see 2 options here:

- Limit the number of readers to 1 and use Tooz's locks as a DLM. This
  would be quite easy to implement, although it would not be very
  efficient.

- Implement shared locks in Tooz or in the DB. One way to implement this
  in the DB would be to add a field with a counter of the tasks currently
  using the resource for reading. Modifications to this counter would use
  compare-and-swap: we would check the status when increasing the counter
  and do the increment in the DB instead of in the Cinder node. Status
  changes would also use compare-and-swap, and besides checking the
  current status for availability they would check that the counter is 0.

The drawback of the DB implementation is that an aborted operation would
leave the resource locked, but this could be solved if we used TaskFlow for
operations and decremented the counter in the revert method. One big
advantage is that we don't need heartbeats sent periodically to keep locks
from being released, and it's easy to pass the lock from the API node to
the Volume node.

If we implement this in Tooz we could start by implementing it in only 1
driver and recommend using only that one until the rest are available.
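To make the compare-and-swap idea concrete, here is a minimal sketch of a
DB-side reader counter combined with CAS status changes. The table, column
and function names (volumes, reader_count, cas_set_status, acquire_read,
release_read) are my own illustration using SQLite, not Cinder's actual
schema or API:

```python
import sqlite3

# Illustrative schema: a status column plus a counter of current readers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes "
             "(id TEXT PRIMARY KEY, status TEXT, reader_count INTEGER)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available', 0)")
conn.commit()

def cas_set_status(conn, vol_id, expected, new):
    """Atomically change status only if it matches the expected value
    and no readers currently hold the resource."""
    cur = conn.execute(
        "UPDATE volumes SET status = ? "
        "WHERE id = ? AND status = ? AND reader_count = 0",
        (new, vol_id, expected))
    conn.commit()
    return cur.rowcount == 1  # True only if the swap actually happened

def acquire_read(conn, vol_id):
    """Increment the reader counter in the DB itself (not in the Cinder
    node), checking the status in the same statement."""
    cur = conn.execute(
        "UPDATE volumes SET reader_count = reader_count + 1 "
        "WHERE id = ? AND status = 'available'",
        (vol_id,))
    conn.commit()
    return cur.rowcount == 1

def release_read(conn, vol_id):
    """Decrement the reader counter; a TaskFlow revert would call this
    for aborted operations so the resource is not left locked."""
    cur = conn.execute(
        "UPDATE volumes SET reader_count = reader_count - 1 "
        "WHERE id = ? AND reader_count > 0",
        (vol_id,))
    conn.commit()
    return cur.rowcount == 1
```

With this, a delete attempt (cas_set_status to 'deleting') fails while any
reader holds the counter and succeeds once release_read brings it back to
0, which is the point of checking the counter in the same CAS statement.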
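For the backend-driver case (3- below), where file locks local to the node
can suffice because Cinder would already be serializing writes to a
resource, a rough POSIX sketch might look like the following. The
PoolFileLock class and lock path are purely illustrative (and fcntl is
POSIX-only); in practice Tooz's file-based driver provides the same thing:

```python
import fcntl
import os
import tempfile

class PoolFileLock:
    """Exclusive advisory file lock, local to this node, keyed by pool
    name. Serializes operations on one storage pool within one host."""

    def __init__(self, lock_dir, pool_name):
        self.path = os.path.join(lock_dir, "cinder-%s.lock" % pool_name)
        self._fd = None

    def __enter__(self):
        self._fd = open(self.path, "w")
        fcntl.flock(self._fd, fcntl.LOCK_EX)  # blocks until acquired
        return self

    def __exit__(self, exc_type, exc, tb):
        fcntl.flock(self._fd, fcntl.LOCK_UN)
        self._fd.close()

# Example usage: only one local task at a time enters the pool's
# critical section.
lock_dir = tempfile.mkdtemp()
with PoolFileLock(lock_dir, "pool-1") as lock:
    result = "inside critical section"
```

A DLM would only be needed when the same pool can be touched from several
nodes at once, which is exactly where Tooz's distributed backends come in.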
3- Backend drivers

Depending on the driver, they may not need locks at all, they may manage
with file locks local to the node (since Cinder would already be
preventing multiple write accesses to the same resource), or they may need
a DLM, for example to prevent simultaneous operations on the same pool
from different nodes. For this case Tooz would be the best solution, since
drivers should not access the DB, and Tooz allows using file locks as well
as distributed locks.

Cheers,
Gorka

[1]: https://review.openstack.org/#/c/149894/

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev