On 10/12/15 15:29, Matt Riedemann wrote:
>> In a simplified view of a detach volume we can say that the nova code
>> does:
>> 1 detach the volume from the instance
>> 2 inform cinder about the detach and call terminate_connection on
>> the cinder API
>> 3 delete the bdm record in the nova DB
>
> We actually:
>
> 1. terminate the connection in cinder:
>
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2312
>
> 2. detach the volume:
>
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2315
>
> 3. delete the volume (if marked for delete_on_termination):
>
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2348
>
> 4. delete the bdm in the nova db:
>
> https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L908

I am confused here: why are you referring to the _shutdown_instance code?

> So if terminate_connection fails, we shouldn't get to detach. And if
> detach fails, we shouldn't get to delete.
>
>> If 2 fails the volumes get stuck in a detaching status and any further
>> attempt to delete or detach the volume will fail:
>> "Delete for volume <volume_id> failed: Volume <volume_id> is still
>> attached, detach volume first. (HTTP 400)"
>>
>> And if you try to detach:
>> "ERROR (BadRequest): Invalid input received: Invalid volume: Unable to
>> detach volume. Volume status must be 'in-use' and attach_status must
>> be 'attached' to detach. Currently: status: 'detaching',
>> attach_status: 'attached.' (HTTP 400)"
>>
>> At the moment the only way to clean up the situation is to hack the
>> nova DB to delete the bdm record, and to do some hacking on the
>> cinder side as well.
>> We wanted a way to clean up the situation avoiding the manual hack to
>> the nova DB.
>
> Can't cinder roll back state somehow if it's bogus or failed an
> operation?
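To make the stuck state concrete, here is a small self-contained sketch of the state machine involved. These are toy classes for illustration only, not the real Nova/Cinder code: they model how a failed detach leaves a volume stuck in 'detaching', and what the rollback being suggested would buy us.

```python
# Toy model of the volume states involved in a detach.
# Illustrative only -- not the actual Nova/Cinder implementation.

class VolumeStuckError(Exception):
    """Raised when the volume is not in a detachable state."""


class Volume:
    def __init__(self):
        self.status = 'in-use'
        self.attach_status = 'attached'

    def begin_detach(self):
        # Cinder only accepts a detach for in-use/attached volumes,
        # which is why a volume stuck in 'detaching' can't be retried.
        if self.status != 'in-use' or self.attach_status != 'attached':
            raise VolumeStuckError(
                "Invalid volume: status must be 'in-use' and attach_status "
                "must be 'attached' to detach. Currently: status: %r, "
                "attach_status: %r" % (self.status, self.attach_status))
        self.status = 'detaching'

    def finish_detach(self):
        self.status = 'available'
        self.attach_status = 'detached'

    def rollback_detach(self):
        # The rollback idea: revert the state on failure so a later
        # detach/delete request passes the checks again.
        self.status = 'in-use'


def detach(volume, do_detach, rollback=False):
    """Attempt a detach; return True on success, False on failure."""
    volume.begin_detach()
    try:
        do_detach()  # e.g. the hypervisor-side detach work
    except Exception:
        if rollback:
            volume.rollback_detach()
        return False  # without rollback, the volume is now stuck
    volume.finish_detach()
    return True
```

Without the rollback branch, a failed `do_detach` leaves the volume in 'detaching' and every further `detach()` raises `VolumeStuckError`; with it, the volume returns to 'in-use' and a second request can go through.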
> For example, if detach failed, shouldn't we not be in
> 'detaching' state? This is like auto-reverting task_state on server
> instances when an operation fails so that we can reset or delete those
> servers if needed.

I think that is an option, but probably it is part of the redesign of the
cinder API (see the solution proposed in #3). It would be nice to get the
cinder guys commenting here.

>> Solution proposed #3
>> Ok, so the solution is to fix the Cinder API and make the interaction
>> between the Nova volume manager and that API robust.
>> This time I was right (YAY), but as you can imagine this fix is not
>> going to be an easy one, and after talking with the Cinder guys they
>> clearly told me that it is going to be a massive change in the
>> Cinder API and it is unlikely to land in the N(utella) or O(melette)
>> release.
>
> As Sean pointed out in another reply, I feel like what we're really
> missing here is some rollback code in the case that delete fails so we
> don't get in this stuck state and have to rely on deleting the BDMs
> manually in the database just to delete the instance.
>
> We should roll back on delete fail 1 so that delete request 2 can pass
> the 'check attach' checks again.

The communication with cinder is async: Nova doesn't wait or check
whether the detach on the cinder side has been executed correctly.

Thanks
--
Andrea Rosa

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev