On 12/14/2015 11:24 AM, Andrea Rosa wrote:
On 10/12/15 15:29, Matt Riedemann wrote:
In a simplified view of a detach volume we can say that the nova code
does:
1. detach the volume from the instance
2. inform cinder about the detach and call terminate_connection on
the cinder API.
3. delete the bdm record in the nova DB
We actually:
1. terminate the connection in cinder:
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2312
2. detach the volume
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2315
3. delete the volume (if marked for delete_on_termination):
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2348
4. delete the bdm in the nova db:
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L908
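A rough sketch of the ordering described above, with a stub standing in for the cinder client. The class and method names here are illustrative stand-ins, not the real python-cinderclient API; the point is only the sequence terminate_connection -> detach -> (optional) delete -> drop the bdm.

```python
class CinderStub:
    """Records calls so the ordering of the four steps is visible."""
    def __init__(self):
        self.calls = []

    def terminate_connection(self, volume_id, connector):
        self.calls.append(('terminate_connection', volume_id))

    def detach(self, volume_id):
        self.calls.append(('detach', volume_id))

    def delete(self, volume_id):
        self.calls.append(('delete', volume_id))


def shutdown_volumes(cinder, bdms, connector):
    """Mirror the order used in _shutdown_instance: terminate the
    connection first, then detach, then (optionally) delete the
    volume, and finally drop the bdm record on the Nova side."""
    for bdm in bdms:
        cinder.terminate_connection(bdm['volume_id'], connector)
        cinder.detach(bdm['volume_id'])
        if bdm.get('delete_on_termination'):
            cinder.delete(bdm['volume_id'])
        # step 4: the bdm row would be destroyed in the Nova DB here


cinder = CinderStub()
shutdown_volumes(cinder, [{'volume_id': 'vol-1',
                           'delete_on_termination': True}], connector={})
```

If any step in that chain raises, the later steps never run, which is how an instance can end up undeletable with a bdm row left behind.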
I am confused here, why are you referring to the _shutdown_instance
code?
Because that's the code in the compute manager that calls cinder to
terminate the connection to the storage backend and detaches the volume
from the instance, which you pointed out in your email as part of
terminating the instance.
So if terminate_connection fails, we shouldn't get to detach. And if
detach fails, we shouldn't get to delete.
If 2 fails, the volume gets stuck in a 'detaching' status and any
further attempt to delete or detach the volume will fail:
"Delete for volume <volume_id> failed: Volume <volume_id> is still
attached, detach volume first. (HTTP 400)"
And if you try to detach:
"ERROR (BadRequest): Invalid input received: Invalid volume: Unable to
detach volume. Volume status must be 'in-use' and attach_status must
be 'attached' to detach. Currently: status: 'detaching',
attach_status: 'attached.' (HTTP 400)"
At the moment the only way to clean up the situation is to hack the
nova DB to delete the bdm record and apply a similar hack on the
cinder side as well.
We wanted a way to clean up the situation avoiding the manual hack to
the nova DB.
Can't cinder rollback state somehow if it's bogus or failed an
operation? For example, if detach failed, shouldn't we not be in
'detaching' state? This is like auto-reverting task_state on server
instances when an operation fails so that we can reset or delete those
servers if needed.
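The auto-revert idea above could be sketched roughly like this. `roll_detaching`, `begin_detaching`, and the status fields are hypothetical stand-ins for whatever Cinder-side reset such a rollback would need, not an existing API; this only illustrates the shape of the rollback.

```python
class Volume:
    """Toy volume carrying the two status fields the checks look at."""
    def __init__(self):
        self.status = 'in-use'
        self.attach_status = 'attached'


def begin_detaching(volume):
    # Mark the volume so concurrent operations are rejected.
    volume.status = 'detaching'


def roll_detaching(volume):
    # Hypothetical rollback: undo the 'detaching' marker so a later
    # detach or delete request passes the status checks again.
    if volume.status == 'detaching':
        volume.status = 'in-use'


def detach_with_rollback(volume, do_detach):
    """Run the detach; on failure, revert the status instead of
    leaving the volume stuck in 'detaching'."""
    begin_detaching(volume)
    try:
        do_detach(volume)
    except Exception:
        roll_detaching(volume)
        raise
    volume.status = 'available'
    volume.attach_status = 'detached'
```

With that in place, a failed detach would leave the volume back in 'in-use'/'attached', so the user could simply retry instead of editing the database.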
I think that is an option but probably it is part of the redesign of
the cinder API (see the solution proposed #3), but it would be nice to
get the cinder guys commenting here.
Solution proposed #3
Ok, so the solution is to fix the Cinder API and make the interaction
between the Nova volume manager and that API robust.
This time I was right (YAY) but as you can imagine this fix is not
going to be an easy one, and after talking with the Cinder guys they
clearly told me that that is going to be a massive change in the
Cinder API and it is unlikely to land in the N(utella) or O(melette)
release.
As Sean pointed out in another reply, I feel like what we're really
missing here is some rollback code in the case that delete fails so we
don't get in this stuck state and have to rely on deleting the BDMs
manually in the database just to delete the instance.
We should roll back on the first failed delete so that a second delete
request can pass the 'check attach' checks again.
The communication with cinder is async; Nova doesn't wait or check
whether the detach on the cinder side has been executed correctly.
Yeah, I guess nova gets the 202 back:
http://logs.openstack.org/18/258118/2/check/gate-tempest-dsvm-full-ceph/7a5290d/logs/screen-n-cpu.txt.gz#_2015-12-16_03_30_43_990
Should nova be waiting for detach to complete before it tries deleting
the volume (in the case that delete_on_termination=True in the bdm)?
Should nova be waiting (regardless of volume delete) for the volume
detach to complete - or time out and fail the instance delete if it
doesn't?
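A minimal sketch of such a wait-with-timeout, assuming a `get_status` callable that wraps a real volume-show call; the polling loop, the status values, and the timeout are assumptions for illustration, not Nova's current behaviour.

```python
import time


def wait_for_detach(get_status, volume_id, timeout=60.0, interval=1.0,
                    sleep=time.sleep):
    """Poll the volume status until it leaves 'detaching'; return True
    once it is detached ('available'), False on error or timeout so
    the caller can fail the instance delete instead of silently
    proceeding to delete the volume."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(volume_id)
        if status == 'available':
            return True
        if status == 'error':
            return False
        sleep(interval)
    return False
```

The caller would invoke this between the detach call and the (optional) volume delete, and abort the instance delete on a False return rather than leaving the bdm half-cleaned.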
Thanks
--
Andrea Rosa
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
--
Thanks,
Matt Riedemann