On Fri, May 27, 2016, at 11:25 AM, Matthew Treinish wrote:
> On Fri, May 27, 2016 at 05:52:51PM +0300, Vasyl Saienko wrote:
> > Lucas, Andrew
> >
> > Thanks for the fast response.
> >
> > On Fri, May 27, 2016 at 4:53 PM, Andrew Laski <and...@lascii.com> wrote:
> > >
> > > On Fri, May 27, 2016, at 09:25 AM, Lucas Alvares Gomes wrote:
> > > > Hi,
> > > >
> > > > Thanks for bringing this up Vasyl!
> > > >
> > > > > At the moment Nova with the ironic virt_driver considers the instance
> > > > > deleted, while on the Ironic side the server goes to cleaning, which
> > > > > can take a while. As a result, the current implementation of the Nova
> > > > > tempest tests doesn't work for the case when Ironic is enabled.
> > >
> > > What is the actual failure? Is it a capacity issue because nodes do not
> > > become available again quickly enough?
> >
> > The actual failure is that the tempest community doesn't want to accept
> > option 1: https://review.openstack.org/315422/
> > And I'm not sure that it is the right way.
>
> No, Andrew is right, this is a resource limitation in the gate. The failures
> you're hitting are caused by resource constraints in the gate and not having
> enough available nodes to run all the tests, because deleted nodes are still
> cleaning (or doing another operation) and aren't available to nova for
> booting another guest.
>
> I -2d that patch because it's a workaround for the fundamental issue here
> and not actually an appropriate change for Tempest. What you've implemented
> in that patch is the equivalent of talking to libvirt or some other
> hypervisor directly to find out if something is actually deleted. It's a
> layer violation; there is never a reason that should be necessary,
> especially in a test of the nova API.
>
> > > > > There are two possible options for how to fix it:
> > > > >
> > > > > 1. Update the Nova tempest test scenarios for the Ironic case to wait
> > > > >    until cleaning is finished and the Ironic node goes to the
> > > > >    'available' state.
> > > > >
> > > > > 2. Mark the instance as deleted in Nova only after cleaning is
> > > > >    finished on the Ironic side.
> > > > >
> > > > > I'm personally inclined to option 2. From the user's side, successful
> > > > > instance termination means that no instance data is available any
> > > > > more, and nobody can access/restore that data. The current
> > > > > implementation breaks this rule: the instance is marked as
> > > > > successfully deleted while in fact it may not be cleaned, it may fail
> > > > > to clean, and the user will not know anything about it.
> > > >
> > > > I don't really like option #2, cleaning can take several hours
> > > > depending on the configuration of the node. I think that it would be a
> > > > really bad experience if the user of the cloud had to wait a really
> > > > long time before their resources are available again once they delete
> > > > an instance. The idea of marking the instance as deleted in Nova
> > > > quickly is aligned with our idea of making bare metal deployments
> > > > look-and-feel like VMs for the end user. It is also (one of) the
> > > > reason(s) why we have separate DELETING and CLEANING states in Ironic.
> >
> > The resources will be available only if there are other available
> > baremetal nodes in the cloud. The user doesn't have the ability to track
> > the status of available resources without admin access.
> >
> > > I agree. From a user perspective, once they've issued a delete their
> > > instance should be gone.
> > > Any delay in that actually happening is purely an internal
> > > implementation detail that they should not care about.
>
> Delete is an async operation in Nova. There is never any immediacy here; it
> always takes an indeterminate amount of time between it being issued by the
> user and the server actually going away. The disconnect here is that when
> running with the ironic driver the server disappears from Nova, but the
> resources aren't freed back when that happens until the cleaning is done.
> I'm pretty sure this is different from all the other Nova drivers.
>
> I don't really have a horse in this race, so whatever ends up being decided
> for the behavior here is fine. But I think we need to be clear about what
> the behavior here is and what we actually want. Personally, I don't see an
> issue with the node being in the deleting task_state for a long time,
> because that's what is really happening while it's deleting. To me a delete
> is only finished when the resource is actually gone and its consumed
> resources return to the pool.
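To make that disconnect concrete, here is a rough sketch of the two waits
involved. The helpers nova_show() and ironic_node_show() are placeholders for
whatever client a test would use, not tempest or ironicclient APIs; the only
point is that the first loop returns quickly while the second can run for as
long as cleaning takes.

    # Illustration only: the two waits hiding behind "delete". nova_show()
    # and ironic_node_show() are hypothetical helpers, not real client APIs.
    import time


    def wait_for_nova_delete(nova_show, server_id, timeout=300):
        """Delete is async in Nova: poll until the server 404s."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if nova_show(server_id) is None:   # helper returns None on 404
                return                         # Nova already considers it gone
            time.sleep(5)
        raise TimeoutError('server %s still exists' % server_id)


    def wait_for_node_available(ironic_node_show, node_id, timeout=3600):
        """The ironic node keeps cleaning long after Nova has forgotten it."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            node = ironic_node_show(node_id)
            if node['provision_state'] == 'available':
                return                         # capacity is finally back
            time.sleep(30)
        raise TimeoutError('node %s never became available' % node_id)

Doing the second loop inside Tempest is, as Matt says, reaching below the nova
API, which is why the proposed patch was rejected.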
I wouldn't argue against an instance hanging around in a deleting state for a
long time. However, at this time quota usage is not reduced until the
instance is considered to have been deleted. I think those would need to be
decoupled in order to leave instances in a deleting state. A user should not
need to wait hours to get their quota back just because they wanted a
baremetal machine. The burden of a long cleanup should fall on a deployer and
their ability to manage capacity.

But the issue here is just capacity. Whether or not we keep an instance in a
deleting state, or when we release quota, doesn't change the Tempest failures
from what I can tell. The suggestions below address that.

> > > > I think we should go with #1, but instead of erasing the whole disk
> > > > for real maybe we should have a "fake" clean step that runs quickly
> > > > for test purposes only?
>
> Disabling the cleaning step (or having a fake one that does nothing) for
> the gate would get around the failures at least. It would make things work
> again because the nodes would be available right after Nova deletes them.
>
> -Matt Treinish
>
> > At the gates we are just waiting for the bootstrap and the callback from
> > the node when cleaning starts. All heavy operations are postponed. We can
> > disable automated_clean, which means it is not tested.
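For reference, turning cleaning off entirely is just the
[conductor]/automated_clean option in ironic.conf. If instead we want the
cleaning workflow to run but finish immediately, Lucas's "fake" clean step
could look roughly like a custom ironic-python-agent hardware manager along
these lines; this is only a sketch of the idea, and the exact step name and
priority would need checking against the real deploy interface.

    # Sketch only (test/CI use): a hardware manager whose disk-erase step is
    # a no-op, so cleaning completes almost immediately. Names follow the
    # ironic-python-agent HardwareManager interface as I understand it.
    from ironic_python_agent import hardware


    class FakeCleanHardwareManager(hardware.HardwareManager):

        HARDWARE_MANAGER_NAME = 'FakeCleanHardwareManager'
        HARDWARE_MANAGER_VERSION = '1'

        def evaluate_hardware_support(self):
            # Claim the highest support level so this manager's steps win
            # over the generic implementations.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            # Advertise a step with the same name/priority as the real disk
            # erase so it is chosen instead of the slow one.
            return [{'step': 'erase_devices',
                     'priority': 10,
                     'interface': 'deploy',
                     'reboot_requested': False}]

        def erase_devices(self, node, ports):
            # Intentionally do nothing: in the gate we only care that the
            # cleaning workflow runs, not that the disks are wiped.
            pass

Something like this would have to be registered (via the
ironic_python_agent.hardware_managers entry point, if I remember right) and
baked into the IPA image used in the gate, so just flipping automated_clean
off may be the simpler short-term answer, at the cost of not exercising
cleaning at all, as Vasyl notes.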