Re: [openstack-dev] [nova] Why don't we unbind ports or terminate volume connections on shelve offload?

2017-05-31 Thread Matt Riedemann

On 4/13/2017 11:45 AM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life
> of me think of why we don't unbind ports or terminate the volume
> connections when we shelve offload an instance from a compute host.
>
> When you unshelve, if the instance was shelved offloaded, the conductor
> asks the scheduler for a new set of hosts to build the instance on
> (unshelve it). That could be a totally different host.
>
> So am I just missing something super obvious? Or is this the most latent
> bug ever?

Looks like this is a known bug:

https://bugs.launchpad.net/nova/+bug/1547142

The fix on the nova side apparently depends on some changes on the 
cinder side. The new v3.27 APIs in cinder might help with all of this, 
but they don't fix old attachments.


By the way, search for shelve + volume in nova bugs and you're rewarded 
with a treasure trove of bugs:


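https://bugs.launchpad.net/nova/?field.searchtext=shelved+volume&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=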

--

Thanks,

Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Why don't we unbind ports or terminate volume connections on shelve offload?

2017-04-13 Thread Matt Riedemann

On 4/13/2017 6:53 PM, Andrew Laski wrote:
> On Thu, Apr 13, 2017, at 12:45 PM, Matt Riedemann wrote:
> > This came up in the nova/cinder meeting today, but I can't for the life
> > of me think of why we don't unbind ports or terminate the volume
> > connections when we shelve offload an instance from a compute host.
> >
> > When you unshelve, if the instance was shelved offloaded, the conductor
> > asks the scheduler for a new set of hosts to build the instance on
> > (unshelve it). That could be a totally different host.
> >
> > So am I just missing something super obvious? Or is this the most latent
> > bug ever?
>
> It's at the very least a hack, and may be a bug depending on what
> behaviour is being seen while the instance is offloaded or unshelved.
>
> The reason that networks and volumes are left in place is because it
> is/was the only way to prevent them from being used by another instance
> and causing a subsequent unshelve to fail. During the unshelve operation
> it is expected that they will then be shifted over to the new host the
> instance lands on if it switches hosts.
>
> This is similar to how resize is handled.  From an implementation point
> of view you can think of shelve as being a really really long
> resize/migration operation.
>
> There very well may be issues with this approach.

I'm not advocating that we detach the volumes or ports - we can totally 
leave those coupled with the instance in the database (the 
port.device_id still points at the instance even though the port's 
binding details don't have a host set). The thing with the volume though 
is we need to terminate the connection to the backend for that host 
before we offload, because when we unshelve and initialize a new volume 
connection, there will now be two connections.
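
To make the double-connection problem concrete, here is a toy sketch. It is not nova or cinder code: ToyVolume, ToyPort, shelve_offload, unshelve and the release_host_resources flag are all hypothetical names made up for illustration. It only assumes what is described above: the volume and port stay owned by the instance across the offload, and the open question is whether the host-specific pieces get released.

```python
# Toy model only: these classes and functions are hypothetical, not nova/cinder code.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyVolume:
    """Tracks the owning instance and which hosts hold a backend connection."""
    attached_instance: Optional[str] = None
    host_connections: List[str] = field(default_factory=list)

    def initialize_connection(self, host: str) -> None:
        self.host_connections.append(host)

    def terminate_connection(self, host: str) -> None:
        self.host_connections.remove(host)


@dataclass
class ToyPort:
    """device_id is the owning instance; binding_host is the compute host."""
    device_id: Optional[str] = None
    binding_host: Optional[str] = None


def shelve_offload(volume: ToyVolume, port: ToyPort, host: str,
                   release_host_resources: bool) -> None:
    """Offload from `host`. Ownership (attachment, device_id) is never touched."""
    if release_host_resources:
        volume.terminate_connection(host)
        port.binding_host = None


def unshelve(volume: ToyVolume, port: ToyPort, new_host: str) -> None:
    """The scheduler picked `new_host`, which may differ from the old host."""
    volume.initialize_connection(new_host)
    port.binding_host = new_host


def run(release_host_resources: bool) -> Tuple[ToyVolume, ToyPort]:
    volume = ToyVolume(attached_instance="instance-1")
    port = ToyPort(device_id="instance-1", binding_host="host-a")
    volume.initialize_connection("host-a")
    shelve_offload(volume, port, "host-a", release_host_resources)
    unshelve(volume, port, "host-b")
    return volume, port
```

In this model, run(False) ends with connections on both host-a and host-b, while run(True) ends with only host-b. In both cases the volume stays attached to instance-1 and the port's device_id still points at it.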


As noted elsewhere in the thread, there is a reported bug for this and 
some history around it. Calling terminate_connection will fix the issue 
for some backends in Cinder but not all (like it won't fix LVM). There 
is some other internal 'remove_export' call in Cinder that fixes it for 
LVM, but that is not exposed out of the API *except* through the 
os-detach API, which is precisely the thing we can't call for shelve 
offload for the reason you described.
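
A rough way to picture that bind, again as a toy model and not cinder code: ToyBackend, ToyVolumeAPI, offload and the cleanup_needs_remove_export flag are hypothetical, and a single boolean is a big simplification of how real drivers differ. The only things taken from the thread are that terminate_connection is enough for some backends but not LVM, that remove_export is reachable only via os-detach, and that detaching gives up the reservation.

```python
# Toy model only: every class and name here is hypothetical, not cinder code.


class ToyBackend:
    """A storage backend that keeps per-host state for a volume."""

    def __init__(self, cleanup_needs_remove_export: bool) -> None:
        # True models an LVM-like backend, per the description in the thread.
        self.cleanup_needs_remove_export = cleanup_needs_remove_export
        self.host_state = set()

    def initialize_connection(self, host: str) -> None:
        self.host_state.add(host)

    def terminate_connection(self, host: str) -> None:
        # Enough for some backends, but not for the LVM-like one.
        if not self.cleanup_needs_remove_export:
            self.host_state.discard(host)

    def remove_export(self, host: str) -> None:
        # Internal cleanup; the API below only reaches it through os_detach.
        self.host_state.discard(host)


class ToyVolumeAPI:
    """The only surface the compute side may call in this model."""

    def __init__(self, backend: ToyBackend, instance: str, host: str) -> None:
        self._backend = backend
        self.reserved_for = instance  # keeps other instances off the volume
        backend.initialize_connection(host)

    def terminate_connection(self, host: str) -> None:
        self._backend.terminate_connection(host)

    def os_detach(self, host: str) -> None:
        # Full cleanup, but the volume is no longer reserved afterwards.
        self._backend.terminate_connection(host)
        self._backend.remove_export(host)
        self.reserved_for = None


def offload(lvm_like: bool, use_detach: bool):
    """Return (leftover host state, who the volume is still reserved for)."""
    backend = ToyBackend(cleanup_needs_remove_export=lvm_like)
    api = ToyVolumeAPI(backend, instance="instance-1", host="host-a")
    if use_detach:
        api.os_detach("host-a")
    else:
        api.terminate_connection("host-a")
    return sorted(backend.host_state), api.reserved_for
```

With the LVM-like backend, offload(True, False) leaves the host-a state behind but keeps the reservation, and offload(True, True) cleans up host-a but drops the reservation. Neither is what shelve offload needs.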


--

Thanks,

Matt



Re: [openstack-dev] [nova] Why don't we unbind ports or terminate volume connections on shelve offload?

2017-04-13 Thread Andrew Laski


On Thu, Apr 13, 2017, at 12:45 PM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life 
> of me think of why we don't unbind ports or terminate the volume 
> connections when we shelve offload an instance from a compute host.
> 
> When you unshelve, if the instance was shelved offloaded, the conductor 
> asks the scheduler for a new set of hosts to build the instance on 
> (unshelve it). That could be a totally different host.
> 
> So am I just missing something super obvious? Or is this the most latent 
> bug ever?

It's at the very least a hack, and may be a bug depending on what
behaviour is being seen while the instance is offloaded or unshelved.

The reason that networks and volumes are left in place is because it
is/was the only way to prevent them from being used by another instance
and causing a subsequent unshelve to fail. During the unshelve operation
it is expected that they will then be shifted over to the new host the
instance lands on if it switches hosts.

This is similar to how resize is handled.  From an implementation point
of view you can think of shelve as being a really really long
resize/migration operation.

There very well may be issues with this approach.

> 
> -- 
> 
> Thanks,
> 
> Matt


Re: [openstack-dev] [nova] Why don't we unbind ports or terminate volume connections on shelve offload?

2017-04-13 Thread Chris Friesen

On 04/13/2017 10:45 AM, Matt Riedemann wrote:
> This came up in the nova/cinder meeting today, but I can't for the life of me
> think of why we don't unbind ports or terminate the volume connections when we
> shelve offload an instance from a compute host.
>
> When you unshelve, if the instance was shelved offloaded, the conductor asks the
> scheduler for a new set of hosts to build the instance on (unshelve it). That
> could be a totally different host.
>
> So am I just missing something super obvious? Or is this the most latent bug
> ever?

Does anyone actually use shelve?

Chris
