Sorry for the delayed response; I was on PTO when this came out. Comments inline...

On 08/22/2018 09:23 PM, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The main issue in there right now is dealing with cross-cell cold migration in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type), so operators would like to move users off the old flavors/hardware (old cell) to new flavors in a new cell.

So cell migrations are kind of the new release upgrade dance. Got it.

> * There is network isolation between compute hosts in different cells, so no ssh'ing the disk around like we do today. But the image service is global to all cells.

> Based on this, for the initial support for cross-cell cold migration, I am proposing that we leverage something like shelve offload/unshelve masquerading as resize. We shelve offload from the source cell and unshelve in the target cell. This should work for both volume-backed and non-volume-backed servers (we use snapshots for shelve-offloaded non-volume-backed servers).
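
For anyone following along who hasn't poked at shelve, here is roughly what those primitives look like from the API side. This is a minimal sketch using python-novaclient: the auth values and INSTANCE_UUID are placeholders, and the actual cross-cell orchestration being proposed would live inside Nova (conductor), not in a client script like this.

    import time

    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://keystone.example.com/identity/v3',  # placeholder
        username='admin', password='secret', project_name='admin',
        user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)
    nova = client.Client('2.1', session=sess)

    server = nova.servers.get('INSTANCE_UUID')  # placeholder

    # Shelve: for a non-volume-backed server this snapshots the root disk
    # to Glance, which (per the requirements above) is global to all cells.
    nova.servers.shelve(server)
    while nova.servers.get(server.id).status not in ('SHELVED',
                                                     'SHELVED_OFFLOADED'):
        time.sleep(5)

    # Offload frees the resources on the source compute host. Deployments
    # with shelved_offload_time=0 do this automatically after the shelve.
    if nova.servers.get(server.id).status == 'SHELVED':
        nova.servers.shelve_offload(server)
    while nova.servers.get(server.id).status != 'SHELVED_OFFLOADED':
        time.sleep(5)

    # Unshelve runs the instance through the scheduler again; the
    # cross-cell proposal would effectively point this at the target cell.
    nova.servers.unshelve(server)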

Shelve was, and continues to be, a hack to let users keep an IPv4 address while not consuming compute resources for some amount of time. [1]

If cross-cell cold migration is similarly just about the user being able to keep their instance's IPv4 address while allowing an admin to move an instance's storage to another physical location, then my firm belief is that this kind of activity needs to be coordinated *externally to Nova*.

Each deployment is going to be different, and in all cases of cross-cell migration, the admins doing these move operations are going to need to understand various network, storage and failure domains that are particular to that deployment (and not something we have the ability to discover in any automated fashion).

Since we're not talking about live migration (thank all that is holy), I believe the safest and most effective way to perform such a cross-cell "migration" would be the following basic steps:

0. Ensure that each compute node is associated with at least one nova host aggregate that is *only* in a single cell.

1. Shut down the instance (optionally snapshotting required local disk changes if the user is unfortunately using their root disk for application data).

2. "Save" the instance's IP address by manually creating a port in Neutron and assigning the IP address manually to that port (see the sketch after this list). This of course will be deployment-dependent, since you will need to hope the saved IP address for the migrating instance is in a subnet range that is available in the target cell.

3. Migrate the volume manually. This will be entirely deployment- and backend-dependent, as smcginnis alluded to in a response to this thread.

4. Have the admin boot the instance in a host aggregate that is known to be in the target cell, passing --nic port-id=$SAVED_PORT_WITH_IP and --volume $MIGRATED_VOLUME_UUID arguments as needed. The admin would need to do this because users don't know about host aggregates and, frankly, the user shouldn't know about host aggregates, cells, or any of this.
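
To make steps 2 and 4 concrete, here is a rough sketch using python-neutronclient and python-novaclient. Everything in it is deployment-dependent and the names are placeholders: the network UUID, the saved IP, the flavor, and a 'cell2-az' availability zone standing in for a host aggregate in the target cell that has been exposed as an AZ (boot requests target AZs, not aggregates, in the API).

    from keystoneauth1 import loading, session
    from neutronclient.v2_0 import client as neutron_client
    from novaclient import client as nova_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://keystone.example.com/identity/v3',  # placeholder
        username='admin', password='secret', project_name='admin',
        user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)
    neutron = neutron_client.Client(session=sess)
    nova = nova_client.Client('2.1', session=sess)

    # Step 2: "save" the instance's IPv4 address by creating a port with
    # that address as a fixed IP. This only works if the subnet range is
    # also available in the target cell.
    port = neutron.create_port({'port': {
        'network_id': 'TARGET_NETWORK_UUID',  # placeholder
        'fixed_ips': [{'ip_address': '10.0.0.42'}],  # the saved IP
    }})['port']

    # Step 3 (migrating the volume) happens out of band and is entirely
    # backend-specific; assume it produced MIGRATED_VOLUME_UUID.

    # Step 4: the admin boots the replacement instance in the target cell,
    # attaching the saved port and booting from the migrated volume.
    nova.servers.create(
        name='migrated-instance',
        image=None,  # boot from volume
        flavor='NEW_FLAVOR_ID',  # placeholder
        availability_zone='cell2-az',  # placeholder AZ for the target cell
        nics=[{'port-id': port['id']}],
        block_device_mapping_v2=[{
            'boot_index': 0,
            'uuid': 'MIGRATED_VOLUME_UUID',  # placeholder
            'source_type': 'volume',
            'destination_type': 'volume',
            'delete_on_termination': False,
        }])

The delete_on_termination=False flag is deliberate: if the boot in the target cell fails, you don't want the painstakingly migrated volume garbage-collected along with the failed instance.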

Best,
-jay

[1] OK, shelve also lets a user keep their instance ID. I don't care much about that.
