On Fri, Dec 02, 2016 at 02:04:00PM +0000, 'Viktor Bachraty' via ganeti-devel
wrote:
> In Xen migrations, the most common failure is the instance failing to
> freeze, so the migration fails with domains running on both the target
> and the source node. This patch allows migrate --cleanup to recover by
> running AbortMigrate() when the instance is running on both the source
> and the target node.
>
> Signed-off-by: Viktor Bachraty <[email protected]>
Mostly LGTM. See below:
Thanks,
Brian.
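To make the new cleanup case concrete, here is a minimal sketch of the decision migrate --cleanup has to make once it knows which nodes the domain was found on. The helper name `plan_cleanup` and its string results are hypothetical, not Ganeti's API. Only the "running on both nodes, so abort the migration" branch comes from the patch description; the single-node branches are assumptions about the usual cleanup semantics.

```python
from typing import List


def plan_cleanup(running_on: List[str], source: str, target: str) -> str:
    """Hypothetical helper: choose a cleanup action from where the domain runs."""
    on_source = source in running_on
    on_target = target in running_on
    if on_source and on_target:
        # The case this patch handles: the freeze failed and domains exist
        # on both nodes, so the migration should be aborted.
        return "abort-migration"
    if on_target:
        # Assumption: the migration completed and the instance lives on target.
        return "instance-on-target"
    if on_source:
        # Assumption: the migration never completed; the instance stays on source.
        return "instance-on-source"
    # Neither node reports the domain.
    return "not-running"
```

For example, `plan_cleanup(["node1", "node2"], "node1", "node2")` selects the abort path that this patch adds.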
> ---
> lib/cmdlib/instance_migration.py | 99 ++++++++++++++++++++++++++++++----------
> 1 file changed, 74 insertions(+), 25 deletions(-)
>
> + result.Raise("Can't contact node %s" % self.cfg.GetNodeName(node_uuid))
> +
> + # Xen renames the instance during migration; unfortunately we don't have
> + # a nicer way of identifying that it's the same instance. This is an awful,
> + # leaky abstraction.
Could we add a bit more documentation than this, to make life easier for
future Ganeti devs? E.g.:
# xm and xl have different (undocumented) naming conventions
# xm: (in tools/python/xen/xend/XendCheckpoint.py save() & restore())
#                     source dom name       target dom name
# during copy:        migrating-$DOM        $DOM
# finalize migrate:   <none>                $DOM
# finished:           <none>                $DOM
#
# xl: (in tools/libxl/xl_cmdimpl.c migrate_domain() & migrate_receive())
#                     source dom name       target dom name
# during copy:        $DOM                  $DOM--incoming
# finalize migrate:   $DOM--migratedaway    $DOM
# finished:           <none>                $DOM
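A small sketch that encodes the table above as data and derives every name either toolstack may show. It checks that the hard-coded `variants` list in the patch covers exactly those names. `NAMING` and `names_seen` are hypothetical names used only for illustration. The phase data is transcribed from the comment, with `None` standing for `<none>`.

```python
from typing import Dict, Optional, Set, Tuple

# (source name, target name) per migration phase, transcribed from the table.
# "$DOM" is a placeholder for the real domain name; None means "<none>".
NAMING: Dict[str, Dict[str, Tuple[Optional[str], Optional[str]]]] = {
    "xm": {
        "during copy": ("migrating-$DOM", "$DOM"),
        "finalize migrate": (None, "$DOM"),
        "finished": (None, "$DOM"),
    },
    "xl": {
        "during copy": ("$DOM", "$DOM--incoming"),
        "finalize migrate": ("$DOM--migratedaway", "$DOM"),
        "finished": (None, "$DOM"),
    },
}


def names_seen(dom: str) -> Set[str]:
    """Every domain name xm or xl may show for `dom` on either node."""
    seen: Set[str] = set()
    for phases in NAMING.values():
        for pair in phases.values():
            for pattern in pair:
                if pattern is not None:
                    seen.add(pattern.replace("$DOM", dom))
    return seen
```

Running `names_seen("inst1")` yields the same four names as the patch's `variants` list. A unit test along these lines could catch a toolstack naming change going unnoticed.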
> + variants = [
> + name, 'migrating-' + name, name + '--incoming',
> + name + '--migratedaway']
> + node_uuids = [node for node, data in instance_list.items()
> + if any(var in data.payload for var in variants)]
> + self.feedback_fn("* instance running on: %s" % ','.join(
> + self.cfg.GetNodeName(uuid) for uuid in node_uuids))
> + return node_uuids
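A self-contained sketch of the node lookup above. It assumes each node's payload is a plain list of domain names, and it replaces the RPC results with a dict of node name to domain names. `xen_name_variants` and `nodes_running` are hypothetical names, not the patch's code.

```python
from typing import Dict, List


def xen_name_variants(name: str) -> List[str]:
    # Names a domain may carry during xm or xl live migration.
    return [name, "migrating-" + name, name + "--incoming",
            name + "--migratedaway"]


def nodes_running(name: str, domains_by_node: Dict[str, List[str]]) -> List[str]:
    """Nodes reporting `name` under any of its migration-time names."""
    variants = set(xen_name_variants(name))
    # Exact membership, not substring matching, so "inst10" never matches "inst1".
    return [node for node, domains in domains_by_node.items()
            if any(dom in variants for dom in domains)]
```

With `{"node1": ["inst1--migratedaway"], "node2": ["inst1"], "node3": ["inst10"]}`, the lookup returns `["node1", "node2"]`, which is the "running on both nodes" situation that triggers AbortMigrate(). Building a set of variants makes each check a hash lookup. If `data.payload` really is a list of names, the patch's `var in data.payload` is likewise an exact match.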