Hi,

I'm wondering if we might have a race between live migration and the resource audit. I've added a few people to the recipient list who have worked directly with this code in the past.

In _update_available_resource() we have code that looks like this:

instances = objects.InstanceList.get_by_host_and_node()
self._update_usage_from_instances()
migrations = objects.MigrationList.get_in_progress_by_host_and_node()
self._update_usage_from_migrations()


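To make the ordering concrete, here is a stripped-down toy model of the shape of that audit (my own sketch, not the actual resource tracker code; the dict fields like 'pcpus'/'pci_devs' and the lookup_instance callback are invented for illustration). The point is only that the two result sets are consumed in two separate passes, and the second pass trusts whatever state it sees at that later moment:

    # Toy model of the two-pass accounting described above (not Nova code).
    def account_usage(instances, migrations, lookup_instance):
        usage = {'pcpus': 0, 'pci_devs': 0}
        tracked = set()

        # Pass 1: instances reported as already being on this host/node.
        for inst in instances:
            usage['pcpus'] += inst['pcpus']
            usage['pci_devs'] += inst['pci_devs']
            tracked.add(inst['uuid'])

        # Pass 2: in-progress migrations whose instance was not seen above.
        for mig in migrations:
            if mig['instance_uuid'] in tracked:
                continue
            inst = lookup_instance(mig['instance_uuid'])
            # Only counted if the instance still looks like it is migrating;
            # otherwise it is silently skipped and contributes nothing.
            if inst['task_state'] == 'migrating':
                usage['pcpus'] += inst['pcpus']
                usage['pci_devs'] += inst['pci_devs']

        return usage
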
In post_live_migration_at_destination() we do this (updating the host and node as well as the task state):
            instance.host = self.host
            instance.task_state = None
            instance.node = node_name
            instance.save(expected_task_state=task_states.MIGRATING)


And in _post_live_migration() we update the migration status to "completed":
        if migrate_data and migrate_data.get('migration'):
            migrate_data['migration'].status = 'completed'
            migrate_data['migration'].save()


Neither of these latter two routines is serialized by the COMPUTE_RESOURCE_SEMAPHORE, so they can race against the code in _update_available_resource().

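To illustrate why the semaphore by itself doesn't save us: a lock only serializes the code paths that actually take it. Here is a minimal, self-contained Python sketch of that asymmetry (generic threading code, not Nova's; the names are stand-ins):

    import threading

    COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()   # stand-in for the real one
    audit_ready = threading.Event()
    completion_done = threading.Event()
    shared = {'task_state': 'migrating'}

    def audit():
        # Serialized against other holders of the semaphore...
        with COMPUTE_RESOURCE_SEMAPHORE:
            first = shared['task_state']
            # ...but the completion path below runs right here anyway,
            # because it never acquires the lock.
            audit_ready.set()
            completion_done.wait()
            second = shared['task_state']
            print(first, second)            # prints: migrating None

    def completion_path():
        audit_ready.wait()
        shared['task_state'] = None         # updated without taking the lock
        completion_done.set()

    t1 = threading.Thread(target=audit)
    t2 = threading.Thread(target=completion_path)
    t1.start(); t2.start(); t1.join(); t2.join()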

I'm wondering if we can have a situation like this:

1) migration in progress
2) We start running _update_available_resource() on the destination and call instances = objects.InstanceList.get_by_host_and_node(). This will not return the incoming instance, because it is not yet on the destination host.
3) The migration completes and we call post_live_migration_at_destination(), which sets the host/node/task_state on the instance.
4) Back in _update_available_resource() on the destination, we call migrations = objects.MigrationList.get_in_progress_by_host_and_node(). This will return the migration for the instance in question, but when we run self._update_usage_from_migrations() the uuid will not be in "instances", so we fall back to the instance loaded from the newly-queried migration. We then ignore that instance because it is no longer in a "migrating" state, as sketched below.
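
Here is a toy walk-through of that interleaving, driving the account_usage() sketch from earlier in this mail (again, invented names and numbers, purely illustrative):

    # Step 1: a live migration toward this host is in progress.
    inst = {'uuid': 'abc', 'pcpus': 4, 'pci_devs': 1,
            'host': 'src', 'node': 'src', 'task_state': 'migrating'}
    mig = {'instance_uuid': 'abc', 'status': 'running'}

    # Step 2: the audit on the destination queries its instances first.
    # The incoming instance is still recorded on the source, so it's absent.
    instances_seen_by_audit = []

    # Step 3: post_live_migration_at_destination() lands in the window and
    # flips the instance to the destination with task_state cleared.
    inst['host'] = inst['node'] = 'dest'
    inst['task_state'] = None

    # Step 4: the audit now queries migrations and sees this one, but the
    # instance it points at no longer looks like it is migrating.
    migrations_seen_by_audit = [mig]

    usage = account_usage(instances_seen_by_audit,
                          migrations_seen_by_audit,
                          lambda uuid: inst)
    print(usage)    # {'pcpus': 0, 'pci_devs': 0} -- counted by neither pass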

Am I imagining things, or is there a race here? If so, the negative effect would be that the resources of the migrating instance are "lost", allowing a newly-scheduled instance to claim the same resources (PCI devices, pinned CPUs, etc.).

Chris
