It's a weird question, so I'll explain.

An issue came up in IRC today: someone was trying to live migrate an instance to a specified host, and the RetryFilter in the scheduler was rejecting that host, even though other, similar instances were live migrating to it successfully.

After some DB debugging, we figured out that the instance that failed to live migrate had a persisted request spec that listed the specified host as an originally attempted host during the initial instance create. The RetryFilter was tripping up on this during live migration, saying, essentially, "you've already tried that host, sorry".
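
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of check involved. It is not the nova code: RetryInfo, RequestSpec and retry_filter_passes are hypothetical stand-ins, and the host and node names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RetryInfo:
    # Hypothetical stand-in for the retry data persisted with a request spec.
    # Each entry is a (host, node) pair that was already attempted.
    hosts: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class RequestSpec:
    # Hypothetical, heavily simplified request spec.
    retry: Optional[RetryInfo] = None


def retry_filter_passes(host: str, node: str, spec: RequestSpec) -> bool:
    """Return False if (host, node) is recorded as already attempted."""
    if spec.retry is None:
        # No retry history at all, so nothing to exclude.
        return True
    return (host, node) not in spec.retry.hosts


# Spec persisted at create time: the build was first attempted on compute1.
stale_spec = RequestSpec(retry=RetryInfo(hosts=[("compute1", "node1")]))
# Spec of a similar instance that built on its first attempt.
clean_spec = RequestSpec()

print(retry_filter_passes("compute1", "node1", stale_spec))  # False
print(retry_filter_passes("compute1", "node1", clean_spec))  # True
```

The same destination passes or fails depending only on what was recorded at create time, which matches what we saw: similar instances could move to the host, this one could not.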

This was confusing because the live migration task in conductor handles retries itself if pre-migration checks fail on the selected destination host. This is why we have the "migrate_max_retries" config option.
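
For anyone not familiar with that retry handling, the idea can be sketched roughly as below. This is a simplified model, not the conductor task itself: find_destination, select_host, pre_check and the two exception classes are hypothetical names, and treating -1 as "no limit" is my assumption about how "migrate_max_retries" behaves.

```python
from typing import Callable, List, Optional


class MaxRetriesExceeded(Exception):
    """Raised when the retry budget is used up (hypothetical name)."""


class NoValidHost(Exception):
    """Raised when the scheduler has no more candidates (hypothetical name)."""


def find_destination(
    source: str,
    select_host: Callable[[List[str]], Optional[str]],
    pre_check: Callable[[str], bool],
    max_retries: int,
) -> str:
    """Pick a destination, retrying when pre-migration checks fail."""
    attempted = [source]  # never schedule back onto the source host
    while True:
        # Retries so far = hosts tried in addition to the source.
        retries = len(attempted) - 1
        if max_retries != -1 and retries > max_retries:
            raise MaxRetriesExceeded(f"gave up after {retries} attempts")
        host = select_host(attempted)  # asked to ignore attempted hosts
        if host is None:
            raise NoValidHost("no candidate hosts left")
        if pre_check(host):
            return host
        attempted.append(host)  # checks failed, exclude it and try again


# Toy scheduler: first candidate not yet attempted; only "c" passes checks.
candidates = ["a", "b", "c"]


def pick(ignored: List[str]) -> Optional[str]:
    return next((h for h in candidates if h not in ignored), None)


print(find_destination("src", pick, lambda h: h == "c", max_retries=-1))  # c
```

The point is that this loop tracks its own list of attempted hosts for the current operation, so it does not need the retry history left over from the original build.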

The actual fix for this is trivial:

https://review.openstack.org/#/c/505771/

I wanted to bring it up here in case anyone has a good reason why we should continue to exclude originally failed hosts during live migration, even if the admin is specifying one of those hosts as the live migration destination.

Presumably there was a good reason why the instance failed to build on a host originally, but that could be for any number of reasons: resource claim failed during a race, configuration issues, etc. Since we don't really know what originally happened, it seems reasonable to not exclude originally attempted build targets since the scheduler filters should still validate them during live migration (this is all assuming you're not using the 'force' flag with live migration - and if you are, all bets are off).

If people agree with doing this fix, then we also have to consider making a similar fix for other move operations like cold migrate, evacuate and unshelve. However, out of those other move operations, only cold migrate attempts any retries. If evacuate or unshelve fail on the target host, there is no retry.

--

Thanks,

Matt

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
