On 30 May 2011 13:41, Iustin Pop <[email protected]> wrote:
> On Wed, May 25, 2011 at 06:04:05PM +0200, Michael Hanselmann wrote:
>> What I'm worried about here is that for some reason (e.g. a busy
>> queue) the migration jobs aren't run for a while, meanwhile
>> modifications on the cluster take place which change iallocator's
>> decision and may lead to job errors. There are other places with the
>> same situation, but here it can be avoided. With more changes upcoming
>> on instance migration/failover, locks will be released as soon as
>> possible. Do you think risking job failures for performance benefits
>> is acceptable?
>
> Hard to say. I think this is another case where we don't have clear
> semantics for multi-job operations and what their consistency model will
> be.
>
> That said, in this particular case, I believe the above model is bad due
> to the fact that, by requiring N acquires of cluster-wide locks, it is a
> generator of "busy queues" itself, hence we should try to avoid it.
>
> It's fine for now, but I want to have it fixed sometime in the future,
> so please add a TODO.
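To make the race I'm worried about concrete, here is a minimal, self-contained sketch (hypothetical names and toy state, not Ganeti's actual API): a planner picks a target node from a snapshot of cluster state, another job changes that state before the migration job runs, and the stale placement then fails.

```python
import threading

# Toy cluster state and a lock guarding it (stand-ins; hypothetical names).
cluster = {"node2": {"free_mem": 4096}}
lock = threading.Lock()

def iallocator_pick_target(needed_mem):
    # Decision is based on the state *now*, not when the job executes.
    with lock:
        for node, info in cluster.items():
            if info["free_mem"] >= needed_mem:
                return node
    return None

def other_job_consumes_memory(node, mem):
    # A concurrent job modifying the cluster between planning and execution.
    with lock:
        cluster[node]["free_mem"] -= mem

def migrate(target, needed_mem):
    # The migration job re-checks the placement and fails if it went stale.
    with lock:
        if cluster[target]["free_mem"] < needed_mem:
            raise RuntimeError("stale placement: %s no longer fits" % target)
        cluster[target]["free_mem"] -= needed_mem

target = iallocator_pick_target(2048)     # planner picks node2
other_job_consumes_memory("node2", 3072)  # cluster changes meanwhile
try:
    migrate(target, 2048)                 # job now fails
except RuntimeError as err:
    print(err)
```

The window between `iallocator_pick_target` and `migrate` is exactly the gap a busy queue widens.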
Interdiff:
--- a/lib/cmdlib.py
+++ b/lib/cmdlib.py
@@ -6689,6 +6689,11 @@ class LUNodeMigrate(LogicalUnit):
 for inst in _GetNodePrimaryInstances(self.cfg, self.op.node_name)
 ]
+ # TODO: Run iallocator in this opcode and pass correct placement options to
+ # OpInstanceMigrate. Since other jobs can modify the cluster between
+ # running the iallocator and the actual migration, a good consistency model
+ # will have to be found.
+
 assert (frozenset(self.glm.list_owned(locking.LEVEL_NODE)) ==
 frozenset([self.op.node_name]))
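Iustin's point about N acquisitions of cluster-wide locks generating busy queues can be sketched as follows (hypothetical names; a toy `threading.Lock` stands in for a cluster-wide lock): if every one of N migration jobs must take the same exclusive lock, the jobs serialize, so total queue occupancy is roughly N times the per-job hold time even when the migrations could otherwise overlap.

```python
import threading
import time

cluster_lock = threading.Lock()  # stand-in for a cluster-wide lock
HOLD_TIME = 0.05                 # pretend lock-holding work per job
N_JOBS = 5

def migration_job():
    # Every job contends for the same exclusive lock, so they run one
    # after another regardless of how many worker threads exist.
    with cluster_lock:
        time.sleep(HOLD_TIME)

start = time.time()
threads = [threading.Thread(target=migration_job) for _ in range(N_JOBS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("%d serialized jobs took ~%.2fs (>= %.2fs)"
      % (N_JOBS, elapsed, N_JOBS * HOLD_TIME))
```

Releasing locks as early as possible, or avoiding the cluster-wide acquisition per job, is what keeps `elapsed` from growing linearly in N.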
Michael