Am 25. Mai 2011 11:19 schrieb Iustin Pop <[email protected]>:
> On Mon, May 23, 2011 at 06:57:03PM +0200, Michael Hanselmann wrote:
>> Commit aac4511a added CheckArguments to LUNodeMigrate with a call to
>> _CheckIAllocatorOrNode. I haven't understood why this was done and can't
>> see how it would've worked, at least on clusters with default settings.
>
> What is not clear about that?

(this taken care of in commit f8fa4175232)

>> --- a/lib/cmdlib.py
>> +++ b/lib/cmdlib.py
>> +  def CheckPrereq(self):
>> +    pass
>> +
>> +  def Exec(self, feedback_fn):
>> +    # Prepare jobs for migration instances
>> +    jobs = [
>> +      [opcodes.OpInstanceMigrate(instance_name=inst.name,
>> +                                 mode=self.op.mode,
>> +                                 live=self.op.live,
>> +                                 iallocator=self.op.iallocator,
>> +                                 target_node=self.op.target_node)]
>> +      for inst in _GetNodePrimaryInstances(self.cfg, self.op.node_name)
>> +      ]
>> +
>> +    assert (frozenset(self.glm.list_owned(locking.LEVEL_NODE)) ==
>> +            frozenset([self.op.node_name]))
>> +
>> +    return ResultWithJobs(jobs)
>
> Well well. This is completely suboptimal, as in case of iallocator runs
> and external storage, you will run the iallocator and get the cluster
> locks N times.
>
> The proper model is that this LU (LUNodeMigrate) takes an iallocator,
> that it runs the iallocator to compute the node strategy, and then
> submits N instance migrate jobs that use explicit nodes, without
> iallocator decisions to be done in the individual jobs, thus allowing
> more parallelism.

What I'm worried about here is that for some reason (e.g. a busy
queue) the migration jobs aren't run for a while, meanwhile
modifications on the cluster take place which change iallocator's
decision and may lead to job errors. There are other places with the
same situation, but here it can be avoided. With more changes upcoming
on instance migration/failover locks will be released as soon as
possible. Do you think risking job failures for performance benefits
is acceptable?

Michael

Reply via email to