On Wed, May 25, 2011 at 06:04:05PM +0200, Michael Hanselmann wrote:
> On 25 May 2011 at 11:19, Iustin Pop <[email protected]> wrote:
> > On Mon, May 23, 2011 at 06:57:03PM +0200, Michael Hanselmann wrote:
> >> Commit aac4511a added CheckArguments to LUNodeMigrate with a call to
> >> _CheckIAllocatorOrNode. I haven't understood why this was done and can't
> >> see how it would've worked, at least on clusters with default settings.
> >
> > What is not clear about that?
>
> (this is taken care of in commit f8fa4175232)
>
> >> --- a/lib/cmdlib.py
> >> +++ b/lib/cmdlib.py
> >> +  def CheckPrereq(self):
> >> +    pass
> >> +
> >> +  def Exec(self, feedback_fn):
> >> +    # Prepare jobs for migration instances
> >> +    jobs = [
> >> +      [opcodes.OpInstanceMigrate(instance_name=inst.name,
> >> +                                 mode=self.op.mode,
> >> +                                 live=self.op.live,
> >> +                                 iallocator=self.op.iallocator,
> >> +                                 target_node=self.op.target_node)]
> >> +      for inst in _GetNodePrimaryInstances(self.cfg, self.op.node_name)
> >> +      ]
> >> +
> >> +    assert (frozenset(self.glm.list_owned(locking.LEVEL_NODE)) ==
> >> +            frozenset([self.op.node_name]))
> >> +
> >> +    return ResultWithJobs(jobs)
> >
> > Well well. This is completely suboptimal, as in case of iallocator runs
> > and external storage, you will run the iallocator and get the cluster
> > locks N times.
> >
> > The proper model is that this LU (LUNodeMigrate) takes an iallocator,
> > that it runs the iallocator to compute the node strategy, and then
> > submits N instance migrate jobs that use explicit nodes, without
> > iallocator decisions to be done in the individual jobs, thus allowing
> > more parallelism.
>
> What I'm worried about here is that for some reason (e.g. a busy
> queue) the migration jobs aren't run for a while; meanwhile,
> modifications to the cluster take place which change the iallocator's
> decision and may lead to job errors. There are other places with the
> same situation, but here it can be avoided. With more changes upcoming
> on instance migration/failover, locks will be released as soon as
> possible. Do you think risking job failures for performance benefits
> is acceptable?
Hard to say. I think this is another case where we don't have clear
semantics for multi-job operations and what their consistency model will
be.

That said, in this particular case, I believe the above model is bad
because, by requiring N acquisitions of cluster-wide locks, it is itself
a generator of "busy queues", hence we should try to avoid it. It's fine
for now, but I want to have it fixed sometime in the future, so please
add a TODO.

iustin
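For reference, the model Iustin proposes (one iallocator run to pick all
target nodes up front, then N per-instance migrate jobs carrying explicit
targets) can be sketched outside Ganeti. Everything below (Instance,
MigrateJob, plan_node_migration, the allocate_targets callback) is a
hypothetical stand-in for illustration, not a Ganeti API:

```python
# Sketch: decide all targets in ONE allocator invocation, then emit
# migrate jobs with explicit target nodes, so the individual jobs need
# no iallocator call and no cluster-wide locks of their own.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Instance:
    name: str
    primary_node: str

@dataclass(frozen=True)
class MigrateJob:
    instance_name: str
    target_node: str  # explicit node; the job itself makes no placement decision

def plan_node_migration(
    instances: List[Instance],
    node_name: str,
    allocate_targets: Callable[[List[Instance]], Dict[str, str]],
) -> List[MigrateJob]:
    """Run the allocator once for all primaries of node_name, build jobs."""
    to_move = [inst for inst in instances if inst.primary_node == node_name]
    targets = allocate_targets(to_move)  # single allocator run, not N runs
    return [MigrateJob(inst.name, targets[inst.name]) for inst in to_move]
```

The trade-off Michael raises still applies: the targets are computed at
submission time, so a long queue delay can make them stale by the time
the jobs run.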
