On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote:
> > > IOW, since task groups include blocked load in the load_avg_contrib (see
> > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the
> > > imbalance includes blocked load and hence env->imbalance >=
> > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to
> > > detach_tasks() emptying the rq completely in the reported scenario where
> > > blocked load > runnable load.

So IIRC we need the blocked load for groups for computing the per-cpu
slices of the total weight, avg works really well for that.

> I'm not against having a policy that sits somewhere in between, we just
> have to agree it is the right policy and clean up the load-balance code
> such that the implemented policy is clear.

Right, for balancing it's a tricky question, but mixing them without
intent is, as you say, a bit of a mess.

So clearly blocked load doesn't make sense for (new)idle balancing. OTOH
it does make some sense for the regular periodic balancing, because
there we really do care mostly about the averages, esp. so when we're
overloaded -- but there are issues there too.

Now we can't track them both (or rather we could, but overhead).

I like Yuyang's load tracking rewrite, but it changes exactly this part,
and I'm not sure I understand the full ramifications of that yet.

One way out would be to split the load balancer into 3 distinct regions;

 1) get a task on every CPU, screw everything else.
 2) get each CPU fully utilized, still ignoring 'load'
 3) when everybody is fully utilized, consider load.

If we make find_busiest_foo() select one of these 3, and make
calculate_imbalance() invariant to the metric passed in, and have things
like cpu_load() and task_load() return different, but coherent, numbers
depending on which region we're in, this almost sounds 'simple'.

The devil is in the details, and the balancer is a hairy nest of details
which will make the above non-trivial.

But for 1) we could simply 'balance' on nr_running, for 2) we can
'balance' on runnable_avg and for 3) we'll 'balance' on load_avg (which
will then include blocked load).
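
To make the three-region idea a bit more concrete, here is a rough
userspace sketch, not kernel code. Everything in it is hypothetical:
the names (balance_region, cpu_stat, pick_region(), cpu_metric(),
imbalance_between()), the 1024 "fully utilized" threshold, and the
"move half the difference" rule are assumptions for illustration only.
It just shows one region being picked and the same imbalance
computation then running on whichever metric that region selects.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold at which a CPU counts as fully utilized. */
#define CPU_CAPACITY 1024UL

/* Hypothetical names; none of this is existing kernel API. */
enum balance_region {
	REGION_SPREAD,	/* 1) get a task on every CPU */
	REGION_UTILIZE,	/* 2) get each CPU fully utilized, ignore load */
	REGION_LOAD,	/* 3) everybody fully utilized, consider load */
};

struct cpu_stat {
	unsigned int nr_running;	/* runnable tasks on this CPU */
	unsigned long runnable_avg;	/* utilization, 0..CPU_CAPACITY */
	unsigned long load_avg;		/* weighted load, incl. blocked load */
};

/* Pick the region for a set of CPUs (assumed selection policy). */
static enum balance_region
pick_region(const struct cpu_stat *cpus, size_t nr_cpus)
{
	int has_idle = 0, has_stacked = 0, has_spare = 0;
	size_t i;

	assert(cpus != NULL && nr_cpus > 0);

	for (i = 0; i < nr_cpus; i++) {
		if (cpus[i].nr_running == 0)
			has_idle = 1;
		if (cpus[i].nr_running > 1)
			has_stacked = 1;
		if (cpus[i].runnable_avg < CPU_CAPACITY)
			has_spare = 1;
	}

	/* An idle CPU next to a CPU with more than one task: spread. */
	if (has_idle && has_stacked)
		return REGION_SPREAD;
	/* Some CPU still has spare capacity: balance on utilization. */
	if (has_spare)
		return REGION_UTILIZE;
	return REGION_LOAD;
}

/* Return the per-CPU number that is meaningful in the given region. */
static unsigned long
cpu_metric(enum balance_region region, const struct cpu_stat *cpu)
{
	switch (region) {
	case REGION_SPREAD:
		return cpu->nr_running;
	case REGION_UTILIZE:
		return cpu->runnable_avg;
	case REGION_LOAD:
	default:
		return cpu->load_avg;
	}
}

/* Metric-agnostic imbalance: move half of the difference, if any. */
static unsigned long
imbalance_between(unsigned long busiest, unsigned long local)
{
	return busiest > local ? (busiest - local) / 2 : 0;
}
```

The point of the sketch is only that imbalance_between() never needs to
know which metric it was handed; whether such a clean split survives
contact with the real balancer is exactly the open question.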

Let me go play outside for a bit so that it can sink in what kind of
nonsense my heat addled brain has just sprouted :-)