On Wed, Jul 09, 2014 at 05:11:20PM +0530, Preeti U Murthy wrote:
> On 07/09/2014 04:13 PM, Peter Zijlstra wrote:
> > On Wed, Jul 09, 2014 at 09:24:54AM +0530, Preeti U Murthy wrote:
> >> In the example that I mention above, t1 and t2 are on the rq of cpu0;
> >> while t1 is running on cpu0, t2 is on the rq but does not have cpu1 in
> >> its cpus allowed mask. So during load balance, cpu1 tries to pull t2,
> >> cannot do so, and hence the LBF_ALL_PINNED flag is set and it jumps to
> >> out_balanced. Note that there are only two sched groups at this level
> >> of sched domain: one with cpu0 and the other with cpu1. In this
> >> scenario we do not try to do active load balancing, at least that's
> >> what the code does now if the LBF_ALL_PINNED flag is set.
> >
> > I think Vince is right in saying that in this scenario ALL_PINNED won't
> > be set. move_tasks() will iterate cfs_rq::cfs_tasks; that list will
> > also include the current running task.
>
> Hmm.. really? Because while dequeueing a task from the rq so as to
> schedule it on a cpu, we delete its entry from the list of cfs_tasks on
> the rq.
>
> list_del_init(&se->group_node) in account_entity_dequeue() does that.
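
Right, it does. For reference, this is roughly the relevant code in
kernel/sched/fair.c of this vintage (abridged, not verbatim; the inline
comment is mine):

static void
account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	update_load_sub(&cfs_rq->load, se->load.weight);
	if (entity_is_task(se))
		list_del_init(&se->group_node); /* off the rq's cfs_tasks */
	cfs_rq->nr_running--;
}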
But set_next_entity() doesn't call account_entity_dequeue(), only
__dequeue_entity() to take it out of the rb-tree.

> > And can_migrate_task() only checks for current after the pinning bits.
> >
> >> Continuing with the above explanation; when the LBF_ALL_PINNED flag
> >> is set and we jump to out_balanced, we clear the imbalance flag for
> >> the sched_group comprising cpu0 and cpu1, although there is actually
> >> an imbalance. t2 could still be migrated to, say, cpu2/cpu3 (t2 has
> >> them in its cpus allowed mask) in another sched group when load
> >> balancing is done at the next sched domain level.
> >
> > And this is where Vince is wrong; note how
> > update_sg_lb_stats()/sg_imbalanced() uses group->sgc->imbalance, but
> > load_balance() sets sd_parent->groups->sgc->imbalance, so explicitly
> > one level up.
>
> One level up? The group->sgc->imbalance flag is checked during
> update_sg_lb_stats(). This flag is *set during the load balancing at a
> lower level sched domain*. IOW, when the 'group' formed the sched
> domain.

sd_parent is one level up.

> > So what we can do I suppose is clear 'group->sgc->imbalance' at
> > out_balanced.
>
> You mean 'set'? If we clear it we will have no clue about imbalances
> at lower level sched domains due to pinning, specifically in the
> LBF_ALL_PINNED case. This might prevent us from balancing out these
> tasks to other groups at a higher level domain.
> update_sd_pick_busiest() specifically relies on this flag to choose
> the busiest group.

No, clear, in load_balance(). So set one level up, clear the current
level.
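
To spell out the set/read pairing (again abridged from
kernel/sched/fair.c; the surrounding logic in load_balance() is
trimmed, comments mine):

/*
 * Read side: update_sg_lb_stats() consults the flag on the groups of
 * the domain it is currently balancing.
 */
static inline int sg_imbalanced(struct sched_group *group)
{
	return group->sgc->imbalance;
}

/*
 * Write side, in load_balance(): the flag is stored on the parent
 * domain's groups, i.e. one level up from where the pull failed.
 */
if (sd_parent) {
	int *group_imbalance = &sd_parent->groups->sgc->imbalance;

	if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
		*group_imbalance = 1;
}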
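
And the clear would then sit in the out_balanced path, something along
these lines; a completely untested sketch, clearing the same sd_parent
flag that the set path above writes:

out_balanced:
	/*
	 * We reached balance at this level, so whatever imbalance a
	 * lower level flagged for us is stale; drop it.
	 */
	if (sd_parent) {
		int *group_imbalance = &sd_parent->groups->sgc->imbalance;

		if (*group_imbalance)
			*group_imbalance = 0;
	}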

