Hi Patrick,

On Mon, May 14, 2018 at 05:32:06PM +0100, Patrick Bellasi wrote:
> On 12-May 23:25, Joel Fernandes wrote:
> > On Sat, May 12, 2018 at 11:04:43PM -0700, Joel Fernandes wrote:
> > > On Thu, May 10, 2018 at 04:05:53PM +0100, Patrick Bellasi wrote:
> > > > Schedutil updates for FAIR tasks are triggered implicitly each time a
> > > > cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> > > > called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq
> > > > has changed, and by {attach,detach}_entity_load_avg().
> > > >
> > > > This design is based on the idea that "we should callback schedutil
> > > > frequently enough" to properly update the CPU frequency at every
> > > > utilization change. However, such an integration strategy also has
> > > > some downsides:
> > >
> > > I agree that making the call explicit would make schedutil integration
> > > easier, so that's really awesome. However, I also fear that if some
> > > path in the fair class changes the utilization in the future but
> > > forgets to update schedutil explicitly (i.e. forgets to call the
> > > public API), then the schedutil update wouldn't go through. In this
> > > case the previous design of doing the schedutil update in the wrapper
> > > was kind of a nice to have.
>
> I cannot see right now other possible future paths where we can
> actually change the utilization signal without considering that,
> eventually, we should call an existing API to update schedutil if it
> makes sense.
>
> What I see as more likely instead, also because it has already happened
> a couple of times, is that because of code changes in fair.c we end up
> calling (implicitly) schedutil with a wrong utilization value.
>
> Noticing this kind of broken dependency has already proven harder than
> it would be to notice an update of the utilization without a
> corresponding explicit call of the public API.
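Just to make the context concrete for anyone else following the thread: the
implicit path being removed is roughly the hook below (paraphrased from my
reading of current fair.c, so treat it as a sketch rather than the exact
code), which fires as a side effect of every PELT update touching the root
cfs_rq:

static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
{
        struct rq *rq = rq_of(cfs_rq);

        /*
         * Only the root cfs_rq's utilization matters to schedutil, so the
         * callback is filtered here and invoked implicitly from the PELT
         * update, rather than from an explicit call site in the
         * enqueue/dequeue paths.
         */
        if (&rq->cfs == cfs_rq)
                cpufreq_update_util(rq, 0);
}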
Ok, we are in agreement this is a good thing to do :)

> > > > @@ -5397,9 +5366,27 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                  update_cfs_group(se);
> > > >          }
> > > >
> > > > -        if (!se)
> > > > +        /* The task is visible from the root cfs_rq */
> > > > +        if (!se) {
> > > > +                unsigned int flags = 0;
> > > > +
> > > >                  add_nr_running(rq, 1);
> > > >
> > > > +                if (p->in_iowait)
> > > > +                        flags |= SCHED_CPUFREQ_IOWAIT;
> > > > +
> > > > +                /*
> > > > +                 * !last_update_time means we've passed through
> > > > +                 * migrate_task_rq_fair() indicating we migrated.
> > > > +                 *
> > > > +                 * IOW we're enqueueing a task on a new CPU.
> > > > +                 */
> > > > +                if (!p->se.avg.last_update_time)
> > > > +                        flags |= SCHED_CPUFREQ_MIGRATION;
> > > > +
> > > > +                cpufreq_update_util(rq, flags);
> > > > +        }
> > > > +
> > > >          hrtick_update(rq);
> > > >  }
> > > >
> > > > @@ -5456,10 +5443,12 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                  update_cfs_group(se);
> > > >          }
> > > >
> > > > +        /* The task is no more visible from the root cfs_rq */
> > > >          if (!se)
> > > >                  sub_nr_running(rq, 1);
> > > >
> > > >          util_est_dequeue(&rq->cfs, p, task_sleep);
> > > > +        cpufreq_update_util(rq, 0);
> > >
> > > One question about this change: in enqueue, throttle and unthrottle
> > > you are conditionally calling cpufreq_update_util depending on whether
> > > the task was visible/not visible in the hierarchy.
> > >
> > > But in dequeue you're unconditionally calling it, which seems a bit
> > > inconsistent. Is this because of util_est or something? Could you add
> > > a comment here explaining why this is so?
> >
> > The big question I have is: in case se != NULL, then it's still visible
> > at the root RQ level.
>
> My understanding is that you get !se at dequeue time when we are
> dequeuing a task from a throttled RQ. Isn't it?

I don't think so? !se means the RQ is not throttled.

> Thus, this means you are dequeuing a throttled task, I guess for
> example because of a migration.
>
> However, the point is that a task dequeued from a throttled RQ _is
> already_ not visible from the root RQ, because of the sub_nr_running()
> done by throttle_cfs_rq().

Yes, that's what I was wondering: if it's already not visible, why call
schedutil at all? I felt we should call schedutil only if the task is
visible, like you were doing for the other paths.

> > In that case should we still call the util_est_dequeue and the
> > cpufreq_update_util?
>
> I had a better look at the different code paths and I've possibly come
> up with some interesting observations. Let me try to summarize them here.
>
> First of all, we need to distinguish between estimated utilization
> updates and schedutil updates, since they respond to two very
> different goals.

I agree with your assessments below, and about not calling cpufreq when the
CPU is about to idle.
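FWIW, to make my suggestion concrete, what I had in mind for the dequeue
path was something along these lines (purely illustrative and untested,
just mirroring your enqueue hunk, not a proposal for the exact diff):

        /* The task is no more visible from the root cfs_rq */
        if (!se) {
                sub_nr_running(rq, 1);
                /* Only poke schedutil when the root cfs_rq actually changed */
                cpufreq_update_util(rq, 0);
        }

        util_est_dequeue(&rq->cfs, p, task_sleep);

(The util_est_dequeue() call stays where your patch has it; my question was
only about the placement of the cpufreq_update_util() call.)

thanks!

- Joel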