On 23 March 2017 at 00:56, Joel Fernandes <[email protected]> wrote:
> On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi
> <[email protected]> wrote:
>> On 20-Mar 09:26, Vincent Guittot wrote:
>>> On 20 March 2017 at 04:57, Viresh Kumar <[email protected]> wrote:
>>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>>> >> From: Rafael J. Wysocki <[email protected]>
>>> >>
>>> >> The PELT metric used by the schedutil governor underestimates the
>>> >> CPU utilization in some cases. The reason for that may be time spent
>>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>>
>>> Are you sure of the root cause described above (time stolen by the irq
>>> handler), or is it just a hypothesis? It would be good to be sure of
>>> the root cause.
>>> Furthermore, IIRC the time spent in irq context is also accounted as
>>> run time for the running cfs task, but not for running RT and deadline
>>> tasks.
>>
>> As long as the IRQ processing does not generate a context switch,
>> which happens (eventually) if the top half schedules some deferred
>> work to be executed by a bottom half.
>>
>> Thus, I too would say that all the top-half time is accounted in
>> PELT, since the current task is still RUNNABLE/RUNNING.
>
> Sorry if I'm missing something, but doesn't this depend on whether you
> have CONFIG_IRQ_TIME_ACCOUNTING enabled?
>
> __update_load_avg uses rq->clock_task for deltas, which I think
> shouldn't account IRQ time with that config option. So it should be
> quite possible for IRQ time spent to reduce the PELT signal, right?
>
>>
>>> So I'm not really aligned with the description of your problem: the
>>> PELT metric underestimates the load of the CPU.
>>> PELT is just about
>>> tracking CFS task utilization, not whole CPU utilization, and
>>> according to your description of the problem (time stolen by irq),
>>> your problem doesn't come from an underestimation of the CFS tasks
>>> but from time spent in something else that is not accounted for in
>>> the value used by schedutil.
>>
>> Quite likely. Indeed, it can really be that the CFS task is preempted
>> because of some RT activity generated by the IRQ handler.
>>
>> More generally, I've also noticed many suboptimal frequency switches
>> when RT tasks interleave with CFS ones, because of:
>> - relatively long down _and up_ throttling times
>> - the way schedutil's flags are tracked and updated
>> - the callsites from which we call schedutil updates
>>
>> For example, it can really happen that we are running at the highest
>> OPP because of some RT activity. Then we switch back to a relatively
>> low-utilization CFS workload and then:
>> 1. a tick happens which produces a frequency drop
>
> Any idea why this frequency drop would happen? Say a running CFS task
> gets preempted by an RT task; the PELT signal shouldn't drop for the
> duration the CFS task is preempted, because the task is runnable, so
Utilization only tracks the running state, not the runnable state. The
runnable state is tracked in load_avg.

> once the CFS task gets the CPU back, schedutil should still maintain
> the capacity, right?
>
> Regards,
> Joel

