Re: [PATCH v8 00/16] Add utilization clamping support

2019-05-09 Thread Patrick Bellasi
On 09-May 15:02, Peter Zijlstra wrote:
> On Tue, Apr 02, 2019 at 11:41:36AM +0100, Patrick Bellasi wrote:
> > Series Organization
> > ===================
> > 
> > The series is organized into these main sections:
> > 
> >  - Patches [01-07]: Per task (primary) API
> >  - Patches [08-09]: Schedutil integration for FAIR and RT tasks
> >  - Patches [10-11]: Integration with EAS's energy_compute()
> 
> Aside from the comments already provided, I think this is starting to
> look really good.

Thanks Peter for the very useful review...
 
> Thanks!
> 
> >  - Patches [12-16]: Per task group (secondary) API
> 
> I still have to stare at these, but maybe a little later...

... I'll soon post a v9 to factor in all the last comments from this
round so that you have a better base for when you wanna start looking
at the cgroup bits.

-- 
#include 

Patrick Bellasi


Re: [PATCH v8 00/16] Add utilization clamping support

2019-05-09 Thread Peter Zijlstra
On Tue, Apr 02, 2019 at 11:41:36AM +0100, Patrick Bellasi wrote:
> Series Organization
> ===================
> 
> The series is organized into these main sections:
> 
>  - Patches [01-07]: Per task (primary) API
>  - Patches [08-09]: Schedutil integration for FAIR and RT tasks
>  - Patches [10-11]: Integration with EAS's energy_compute()

Aside from the comments already provided, I think this is starting to
look really good.

Thanks!

>  - Patches [12-16]: Per task group (secondary) API

I still have to stare at these, but maybe a little later...


[PATCH v8 00/16] Add utilization clamping support

2019-04-02 Thread Patrick Bellasi
Hi all, this is a respin of:

   https://lore.kernel.org/lkml/20190208100554.32196-1-patrick.bell...@arm.com/

which includes the following main changes:

 - remove "bucket local boosting" code and move it into a dedicated patch
 - refactor uclamp_rq_update() to make it cleaner
 - s/uclamp_rq_update/uclamp_rq_max_value/ and move update into caller
 - update changelog to clarify the configuration fitting in one cache line
 - s/uclamp_bucket_value/uclamp_bucket_base_value/
 - update UCLAMP_BUCKET_DELTA to use DIV_ROUND_CLOSEST()
 - moved flag reset into uclamp_rq_inc()
 - add "requested" values uclamp_se instance beside the existing "effective"
   values instance
 - rename uclamp_effective_{get,assign}() into uclamp_eff_{get,set}()
 - make uclamp_eff_get() return the new "effective" values by copy
 - run uclamp_fork() code independently from the class being supported
 - add sysctl_sched_uclamp_handler()'s internal mutex to serialize concurrent
   usages
 - make schedutil_type visible on !CONFIG_CPU_FREQ_GOV_SCHEDUTIL
 - drop optional renamings
 - keep using unsigned long for utilization
 - update first cgroup patch's changelog to make it more clear

Thanks for all the valuable comments, almost there... :?

Cheers Patrick


Series Organization
===================

The series is organized into these main sections:

 - Patches [01-07]: Per task (primary) API
 - Patches [08-09]: Schedutil integration for FAIR and RT tasks
 - Patches [10-11]: Integration with EAS's energy_compute()
 - Patches [12-16]: Per task group (secondary) API

It is based on today's tip/sched/core and the full tree is available here:

   git://linux-arm.org/linux-pb.git   lkml/utilclamp_v8
   http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v8


Newcomer's Short Abstract
=========================

The Linux scheduler tracks a "utilization" signal for each scheduling entity
(SE), e.g. tasks, to know how much CPU time they use. This signal allows the
scheduler to know how "big" a task is and, in principle, it can support
advanced task placement strategies by selecting the best CPU to run a task.
Some of these strategies are represented by the Energy Aware Scheduler [3].

When the schedutil cpufreq governor is in use, the utilization signal allows
the Linux scheduler to also drive frequency selection. The CPU utilization
signal, which represents the aggregated utilization of tasks scheduled on that
CPU, is used to select the frequency which best fits the workload generated by
the tasks.

The current translation of utilization values into a frequency selection is
simple: we go to the maximum frequency for RT tasks, or to the minimum
frequency which can accommodate the utilization of DL+FAIR tasks.
However, utilization values by themselves cannot convey the desired
power/performance behaviours of each task as intended by user-space.
As such they are not ideally suited for task placement decisions.
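
As a rough illustration of the translation described above, here is a
minimal sketch. It is not the schedutil code: pick_freq(), the frequency
table and the rt_runnable flag are hypothetical names introduced for this
example, and the sketch assumes utilization is expressed on a 0..1024 scale
and that the table is sorted in ascending order.

```c
#include <stddef.h>

/* Assumed full-scale utilization value for this sketch. */
#define CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: return the frequency (in kHz) to request.
 * freqs[] must be sorted in ascending order and contain nr_freqs > 0 entries.
 */
static unsigned long pick_freq(const unsigned long *freqs, size_t nr_freqs,
                               unsigned long util, int rt_runnable)
{
    unsigned long max_freq = freqs[nr_freqs - 1];
    unsigned long target;
    size_t i;

    /* RT tasks: always go to the maximum frequency. */
    if (rt_runnable)
        return max_freq;

    /* DL+FAIR tasks: scale the maximum frequency by the utilization. */
    if (util > CAPACITY_SCALE)
        util = CAPACITY_SCALE;
    target = max_freq * util / CAPACITY_SCALE;

    /* Pick the lowest available frequency that accommodates the target. */
    for (i = 0; i < nr_freqs; i++) {
        if (freqs[i] >= target)
            return freqs[i];
    }

    return max_freq;
}
```

Note that nothing in this selection reflects what user-space wants for a
given task: two tasks with the same utilization always yield the same
frequency request.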

Task placement and frequency selection policies in the kernel can be improved
by taking into consideration hints coming from authorised user-space elements,
such as the Android middleware or, more generally, any "System Management
Software" (SMS) framework.

Utilization clamping is a mechanism which makes it possible to "clamp" (i.e.
filter) the utilization generated by RT and FAIR tasks within a range defined
by user-space. The clamped utilization value can then be used, for example, to
enforce a minimum and/or maximum frequency depending on which tasks are active
on a CPU.
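
The clamping operation itself can be sketched as follows. This is only an
illustration of the idea, not code from the series: clamp_util() and its
parameter names are hypothetical, and the sketch assumes
util_min <= util_max.

```c
/*
 * Illustrative sketch only: restrict a utilization value to the range
 * [util_min, util_max] requested by user-space.
 * Assumes util_min <= util_max.
 */
static unsigned long clamp_util(unsigned long util,
                                unsigned long util_min,
                                unsigned long util_max)
{
    if (util < util_min)
        return util_min;    /* raise a small utilization to the minimum */
    if (util > util_max)
        return util_max;    /* cap a large utilization at the maximum */
    return util;            /* already within the requested range */
}
```

With these assumptions, a task with utilization 50 and a minimum clamp of 512
is treated as if its utilization were 512, while a task with utilization 900
and a maximum clamp of 200 is treated as if its utilization were 200.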

The main use-cases for utilization clamping are:

 - boosting: better interactive response for small tasks which
   are affecting the user experience.

   Consider for example the case of a small control thread for an external
   accelerator (e.g. GPU, DSP, other devices). Here, from the task utilization
   the scheduler does not have a complete view of what the task's requirements
   are and, if it's a small utilization task, it keeps selecting a more energy
   efficient CPU, with smaller capacity and lower frequency, thus negatively
   impacting the overall time required to complete task activations.

 - capping: increase energy efficiency for background tasks not affecting the
   user experience.

   Since running on a lower capacity CPU at a lower frequency is more energy
   efficient, when the completion time is not a main goal, then capping the
   utilization considered for certain (maybe big) tasks can have positive
   effects, both on energy consumption and thermal headroom.
   This feature also allows RT tasks to be made more energy friendly on
   mobile systems where running them on high capacity CPUs and at the maximum
   frequency is not required.

From these two use-cases, it's worth noticing that frequency selection
biasing, introduced by patches 9 and 10 of this series, is just one possible
usage of utilization clamping. Another compelling extension of utilization
clamping is in helping the scheduler make task placement decisions.

Utilization is (