From: zhouchengming <zhouchengm...@bytedance.com>

When I run a simple "perf bench sched pipe" test in a cgroup on my
machine, "watch -d -n 1 cpu.pressure" on that cgroup reports an avg10
of 10%-20%. That is strange, because there is no other process in the
cgroup. It turned out that this cpu contention/wait percentage came
from outside the cgroup and from cpu idle latency, not from waiting
on other threads inside the cgroup.

So I think adding a PSI_CPU_FULL state will be useful for container
workloads, to distinguish resource contention outside the cgroup from
resource contention inside the cgroup. What's more, the PSI_CPU_FULL
state includes the latency of cpu idle itself, so we can also see the
total latency the system incurs from the cpu idle state.
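With this patch applied, a cgroup's cpu.pressure file gains a "full"
line next to the existing "some" line, in the usual PSI output format
(the numbers below are purely illustrative, not measured):

  some avg10=15.52 avg60=8.12 avg300=2.31 total=12345678
  full avg10=12.04 avg60=5.44 avg300=1.66 total=10123456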
Signed-off-by: zhouchengming <zhouchengm...@bytedance.com>
---
 include/linux/psi_types.h | 3 ++-
 kernel/sched/psi.c        | 9 +++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index b95f3211566a..0a23300d49af 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -50,9 +50,10 @@ enum psi_states {
 	PSI_MEM_SOME,
 	PSI_MEM_FULL,
 	PSI_CPU_SOME,
+	PSI_CPU_FULL,
 	/* Only per-CPU, to weigh the CPU in the global average: */
 	PSI_NONIDLE,
-	NR_PSI_STATES = 6,
+	NR_PSI_STATES = 7,
 };
 
 enum psi_aggregators {
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 967732c0766c..234047e368a5 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -224,7 +224,9 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 	case PSI_MEM_FULL:
 		return tasks[NR_MEMSTALL] && !tasks[NR_RUNNING];
 	case PSI_CPU_SOME:
-		return tasks[NR_RUNNING] > tasks[NR_ONCPU];
+		return tasks[NR_RUNNING] > tasks[NR_ONCPU] && tasks[NR_ONCPU];
+	case PSI_CPU_FULL:
+		return tasks[NR_RUNNING] && !tasks[NR_ONCPU];
 	case PSI_NONIDLE:
 		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
 			tasks[NR_RUNNING];
@@ -681,6 +683,9 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,
 	if (groupc->state_mask & (1 << PSI_CPU_SOME))
 		groupc->times[PSI_CPU_SOME] += delta;
 
+	if (groupc->state_mask & (1 << PSI_CPU_FULL))
+		groupc->times[PSI_CPU_FULL] += delta;
+
 	if (groupc->state_mask & (1 << PSI_NONIDLE))
 		groupc->times[PSI_NONIDLE] += delta;
 }
@@ -1018,7 +1023,7 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 		group->avg_next_update = update_averages(group, now);
 	mutex_unlock(&group->avgs_lock);
 
-	for (full = 0; full < 2 - (res == PSI_CPU); full++) {
+	for (full = 0; full < 2; full++) {
 		unsigned long avg[3];
 		u64 total;
 		int w;
-- 
2.11.0