Hi,

I don't see any further concern. What should we do with this? It could
either go through the scheduler tree or the timer tree.

Thanks.

On Fri, May 08, 2026 at 03:16:32PM +0200, Frederic Weisbecker wrote:
> Hi,
>
> After the issue reported here:
>
>     https://lore.kernel.org/all/[email protected]/
>
> It turns out that the idle cputime accounting is a big mess that
> accumulates within two concurrent statistics, each having their own
> shortcomings:
>
> * The accounting for online CPUs which is based on the delta between
>   tick_nohz_start_idle() and tick_nohz_stop_idle().
>
>   Pros:
>        - Works when the tick is off
>
>        - Has nsecs granularity
>
>   Cons:
>        - Accounts idle steal time but doesn't subtract it from idle
>          cputime.
>
>        - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
>          the IRQ time is simply ignored when
>          CONFIG_IRQ_TIME_ACCOUNTING=n
>
>        - The windows between 1) idle task scheduling and the first call
>          to tick_nohz_start_idle() and 2) idle task between the last
>          tick_nohz_stop_idle() and the rest of the idle time are
>          blind spots wrt. cputime accounting (though mostly insignificant
>          amount)
>
>        - Relies on private fields outside of kernel stats, with specific
>          accessors.
>
> * The accounting for offline CPUs which is based on ticks and the
>   jiffies delta during which the tick was stopped.
>
>   Pros:
>        - Handles steal time correctly
>
>        - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
>          CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
>
>        - Handles the whole idle task
>
>        - Accounts directly to kernel stats, without midlayer accumulator.
>
>   Cons:
>        - Doesn't elapse when the tick is off, which doesn't make it
>          suitable for online CPUs.
>
>        - Has TICK_NSEC granularity (jiffies)
>
>        - Needs to track the dyntick-idle ticks that were accounted and
>          subtract them from the total jiffies time spent while the tick
>          was stopped. This is an ugly workaround.
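
For readers following along, the granularity gap between the two schemes
listed above can be illustrated with a toy model. This is not kernel code:
the function names are made up for illustration, TICK_NSEC assumes HZ=250,
and steal time and IRQ time are ignored entirely. It only shows that a
nanosecond delta and a jiffies delta taken over the same idle window need
not agree.

```python
# Toy model only, not kernel code. TICK_NSEC assumes HZ=250 (hypothetical).
TICK_NSEC = 4_000_000


def idle_ns_delta(start_ns: int, stop_ns: int) -> int:
    """Delta-style accounting: nanoseconds between idle start and idle stop."""
    return stop_ns - start_ns


def idle_ns_jiffies(start_ns: int, stop_ns: int) -> int:
    """Tick-style accounting: whole jiffies elapsed across the same window."""
    jiffies_elapsed = stop_ns // TICK_NSEC - start_ns // TICK_NSEC
    return jiffies_elapsed * TICK_NSEC


# Hypothetical idle window from 1 ms to 11.5 ms.
start_ns, stop_ns = 1_000_000, 11_500_000
by_delta = idle_ns_delta(start_ns, stop_ns)
by_jiffies = idle_ns_jiffies(start_ns, stop_ns)
print(by_delta, by_jiffies)
```

In this model the two totals differ for the same window, so a reader that
takes its value from one scheme and later from the other has no guarantee
of seeing a monotonic idle time.
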
>
> Having two different accountings for a single context is not the only
> problem: since those accountings are of different natures, it is
> possible to observe the global idle time going backward after a CPU goes
> offline, as reported by Xin Zhao.
>
> Clean up the situation by introducing a hybrid approach that stays
> coherent, fixes the backward jumps and works for both online and offline
> CPUs:
>
> * Tick based or native vtime accounting operates before the tick is
>   stopped and resumes once the tick is restarted.
>
> * When the idle loop starts, switch to dynticks-idle accounting as is
>   done currently, except that the statistics accumulate directly to the
>   relevant kernel stat fields.
>
> * Private dyntick cputime accounting fields are removed.
>
> * Works in both the online and offline cases.
>
> * Move most of the relevant code to the common sched/cputime subsystem
>
> * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
>   dynticks-idle accounting still elapses while on IRQs.
>
> * Correctly subtract idle steal cputime from idle time
>
> Changes since v3 (among which a lot of relevant reviews from Sashiko):
>
> - Add new tags
>
> - Rebase on latest -rc1
>
> - Add "tick/sched: Fix TOCTOU in nohz idle time fetch" (Sashiko)
>
> - Fix buggy state refetch in kcpustat_cpu_fetch_vtime() (Sashiko)
>
> - Fix build issue on powerpc (Christophe Leroy)
>
> - Fix s390 lost steal time occurring on idle IRQs (call vtime_flush() on
>   vtime_account_hardirq() and vtime_account_softirq()) (Sashiko)
>
> - Fix build issue on s390
>
> - Fix uninitialized idle_sleeptime_seq (Sashiko)
>
> - Fix irqtime being disabled or enabled in the middle of an idle IRQ
>   (Sashiko)
>
> - Fix tick restart and then restop in the same idle loop (Sashiko)
>
> - Fix "sched/cputime: Handle idle irqtime gracefully" changelog (Sashiko)
>
> - Fix idle steal time subtracted from the wrong index between idle and
>   iowait kcpustat.
>   (Sashiko)
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/core-v4
>
> HEAD: e64ba052ce04e363ff76d3cb8bedc5f812188acb
>
> Thanks,
> 	Frederic
> ---
>
> Frederic Weisbecker (15):
>   tick/sched: Fix TOCTOU in nohz idle time fetch
>   sched/idle: Handle offlining first in idle loop
>   sched/cputime: Remove superfluous and error prone kcpustat_field()
>     parameter
>   sched/cputime: Correctly support generic vtime idle time
>   powerpc/time: Prepare to stop elapsing in dynticks-idle
>   s390/time: Prepare to stop elapsing in dynticks-idle
>   tick/sched: Unify idle cputime accounting
>   tick/sched: Remove nohz disabled special case in cputime fetch
>   tick/sched: Move dyntick-idle cputime accounting to cputime code
>   tick/sched: Remove unused fields
>   tick/sched: Account tickless idle cputime only when tick is stopped
>   tick/sched: Consolidate idle time fetching APIs
>   sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
>   sched/cputime: Handle idle irqtime gracefully
>   sched/cputime: Handle dyntick-idle steal time correctly
>
>  arch/powerpc/kernel/time.c         |  41 +++++
>  arch/s390/include/asm/idle.h       |   2 +
>  arch/s390/kernel/idle.c            |   5 +-
>  arch/s390/kernel/vtime.c           |  75 ++++++++-
>  drivers/cpufreq/cpufreq.c          |  29 +---
>  drivers/cpufreq/cpufreq_governor.c |   6 +-
>  drivers/macintosh/rack-meter.c     |   2 +-
>  fs/proc/stat.c                     |  40 +----
>  fs/proc/uptime.c                   |   8 +-
>  include/linux/kernel_stat.h        |  76 +++++++--
>  include/linux/tick.h               |   4 -
>  include/linux/vtime.h              |  22 ++-
>  kernel/rcu/tree.c                  |   9 +-
>  kernel/rcu/tree_stall.h            |   7 +-
>  kernel/sched/core.c                |   6 +-
>  kernel/sched/cputime.c             | 308 +++++++++++++++++++++++++++++++------
>  kernel/sched/idle.c                |  13 +-
>  kernel/time/tick-sched.c           | 212 ++++++-------------------
>  kernel/time/tick-sched.h           |  12 --
>  kernel/time/timer_list.c           |   6 +-
>  scripts/gdb/linux/timerlist.py     |   4 -
>  21 files changed, 529 insertions(+), 358 deletions(-)

-- 
Frederic Weisbecker
SUSE Labs
