On Wed, Jun 22, 2016 at 10:25:47PM -0400, r...@redhat.com wrote:
> From: Rik van Riel <r...@redhat.com>
> 
> Currently, if there was any irq or softirq time during 'ticks'
> jiffies, the entire period will be accounted as irq or softirq
> time.
> 
> This is inaccurate if only a subset of 'ticks' jiffies was
> actually spent handling irqs, and could conceivably mis-count
> all of the ticks during a period as irq time, when there was
> some irq and some softirq time.

Good catch!

Many comments following.

> 
> This can actually happen when irqtime_account_process_tick
> is called from account_idle_ticks, which can pass a larger
> number of ticks down all at once.
> 
> Fix this by changing irqtime_account_hi_update and
> irqtime_account_si_update to round elapsed irq and softirq
> time to jiffies, and return the number of jiffies spent in
> each mode, similar to how steal time is handled.
> 
> Additionally, have irqtime_account_process_tick take into
> account how much time was spent in each of steal, irq,
> and softirq time.
> 
> The latter could help improve the accuracy of timekeeping

Maybe you meant cputime? Timekeeping is rather about jiffies and GTOD.

> when returning from idle on a NO_HZ_IDLE CPU.
> 
> Properly accounting how much time was spent in hardirq and
> softirq time will also allow the NO_HZ_FULL code to re-use
> these same functions for hardirq and softirq accounting.
> 
> Signed-off-by: Rik van Riel <r...@redhat.com>
> ---
>  kernel/sched/cputime.c | 81 
> +++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 3d60e5d76fdb..15b813c014be 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -79,40 +79,54 @@ void irqtime_account_irq(struct task_struct *curr)
>  }
>  EXPORT_SYMBOL_GPL(irqtime_account_irq);
>  
> -static int irqtime_account_hi_update(void)
> +static unsigned long irqtime_account_hi_update(unsigned long max_jiffies)
>  {
>       u64 *cpustat = kcpustat_this_cpu->cpustat;
> +     unsigned long irq_jiffies = 0;
>       unsigned long flags;
> -     u64 latest_ns;
> -     int ret = 0;
> +     u64 irq;

Should be cputime_t ? And perhaps renamed to irq_cputime to distinguish
clearly against irq_jiffies?

>  
>       local_irq_save(flags);
> -     latest_ns = this_cpu_read(cpu_hardirq_time);
> -     if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> -             ret = 1;
> +     irq = this_cpu_read(cpu_hardirq_time) - cpustat[CPUTIME_IRQ];

cpu_hardirq_time is made of nsecs whereas cpustat is of cputime_t (in fact
even cputime64_t). So you need to convert cpu_hardirq_time before doing the
substract.

> +     if (irq > cputime_one_jiffy) {
> +             irq_jiffies = min(max_jiffies, cputime_to_jiffies(irq));
> +             cpustat[CPUTIME_IRQ] += jiffies_to_cputime(irq_jiffies);
> +     }
>       local_irq_restore(flags);
> -     return ret;
> +     return irq_jiffies;

I think we shouldn't use jiffies at all here and only rely on cputime_t.
max_jiffies should be of cputime_t and this function should return the 
cputime_t.

The resulting code should be more simple and in fact more precise (more below 
on that).

>  }
>  
> -static int irqtime_account_si_update(void)
> +static unsigned long irqtime_account_si_update(unsigned long max_jiffies)
>  {
>       u64 *cpustat = kcpustat_this_cpu->cpustat;
> +     unsigned long si_jiffies = 0;
>       unsigned long flags;
> -     u64 latest_ns;
> -     int ret = 0;
> +     u64 softirq;
>  
>       local_irq_save(flags);
> -     latest_ns = this_cpu_read(cpu_softirq_time);
> -     if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
> -             ret = 1;
> +     softirq = this_cpu_read(cpu_softirq_time) - cpustat[CPUTIME_SOFTIRQ];
> +     if (softirq > cputime_one_jiffy) {
> +             si_jiffies = min(max_jiffies, cputime_to_jiffies(softirq));
> +             cpustat[CPUTIME_SOFTIRQ] += jiffies_to_cputime(si_jiffies);
> +     }
>       local_irq_restore(flags);
> -     return ret;
> +     return si_jiffies;

So same comments apply here.

[...]
>   * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
>   * tasks (sum on group iteration) belonging to @tsk's group.
>   */
> @@ -344,19 +378,24 @@ static void irqtime_account_process_tick(struct 
> task_struct *p, int user_tick,
>  {
>       cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
>       u64 cputime = (__force u64) cputime_one_jiffy;
> -     u64 *cpustat = kcpustat_this_cpu->cpustat;
> +     unsigned long other;
>  
> -     if (steal_account_process_tick(ULONG_MAX))
> +     /*
> +      * When returning from idle, many ticks can get accounted at
> +      * once, including some ticks of steal, irq, and softirq time.
> +      * Subtract those ticks from the amount of time accounted to
> +      * idle, or potentially user or system time. Due to rounding,
> +      * other time can exceed ticks occasionally.
> +      */
> +     other = account_other_ticks(ticks);
> +     if (other >= ticks)
>               return;
> +     ticks -= other;
>  
>       cputime *= ticks;
>       scaled *= ticks;

So instead of dealing with ticks here, I think you should rather use the above
cputime as both the limit and the remaining time to account after steal/irqs.

This should avoid some middle conversions and improve precision when cputime_t 
== nsecs granularity.

If we account 2 ticks to idle (lets say HZ=100) and irq time to account is 15 
ms. 2 ticks = 20 ms
so we have 5 ms left to account to idle. With the jiffies granularity in this 
patch, we would account
one tick to irqtime (1 tick = 10 ms, there will be 5 ms to account back later) 
and one tick to idle
time whereas if you deal with cputime_t, you are going to account the correct 
amount of idle time.

Beyond the scope of this patchset, we should probably kill cputime_t and 
account everything to nsecs.
I have a patchset that did that 2 years ago, I should probably re-iterate it.

Thanks.

Reply via email to