On Thu, 2017-06-29 at 19:15 +0200, Frederic Weisbecker wrote:
> From: Wanpeng Li <kernel...@gmail.com>
>
> Currently the cputime source used by vtime is jiffies. When we cross
> a context boundary and jiffies have changed since the last snapshot,
> the pending cputime is accounted to the switching out context.
>
> This system works ok if the ticks are not aligned across CPUs. If they
> instead are aligned (ie: all fire at the same time) and the CPUs run in
> userspace, the jiffies change is only observed on tick exit and
> therefore the user cputime is accounted as system cputime. This is
> because the CPU that maintains timekeeping fires its tick at the same
> time as the others. It updates jiffies in the middle of the tick and
> the other CPUs see that update on IRQ exit:
>
>     CPU 0 (timekeeper)                  CPU 1
>     -------------------              -------------
>                   jiffies = N
>     ...                              run in userspace for a jiffy
>     tick entry                       tick entry (sees jiffies = N)
>     set jiffies = N + 1
>     tick exit                        tick exit (sees jiffies = N + 1)
>                                                 account 1 jiffy as stime
>
> Fix this with using a nanosec clock source instead of jiffies. The
> cputime is then accumulated and flushed everytime the pending delta
> reaches a jiffy in order to mitigate the accounting overhead.
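
For anyone following along, here is a minimal userspace sketch of the accumulate-and-flush idea described in the changelog above. It is a model under stated assumptions, not the code from the patch: the names vtime_sketch, SKETCH_HZ and SKETCH_TICK_NSEC are hypothetical, HZ=250 is an assumed value, and the caller passes in nanosecond timestamps in place of a real clock.

```c
#include <stdint.h>

/* Assumed tick rate for illustration only; real kernels configure HZ. */
#define SKETCH_HZ        250ULL
/* Nanoseconds in one jiffy at the assumed rate (4,000,000 ns). */
#define SKETCH_TICK_NSEC (1000000000ULL / SKETCH_HZ)

/* Hypothetical model of per-task accounting state, not struct vtime. */
struct vtime_sketch {
	uint64_t starttime; /* ns timestamp of the last snapshot */
	uint64_t pending;   /* ns accumulated but not yet flushed */
	uint64_t flushed;   /* ns already accounted to the task */
	unsigned int flushes; /* how many times the costly flush ran */
};

static void vtime_sketch_init(struct vtime_sketch *vt, uint64_t now_ns)
{
	vt->starttime = now_ns;
	vt->pending = 0;
	vt->flushed = 0;
	vt->flushes = 0;
}

/*
 * Called at a context boundary. The delta comes from a nanosecond
 * timestamp, so it does not depend on when another CPU updates jiffies.
 */
static void vtime_sketch_account(struct vtime_sketch *vt, uint64_t now_ns)
{
	if (now_ns > vt->starttime) {
		vt->pending += now_ns - vt->starttime;
		vt->starttime = now_ns;
	}

	/* Flush only once a full jiffy is pending, to bound the overhead. */
	if (vt->pending >= SKETCH_TICK_NSEC) {
		vt->flushed += vt->pending;
		vt->pending = 0;
		vt->flushes++;
	}
}
```

In this model, frequent boundary crossings only add to the pending counter; the flush path runs at most about once per jiffy of accumulated time.
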

Glad to hear this could be done without dramatically
increasing the accounting overhead!

> [fweisbec: changelog, rebase on struct vtime, field renames, add delta
> on cputime readers, keep idle vtime as-is (low overhead accounting),
> harmonize clock sources]
>
> Reported-by: Luiz Capitulino <lcapitul...@redhat.com>
> Suggested-by: Thomas Gleixner <t...@linutronix.de>
> Not-Yet-Signed-off-by: Wanpeng Li <kernel...@gmail.com>
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Wanpeng Li <kernel...@gmail.com>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Luiz Capitulino <lcapitul...@redhat.com>
> Signed-off-by: Frederic Weisbecker <fweis...@gmail.com>

Acked-by: Rik van Riel <r...@redhat.com>