On Wed, Dec 18, 2013 at 1:49 PM, Paul E. McKenney
<paul...@linux.vnet.ibm.com> wrote:
> On Wed, Dec 18, 2013 at 01:29:53PM -0800, Andy Lutomirski wrote:
>> On 12/18/2013 09:43 AM, Frederic Weisbecker wrote:
>> > On Wed, Dec 18, 2013 at 10:04:43AM +0800, Alex Shi wrote:
>> >> On 12/18/2013 06:51 AM, Frederic Weisbecker wrote:
>> >>> So this is what this series brings, more details following:
>> >>>
>> >>> * Some code, naming and whitespace cleanups
>> >>>
>> >>> * Allow all CPUs outside the nohz_full range to handle the timekeeping
>> >>>   duty, not just CPU 0. Balancing the timekeeping duty should improve
>> >>>   powersavings.
>> >>
>> >> If the system just has one nohz_full cpu running, it will need another
>> >> cpu to do timerkeeper job. Then the system roughly needs 2 cpu living.
>> >> From powersaving POV, that is not good compare to normal nohz idle.
>> >
>> > Sure, but everything has a tradeoff :)
>> >
>> > We could theoretically run with the timekeeper purely idle if the other
>> > CPU in full dynticks mode runs in userspace for a long while and seldom
>> > do syscalls and faults. Timekeeping could be updated on kernel/user
>> > boundaries in this case without much impact on performances.
>> >
>> > But then there is one strict condition for that: it can't read the
>> > timeofday through the vdso but only through a syscall.
>>
>> Where's your ambition? :)
>>
>> If the vdso timing functions could see that it's been too long since a
>> real timekeeping update, they could fall back to a syscall. Otherwise,
>> they could using rdtsc or whatever is in use.
>
> One objection to that approach in the past has been that it injects
> avoidable latency into the worker CPUs. I suppose that you could argue
> that the cache misses due to a timekeeping-CPU update are not free, but
> then again, the syscall is likely to also incur a few cache misses as
> well.
>
> I bet that the timekeeping-CPU approach wins, but it would be cool to
> see you prove me wrong.

There's already some (very vague) discussion about having a scheduled
time at which the clock frequency and/or offset will change, and this
wouldn't be a huge departure from that. The goal there is to avoid
waiting for timekeeping if vclock_gettime runs concurrently with an
update, but the same approach could apply here (albeit with one extra
branch).

Anyway, syscalls aren't *that* expensive.

Alternatively, couldn't workloads like this just turn off NTP?

--Andy
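
For illustration only, the "one extra branch" idea can be sketched in
userspace C. This is a rough model, not kernel or vDSO code: the struct
fake_vdso_data, its fields, sketch_gettime_ns() and slow_gettime_ns() are
all hypothetical names, the cycle counter is passed in by the caller
rather than read with rdtsc, and the seqcount retry loop that a real
reader would need is left out. clock_gettime() merely stands in for the
syscall fallback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the vDSO data page; NOT the kernel's layout. */
struct fake_vdso_data {
	uint64_t base_ns;        /* time recorded at the last timekeeping update */
	uint64_t base_cycles;    /* counter value at the last update */
	uint64_t max_age_cycles; /* past this age the fast path is not trusted */
	uint32_t mult;           /* cycles to ns: ns = (cycles * mult) >> shift */
	uint32_t shift;
};

enum time_path { PATH_FAST, PATH_SYSCALL };

/* Slow path: stand-in for entering the kernel to get an up-to-date time. */
static uint64_t slow_gettime_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Fast path with a staleness check. now_cycles is supplied by the caller
 * so the sketch stays portable; a real implementation would read the
 * hardware counter itself.
 */
static uint64_t sketch_gettime_ns(const struct fake_vdso_data *d,
				  uint64_t now_cycles, enum time_path *path)
{
	uint64_t delta = now_cycles - d->base_cycles;

	assert(d->shift < 64);

	/* The one extra branch: data too old, fall back to the slow path. */
	if (delta > d->max_age_cycles) {
		*path = PATH_SYSCALL;
		return slow_gettime_ns();
	}

	*path = PATH_FAST;
	return d->base_ns + ((delta * d->mult) >> d->shift);
}
```

With base_cycles = 500 and max_age_cycles = 100, a call at now_cycles =
550 stays on the fast path, while a call at now_cycles = 700 takes the
fallback. Whether the extra branch plus the occasional syscall beats a
dedicated timekeeping CPU is exactly the open question in this thread;
the sketch takes no position on that.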