On Sat, Aug 17, 2013 at 06:49:18PM -0700, Paul E. McKenney wrote:
> Hello!
>
> Whenever there is at least one non-idle CPU, it is necessary to
> periodically update timekeeping information. Before NO_HZ_FULL, this
> updating was carried out by the scheduling-clock tick, which ran on
> every non-idle CPU. With the advent of NO_HZ_FULL, it is possible
> to have non-idle CPUs that are not receiving scheduling-clock ticks.
> This possibility is handled by assigning a timekeeping CPU that continues
> taking scheduling-clock ticks.
>
> Unfortunately, timekeeping CPU continues taking scheduling-clock
> interrupts even when all other CPUs are completely idle, which is
> not so good for energy efficiency and battery lifetime. Clearly, it
> would be good to turn off the timekeeping CPU's scheduling-clock tick
> when all CPUs are completely idle. This is conceptually simple, but
> we also need good performance and scalability on large systems, which
> rules out implementations based on frequently updated global counts of
> non-idle CPUs as well as implementations that frequently scan all CPUs.
> Nevertheless, we need a single global indicator in order to keep the
> overhead of checking acceptably low.
>
> The chosen approach is to enforce hysteresis on the non-idle to
> full-system-idle transition, with the amount of hysteresis increasing
> linearly with the number of CPUs, thus keeping contention acceptably low.
> This approach piggybacks on RCU's existing force-quiescent-state scanning
> of idle CPUs, which has the advantage of avoiding the scan entirely on
> busy systems that have high levels of multiprogramming. This scan
> takes per-CPU idleness information and feeds it into a state machine
> that applies the level of hysteresis required to arrive at a single
> full-system-idle indicator.
>
> The individual patches are as follows:
>
> 1.  Eliminate unused APIs that were intended for adaptive ticks.
>
> 2.  Add documentation covering the testing of nohz_full.
>
> 3.  Add a CONFIG_NO_HZ_FULL_SYSIDLE Kconfig parameter to enable
>     this feature. Kernels built with CONFIG_NO_HZ_FULL_SYSIDLE=n
>     act exactly as they do today.
>
> 4.  Add new fields to the rcu_dynticks structure that track CPU-idle
>     information. These fields consider CPUs running usermode to be
>     non-idle, in contrast with the existing fields in that structure.
>
> 5.  Track per-CPU idle states.
>
> 6.  Add full-system idle states and state variables.
>
> 7.  Expand force_qs_rnp(), dyntick_save_progress_counter(), and
>     rcu_implicit_dynticks_qs() APIs to enable passing full-system
>     idle state information.
>
> 8.  Add full-system-idle state machine.
>
> 9.  Force RCU's grace-period kthreads onto the timekeeping CPU.
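
For anyone following along, the hysteresis idea described in the cover
letter can be sketched in userspace C roughly as below. This is only an
illustration under stated assumptions, not the code from the patches:
the state names, the struct, the sysidle_scan() helper, and the
one-time-unit-per-CPU delay are all hypothetical, and the sketch scans a
plain array of flags rather than piggybacking on RCU's
force-quiescent-state scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names for this sketch; not taken from the patches. */
enum sysidle_state {
    SYSIDLE_NOT,    /* at least one CPU was non-idle at the last scan */
    SYSIDLE_SHORT,  /* all CPUs idle, hysteresis delay not yet served */
    SYSIDLE_FULL    /* all CPUs idle long enough: single global indicator */
};

struct sysidle {
    enum sysidle_state state;
    unsigned long idle_since;  /* time at which the current all-idle run began */
};

/* Assumed scaling: hysteresis grows linearly, one time unit per CPU. */
static unsigned long sysidle_delay(size_t ncpus)
{
    return (unsigned long)ncpus;
}

/*
 * One scan pass: fold the per-CPU idle flags into the global state.
 * Any non-idle CPU resets the state machine; otherwise the state only
 * advances to SYSIDLE_FULL once the delay has elapsed.
 */
static enum sysidle_state sysidle_scan(struct sysidle *s,
                                       const bool *cpu_idle,
                                       size_t ncpus,
                                       unsigned long now)
{
    assert(ncpus > 0);

    for (size_t i = 0; i < ncpus; i++) {
        if (!cpu_idle[i]) {
            s->state = SYSIDLE_NOT;
            return s->state;
        }
    }

    if (s->state == SYSIDLE_NOT) {
        /* First scan to see every CPU idle: start the hysteresis clock. */
        s->state = SYSIDLE_SHORT;
        s->idle_since = now;
    } else if (s->state == SYSIDLE_SHORT &&
               now - s->idle_since >= sysidle_delay(ncpus)) {
        s->state = SYSIDLE_FULL;
    }
    return s->state;
}
```

Readers of the indicator would then check only s->state, which changes
rarely, instead of a frequently updated global count of non-idle CPUs.
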

Comments on 4, 5, and 6; for 1-3 and 7-9,
Reviewed-by: Josh Triplett <[email protected]>

