Hi, following up on a very old thread: http://thread.gmane.org/gmane.linux.kernel/1212777
On Thu, Feb 16, 2012 at 08:59:00AM -0600, Dimitri Sivanich wrote: > On Wed, Feb 15, 2012 at 09:36:47PM +0100, Thomas Gleixner wrote: > > On Wed, 15 Feb 2012, Dimitri Sivanich wrote: > > > On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote: > > > > So the first CPU which registers a clock event device takes it. That's > > > > the boot CPU, no matter what. > > > > > > > Both kernel tracing and the original patch that I proposed for this > > > showed plainly (at the time) that the tick_do_timer_cpu was not always > > > cpu 0 > > > prior to modifying it for nohz=off. Maybe that is no longer the case? > > > > This logic has not been changed in years. > > I did some tracing of all points where tick_do_timer_cpu is set in the > 3.3.0-rc3+ kernel. > > > > > tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first > > cpu which registers either a global or a per cpu clock event device > > takes it over. This is at least on x86 always the boot cpu, i.e. cpu0. > > After that point nothing touches that variable when nohz is disabled > > (runtime or compile time). > > At that point it is set to cpu 0. However, when we go into highres mode > it does change. Below are the two places it was set during boot with > nohz=off on one of our x86 based machines. > > [ 0.000000] tick_setup_device: tick_do_timer_cpu 0 > [ 1.924098] tick_broadcast_setup_oneshot: tick_do_timer_cpu 17 > > So in this example it's now cpu 17, and it stays that way from that point on. > > A traceback at that point shows tick_init_highres is indeed initiating this: > > [ 1.924863] [<ffffffff81087e01>] tick_broadcast_setup_oneshot+0x71/0x160 > [ 1.924863] [<ffffffff81087f23>] > tick_broadcast_switch_to_oneshot+0x33/0x50 > [ 1.924863] [<ffffffff81088841>] tick_switch_to_oneshot+0x81/0xd0 > [ 1.924863] [<ffffffff810888a0>] tick_init_highres+0x10/0x20 > [ 1.924863] [<ffffffff81061e71>] hrtimer_run_pending+0x71/0xd0 > > > > > So I really want to see proper proof why that would not be the > > case. If it really happens then we fix the root cause instead of > > adding random sysfs interfaces. As Dimitri wrote above, the switch from cpu 0 is done by tick_broadcast_setup_oneshot. The first CPU switching to highres takes the broadcast responsibility and also sets tick_do_timer_cpu to itself. This behaviour has been introduced by 7300711e (clockevents: broadcast fixup possible waiters). I don't see a good reason assign tick_do_timer_cpu to the CPU doing the one-shot timer broadcasts. The timer interrupt will be generated on any other CPU as well, be it through the broadcast IPI or a per-CPU clockevent device. Any online CPU can do that job, so how about just dropping the assignment? The do_timer() code should not suffer from the jitter introduced by the interrupt being generated by the broadcast, should it? Signed-off-by: Jiri Bohac <jbo...@suse.cz> --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -572,9 +572,6 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc) bc->event_handler = tick_handle_oneshot_broadcast; - /* Take the do_timer update */ - tick_do_timer_cpu = cpu; - /* * We must be careful here. There might be other CPUs * waiting for periodic broadcast. We need to set the -- Jiri Bohac <jbo...@suse.cz> SUSE Labs, SUSE CZ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/