Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-07 Thread Fengguang Wu
On Fri, Sep 07, 2012 at 09:23:16AM +0200, Peter Zijlstra wrote:
> On Fri, 2012-09-07 at 09:20 +0800, Fengguang Wu wrote:
> 
> > FYI, the bisect result is
> > 
> > commit 554cecaf733623b327eef9652b65965eb1081b81
> > Author: Diwakar Tundlam 
> > Date:   Wed Mar 7 14:44:26 2012 -0800
> > 
> > sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
> > 
> > The 'next_balance' field of 'nohz' idle balancer must be initialized
> > to jiffies. Since jiffies is initialized to negative 300 seconds the
> > 'nohz' idle balancer does not run for the first 300s (5mins) after
> > bootup. If no new processes are spawed or no idle cycles happen, the
> > load on the cpus will remain unbalanced for that duration.
> > 
> > Signed-off-by: Diwakar Tundlam 
> > Signed-off-by: Peter Zijlstra 
> > Link: http://lkml.kernel.org/r/1dd7bfedd3147247b1355befefe4665237994f3...@hqmail04.nvidia.com
> > Signed-off-by: Ingo Molnar 
> 
> Oh fun.. does the below 'fix' it?
> 
> The thing I'm thinking of is a tick happening right after we set jiffies
> but before the zalloc (specifically the memset(0)) is complete. Since
> we've already registered the softirq, we can end up in the load-balancer
> and see a completely weird idle mask.
> 
> Hmm?

There may be more causes, since I still get the warning:

[9.816279] reboot: machine restart
[9.835796] [ cut here ]
[9.836558] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
[9.839792] Pid: 18, comm: kworker/0:1 Not tainted 3.6.0-rc3-bisect-5-gb374aa1-dirty #49
[9.839792] Call Trace:
[9.839792]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
[9.839792]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
[9.839792]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
[9.839792]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
[9.839792]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
[9.839792]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
[9.839792]  [<79056d14>] scheduler_tick+0xd4/0x100
[9.839792]  [<7903bde5>] update_process_times+0x55/0x70
[9.839792]  [<79071187>] tick_sched_timer+0x57/0xb0
[9.839792]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
[9.839792]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
[9.839792]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
[9.839792]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
[9.839792]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
[9.839792]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
[9.839792]  [<79882631>] apic_timer_interrupt+0x31/0x40
[9.839792]  [<7905007b>] ? call_srcu+0x2b/0x70
[9.839792]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
[9.839792]  [<7988194f>] ? _raw_spin_unlock_irq+0x1f/0x40
[9.839792]  [<7905307f>] finish_task_switch+0x7f/0xd0
[9.839792]  [<79053038>] ? finish_task_switch+0x38/0xd0
[9.839792]  [<7988044a>] __schedule+0x38a/0x770
[9.839792]  [<79045529>] ? worker_thread+0x1a9/0x380
[9.839792]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
[9.839792]  [<7988084e>] schedule+0x1e/0x50
[9.839792]  [<7904552e>] worker_thread+0x1ae/0x380
[9.839792]  [<79056ed9>] ? complete+0x49/0x60
[9.839792]  [<79045380>] ? manage_workers.isra.23+0x250/0x250
[9.839792]  [<79049ff8>] kthread+0x78/0x80
[9.839792]  [<7988>] ? __up.isra.0+0xd/0x2d
[9.839792]  [<79049f80>] ? insert_kthread_work+0x70/0x70
[9.839792]  [<798830c6>] kernel_thread_helper+0x6/0xd

Thanks,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-07 Thread Peter Zijlstra
On Fri, 2012-09-07 at 09:20 +0800, Fengguang Wu wrote:

> FYI, the bisect result is
> 
> commit 554cecaf733623b327eef9652b65965eb1081b81
> Author: Diwakar Tundlam 
> Date:   Wed Mar 7 14:44:26 2012 -0800
> 
> sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
> 
> The 'next_balance' field of 'nohz' idle balancer must be initialized
> to jiffies. Since jiffies is initialized to negative 300 seconds the
> 'nohz' idle balancer does not run for the first 300s (5mins) after
> bootup. If no new processes are spawed or no idle cycles happen, the
> load on the cpus will remain unbalanced for that duration.
> 
> Signed-off-by: Diwakar Tundlam 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/r/1dd7bfedd3147247b1355befefe4665237994f3...@hqmail04.nvidia.com
> Signed-off-by: Ingo Molnar 

Oh fun.. does the below 'fix' it?

The thing I'm thinking of is a tick happening right after we set jiffies
but before the zalloc (specifically the memset(0)) is complete. Since
we've already registered the softirq, we can end up in the load-balancer
and see a completely weird idle mask.

Hmm?

---
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1ca4fe4..ac57bb6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5346,13 +5346,12 @@ void print_cfs_stats(struct seq_file *m, int cpu)
 __init void init_sched_fair_class(void)
 {
 #ifdef CONFIG_SMP
-	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
-
 #ifdef CONFIG_NO_HZ
 	nohz.next_balance = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
 	cpu_notifier(sched_ilb_notifier, 0);
 #endif
-#endif /* SMP */
 
+	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
+#endif /* SMP */
 }
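
Peter's reordering follows a general rule: fully initialize the state a handler depends on, then register the handler. The userspace C sketch below models that rule under stated assumptions; it is not kernel code. `struct nohz_state`, `softirq_handler`, `simulated_tick()` and `init_balancer()` are hypothetical names, the "tick" is injected at a fixed point between setting `next_balance` and allocating the mask instead of arriving as a real interrupt, and C11 release/acquire atomics stand in for the kernel's own ordering guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's nohz state (not the real struct). */
struct nohz_state {
    unsigned long next_balance;
    uint64_t *idle_cpus_mask;   /* one bit per cpu; NULL until allocated */
};

typedef bool (*handler_fn)(const struct nohz_state *);

static struct nohz_state nohz;
/* Stand-in for the SCHED_SOFTIRQ slot; NULL means "not registered yet". */
static _Atomic(handler_fn) softirq_handler;

/* The balancer assumes an allocated, zeroed idle mask. */
static bool rebalance_handler(const struct nohz_state *s)
{
    return s->idle_cpus_mask != NULL && *s->idle_cpus_mask == 0;
}

/* A simulated tick: runs the handler only if it has been published.
 * Returns false if the handler saw half-initialized state. */
static bool simulated_tick(void)
{
    handler_fn h = atomic_load_explicit(&softirq_handler, memory_order_acquire);
    return h == NULL || h(&nohz);
}

static void publish_handler(void)
{
    /* Release: earlier writes are visible to a reader that acquire-loads
     * this pointer and sees the new value. */
    atomic_store_explicit(&softirq_handler, rebalance_handler,
                          memory_order_release);
}

/* Initialize the state with one tick injected midway, modelling a tick
 * that lands after next_balance is set but before the mask exists.
 * Returns what that injected tick observed. */
static bool init_balancer(bool register_first, unsigned long now)
{
    bool tick_ok;

    free(nohz.idle_cpus_mask);
    nohz.idle_cpus_mask = NULL;
    atomic_store(&softirq_handler, (handler_fn)NULL);

    if (register_first)
        publish_handler();          /* old order: handler visible too early */

    nohz.next_balance = now;
    tick_ok = simulated_tick();     /* the badly timed tick */
    nohz.idle_cpus_mask = calloc(1, sizeof(*nohz.idle_cpus_mask));
    assert(nohz.idle_cpus_mask != NULL);

    if (!register_first)
        publish_handler();          /* patched order: publish last */

    return tick_ok;
}
```

In the old order (`register_first` true) the injected tick runs the handler while `idle_cpus_mask` is still NULL, so `init_balancer()` returns false; in the patched order the tick finds no handler and it returns true. Fengguang's follow-up in this thread reports the warning still fires with the patch applied, so the sketch only illustrates the hypothesis, not the root cause.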




Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-06 Thread Michael Wang
On 09/07/2012 09:20 AM, Fengguang Wu wrote:
> On Wed, Sep 05, 2012 at 08:57:00PM +0800, Fengguang Wu wrote:
>> On Wed, Sep 05, 2012 at 12:54:40PM +0200, Peter Zijlstra wrote:
>>> On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
>>>>> [   10.968565] reboot: machine restart
>>>>> [   10.983510] [ cut here ]
>>>>> [   10.984218] WARNING: at /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
>>>>> [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1 #10
>>>>> [   10.987185] Call Trace:
>>>>> [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
>>>>> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
>>>>> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
>>>>> [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
>>>>> [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
>>>>
>>>> So this cpu tries to fire a nohz balance kick IPI to an offline cpu?
>>>>
>>>> Maybe we are choosing the wrong cpu to kick, but that's not the point;
>>>> what I can't understand is why this cpu could do this kick.
>>>>
>>>> We have nohz_kick_needed() to check whether the current cpu should do
>>>> the kick, and the first condition we need to match is that the current
>>>> cpu should be idle, but the trace shows the current pid is 88, not 0.
>>>>
>>>> We should add Peter to the cc list; maybe he will be interested in
>>>> what happened.
>>>
>>>>> [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
>>>>> [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
>>>>> [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70
>>>
>>> Hmm, added both venki and suresh as they touched it last ;-)
>>>
>>> I suppose you're running a hotplug loop along with your workload?
>>
>> I would definitely like to add some hotplug tests! However for this
>> trace, it's simply booting into an ubuntu-core initrd and running the
>> "reboot" command in some late init.d script.
>>
>> It seems that the bug was introduced somewhere in v3.3..v3.4. I'm now
>> running 100 kvms to speed up the bisect progress :)
> 
> FYI, the bisect result is
> 
> commit 554cecaf733623b327eef9652b65965eb1081b81
> Author: Diwakar Tundlam 
> Date:   Wed Mar 7 14:44:26 2012 -0800
> 
> sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
> 
> The 'next_balance' field of 'nohz' idle balancer must be initialized
> to jiffies. Since jiffies is initialized to negative 300 seconds the
> 'nohz' idle balancer does not run for the first 300s (5mins) after
> bootup. If no new processes are spawed or no idle cycles happen, the
> load on the cpus will remain unbalanced for that duration.
> 
> Signed-off-by: Diwakar Tundlam 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/r/1dd7bfedd3147247b1355befefe4665237994f3...@hqmail04.nvidia.com
> Signed-off-by: Ingo Molnar 

This patch enabled the nohz kick during boot; without it, nohz load
balancing won't happen until jiffies reaches 0.

So without this patch the issue disappears only because nohz balancing
is disabled for the whole test; I think that's not what we want...

I still can't figure out why a cpu does a nohz kick while it's not idle :(

Regards,
Michael Wang

> 
> Thanks,
> Fengguang
> 



Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-06 Thread Fengguang Wu
On Wed, Sep 05, 2012 at 08:57:00PM +0800, Fengguang Wu wrote:
> On Wed, Sep 05, 2012 at 12:54:40PM +0200, Peter Zijlstra wrote:
> > On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
> > > > [   10.968565] reboot: machine restart
> > > > [   10.983510] [ cut here ]
> > > > [   10.984218] WARNING: at /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
> > > > [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1 #10
> > > > [   10.987185] Call Trace:
> > > > [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> > > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > > [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> > > > [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> > > 
> > > So this cpu tries to fire a nohz balance kick IPI to an offline cpu?
> > > 
> > > Maybe we are choosing the wrong cpu to kick, but that's not the point;
> > > what I can't understand is why this cpu could do this kick.
> > > 
> > > We have nohz_kick_needed() to check whether the current cpu should do
> > > the kick, and the first condition we need to match is that the current
> > > cpu should be idle, but the trace shows the current pid is 88, not 0.
> > > 
> > > We should add Peter to the cc list; maybe he will be interested in
> > > what happened.
> > 
> > > > [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> > > > [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> > > > [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70 
> > 
> > Hmm, added both venki and suresh as they touched it last ;-)
> > 
> > I suppose you're running a hotplug loop along with your workload?
> 
> I would definitely like to add some hotplug tests! However for this
> trace, it's simply booting into an ubuntu-core initrd and running the
> "reboot" command in some late init.d script.
> 
> It seems that the bug was introduced somewhere in v3.3..v3.4. I'm now
> running 100 kvms to speed up the bisect progress :)

FYI, the bisect result is

commit 554cecaf733623b327eef9652b65965eb1081b81
Author: Diwakar Tundlam 
Date:   Wed Mar 7 14:44:26 2012 -0800

sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer

The 'next_balance' field of 'nohz' idle balancer must be initialized
to jiffies. Since jiffies is initialized to negative 300 seconds the
'nohz' idle balancer does not run for the first 300s (5mins) after
bootup. If no new processes are spawed or no idle cycles happen, the
load on the cpus will remain unbalanced for that duration.

Signed-off-by: Diwakar Tundlam 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1dd7bfedd3147247b1355befefe4665237994f3...@hqmail04.nvidia.com
Signed-off-by: Ingo Molnar 
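
The commit message's "first 300s" follows from how the kernel compares jiffies values across wraparound. A minimal C sketch, assuming a 32-bit `unsigned long` (the 8-digit addresses in the traces suggest a 32-bit x86 kernel) and HZ=1000 (an assumption; HZ is a build-time option). `time_after32()`, `time_before32()` and `nohz_gate_open()` are hypothetical helpers modelled on the kernel's `time_after()`/`time_before()` macros, and the gate models only the `next_balance` check, not the rest of the kick decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: model a 32-bit unsigned long explicitly. */
typedef uint32_t ulong32;

/* Assumption: HZ=1000; the real value depends on the kernel config. */
#define HZ 1000

/* Same idea as the kernel's INITIAL_JIFFIES: 300 seconds before the wrap. */
#define INITIAL_JIFFIES ((ulong32)(-300 * HZ))

/* Wrap-safe "a is after b", in the style of the kernel's time_after().
 * Relies on two's-complement conversion to int32_t, as the kernel does. */
static bool time_after32(ulong32 a, ulong32 b)
{
    return (int32_t)(b - a) < 0;
}

/* "a is before b", defined via time_after32() as the kernel does. */
static bool time_before32(ulong32 a, ulong32 b)
{
    return time_after32(b, a);
}

/* Models only the next_balance gate: no idle-balance kick while
 * now is still before next_balance. */
static bool nohz_gate_open(ulong32 now, ulong32 next_balance)
{
    return !time_before32(now, next_balance);
}
```

With `next_balance` left at 0, `nohz_gate_open(INITIAL_JIFFIES, 0)` is false and stays false until the counter wraps to 0, 300 * HZ ticks later; with `next_balance` set to the current jiffies, as the commit does, the gate is open from the first tick.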

Thanks,
Fengguang


Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-05 Thread Fengguang Wu
On Wed, Sep 05, 2012 at 12:54:40PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
> > > [   10.968565] reboot: machine restart
> > > [   10.983510] [ cut here ]
> > > [   10.984218] WARNING: at /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
> > > [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1 #10
> > > [   10.987185] Call Trace:
> > > [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> > > [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> > 
> > So this cpu tries to fire a nohz balance kick IPI to an offline cpu?
> > 
> > Maybe we are choosing the wrong cpu to kick, but that's not the point;
> > what I can't understand is why this cpu could do this kick.
> > 
> > We have nohz_kick_needed() to check whether the current cpu should do
> > the kick, and the first condition we need to match is that the current
> > cpu should be idle, but the trace shows the current pid is 88, not 0.
> > 
> > We should add Peter to the cc list; maybe he will be interested in
> > what happened.
> 
> > > [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> > > [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> > > [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70 
> 
> Hmm, added both venki and suresh as they touched it last ;-)
> 
> I suppose you're running a hotplug loop along with your workload?

I would definitely like to add some hotplug tests! However for this
trace, it's simply booting into an ubuntu-core initrd and running the
"reboot" command in some late init.d script.

It seems that the bug was introduced somewhere in v3.3..v3.4. I'm now
running 100 kvms to speed up the bisect progress :)

Thanks,
Fengguang


Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
> > [   10.968565] reboot: machine restart
> > [   10.983510] [ cut here ]
> > [   10.984218] WARNING: at /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
> > [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1 #10
> > [   10.987185] Call Trace:
> > [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> > [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> 
> So this cpu tries to fire a nohz balance kick IPI to an offline cpu?
> 
> Maybe we are choosing the wrong cpu to kick, but that's not the point;
> what I can't understand is why this cpu could do this kick.
> 
> We have nohz_kick_needed() to check whether the current cpu should do
> the kick, and the first condition we need to match is that the current
> cpu should be idle, but the trace shows the current pid is 88, not 0.
> 
> We should add Peter to the cc list; maybe he will be interested in
> what happened.

> > [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> > [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> > [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70 

Hmm, added both venki and suresh as they touched it last ;-)

I suppose you're running a hotplug loop along with your workload?
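
Peter's hotplug question points at one way the warning can fire: `native_smp_send_reschedule()` warns when its target cpu is offline, so if a cpu leaves the online mask while its bit is still set in the nohz idle mask, a kick can be aimed at it. The toy userspace model below assumes an 8-cpu machine and plain 32-bit masks; `online_mask`, `idle_mask`, `send_reschedule()` and `kick_idle_balancer()` are hypothetical stand-ins, and target selection only mimics "first cpu in the idle mask". It does not show that this is what happens in Fengguang's reboot trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 8                 /* assumption: a small 8-cpu machine */

static uint32_t online_mask;      /* bit n set: cpu n is online */
static uint32_t idle_mask;        /* models nohz.idle_cpus_mask */
static int warnings;              /* how many times the guard fired */

/* First set bit, or NR_CPUS if the mask is empty (like cpumask_first()). */
static int mask_first(uint32_t m)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (m & (1u << cpu))
            return cpu;
    return NR_CPUS;
}

/* Models the guard in native_smp_send_reschedule(): refuse and warn if
 * the target is offline. Returns true if the IPI would have been sent. */
static bool send_reschedule(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!(online_mask & (1u << cpu))) {
        warnings++;
        fprintf(stderr, "WARNING: reschedule IPI to offline cpu %d\n", cpu);
        return false;
    }
    return true;
}

/* Pick the first cpu in the idle mask and kick it; no online check here. */
static bool kick_idle_balancer(void)
{
    int ilb = mask_first(idle_mask);

    if (ilb >= NR_CPUS)
        return false;             /* nobody idle, nothing to kick */
    return send_reschedule(ilb);
}
```

Starting with cpus 0 and 1 online and only cpu 1 idle, the kick succeeds; clearing cpu 1 from `online_mask` without touching `idle_mask` makes the next kick hit the guard, which is the shape of the warning in the traces above.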



Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
  [   10.968565] reboot: machine restart
  [   10.983510] [ cut here ]
  [   10.984218] WARNING: at 
  /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 
  native_smp_send_reschedule+0x46/0x50()
  [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 
  3.6.0-rc3-5-gb374aa1 #10
  [   10.987185] Call Trace:
  [   10.987506]  [7902f42a] warn_slowpath_common+0x5a/0x80
  [   10.987506]  [7901ee16] ? native_smp_send_reschedule+0x46/0x50
  [   10.987506]  [7901ee16] ? native_smp_send_reschedule+0x46/0x50
  [   10.987506]  [7902f4fd] warn_slowpath_null+0x1d/0x20
  [   10.987506]  [7901ee16] native_smp_send_reschedule+0x46/0x50
 
 So this cpu try to fire a nohz balance kick ipi to an offline cpu?
 
 May be we are choosing a wrong cpu to kick but that's not the point,
 what I can't understand is why this cpu could do this kick.
 
 We have nohz_kick_needed() to check whether current cpu should do kick ,
 and the first condition we need to match is that current cpu should be
 idle, but the trace show current pid is 88 not 0.
 
 We should add Peter to cc list, may be he will be interested on what
 happened.

  [   10.987506]  [7905fdad] trigger_load_balance+0x1bd/0x250
  [   10.987506]  [79056d14] scheduler_tick+0xd4/0x100
  [   10.987506]  [7903bde5] update_process_times+0x55/0x70 

Hmm, added both venki and suresh as they touched it last ;-)

I suppose you're running a hotplug loop along with your workload?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-05 Thread Fengguang Wu
On Wed, Sep 05, 2012 at 12:54:40PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
> > > [   10.968565] reboot: machine restart
> > > [   10.983510] [ cut here ]
> > > [   10.984218] WARNING: at
> > > /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123
> > > native_smp_send_reschedule+0x46/0x50()
> > > [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted
> > > 3.6.0-rc3-5-gb374aa1 #10
> > > [   10.987185] Call Trace:
> > > [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> > > [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> >
> > So this cpu tries to fire a nohz balance kick IPI at an offline cpu?
> >
> > Maybe we are choosing the wrong cpu to kick, but that's not the point;
> > what I can't understand is why this cpu could do this kick.
> >
> > We have nohz_kick_needed() to check whether the current cpu should do
> > the kick, and the first condition we need to match is that the current
> > cpu should be idle, but the trace shows the current pid is 88, not 0.
> >
> > We should add Peter to the cc list; maybe he will be interested in
> > what happened.
>
> > > [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> > > [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> > > [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70
>
> Hmm, added both venki and suresh as they touched it last ;-)
>
> I suppose you're running a hotplug loop along with your workload?

I would definitely like to add some hotplug tests! However, for this
trace it's simply booting into an ubuntu-core initrd and running the
reboot command in some late init.d script.

It seems that the bug was introduced somewhere in v3.3..v3.4. I'm now
running 100 kvms to speed up the bisect :)
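One plausible reading of a hotplug-free trace, offered as an assumption rather than anything the thread has established: on reboot the other CPUs are stopped and marked offline while the remaining CPU keeps taking ticks, so a stale bit in the nohz idle mask could steer the balance kick at a CPU that is no longer online, which is exactly what the cpu_is_offline() check in native_smp_send_reschedule() warns about. The userspace model below sketches that scenario. Every name in it (model_pick_ilb_naive(), model_pick_ilb_guarded(), model_send_reschedule(), and so on) is hypothetical, and plain 32-bit masks stand in for struct cpumask; it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace model only; all names are hypothetical, not kernel APIs.
 * A CPU set is a 32-bit mask: bit N set means CPU N is a member. */
typedef uint32_t model_cpuset_t;

#define MODEL_NR_CPUS 32
#define MODEL_NO_CPU  MODEL_NR_CPUS /* "no CPU", in the spirit of nr_cpu_ids */

/* Lowest-numbered CPU in the set, or MODEL_NO_CPU if the set is empty. */
static int model_first_cpu(model_cpuset_t set)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (set & (UINT32_C(1) << cpu))
            return cpu;
    return MODEL_NO_CPU;
}

/* Shaped after the check that fires in the trace: warn and bail out
 * instead of sending the reschedule IPI when the target is offline. */
static bool model_send_reschedule(int cpu, model_cpuset_t online)
{
    if (cpu < 0 || cpu >= MODEL_NR_CPUS || !(online & (UINT32_C(1) << cpu))) {
        fprintf(stderr, "WARNING: cpu %d is offline\n", cpu);
        return false;
    }
    return true; /* the kernel function would send the IPI at this point */
}

/* Naive target choice: trust the idle mask even if a bit is stale. */
static int model_pick_ilb_naive(model_cpuset_t idle)
{
    return model_first_cpu(idle);
}

/* Guarded target choice: only CPUs that are idle AND still online. */
static int model_pick_ilb_guarded(model_cpuset_t idle, model_cpuset_t online)
{
    return model_first_cpu(idle & online);
}

/* Assumed reboot-time state: CPU0 is busy and online, CPU1 has been
 * stopped (offline) but its idle bit was never cleared. */
static void model_reboot_scenario(void)
{
    model_cpuset_t online = UINT32_C(1) << 0;
    model_cpuset_t idle = UINT32_C(1) << 1;

    /* The naive pick targets the offline CPU and trips the warning. */
    assert(model_pick_ilb_naive(idle) == 1);
    assert(!model_send_reschedule(model_pick_ilb_naive(idle), online));

    /* The guarded pick finds no valid target, so no kick is attempted. */
    assert(model_pick_ilb_guarded(idle, online) == MODEL_NO_CPU);
}
```

Clearing a CPU's bit from the idle mask when it goes offline would be the other half of the same idea; this sketch does not show which approach, if either, the eventual fix took.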
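The original report puts the hit rate at roughly once per 3000 boots, which is why the bisect needs so many guests: a commit can only be called good after many clean boots. A rough back-of-the-envelope sketch follows, assuming every boot hits the bug independently with the same probability p (a simplification; the thread does not say how many boots per bisect step were actually used). boots_for_confidence() and print_bisect_budget() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Smallest number of clean boots n such that, if a commit really is bad
 * and each boot independently hits the bug with probability p, the chance
 * of seeing no failure in n boots, (1 - p)^n, drops to 1 - confidence or
 * below. Returns 0 for arguments outside the open interval (0, 1). */
static unsigned long boots_for_confidence(double p, double confidence)
{
    double all_clean = 1.0; /* P(n boots of a bad commit all look clean) */
    unsigned long n = 0;

    if (p <= 0.0 || p >= 1.0 || confidence <= 0.0 || confidence >= 1.0)
        return 0;

    while (all_clean > 1.0 - confidence) {
        all_clean *= 1.0 - p;
        n++;
    }
    return n;
}

/* Example: the reported rate of roughly one hit per 3000 boots. */
static void print_bisect_budget(void)
{
    unsigned long n = boots_for_confidence(1.0 / 3000.0, 0.95);

    assert(n != 0); /* valid arguments always give a positive count */
    printf("boots per 'good' verdict at 95%% confidence: %lu\n", n);
}
```

Under these assumptions a single "good" verdict at 95% confidence needs roughly 9,000 clean boots, or on the order of 90 boots per guest when spread across 100 KVM instances.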

Thanks,
Fengguang


Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-04 Thread Michael Wang
Hi, Feng Guang

On 09/05/2012 09:11 AM, Fengguang Wu wrote:
> Hi,
> 
> Here is an old problem that also happens in 3.4. It's very hard to
> reproduce: it may happen only once per 3000 boots.
> 
> [   10.968565] reboot: machine restart
> [   10.983510] [ cut here ]
> [   10.984218] WARNING: at 
> /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1 
> #10
> [   10.987185] Call Trace:
> [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50

So this cpu tries to fire a nohz balance kick IPI at an offline cpu?

Maybe we are choosing the wrong cpu to kick, but that's not the point;
what I can't understand is why this cpu could do this kick.

We have nohz_kick_needed() to check whether the current cpu should do the
kick, and the first condition we need to match is that the current cpu
should be idle, but the trace shows the current pid is 88, not 0.

We should add Peter to the cc list; maybe he will be interested in what
happened.

Regards,
Michael Wang

> [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70
> [   10.987506]  [<79071187>] tick_sched_timer+0x57/0xb0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [   10.987506]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [   10.987506]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [   10.987506]  [<79882631>] apic_timer_interrupt+0x31/0x40
> [   10.987506]  [<7905007b>] ? call_srcu+0x2b/0x70
> [   10.987506]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [   10.987506]  [<7988194f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [   10.987506]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [   10.987506]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [   10.987506]  [<7988044a>] __schedule+0x38a/0x770
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7987ea40>] ? schedule_timeout+0x100/0x1b0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7988084e>] schedule+0x1e/0x50
> [   10.987506]  [<7987ea45>] schedule_timeout+0x105/0x1b0
> [   10.987506]  [<7903adb0>] ? __internal_add_timer+0xb0/0xb0
> [   10.987506]  [<796842f2>] pktgen_thread_worker+0x1342/0x1390
> [   10.987506]  [<7988044a>] ? __schedule+0x38a/0x770
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<79682fb0>] ? pktgen_if_write+0x2210/0x2210
> [   10.987506]  [<79049ff8>] kthread+0x78/0x80
> [   10.987506]  [<7988>] ? __up.isra.0+0xd/0x2d
> [   10.987506]  [<79049f80>] ? insert_kthread_work+0x70/0x70
> [   10.987506]  [<798830c6>] kernel_thread_helper+0x6/0xd
> 
> Here are all the oops messages I collected in the past days:
> 
> [4.815145] Restarting system.
> [4.815644] reboot: machine restart
> [4.824591] [ cut here ]
> [4.825423] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [4.826881] Pid: 18, comm: kworker/0:1 Not tainted 
> 3.6.0-rc3-bisect-7-g6320675 #25
> [4.828116] Call Trace:
> [4.828533]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [4.828585]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [4.828585]  [<79056d14>] scheduler_tick+0xd4/0x100
> [4.828585]  [<7903bde5>] update_process_times+0x55/0x70
> [4.828585]  [<79071187>] tick_sched_timer+0x57/0xb0
> [4.828585]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [4.828585]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [4.828585]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [4.828585]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [4.828585]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [4.828585]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [4.828585]  [<79882401>] apic_timer_interrupt+0x31/0x40
> [4.828585]  [<7905007b>] ? call_srcu+0x2b/0x70
> [4.828585]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [4.828585]  [<7988171f>] ? _raw_spin_unlock_irq+0x1f/0x40

Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

2012-09-04 Thread Michael Wang
Hi, Feng Guang

On 09/05/2012 09:11 AM, Fengguang Wu wrote:
> Hi,
>
> Here is an old problem that also happens in 3.4. It's very hard to
> reproduce: it may happen only once per 3000 boots.
>
> [   10.968565] reboot: machine restart
> [   10.983510] [ cut here ]
> [   10.984218] WARNING: at
> /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123
> native_smp_send_reschedule+0x46/0x50()
> [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-5-gb374aa1
> #10
> [   10.987185] Call Trace:
> [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50

So this cpu tries to fire a nohz balance kick IPI at an offline cpu?

Maybe we are choosing the wrong cpu to kick, but that's not the point;
what I can't understand is why this cpu could do this kick.

We have nohz_kick_needed() to check whether the current cpu should do the
kick, and the first condition we need to match is that the current cpu
should be idle, but the trace shows the current pid is 88, not 0.

We should add Peter to the cc list; maybe he will be interested in what
happened.

Regards,
Michael Wang

> [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70
> [   10.987506]  [<79071187>] tick_sched_timer+0x57/0xb0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [   10.987506]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [   10.987506]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [   10.987506]  [<79882631>] apic_timer_interrupt+0x31/0x40
> [   10.987506]  [<7905007b>] ? call_srcu+0x2b/0x70
> [   10.987506]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [   10.987506]  [<7988194f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [   10.987506]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [   10.987506]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [   10.987506]  [<7988044a>] __schedule+0x38a/0x770
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7987ea40>] ? schedule_timeout+0x100/0x1b0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7988084e>] schedule+0x1e/0x50
> [   10.987506]  [<7987ea45>] schedule_timeout+0x105/0x1b0
> [   10.987506]  [<7903adb0>] ? __internal_add_timer+0xb0/0xb0
> [   10.987506]  [<796842f2>] pktgen_thread_worker+0x1342/0x1390
> [   10.987506]  [<7988044a>] ? __schedule+0x38a/0x770
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<79682fb0>] ? pktgen_if_write+0x2210/0x2210
> [   10.987506]  [<79049ff8>] kthread+0x78/0x80
> [   10.987506]  [<7988>] ? __up.isra.0+0xd/0x2d
> [   10.987506]  [<79049f80>] ? insert_kthread_work+0x70/0x70
> [   10.987506]  [<798830c6>] kernel_thread_helper+0x6/0xd
>
> Here are all the oops messages I collected in the past days:
>
> [4.815145] Restarting system.
> [4.815644] reboot: machine restart
> [4.824591] [ cut here ]
> [4.825423] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123
> native_smp_send_reschedule+0x46/0x50()
> [4.826881] Pid: 18, comm: kworker/0:1 Not tainted
> 3.6.0-rc3-bisect-7-g6320675 #25
> [4.828116] Call Trace:
> [4.828533]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [4.828585]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> [4.828585]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [4.828585]  [<79056d14>] scheduler_tick+0xd4/0x100
> [4.828585]  [<7903bde5>] update_process_times+0x55/0x70
> [4.828585]  [<79071187>] tick_sched_timer+0x57/0xb0
> [4.828585]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [4.828585]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [4.828585]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [4.828585]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [4.828585]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [4.828585]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [4.828585]  [<79882401>] apic_timer_interrupt+0x31/0x40
> [4.828585]  [<7905007b>] ? call_srcu+0x2b/0x70
> [4.828585]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [4.828585]  [<7988171f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [4.828585]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [4.828585]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [4.828585]  [<7988021a>] __schedule+0x38a/0x770
 [