Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-19 Thread Mike Galbraith
On Thu, 2012-07-19 at 10:02 -0400, Steven Rostedt wrote: 
> On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:
> 
> > Every kernel I've fed your script to has died sooner or later, so I wish
> > him fair sailing.  Here there be sea monsters ;-)
> 
> I'm curious. Can my script bring down a non-rt kernel?

Yeah, it took a couple non-rt (and virgin) kernels down on my 64 core
box.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-19 Thread Steven Rostedt
On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:

> Every kernel I've fed your script to has died sooner or later, so I wish
> him fair sailing.  Here there be sea monsters ;-)

I'm curious. Can my script bring down a non-rt kernel?

-- Steve


Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-19 Thread Mike Galbraith
On Thu, 2012-07-19 at 09:05 -0400, Steven Rostedt wrote: 
> On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> > On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> > 
> > > Please test the patches too.
> > 
> > Your hotplug stress test script made x3550 M3 box fall over.  It took a
> > bit, but down she went.  64 core test box fell over quickly, but that's
> > very far from virgin source.. seems to be the same though.
> 
> Thanks for the report. I know a few areas in the hotplug code that can
> still deadlock (but are hard to hit). But there's no easy fix for them.
> Basically, the only thing we can do is redesign cpu hotplug (I think
> someone is already trying to do that ;-).

Every kernel I've fed your script to has died sooner or later, so I wish
him fair sailing.  Here there be sea monsters ;-)

-Mike


Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-19 Thread Steven Rostedt
On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> 
> > Please test the patches too.
> 
> Your hotplug stress test script made x3550 M3 box fall over.  It took a
> bit, but down she went.  64 core test box fell over quickly, but that's
> very far from virgin source.. seems to be the same though.

Thanks for the report. I know a few areas in the hotplug code that can
still deadlock (but are hard to hit). But there's no easy fix for them.
Basically, the only thing we can do is redesign cpu hotplug (I think
someone is already trying to do that ;-).

But these patches do fix the main issues of cpu hotplug (albeit, making
the code even uglier).

The panic below isn't telling much. We really need to know what the
other CPUs were up to. This call trace is just telling us that one of
the CPUs is waiting for other CPUs to stop or to finish something up.

-- Steve


> 
> [  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
> Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
> Call Trace:
>  <NMI>  [814a0f7b] panic+0x9b/0x1b0
>  [810b0627] watchdog_overflow_callback+0xd7/0xe0
>  [810c3dad] __perf_event_overflow+0x9d/0x240
>  [810c066b] ? perf_event_update_userpage+0x9b/0xe0
>  [810c41a4] perf_event_overflow+0x14/0x20
>  [81015707] intel_pmu_handle_irq+0x177/0x230
>  [814a5549] perf_event_nmi_handler+0x39/0xc0
>  [814a727d] notifier_call_chain+0x4d/0x70
>  [814a72e3] __atomic_notifier_call_chain+0x43/0x60
>  [814a7311] atomic_notifier_call_chain+0x11/0x20
>  [814a734e] notify_die+0x2e/0x30
>  [814a4699] default_do_nmi+0x39/0x200
>  [814a4a48] do_nmi+0x78/0x80
>  [814a44d0] nmi+0x20/0x30
>  [810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
>  <<EOE>>  [810a47f4] cpu_stopper_thread+0xf4/0x1d0
>  [810a45b0] ? wait_for_stop_done+0xa0/0xa0
>  [814a1397] ? __schedule+0x2c7/0x630
>  [810a4700] ? cpu_stop_queue_work+0x70/0x70
>  [810a4700] ? cpu_stop_queue_work+0x70/0x70
>  [810702c6] kthread+0xa6/0xb0
>  [81056328] ? do_exit+0x278/0x450
>  [810016b2] ? __switch_to+0xf2/0x370
>  [81040f15] ? finish_task_switch+0x55/0xd0
>  [814aa6e4] kernel_thread_helper+0x4/0x10
>  [81070220] ? __init_kthread_worker+0x50/0x50
>  [814aa6e0] ? gs_change+0x13/0x13
> 
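
Steven's reading of the trace can be reproduced mechanically. The sketch below is illustrative and not part of the original mail: it splits a call trace into the NMI section (the watchdog path that raised the panic) and the interrupted task's own stack, and flags frames printed with "?", which are addresses found on the stack that the unwinder could not confirm as part of the call chain. The names FRAME, SAMPLE and split_trace are hypothetical, and the sample is a shortened excerpt of the trace in this thread.

```python
import re

# One frame: "[addr] symbol+0xoff/0xsize", optionally with "? " before the symbol.
# The address may be empty, since some archive copies strip it.
FRAME = re.compile(r"\[[0-9a-f<>]*\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Shortened excerpt of the trace reported in this thread.
SAMPLE = """\
 <NMI>  [814a0f7b] panic+0x9b/0x1b0
 [810b0627] watchdog_overflow_callback+0xd7/0xe0
 [814a44d0] nmi+0x20/0x30
 [810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
 <<EOE>>  [810a47f4] cpu_stopper_thread+0xf4/0x1d0
 [810a45b0] ? wait_for_stop_done+0xa0/0xa0
 [810702c6] kthread+0xa6/0xb0
""".splitlines()


def split_trace(lines):
    """Return (nmi_frames, task_frames); each frame is (symbol, reliable)."""
    nmi, task = [], []
    in_nmi = False
    for line in lines:
        if "<NMI>" in line:
            in_nmi = True
        # <<EOE>> is printed on the same line as the first task frame.
        if "<<EOE>>" in line:
            in_nmi = False
        match = FRAME.search(line)
        if match is None:
            continue
        frame = (match.group(2), match.group(1) is None)
        (nmi if in_nmi else task).append(frame)
    return nmi, task


if __name__ == "__main__":
    nmi_frames, task_frames = split_trace(SAMPLE)
    print("NMI path:  ", [s for s, ok in nmi_frames if ok])
    print("task stack:", [s for s, ok in task_frames if ok])
    print("unconfirmed:", [s for s, ok in nmi_frames + task_frames if not ok])
```

On this excerpt the only confirmed task frames are cpu_stopper_thread and kthread, which matches the point above: the trace shows the stopper thread on the reporting CPU and nothing about what the other CPUs were doing.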


Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-18 Thread Mike Galbraith
On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:

> Please test the patches too.

Your hotplug stress test script made x3550 M3 box fall over.  It took a
bit, but down she went.  64 core test box fell over quickly, but that's
very far from virgin source.. seems to be the same though.

[  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
Call Trace:
 <NMI>  [814a0f7b] panic+0x9b/0x1b0
 [810b0627] watchdog_overflow_callback+0xd7/0xe0
 [810c3dad] __perf_event_overflow+0x9d/0x240
 [810c066b] ? perf_event_update_userpage+0x9b/0xe0
 [810c41a4] perf_event_overflow+0x14/0x20
 [81015707] intel_pmu_handle_irq+0x177/0x230
 [814a5549] perf_event_nmi_handler+0x39/0xc0
 [814a727d] notifier_call_chain+0x4d/0x70
 [814a72e3] __atomic_notifier_call_chain+0x43/0x60
 [814a7311] atomic_notifier_call_chain+0x11/0x20
 [814a734e] notify_die+0x2e/0x30
 [814a4699] default_do_nmi+0x39/0x200
 [814a4a48] do_nmi+0x78/0x80
 [814a44d0] nmi+0x20/0x30
 [810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
 <<EOE>>  [810a47f4] cpu_stopper_thread+0xf4/0x1d0
 [810a45b0] ? wait_for_stop_done+0xa0/0xa0
 [814a1397] ? __schedule+0x2c7/0x630
 [810a4700] ? cpu_stop_queue_work+0x70/0x70
 [810a4700] ? cpu_stop_queue_work+0x70/0x70
 [810702c6] kthread+0xa6/0xb0
 [81056328] ? do_exit+0x278/0x450
 [810016b2] ? __switch_to+0xf2/0x370
 [81040f15] ? finish_task_switch+0x55/0xd0
 [814aa6e4] kernel_thread_helper+0x4/0x10
 [81070220] ? __init_kthread_worker+0x50/0x50
 [814aa6e0] ? gs_change+0x13/0x13
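
The hotplug stress test script mentioned above is not included in this thread. As a rough idea of what such a test does, here is a minimal sketch, assuming the standard sysfs interface in which writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes a CPU down or brings it back. The function names, the random selection and the round count are hypothetical and are not taken from Steven's script.

```python
import os
import random


def toggle_cpu(cpu_dir, online):
    """Write 1 or 0 to <cpu_dir>/online, the sysfs CPU hotplug control file."""
    with open(os.path.join(cpu_dir, "online"), "w") as f:
        f.write("1" if online else "0")


def stress(sysfs_root="/sys/devices/system/cpu", rounds=100, seed=None):
    """Repeatedly offline and re-online random CPUs; return the number of writes."""
    rng = random.Random(seed)
    # Only CPUs that expose an 'online' file can be hotplugged
    # (the boot CPU often does not expose one).
    cpus = sorted(
        os.path.join(sysfs_root, name)
        for name in os.listdir(sysfs_root)
        if name.startswith("cpu")
        and name[3:].isdigit()
        and os.path.exists(os.path.join(sysfs_root, name, "online"))
    )
    if not cpus:
        return 0
    writes = 0
    for _ in range(rounds):
        cpu = rng.choice(cpus)
        toggle_cpu(cpu, online=False)  # take the CPU down
        toggle_cpu(cpu, online=True)   # and bring it straight back
        writes += 2
    return writes
```

Calling stress() against the real sysfs tree needs root, and as the report above shows, it can lock up the machine, so it belongs on a test box only. Pointing sysfs_root at a scratch directory with the same layout exercises the logic without touching any CPU.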


[PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

2012-07-18 Thread Steven Rostedt

Dear RT Folks,

This is the RT stable review cycle of patch 3.0.36-rt58-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 7/20/2012.

Enjoy,

-- Steve


To build 3.0.36-rt58-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.0/patch-3.0.36.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/patch-3.0.36-rt58-rc1.patch.xz

You can also build from 3.0.36-rt57 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/incr/patch-3.0.36-rt57-rt58-rc1.patch.xz
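
For the direct build route, the order of operations can be sketched as follows. This is only an illustration: the three URLs are the ones listed above, while the use of wget, tar, xzcat and patch -p1, and the linux-3.0 directory name, are assumptions about a typical setup rather than anything stated in the announcement. The script prints the commands instead of running them.

```python
# Sketch only: prints shell commands for the "build directly" route.
BASE = "http://www.kernel.org/pub/linux/kernel"

# Order matters: base tarball, then the stable patch, then the -rt patch.
DOWNLOADS = [
    BASE + "/v3.0/linux-3.0.tar.xz",
    BASE + "/v3.0/patch-3.0.36.xz",
    BASE + "/projects/rt/3.0/patch-3.0.36-rt58-rc1.patch.xz",
]


def build_commands(downloads=DOWNLOADS):
    """Return commands that fetch everything, unpack the tree, and apply the patches in order."""
    names = [url.rsplit("/", 1)[1] for url in downloads]
    tarball, patches = names[0], names[1:]
    cmds = ["wget " + url for url in downloads]
    cmds.append("tar xf " + tarball)
    cmds.append("cd linux-3.0")  # assumed name of the unpacked directory
    for patch in patches:
        # Patches sit one level above the unpacked tree.
        cmds.append("xzcat ../%s | patch -p1" % patch)
    return cmds


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```

Starting from a 3.0.36-rt57 tree, only the incremental patch would be applied, in the same xzcat and patch -p1 fashion.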


Changes from 3.0.36-rt57:

---


Carsten Emde (4):
  Latency histogramms: Cope with backwards running local trace clock
  Latency histograms: Adjust timer, if already elapsed when programmed
  Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
  Latency histograms: Detect another yet overlooked sharedprio condition

Mike Galbraith (1):
  fs, jbd: pull your plug when waiting for space

Steven Rostedt (5):
  cpu/rt: Rework cpu down for PREEMPT_RT
  cpu/rt: Fix cpu_hotplug variable initialization
  workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse
  workqueue: Revert workqueue: Fix cpuhotplug trainwreck
  Linux 3.0.36-rt58-rc1

Thomas Gleixner (1):
  slab: Prevent local lock deadlock

Yong Zhang (1):
  perf: Make swevent hrtimer run in irq instead of softirq


 fs/jbd/checkpoint.c         |    2 +
 include/linux/cpu.h         |   14 +-
 include/linux/hrtimer.h     |    3 +
 include/linux/sched.h       |    9 +-
 include/linux/workqueue.h   |    5 +-
 init/Kconfig                |    1 +
 kernel/cpu.c                |  240 ++
 kernel/events/core.c        |    1 +
 kernel/hrtimer.c            |   16 +-
 kernel/sched.c              |   82 +-
 kernel/trace/latency_hist.c |   74 +++---
 kernel/workqueue.c          |  578 +++
 localversion-rt             |    2 +-
 mm/slab.c                   |   26 +-
 14 files changed, 792 insertions(+), 261 deletions(-)