Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
On Thu, 2012-07-19 at 10:02 -0400, Steven Rostedt wrote:
> On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:
>
> > Every kernel I've fed your script to has died sooner or later, so I wish
> > him fair sailing.  Here there be sea monsters ;-)
>
> I'm curious. Can my script bring down a non-rt kernel?

Yeah, it took a couple non-rt (and virgin) kernels down on my 64 core
box.

-Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:

> Every kernel I've fed your script to has died sooner or later, so I wish
> him fair sailing.  Here there be sea monsters ;-)

I'm curious. Can my script bring down a non-rt kernel?

-- Steve
Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
On Thu, 2012-07-19 at 09:05 -0400, Steven Rostedt wrote:
> On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> > On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> >
> > > Please test the patches too.
> >
> > Your hotplug stress test script made x3550 M3 box fall over.  It took a
> > bit, but down she went.  64 core test box fell over quickly, but that's
> > very far from virgin source.. seems to be the same though.
>
> Thanks for the report. I know a few areas in the hotplug code that can
> still deadlock (but are hard to hit). But there's no easy fix for them.
> Basically, the only thing we can do is redesign cpu hotplug (I think
> someone is already trying to do that ;-).

Every kernel I've fed your script to has died sooner or later, so I wish
him fair sailing.  Here there be sea monsters ;-)

-Mike
Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
>
> > Please test the patches too.
>
> Your hotplug stress test script made x3550 M3 box fall over.  It took a
> bit, but down she went.  64 core test box fell over quickly, but that's
> very far from virgin source.. seems to be the same though.

Thanks for the report. I know a few areas in the hotplug code that can
still deadlock (but are hard to hit). But there's no easy fix for them.
Basically, the only thing we can do is redesign cpu hotplug (I think
someone is already trying to do that ;-).

But these patches do fix the main issues of cpu hotplug (albeit, making
the code even uglier).

The panic below isn't telling much. We really need to know what the
other CPUs were up to. This call trace is just telling us that one of
the CPUs is waiting for other CPUs to stop or to finish something up.

-- Steve

> [  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
> Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
> Call Trace:
>  <NMI>  [814a0f7b] panic+0x9b/0x1b0
>  [810b0627] watchdog_overflow_callback+0xd7/0xe0
>  [810c3dad] __perf_event_overflow+0x9d/0x240
>  [810c066b] ? perf_event_update_userpage+0x9b/0xe0
>  [810c41a4] perf_event_overflow+0x14/0x20
>  [81015707] intel_pmu_handle_irq+0x177/0x230
>  [814a5549] perf_event_nmi_handler+0x39/0xc0
>  [814a727d] notifier_call_chain+0x4d/0x70
>  [814a72e3] __atomic_notifier_call_chain+0x43/0x60
>  [814a7311] atomic_notifier_call_chain+0x11/0x20
>  [814a734e] notify_die+0x2e/0x30
>  [814a4699] default_do_nmi+0x39/0x200
>  [814a4a48] do_nmi+0x78/0x80
>  [814a44d0] nmi+0x20/0x30
>  [810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
>  <<EOE>>  [810a47f4] cpu_stopper_thread+0xf4/0x1d0
>  [810a45b0] ? wait_for_stop_done+0xa0/0xa0
>  [814a1397] ? __schedule+0x2c7/0x630
>  [810a4700] ? cpu_stop_queue_work+0x70/0x70
>  [810a4700] ? cpu_stop_queue_work+0x70/0x70
>  [810702c6] kthread+0xa6/0xb0
>  [81056328] ? do_exit+0x278/0x450
>  [810016b2] ? __switch_to+0xf2/0x370
>  [81040f15] ? finish_task_switch+0x55/0xd0
>  [814aa6e4] kernel_thread_helper+0x4/0x10
>  [81070220] ? __init_kthread_worker+0x50/0x50
>  [814aa6e0] ? gs_change+0x13/0x13
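
Archive note: Steve's reading of the trace can be checked by splitting the
frames into those the unwinder trusts and those marked "?", which are only
addresses found on the stack and may be stale. The sketch below is an
illustration only, not a tool used by anyone in this thread. The names
TRACE, FRAME, parse_frames and reliable_symbols are hypothetical, and the
excerpt copies the addresses as they appear in the archive.

```python
import re

# Excerpt of the trace quoted above, task side plus the last NMI frame.
TRACE = """\
[814a44d0] nmi+0x20/0x30
[810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
[810a47f4] cpu_stopper_thread+0xf4/0x1d0
[810a45b0] ? wait_for_stop_done+0xa0/0xa0
[810702c6] kthread+0xa6/0xb0
[814aa6e4] kernel_thread_helper+0x4/0x10
"""

# One frame: "[addr] [? ]symbol+offset/size"
FRAME = re.compile(r"\[(?P<addr>[0-9a-f]+)\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+")


def parse_frames(text):
    """Return (symbol, reliable) pairs; frames marked '?' are not reliable."""
    frames = []
    for line in text.splitlines():
        m = FRAME.search(line)
        if m:
            frames.append((m.group("sym"), m.group("guess") is None))
    return frames


def reliable_symbols(text):
    """Return only the symbols the unwinder did not mark with '?'."""
    return [sym for sym, ok in parse_frames(text) if ok]


if __name__ == "__main__":
    for sym, ok in parse_frames(TRACE):
        print(("  " if ok else "? ") + sym)
```

On the task side the trusted frames are cpu_stopper_thread, kthread and
kernel_thread_helper, which matches the "comm: migration/7" line: the
stopper thread on cpu 7 was running when the watchdog NMI fired. That says
nothing about what the other CPUs were doing, which is Steve's point.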
[PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
Dear RT Folks,

This is the RT stable review cycle of patch 3.0.36-rt58-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 7/20/2012.

Enjoy,

-- Steve


To build 3.0.36-rt58-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.0/patch-3.0.36.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/patch-3.0.36-rt58-rc1.patch.xz

You can also build from 3.0.36-rt57 by applying the incremental patch:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/incr/patch-3.0.36-rt57-rt58-rc1.patch.xz


Changes from 3.0.36-rt57:

---

Carsten Emde (4):
      Latency histogramms: Cope with backwards running local trace clock
      Latency histograms: Adjust timer, if already elapsed when programmed
      Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
      Latency histograms: Detect another yet overlooked sharedprio condition

Mike Galbraith (1):
      fs, jbd: pull your plug when waiting for space

Steven Rostedt (5):
      cpu/rt: Rework cpu down for PREEMPT_RT
      cpu/rt: Fix cpu_hotplug variable initialization
      workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse
      workqueue: Revert workqueue: Fix cpuhotplug trainwreck
      Linux 3.0.36-rt58-rc1

Thomas Gleixner (1):
      slab: Prevent local lock deadlock

Yong Zhang (1):
      perf: Make swevent hrtimer run in irq instead of softirq

 fs/jbd/checkpoint.c         |    2 +
 include/linux/cpu.h         |   14 +-
 include/linux/hrtimer.h     |    3 +
 include/linux/sched.h       |    9 +-
 include/linux/workqueue.h   |    5 +-
 init/Kconfig                |    1 +
 kernel/cpu.c                |  240 ++
 kernel/events/core.c        |    1 +
 kernel/hrtimer.c            |   16 +-
 kernel/sched.c              |   82 +-
 kernel/trace/latency_hist.c |   74 +++---
 kernel/workqueue.c          |  578 +++
 localversion-rt             |    2 +-
 mm/slab.c                   |   26 +-
 14 files changed, 792 insertions(+), 261 deletions(-)
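
Archive note: the build instructions above amount to unpacking the 3.0
tarball and applying the two patches in the order listed. The sketch below
only prints the commands and runs nothing. It is an illustration and not
part of Steve's announcement: the function name build_steps, the use of
wget, and the "xzcat ... | patch -p1" invocation are assumptions. The -rc
patch was announced as temporary, so its URL may no longer resolve.

```python
from posixpath import basename

# URLs exactly as listed in the announcement.
BASE = "http://www.kernel.org/pub/linux/kernel"
TARBALL = BASE + "/v3.0/linux-3.0.tar.xz"
PATCHES = [
    BASE + "/v3.0/patch-3.0.36.xz",                            # stable update first
    BASE + "/projects/rt/3.0/patch-3.0.36-rt58-rc1.patch.xz",  # then the RT patch
]


def build_steps(tarball=TARBALL, patches=PATCHES, fetch="wget"):
    """Return the shell commands, in order, as plain strings (nothing is run)."""
    steps = [f"{fetch} {url}" for url in [tarball, *patches]]
    steps.append(f"tar xf {basename(tarball)}")
    # Assumption: the tarball unpacks into a directory named after itself.
    tree = basename(tarball)[: -len(".tar.xz")]
    steps.append(f"cd {tree}")
    # Patches are applied from inside the tree, in the order given.
    steps += [f"xzcat ../{basename(p)} | patch -p1" for p in patches]
    return steps


if __name__ == "__main__":
    print("\n".join(build_steps()))
```

To start from an existing 3.0.36-rt57 tree instead, the same pattern would
apply with the single incremental patch in place of the two patches.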
Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:

> Please test the patches too.

Your hotplug stress test script made x3550 M3 box fall over.  It took a
bit, but down she went.  64 core test box fell over quickly, but that's
very far from virgin source.. seems to be the same though.

[  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
Call Trace:
 <NMI>  [814a0f7b] panic+0x9b/0x1b0
 [810b0627] watchdog_overflow_callback+0xd7/0xe0
 [810c3dad] __perf_event_overflow+0x9d/0x240
 [810c066b] ? perf_event_update_userpage+0x9b/0xe0
 [810c41a4] perf_event_overflow+0x14/0x20
 [81015707] intel_pmu_handle_irq+0x177/0x230
 [814a5549] perf_event_nmi_handler+0x39/0xc0
 [814a727d] notifier_call_chain+0x4d/0x70
 [814a72e3] __atomic_notifier_call_chain+0x43/0x60
 [814a7311] atomic_notifier_call_chain+0x11/0x20
 [814a734e] notify_die+0x2e/0x30
 [814a4699] default_do_nmi+0x39/0x200
 [814a4a48] do_nmi+0x78/0x80
 [814a44d0] nmi+0x20/0x30
 [810a461a] ? stop_machine_cpu_stop+0x6a/0xe0
 <<EOE>>  [810a47f4] cpu_stopper_thread+0xf4/0x1d0
 [810a45b0] ? wait_for_stop_done+0xa0/0xa0
 [814a1397] ? __schedule+0x2c7/0x630
 [810a4700] ? cpu_stop_queue_work+0x70/0x70
 [810a4700] ? cpu_stop_queue_work+0x70/0x70
 [810702c6] kthread+0xa6/0xb0
 [81056328] ? do_exit+0x278/0x450
 [810016b2] ? __switch_to+0xf2/0x370
 [81040f15] ? finish_task_switch+0x55/0xd0
 [814aa6e4] kernel_thread_helper+0x4/0x10
 [81070220] ? __init_kthread_worker+0x50/0x50
 [814aa6e0] ? gs_change+0x13/0x13
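
Archive note: Steve's hotplug stress script is not shown anywhere in this
thread. For readers unfamiliar with the idea, a CPU hotplug stress test
typically takes CPUs offline and back online in a loop through the sysfs
"online" files. The sketch below is a hypothetical stand-in and not his
script: hotplug_plan and run are made-up names, and the shuffle and the
seed are arbitrary choices. Building the plan is harmless. Applying it
with run() needs root and, as this thread shows, may hang or panic the box.

```python
import random

# Standard sysfs control file for taking a CPU offline ("0") or online ("1").
SYSFS_CPU = "/sys/devices/system/cpu/cpu{n}/online"


def hotplug_plan(cpus, rounds, seed=0):
    """Return (path, value) writes that take each CPU offline, then online again."""
    rng = random.Random(seed)
    plan = []
    for _ in range(rounds):
        order = list(cpus)
        rng.shuffle(order)  # vary the order from round to round
        for n in order:
            path = SYSFS_CPU.format(n=n)
            plan.append((path, "0"))
            plan.append((path, "1"))
    return plan


def run(plan):
    """Apply the writes. Needs root; may hang or panic the machine."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run only: print what would be written for CPUs 1..7.
    for path, value in hotplug_plan(range(1, 8), rounds=1):
        print(f"echo {value} > {path}")
```

CPU 0 is left out of the example range because it often cannot be taken
offline on x86.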