Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On Fri, Nov 01, 2013 at 01:44:24PM +0000, Mel Gorman wrote:
> Ok, I see your point now but still wonder if this is too specialised
> for what we are trying to do. Could it have been done with a read-write
> semaphore with the global stop_cpus taking it for write and stop_two_cpus
> taking it for read?

rwsem for read is still global state.. That said it should be fairly
easy to use lglock for this.

Or write it with spinlocks like:

DEFINE_PER_CPU(spinlock_t, local_lock);
DEFINE_PER_CPU(int, have_global);
spinlock_t global_lock;

void local_lock(void)
{
	preempt_disable();
	spin_lock(this_cpu_ptr(&local_lock));
	if (spin_is_locked(&global_lock)) {
		spin_unlock(this_cpu_ptr(&local_lock));
		spin_lock(&global_lock);
		this_cpu_write(have_global, true);
		spin_lock(this_cpu_ptr(&local_lock));
	}
}

void local_unlock(void)
{
	spin_unlock(this_cpu_ptr(&local_lock));
	if (this_cpu_read(have_global)) {
		this_cpu_write(have_global, false);
		spin_unlock(&global_lock);
	}
}

void global_lock(void)
{
	int cpu;
	spin_lock(&global_lock);
	for_each_possible_cpu(cpu)
		spin_unlock_wait(&per_cpu(local_lock, cpu));
}

void global_unlock(void)
{
	spin_unlock(&global_lock);
}

Or possibly make the global_lock a mutex, plenty variants possible.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On 11/01/2013 10:24 AM, Peter Zijlstra wrote:
> On Fri, Nov 01, 2013 at 01:44:24PM +0000, Mel Gorman wrote:
>> Ok, I see your point now but still wonder if this is too specialised
>> for what we are trying to do. Could it have been done with a read-write
>> semaphore with the global stop_cpus taking it for write and stop_two_cpus
>> taking it for read?
>
> rwsem for read is still global state.. That said it should be fairly
> easy to use lglock for this.

I'll rewrite the patch using lglocks, that should make things
more readable.

--
All rights reversed
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On Fri, Nov 01, 2013 at 07:36:36AM -0400, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
> > On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> >> There is a race between stop_two_cpus, and the global stop_cpus.
> >>
> >
> > What was the trigger for this? I want to see what was missing from my own
> > testing. I'm going to go out on a limb and guess that CPU hotplug was also
> > running in the background to specifically stress this sort of rare
> > condition. Something like running a standard test with the
> > monitors/watch-cpuoffline.sh from mmtests running in parallel.
>
> AFAIK the trigger was a test that continuously loads and
> unloads kernel modules, while doing other stuff.
>

ok, thanks.

> >> + wait_for_global:
> >> +	/* If a global stop_cpus is queuing up stoppers, wait. */
> >> +	while (unlikely(stop_cpus_queueing))
> >> +		cpu_relax();
> >> +
> >
> > This partially serialises callers to migrate_swap() while it is checked
> > if the pair of CPUs are being affected at the moment. It's two-stage
>
> Not really. This only serializes migrate_swap if there is a global
> stop_cpus underway.
>

Ok, I see your point now but still wonder if this is too specialised
for what we are trying to do. Could it have been done with a read-write
semaphore with the global stop_cpus taking it for write and stop_two_cpus
taking it for read?

--
Mel Gorman
SUSE Labs
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On 11/01/2013 07:36 AM, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
>> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>>> There is a race between stop_two_cpus, and the global stop_cpus.
>>>
>>
>> What was the trigger for this? I want to see what was missing from my own
>> testing. I'm going to go out on a limb and guess that CPU hotplug was also
>> running in the background to specifically stress this sort of rare condition.
>> Something like running a standard test with the monitors/watch-cpuoffline.sh
>> from mmtests running in parallel.
>
> AFAIK the trigger was a test that continuously loads and
> unloads kernel modules, while doing other stuff.

With this patch in place the module load/unload test ran for ~16 hours
without failure. Without the patch the test usually fails in 5-10 minutes.

Tested-by: Prarit Bhargava pra...@redhat.com

P.
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
>
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb and guess that CPU hotplug was also
> running in the background to specifically stress this sort of rare condition.
> Something like running a standard test with the monitors/watch-cpuoffline.sh
> from mmtests running in parallel.
>

I have a test that loads and unloads each module in /lib/modules/3.*/...
Each run typically takes a few minutes. After running 4-5 times, the
system issues a soft lockup warning with a CPU in multi_cpu_stop().

Unfortunately, kdump isn't working on this particular system (due to
another bug) so I modified the code with (sorry for the cut-and-paste):

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 05039e3..4a8c9f9 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -323,8 +323,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtime
 		else
 			dump_stack();
 
-		if (softlockup_panic)
+		if (softlockup_panic) {
+			show_state();
 			panic("softlockup: hung tasks");
+		}
 		__this_cpu_write(soft_watchdog_warn, true);
 	} else
 		__this_cpu_write(soft_watchdog_warn, false);

and then 'echo 1 > /proc/sys/kernel/softlockup_panic' to get a full
trace of all tasks. When I did this and ran the kernel module load
unload test ...
[prarit@prarit tmp]$ cat /tmp/intel.log | grep RIP
[  678.081168] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.156180] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.230190] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.244186] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.259194] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.274192] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.288195] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.303197] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.318200] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.333203] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.349206] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.364208] RIP: 0010:[<810d328b>]  [<810d328b>] multi_cpu_stop+0x7b/0xf0
[  678.379211] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.394212] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.409215] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.424217] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.438219] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.452221] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.466228] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.481228] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.496230] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.511234] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.526236] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.541238] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.556244] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.571243] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.586247] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0
[  678.601248] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.616251] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  678.632254] RIP: 0010:[<810d328b>]  [<810d328b>] multi_cpu_stop+0x7b/0xf0
[  678.647257] RIP: 0010:[<810d3292>]  [<810d3292>] multi_cpu_stop+0x82/0xf0
[  687.570464] RIP: 0010:[<810d3296>]  [<810d3296>] multi_cpu_stop+0x86/0xf0

and,

[prarit@prarit tmp]$ cat /tmp/intel.log | grep RIP | wc -l
32

which shows all 32 cpus are "correctly" in the cpu stop threads. After
some investigation, Rik came up with his patch.
Hope this explains things,

P.
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
>
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb and guess that CPU hotplug was also
> running in the background to specifically stress this sort of rare condition.
> Something like running a standard test with the monitors/watch-cpuoffline.sh
> from mmtests running in parallel.

AFAIK the trigger was a test that continuously loads and
unloads kernel modules, while doing other stuff.

>> It is possible for two CPUs to get their stopper functions queued
>> "backwards" from one another, resulting in the stopper threads
>> getting stuck, and the system hanging. This can happen because
>> queuing up stoppers is not synchronized.
>>
>> This patch adds synchronization between stop_cpus (a rare operation),
>> and stop_two_cpus.
>>
>> Signed-off-by: Rik van Riel r...@redhat.com
>> ---
>> Prarit is running a test with this patch. By now the kernel would have
>> crashed already, yet it is still going. I expect Prarit will add his
>> Tested-by: some time tomorrow morning.
>>
>>  kernel/stop_machine.c | 43 ++-
>>  1 file changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 32a6c44..46cb4c2 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -40,8 +40,10 @@ struct cpu_stopper {
>>  };
>>
>>  static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
>> +static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
>>  static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
>>  static bool stop_machine_initialized = false;
>> +static bool stop_cpus_queueing = false;
>>
>>  static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
>>  {
>> @@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>>  	cpu_stop_init_done(&done, 2);
>>  	set_state(&msdata, MULTI_STOP_PREPARE);
>>
>> + wait_for_global:
>> +	/* If a global stop_cpus is queuing up stoppers, wait. */
>> +	while (unlikely(stop_cpus_queueing))
>> +		cpu_relax();
>> +
>
> This partially serialises callers to migrate_swap() while it is checked
> if the pair of CPUs are being affected at the moment. It's two-stage

Not really. This only serializes migrate_swap if there is a global
stop_cpus underway.

If there is no global stop_cpus, migrate_swap will continue the
way it did before, without locking.

> locking. The global lock is short-lived while the per-cpu data is updated
> and the per-cpu values allow a degree of parallelisation on call_cpu which
> could not be done with a spinlock held anyway. Why not make protection
> of the initial update a normal spinlock? i.e.
>
>   spin_lock(&stop_cpus_queue_lock);
>   this_cpu_write(stop_two_cpus_queueing, true);
>   spin_unlock(&stop_cpus_queue_lock);

Because that would result in all migrate_swap instances serializing
with each other.

--
All rights reversed
Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> There is a race between stop_two_cpus, and the global stop_cpus.
>

What was the trigger for this? I want to see what was missing from my own
testing. I'm going to go out on a limb and guess that CPU hotplug was also
running in the background to specifically stress this sort of rare
condition. Something like running a standard test with the
monitors/watch-cpuoffline.sh from mmtests running in parallel.

> It is possible for two CPUs to get their stopper functions queued
> "backwards" from one another, resulting in the stopper threads
> getting stuck, and the system hanging. This can happen because
> queuing up stoppers is not synchronized.
>
> This patch adds synchronization between stop_cpus (a rare operation),
> and stop_two_cpus.
>
> Signed-off-by: Rik van Riel r...@redhat.com
> ---
> Prarit is running a test with this patch. By now the kernel would have
> crashed already, yet it is still going. I expect Prarit will add his
> Tested-by: some time tomorrow morning.
>
>  kernel/stop_machine.c | 43 ++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 32a6c44..46cb4c2 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -40,8 +40,10 @@ struct cpu_stopper {
>  };
>
>  static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
> +static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
>  static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
>  static bool stop_machine_initialized = false;
> +static bool stop_cpus_queueing = false;
>
>  static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
>  {
> @@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>  	cpu_stop_init_done(&done, 2);
>  	set_state(&msdata, MULTI_STOP_PREPARE);
>
> + wait_for_global:
> +	/* If a global stop_cpus is queuing up stoppers, wait. */
> +	while (unlikely(stop_cpus_queueing))
> +		cpu_relax();
> +

This partially serialises callers to migrate_swap() while it is checked
if the pair of CPUs are being affected at the moment. It's two-stage
locking. The global lock is short-lived while the per-cpu data is updated
and the per-cpu values allow a degree of parallelisation on call_cpu which
could not be done with a spinlock held anyway. Why not make protection
of the initial update a normal spinlock? i.e.

  spin_lock(&stop_cpus_queue_lock);
  this_cpu_write(stop_two_cpus_queueing, true);
  spin_unlock(&stop_cpus_queue_lock);

and get rid of the barriers and goto wait_for_global loop entirely? I'm
not seeing the hidden advantage. The this_cpu_write(stop_two_cpus_queueing,
false) would also need to be within the lock as would the checks in
queue_stop_cpus_work. The locks look bad but it's not clear to me why
the barriers and retries are better.

--
Mel Gorman
SUSE Labs
[PATCH -tip] fix race between stop_two_cpus and stop_cpus
There is a race between stop_two_cpus, and the global stop_cpus.

It is possible for two CPUs to get their stopper functions queued
"backwards" from one another, resulting in the stopper threads getting
stuck, and the system hanging. This can happen because queuing up
stoppers is not synchronized.

This patch adds synchronization between stop_cpus (a rare operation),
and stop_two_cpus.

Signed-off-by: Rik van Riel <r...@redhat.com>
---
Prarit is running a test with this patch. By now the kernel would have
crashed already, yet it is still going. I expect Prarit will add his
Tested-by: some time tomorrow morning.

 kernel/stop_machine.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 32a6c44..46cb4c2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -40,8 +40,10 @@ struct cpu_stopper {
 };
 
 static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
 static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
 static bool stop_machine_initialized = false;
+static bool stop_cpus_queueing = false;
 
 static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
 {
@@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	cpu_stop_init_done(&done, 2);
 	set_state(&msdata, MULTI_STOP_PREPARE);
 
+ wait_for_global:
+	/* If a global stop_cpus is queuing up stoppers, wait. */
+	while (unlikely(stop_cpus_queueing))
+		cpu_relax();
+
+	/* This CPU is queuing up stoppers. */
+	preempt_disable();
+	this_cpu_write(stop_two_cpus_queueing, true);
+	smp_mb();	/* matches the smp_wmb in queue_stop_cpus_work */
+
+	/* Global stop_cpus got busy simultaneously. Wait and retry. */
+	if (unlikely(stop_cpus_queueing)) {
+		smp_mb();	/* matches the smp_wmb in queue_stop_cpus_work */
+		this_cpu_write(stop_two_cpus_queueing, false);
+		preempt_enable();
+		goto wait_for_global;
+	}
+
 	/*
 	 * Queuing needs to be done by the lowest numbered CPU, to ensure
 	 * that works are always queued in the same order on every CPU.
 	 * This prevents deadlocks.
 	 */
 	call_cpu = min(cpu1, cpu2);
-
 	smp_call_function_single(call_cpu, &irq_cpu_stop_queue_work,
 				 &call_args, 0);
+	smp_wmb();	/* matches the smp_mb in wait_on_stop_two_cpus */
+	this_cpu_write(stop_two_cpus_queueing, false);
+	preempt_enable();
+
 	wait_for_completion(&done.completion);
 
 	return done.executed ? done.ret : -ENOENT;
 }
@@ -295,6 +318,19 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 	cpu_stop_queue_work(cpu, work_buf);
 }
 
+static void wait_on_stop_two_cpus(const struct cpumask *cpumask)
+{
+	int cpu;
+
+	/* Do not reorder reads before this point */
+	smp_mb();	/* matches the smp_wmb in stop_two_cpus */
+
+	/* Wait until no stop_two_cpus stopper tasks are being queued */
+	for_each_cpu(cpu, cpumask)
+		while (per_cpu(stop_two_cpus_queueing, cpu) == true)
+			cpu_relax();
+}
+
 /* static data for stop_cpus */
 static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
@@ -320,8 +356,12 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
 	 * to enter @fn which can lead to deadlock.
 	 */
 	preempt_disable();
+	stop_cpus_queueing = true;
+	wait_on_stop_two_cpus(cpumask);
 	for_each_cpu(cpu, cpumask)
 		cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
+	smp_wmb();	/* matches the smp_mb in stop_two_cpus */
+	stop_cpus_queueing = false;
 	preempt_enable();
 }
 
@@ -509,6 +549,7 @@ static int __init cpu_stop_init(void)
 
 		spin_lock_init(&stopper->lock);
 		INIT_LIST_HEAD(&stopper->works);
+		per_cpu(stop_two_cpus_queueing, cpu) = false;
 	}
 
 	BUG_ON(smpboot_register_percpu_thread(&cpu_stop_threads));