Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Peter Zijlstra
On Fri, Nov 01, 2013 at 01:44:24PM +0000, Mel Gorman wrote:
> Ok, I see your point now but still wonder if this is too specialised
> for what we are trying to do.  Could it have been done with a read-write
> semaphore with the global stop_cpus taking it for write and stop_two_cpus
> taking it for read?

rwsem for read is still global state.. That said it should be fairly
easy to use lglock for this.

Or write it with spinlocks like:

DEFINE_PER_CPU(spinlock_t, local_lock);
DEFINE_PER_CPU(int, have_global);
spinlock_t global_lock;

void local_lock(void)
{
	preempt_disable();
	spin_lock(this_cpu_ptr(&local_lock));
	if (spin_is_locked(&global_lock)) {
		spin_unlock(this_cpu_ptr(&local_lock));
		spin_lock(&global_lock);
		this_cpu_write(have_global, true);
		spin_lock(this_cpu_ptr(&local_lock));
	}
}

void local_unlock(void)
{
	spin_unlock(this_cpu_ptr(&local_lock));
	if (this_cpu_read(have_global)) {
		this_cpu_write(have_global, false);
		spin_unlock(&global_lock);
	}
}

void global_lock(void)
{
	int cpu;

	spin_lock(&global_lock);
	for_each_possible_cpu(cpu)
		spin_unlock_wait(&per_cpu(local_lock, cpu));
}

void global_unlock(void)
{
	spin_unlock(&global_lock);
}

Or possibly make the global_lock a mutex, plenty variants possible.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Rik van Riel
On 11/01/2013 10:24 AM, Peter Zijlstra wrote:
> On Fri, Nov 01, 2013 at 01:44:24PM +0000, Mel Gorman wrote:
>> Ok, I see your point now but still wonder if this is too specialised
>> for what we are trying to do.  Could it have been done with a read-write
>> semaphore with the global stop_cpus taking it for write and stop_two_cpus
>> taking it for read?
> 
> rwsem for read is still global state.. That said it should be fairly
> easy to use lglock for this.

I'll rewrite the patch using lglocks, that should make things
more readable.

-- 
All rights reversed


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Mel Gorman
On Fri, Nov 01, 2013 at 07:36:36AM -0400, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
> > On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> >> There is a race between stop_two_cpus, and the global stop_cpus.
> >>
> > 
> > What was the trigger for this? I want to see what was missing from my own
> > testing. I'm going to go out on a limb and guess that CPU hotplug was also
> > running in the background to specifically stress this sort of rare 
> > condition.
> > Something like running a standard test with the monitors/watch-cpuoffline.sh
> > from mmtests running in parallel.
> 
> AFAIK the trigger was a test that continuously loads and
> unloads kernel modules, while doing other stuff.
> 

ok, thanks.

> >> + wait_for_global:
> >> +  /* If a global stop_cpus is queuing up stoppers, wait. */
> >> +  while (unlikely(stop_cpus_queueing))
> >> +  cpu_relax();
> >> +
> > 
> > This partially serialises callers to migrate_swap() while it is checked
> > if the pair of CPUs are being affected at the moment. It's two-stage
> 
> Not really. This only serializes migrate_swap if there is a global
> stop_cpus underway.
> 

Ok, I see your point now but still wonder if this is too specialised
for what we are trying to do.  Could it have been done with a read-write
semaphore with the global stop_cpus taking it for write and stop_two_cpus
taking it for read?

-- 
Mel Gorman
SUSE Labs


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Prarit Bhargava


On 11/01/2013 07:36 AM, Rik van Riel wrote:
> On 11/01/2013 07:08 AM, Mel Gorman wrote:
>> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>>> There is a race between stop_two_cpus, and the global stop_cpus.
>>>
>>
>> What was the trigger for this? I want to see what was missing from my own
>> testing. I'm going to go out on a limb and guess that CPU hotplug was also
>> running in the background to specifically stress this sort of rare condition.
>> Something like running a standard test with the monitors/watch-cpuoffline.sh
>> from mmtests running in parallel.
> 
> AFAIK the trigger was a test that continuously loads and
> unloads kernel modules, while doing other stuff.
>

With this patch in place the module load/unload test ran for ~16 hours without
failure.  Without the patch the test usually fails in 5-10 minutes.

Tested-by: Prarit Bhargava <pra...@redhat.com>

P.


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Prarit Bhargava


On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
> 
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb and guess that CPU hotplug was also
> running in the background to specifically stress this sort of rare condition.
> Something like running a standard test with the monitors/watch-cpuoffline.sh
> from mmtests running in parallel.
> 

I have a test that loads and unloads each module in /lib/modules/3.*/...

Each run typically takes a few minutes.  After running 4-5 times, the system
issues a soft lockup warning with a CPU in multi_cpu_stop().  Unfortunately,
kdump isn't working on this particular system (due to another bug) so I modified
the code with (sorry for the cut-and-paste):

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 05039e3..4a8c9f9 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -323,8 +323,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtime
else
dump_stack();

-   if (softlockup_panic)
+   if (softlockup_panic) {
+   show_state();
panic("softlockup: hung tasks");
+   }
__this_cpu_write(soft_watchdog_warn, true);
} else
__this_cpu_write(soft_watchdog_warn, false);

and then 'echo 1 > /proc/sys/kernel/softlockup_panic' to get a full trace of all
tasks.

When I did this and ran the kernel module load unload test ...

[prarit@prarit tmp]$ cat /tmp/intel.log | grep RIP
[  678.081168] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.156180] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.230190] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.244186] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.259194] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.274192] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.288195] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.303197] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.318200] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.333203] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.349206] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.364208] RIP: 0010:[<ffffffff810d328b>]  [<ffffffff810d328b>] multi_cpu_stop+0x7b/0xf0
[  678.379211] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.394212] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.409215] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.424217] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.438219] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.452221] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.466228] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.481228] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.496230] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.511234] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.526236] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.541238] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.556244] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.571243] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.586247] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0
[  678.601248] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.616251] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  678.632254] RIP: 0010:[<ffffffff810d328b>]  [<ffffffff810d328b>] multi_cpu_stop+0x7b/0xf0
[  678.647257] RIP: 0010:[<ffffffff810d3292>]  [<ffffffff810d3292>] multi_cpu_stop+0x82/0xf0
[  687.570464] RIP: 0010:[<ffffffff810d3296>]  [<ffffffff810d3296>] multi_cpu_stop+0x86/0xf0

and,

[prarit@prarit tmp]$ cat /tmp/intel.log | grep RIP | wc -l
32

which shows all 32 cpus are "correctly" in the cpu stop threads.  After some
investigation, Rik came up with his patch.

Hope this explains things,

P.


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Rik van Riel
On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus, and the global stop_cpus.
>>
> 
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb and guess that CPU hotplug was also
> running in the background to specifically stress this sort of rare condition.
> Something like running a standard test with the monitors/watch-cpuoffline.sh
> from mmtests running in parallel.

AFAIK the trigger was a test that continuously loads and
unloads kernel modules, while doing other stuff.

>> It is possible for two CPUs to get their stopper functions queued
>> "backwards" from one another, resulting in the stopper threads
>> getting stuck, and the system hanging. This can happen because
>> queuing up stoppers is not synchronized.
>>
>> This patch adds synchronization between stop_cpus (a rare operation),
>> and stop_two_cpus.
>>
>> Signed-off-by: Rik van Riel <r...@redhat.com>
>> ---
>> Prarit is running a test with this patch. By now the kernel would have
>> crashed already, yet it is still going. I expect Prarit will add his
>> Tested-by: some time tomorrow morning.
>>
>>  kernel/stop_machine.c | 43 ++-
>>  1 file changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 32a6c44..46cb4c2 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -40,8 +40,10 @@ struct cpu_stopper {
>>  };
>>  
>>  static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
>> +static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
>>  static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
>>  static bool stop_machine_initialized = false;
>> +static bool stop_cpus_queueing = false;
>>  
>>  static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int 
>> nr_todo)
>>  {
>> @@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int 
>> cpu2, cpu_stop_fn_t fn, void *
>>  cpu_stop_init_done(&done, 2);
>>  set_state(&msdata, MULTI_STOP_PREPARE);
>>  
>> + wait_for_global:
>> +/* If a global stop_cpus is queuing up stoppers, wait. */
>> +while (unlikely(stop_cpus_queueing))
>> +cpu_relax();
>> +
> 
> This partially serialises callers to migrate_swap() while it is checked
> if the pair of CPUs are being affected at the moment. It's two-stage

Not really. This only serializes migrate_swap if there is a global
stop_cpus underway.

If there is no global stop_cpus, migrate_swap will continue the way
it did before, without locking.

> locking. The global lock is short-lived while the per-cpu data is updated
> and the per-cpu values allow a degree of parallelisation on call_cpu which
> could not be done with a spinlock held anyway.  Why not make protection
> of the initial update a normal spinlock? i.e.
> 
> spin_lock(&stop_cpus_queue_lock);
> this_cpu_write(stop_two_cpus_queueing, true);
> spin_unlock(&stop_cpus_queue_lock);

Because that would result in all migrate_swap instances serializing
with each other.

-- 
All rights reversed


Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-11-01 Thread Mel Gorman
On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
> There is a race between stop_two_cpus, and the global stop_cpus.
> 

What was the trigger for this? I want to see what was missing from my own
testing. I'm going to go out on a limb and guess that CPU hotplug was also
running in the background to specifically stress this sort of rare condition.
Something like running a standard test with the monitors/watch-cpuoffline.sh
from mmtests running in parallel.

> It is possible for two CPUs to get their stopper functions queued
> "backwards" from one another, resulting in the stopper threads
> getting stuck, and the system hanging. This can happen because
> queuing up stoppers is not synchronized.
> 
> This patch adds synchronization between stop_cpus (a rare operation),
> and stop_two_cpus.
> 
> Signed-off-by: Rik van Riel <r...@redhat.com>
> ---
> Prarit is running a test with this patch. By now the kernel would have
> crashed already, yet it is still going. I expect Prarit will add his
> Tested-by: some time tomorrow morning.
> 
>  kernel/stop_machine.c | 43 ++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 32a6c44..46cb4c2 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -40,8 +40,10 @@ struct cpu_stopper {
>  };
>  
>  static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
> +static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
>  static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
>  static bool stop_machine_initialized = false;
> +static bool stop_cpus_queueing = false;
>  
>  static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int 
> nr_todo)
>  {
> @@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, 
> cpu_stop_fn_t fn, void *
>   cpu_stop_init_done(&done, 2);
>   set_state(&msdata, MULTI_STOP_PREPARE);
>  
> + wait_for_global:
> + /* If a global stop_cpus is queuing up stoppers, wait. */
> + while (unlikely(stop_cpus_queueing))
> + cpu_relax();
> +

This partially serialises callers to migrate_swap() while it is checked
if the pair of CPUs are being affected at the moment. It's two-stage
locking. The global lock is short-lived while the per-cpu data is updated
and the per-cpu values allow a degree of parallelisation on call_cpu which
could not be done with a spinlock held anyway.  Why not make protection
of the initial update a normal spinlock? i.e.

spin_lock(&stop_cpus_queue_lock);
this_cpu_write(stop_two_cpus_queueing, true);
spin_unlock(&stop_cpus_queue_lock);

and get rid of the barriers and goto wait_for_global loop entirely? I'm not
seeing the hidden advantage. The this_cpu_write(stop_two_cpus_queueing, false)
would also need to be within the lock as would the checks in 
queue_stop_cpus_work.

The locks look bad but it's not clear to me why the barriers and retries
are better.

-- 
Mel Gorman
SUSE Labs


[PATCH -tip] fix race between stop_two_cpus and stop_cpus

2013-10-31 Thread Rik van Riel
There is a race between stop_two_cpus, and the global stop_cpus.

It is possible for two CPUs to get their stopper functions queued
"backwards" from one another, resulting in the stopper threads
getting stuck, and the system hanging. This can happen because
queuing up stoppers is not synchronized.

This patch adds synchronization between stop_cpus (a rare operation),
and stop_two_cpus.

Signed-off-by: Rik van Riel 
---
Prarit is running a test with this patch. By now the kernel would have
crashed already, yet it is still going. I expect Prarit will add his
Tested-by: some time tomorrow morning.

 kernel/stop_machine.c | 43 ++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 32a6c44..46cb4c2 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -40,8 +40,10 @@ struct cpu_stopper {
 };
 
 static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
 static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
 static bool stop_machine_initialized = false;
+static bool stop_cpus_queueing = false;
 
 static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
 {
@@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
	cpu_stop_init_done(&done, 2);
	set_state(&msdata, MULTI_STOP_PREPARE);
 
+ wait_for_global:
+   /* If a global stop_cpus is queuing up stoppers, wait. */
+   while (unlikely(stop_cpus_queueing))
+   cpu_relax();
+
+   /* This CPU is queuing up stoppers. */
+   preempt_disable();
+   this_cpu_write(stop_two_cpus_queueing, true);
+   smp_mb(); /* matches the smp_wmb in queue_stop_cpus_work */
+
+   /* Global stop_cpus got busy simultaneously. Wait and retry. */
+   if (unlikely(stop_cpus_queueing)) {
+   smp_mb(); /* matches the smp_wmb in queue_stop_cpus_work */
+   this_cpu_write(stop_two_cpus_queueing, false);
+   preempt_enable();
+   goto wait_for_global;
+   }
+
/*
 * Queuing needs to be done by the lowest numbered CPU, to ensure
 * that works are always queued in the same order on every CPU.
 * This prevents deadlocks.
 */
call_cpu = min(cpu1, cpu2);
-
	smp_call_function_single(call_cpu, &irq_cpu_stop_queue_work,
				 &call_args, 0);
 
+   smp_wmb(); /* matches the smp_mb in wait_on_stop_two_cpus */
+   this_cpu_write(stop_two_cpus_queueing, false);
+   preempt_enable();
+
	wait_for_completion(&done.completion);
return done.executed ? done.ret : -ENOENT;
 }
@@ -295,6 +318,19 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
cpu_stop_queue_work(cpu, work_buf);
 }
 
+static void wait_on_stop_two_cpus(const struct cpumask *cpumask)
+{
+   int cpu;
+
+   /* Do not reorder reads before this point */
+   smp_mb(); /* matches the smp_wmb in stop_two_cpus */
+   
+   /* Wait until no stop_two_cpus stopper tasks are being queued */
+   for_each_cpu(cpu, cpumask)
+   while (per_cpu(stop_two_cpus_queueing, cpu) == true)
+   cpu_relax();
+}
+
 /* static data for stop_cpus */
 static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
@@ -320,8 +356,12 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
 * to enter @fn which can lead to deadlock.
 */
preempt_disable();
+   stop_cpus_queueing = true;
+   wait_on_stop_two_cpus(cpumask);
for_each_cpu(cpu, cpumask)
		cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
+   smp_wmb(); /* matches the smp_mb in stop_two_cpus */
+   stop_cpus_queueing = false;
preempt_enable();
 }
 
@@ -509,6 +549,7 @@ static int __init cpu_stop_init(void)
 
	spin_lock_init(&stopper->lock);
	INIT_LIST_HEAD(&stopper->works);
+   per_cpu(stop_two_cpus_queueing, cpu) = false;
}
 
	BUG_ON(smpboot_register_percpu_thread(&cpu_stop_threads));
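The flag handshake in the patch can be modeled in user space with C11 atomics. This is a hedged sketch under assumptions: seq_cst stores and loads stand in for the smp_mb()/smp_wmb() pairs, `sched_yield()` for `cpu_relax()`, and the serialization the kernel gets from `stop_cpus_mutex` and `preempt_disable()` (one global waiter at a time, at most one stop_two_cpus caller per CPU) is assumed to be provided by the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4

static atomic_bool stop_cpus_queueing;             /* global stop_cpus flag */
static atomic_bool stop_two_cpus_queueing[NCPUS];  /* per-CPU flags */

/* stop_two_cpus side: announce, then re-check the global flag. */
static void two_cpus_enter(int cpu)
{
	for (;;) {
		/* If a global stop_cpus is queueing stoppers, wait. */
		while (atomic_load(&stop_cpus_queueing))
			sched_yield();          /* cpu_relax() analogue */

		/* Announce that this CPU is queueing stoppers. */
		atomic_store(&stop_two_cpus_queueing[cpu], true);

		/* Seq_cst ordering makes this store/load pair with the
		 * global side's store/scan, like the smp_mb() pairs:
		 * at least one side observes the other's store. */
		if (!atomic_load(&stop_cpus_queueing))
			return;

		/* Global stop_cpus got busy simultaneously: back off
		 * and retry, mirroring the wait_for_global loop. */
		atomic_store(&stop_two_cpus_queueing[cpu], false);
	}
}

static void two_cpus_exit(int cpu)
{
	atomic_store(&stop_two_cpus_queueing[cpu], false);
}

/* stop_cpus side: raise the global flag, then wait out per-CPU flags. */
static void global_enter(void)
{
	atomic_store(&stop_cpus_queueing, true);
	for (int cpu = 0; cpu < NCPUS; cpu++)
		while (atomic_load(&stop_two_cpus_queueing[cpu]))
			sched_yield();
}

static void global_exit(void)
{
	atomic_store(&stop_cpus_queueing, false);
}
```

Once `global_enter()` returns, no `two_cpus_enter()` caller can be mid-queueing: a racer either saw the global flag and backed off, or its per-CPU flag was visible to the scan and was waited out, which is exactly the property the patch needs to keep the two queueing paths from interleaving.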
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

