Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-08-06 Thread Pavan Kondeti
Hi Prasad,

On Wed, Aug 01, 2018 at 01:07:03AM -0700, Sodagudi Prasad wrote:
> On 2018-07-30 14:07, Peter Zijlstra wrote:
> >On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
> >>How about including the below change as well? Currently, there is no
> >>way to identify whether thread migrations have completed or not. When
> >>we observed this issue, the symptom was a workqueue lockup. It is
> >>better to have some timeout here and induce the BUG_ON().
> >
> >You'd trigger the soft-lockup or hung-task detector I think. And if
> >not, we ought to look at making it trigger at least one of those.
> >
> >>There is no way to identify whether the migration threads are stuck or not.
> >
> >Should be pretty obvious from the splat generated by the above, no?
> Hi Peter and Thomas,
> 
> Thanks for your support.
> I have another question about this flow and the retry mechanism used in
> the cpu_stop_queue_two_works() function, which relies on the global
> variable stop_cpus_in_progress.
> 
> This variable is used in various paths, such as task migration, setting
> task affinity, and CPU hotplug.
> 
> For example, in the CPU hotplug path, stop_cpus_in_progress is set to
> true directly, without any check:
> takedown_cpu()
> --stop_machine_cpuslocked()
> ---stop_cpus()
> ----__stop_cpus()
> -----queue_stop_cpus_work()
> 
> But only in the task migration path is the stop_cpus_in_progress
> variable used for retry.
> 
> I am thinking that the stop_cpus_in_progress variable can lead to race
> conditions when CPU hotplug and task migration happen simultaneously.
> Please correct me if my understanding is wrong.
> 

The stop_cpus_in_progress variable is there to guard against out-of-order
queueing. The stopper locks do not protect against this when
cpu_stop_queue_two_works() and stop_cpus() are executing in parallel.

The stop_one_cpu{_nowait}() functions are called to handle affinity changes
and load balancing. Since they queue work on only one CPU, the
stop_cpus_in_progress protection is not needed there.
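
For reference, the hotplug-side queueing that sets the flag unconditionally
looks roughly like this (a paraphrase of kernel/stop_machine.c from that era;
details may vary by kernel version):

static bool queue_stop_cpus_work(const struct cpumask *cpumask,
				 cpu_stop_fn_t fn, void *arg,
				 struct cpu_stop_done *done)
{
	struct cpu_stop_work *work;
	unsigned int cpu;
	bool queued = false;

	/*
	 * Everything runs under preempt_disable(), so a parallel
	 * cpu_stop_queue_two_works() can safely spin until the
	 * flag clears again.
	 */
	preempt_disable();
	stop_cpus_in_progress = true;	/* set without any check, as noted */
	for_each_cpu(cpu, cpumask) {
		work = &per_cpu(cpu_stopper.stop_work, cpu);
		work->fn = fn;
		work->arg = arg;
		work->done = done;
		if (cpu_stop_queue_work(cpu, work))
			queued = true;
	}
	stop_cpus_in_progress = false;
	preempt_enable();

	return queued;
}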

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-08-01 Thread Sodagudi Prasad

On 2018-07-30 14:07, Peter Zijlstra wrote:

On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
How about including the below change as well? Currently, there is no way
to identify whether thread migrations have completed or not. When we
observed this issue, the symptom was a workqueue lockup. It is better to
have some timeout here and induce the BUG_ON().


You'd trigger the soft-lockup or hung-task detector I think. And if not,
we ought to look at making it trigger at least one of those.


There is no way to identify whether the migration threads are stuck or not.


Should be pretty obvious from the splat generated by the above, no?

Hi Peter and Thomas,

Thanks for your support.
I have another question about this flow and the retry mechanism used in
the cpu_stop_queue_two_works() function, which relies on the global
variable stop_cpus_in_progress.

This variable is used in various paths, such as task migration, setting
task affinity, and CPU hotplug.

For example, in the CPU hotplug path, stop_cpus_in_progress is set to
true directly, without any check:

takedown_cpu()
--stop_machine_cpuslocked()
---stop_cpus()
----__stop_cpus()
-----queue_stop_cpus_work()

But only in the task migration path is the stop_cpus_in_progress
variable used for retry.

I am thinking that the stop_cpus_in_progress variable can lead to race
conditions when CPU hotplug and task migration happen simultaneously.
Please correct me if my understanding is wrong.


-Thanks, Prasad




--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;
 
 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;
 
-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+


That's a random timeout, which if you spuriously trigger it, will take
down your machine. That seems like a cure worse than the disease.


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Peter Zijlstra
On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
> How about including the below change as well? Currently, there is no way
> to identify whether thread migrations have completed or not. When we
> observed this issue, the symptom was a workqueue lockup. It is better to
> have some timeout here and induce the BUG_ON().

You'd trigger the soft-lockup or hung-task detector I think. And if not,
we ought to look at making it trigger at least one of those.

> There is no way to identify whether the migration threads are stuck or not.

Should be pretty obvious from the splat generated by the above, no?

> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
>  	struct cpu_stop_done done;
>  	struct cpu_stop_work work1, work2;
>  	struct multi_stop_data msdata;
> +	int ret;
>  
>  	msdata = (struct multi_stop_data){
>  		.fn = fn,
> @@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
>  	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
>  		return -ENOENT;
>  
> -	wait_for_completion(&done.completion);
> +	ret = wait_for_completion_timeout(&done.completion,
> +					  msecs_to_jiffies(1000));
> +	if (!ret)
> +		BUG_ON(1);
> +

That's a random timeout, which if you spuriously trigger it, will take
down your machine. That seems like a cure worse than the disease.


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Thomas Gleixner
On Mon, 30 Jul 2018, Sodagudi Prasad wrote:
> How about including below change as well?  Currently, there is no way to

That would be a completely separate change.

> identify thread migrations completed or not.  When we observe this issue, the
> symptom was work queue lock up. It is better to have some timeout here and
> induce the bug_on.

BUG_ON() is wrong. Why kill the thing if there is at least a theoretical
chance that stuff can continue half baked, so you can get more info out of
it? The back trace is pretty much uninteresting.

Thanks,

tglx
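
A less destructive variant along those lines (purely illustrative; this is
not what was merged, and the message text is made up) would warn and keep
waiting instead of crashing:

	/* Hypothetical: report the stall but let the system limp along. */
	if (!wait_for_completion_timeout(&done.completion,
					 msecs_to_jiffies(1000))) {
		WARN(1, "stop_two_cpus: stopper threads did not complete\n");
		wait_for_completion(&done.completion);
	}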


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Sodagudi Prasad

On 2018-07-30 05:41, Thomas Gleixner wrote:

On Mon, 30 Jul 2018, Peter Zijlstra wrote:


On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> > > Hi all,
> > Hi,
> >
> > > Are there any comments about this patch?
> >
> > I haven't looked in detail at this, but your new preempt_disable() makes
> > things unbalanced for the err != 0 case.
>
> It doesn't but that code is really an unreadable pile of ...

---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix
this by lifting the preempt_disable() to the top to create more natural
nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
unconditional at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we
spin-wait with preemption enabled.

Suggested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 



Hi Peter/Thomas,

How about including the below change as well? Currently, there is no way
to identify whether thread migrations have completed or not. When we
observed this issue, the symptom was a workqueue lockup. It is better to
have some timeout here and induce the BUG_ON().


There is no way to identify whether the migration threads are stuck or not.

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;
 
 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;
 
-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+



Thanks for cleaning that up!

Reviewed-by: Thomas Gleixner 


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Thomas Gleixner
On Mon, 30 Jul 2018, Peter Zijlstra wrote:

> On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> > On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > > On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> > > > Hi all,
> > > Hi,
> > > 
> > > > Are there any comments about this patch?
> > > 
> > > I haven't looked in detail at this, but your new preempt_disable() makes
> > > things unbalanced for the err != 0 case.
> > 
> > It doesn't but that code is really an unreadable pile of ...
> 
> ---
> Subject: stop_machine: Reflow cpu_stop_queue_two_works()
> 
> The code flow in cpu_stop_queue_two_works() is a little arcane; fix
> this by lifting the preempt_disable() to the top to create more natural
> nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
> unconditional at the end.
> 
> Furthermore, enable preemption in the -EDEADLK case, such that we
> spin-wait with preemption enabled.
> 
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Peter Zijlstra (Intel) 

Thanks for cleaning that up!

Reviewed-by: Thomas Gleixner 


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Peter Zijlstra
On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> > > Hi all,
> > Hi,
> > 
> > > Are there any comments about this patch?
> > 
> > I haven't looked in detail at this, but your new preempt_disable() makes
> > things unbalanced for the err != 0 case.
> 
> It doesn't but that code is really an unreadable pile of ...

---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix
this by lifting the preempt_disable() to the top to create more natural
nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
unconditional at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we
spin-wait with preemption enabled.

Suggested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/stop_machine.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..34b6652e8677 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -236,13 +236,24 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
 	DEFINE_WAKE_Q(wakeq);
 	int err;
+
 retry:
+	/*
+	 * The waking up of stopper threads has to happen in the same
+	 * scheduling context as the queueing.  Otherwise, there is a
+	 * possibility of one of the above stoppers being woken up by another
+	 * CPU, and preempting us. This will cause us to not wake up the other
+	 * stopper forever.
+	 */
+	preempt_disable();
 	raw_spin_lock_irq(&stopper1->lock);
 	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
 
-	err = -ENOENT;
-	if (!stopper1->enabled || !stopper2->enabled)
+	if (!stopper1->enabled || !stopper2->enabled) {
+		err = -ENOENT;
 		goto unlock;
+	}
+
 	/*
 	 * Ensure that if we race with __stop_cpus() the stoppers won't get
 	 * queued up in reverse order leading to system deadlock.
@@ -253,36 +264,30 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	 * It can be falsely true but it is safe to spin until it is cleared,
 	 * queue_stop_cpus_work() does everything under preempt_disable().
 	 */
-	err = -EDEADLK;
-	if (unlikely(stop_cpus_in_progress))
-		goto unlock;
+	if (unlikely(stop_cpus_in_progress)) {
+		err = -EDEADLK;
+		goto unlock;
+	}
 
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
-	/*
-	 * The waking up of stopper threads has to happen
-	 * in the same scheduling context as the queueing.
-	 * Otherwise, there is a possibility of one of the
-	 * above stoppers being woken up by another CPU,
-	 * and preempting us. This will cause us to not
-	 * wake up the other stopper forever.
-	 */
-	preempt_disable();
+
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
 
 	if (unlikely(err == -EDEADLK)) {
+		preempt_enable();
+
 		while (stop_cpus_in_progress)
 			cpu_relax();
+
 		goto retry;
 	}
 
-	if (!err) {
-		wake_up_q(&wakeq);
-		preempt_enable();
-	}
+	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return err;
 }


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Thomas Gleixner
On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> > Hi all,
> Hi,
> 
> > Are there any comments about this patch?
> 
> I haven't looked in detail at this, but your new preempt_disable() makes
> things unbalanced for the err != 0 case.

It doesn't but that code is really an unreadable pile of ...


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-24 Thread isaacm

Hi Sebastian,

Thanks for the response.

"I haven't look in detail at this but your new preempt_disable() makes
things unbalanced for the err != 0 case."

This cannot happen. The only possible return values of this function
are -ENOENT or 0.

In the case where we return -ENOENT, we'll go straight to "unlock,"
which releases the two locks being held but doesn't disable preemption,
and since err != 0 we won't call preempt_enable().

In the case where we return 0, that means the works were queued
successfully and preemption was disabled, so we'll fall into the if
branch after releasing the locks and enable preemption, which is
correct.

In either case, there is no imbalance between the
preempt_disable()/preempt_enable() calls.
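
To make the two exits concrete, here is a minimal standalone sketch of the
control flow being described (the kernel calls are left as comments; an
illustration, not the actual kernel code):

#include <stdbool.h>

/* Sketch of cpu_stop_queue_two_works()'s exit paths. */
static int sketch_queue_two_works(bool both_stoppers_enabled)
{
	int err = -ENOENT;

	/* both stopper locks held here */
	if (!both_stoppers_enabled)
		goto unlock;	/* -ENOENT exit: preemption never disabled */

	err = 0;
	/* works queued for both stoppers */
	/* preempt_disable();  -- reached only on the success path */
unlock:
	/* both stopper locks released here */
	if (!err) {
		/* wake_up_q(&wakeq); */
		/* preempt_enable();  -- balances the disable above */
	}
	return err;
}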

Thanks,
Isaac Manjarres

On 2018-07-23 23:23, Sebastian Andrzej Siewior wrote:

On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:

Hi all,

Hi,


Are there any comments about this patch?


I haven't looked in detail at this, but your new preempt_disable() makes
things unbalanced for the err != 0 case.


Thanks,
Isaac Manjarres


Sebastian


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-24 Thread Sebastian Andrzej Siewior
On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> Hi all,
Hi,

> Are there any comments about this patch?

I haven't looked in detail at this, but your new preempt_disable() makes
things unbalanced for the err != 0 case.

> Thanks,
> Isaac Manjarres

Sebastian


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-23 Thread isaacm

Hi all,

Are there any comments about this patch?

Thanks,
Isaac Manjarres
On 2018-07-17 12:35, Isaac J. Manjarres wrote:

This commit:

9fb8d5dc4b64 ("stop_machine, Disable preemption when
waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, On CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-Developed-by: Prasad Sodagudi 
Co-Developed-by: Pavankumar Kondeti 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Prasad Sodagudi 
Signed-off-by: Pavankumar Kondeti 
Cc: sta...@vger.kernel.org
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}


[PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-17 Thread Isaac J. Manjarres
This commit:

9fb8d5dc4b64 ("stop_machine, Disable preemption when
waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, On CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-Developed-by: Prasad Sodagudi 
Co-Developed-by: Pavankumar Kondeti 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Prasad Sodagudi 
Signed-off-by: Pavankumar Kondeti 
Cc: sta...@vger.kernel.org
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
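
For readability, the interleaving described in the commit message above can
be drawn as a timeline (a paraphrase of the commit message, not part of the
patch):

/*
 * Race timeline (illustration only):
 *
 *  CPU 3 (thread 1)                    CPU 2 (thread 2, a kworker)
 *  cpu_stop_queue_two_works(2, 3)
 *    queue migration/2, migration/3
 *    drop locks; preempted before
 *    waking the stoppers
 *                                      cpu_stop_queue_two_works(1, 2)
 *                                        queue migration/1, migration/2
 *                                        drop locks
 *  wake migration/2, migration/3
 *                                      preempted by migration/2, which
 *                                      enters multi_cpu_stop() with
 *                                      preemption off and spins waiting
 *                                      for migration/1; only thread 2
 *                                      (affine to CPU 2) could wake
 *                                      migration/1, so CPU 2 hangs.
 */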


