Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-28 Thread Sebastian Andrzej Siewior
On 2019-03-18 11:48:28 [+], Liu, Yongxin wrote:
> 
> > 
> > Bummer. That would deadlock indeed.
> > Is it easily possible to recognize the recursive case?
> 
> Not easily. I don't have a test case for the recursive call.
> For now, it's just code analysis.

So I've been playing with qemu's nvdimm device. I *think* the recursive
case is not possible here, because qemu only supports pmem while
triggering the recursion would require blk mode. It is just a wild
guess…

On top of qemu's nvdimm device I can create a block device via
ndctl create-namespace  namespace0.0 --mode=sector

and then I trigger the code path in question.

I would *really* prefer to understand the recursive case and avoid it.
That way the recursive case is explicitly known and uses another path.
The lock can then always be acquired, which gives you lockdep coverage
at all times (which is currently missing unless you have more LANEs than
CPUs).

The local_lock thingy is completely unneeded: a simple get_cpu_light()
would do the job.
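
For reference, a minimal sketch of that idea, assuming the -RT tree's
get_cpu_light()/put_cpu_light() (migrate_disable() based, so the task
stays on its CPU but remains preemptible); untested, illustration only:

unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
{
	unsigned int cpu, lane;

	/* stay on this CPU, but keep the task preemptible (RT-safe) */
	cpu = get_cpu_light();
	if (nd_region->num_lanes < nr_cpu_ids) {
		struct nd_percpu_lane *ndl_lock, *ndl_count;

		lane = cpu % nd_region->num_lanes;
		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
		/* first acquisition on this CPU takes the lane lock */
		if (ndl_count->count++ == 0)
			spin_lock(&ndl_lock->lock);
	} else
		lane = cpu;

	return lane;
}
/* the release side would mirror this, with put_cpu_light() for put_cpu() */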

> Yongxin

Sebastian


RE: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-18 Thread Liu, Yongxin


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Monday, March 18, 2019 19:40
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-18 01:41:10 [+], Liu, Yongxin wrote:
> >
> > Consider the recursive call to nd_region_acquire_lane() in the
> following situation.
> > Will there be a dead lock?
> >
> >
> > Thread A                    Thread B
> >    |                           |
> >    |                           |
> >  CPU 1                       CPU 2
> >    |                           |
> >    |                           |
> >  get lock for Lane 1         get lock for Lane 2
> >    |                           |
> >    |                           |
> >  migrate to CPU 2            migrate to CPU 1
> >    |                           |
> >    |                           |
> >  wait lock for Lane 2        wait lock for Lane 1
> >    |                           |
> >    |                           |
> >    |___________________________|
> >                  |
> >             dead lock ?
> 
> Bummer. That would deadlock indeed.
> Is it easily possible to recognize the recursive case?

Not easily. I don't have a test case for the recursive call.
For now, it's just code analysis.


Yongxin

> >
> > Thanks,
> > Yognxin
> 
> Sebastian


Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-15 Thread Sebastian Andrzej Siewior
On 2019-03-11 00:44:58 [+], Liu, Yongxin wrote:
> > but you still have the ndl_lock->lock which protects the resource. So in
> > the unlikely (but possible) event that you switch CPUs after obtaining
> > the CPU number you block on the lock. No harm is done, right?
> 
> The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a 
> conditional lock.
> 
> ndl_count->count is per CPU.
> ndl_lock->lock is per lane.
> 
> Here is an example:
> Thread A on CPU 5 --> nd_region_acquire_lane --> lane# 5
>   --> get "ndl_lock->lock"
>   --> nd_region_acquire_lane --> lane# 5
>   --> bypass "ndl_lock->lock" due to "ndl_count->count++".
> 
> Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5
>   --> bypass "ndl_lock->lock" ("ndl_count->count" was changed by Thread A)
> 
> If we use raw_smp_processor_id(), no matter which CPU the thread was
> migrated to, if there is another thread running on the old CPU, there
> will be a race condition due to the per-CPU variable "ndl_count->count".

So I've been looking at it again. The recursive locking could have been
solved better, like the local_lock() code on -RT does it.
Given that you lock with preempt_disable(), there should be no in-IRQ
usage.
But in the "nd_region->num_lanes >= nr_cpu_ids" case you don't take any
locks; that would be a problem with the raw_smp_processor_id() approach.

So what about the completely untested patch here:

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 379bf4305e615..98c2e9df4b2e4 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -109,7 +109,8 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 		res; res = next, next = next ? next->sibling : NULL)
 
 struct nd_percpu_lane {
-	int count;
+	struct task_struct *owner;
+	int nestcnt;
 	spinlock_t lock;
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e2818f94f2928..8a62f9833513f 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -946,19 +946,17 @@ int nd_blk_region_init(struct nd_region *nd_region)
  */
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
+	struct nd_percpu_lane *ndl_lock;
 	unsigned int cpu, lane;
 
-	cpu = get_cpu();
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-		lane = cpu % nd_region->num_lanes;
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (ndl_count->count++ == 0)
-			spin_lock(&ndl_lock->lock);
-	} else
-		lane = cpu;
+	cpu = raw_smp_processor_id();
+	lane = cpu % nd_region->num_lanes;
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	if (ndl_lock->owner != current) {
+		spin_lock(&ndl_lock->lock);
+		ndl_lock->owner = current;
+	}
+	ndl_lock->nestcnt++;
 
 	return lane;
 }
@@ -966,17 +964,16 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
 
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		unsigned int cpu = get_cpu();
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
+	struct nd_percpu_lane *ndl_lock;
 
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (--ndl_count->count == 0)
-			spin_unlock(&ndl_lock->lock);
-		put_cpu();
-	}
-	put_cpu();
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	WARN_ON(ndl_lock->nestcnt == 0);
+	WARN_ON(ndl_lock->owner != current);
+	if (--ndl_lock->nestcnt)
+		return;
+
+	ndl_lock->owner = NULL;
+	spin_unlock(&ndl_lock->lock);
 }
 EXPORT_SYMBOL(nd_region_release_lane);
 
@@ -1042,7 +1039,8 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 		ndl = per_cpu_ptr(nd_region->lane, i);
 		spin_lock_init(&ndl->lock);
-		ndl->count = 0;
+		ndl->owner = NULL;
+		ndl->nestcnt = 0;
 	}
 
 	for (i = 0; i < ndr_desc->num_mappings; i++) {
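
To make the owner/nestcnt scheme concrete, a hypothetical call sequence
(illustration only, the nesting depth and task names are made up):

/*
 * Task T, lane L = cpu % num_lanes:
 *
 *   nd_region_acquire_lane()    owner != T -> spin_lock(), owner = T, nestcnt = 1
 *     nd_region_acquire_lane()  owner == T -> skip lock,   nestcnt = 2  (recursion)
 *     nd_region_release_lane()  nestcnt -> 1, lock still held
 *   nd_region_release_lane()    nestcnt -> 0, owner = NULL, spin_unlock()
 *
 * Any other task sees owner != current and blocks on ndl_lock->lock, so
 * every outermost acquisition goes through the lock and lockdep sees it.
 */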

> Thanks,
> Yongxin

Sebastian


RE: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-10 Thread Liu, Yongxin


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Friday, March 8, 2019 17:42
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-08 00:07:41 [+], Liu, Yongxin wrote:
> > The lane is a critical resource which needs to be protected. One CPU
> > can use only one lane. If the CPU number is greater than the total
> > number of lanes, a lane can be shared among CPUs.
> >
> > In the non-RT kernel, get_cpu() disables preemption by calling
> > preempt_disable() first. Only one thread on the same CPU can get the
> > lane.
> >
> > In the RT kernel, if we only use raw_smp_processor_id(), this doesn't
> > protect the lane. Thus two threads on the same CPU can get the same
> > lane at the same time.
> >
> > In this patch, a two-level lock avoids race conditions on the lane.
> 
> but you still have the ndl_lock->lock which protects the resource. So in
> the unlikely (but possible) event that you switch CPUs after obtaining
> the CPU number you block on the lock. No harm is done, right?

The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a 
conditional lock.

ndl_count->count is per CPU.
ndl_lock->lock is per lane.

Here is an example:
Thread A on CPU 5 --> nd_region_acquire_lane --> lane# 5
  --> get "ndl_lock->lock"
  --> nd_region_acquire_lane --> lane# 5
  --> bypass "ndl_lock->lock" due to "ndl_count->count++".

Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5
  --> bypass "ndl_lock->lock" ("ndl_count->count" was changed by Thread A)

If we use raw_smp_processor_id(), no matter which CPU the thread was
migrated to, if there is another thread running on the old CPU, there
will be a race condition due to the per-CPU variable "ndl_count->count".
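
A sketch of that race, under the assumption that bare
raw_smp_processor_id() is used with no migration protection (the
interleaving is hypothetical):

/*
 * Task A acquired lane 5 while on CPU 5, then migrated to CPU 7.
 * Task B now runs on CPU 5. Both use CPU 5's ndl_count:
 *
 *   Task A (now on CPU 7)            Task B (on CPU 5)
 *   ndl_count->count++  \
 *                        >  non-atomic RMW on the same per-CPU counter
 *   ndl_count->count++  /
 *
 * The two increments can collapse into one (lost update), so a later
 * --ndl_count->count hits zero early and ndl_lock->lock is dropped
 * while the lane is still in use.
 */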


Thanks,
Yongxin

> 
> Sebastian


Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-08 Thread Sebastian Andrzej Siewior
On 2019-03-08 00:07:41 [+], Liu, Yongxin wrote:
> The lane is a critical resource which needs to be protected. One CPU
> can use only one lane. If the CPU number is greater than the total
> number of lanes, a lane can be shared among CPUs.
> 
> In the non-RT kernel, get_cpu() disables preemption by calling
> preempt_disable() first. Only one thread on the same CPU can get the
> lane.
> 
> In the RT kernel, if we only use raw_smp_processor_id(), this doesn't
> protect the lane. Thus two threads on the same CPU can get the same
> lane at the same time.
> 
> In this patch, a two-level lock avoids race conditions on the lane.

but you still have the ndl_lock->lock which protects the resource. So in
the unlikely (but possible) event that you switch CPUs after obtaining
the CPU number you block on the lock. No harm is done, right?

> Thanks,
> Yongxin

Sebastian


Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-07 Thread Pankaj Gupta


> Currently, the nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu(), which
> causes "scheduling while atomic" spews on RT when using fio to test
> pmem as a block device.
> 
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu and introduce the per-CPU variable "ndl_local_lock".
> Under preemption on RT, this lock avoids race conditions for the same
> lane on the same CPU. When the CPU count is greater than the lane
> count, a lane can be shared among CPUs; "ndl_lock->lock" is used to
> protect the lane in that situation.
> 
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
> 
> Cc: Dan Williams 
> Cc: Pankaj Gupta 
> Cc: linux-rt-users 
> Cc: linux-nvdimm 
> Signed-off-by: Yongxin Liu 

This patch looks good to me.

Acked-by: Pankaj Gupta 

> ---
>  drivers/nvdimm/region_devs.c | 40 +++-
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index fa37afcd43ff..6c5388cf2477 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -18,9 +18,13 @@
>  #include <linux/sort.h>
>  #include <linux/io.h>
>  #include <linux/nd.h>
> +#include <linux/locallock.h>
>  #include "nd-core.h"
>  #include "nd.h"
>  
> +/* lock for tasks on the same CPU to sequence the access to the lane */
> +static DEFINE_LOCAL_IRQ_LOCK(ndl_local_lock);
> +
>  /*
>   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
>   * irrelevant.
> @@ -935,18 +939,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>  	unsigned int cpu, lane;
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
> -	cpu = get_cpu();
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	cpu = local_lock_cpu(ndl_local_lock);
>  
> -		lane = cpu % nd_region->num_lanes;
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (ndl_count->count++ == 0)
> -			spin_lock(&ndl_lock->lock);
> -	} else
> -		lane = cpu;
> +	lane = cpu % nd_region->num_lanes;
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (ndl_count->count++ == 0)
> +		spin_lock(&ndl_lock->lock);
>  
>  	return lane;
>  }
> @@ -954,17 +955,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
>  
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		unsigned int cpu = get_cpu();
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (--ndl_count->count == 0)
> -			spin_unlock(&ndl_lock->lock);
> -		put_cpu();
> -	}
> -	put_cpu();
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	unsigned int cpu = smp_processor_id();
> +
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (--ndl_count->count == 0)
> +		spin_unlock(&ndl_lock->lock);
> +	local_unlock_cpu(ndl_local_lock);
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
>  
> --
> 2.14.4
> 
> 


RE: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-07 Thread Liu, Yongxin
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Sebastian Andrzej Siewior
> Sent: Thursday, March 7, 2019 22:34
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> 
> On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> > In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> > local_unlock_cpu and introduce the per-CPU variable "ndl_local_lock".
> > Under preemption on RT, this lock avoids race conditions for the same
> > lane on the same CPU. When the CPU count is greater than the lane
> > count, a lane can be shared among CPUs; "ndl_lock->lock" is used to
> > protect the lane in that situation.
> 
> so what was the reason that get_cpu() can't be replaced with
> raw_smp_processor_id()?
> 
> Sebastian

The lane is a critical resource which needs to be protected. One CPU can
use only one lane. If the CPU number is greater than the total number of
lanes, a lane can be shared among CPUs.

In the non-RT kernel, get_cpu() disables preemption by calling
preempt_disable() first. Only one thread on the same CPU can get the lane.

In the RT kernel, if we only use raw_smp_processor_id(), this doesn't
protect the lane. Thus two threads on the same CPU can get the same lane
at the same time.

In this patch, a two-level lock avoids race conditions on the lane:

        CPU A                         CPU B (B == A % num_lanes)

  task A1     task A2           task B1     task B2
     |           |                 |           |
     |___________|                 |___________|
           |                             |
     ndl_local_lock                ndl_local_lock
           |                             |
           |_____________________________|
                         |
                         |
                  ndl_lock->lock
                         |
                         |
                        lane
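
Roughly, the two levels map to the following; a sketch of the -RT
local-lock semantics assumed by the patch, not the actual implementation:

/*
 * Level 1 - serialize tasks on the same CPU (a per-CPU lock that is a
 * sleeping spinlock on RT, so no "scheduling while atomic"):
 *
 *     cpu = local_lock_cpu(ndl_local_lock);
 *         ~ lock this CPU's ndl_local_lock, then cpu = smp_processor_id();
 *
 * Level 2 - serialize CPUs that share the same lane (B == A % num_lanes):
 *
 *     if (ndl_count->count++ == 0)
 *             spin_lock(&ndl_lock->lock);
 */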

 
Thanks,
Yongxin


Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-07 Thread Sebastian Andrzej Siewior
On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu and introduce the per-CPU variable "ndl_local_lock".
> Under preemption on RT, this lock avoids race conditions for the same
> lane on the same CPU. When the CPU count is greater than the lane
> count, a lane can be shared among CPUs; "ndl_lock->lock" is used to
> protect the lane in that situation.

so what was the reason that get_cpu() can't be replaced with
raw_smp_processor_id()?

Sebastian


Re: [PATCH RT] nvdimm: make lane acquirement RT aware

2019-03-06 Thread Dan Williams
On Wed, Mar 6, 2019 at 2:05 AM Yongxin Liu  wrote:
>
> Currently, the nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu(), which
> causes "scheduling while atomic" spews on RT when using fio to test
> pmem as a block device.
>
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu and introduce the per-CPU variable "ndl_local_lock".
> Under preemption on RT, this lock avoids race conditions for the same
> lane on the same CPU. When the CPU count is greater than the lane
> count, a lane can be shared among CPUs; "ndl_lock->lock" is used to
> protect the lane in that situation.
>
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
>
> Cc: Dan Williams 
> Cc: Pankaj Gupta 
> Cc: linux-rt-users 
> Cc: linux-nvdimm 
> Signed-off-by: Yongxin Liu 

Looks ok to me in concept.

Acked-by: Dan Williams 