Re: [PATCH RT] nvdimm: make lane acquirement RT aware
On 2019-03-18 11:48:28 [+], Liu, Yongxin wrote:
> > Bummer. That would dead lock indeed.
> > Is it easily possible to recognize the recursive case?
>
> Not easily. I don't have test case for recursive call.
> For now, just code analysis.

So I've been playing with qemu's nvdimm device. So I *think* the
recursive case is not possible here because qemu only supports pmem
while it would require the blk mode to trigger it. It is just a wild
guess…

On top of qemu's nvdimm device I can create a block device via

	ndctl create-namespace namespace0.0 --mode=sector

and then I trigger the code path in question.

I would *really* prefer to understand the recursive case and avoid it.
That way the recursive case is explicitly known and uses another path.
The lock can then be always acquired which gives you always lockdep
coverage (which is now missing unless you have more CPUs than LANEs).
The local_lock thingy is completely unneeded: a simple get_cpu_light()
would do the job.

> Yongxin

Sebastian
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
RE: [PATCH RT] nvdimm: make lane acquirement RT aware
> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org]
> On Behalf Of Sebastian Andrzej Siewior
> Sent: Monday, March 18, 2019 19:40
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
>
> On 2019-03-18 01:41:10 [+], Liu, Yongxin wrote:
> >
> > Consider the recursive call to nd_region_acquire_lane() in the
> > following situation.
> > Will there be a dead lock?
> >
> >
> >     Thread A                  Thread B
> >        |                         |
> >        |                         |
> >      CPU 1                     CPU 2
> >        |                         |
> >        |                         |
> >  get lock for Lane 1       get lock for Lane 2
> >        |                         |
> >        |                         |
> >  migrate to CPU 2          migrate to CPU 1
> >        |                         |
> >        |                         |
> >  wait lock for Lane 2      wait lock for Lane 1
> >        |                         |
> >        |                         |
> >        |_________________________|
> >                     |
> >                 dead lock ?
>
> Bummer. That would dead lock indeed.
> Is it easily possible to recognize the recursive case?

Not easily. I don't have test case for recursive call.
For now, just code analysis.

Yongxin

> >
> > Thanks,
> > Yongxin
>
> Sebastian
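The migration scenario Yongxin describes can be checked with a small plain-C model. This is only a sketch under stated assumptions, not kernel code: the one thing taken from the thread is the mapping lane = cpu % num_lanes, while struct nested_acquire, nested_acquire_on() and is_abba_deadlock() are hypothetical names made up for illustration. The model records which lane a task locked in its outer call and which lane a nested call would ask for after the task moved to another CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Lane selection as discussed in the thread: lane = cpu % num_lanes. */
static unsigned int lane_for_cpu(unsigned int cpu, unsigned int num_lanes)
{
	assert(num_lanes > 0);
	return cpu % num_lanes;
}

/* Hypothetical record of one task doing an outer and a nested acquire. */
struct nested_acquire {
	unsigned int held_lane;   /* lane locked by the outer call */
	unsigned int wanted_lane; /* lane requested by the nested call */
};

/* The nested call runs on nested_cpu, which differs after a migration. */
static struct nested_acquire nested_acquire_on(unsigned int outer_cpu,
					       unsigned int nested_cpu,
					       unsigned int num_lanes)
{
	struct nested_acquire t = {
		.held_lane = lane_for_cpu(outer_cpu, num_lanes),
		.wanted_lane = lane_for_cpu(nested_cpu, num_lanes),
	};
	return t;
}

/* ABBA pattern: each task waits for the lane the other one holds. */
static bool is_abba_deadlock(struct nested_acquire a, struct nested_acquire b)
{
	return a.held_lane != b.held_lane &&
	       a.wanted_lane == b.held_lane &&
	       b.wanted_lane == a.held_lane;
}
```

With four lanes, a task that goes from CPU 1 to CPU 2 and another that goes from CPU 2 to CPU 1 satisfy is_abba_deadlock(), while a task that stays on CPU 1 for both calls does not.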
Re: [PATCH RT] nvdimm: make lane acquirement RT aware
On 2019-03-11 00:44:58 [+], Liu, Yongxin wrote:
> > but you still have the ndl_lock->lock which protects the resource. So in
> > the unlikely (but possible event) that you switch CPUs after obtaining
> > the CPU number you block on the lock. No harm is done, right?
>
> The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a
> conditional lock.
>
> ndl_count->count is per CPU.
> ndl_lock->lock is per lane.
>
> Here is an example:
> Thread A on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> get "ndl_lock->lock"
> --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" due to
> "ndl_count->count++".
>
> Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> bypass
> "ndl_lock->lock" ("ndl_count->count" was changed by Thread A)
>
> If we use raw_smp_processor_id(), no matter which CPU the thread was migrated to,
> if there is another thread running on the old CPU, there will be race condition
> due to per CPU variable "ndl_count->count".

so I've been looking at it again. The recursive locking could have been
solved better. Like the local_lock() on -RT is doing it.
Given that you lock with preempt_disable() there should be no in-IRQ
usage.
But in the "nd_region->num_lanes >= nr_cpu_ids" case you don't take any
locks. That would be a problem with raw_smp_processor_id() approach.

So what about the completely untested patch here:

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 379bf4305e615..98c2e9df4b2e4 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -109,7 +109,8 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd);
 			res; res = next, next = next ? next->sibling : NULL)
 
 struct nd_percpu_lane {
-	int count;
+	struct task_struct *owner;
+	int nestcnt;
 	spinlock_t lock;
 };
 
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e2818f94f2928..8a62f9833513f 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -946,19 +946,17 @@ int nd_blk_region_init(struct nd_region *nd_region)
  */
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
+	struct nd_percpu_lane *ndl_lock;
 	unsigned int cpu, lane;
 
-	cpu = get_cpu();
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-		lane = cpu % nd_region->num_lanes;
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (ndl_count->count++ == 0)
-			spin_lock(&ndl_lock->lock);
-	} else
-		lane = cpu;
+	cpu = raw_smp_processor_id();
+	lane = cpu % nd_region->num_lanes;
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	if (ndl_lock->owner != current) {
+		spin_lock(&ndl_lock->lock);
+		ndl_lock->owner = current;
+	}
+	ndl_lock->nestcnt++;
 
 	return lane;
 }
@@ -966,17 +964,16 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
 
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-	if (nd_region->num_lanes < nr_cpu_ids) {
-		unsigned int cpu = get_cpu();
-		struct nd_percpu_lane *ndl_lock, *ndl_count;
+	struct nd_percpu_lane *ndl_lock;
 
-		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-		if (--ndl_count->count == 0)
-			spin_unlock(&ndl_lock->lock);
-		put_cpu();
-	}
-	put_cpu();
+	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+	WARN_ON(ndl_lock->nestcnt == 0);
+	WARN_ON(ndl_lock->owner != current);
+	if (--ndl_lock->nestcnt)
+		return;
+
+	ndl_lock->owner = NULL;
+	spin_unlock(&ndl_lock->lock);
 }
 EXPORT_SYMBOL(nd_region_release_lane);
 
@@ -1042,7 +1039,8 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 		ndl = per_cpu_ptr(nd_region->lane, i);
 		spin_lock_init(&ndl->lock);
-		ndl->count = 0;
+		ndl->owner = NULL;
+		ndl->nestcnt = 0;
 	}
 
 	for (i = 0; i < ndr_desc->num_mappings; i++) {

> Thanks,
> Yongxin

Sebastian
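The owner/nestcnt bookkeeping in the untested patch can be tried outside the kernel with a single-threaded model. The following is a minimal sketch, assuming a task is identified by a plain integer instead of `current`, and assuming that the point where the real code would spin on ndl_lock->lock is reported as a `false` return value. struct lane_model, lane_init(), lane_acquire() and lane_release() are hypothetical names, not part of the nvdimm driver.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Models the owner and nestcnt fields the patch adds to nd_percpu_lane. */
struct lane_model {
	int owner;   /* task id holding the lane, or NO_OWNER */
	int nestcnt; /* nesting depth of the current owner */
};

static void lane_init(struct lane_model *l)
{
	l->owner = NO_OWNER;
	l->nestcnt = 0;
}

/*
 * Returns true if the task holds the lane afterwards (first or nested
 * acquire), false where the real spin_lock() would have to wait.
 */
static bool lane_acquire(struct lane_model *l, int task)
{
	if (l->owner != task) {
		if (l->owner != NO_OWNER)
			return false;
		l->owner = task;
	}
	l->nestcnt++;
	return true;
}

/* Mirrors the release path: only the last release gives up ownership. */
static void lane_release(struct lane_model *l, int task)
{
	assert(l->nestcnt > 0);
	assert(l->owner == task);
	if (--l->nestcnt)
		return;
	l->owner = NO_OWNER;
}
```

In this model a second acquire by the same task only bumps nestcnt, and a different task is refused until the owner has released as many times as it acquired. That is the property the per-CPU counter could not provide, because recursion is now tied to the task rather than to the CPU it happens to run on.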
RE: [PATCH RT] nvdimm: make lane acquirement RT aware
> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org]
> On Behalf Of Sebastian Andrzej Siewior
> Sent: Friday, March 8, 2019 17:42
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
>
> On 2019-03-08 00:07:41 [+], Liu, Yongxin wrote:
> > The lane is critical resource which needs to be protected. One CPU can use only one
> > lane. If CPU number is greater than the number of total lane, the lane can be shared
> > among CPUs.
> >
> > In non-RT kernel, get_cpu() disable preemption by calling preempt_disable() first.
> > Only one thread on the same CPU can get the lane.
> >
> > In RT kernel, if we only use raw_smp_processor_id(), this doesn't protect the lane.
> > Thus two threads on the same CPU can get the same lane at the same time.
> >
> > In this patch, two-level lock can avoid race condition for the lane.
>
> but you still have the ndl_lock->lock which protects the resource. So in
> the unlikely (but possible event) that you switch CPUs after obtaining
> the CPU number you block on the lock. No harm is done, right?

The resource "lane" can be acquired recursively, so "ndl_lock->lock" is a conditional lock.

ndl_count->count is per CPU.
ndl_lock->lock is per lane.

Here is an example:
Thread A on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> get "ndl_lock->lock"
--> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock" due to "ndl_count->count++".

Thread B on CPU 5 --> nd_region_acquire_lane --> lane# 5 --> bypass "ndl_lock->lock"
("ndl_count->count" was changed by Thread A)

If we use raw_smp_processor_id(), no matter which CPU the thread was migrated to,
if there is another thread running on the old CPU, there will be race condition
due to per CPU variable "ndl_count->count".

Thanks,
Yongxin

>
> Sebastian
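The Thread A / Thread B example can be replayed with a small model of the existing scheme, in which the counter is per CPU and the lock is per lane. This is a sketch under assumptions, not driver code: NR_CPUS and NUM_LANES are arbitrary values picked so that CPU 5 maps to lane 5, and struct region_model, enum acquire_result and model_acquire() are hypothetical names. The model has no notion of which task is calling, which is exactly the weakness being discussed.

```c
#include <assert.h>

#define NR_CPUS   8 /* assumed value for illustration */
#define NUM_LANES 6 /* assumed value, fewer lanes than CPUs */

enum acquire_result { TOOK_LOCK, BYPASSED_LOCK, WOULD_BLOCK };

struct region_model {
	int count[NR_CPUS];         /* models ndl_count->count, per CPU */
	int lane_locked[NUM_LANES]; /* models ndl_lock->lock, per lane */
};

/* Follows the "if (ndl_count->count++ == 0) spin_lock()" logic. */
static enum acquire_result model_acquire(struct region_model *r,
					 unsigned int cpu, unsigned int *lane)
{
	assert(cpu < NR_CPUS);
	*lane = cpu % NUM_LANES;
	if (r->count[cpu]++ == 0) {
		if (r->lane_locked[*lane]) {
			/* the real code would spin here; undo and report */
			r->count[cpu]--;
			return WOULD_BLOCK;
		}
		r->lane_locked[*lane] = 1;
		return TOOK_LOCK;
	}
	/* counter already non-zero on this CPU: the lock is skipped */
	return BYPASSED_LOCK;
}
```

The first call on CPU 5 takes the lock for lane 5. Every later call on CPU 5 is treated as recursion and bypasses the lock, whether it comes from the same thread or from a different thread that preempted it.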
Re: [PATCH RT] nvdimm: make lane acquirement RT aware
On 2019-03-08 00:07:41 [+], Liu, Yongxin wrote:
> The lane is critical resource which needs to be protected. One CPU can use only one
> lane. If CPU number is greater than the number of total lane, the lane can be shared
> among CPUs.
>
> In non-RT kernel, get_cpu() disable preemption by calling preempt_disable() first.
> Only one thread on the same CPU can get the lane.
>
> In RT kernel, if we only use raw_smp_processor_id(), this doesn't protect the lane.
> Thus two threads on the same CPU can get the same lane at the same time.
>
> In this patch, two-level lock can avoid race condition for the lane.

but you still have the ndl_lock->lock which protects the resource. So in
the unlikely (but possible event) that you switch CPUs after obtaining
the CPU number you block on the lock. No harm is done, right?

> Thanks,
> Yongxin

Sebastian
Re: [PATCH RT] nvdimm: make lane acquirement RT aware
> Currently, nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu() which
> causes "scheduling while atomic" spews on RT, when using fio to test
> pmem as block device.
>
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.
>
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
>
> Cc: Dan Williams
> Cc: Pankaj Gupta
> Cc: linux-rt-users
> Cc: linux-nvdimm
> Signed-off-by: Yongxin Liu

This patch looks good to me.

Acked-by: Pankaj Gupta

> ---
>  drivers/nvdimm/region_devs.c | 40 +++-
>  1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index fa37afcd43ff..6c5388cf2477 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -18,9 +18,13 @@
>  #include
>  #include
>  #include
> +#include
>  #include "nd-core.h"
>  #include "nd.h"
>
> +/* lock for tasks on the same CPU to sequence the access to the lane */
> +static DEFINE_LOCAL_IRQ_LOCK(ndl_local_lock);
> +
>  /*
>   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
>   * irrelevant.
> @@ -935,18 +939,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>  	unsigned int cpu, lane;
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
>
> -	cpu = get_cpu();
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	cpu = local_lock_cpu(ndl_local_lock);
>
> -		lane = cpu % nd_region->num_lanes;
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (ndl_count->count++ == 0)
> -			spin_lock(&ndl_lock->lock);
> -	} else
> -		lane = cpu;
> +	lane = cpu % nd_region->num_lanes;
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (ndl_count->count++ == 0)
> +		spin_lock(&ndl_lock->lock);
>
>  	return lane;
>  }
> @@ -954,17 +955,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
>
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
> -	if (nd_region->num_lanes < nr_cpu_ids) {
> -		unsigned int cpu = get_cpu();
> -		struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -		if (--ndl_count->count == 0)
> -			spin_unlock(&ndl_lock->lock);
> -		put_cpu();
> -	}
> -	put_cpu();
> +	struct nd_percpu_lane *ndl_lock, *ndl_count;
> +	unsigned int cpu = smp_processor_id();
> +
> +	ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +	ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +	if (--ndl_count->count == 0)
> +		spin_unlock(&ndl_lock->lock);
> +	local_unlock_cpu(ndl_local_lock);
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
>
> --
> 2.14.4
RE: [PATCH RT] nvdimm: make lane acquirement RT aware
> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org]
> On Behalf Of Sebastian Andrzej Siewior
> Sent: Thursday, March 7, 2019 22:34
> To: Liu, Yongxin
> Cc: linux-ker...@vger.kernel.org; linux-rt-us...@vger.kernel.org;
> t...@linutronix.de; rost...@goodmis.org; dan.j.willi...@intel.com;
> pagu...@redhat.com; Gortmaker, Paul; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH RT] nvdimm: make lane acquirement RT aware
>
> On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> > In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> > local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> > Due to preemption on RT, this lock can avoid race condition for the
> > same lane on the same CPU. When CPU number is greater than the lane
> > number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> > protect the lane in this situation.
>
> so what was the reason that get_cpu() can't be replaced with
> raw_smp_processor_id()?
>
> Sebastian

The lane is critical resource which needs to be protected. One CPU can use only one
lane. If CPU number is greater than the number of total lane, the lane can be shared
among CPUs.

In non-RT kernel, get_cpu() disable preemption by calling preempt_disable() first.
Only one thread on the same CPU can get the lane.

In RT kernel, if we only use raw_smp_processor_id(), this doesn't protect the lane.
Thus two threads on the same CPU can get the same lane at the same time.

In this patch, two-level lock can avoid race condition for the lane.

       CPU A                         CPU B (B == A % num_lanes)

 task A1    task A2            task B1    task B2
    |          |                  |          |
    |__________|                  |__________|
         |                             |
   ndl_local_lock                ndl_local_lock
         |                             |
         |_____________________________|
                        |
                 ndl_lock->lock
                        |
                      lane

Thanks,
Yongxin
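The two-level scheme in the diagram can be approximated by a single-threaded model in which a task first needs the local lock of its CPU and then the lock of the lane that CPU maps to. This is a rough sketch with several assumptions: NR_CPUS and NUM_LANES are arbitrary, recursion is left out entirely, both locks are checked before either is taken (a real task would hold the local lock while waiting for the lane lock), and struct two_level_model, model_init() and try_acquire() are hypothetical names rather than anything from the patch.

```c
#include <assert.h>

#define NR_CPUS   4 /* assumed value for illustration */
#define NUM_LANES 2 /* assumed value, so CPU 0 and CPU 2 share lane 0 */
#define NO_TASK   (-1)

enum outcome { ACQUIRED, BLOCKED_ON_LOCAL_LOCK, BLOCKED_ON_LANE_LOCK };

struct two_level_model {
	int local_owner[NR_CPUS];  /* task holding the per-CPU ndl_local_lock */
	int lane_owner[NUM_LANES]; /* task holding the per-lane ndl_lock->lock */
};

static void model_init(struct two_level_model *m)
{
	for (int i = 0; i < NR_CPUS; i++)
		m->local_owner[i] = NO_TASK;
	for (int i = 0; i < NUM_LANES; i++)
		m->lane_owner[i] = NO_TASK;
}

/* First level: tasks on the same CPU. Second level: CPUs sharing a lane. */
static enum outcome try_acquire(struct two_level_model *m, int task,
				unsigned int cpu)
{
	unsigned int lane;

	assert(cpu < NR_CPUS);
	lane = cpu % NUM_LANES;
	if (m->local_owner[cpu] != NO_TASK)
		return BLOCKED_ON_LOCAL_LOCK;
	if (m->lane_owner[lane] != NO_TASK)
		return BLOCKED_ON_LANE_LOCK;
	m->local_owner[cpu] = task;
	m->lane_owner[lane] = task;
	return ACQUIRED;
}
```

With these values a second task on the same CPU stops at the local lock, a task on CPU 2 stops at the lane lock held via CPU 0, and a task on CPU 1 is unaffected because it maps to a different lane.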
Re: [PATCH RT] nvdimm: make lane acquirement RT aware
On 2019-03-06 17:57:09 [+0800], Yongxin Liu wrote:
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.

so what was the reason that get_cpu() can't be replaced with
raw_smp_processor_id()?

Sebastian
Re: [PATCH RT] nvdimm: make lane acquirement RT aware
On Wed, Mar 6, 2019 at 2:05 AM Yongxin Liu wrote:
>
> Currently, nvdimm driver isn't RT compatible.
> nd_region_acquire_lane() disables preemption with get_cpu() which
> causes "scheduling while atomic" spews on RT, when using fio to test
> pmem as block device.
>
> In this change, we replace get_cpu/put_cpu with local_lock_cpu/
> local_unlock_cpu, and introduce per CPU variable "ndl_local_lock".
> Due to preemption on RT, this lock can avoid race condition for the
> same lane on the same CPU. When CPU number is greater than the lane
> number, lane can be shared among CPUs. "ndl_lock->lock" is used to
> protect the lane in this situation.
>
> This patch is derived from Dan Williams and Pankaj Gupta's proposal from
> https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg13359.html
> and https://www.spinics.net/lists/linux-rt-users/msg20280.html.
> Many thanks to them.
>
> Cc: Dan Williams
> Cc: Pankaj Gupta
> Cc: linux-rt-users
> Cc: linux-nvdimm
> Signed-off-by: Yongxin Liu

Looks ok to me in concept.

Acked-by: Dan Williams