Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block
On 10:39 Mon 08 Aug , Jens Axboe wrote:
> On 08/08/2016 10:32 AM, Benjamin Block wrote:
> >On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> >>On Fri, Aug 05 2016 at 11:54am -0400,
> >>Jens Axboe  wrote:
> >>
> >>>On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:33P -0400,
> Jens Axboe  wrote:
> 
> >On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> >>On Wed, Aug 03 2016 at 11:35am -0400,
> >>Benjamin Block  wrote:
> >>
> >>>Hej Mike,
> >>>
> >>>when running a debug-kernel today with several multipath-devices using
> >>>the round-robin path selector I noticed that the kernel throws these
> >>>warnings here:
> >>>
> >>>BUG: using smp_processor_id() in preemptible [] code: 
> >>>kdmwork-252:0/881
> >>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> >>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >>> 617679b8 61767a48 0002 
> >>> 
> >>> 61767ae8 61767a60 61767a60 
> >>> 001145d0
> >>>  00b962ae 00bb291e 
> >>> 000b
> >>> 61767aa8 61767a48  
> >>> 
> >>> 07b962ae 001145d0 61767a48 
> >>> 61767aa8
> >>>Call Trace:
> >>>([<001144a2>] show_trace+0x8a/0xe0)
> >>>([<00114586>] show_stack+0x8e/0xf0)
> >>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> >>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> >>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> >>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> >>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> >>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> >>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> >>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> >>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> >>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> >>>([<001681da>] kthread+0x112/0x120)
> >>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> >>>([<00983784>] kernel_thread_starter+0x0/0xc)
> >>>no locks held by kdmwork-252:0/881.
> >>>
> >[:snip:]
> >>>
> >>>I always forget the details (if this confuses lockdep or not), but you
> >>>could potentially turn it into:
> >>>
> >>>local_irq_save(flags);
> >>>x = this_cpu_ptr();
> >>>[...]
> >>>
> >>>spin_lock(&s->lock);
> >>>[...]
> >>>
> >>>instead.
> >>
> >>Cool, I've coded up the patch (compile tested only).
> >>
> >>Benjamin, any chance you could test this against your v4.7 kernel and
> >>report back?
> >>
> >>Thanks,
> >>Mike
> >>
> >>diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> >>index 4ace1da..ed446f8 100644
> >>--- a/drivers/md/dm-round-robin.c
> >>+++ b/drivers/md/dm-round-robin.c
> >>@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> >>path_selector *ps, size_t nr_bytes)
> >>struct path_info *pi = NULL;
> >>struct dm_path *current_path = NULL;
> >>
> >>+   local_irq_save(flags);
> >>current_path = *this_cpu_ptr(s->current_path);
> >>if (current_path) {
> >>percpu_counter_dec(&s->repeat_count);
> >>-   if (percpu_counter_read_positive(&s->repeat_count) > 0)
> >>+   if (percpu_counter_read_positive(&s->repeat_count) > 0) {
> >>+   local_irq_restore(flags);
> >>return current_path;
> >>+   }
> >>}
> >>
> >>-   spin_lock_irqsave(&s->lock, flags);
> >>+   spin_lock(&s->lock);
> >>if (!list_empty(&s->valid_paths)) {
> >>pi = list_entry(s->valid_paths.next, struct path_info, list);
> >>list_move_tail(&pi->list, &s->valid_paths);
> >>@@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct 
> >>path_selector *ps, size_t nr_bytes)
> >>set_percpu_current_path(s, pi->path);
> >>current_path = pi->path;
> >>}
> >>-   spin_unlock_irqrestore(&s->lock, flags);
> >>+   spin_unlock(&s->lock);
> >>+   local_irq_restore(flags);
> >>
> >>return current_path;
> >> }
> >>
> >
> >Ok, this works as far as the warnings don't appear anymore. But while
> >applying the patch and thinking about it, why local_irq_save() and not
> >preempt_disable()? "Sounds" like this is the function you want, and I
> >also stumbled across this in Documentation/preempt-locking.txt:
> >
> >  But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
> >  disabling preemption - any spin_unlock() decreasing the preemption
> >  count to 0 might trigger a reschedule.
> >
> >The spinlock would do an other nested preempt_disable(), but those even
> >out.
> 
> local_irq_save(), since we need to grab the lock irq safe very shortly
> 

Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Jens Axboe

On 08/08/2016 10:32 AM, Benjamin Block wrote:

On 12:06 Fri 05 Aug , Mike Snitzer wrote:

On Fri, Aug 05 2016 at 11:54am -0400,
Jens Axboe  wrote:


On 08/05/2016 09:42 AM, Mike Snitzer wrote:

On Fri, Aug 05 2016 at 11:33P -0400,
Jens Axboe  wrote:


On 08/05/2016 09:27 AM, Mike Snitzer wrote:

On Wed, Aug 03 2016 at 11:35am -0400,
Benjamin Block  wrote:


Hej Mike,

when running a debug-kernel today with several multipath-devices using
the round-robin path selector I noticed that the kernel throws these
warnings here:

BUG: using smp_processor_id() in preemptible [] code: kdmwork-252:0/881
caller is rr_select_path+0x36/0x108 [dm_round_robin]
CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
 617679b8 61767a48 0002 
 61767ae8 61767a60 61767a60 001145d0
  00b962ae 00bb291e 000b
 61767aa8 61767a48  
 07b962ae 001145d0 61767a48 61767aa8
Call Trace:
([<001144a2>] show_trace+0x8a/0xe0)
([<00114586>] show_stack+0x8e/0xf0)
([<006c7fdc>] dump_stack+0x9c/0xe0)
([<006fbbc0>] check_preemption_disabled+0x108/0x130)
([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
([<001681da>] kthread+0x112/0x120)
([<0098378a>] kernel_thread_starter+0x6/0xc)
([<00983784>] kernel_thread_starter+0x0/0xc)
no locks held by kdmwork-252:0/881.


[:snip:]


I always forget the details (if this confuses lockdep or not), but you
could potentially turn it into:

local_irq_save(flags);
x = this_cpu_ptr();
[...]

spin_lock(&s->lock);
[...]

instead.


Cool, I've coded up the patch (compile tested only).

Benjamin, any chance you could test this against your v4.7 kernel and
report back?

Thanks,
Mike

diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
index 4ace1da..ed446f8 100644
--- a/drivers/md/dm-round-robin.c
+++ b/drivers/md/dm-round-robin.c
@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
path_selector *ps, size_t nr_bytes)
struct path_info *pi = NULL;
struct dm_path *current_path = NULL;

+   local_irq_save(flags);
current_path = *this_cpu_ptr(s->current_path);
if (current_path) {
percpu_counter_dec(&s->repeat_count);
-   if (percpu_counter_read_positive(&s->repeat_count) > 0)
+   if (percpu_counter_read_positive(&s->repeat_count) > 0) {
+   local_irq_restore(flags);
return current_path;
+   }
}

-   spin_lock_irqsave(&s->lock, flags);
+   spin_lock(&s->lock);
if (!list_empty(&s->valid_paths)) {
pi = list_entry(s->valid_paths.next, struct path_info, list);
list_move_tail(&pi->list, &s->valid_paths);
@@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct path_selector 
*ps, size_t nr_bytes)
set_percpu_current_path(s, pi->path);
current_path = pi->path;
}
-   spin_unlock_irqrestore(&s->lock, flags);
+   spin_unlock(&s->lock);
+   local_irq_restore(flags);

return current_path;
 }



Ok, this works as far as the warnings don't appear anymore. But while
applying the patch and thinking about it, why local_irq_save() and not
preempt_disable()? "Sounds" like this is the function you want, and I
also stumbled across this in Documentation/preempt-locking.txt:

  But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
  disabling preemption - any spin_unlock() decreasing the preemption
  count to 0 might trigger a reschedule.

The spinlock would do an other nested preempt_disable(), but those even
out.


local_irq_save(), since we need to grab the lock irq safe very shortly
anyway. As long as they nest properly, the approach is fine, and it's
more efficient than first doing a preempt_disable(), then still needing
a irq safe spinlock.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block
On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe  wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe  wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block  wrote:
> > >>>
> > Hej Mike,
> > 
> > when running a debug-kernel today with several multipath-devices using
> > the round-robin path selector I noticed that the kernel throws these
> > warnings here:
> > 
> > BUG: using smp_processor_id() in preemptible [] code: 
> > kdmwork-252:0/881
> > caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >   617679b8 61767a48 0002 
> >  
> >   61767ae8 61767a60 61767a60 
> >  001145d0
> >    00b962ae 00bb291e 
> >  000b
> >   61767aa8 61767a48  
> >  
> >   07b962ae 001145d0 61767a48 
> >  61767aa8
> > Call Trace:
> > ([<001144a2>] show_trace+0x8a/0xe0)
> > ([<00114586>] show_stack+0x8e/0xf0)
> > ([<006c7fdc>] dump_stack+0x9c/0xe0)
> > ([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > ([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > ([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > ([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > ([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > ([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > ([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > ([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > ([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > ([<001681da>] kthread+0x112/0x120)
> > ([<0098378a>] kernel_thread_starter+0x6/0xc)
> > ([<00983784>] kernel_thread_starter+0x0/0xc)
> > no locks held by kdmwork-252:0/881.
> > 
[:snip:]
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> > 
> > local_irq_save(flags);
> > x = this_cpu_ptr();
> > [...]
> > 
> > spin_lock(&s->lock);
> > [...]
> > 
> > instead.
> 
> Cool, I've coded up the patch (compile tested only).
> 
> Benjamin, any chance you could test this against your v4.7 kernel and
> report back?
> 
> Thanks,
> Mike
> 
> diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> index 4ace1da..ed446f8 100644
> --- a/drivers/md/dm-round-robin.c
> +++ b/drivers/md/dm-round-robin.c
> @@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   struct path_info *pi = NULL;
>   struct dm_path *current_path = NULL;
> 
> + local_irq_save(flags);
>   current_path = *this_cpu_ptr(s->current_path);
>   if (current_path) {
>   percpu_counter_dec(&s->repeat_count);
> - if (percpu_counter_read_positive(&s->repeat_count) > 0)
> + if (percpu_counter_read_positive(&s->repeat_count) > 0) {
> + local_irq_restore(flags);
>   return current_path;
> + }
>   }
> 
> - spin_lock_irqsave(&s->lock, flags);
> + spin_lock(&s->lock);
>   if (!list_empty(&s->valid_paths)) {
>   pi = list_entry(s->valid_paths.next, struct path_info, list);
>   list_move_tail(&pi->list, &s->valid_paths);
> @@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   set_percpu_current_path(s, pi->path);
>   current_path = pi->path;
>   }
> - spin_unlock_irqrestore(&s->lock, flags);
> + spin_unlock(&s->lock);
> + local_irq_restore(flags);
> 
>   return current_path;
>  }
> 

Ok, this works as far as the warnings don't appear anymore. But while
applying the patch and thinking about it, why local_irq_save() and not
preempt_disable()? "Sounds" like this is the function you want, and I
also stumbled across this in Documentation/preempt-locking.txt:

  But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
  disabling preemption - any spin_unlock() decreasing the preemption
  count to 0 might trigger a reschedule. 

The spinlock would do an other nested preempt_disable(), but those even
out.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-07 Thread Benjamin Block
On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe  wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe  wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block  wrote:
> > >>>
> > Hej Mike,
> > 
> > when running a debug-kernel today with several multipath-devices using
> > the round-robin path selector I noticed that the kernel throws these
> > warnings here:
> > 
> > BUG: using smp_processor_id() in preemptible [] code: 
> > kdmwork-252:0/881
> > caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >   617679b8 61767a48 0002 
> >  
> >   61767ae8 61767a60 61767a60 
> >  001145d0
> >    00b962ae 00bb291e 
> >  000b
> >   61767aa8 61767a48  
> >  
> >   07b962ae 001145d0 61767a48 
> >  61767aa8
> > Call Trace:
> > ([<001144a2>] show_trace+0x8a/0xe0)
> > ([<00114586>] show_stack+0x8e/0xf0)
> > ([<006c7fdc>] dump_stack+0x9c/0xe0)
> > ([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > ([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > ([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > ([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > ([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > ([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > ([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > ([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > ([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > ([<001681da>] kthread+0x112/0x120)
> > ([<0098378a>] kernel_thread_starter+0x6/0xc)
> > ([<00983784>] kernel_thread_starter+0x0/0xc)
> > no locks held by kdmwork-252:0/881.
> > 
> > 
> > There are several more of these warnings, but all have the same
> > stack-trace (this is on s390x, but this looks like its only common code)
> > - sometimes the process-context is multipath.
> > 
> > Looking at the changes in this function, it rather looks like this is
> > caused by changes made in commit
> > b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> > 'repeat_count' and 'current_path').
> > 
> > The kernel is a stock v4.7 with some debug options enabled (prominently
> > DEBUG_PREEMPT). Need any more info?
> > >>>
> > >>>As you can see from commit b0b477c7e0dd9 the round-robin path selector
> > >>>is now using percpu data (pointer) and a percpu_counter.
> > >>>
> > >>>I'm really not sure how else to access this percpu data.
> > >>>
> > >>>Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> > >>>of getting an answer to how to fix this.
> > >>
> > >>From a quick look, looks like you are using this_cpu_ptr() without
> > >>having preemption disabled.
> > >
> > >Right, that is what it looked like to me too.
> > >
> > >I'm just not sure on what the proper pattern is to fix this.
> > >
> > >I'll look closer though.
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> > 
> > local_irq_save(flags);
> > x = this_cpu_ptr();
> > [...]
> > 
> > spin_lock(&s->lock);
> > [...]
> > 
> > instead.
> 
> Cool, I've coded up the patch (compile tested only).
> 
> Benjamin, any chance you could test this against your v4.7 kernel and
> report back?
> 

I will give it a go on Monday. Thx for the quick fix.


- Benjamin

> 
> diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> index 4ace1da..ed446f8 100644
> --- a/drivers/md/dm-round-robin.c
> +++ b/drivers/md/dm-round-robin.c
> @@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   struct path_info *pi = NULL;
>   struct dm_path *current_path = NULL;
> 
> + local_irq_save(flags);
>   current_path = *this_cpu_ptr(s->current_path);
>   if (current_path) {
>   percpu_counter_dec(&s->repeat_count);
> - if (percpu_counter_read_positive(&s->repeat_count) > 0)
> + if (percpu_counter_read_positive(&s->repeat_count) > 0) {
> + local_irq_restore(flags);
>   return current_path;
> + }
>   }
> 
> - spin_lock_irqsave(&s->lock, flags);
> + spin_lock(&s->lock);
>   if (!list_empty(&s->valid_paths)) {
>   pi = list_entry(s->valid_paths.next, struct path_info, list);
>

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Jens Axboe

On 08/05/2016 10:06 AM, Mike Snitzer wrote:

On Fri, Aug 05 2016 at 11:54am -0400,
Jens Axboe  wrote:


On 08/05/2016 09:42 AM, Mike Snitzer wrote:

On Fri, Aug 05 2016 at 11:33P -0400,
Jens Axboe  wrote:


On 08/05/2016 09:27 AM, Mike Snitzer wrote:

On Wed, Aug 03 2016 at 11:35am -0400,
Benjamin Block  wrote:


Hej Mike,

when running a debug-kernel today with several multipath-devices using
the round-robin path selector I noticed that the kernel throws these
warnings here:

BUG: using smp_processor_id() in preemptible [] code: kdmwork-252:0/881
caller is rr_select_path+0x36/0x108 [dm_round_robin]
CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
 617679b8 61767a48 0002 
 61767ae8 61767a60 61767a60 001145d0
  00b962ae 00bb291e 000b
 61767aa8 61767a48  
 07b962ae 001145d0 61767a48 61767aa8
Call Trace:
([<001144a2>] show_trace+0x8a/0xe0)
([<00114586>] show_stack+0x8e/0xf0)
([<006c7fdc>] dump_stack+0x9c/0xe0)
([<006fbbc0>] check_preemption_disabled+0x108/0x130)
([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
([<001681da>] kthread+0x112/0x120)
([<0098378a>] kernel_thread_starter+0x6/0xc)
([<00983784>] kernel_thread_starter+0x0/0xc)
no locks held by kdmwork-252:0/881.


There are several more of these warnings, but all have the same
stack-trace (this is on s390x, but this looks like its only common code)
- sometimes the process-context is multipath.

Looking at the changes in this function, it rather looks like this is
caused by changes made in commit
b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
'repeat_count' and 'current_path').

The kernel is a stock v4.7 with some debug options enabled (prominently
DEBUG_PREEMPT). Need any more info?


As you can see from commit b0b477c7e0dd9 the round-robin path selector
is now using percpu data (pointer) and a percpu_counter.

I'm really not sure how else to access this percpu data.

Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
of getting an answer to how to fix this.



>From a quick look, looks like you are using this_cpu_ptr() without

having preemption disabled.


Right, that is what it looked like to me too.

I'm just not sure on what the proper pattern is to fix this.

I'll look closer though.


I always forget the details (if this confuses lockdep or not), but you
could potentially turn it into:

local_irq_save(flags);
x = this_cpu_ptr();
[...]

spin_lock(&s->lock);
[...]

instead.


Cool, I've coded up the patch (compile tested only).

Benjamin, any chance you could test this against your v4.7 kernel and
report back?

Thanks,
Mike

diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
index 4ace1da..ed446f8 100644
--- a/drivers/md/dm-round-robin.c
+++ b/drivers/md/dm-round-robin.c
@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
path_selector *ps, size_t nr_bytes)
struct path_info *pi = NULL;
struct dm_path *current_path = NULL;

+   local_irq_save(flags);
current_path = *this_cpu_ptr(s->current_path);
if (current_path) {
percpu_counter_dec(&s->repeat_count);
-   if (percpu_counter_read_positive(&s->repeat_count) > 0)
+   if (percpu_counter_read_positive(&s->repeat_count) > 0) {
+   local_irq_restore(flags);
return current_path;
+   }
}

-   spin_lock_irqsave(&s->lock, flags);
+   spin_lock(&s->lock);
if (!list_empty(&s->valid_paths)) {
pi = list_entry(s->valid_paths.next, struct path_info, list);
list_move_tail(&pi->list, &s->valid_paths);
@@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct path_selector 
*ps, size_t nr_bytes)
set_percpu_current_path(s, pi->path);
current_path = pi->path;


Just leave that as-is, should be fine.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Mike Snitzer
On Fri, Aug 05 2016 at 11:54am -0400,
Jens Axboe  wrote:

> On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> >On Fri, Aug 05 2016 at 11:33P -0400,
> >Jens Axboe  wrote:
> >
> >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> >>>On Wed, Aug 03 2016 at 11:35am -0400,
> >>>Benjamin Block  wrote:
> >>>
> Hej Mike,
> 
> when running a debug-kernel today with several multipath-devices using
> the round-robin path selector I noticed that the kernel throws these
> warnings here:
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> kdmwork-252:0/881
> caller is rr_select_path+0x36/0x108 [dm_round_robin]
> CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
>   617679b8 61767a48 0002 
>   61767ae8 61767a60 61767a60 001145d0
>    00b962ae 00bb291e 000b
>   61767aa8 61767a48  
>   07b962ae 001145d0 61767a48 61767aa8
> Call Trace:
> ([<001144a2>] show_trace+0x8a/0xe0)
> ([<00114586>] show_stack+0x8e/0xf0)
> ([<006c7fdc>] dump_stack+0x9c/0xe0)
> ([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> ([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> ([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> ([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> ([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> ([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> ([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> ([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> ([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> ([<001681da>] kthread+0x112/0x120)
> ([<0098378a>] kernel_thread_starter+0x6/0xc)
> ([<00983784>] kernel_thread_starter+0x0/0xc)
> no locks held by kdmwork-252:0/881.
> 
> 
> There are several more of these warnings, but all have the same
> stack-trace (this is on s390x, but this looks like its only common code)
> - sometimes the process-context is multipath.
> 
> Looking at the changes in this function, it rather looks like this is
> caused by changes made in commit
> b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> 'repeat_count' and 'current_path').
> 
> The kernel is a stock v4.7 with some debug options enabled (prominently
> DEBUG_PREEMPT). Need any more info?
> >>>
> >>>As you can see from commit b0b477c7e0dd9 the round-robin path selector
> >>>is now using percpu data (pointer) and a percpu_counter.
> >>>
> >>>I'm really not sure how else to access this percpu data.
> >>>
> >>>Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> >>>of getting an answer to how to fix this.
> >>
> >>From a quick look, looks like you are using this_cpu_ptr() without
> >>having preemption disabled.
> >
> >Right, that is what it looked like to me too.
> >
> >I'm just not sure on what the proper pattern is to fix this.
> >
> >I'll look closer though.
> 
> I always forget the details (if this confuses lockdep or not), but you
> could potentially turn it into:
> 
> local_irq_save(flags);
> x = this_cpu_ptr();
> [...]
> 
> spin_lock(&s->lock);
> [...]
> 
> instead.

Cool, I've coded up the patch (compile tested only).

Benjamin, any chance you could test this against your v4.7 kernel and
report back?

Thanks,
Mike

diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
index 4ace1da..ed446f8 100644
--- a/drivers/md/dm-round-robin.c
+++ b/drivers/md/dm-round-robin.c
@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
path_selector *ps, size_t nr_bytes)
struct path_info *pi = NULL;
struct dm_path *current_path = NULL;
 
+   local_irq_save(flags);
current_path = *this_cpu_ptr(s->current_path);
if (current_path) {
percpu_counter_dec(&s->repeat_count);
-   if (percpu_counter_read_positive(&s->repeat_count) > 0)
+   if (percpu_counter_read_positive(&s->repeat_count) > 0) {
+   local_irq_restore(flags);
return current_path;
+   }
}
 
-   spin_lock_irqsave(&s->lock, flags);
+   spin_lock(&s->lock);
if (!list_empty(&s->valid_paths)) {
pi = list_entry(s->valid_paths.next, struct path_info, list);
list_move_tail(&pi->list, &s->valid_paths);
@@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct path_selector 
*ps, size_t nr_bytes)
set_percpu_current_path(s, pi->path);
current_path = pi->path;
}
-   spin_unlock_irqrestore(&s->lock, flags);
+   spin_unlock(&s->lock);
+   

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Jens Axboe

On 08/05/2016 09:42 AM, Mike Snitzer wrote:

On Fri, Aug 05 2016 at 11:33P -0400,
Jens Axboe  wrote:


On 08/05/2016 09:27 AM, Mike Snitzer wrote:

On Wed, Aug 03 2016 at 11:35am -0400,
Benjamin Block  wrote:


Hej Mike,

when running a debug-kernel today with several multipath-devices using
the round-robin path selector I noticed that the kernel throws these
warnings here:

BUG: using smp_processor_id() in preemptible [] code: kdmwork-252:0/881
caller is rr_select_path+0x36/0x108 [dm_round_robin]
CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
  617679b8 61767a48 0002 
  61767ae8 61767a60 61767a60 001145d0
   00b962ae 00bb291e 000b
  61767aa8 61767a48  
  07b962ae 001145d0 61767a48 61767aa8
Call Trace:
([<001144a2>] show_trace+0x8a/0xe0)
([<00114586>] show_stack+0x8e/0xf0)
([<006c7fdc>] dump_stack+0x9c/0xe0)
([<006fbbc0>] check_preemption_disabled+0x108/0x130)
([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
([<001681da>] kthread+0x112/0x120)
([<0098378a>] kernel_thread_starter+0x6/0xc)
([<00983784>] kernel_thread_starter+0x0/0xc)
no locks held by kdmwork-252:0/881.


There are several more of these warnings, but all have the same
stack-trace (this is on s390x, but this looks like its only common code)
- sometimes the process-context is multipath.

Looking at the changes in this function, it rather looks like this is
caused by changes made in commit
b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
'repeat_count' and 'current_path').

The kernel is a stock v4.7 with some debug options enabled (prominently
DEBUG_PREEMPT). Need any more info?


As you can see from commit b0b477c7e0dd9 the round-robin path selector
is now using percpu data (pointer) and a percpu_counter.

I'm really not sure how else to access this percpu data.

Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
of getting an answer to how to fix this.


From a quick look, looks like you are using this_cpu_ptr() without
having preemption disabled.


Right, that is what it looked like to me too.

I'm just not sure on what the proper pattern is to fix this.

I'll look closer though.


I always forget the details (if this confuses lockdep or not), but you
could potentially turn it into:

local_irq_save(flags);
x = this_cpu_ptr();
[...]

spin_lock(&s->lock);
[...]

instead.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Mike Snitzer
On Fri, Aug 05 2016 at 11:33P -0400,
Jens Axboe  wrote:

> On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> >On Wed, Aug 03 2016 at 11:35am -0400,
> >Benjamin Block  wrote:
> >
> >>Hej Mike,
> >>
> >>when running a debug-kernel today with several multipath-devices using
> >>the round-robin path selector I noticed that the kernel throws these
> >>warnings here:
> >>
> >>BUG: using smp_processor_id() in preemptible [] code: 
> >>kdmwork-252:0/881
> >>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> >>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >>   617679b8 61767a48 0002 
> >>   61767ae8 61767a60 61767a60 001145d0
> >>    00b962ae 00bb291e 000b
> >>   61767aa8 61767a48  
> >>   07b962ae 001145d0 61767a48 61767aa8
> >>Call Trace:
> >>([<001144a2>] show_trace+0x8a/0xe0)
> >>([<00114586>] show_stack+0x8e/0xf0)
> >>([<006c7fdc>] dump_stack+0x9c/0xe0)
> >>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> >>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> >>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> >>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> >>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> >>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> >>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> >>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> >>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> >>([<001681da>] kthread+0x112/0x120)
> >>([<0098378a>] kernel_thread_starter+0x6/0xc)
> >>([<00983784>] kernel_thread_starter+0x0/0xc)
> >>no locks held by kdmwork-252:0/881.
> >>
> >>
> >>There are several more of these warnings, but all have the same
> >>stack-trace (this is on s390x, but this looks like its only common code)
> >>- sometimes the process-context is multipath.
> >>
> >>Looking at the changes in this function, it rather looks like this is
> >>caused by changes made in commit
> >>b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> >>'repeat_count' and 'current_path').
> >>
> >>The kernel is a stock v4.7 with some debug options enabled (prominently
> >>DEBUG_PREEMPT). Need any more info?
> >
> >As you can see from commit b0b477c7e0dd9 the round-robin path selector
> >is now using percpu data (pointer) and a percpu_counter.
> >
> >I'm really not sure how else to access this percpu data.
> >
> >Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> >of getting an answer to how to fix this.
> 
> From a quick look, looks like you are using this_cpu_ptr() without
> having preemption disabled.

Right, that is what it looked like to me too.

I'm just not sure on what the proper pattern is to fix this.

I'll look closer though.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Jens Axboe

On 08/05/2016 09:27 AM, Mike Snitzer wrote:

On Wed, Aug 03 2016 at 11:35am -0400,
Benjamin Block  wrote:


Hej Mike,

when running a debug-kernel today with several multipath-devices using
the round-robin path selector I noticed that the kernel throws these
warnings here:

BUG: using smp_processor_id() in preemptible [] code: kdmwork-252:0/881
caller is rr_select_path+0x36/0x108 [dm_round_robin]
CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
   617679b8 61767a48 0002 
   61767ae8 61767a60 61767a60 001145d0
    00b962ae 00bb291e 000b
   61767aa8 61767a48  
   07b962ae 001145d0 61767a48 61767aa8
Call Trace:
([<001144a2>] show_trace+0x8a/0xe0)
([<00114586>] show_stack+0x8e/0xf0)
([<006c7fdc>] dump_stack+0x9c/0xe0)
([<006fbbc0>] check_preemption_disabled+0x108/0x130)
([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
([<001681da>] kthread+0x112/0x120)
([<0098378a>] kernel_thread_starter+0x6/0xc)
([<00983784>] kernel_thread_starter+0x0/0xc)
no locks held by kdmwork-252:0/881.


There are several more of these warnings, but all have the same
stack-trace (this is on s390x, but this looks like its only common code)
- sometimes the process-context is multipath.

Looking at the changes in this function, it rather looks like this is
caused by changes made in commit
b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
'repeat_count' and 'current_path').

The kernel is a stock v4.7 with some debug options enabled (prominently
DEBUG_PREEMPT). Need any more info?


As you can see from commit b0b477c7e0dd9 the round-robin path selector
is now using percpu data (pointer) and a percpu_counter.

I'm really not sure how else to access this percpu data.

Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
of getting an answer to how to fix this.


From a quick look, looks like you are using this_cpu_ptr() without
having preemption disabled.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-05 Thread Mike Snitzer
On Wed, Aug 03 2016 at 11:35am -0400,
Benjamin Block  wrote:

> Hej Mike,
> 
> when running a debug-kernel today with several multipath-devices using
> the round-robin path selector I noticed that the kernel throws these
> warnings here:
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> kdmwork-252:0/881
> caller is rr_select_path+0x36/0x108 [dm_round_robin]
> CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
>617679b8 61767a48 0002 
>61767ae8 61767a60 61767a60 001145d0
> 00b962ae 00bb291e 000b
>61767aa8 61767a48  
>07b962ae 001145d0 61767a48 61767aa8
> Call Trace:
> ([<001144a2>] show_trace+0x8a/0xe0)
> ([<00114586>] show_stack+0x8e/0xf0)
> ([<006c7fdc>] dump_stack+0x9c/0xe0)
> ([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> ([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> ([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> ([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> ([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> ([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> ([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> ([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> ([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> ([<001681da>] kthread+0x112/0x120)
> ([<0098378a>] kernel_thread_starter+0x6/0xc)
> ([<00983784>] kernel_thread_starter+0x0/0xc)
> no locks held by kdmwork-252:0/881.
> 
> 
> There are several more of these warnings, but all have the same
> stack-trace (this is on s390x, but this looks like its only common code)
> - sometimes the process-context is multipath.
> 
> Looking at the changes in this function, it rather looks like this is
> caused by changes made in commit
> b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> 'repeat_count' and 'current_path').
> 
> The kernel is a stock v4.7 with some debug options enabled (prominently
> DEBUG_PREEMPT). Need any more info?

As you can see from commit b0b477c7e0dd9 the round-robin path selector
is now using percpu data (pointer) and a percpu_counter.

I'm really not sure how else to access this percpu data.

Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
of getting an answer to how to fix this.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html