Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)

2018-08-08 Thread Ming Lei
On Wed, Aug 08, 2018 at 05:28:43PM +, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 14:43 +0800, jianchao.wang wrote:
> > 
> > On 08/08/2018 02:11 PM, jianchao.wang wrote:
> > > Hi Bart
> > > 
> > > On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> > > > @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
> > > > }
> > > > }
> > > > data->hctx->queued++;
> > > > +
> > > > +   blk_pm_add_request(q, rq);
> > > > +
> > > > return rq;
> > > >  }
> > > 
> > > The request_queue is in pm_only mode when suspended, who can reach here to do the resume?
> > 
> > I mean, in the original blk-legacy runtime PM implementation, any new IO
> > could trigger the resume. After your patch set, only a PM request can pass
> > blk_queue_enter() while the queue is suspended and in pm-only mode. But if
> > there is no resume, where does the PM request come from?
> > 
> > blk_pm_add_request() should be added to blk_queue_enter().
> > It would look like the following:
> >    1. when a normal IO reaches blk_queue_enter(), if the queue is in
> >       suspended mode, it invokes blk_pm_add_request() to trigger the resume,
> >       then waits here for the pm-only mode to be cleared.
> >    2. the runtime PM core does the resume work and finally clears the
> >       pm-only mode.
> >    3. the task blocked in blk_queue_enter() is woken up and proceeds.
> 
> Hello Jianchao,
> 
> Some but not all blk_queue_enter() calls are related to request allocation so
> I'm not sure that that call should be added into blk_queue_enter().

The only one I remember is scsi_ioctl_reset(), in which scsi_autopm_get_host()
is called before allocating this request; that means it is enough to bring the
host up for handling RESET. Also, this request shouldn't have entered the block
request queue.

Or are there other cases in which request allocation isn't related to
blk_queue_enter()?


thanks,
Ming


[PATCH] block: bvec_nr_vecs() returns value for wrong slab

2018-08-08 Thread Greg Edwards
In commit ed996a52c868 ("block: simplify and cleanup bvec pool
handling"), the value of the slab index is incremented by one in
bvec_alloc() after the allocation is done, so that an index value of
0 indicates the bvec does not need to be freed later.

bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
value.  Decrement idx before performing the lookup.

Fixes: ed996a52c868 ("block: simplify and cleanup bvec pool handling")
Signed-off-by: Greg Edwards 
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 047c5dca6d90..ff94640bc734 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -156,7 +156,7 @@ static void bio_put_slab(struct bio_set *bs)
 
 unsigned int bvec_nr_vecs(unsigned short idx)
 {
-   return bvec_slabs[idx].nr_vecs;
+   return bvec_slabs[--idx].nr_vecs;
 }
 
 void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned int idx)
-- 
2.17.1



Re: [PATCH v5 6/9] block: Change the runtime power management approach (2/2)

2018-08-08 Thread Bart Van Assche
On Wed, 2018-08-08 at 16:50 +0800, Ming Lei wrote:
> On Tue, Aug 07, 2018 at 03:51:30PM -0700, Bart Van Assche wrote:
> > Instead of allowing requests that are not power management requests
> > to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> > the blk_get_request() caller block. This change fixes a starvation
> 
> I don't see the related change that blocks blk_get_request() in
> this patchset.

I was referring to setting pm-only mode for suspended devices. Since
blk_queue_enter() is called before a request is allocated, that is sufficient
to make request allocation block.

> > diff --git a/block/blk-pm.c b/block/blk-pm.c
> > index bf8532da952d..d6b65cef9764 100644
> > --- a/block/blk-pm.c
> > +++ b/block/blk-pm.c
> > @@ -86,14 +86,39 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> > if (!q->dev)
> > return ret;
> >  
> > +   WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> > +
> > +   blk_set_pm_only(q);
> > +   /*
> > +* This function only gets called if the most recent
> > +* pm_request_resume() call occurred at least autosuspend_delay_ms
> > +* ago. Since blk_queue_enter() is called by the request allocation
> > +* code before pm_request_resume(), if q_usage_counter indicates that
> > +* no requests are in flight it is safe to suspend the device.
> > +*/
> > +   ret = -EBUSY;
> > +   if (!percpu_ref_is_in_use(&q->q_usage_counter)) {
> > +   /*
> > +* Switch to preempt-only mode before calling
> > +* synchronize_rcu() such that later blk_queue_enter() calls
> > +* see the preempt-only state. See also
> > +* http://lwn.net/Articles/573497/.
> > +*/
> > +   synchronize_rcu();
> > +           if (!percpu_ref_is_in_use(&q->q_usage_counter))
> > +   ret = 0;
> > +   }
> > +
> 
> In blk_queue_enter(), V5 starts to allow all RQF_PREEMPT requests
> to enter the queue even though pm_only is set. That means any scsi_execute()
> may issue a new command to the host just after the above percpu_ref_is_in_use()
> returns 0, while the suspend is in progress.
> 
> This case is illegal, given that RQF_PM is the only kind of request which can
> be issued to the hardware during suspend.

Right, that's something that needs to be addressed.

Bart.



Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)

2018-08-08 Thread Bart Van Assche
On Wed, 2018-08-08 at 14:43 +0800, jianchao.wang wrote:
> 
> On 08/08/2018 02:11 PM, jianchao.wang wrote:
> > Hi Bart
> > 
> > On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> > > @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
> > >   }
> > >   }
> > >   data->hctx->queued++;
> > > +
> > > + blk_pm_add_request(q, rq);
> > > +
> > >   return rq;
> > >  }
> > 
> > The request_queue is in pm_only mode when suspended, who can reach here to do the resume?
> 
> I mean, in the original blk-legacy runtime PM implementation, any new IO
> could trigger the resume. After your patch set, only a PM request can pass
> blk_queue_enter() while the queue is suspended and in pm-only mode. But if
> there is no resume, where does the PM request come from?
> 
> blk_pm_add_request() should be added to blk_queue_enter().
> It would look like the following:
>    1. when a normal IO reaches blk_queue_enter(), if the queue is in
>       suspended mode, it invokes blk_pm_add_request() to trigger the resume,
>       then waits here for the pm-only mode to be cleared.
>    2. the runtime PM core does the resume work and finally clears the
>       pm-only mode.
>    3. the task blocked in blk_queue_enter() is woken up and proceeds.

Hello Jianchao,

Some but not all blk_queue_enter() calls are related to request allocation so
I'm not sure that that call should be added into blk_queue_enter(). The reason
my tests passed is probably because of the scsi_autopm_get_device() calls in
the sd and sr drivers. However, not all request submission code is protected by
these calls. I will have a closer look at how to preserve the behavior that
queueing a new request that is not protected by scsi_autopm_get_*() restores
full power mode.

Bart.



Re: [PATCH v5 1/9] block: Change the preempt-only flag into a counter

2018-08-08 Thread Bart Van Assche
On Wed, 2018-08-08 at 16:21 +0800, Ming Lei wrote:
> On Tue, Aug 07, 2018 at 03:51:25PM -0700, Bart Van Assche wrote:
> > The RQF_PREEMPT flag is used for three purposes:
> > - In the SCSI core, for making sure that power management requests
> >   are executed if a device is in the "quiesced" state.
> > - For domain validation by SCSI drivers that use the parallel port.
> > - In the IDE driver, for IDE preempt requests.
> 
> I think the above description may not be accurate: BLK_MQ_REQ_PREEMPT is
> always set inside scsi_execute(), which means any scsi_execute() caller
> can use the RQF_PREEMPT flag; it is, of course, not limited to the above
> three cases.

Hello Ming,

What I described is for which cases we really need the RQF_PREEMPT flag rather
than in which cases the RQF_PREEMPT flag is set today. I think we should split
the RQF_PREEMPT flag into three flags, one flag per case mentioned above.

Bart.


Re: [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use()

2018-08-08 Thread Tejun Heo
Hello,

On Tue, Aug 07, 2018 at 03:51:28PM -0700, Bart Van Assche wrote:
> Introduce a function that makes it possible to determine whether a per-cpu refcount
> is in use. This function will be used in a later patch to determine
> whether or not any block layer requests are being executed.

I thought about it a bit and am having a bit of difficulty convincing
myself this is necessary.  Switching a percpu_ref to atomic mode isn't
expensive - it's one spinlock cycle, an RCU wait and one sweep of the
percpu counters.  The most expensive part - the percpu sweep - needs
to be there even with the optimization, and the wait doesn't really matter
as all it'll do is slightly delay the timer-based PM operation, and it can
be overlaid with the propagation of set_pm_only() anyway.

So, how about just doing the simple thing?  Switch it to atomic mode
and check the counter and switch back to percpu mode afterwards.  If
we see any issues with that, we can try to optimize it later but that
seems unlikely to me.

Thanks.

-- 
tejun


Re: [PATCH v4 2/2] block: Ensure that a request queue is dissociated from the cgroup controller

2018-08-08 Thread Jens Axboe
On 8/8/18 9:04 AM, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 08:41 -0600, Jens Axboe wrote:
>> On 7/30/18 3:10 PM, Bart Van Assche wrote:
>>> +#ifdef CONFIG_BLK_CGROUP
>>> +   {
>>> +   struct blkcg_gq *blkg;
>>> +
>>> +   rcu_read_lock();
>>> +   blkg = blkg_lookup(&blkcg_root, q);
>>> +   rcu_read_unlock();
>>> +
>>> +   WARN(blkg,
>>> +        "request queue %p is being released but it has not yet been removed from the blkcg controller\n",
>>> +        q);
>>> +   }
>>> +#endif
>>
>> This last hunk should go in the cgroup code.
> 
> Hello Jens,
> 
> How about leaving out the #ifdef CONFIG_BLK_CGROUP / #endif? Would that be
> sufficient? I just noticed that blkg_lookup() returns NULL anyway if cgroup
> support is disabled in the kernel config.

That's even better.

-- 
Jens Axboe



Re: [PATCH v4 2/2] block: Ensure that a request queue is dissociated from the cgroup controller

2018-08-08 Thread Bart Van Assche
On Wed, 2018-08-08 at 08:41 -0600, Jens Axboe wrote:
> On 7/30/18 3:10 PM, Bart Van Assche wrote:
> > +#ifdef CONFIG_BLK_CGROUP
> > +   {
> > +   struct blkcg_gq *blkg;
> > +
> > +   rcu_read_lock();
> > +   blkg = blkg_lookup(&blkcg_root, q);
> > +   rcu_read_unlock();
> > +
> > +   WARN(blkg,
> > +        "request queue %p is being released but it has not yet been removed from the blkcg controller\n",
> > +        q);
> > +   }
> > +#endif
> 
> This last hunk should go in the cgroup code.

Hello Jens,

How about leaving out the #ifdef CONFIG_BLK_CGROUP / #endif? Would that be
sufficient? I just noticed that blkg_lookup() returns NULL anyway if cgroup
support is disabled in the kernel config.

Thanks,

Bart.



Re: [PATCH v4 2/2] block: Ensure that a request queue is dissociated from the cgroup controller

2018-08-08 Thread Jens Axboe
On 7/30/18 3:10 PM, Bart Van Assche wrote:
> Several block drivers call alloc_disk() followed by put_disk() if
> something fails before device_add_disk() is called without calling
> blk_cleanup_queue(). Make sure that also for this scenario a request
> queue is dissociated from the cgroup controller. This patch avoids
> that loading the parport_pc, paride and pf drivers triggers the
> following kernel crash:
> 
> BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
> Read of size 4 at addr 0008 by task modprobe/744
> Call Trace:
> dump_stack+0x9a/0xeb
> kasan_report+0x139/0x350
> pi_init+0x42e/0x580 [paride]
> pf_init+0x2bb/0x1000 [pf]
> do_one_initcall+0x8e/0x405
> do_init_module+0xd9/0x2f2
> load_module+0x3ab4/0x4700
> SYSC_finit_module+0x176/0x1a0
> do_syscall_64+0xee/0x2b0
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Reported-by: Alexandru Moise <00moses.alexande...@gmail.com>
> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
> Signed-off-by: Bart Van Assche 
> Tested-by: Alexandru Moise <00moses.alexande...@gmail.com>
> Cc: Tejun Heo 
> Cc: Christoph Hellwig 
> Cc: Ming Lei 
> Cc: Johannes Thumshirn 
> Cc: Alexandru Moise <00moses.alexande...@gmail.com>
> Cc: Joseph Qi 
> Cc: 
> ---
>  block/blk-sysfs.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index ca1984ecbdeb..26275d9babcb 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -802,6 +802,31 @@ static void __blk_release_queue(struct work_struct *work)
>   blk_stat_remove_callback(q, q->poll_cb);
>   blk_stat_free_callback(q->poll_cb);
>  
> + if (!blk_queue_dead(q)) {
> + /*
> +  * Last reference was dropped without having called
> +  * blk_cleanup_queue().
> +  */
> + WARN_ONCE(blk_queue_init_done(q),
> +           "request queue %p has been registered but blk_cleanup_queue() has not been called for that queue\n",
> +           q);
> + blk_exit_queue(q);
> + }
> +
> +#ifdef CONFIG_BLK_CGROUP
> + {
> + struct blkcg_gq *blkg;
> +
> + rcu_read_lock();
> +         blkg = blkg_lookup(&blkcg_root, q);
> + rcu_read_unlock();
> +
> + WARN(blkg,
> +      "request queue %p is being released but it has not yet been removed from the blkcg controller\n",
> +      q);
> + }
> +#endif

This last hunk should go in the cgroup code.

-- 
Jens Axboe



Re: [PATCH v5 6/9] block: Change the runtime power management approach (2/2)

2018-08-08 Thread Ming Lei
On Tue, Aug 07, 2018 at 03:51:30PM -0700, Bart Van Assche wrote:
> Instead of allowing requests that are not power management requests
> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> the blk_get_request() caller block. This change fixes a starvation

I don't see the related change that blocks blk_get_request() in
this patchset.

BTW, blk_pm_add_request() won't block since it uses the async version
of runtime resume.

> issue: it is now guaranteed that power management requests will be
> executed no matter how many blk_get_request() callers are waiting.
> Instead of maintaining the q->nr_pending counter, rely on
> q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
> request finishes instead of only if the queue depth drops to zero.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Ming Lei 
> Cc: Jianchao Wang 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> Cc: Alan Stern 
> ---
>  block/blk-core.c   | 37 -
>  block/blk-mq-debugfs.c |  1 -
>  block/blk-pm.c | 40 +++-
>  block/blk-pm.h |  6 ++
>  include/linux/blkdev.h |  1 -
>  5 files changed, 45 insertions(+), 40 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 179a13be0fca..a2ef253edfbd 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2737,30 +2737,6 @@ void blk_account_io_done(struct request *req, u64 now)
>   }
>  }
>  
> -#ifdef CONFIG_PM
> -/*
> - * Don't process normal requests when queue is suspended
> - * or in the process of suspending/resuming
> - */
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> - switch (rq->q->rpm_status) {
> - case RPM_RESUMING:
> - case RPM_SUSPENDING:
> - return rq->rq_flags & RQF_PM;
> - case RPM_SUSPENDED:
> - return false;
> - default:
> - return true;
> - }
> -}
> -#else
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> - return true;
> -}
> -#endif
> -
>  void blk_account_io_start(struct request *rq, bool new_io)
>  {
>   struct hd_struct *part;
> @@ -2806,11 +2782,14 @@ static struct request *elv_next_request(struct request_queue *q)
>  
>   while (1) {
>   list_for_each_entry(rq, &q->queue_head, queuelist) {
> - if (blk_pm_allow_request(rq))
> - return rq;
> -
> - if (rq->rq_flags & RQF_SOFTBARRIER)
> - break;
> +#ifdef CONFIG_PM
> + /*
> +  * If a request gets queued in state RPM_SUSPENDED
> +  * then that's a kernel bug.
> +  */
> + WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
> +#endif
> + return rq;
>   }
>  
>   /*
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index a5ea86835fcb..7d74d53dc098 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -332,7 +332,6 @@ static const char *const rqf_name[] = {
>   RQF_NAME(ELVPRIV),
>   RQF_NAME(IO_STAT),
>   RQF_NAME(ALLOCED),
> - RQF_NAME(PM),
>   RQF_NAME(HASHED),
>   RQF_NAME(STATS),
>   RQF_NAME(SPECIAL_PAYLOAD),
> diff --git a/block/blk-pm.c b/block/blk-pm.c
> index bf8532da952d..d6b65cef9764 100644
> --- a/block/blk-pm.c
> +++ b/block/blk-pm.c
> @@ -86,14 +86,39 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>   if (!q->dev)
>   return ret;
>  
> + WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> +
> + blk_set_pm_only(q);
> + /*
> +  * This function only gets called if the most recent
> +  * pm_request_resume() call occurred at least autosuspend_delay_ms
> +  * ago. Since blk_queue_enter() is called by the request allocation
> +  * code before pm_request_resume(), if q_usage_counter indicates that
> +  * no requests are in flight it is safe to suspend the device.
> +  */
> + ret = -EBUSY;
> + if (!percpu_ref_is_in_use(&q->q_usage_counter)) {
> + /*
> +  * Switch to preempt-only mode before calling
> +  * synchronize_rcu() such that later blk_queue_enter() calls
> +  * see the preempt-only state. See also
> +  * http://lwn.net/Articles/573497/.
> +  */
> + synchronize_rcu();
> +         if (!percpu_ref_is_in_use(&q->q_usage_counter))
> + ret = 0;
> + }
> +

In blk_queue_enter(), V5 starts to allow all RQF_PREEMPT requests
to enter the queue even though pm_only is set. That means any scsi_execute()
may issue a new command to the host just after the above percpu_ref_is_in_use()
returns 0, while the suspend is in progress.

This case is illegal, given that RQF_PM is the only kind of request which can
be issued to the hardware during suspend.

Thanks,
Ming


Re: [PATCH v5 1/9] block: Change the preempt-only flag into a counter

2018-08-08 Thread Ming Lei
On Tue, Aug 07, 2018 at 03:51:25PM -0700, Bart Van Assche wrote:
> The RQF_PREEMPT flag is used for three purposes:
> - In the SCSI core, for making sure that power management requests
>   are executed if a device is in the "quiesced" state.
> - For domain validation by SCSI drivers that use the parallel port.
> - In the IDE driver, for IDE preempt requests.

I think the above description may not be accurate: BLK_MQ_REQ_PREEMPT is
always set inside scsi_execute(), which means any scsi_execute() caller
can use the RQF_PREEMPT flag; it is, of course, not limited to the above
three cases.


Thanks,
Ming


Re: [PATCH v4 2/2] block: Ensure that a request queue is dissociated from the cgroup controller

2018-08-08 Thread Johannes Thumshirn
From my limited insight into this:
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de    +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v4 1/2] block: Introduce blk_exit_queue()

2018-08-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de    +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)

2018-08-08 Thread jianchao.wang



On 08/08/2018 02:11 PM, jianchao.wang wrote:
> Hi Bart
> 
> On 08/08/2018 06:51 AM, Bart Van Assche wrote:
>> @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>>  }
>>  }
>>  data->hctx->queued++;
>> +
>> +blk_pm_add_request(q, rq);
>> +
>>  return rq;
>>  }
> 
> The request_queue is in pm_only mode when suspended, who can reach here to do the resume?

I mean, in the original blk-legacy runtime PM implementation, any new IO could
trigger the resume. After your patch set, only a PM request can pass
blk_queue_enter() while the queue is suspended and in pm-only mode. But if
there is no resume, where does the PM request come from?

blk_pm_add_request() should be added to blk_queue_enter().
It would look like the following:
   1. when a normal IO reaches blk_queue_enter(), if the queue is in suspended
      mode, it invokes blk_pm_add_request() to trigger the resume, then waits
      here for the pm-only mode to be cleared.
   2. the runtime PM core does the resume work and finally clears the pm-only
      mode.
   3. the task blocked in blk_queue_enter() is woken up and proceeds.

Thanks
Jianchao


Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)

2018-08-08 Thread jianchao.wang
Hi Bart

On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>   }
>   }
>   data->hctx->queued++;
> +
> + blk_pm_add_request(q, rq);
> +
>   return rq;
>  }

The request_queue is in pm_only mode when suspended, who can reach here to do the resume?

Thanks
Jianchao