Re: [PATCH v7 5/6] block: Make blk_get_request() block for non-PM requests while suspended
On Wed, 2018-09-19 at 10:21 +0800, jianchao.wang wrote:
> On 09/19/2018 01:44 AM, Bart Van Assche wrote:
> > There is only one blk_pm_request_resume() call and that call is
> > inside blk_queue_enter() after the pm_only counter has been
> > checked.
> >
> > For the legacy block layer, nr_pending is increased after the
> > blk_queue_enter() call from inside blk_old_get_request() succeeded.
> >
> > So I don't see how blk_pm_request_resume() or q->nr_pending++ could
> > escape from the preempt checking?
>
> For example:
>
> blk_pre_runtime_suspend                 generic_make_request
>                                           blk_queue_enter
>                                           // success here, no
>                                           // blk_pm_request_resume here
>                                           blk_mq_make_request
> blk_set_pm_only
>
> if (blk_requests_in_flight(q) == 0) {
>         synchronize_rcu();
>         if (blk_requests_in_flight(q) == 0)
>                 ret = 0;
> }
>                                           blk_mq_get_request

Hello Jianchao,

I think you are right. I will add a q_usage_counter check before setting
'ret' to zero.

Bart.
Re: [PATCH v7 5/6] block: Make blk_get_request() block for non-PM requests while suspended
Hi Bart

On 09/19/2018 01:44 AM, Bart Van Assche wrote:
>>> + * ago. Since blk_queue_enter() is called by the request allocation
>>> + * code before pm_request_resume(), if no requests have a tag assigned
>>> + * it is safe to suspend the device.
>>> + */
>>> +	ret = -EBUSY;
>>> +	if (blk_requests_in_flight(q) == 0) {
>>> +		/*
>>> +		 * Call synchronize_rcu() such that later blk_queue_enter()
>>> +		 * calls see the pm-only state. See also
>>> +		 * http://lwn.net/Articles/573497/.
>>> +		 */
>>> +		synchronize_rcu();
>>> +		if (blk_requests_in_flight(q) == 0)
>>
>> Seems not safe here.
>>
>> For the blk-mq path: someone may have escaped the preempt checking,
>> missed blk_pm_request_resume() there, entered generic_make_request(),
>> but not yet allocated a request or occupied any tag.
>>
>> There could be a similar scenario for the blk-legacy path, where
>> q->nr_pending is increased when the request is queued.
>>
>> So I guess the q_usage_counter checking is still needed here.
>
> There is only one blk_pm_request_resume() call and that call is inside
> blk_queue_enter() after the pm_only counter has been checked.
>
> For the legacy block layer, nr_pending is increased after the
> blk_queue_enter() call from inside blk_old_get_request() succeeded.
>
> So I don't see how blk_pm_request_resume() or q->nr_pending++ could escape
> from the preempt checking?

For example:

blk_pre_runtime_suspend                 generic_make_request
                                          blk_queue_enter
                                          // success here, no
                                          // blk_pm_request_resume here
                                          blk_mq_make_request
blk_set_pm_only

if (blk_requests_in_flight(q) == 0) {
        synchronize_rcu();
        if (blk_requests_in_flight(q) == 0)
                ret = 0;
}
                                          blk_mq_get_request

Thanks
Jianchao
Re: [PATCH v7 5/6] block: Make blk_get_request() block for non-PM requests while suspended
On 9/17/18 7:39 PM, jianchao.wang wrote:
> On 09/18/2018 04:15 AM, Bart Van Assche wrote:
>> Instead of allowing requests that are not power management requests
>> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
>> the blk_get_request() caller block. This change fixes a starvation
>> issue: it is now guaranteed that power management requests will be
>> executed no matter how many blk_get_request() callers are waiting.
>> Instead of maintaining the q->nr_pending counter, rely on
>> q->q_usage_counter.
>
> Looks like we still depend on this nr_pending for blk-legacy.

That's right. I will update the commit message.

> blk_mq_queue_tag_busy_iter only accounts the driver tags. This will
> only work w/o an io scheduler.
>
>> +/**
>>  * blk_pre_runtime_suspend - Pre runtime suspend check
>>  * @q: the queue of the device
>> @@ -68,14 +101,38 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>>  	if (!q->dev)
>>  		return ret;
>>
>> +	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
>> +
>> +	blk_set_pm_only(q);
>> +	/*
>> +	 * This function only gets called if the most recent
>> +	 * pm_request_resume() call occurred at least autosuspend_delay_ms
>          ^^^
> pm_runtime_mark_last_busy ?

Since every pm_request_resume() call from the block layer is followed by
a pm_runtime_mark_last_busy() call and since the latter is called later
I think you are right. I will update the comment.

>> +	 * ago. Since blk_queue_enter() is called by the request allocation
>> +	 * code before pm_request_resume(), if no requests have a tag assigned
>> +	 * it is safe to suspend the device.
>> +	 */
>> +	ret = -EBUSY;
>> +	if (blk_requests_in_flight(q) == 0) {
>> +		/*
>> +		 * Call synchronize_rcu() such that later blk_queue_enter()
>> +		 * calls see the pm-only state. See also
>> +		 * http://lwn.net/Articles/573497/.
>> +		 */
>> +		synchronize_rcu();
>> +		if (blk_requests_in_flight(q) == 0)
>
> Seems not safe here.
>
> For the blk-mq path: someone may have escaped the preempt checking,
> missed blk_pm_request_resume() there, entered generic_make_request(),
> but not yet allocated a request or occupied any tag.
>
> There could be a similar scenario for the blk-legacy path, where
> q->nr_pending is increased when the request is queued.
>
> So I guess the q_usage_counter checking is still needed here.

There is only one blk_pm_request_resume() call and that call is inside
blk_queue_enter() after the pm_only counter has been checked.

For the legacy block layer, nr_pending is increased after the
blk_queue_enter() call from inside blk_old_get_request() succeeded.

So I don't see how blk_pm_request_resume() or q->nr_pending++ could escape
from the preempt checking?

Thanks,

Bart.
Re: [PATCH v7 5/6] block: Make blk_get_request() block for non-PM requests while suspended
Hi Bart

On 09/18/2018 04:15 AM, Bart Van Assche wrote:
> Instead of allowing requests that are not power management requests
> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> the blk_get_request() caller block. This change fixes a starvation
> issue: it is now guaranteed that power management requests will be
> executed no matter how many blk_get_request() callers are waiting.
> Instead of maintaining the q->nr_pending counter, rely on
> q->q_usage_counter.

Looks like we still depend on this nr_pending for blk-legacy.

> Call pm_runtime_mark_last_busy() every time a
> request finishes instead of only if the queue depth drops to zero.

...

> diff --git a/block/blk-pm.c b/block/blk-pm.c
> index 9b636960d285..5f21bedcb307 100644
> --- a/block/blk-pm.c
> +++ b/block/blk-pm.c
> @@ -1,8 +1,11 @@
>  // SPDX-License-Identifier: GPL-2.0
>
> +#include
>  #include
>  #include
>  #include
> +#include "blk-mq.h"
> +#include "blk-mq-tag.h"
>
> /**
>  * blk_pm_runtime_init - Block layer runtime PM initialization routine
> @@ -40,6 +43,36 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  }
>  EXPORT_SYMBOL(blk_pm_runtime_init);
>
> +struct in_flight_data {
> +	struct request_queue	*q;
> +	int			in_flight;
> +};
> +
> +static void blk_count_in_flight(struct blk_mq_hw_ctx *hctx, struct request *rq,
> +				void *priv, bool reserved)
> +{
> +	struct in_flight_data *in_flight = priv;
> +
> +	if (rq->q == in_flight->q)
> +		in_flight->in_flight++;
> +}
> +
> +/*
> + * Count the number of requests that are in flight for request queue @q. Use
> + * @q->nr_pending for legacy queues. Iterate over the tag set for blk-mq
> + * queues. Use blk_mq_queue_tag_busy_iter() instead of
> + * blk_mq_tagset_busy_iter() because the latter only considers requests that
> + * have already been started.
> + */
> +static int blk_requests_in_flight(struct request_queue *q)
> +{
> +	struct in_flight_data in_flight = { .q = q };
> +
> +	if (q->mq_ops)
> +		blk_mq_queue_tag_busy_iter(q, blk_count_in_flight, &in_flight);
> +	return q->nr_pending + in_flight.in_flight;
> +}

blk_mq_queue_tag_busy_iter only accounts the driver tags. This will only
work w/o an io scheduler.

> +
> /**
>  * blk_pre_runtime_suspend - Pre runtime suspend check
>  * @q: the queue of the device
> @@ -68,14 +101,38 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>  	if (!q->dev)
>  		return ret;
>
> +	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> +
> +	blk_set_pm_only(q);
> +	/*
> +	 * This function only gets called if the most recent
> +	 * pm_request_resume() call occurred at least autosuspend_delay_ms
	    ^^^
pm_runtime_mark_last_busy ?

> +	 * ago. Since blk_queue_enter() is called by the request allocation
> +	 * code before pm_request_resume(), if no requests have a tag assigned
> +	 * it is safe to suspend the device.
> +	 */
> +	ret = -EBUSY;
> +	if (blk_requests_in_flight(q) == 0) {
> +		/*
> +		 * Call synchronize_rcu() such that later blk_queue_enter()
> +		 * calls see the pm-only state. See also
> +		 * http://lwn.net/Articles/573497/.
> +		 */
> +		synchronize_rcu();
> +		if (blk_requests_in_flight(q) == 0)

Seems not safe here.

For the blk-mq path: someone may have escaped the preempt checking,
missed blk_pm_request_resume() there, entered generic_make_request(),
but not yet allocated a request or occupied any tag.

There could be a similar scenario for the blk-legacy path, where
q->nr_pending is increased when the request is queued.

So I guess the q_usage_counter checking is still needed here.

> +			ret = 0;
> +	}
> +
>  	spin_lock_irq(q->queue_lock);
> -	if (q->nr_pending) {
> -		ret = -EBUSY;
> +	if (ret < 0)
>  		pm_runtime_mark_last_busy(q->dev);
> -	} else {
> +	else
>  		q->rpm_status = RPM_SUSPENDING;
> -	}
>  	spin_unlock_irq(q->queue_lock);
> +
> +	if (ret)
> +		blk_clear_pm_only(q);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL(blk_pre_runtime_suspend);
>
> @@ -106,6 +163,9 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
>  		pm_runtime_mark_last_busy(q->dev);
>  	}
>  	spin_unlock_irq(q->queue_lock);
> +
> +	if (err)
> +		blk_clear_pm_only(q);
>  }
>  EXPORT_SYMBOL(blk_post_runtime_suspend);
>
> @@ -153,13 +213,15 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
>  	spin_lock_irq(q->queue_lock);
[PATCH v7 5/6] block: Make blk_get_request() block for non-PM requests while suspended
Instead of allowing requests that are not power management requests
to enter the queue in runtime suspended status (RPM_SUSPENDED), make
the blk_get_request() caller block. This change fixes a starvation
issue: it is now guaranteed that power management requests will be
executed no matter how many blk_get_request() callers are waiting.
Instead of maintaining the q->nr_pending counter, rely on
q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
request finishes instead of only if the queue depth drops to zero.

Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Jianchao Wang
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Alan Stern
---
 block/blk-core.c | 37 ++---
 block/blk-pm.c   | 72
 2 files changed, 75 insertions(+), 34 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 18b874d5c9c9..ae092ca121d5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2747,30 +2747,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }

-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2816,11 +2792,14 @@ static struct request *elv_next_request(struct request_queue *q)

 	while (1) {
 		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
+#ifdef CONFIG_PM
+			/*
+			 * If a request gets queued in state RPM_SUSPENDED
+			 * then that's a kernel bug.
+			 */
+			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
+#endif
+			return rq;
 		}

 		/*
diff --git a/block/blk-pm.c b/block/blk-pm.c
index 9b636960d285..5f21bedcb307 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -1,8 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0

+#include
 #include
 #include
 #include
+#include "blk-mq.h"
+#include "blk-mq-tag.h"

 /**
  * blk_pm_runtime_init - Block layer runtime PM initialization routine
@@ -40,6 +43,36 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);

+struct in_flight_data {
+	struct request_queue	*q;
+	int			in_flight;
+};
+
+static void blk_count_in_flight(struct blk_mq_hw_ctx *hctx, struct request *rq,
+				void *priv, bool reserved)
+{
+	struct in_flight_data *in_flight = priv;
+
+	if (rq->q == in_flight->q)
+		in_flight->in_flight++;
+}
+
+/*
+ * Count the number of requests that are in flight for request queue @q. Use
+ * @q->nr_pending for legacy queues. Iterate over the tag set for blk-mq
+ * queues. Use blk_mq_queue_tag_busy_iter() instead of
+ * blk_mq_tagset_busy_iter() because the latter only considers requests that
+ * have already been started.
+ */
+static int blk_requests_in_flight(struct request_queue *q)
+{
+	struct in_flight_data in_flight = { .q = q };
+
+	if (q->mq_ops)
+		blk_mq_queue_tag_busy_iter(q, blk_count_in_flight, &in_flight);
+	return q->nr_pending + in_flight.in_flight;
+}
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
@@ -68,14 +101,38 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;

+	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
+
+	blk_set_pm_only(q);
+	/*
+	 * This function only gets called if the most recent
+	 * pm_request_resume() call occurred at least autosuspend_delay_ms
+	 * ago. Since blk_queue_enter() is called by the request allocation
+	 * code before pm_request_resume(), if no requests have a tag assigned
+	 * it is safe to suspend the device.
+	 */
+	ret = -EBUSY;
+	if (blk_requests_in_flight(q) == 0) {
+		/*
+		 * Call synchronize_rcu() such that later blk_queue_enter()
+		 * calls see the pm-only state. See also
+		 * http://lwn.net/Articles/573497/.
+		 */
+		synchronize_rcu();
+		if (blk_requests_in_flight(q) == 0)
+			ret = 0;