[PATCH] nbd: set sk->sk_sndtimeo for our sockets

2017-06-08 Thread josef
From: Josef Bacik 

If the nbd server stops receiving packets altogether, we will block
indefinitely waiting for our sends to complete because the tcp buffer
never empties, which looks like a deadlock.  Fix this by setting the
socket send timeout to our configured timeout; that way, if the server
really misbehaves, we'll disconnect cleanly instead of waiting forever.
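
For reference, the user-space analogue of this kernel-side change is
bounding how long a send may block via SO_SNDTIMEO; the sketch below is
only illustrative (the descriptor and timeout value are hypothetical,
not part of this patch):

#include <sys/socket.h>
#include <sys/time.h>

/* Bound how long send() may block on fd; a zero timeout means "wait
 * forever", which is exactly the behavior the patch avoids in-kernel. */
static int set_send_timeout(int fd, long seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}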

Reported-by: Dan Melnic 
Signed-off-by: Josef Bacik 
---
 drivers/block/nbd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index f3f191b..ac0f7a8 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -910,6 +910,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
continue;
}
sk_set_memalloc(sock->sk);
+   sock->sk->sk_sndtimeo = nbd->tag_set.timeout;
	atomic_inc(&config->recv_threads);
	refcount_inc(&nbd->config_refs);
old = nsock->sock;
@@ -1071,6 +1072,7 @@ static int nbd_start_device(struct nbd_device *nbd)
return -ENOMEM;
}
sk_set_memalloc(config->socks[i]->sock->sk);
+   config->socks[i]->sock->sk->sk_sndtimeo = nbd->tag_set.timeout;
	atomic_inc(&config->recv_threads);
	refcount_inc(&nbd->config_refs);
	INIT_WORK(&args->work, recv_work);
-- 
2.7.4



[PATCH v3 00/12] More patches for kernel v4.13

2017-06-08 Thread Bart Van Assche
Hello Jens,

This patch series contains one patch that reduces the size of struct
blk_mq_hw_ctx, a few patches that simplify some of the block layer code,
and several patches that improve the block layer documentation. Please
consider these patches for kernel v4.13.

The basis for these patches is a merge of your for-next and for-linus
branches. Please note that the first patch in this series applies fine on top
of the merge of these two branches but not on top of your for-next branch.

Thanks,

Bart.

Changes between v2 and v3:
* Added patch "blk-mq: Reduce blk_mq_hw_ctx size".
* Removed patch "block: Rename blk_mq_rq_{to,from}_pdu()".
* Addressed Christoph's review comments about patch "block: Introduce
  request_queue.initialize_rq_fn()".
* Rebased (and retested) this patch series on top of a merge of Jens'
  for-next and for-linus branches.

Changes between v1 and v2:
* Addressed Christoph's comment about moving the .initialize_rq_fn() call
  from blk_rq_init() / blk_mq_rq_ctx_init() into blk_get_request().
* Left out patch "scsi: Make scsi_ioctl_reset() pass the request queue pointer
  to blk_rq_init()" since it's no longer needed.
* Restored the scsi_req_init() call in ide_prep_sense().
* Combined the two patches that reduce the blk_mq_hw_ctx size into a single
  patch.
* Modified patch "blk-mq: Initialize a request before assigning a tag" such
  that .tag and .internal_tag are no longer initialized twice.
* Removed WARN_ON_ONCE(q->mq_ops) from blk_queue_bypass_end() because this
  function is used by both blk-sq and blk-mq.
* Added several new patches, e.g. "block: Rename blk_mq_rq_{to,from}_pdu()".

Bart Van Assche (12):
  blk-mq: Reduce blk_mq_hw_ctx size
  block: Make request operation type argument declarations consistent
  block: Introduce request_queue.initialize_rq_fn()
  block: Make most scsi_req_init() calls implicit
  block: Change argument type of scsi_req_init()
  blk-mq: Initialize a request before assigning a tag
  block: Add a comment above queue_lockdep_assert_held()
  block: Check locking assumptions at runtime
  block: Document what queue type each function is intended for
  blk-mq: Document locking assumptions
  block: Constify disk_type
  blk-mq: Warn when attempting to run a hardware queue that is not
mapped

 block/blk-core.c   | 130 -
 block/blk-flush.c  |   8 ++-
 block/blk-merge.c  |   3 +
 block/blk-mq-sched.c   |   2 +
 block/blk-mq.c |  60 +++--
 block/blk-tag.c|  15 ++---
 block/blk-timeout.c|   4 +-
 block/blk.h|   2 +
 block/bsg.c|   1 -
 block/genhd.c  |   4 +-
 block/scsi_ioctl.c |  13 ++--
 drivers/block/pktcdvd.c|   1 -
 drivers/cdrom/cdrom.c  |   1 -
 drivers/ide/ide-atapi.c|   3 +-
 drivers/ide/ide-cd.c   |   1 -
 drivers/ide/ide-cd_ioctl.c |   1 -
 drivers/ide/ide-devsets.c  |   1 -
 drivers/ide/ide-disk.c |   1 -
 drivers/ide/ide-ioctls.c   |   2 -
 drivers/ide/ide-park.c |   2 -
 drivers/ide/ide-pm.c   |   2 -
 drivers/ide/ide-probe.c|   6 +-
 drivers/ide/ide-tape.c |   1 -
 drivers/ide/ide-taskfile.c |   1 -
 drivers/scsi/osd/osd_initiator.c   |   2 -
 drivers/scsi/osst.c|   1 -
 drivers/scsi/scsi_error.c  |   1 -
 drivers/scsi/scsi_lib.c|  12 +++-
 drivers/scsi/scsi_transport_sas.c  |   6 ++
 drivers/scsi/sg.c  |   2 -
 drivers/scsi/st.c  |   1 -
 drivers/target/target_core_pscsi.c |   2 -
 fs/nfsd/blocklayout.c  |   1 -
 include/linux/blk-mq.h |  13 ++--
 include/linux/blkdev.h |  14 +++-
 include/scsi/scsi_request.h|   2 +-
 36 files changed, 213 insertions(+), 109 deletions(-)

-- 
2.12.2



[PATCH v3 02/12] block: Make request operation type argument declarations consistent

2017-06-08 Thread Bart Van Assche
Instead of declaring the second argument of blk_*_get_request()
as int and passing it to functions that expect an unsigned int,
declare that second argument as unsigned int. Also, for consistency,
rename that second argument from 'rw' to 'op'.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-core.c   | 13 +++--
 block/blk-mq.c | 10 +-
 include/linux/blk-mq.h |  6 +++---
 include/linux/blkdev.h |  3 ++-
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a7421b772d0e..3bc431a77309 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1283,8 +1283,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
goto retry;
 }
 
-static struct request *blk_old_get_request(struct request_queue *q, int rw,
-   gfp_t gfp_mask)
+static struct request *blk_old_get_request(struct request_queue *q,
+  unsigned int op, gfp_t gfp_mask)
 {
struct request *rq;
 
@@ -1292,7 +1292,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
create_io_context(gfp_mask, q->node);
 
spin_lock_irq(q->queue_lock);
-   rq = get_request(q, rw, NULL, gfp_mask);
+   rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
return rq;
@@ -1305,14 +1305,15 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
+struct request *blk_get_request(struct request_queue *q, unsigned int op,
+   gfp_t gfp_mask)
 {
if (q->mq_ops)
-   return blk_mq_alloc_request(q, rw,
+   return blk_mq_alloc_request(q, op,
(gfp_mask & __GFP_DIRECT_RECLAIM) ?
0 : BLK_MQ_REQ_NOWAIT);
else
-   return blk_old_get_request(q, rw, gfp_mask);
+   return blk_old_get_request(q, op, gfp_mask);
 }
 EXPORT_SYMBOL(blk_get_request);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ef64a3ea4e83..8cd5261ca1ab 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -278,7 +278,7 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
 }
 EXPORT_SYMBOL_GPL(__blk_mq_alloc_request);
 
-struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
+struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
unsigned int flags)
 {
struct blk_mq_alloc_data alloc_data = { .flags = flags };
@@ -289,7 +289,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
if (ret)
return ERR_PTR(ret);
 
-	rq = blk_mq_sched_get_request(q, NULL, rw, &alloc_data);
+	rq = blk_mq_sched_get_request(q, NULL, op, &alloc_data);
 
blk_mq_put_ctx(alloc_data.ctx);
blk_queue_exit(q);
@@ -304,8 +304,8 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
 }
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
-struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
-   unsigned int flags, unsigned int hctx_idx)
+struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
+   unsigned int op, unsigned int flags, unsigned int hctx_idx)
 {
struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq;
@@ -340,7 +340,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
cpu = cpumask_first(alloc_data.hctx->cpumask);
alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
 
-	rq = blk_mq_sched_get_request(q, NULL, rw, &alloc_data);
+	rq = blk_mq_sched_get_request(q, NULL, op, &alloc_data);
 
blk_queue_exit(q);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index c534ec64e214..a4759fd34e7e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -205,10 +205,10 @@ enum {
BLK_MQ_REQ_INTERNAL = (1 << 2), /* allocate internal/sched tag */
 };
 
-struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
+struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
unsigned int flags);
-struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int op,
-   unsigned int flags, unsigned int hctx_idx);
+struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
+   unsigned int op, unsigned int flags, unsigned int hctx_idx);
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
 
 enum {
diff --git a/include/linux/blkdev.h 

[PATCH v3 09/12] block: Document what queue type each function is intended for

2017-06-08 Thread Bart Van Assche
Some functions in block/blk-core.c must only be used on blk-sq queues
while others are safe to use against any queue type. Document which
functions are intended for blk-sq queues and issue a warning if the
blk-sq API is misused. This not only helps block driver authors but
will also make it easier to remove the blk-sq code once that code is
declared obsolete.
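
The pattern the patch applies is uniform across the touched functions;
a minimal sketch of it (with an illustrative function name, not one
taken from the patch) looks like this:

/* Sketch: a helper that is only valid for legacy (blk-sq) queues warns
 * once, rather than silently misbehaving, if handed a blk-mq queue. */
void some_blk_sq_only_helper(struct request_queue *q)
{
	WARN_ON_ONCE(q->mq_ops);	/* blk-mq queues must not end up here */

	/* ... blk-sq specific work ... */
}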

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-core.c | 33 +
 block/blk.h  |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6bbdce8b8b6f..ab4cb509c170 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -182,6 +182,7 @@ static void blk_delay_work(struct work_struct *work)
 void blk_delay_queue(struct request_queue *q, unsigned long msecs)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
if (likely(!blk_queue_dead(q)))
	queue_delayed_work(kblockd_workqueue, &q->delay_work,
@@ -201,6 +202,7 @@ EXPORT_SYMBOL(blk_delay_queue);
 void blk_start_queue_async(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
blk_run_queue_async(q);
@@ -220,6 +222,7 @@ void blk_start_queue(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
WARN_ON(!irqs_disabled());
+   WARN_ON_ONCE(q->mq_ops);
 
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
__blk_run_queue(q);
@@ -243,6 +246,7 @@ EXPORT_SYMBOL(blk_start_queue);
 void blk_stop_queue(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
	cancel_delayed_work(&q->delay_work);
queue_flag_set(QUEUE_FLAG_STOPPED, q);
@@ -297,6 +301,7 @@ EXPORT_SYMBOL(blk_sync_queue);
 inline void __blk_run_queue_uncond(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
if (unlikely(blk_queue_dead(q)))
return;
@@ -324,6 +329,7 @@ EXPORT_SYMBOL_GPL(__blk_run_queue_uncond);
 void __blk_run_queue(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
if (unlikely(blk_queue_stopped(q)))
return;
@@ -348,6 +354,7 @@ EXPORT_SYMBOL(__blk_run_queue);
 void blk_run_queue_async(struct request_queue *q)
 {
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
	mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
@@ -366,6 +373,8 @@ void blk_run_queue(struct request_queue *q)
 {
unsigned long flags;
 
+   WARN_ON_ONCE(q->mq_ops);
+
spin_lock_irqsave(q->queue_lock, flags);
__blk_run_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
@@ -394,6 +403,7 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
int i;
 
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
while (true) {
bool drain = false;
@@ -472,6 +482,8 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
  */
 void blk_queue_bypass_start(struct request_queue *q)
 {
+   WARN_ON_ONCE(q->mq_ops);
+
spin_lock_irq(q->queue_lock);
q->bypass_depth++;
queue_flag_set(QUEUE_FLAG_BYPASS, q);
@@ -498,6 +510,9 @@ EXPORT_SYMBOL_GPL(blk_queue_bypass_start);
  * @q: queue of interest
  *
  * Leave bypass mode and restore the normal queueing behavior.
+ *
+ * Note: although blk_queue_bypass_start() is only called for blk-sq queues,
+ * this function is called for both blk-sq and blk-mq queues.
  */
 void blk_queue_bypass_end(struct request_queue *q)
 {
@@ -895,6 +910,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio);
 
 int blk_init_allocated_queue(struct request_queue *q)
 {
+   WARN_ON_ONCE(q->mq_ops);
+
q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, q->cmd_size);
if (!q->fq)
return -ENOMEM;
@@ -1032,6 +1049,8 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
struct request_list *rl;
int on_thresh, off_thresh;
 
+   WARN_ON_ONCE(q->mq_ops);
+
spin_lock_irq(q->queue_lock);
q->nr_requests = nr;
blk_queue_congestion_threshold(q);
@@ -1270,6 +1289,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
struct request *rq;
 
lockdep_assert_held(q->queue_lock);
+   WARN_ON_ONCE(q->mq_ops);
 
rl = blk_get_rl(q, bio);/* transferred to @rq on success */
 retry:
@@ -1309,6 +1329,8 @@ static struct request *blk_old_get_request(struct request_queue *q,
 {
struct 

[PATCH v3 08/12] block: Check locking assumptions at runtime

2017-06-08 Thread Bart Van Assche
Instead of documenting the locking assumptions of most block layer
functions as a comment, use lockdep_assert_held() to verify locking
assumptions at runtime.
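
In other words, a requirement like "Queue lock must be held" moves from
a comment into an assertion lockdep can actually check; a minimal
sketch of the conversion (illustrative function name):

/* The locking requirement used to live only in a comment; now lockdep
 * verifies it on every call when lock debugging is enabled, and the
 * check compiles away otherwise. */
void example_queue_op(struct request_queue *q)
{
	lockdep_assert_held(q->queue_lock);

	/* ... code that relies on q->queue_lock being held ... */
}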

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-core.c| 71 +++--
 block/blk-flush.c   |  8 +++---
 block/blk-merge.c   |  3 +++
 block/blk-tag.c | 15 +--
 block/blk-timeout.c |  4 ++-
 5 files changed, 64 insertions(+), 37 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 86fc08898fac..6bbdce8b8b6f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -177,10 +177,12 @@ static void blk_delay_work(struct work_struct *work)
  * Description:
  *   Sometimes queueing needs to be postponed for a little while, to allow
  *   resources to come back. This function will make sure that queueing is
- *   restarted around the specified time. Queue lock must be held.
+ *   restarted around the specified time.
  */
 void blk_delay_queue(struct request_queue *q, unsigned long msecs)
 {
+   lockdep_assert_held(q->queue_lock);
+
if (likely(!blk_queue_dead(q)))
	queue_delayed_work(kblockd_workqueue, &q->delay_work,
   msecs_to_jiffies(msecs));
@@ -198,6 +200,8 @@ EXPORT_SYMBOL(blk_delay_queue);
  **/
 void blk_start_queue_async(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
+
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
blk_run_queue_async(q);
 }
@@ -210,10 +214,11 @@ EXPORT_SYMBOL(blk_start_queue_async);
  * Description:
  *   blk_start_queue() will clear the stop flag on the queue, and call
  *   the request_fn for the queue if it was in a stopped state when
- *   entered. Also see blk_stop_queue(). Queue lock must be held.
+ *   entered. Also see blk_stop_queue().
  **/
 void blk_start_queue(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
WARN_ON(!irqs_disabled());
 
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
@@ -233,10 +238,12 @@ EXPORT_SYMBOL(blk_start_queue);
  *   or if it simply chooses not to queue more I/O at one point, it can
  *   call this function to prevent the request_fn from being called until
  *   the driver has signalled it's ready to go again. This happens by calling
- *   blk_start_queue() to restart queue operations. Queue lock must be held.
+ *   blk_start_queue() to restart queue operations.
  **/
 void blk_stop_queue(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
+
	cancel_delayed_work(&q->delay_work);
queue_flag_set(QUEUE_FLAG_STOPPED, q);
 }
@@ -289,6 +296,8 @@ EXPORT_SYMBOL(blk_sync_queue);
  */
 inline void __blk_run_queue_uncond(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
+
if (unlikely(blk_queue_dead(q)))
return;
 
@@ -310,11 +319,12 @@ EXPORT_SYMBOL_GPL(__blk_run_queue_uncond);
  * @q: The queue to run
  *
  * Description:
- *See @blk_run_queue. This variant must be called with the queue lock
- *held and interrupts disabled.
+ *See @blk_run_queue.
  */
 void __blk_run_queue(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
+
if (unlikely(blk_queue_stopped(q)))
return;
 
@@ -328,10 +338,17 @@ EXPORT_SYMBOL(__blk_run_queue);
  *
  * Description:
  *Tells kblockd to perform the equivalent of @blk_run_queue on behalf
- *of us. The caller must hold the queue lock.
+ *of us.
+ *
+ * Note:
+ *Since it is not allowed to run q->delay_work after blk_cleanup_queue()
+ *has canceled q->delay_work, callers must hold the queue lock to avoid
+ *race conditions between blk_cleanup_queue() and blk_run_queue_async().
  */
 void blk_run_queue_async(struct request_queue *q)
 {
+   lockdep_assert_held(q->queue_lock);
+
if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
	mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
 }
@@ -1077,6 +1094,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
int may_queue;
req_flags_t rq_flags = RQF_ALLOCED;
 
+   lockdep_assert_held(q->queue_lock);
+
if (unlikely(blk_queue_dying(q)))
return ERR_PTR(-ENODEV);
 
@@ -1250,6 +1269,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
struct request_list *rl;
struct request *rq;
 
+   lockdep_assert_held(q->queue_lock);
+
rl = blk_get_rl(q, bio);/* transferred to @rq on success */
 retry:
rq = __get_request(rl, op, bio, gfp_mask);
@@ -1342,6 +1363,8 @@ EXPORT_SYMBOL(blk_get_request);
  */
 void blk_requeue_request(struct request_queue *q, struct request *rq)
 {
+   lockdep_assert_held(q->queue_lock);
+
blk_delete_timer(rq);
 

[PATCH v3 01/12] blk-mq: Reduce blk_mq_hw_ctx size

2017-06-08 Thread Bart Van Assche
Since the srcu structure is rather large (184 bytes on an x86-64
system with kernel debugging disabled), only allocate it if needed.
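
The mechanism is the classic trailing-member trick: put the optional
member last and add its size only when the queue needs it. A generic,
self-contained sketch of the same pattern (stand-in types, not the
kernel's):

#include <stdlib.h>

struct big_opt { char state[184]; };	/* stand-in for srcu_struct */

struct ctx {
	unsigned long flags;
#define CTX_F_BLOCKING 1UL
	/* Must be the last member: storage for it exists only when
	 * CTX_F_BLOCKING was set at allocation time. */
	struct big_opt opt[0];
};

static struct ctx *ctx_alloc(unsigned long flags)
{
	size_t size = sizeof(struct ctx);

	if (flags & CTX_F_BLOCKING)
		size += sizeof(struct big_opt);	/* mirrors blk_mq_hw_ctx_size() */

	struct ctx *c = calloc(1, size);
	if (c)
		c->flags = flags;
	return c;
}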

Reported-by: Ming Lei 
Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-mq.c | 30 ++
 include/linux/blk-mq.h |  5 +++--
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e8b1bc87274..ef64a3ea4e83 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -172,7 +172,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->flags & BLK_MQ_F_BLOCKING)
-	synchronize_srcu(&hctx->queue_rq_srcu);
+   synchronize_srcu(hctx->queue_rq_srcu);
else
rcu = true;
}
@@ -1056,9 +1056,9 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
} else {
might_sleep();
 
-	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+   srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
blk_mq_sched_dispatch_requests(hctx);
-	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+   srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
}
 }
 
@@ -1460,9 +1460,9 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
might_sleep();
 
-	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+   srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
__blk_mq_try_issue_directly(hctx, rq, cookie, true);
-	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+   srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
}
 }
 
@@ -1806,7 +1806,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
set->ops->exit_hctx(hctx, hctx_idx);
 
if (hctx->flags & BLK_MQ_F_BLOCKING)
-	cleanup_srcu_struct(&hctx->queue_rq_srcu);
+   cleanup_srcu_struct(hctx->queue_rq_srcu);
 
blk_mq_remove_cpuhp(hctx);
blk_free_flush_queue(hctx->fq);
@@ -1879,7 +1879,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
goto free_fq;
 
if (hctx->flags & BLK_MQ_F_BLOCKING)
-	init_srcu_struct(&hctx->queue_rq_srcu);
+   init_srcu_struct(hctx->queue_rq_srcu);
 
blk_mq_debugfs_register_hctx(q, hctx);
 
@@ -2154,6 +2154,20 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_init_queue);
 
+static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set)
+{
+   int hw_ctx_size = sizeof(struct blk_mq_hw_ctx);
+
+   BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, queue_rq_srcu),
+  __alignof__(struct blk_mq_hw_ctx)) !=
+sizeof(struct blk_mq_hw_ctx));
+
+   if (tag_set->flags & BLK_MQ_F_BLOCKING)
+   hw_ctx_size += sizeof(struct srcu_struct);
+
+   return hw_ctx_size;
+}
+
 static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
struct request_queue *q)
 {
@@ -2168,7 +2182,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
continue;
 
node = blk_mq_hw_queue_to_node(q->mq_map, i);
-   hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
+   hctxs[i] = kzalloc_node(blk_mq_hw_ctx_size(set),
GFP_KERNEL, node);
if (!hctxs[i])
break;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fcd641032f8d..c534ec64e214 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -39,8 +39,6 @@ struct blk_mq_hw_ctx {
struct blk_mq_tags  *tags;
struct blk_mq_tags  *sched_tags;
 
-   struct srcu_struct  queue_rq_srcu;
-
unsigned long   queued;
unsigned long   run;
 #define BLK_MQ_MAX_DISPATCH_ORDER  7
@@ -62,6 +60,9 @@ struct blk_mq_hw_ctx {
struct dentry   *debugfs_dir;
struct dentry   *sched_debugfs_dir;
 #endif
+
+   /* Must be the last member - see also blk_mq_hw_ctx_size(). */
+   struct srcu_struct  queue_rq_srcu[0];
 };
 
 struct blk_mq_tag_set {
-- 
2.12.2



[PATCH v3 11/12] block: Constify disk_type

2017-06-08 Thread Bart Van Assche
The variable 'disk_type' is never modified so constify it.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/genhd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index d252d29fe837..7f520fa25d16 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -36,7 +36,7 @@ struct kobject *block_depr;
 static DEFINE_SPINLOCK(ext_devt_lock);
 static DEFINE_IDR(ext_devt_idr);
 
-static struct device_type disk_type;
+static const struct device_type disk_type;
 
 static void disk_check_events(struct disk_events *ev,
  unsigned int *clearing_ptr);
@@ -1183,7 +1183,7 @@ static char *block_devnode(struct device *dev, umode_t *mode,
return NULL;
 }
 
-static struct device_type disk_type = {
+static const struct device_type disk_type = {
.name   = "disk",
.groups = disk_attr_groups,
.release= disk_release,
-- 
2.12.2



[PATCH v3 04/12] block: Make most scsi_req_init() calls implicit

2017-06-08 Thread Bart Van Assche
Instead of explicitly calling scsi_req_init() after blk_get_request(),
call that function from inside blk_get_request(). Add an
.initialize_rq_fn() callback function to the block drivers that need
it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
because it is too small to keep as a separate function. Keep the
scsi_req_init() call in ide_prep_sense() because it follows a
blk_rq_init() call.
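
The shape of such a callback is simple; a hedged sketch for a
hypothetical driver (the 'mydrv' name is illustrative, the hook is the
request_queue.initialize_rq_fn() introduced earlier in this series):

/* Runs from inside blk_get_request(), so every request the driver
 * hands out is already initialized and explicit scsi_req_init()
 * calls after allocation become unnecessary. */
static void mydrv_initialize_rq(struct request *rq)
{
	scsi_req_init(rq);
}

/* Wired up once at queue setup time (sketch): */
/*	q->initialize_rq_fn = mydrv_initialize_rq; */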

References: commit 82ed4db499b8 ("block: split scsi_request out of struct 
request")
Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Nicholas Bellinger 
---
 block/bsg.c|  1 -
 block/scsi_ioctl.c |  3 ---
 drivers/block/pktcdvd.c|  1 -
 drivers/cdrom/cdrom.c  |  1 -
 drivers/ide/ide-atapi.c|  1 -
 drivers/ide/ide-cd.c   |  1 -
 drivers/ide/ide-cd_ioctl.c |  1 -
 drivers/ide/ide-devsets.c  |  1 -
 drivers/ide/ide-disk.c |  1 -
 drivers/ide/ide-ioctls.c   |  2 --
 drivers/ide/ide-park.c |  2 --
 drivers/ide/ide-pm.c   |  2 --
 drivers/ide/ide-probe.c|  6 +++---
 drivers/ide/ide-tape.c |  1 -
 drivers/ide/ide-taskfile.c |  1 -
 drivers/scsi/osd/osd_initiator.c   |  2 --
 drivers/scsi/osst.c|  1 -
 drivers/scsi/scsi_error.c  |  1 -
 drivers/scsi/scsi_lib.c| 10 +-
 drivers/scsi/scsi_transport_sas.c  |  6 ++
 drivers/scsi/sg.c  |  2 --
 drivers/scsi/st.c  |  1 -
 drivers/target/target_core_pscsi.c |  2 --
 fs/nfsd/blocklayout.c  |  1 -
 24 files changed, 18 insertions(+), 33 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 40db8ff4c618..84ec1b19d516 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -236,7 +236,6 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm)
rq = blk_get_request(q, op, GFP_KERNEL);
if (IS_ERR(rq))
return rq;
-   scsi_req_init(rq);
 
ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, has_write_perm);
if (ret)
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 4a294a5f7fab..f96c51f5df40 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -326,7 +326,6 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
if (IS_ERR(rq))
return PTR_ERR(rq);
req = scsi_req(rq);
-   scsi_req_init(rq);
 
if (hdr->cmd_len > BLK_MAX_CDB) {
req->cmd = kzalloc(hdr->cmd_len, GFP_KERNEL);
@@ -456,7 +455,6 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
goto error_free_buffer;
}
req = scsi_req(rq);
-   scsi_req_init(rq);
 
cmdlen = COMMAND_SIZE(opcode);
 
@@ -542,7 +540,6 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk,
rq = blk_get_request(q, REQ_OP_SCSI_OUT, __GFP_RECLAIM);
if (IS_ERR(rq))
return PTR_ERR(rq);
-   scsi_req_init(rq);
rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
scsi_req(rq)->cmd[0] = cmd;
scsi_req(rq)->cmd[4] = data;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 42e3c880a8a5..2ea332c9438a 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -707,7 +707,6 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
 REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
if (IS_ERR(rq))
return PTR_ERR(rq);
-   scsi_req_init(rq);
 
if (cgc->buflen) {
ret = blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen,
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index ff19cfc587f0..e36d160c458f 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2201,7 +2201,6 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
break;
}
req = scsi_req(rq);
-   scsi_req_init(rq);
 
ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
if (ret) {
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 5901937284e7..98e78b520417 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -93,7 +93,6 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk *disk,
int error;
 
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
-   scsi_req_init(rq);
ide_req(rq)->type = ATA_PRIV_MISC;
rq->special = (char *)pc;
 
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 07e5ff3a64c3..a14ccb34c923 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -438,7 +438,6 @@ int 

[PATCH v3 10/12] blk-mq: Document locking assumptions

2017-06-08 Thread Bart Van Assche
Document the locking assumptions in functions that modify
blk_mq_ctx.rq_list to make it easier for humans to verify
this code.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-mq-sched.c | 2 ++
 block/blk-mq.c   | 4 
 2 files changed, 6 insertions(+)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c4e2afb9d12d..88aa460b2e8a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -232,6 +232,8 @@ static bool blk_mq_attempt_merge(struct request_queue *q,
struct request *rq;
int checked = 8;
 
+	lockdep_assert_held(&ctx->lock);
+
	list_for_each_entry_reverse(rq, &ctx->rq_list, queuelist) {
bool merged = false;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5d9cca62c2f0..0f8c011eff97 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1274,6 +1274,8 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
 {
struct blk_mq_ctx *ctx = rq->mq_ctx;
 
+	lockdep_assert_held(&ctx->lock);
+
trace_block_rq_insert(hctx->queue, rq);
 
if (at_head)
@@ -1287,6 +1289,8 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 {
struct blk_mq_ctx *ctx = rq->mq_ctx;
 
+	lockdep_assert_held(&ctx->lock);
+
__blk_mq_insert_req_list(hctx, rq, at_head);
blk_mq_hctx_mark_pending(hctx, ctx);
 }
-- 
2.12.2



[PATCH v3 12/12] blk-mq: Warn when attempting to run a hardware queue that is not mapped

2017-06-08 Thread Bart Van Assche
A queue must be frozen while the mapped state of a hardware queue
is changed. Additionally, any change of the mapped state is
followed by a call to blk_mq_map_swqueue() (see also
blk_mq_init_allocated_queue() and blk_mq_update_nr_hw_queues()).
Since blk_mq_map_swqueue() does not map any unmapped hardware
queue onto any software queue, no attempt will be made to run
an unmapped hardware queue. Hence issue a warning upon attempts
to run an unmapped hardware queue.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-mq.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0f8c011eff97..689026d3c4bd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1091,8 +1091,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
unsigned long msecs)
 {
-	if (unlikely(blk_mq_hctx_stopped(hctx) ||
-		     !blk_mq_hw_queue_mapped(hctx)))
+   WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx));
+
+   if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
 
if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
@@ -1252,7 +1253,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-   if (unlikely(!blk_mq_hw_queue_mapped(hctx)))
+   if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
return;
 
/*
-- 
2.12.2



[PATCH v3 07/12] block: Add a comment above queue_lockdep_assert_held()

2017-06-08 Thread Bart Van Assche
Add a comment above the queue_lockdep_assert_held() macro that
explains the purpose of the q->queue_lock test.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 include/linux/blkdev.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cbc0028290e4..1e73b4df13a9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -634,6 +634,13 @@ struct request_queue {
 (1 << QUEUE_FLAG_SAME_COMP)|   \
 (1 << QUEUE_FLAG_POLL))
 
+/*
+ * @q->queue_lock is set while a queue is being initialized. Since we know
+ * that no other threads access the queue object before @q->queue_lock has
+ * been set, it is safe to manipulate queue flags without holding the
+ * queue_lock if @q->queue_lock == NULL. See also blk_alloc_queue_node() and
+ * blk_init_allocated_queue().
+ */
 static inline void queue_lockdep_assert_held(struct request_queue *q)
 {
if (q->queue_lock)
-- 
2.12.2



[PATCH v3 06/12] blk-mq: Initialize a request before assigning a tag

2017-06-08 Thread Bart Van Assche
Initialization of blk-mq requests is a bit weird: blk_mq_rq_ctx_init()
is called after a value has been assigned to .rq_flags, and .rq_flags
is initialized in __blk_mq_finish_request(). Call blk_mq_rq_ctx_init()
before modifying any struct request members. Initialize .rq_flags in
blk_mq_rq_ctx_init() instead of relying on __blk_mq_finish_request().
Moving the initialization of .rq_flags is fine because all changes
and tests of .rq_flags occur between blk_get_request() and finishing
a request.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Omar Sandoval 
Cc: Ming Lei 
---
 block/blk-mq.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8cd5261ca1ab..5d9cca62c2f0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -212,6 +212,7 @@ void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
rq->q = q;
rq->mq_ctx = ctx;
rq->cmd_flags = op;
+   rq->rq_flags = 0;
if (blk_queue_io_stat(q))
rq->rq_flags |= RQF_IO_STAT;
/* do not touch atomic flags, it needs atomic ops against the timer */
@@ -231,7 +232,7 @@ void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
rq->nr_integrity_segments = 0;
 #endif
rq->special = NULL;
-   /* tag was already set */
+   /* tag will be set by caller */
rq->extra_len = 0;
 
	INIT_LIST_HEAD(&rq->timeout_list);
@@ -257,12 +258,14 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
 
rq = tags->static_rqs[tag];
 
+   blk_mq_rq_ctx_init(data->q, data->ctx, rq, op);
+
if (data->flags & BLK_MQ_REQ_INTERNAL) {
rq->tag = -1;
rq->internal_tag = tag;
} else {
if (blk_mq_tag_busy(data->hctx)) {
-   rq->rq_flags = RQF_MQ_INFLIGHT;
+   rq->rq_flags |= RQF_MQ_INFLIGHT;
			atomic_inc(&data->hctx->nr_active);
}
rq->tag = tag;
@@ -270,7 +273,6 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
data->hctx->tags->rqs[rq->tag] = rq;
}
 
-   blk_mq_rq_ctx_init(data->q, data->ctx, rq, op);
return rq;
}
 
@@ -361,7 +363,6 @@ void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
		atomic_dec(&hctx->nr_active);
 
	wbt_done(q->rq_wb, &rq->issue_stat);
-   rq->rq_flags = 0;
 
	clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
-- 
2.12.2



[PATCH v3 05/12] block: Change argument type of scsi_req_init()

2017-06-08 Thread Bart Van Assche
Since scsi_req_init() works on a struct scsi_request, change the
argument type into struct scsi_request *.

Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Martin K. Petersen 
---
 block/scsi_ioctl.c| 10 +++---
 drivers/ide/ide-atapi.c   |  2 +-
 drivers/ide/ide-probe.c   |  2 +-
 drivers/scsi/scsi_lib.c   |  4 +++-
 drivers/scsi/scsi_transport_sas.c |  2 +-
 include/scsi/scsi_request.h   |  2 +-
 6 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index f96c51f5df40..7440de44dd85 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -741,10 +741,14 @@ int scsi_cmd_blk_ioctl(struct block_device *bd, fmode_t mode,
 }
 EXPORT_SYMBOL(scsi_cmd_blk_ioctl);
 
-void scsi_req_init(struct request *rq)
+/**
+ * scsi_req_init - initialize certain fields of a scsi_request structure
+ * @req: Pointer to a scsi_request structure.
+ * Initializes .__cmd[], .cmd, .cmd_len and .sense_len but no other members
+ * of struct scsi_request.
+ */
+void scsi_req_init(struct scsi_request *req)
 {
-   struct scsi_request *req = scsi_req(rq);
-
memset(req->__cmd, 0, sizeof(req->__cmd));
req->cmd = req->__cmd;
req->cmd_len = BLK_MAX_CDB;
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 98e78b520417..5ffecef8b910 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -199,7 +199,7 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
memset(sense, 0, sizeof(*sense));
 
blk_rq_init(rq->q, sense_rq);
-   scsi_req_init(sense_rq);
+   scsi_req_init(req);
 
err = blk_rq_map_kern(drive->queue, sense_rq, sense, sense_len,
  GFP_NOIO);
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index c60e5ffc9231..01b2adfd8226 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -745,7 +745,7 @@ static void ide_initialize_rq(struct request *rq)
 {
struct ide_request *req = blk_mq_rq_to_pdu(rq);
 
-   scsi_req_init(rq);
+	scsi_req_init(&req->sreq);
req->sreq.sense = req->sense;
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 378cf44a97fc..c5a43a7c960d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1136,7 +1136,9 @@ EXPORT_SYMBOL(scsi_init_io);
 /* Called from inside blk_get_request() */
 static void scsi_initialize_rq(struct request *rq)
 {
-   scsi_req_init(rq);
+   struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+
+	scsi_req_init(&cmd->req);
 }
 
 /* Called after a request has been started. */
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index f5449da6fcad..35598905d785 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -215,7 +215,7 @@ static void sas_host_release(struct device *dev)
 
 static void sas_initialize_rq(struct request *rq)
 {
-   scsi_req_init(rq);
+   scsi_req_init(scsi_req(rq));
 }
 
 static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
diff --git a/include/scsi/scsi_request.h b/include/scsi/scsi_request.h
index f0c76f9dc285..e0afa445ee4e 100644
--- a/include/scsi/scsi_request.h
+++ b/include/scsi/scsi_request.h
@@ -27,6 +27,6 @@ static inline void scsi_req_free_cmd(struct scsi_request *req)
kfree(req->cmd);
 }
 
-void scsi_req_init(struct request *);
+void scsi_req_init(struct scsi_request *req);
 
 #endif /* _SCSI_SCSI_REQUEST_H */
-- 
2.12.2



Re: dedicated error codes for the block layer V3

2017-06-08 Thread Mike Snitzer
On Thu, Jun 08 2017 at 11:42am -0400,
Jens Axboe  wrote:

> On 06/03/2017 01:37 AM, Christoph Hellwig wrote:
> > This series introduces a new blk_status_t error code type for the block
> > layer so that we can have tighter control and explicit semantics for
> > block layer errors.
> > 
> > All but the last three patches are cleanups that lead to the new type.
> > 
> > The series is mostly limited to the block layer and drivers, and touches
> > file systems a little bit.  The only major exception is btrfs, which
> > does funny things with bios and thus sees a larger amount of propagation
> > of the new blk_status_t.
> > 
> > A git tree is also available at:
> > 
> > git://git.infradead.org/users/hch/block.git block-errors
> > 
> > gitweb:
> > 
> > 
> > http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/block-errors
> > 
> > Note that the two biggest patches didn't make it to linux-block and
> > linux-btrfs last time.  If you didn't get them they are available in
> > the git tree above.  Unfortunately there is no easy way to split them
> > up.
> 
> Mike, can you take a look at the dm bits in this series? I'd like to get
> this queued up, but I'd also greatly prefer if the dm patches had sign
> off from your end.

Will do.  I'll have a look by the end of the week.


Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-06-08 Thread Jens Axboe
On 06/08/2017 09:30 AM, Paolo Valente wrote:
> 
>> On 5 Jun 2017, at 10:11, Paolo Valente wrote:
>>
>> In blk-cgroup, operations on blkg objects are protected with the
>> request_queue lock. This is no longer the lock that protects
>> I/O-scheduler operations in blk-mq. In fact, the latter are now
>> protected with a finer-grained per-scheduler-instance lock. As a
>> consequence, although blkg lookups are also rcu-protected, blk-mq I/O
>> schedulers may see inconsistent data when they access blkg and
>> blkg-related objects. BFQ does access these objects, and does incur
>> this problem, in the following case.
>>
>> The blkg_lookup performed in bfq_get_queue, being protected (only)
>> through rcu, may happen to return the address of a copy of the
>> original blkg. If this is the case, then the blkg_get performed in
>> bfq_get_queue, to pin down the blkg, is useless: it does not prevent
>> blk-cgroup code from destroying both the original blkg and all objects
>> directly or indirectly referred by the copy of the blkg. BFQ accesses
>> these objects, which typically causes a crash from a NULL-pointer
>> dereference or memory-protection violation.
>>
>> Some additional protection mechanism should be added to blk-cgroup to
>> address this issue. In the meantime, this commit provides a quick
>> temporary fix for BFQ: cache (when safe) blkg data that might
>> disappear right after a blkg_lookup.
>>
>> In particular, this commit exploits the following facts to achieve its
>> goal without introducing further locks.  Destroy operations on a blkg
>> invoke, as a first step, hooks of the scheduler associated with the
>> blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
>> consequence, for any blkg associated with the request queue an
>> instance of BFQ is attached to, we are guaranteed that such a blkg is
>> not destroyed, and that all the pointers it contains are consistent,
>> while that instance is holding its bfqd->lock. A blkg_lookup performed
>> with bfqd->lock held then returns a fully consistent blkg, which
>> remains consistent until this lock is held. In more detail, this holds
>> even if the returned blkg is a copy of the original one.
>>
>> Finally, also the object describing a group inside BFQ needs to be
>> protected from destruction on the blkg_free of the original blkg
>> (which invokes bfq_pd_free). This commit adds private refcounting for
>> this object, to let it disappear only after no bfq_queue refers to it
>> any longer.
>>
>> This commit also removes or updates some stale comments on locking
>> issues related to blk-cgroup operations.
>>
>> Reported-by: Tomas Konir 
>> Reported-by: Lee Tibbert 
>> Reported-by: Marco Piazza 
>> Signed-off-by: Paolo Valente 
>> Tested-by: Tomas Konir 
>> Tested-by: Lee Tibbert 
>> Tested-by: Marco Piazza 
> 
> Hi Jens,
> are you waiting for some further review/ack on this, or is it just in
> your queue of patches to check?  Sorry for bothering you, but this bug
> is causing problems for users.

I'll pull it in, it'll make the next -rc. I'll often let patches sit
for a few days even if I agree with them, to give others a chance to
either further review, comment, or disagree with them.

-- 
Jens Axboe



Re: dedicated error codes for the block layer V3

2017-06-08 Thread Jens Axboe
On 06/03/2017 01:37 AM, Christoph Hellwig wrote:
> This series introduces a new blk_status_t error code type for the block
> layer so that we can have tighter control and explicit semantics for
> block layer errors.
> 
> All but the last three patches are cleanups that lead to the new type.
> 
> The series is mostly limited to the block layer and drivers, and touches
> file systems a little bit.  The only major exception is btrfs, which
> does funny things with bios and thus sees a larger amount of propagation
> of the new blk_status_t.
> 
> A git tree is also available at:
> 
> git://git.infradead.org/users/hch/block.git block-errors
> 
> gitweb:
> 
> 
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/block-errors
> 
> Note that the two biggest patches didn't make it to linux-block and
> linux-btrfs last time.  If you didn't get them they are available in
> the git tree above.  Unfortunately there is no easy way to split them
> up.

Mike, can you take a look at the dm bits in this series? I'd like to get
this queued up, but I'd also greatly prefer if the dm patches had sign
off from your end.

-- 
Jens Axboe



Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-06-08 Thread Paolo Valente

> On 5 Jun 2017, at 10:11, Paolo Valente wrote:
> 
> In blk-cgroup, operations on blkg objects are protected with the
> request_queue lock. This is no longer the lock that protects
> I/O-scheduler operations in blk-mq. In fact, the latter are now
> protected with a finer-grained per-scheduler-instance lock. As a
> consequence, although blkg lookups are also rcu-protected, blk-mq I/O
> schedulers may see inconsistent data when they access blkg and
> blkg-related objects. BFQ does access these objects, and does incur
> this problem, in the following case.
> 
> The blkg_lookup performed in bfq_get_queue, being protected (only)
> through rcu, may happen to return the address of a copy of the
> original blkg. If this is the case, then the blkg_get performed in
> bfq_get_queue, to pin down the blkg, is useless: it does not prevent
> blk-cgroup code from destroying both the original blkg and all objects
> directly or indirectly referred by the copy of the blkg. BFQ accesses
> these objects, which typically causes a crash from a NULL-pointer
> dereference or memory-protection violation.
> 
> Some additional protection mechanism should be added to blk-cgroup to
> address this issue. In the meantime, this commit provides a quick
> temporary fix for BFQ: cache (when safe) blkg data that might
> disappear right after a blkg_lookup.
> 
> In particular, this commit exploits the following facts to achieve its
> goal without introducing further locks.  Destroy operations on a blkg
> invoke, as a first step, hooks of the scheduler associated with the
> blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
> consequence, for any blkg associated with the request queue an
> instance of BFQ is attached to, we are guaranteed that such a blkg is
> not destroyed, and that all the pointers it contains are consistent,
> while that instance is holding its bfqd->lock. A blkg_lookup performed
> with bfqd->lock held then returns a fully consistent blkg, which
> remains consistent for as long as this lock is held. In more detail, this holds
> even if the returned blkg is a copy of the original one.
> 
> Finally, also the object describing a group inside BFQ needs to be
> protected from destruction on the blkg_free of the original blkg
> (which invokes bfq_pd_free). This commit adds private refcounting for
> this object, to let it disappear only after no bfq_queue refers to it
> any longer.
> 
> This commit also removes or updates some stale comments on locking
> issues related to blk-cgroup operations.
> 
> Reported-by: Tomas Konir 
> Reported-by: Lee Tibbert 
> Reported-by: Marco Piazza 
> Signed-off-by: Paolo Valente 
> Tested-by: Tomas Konir 
> Tested-by: Lee Tibbert 
> Tested-by: Marco Piazza 

Hi Jens,
are you waiting for some further review/ack on this, or is it just in
your queue of patches to check?  Sorry for bothering you, but this bug
is causing problems for users.

Thanks,
Paolo

> ---
> block/bfq-cgroup.c  | 116 +---
> block/bfq-iosched.c |   2 +-
> block/bfq-iosched.h |  23 +--
> 3 files changed, 105 insertions(+), 36 deletions(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index c8a32fb..78b2e0d 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -52,7 +52,7 @@ BFQG_FLAG_FNS(idling)
> BFQG_FLAG_FNS(empty)
> #undef BFQG_FLAG_FNS
> 
> -/* This should be called with the queue_lock held. */
> +/* This should be called with the scheduler lock held. */
> static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
> {
>   unsigned long long now;
> @@ -67,7 +67,7 @@ static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
>   bfqg_stats_clear_waiting(stats);
> }
> 
> -/* This should be called with the queue_lock held. */
> +/* This should be called with the scheduler lock held. */
> static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
>struct bfq_group *curr_bfqg)
> {
> @@ -81,7 +81,7 @@ static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
>   bfqg_stats_mark_waiting(stats);
> }
> 
> -/* This should be called with the queue_lock held. */
> +/* This should be called with the scheduler lock held. */
> static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
> {
>   unsigned long long now;
> @@ -203,12 +203,30 @@ struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
> 
> static void bfqg_get(struct bfq_group *bfqg)
> {
> - return blkg_get(bfqg_to_blkg(bfqg));
> + bfqg->ref++;
> }
> 
> void bfqg_put(struct bfq_group *bfqg)
> {
> - return blkg_put(bfqg_to_blkg(bfqg));
> + bfqg->ref--;
> +
> + if (bfqg->ref == 0)
> + kfree(bfqg);
> +}
> +
> +static void 

Re: [PATCH blktests] loop/002: Regression testing for loop device flush

2017-06-08 Thread Jens Axboe
On 06/08/2017 06:28 AM, James Wang wrote:
> Add a regression test for the loop device: when an unbound device
> is closed, it takes too long; the kernel consumes several orders
> of magnitude more wall time than it does for a mounted device.

Thanks a lot for taking the time to turn this into a blktests regression
test!

-- 
Jens Axboe



Re: [PATCHv8 0/2] loop: enable different logical blocksizes

2017-06-08 Thread Jens Axboe
On 06/08/2017 05:46 AM, Hannes Reinecke wrote:
> Currently the loop driver just simulates 512-byte blocks. When
> creating bootable images for virtual machines it might be required
> to use a different physical blocksize (eg 4k for S/390 DASD), as
> some bootloaders (like lilo or zipl for S/390) need to know
> the physical block addresses of the kernel and initrd.
> 
> With this patchset the loop driver will export the logical and
> physical blocksize and the current LOOP_SET_STATUS64 ioctl is
> extended to set the logical blocksize by re-using the existing
> 'init' fields, which are currently unused.
> 
> As usual, comments and reviews are welcome.

Queued up for 4.13.

-- 
Jens Axboe



Re: [PATCH 6/7] pktcdvd: use class_groups instead of class_attrs

2017-06-08 Thread Jens Axboe
On 06/08/2017 02:12 AM, Greg Kroah-Hartman wrote:
> The class_attrs pointer is long depreciated, and is about to be finally
> removed, so move to use the class_groups pointer instead.

Feel free to add my Acked-by to this.

-- 
Jens Axboe



Re: [PATCH] Fix loop device flush before configure v3

2017-06-08 Thread Jens Axboe
On 06/08/2017 12:52 AM, James Wang wrote:
> While installing SLES-12 (based on v4.4), I found that the installer
> will stall for 60+ seconds during LVM disk scan.  The root cause was
> determined to be the removal of a bound device check in loop_flush()
> by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").
> 
> Restoring this check, examining ->lo_state as set by loop_set_fd()
> eliminates the bad behavior.
> 
> Test method:
> modprobe loop max_loop=64
> dd if=/dev/zero of=disk bs=512 count=200K
> for((i=0;i<4;i++))do losetup -f disk; done
> mkfs.ext4 -F /dev/loop0
> for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
> for f in `ls /dev/loop[0-9]*|sort`; do \
>   echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
>   done
> 
> Test output:   stock        patched
> /dev/loop0     8.1217e-05   8.3842e-05
> /dev/loop1     6.1114e-05   0.000147979
> /dev/loop10    0.414701     0.000116564
> /dev/loop11    0.7474       6.7942e-05
> /dev/loop12    0.747986     8.9082e-05
> /dev/loop13    0.746532     7.4799e-05
> /dev/loop14    0.480041     9.3926e-05
> /dev/loop15    1.26453      7.2522e-05
> 
> Note that from loop10 onward, the device is not mounted, yet the
> stock kernel consumes several orders of magnitude more wall time
> than it does for a mounted device.
> (Thanks to Mike Galbraith for reviewing the changelog.)

Added for 4.12, thanks.

-- 
Jens Axboe



Re: [xfstests PATCH v3 5/5] btrfs: allow it to use $SCRATCH_LOGDEV

2017-06-08 Thread Jeff Layton
On Tue, 2017-06-06 at 17:19 +0800, Eryu Guan wrote:
> On Wed, May 31, 2017 at 09:08:20AM -0400, Jeff Layton wrote:
> > With btrfs, we can't really put the log on a separate device. What we
> > can do however is mirror the metadata across two devices and make the
> > data striped across all devices. When we turn on dmerror then the
> > metadata can fall back to using the other mirror while the data errors
> > out.
> > 
> > Note that the current incarnation of btrfs has a fixed 64k stripe
> > width. If that ever changes or becomes settable, we may need to adjust
> > the amount of data that the test program writes.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  common/rc | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index 83765aacfb06..078270451b53 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -830,6 +830,8 @@ _scratch_mkfs()
> > ;;
> > btrfs)
> > mkfs_cmd="$MKFS_BTRFS_PROG"
> > +   [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \
> > +   mkfs_cmd="$mkfs_cmd -d raid0 -m raid1 $SCRATCH_LOGDEV"
> 
> I don't think this is the correct way to do it. If btrfs doesn't support
> external log device, then this test doesn't fit btrfs, or we need other
> ways to test btrfs.
> 
> One of the problems of this hack is that raid1 requires all devices are
> in the same size, we have a _require_scratch_dev_pool_equal_size() rule
> to check on it, but this hack doesn't do the proper check and test fails
> if SCRATCH_LOGDEV is smaller or bigger in size.
> 
> If btrfs "-d raid0 -m raid1" is capable to do this writeback error test,
> perhaps you can write a new btrfs test and mkfs with "-d raid0 -m raid1"
> explicitly. e.g.
> 
> ...
> _require_scratch_dev_pool 2
> _require_scratch_dev_pool_equal_size
> ...
> _scratch_mkfs "-d raid0 -m raid1"
> ...
> 
> Thanks,
> Eryu


Yeah, that's probably the right way to do this. It looks like btrfs also
has $SCRATCH_DEV_POOL, and we can probably base it on that. I'll look at
reworking it.

-- 
Jeff Layton 


[PATCH blktests] loop/002: Regression testing for loop device flush

2017-06-08 Thread James Wang
Add a regression test for the loop device: when an unbound device
is closed, it takes too long; the kernel consumes several orders
of magnitude more wall time than it does for a mounted device.

Signed-off-by: James Wang 
---
 tests/loop/002 | 77 ++
 tests/loop/002.out |  2 ++
 2 files changed, 79 insertions(+)

diff --git a/tests/loop/002 b/tests/loop/002
new file mode 100755
index 000..fd607d1
--- /dev/null
+++ b/tests/loop/002
@@ -0,0 +1,77 @@
+#!/bin/bash
+#
+# Test if close()ing an unbound loop device is too slow
+# Copyright (C) 2017 James Wang
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+DESCRIPTION="Test if close()ing a unbound loop device is too slow"
+
+QUICK=1
+
+function run_test() {
+   TIMEFORMAT='%5R'
+   time {
+	for f in `ls /dev/loop[0-9]*|sort`; do dd if=$f of=/dev/null bs=512 count=1 >/dev/null 2>&1; done
+   }
+}
+function clean_up() {
+   if lsmod | grep loop >/dev/null 2>&1; then
+   umount /dev/loop* >/dev/null 2>&1
+   losetup -D
+   sleep 5
+
+   if ! rmmod loop;then
+   return 2;
+   fi
+   fi
+}
+
+function prepare() {
+   modprobe loop max_loop=64
+   dd if=/dev/zero of=${TMPDIR}/disk bs=512 count=200K >/dev/null 2>&1
+   for((i=0;i<4;i++))
+   do
+   losetup -f ${TMPDIR}/disk;
+   done
+   mkfs.ext4 -F /dev/loop0 >/dev/null 2>&1
+   for((i=0;i<4;i++))
+   do
+   mkdir -p t$i;
+   mount /dev/loop$i t$i;
+   done
+
+}
+
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   prepare
+   SECONDS=0
+   run_test >/dev/null 2>&1
+   DURATION=${SECONDS}
+
+	if ! clean_up; then
+   echo "Test complete"
+   return 2
+   fi
+   echo "Test complete"
+   if [[ "${DURATION}" -gt 1 ]]; then
+   return 1
+   else
+   return 0
+   fi
+}
diff --git a/tests/loop/002.out b/tests/loop/002.out
new file mode 100644
index 000..5c34a37
--- /dev/null
+++ b/tests/loop/002.out
@@ -0,0 +1,2 @@
+Running loop/002
+Test complete
-- 
2.12.3



[PATCHv8 2/2] loop: support 4k physical blocksize

2017-06-08 Thread Hannes Reinecke
When generating bootable VM images certain systems (most notably
s390x) require devices with 4k blocksize. This patch implements
a new flag 'LO_FLAGS_BLOCKSIZE' which will set the physical
blocksize to that of the underlying device, and allow changing
the logical blocksize up to the physical blocksize.
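
From user space, an image-building tool would request this through
LOOP_SET_STATUS64; the sketch below is an assumption-laden illustration
only (it presumes LO_FLAGS_BLOCKSIZE and the reuse of the lo_init words
land exactly as proposed in this series):

#include <linux/loop.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: ask an already-bound loop device for a given
 * logical blocksize. LO_FLAGS_BLOCKSIZE and the meaning of lo_init[0]
 * come from this series, not from current kernels. */
static int loop_set_blocksize(int loop_fd, unsigned int blocksize)
{
	struct loop_info64 info;

	memset(&info, 0, sizeof(info));
	if (ioctl(loop_fd, LOOP_GET_STATUS64, &info) < 0)
		return -1;

	info.lo_flags |= LO_FLAGS_BLOCKSIZE;
	info.lo_init[0] = blocksize;	/* 512, 1024, 2048 or 4096 */

	return ioctl(loop_fd, LOOP_SET_STATUS64, &info);
}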

Signed-off-by: Hannes Reinecke 
---
 drivers/block/loop.c  | 43 +--
 drivers/block/loop.h  |  1 +
 include/uapi/linux/loop.h |  3 +++
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fc706ad..4d376c1 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -221,7 +221,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 }
 
 static int
-figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit)
+figure_loop_size(struct loop_device *lo, loff_t offset, loff_t sizelimit,
+loff_t logical_blocksize)
 {
loff_t size = get_size(offset, sizelimit, lo->lo_backing_file);
sector_t x = (sector_t)size;
@@ -233,6 +234,12 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
lo->lo_offset = offset;
if (lo->lo_sizelimit != sizelimit)
lo->lo_sizelimit = sizelimit;
+   if (lo->lo_flags & LO_FLAGS_BLOCKSIZE) {
+   lo->lo_logical_blocksize = logical_blocksize;
+   blk_queue_physical_block_size(lo->lo_queue, lo->lo_blocksize);
+   blk_queue_logical_block_size(lo->lo_queue,
+lo->lo_logical_blocksize);
+   }
set_capacity(lo->lo_disk, x);
bd_set_size(bdev, (loff_t)get_capacity(bdev->bd_disk) << 9);
/* let user-space know about the new size */
@@ -810,6 +817,7 @@ static void loop_config_discard(struct loop_device *lo)
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
struct request_queue *q = lo->lo_queue;
+   int lo_bits = 9;
 
/*
 * We use punch hole to reclaim the free space used by the
@@ -829,8 +837,11 @@ static void loop_config_discard(struct loop_device *lo)
 
q->limits.discard_granularity = inode->i_sb->s_blocksize;
q->limits.discard_alignment = 0;
-   blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
-   blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
+   if (lo->lo_flags & LO_FLAGS_BLOCKSIZE)
+   lo_bits = blksize_bits(lo->lo_logical_blocksize);
+
+   blk_queue_max_discard_sectors(q, UINT_MAX >> lo_bits);
+   blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> lo_bits);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
@@ -918,6 +929,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
lo->use_dio = false;
lo->lo_blocksize = lo_blocksize;
+   lo->lo_logical_blocksize = 512;
lo->lo_device = bdev;
lo->lo_flags = lo_flags;
lo->lo_backing_file = file;
@@ -1083,6 +1095,7 @@ static int loop_clr_fd(struct loop_device *lo)
int err;
struct loop_func_table *xfer;
kuid_t uid = current_uid();
+   int lo_flags = lo->lo_flags;
 
if (lo->lo_encrypt_key_size &&
!uid_eq(lo->lo_key_owner, uid) &&
@@ -1115,9 +1128,26 @@ static int loop_clr_fd(struct loop_device *lo)
if (err)
goto exit;
 
+   if (info->lo_flags & LO_FLAGS_BLOCKSIZE) {
+   if (!(lo->lo_flags & LO_FLAGS_BLOCKSIZE))
+   lo->lo_logical_blocksize = 512;
+   lo->lo_flags |= LO_FLAGS_BLOCKSIZE;
+   if (LO_INFO_BLOCKSIZE(info) != 512 &&
+   LO_INFO_BLOCKSIZE(info) != 1024 &&
+   LO_INFO_BLOCKSIZE(info) != 2048 &&
+   LO_INFO_BLOCKSIZE(info) != 4096)
+   return -EINVAL;
+   if (LO_INFO_BLOCKSIZE(info) > lo->lo_blocksize)
+   return -EINVAL;
+   }
+
if (lo->lo_offset != info->lo_offset ||
-   lo->lo_sizelimit != info->lo_sizelimit)
-   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
+   lo->lo_sizelimit != info->lo_sizelimit ||
+   lo->lo_flags != lo_flags ||
+   ((lo->lo_flags & LO_FLAGS_BLOCKSIZE) &&
+lo->lo_logical_blocksize != LO_INFO_BLOCKSIZE(info))) {
+   if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit,
+LO_INFO_BLOCKSIZE(info)))
err = -EFBIG;
goto exit;
}
@@ -1308,7 +1338,8 @@ static int loop_set_capacity(struct loop_device *lo)
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
 
-   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit);
+   return figure_loop_size(lo, lo->lo_offset, lo->lo_sizelimit,
+			       lo->lo_logical_blocksize);

[PATCHv8 1/2] loop: Remove unused 'bdev' argument from loop_set_capacity

2017-06-08 Thread Hannes Reinecke
Signed-off-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
---
 drivers/block/loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 28d9329..fc706ad 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1303,7 +1303,7 @@ static int loop_clr_fd(struct loop_device *lo)
return err;
 }
 
-static int loop_set_capacity(struct loop_device *lo, struct block_device *bdev)
+static int loop_set_capacity(struct loop_device *lo)
 {
if (unlikely(lo->lo_state != Lo_bound))
return -ENXIO;
@@ -1366,7 +1366,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
case LOOP_SET_CAPACITY:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
-   err = loop_set_capacity(lo, bdev);
+   err = loop_set_capacity(lo);
break;
case LOOP_SET_DIRECT_IO:
err = -EPERM;
-- 
1.8.5.6



[PATCHv8 0/2] loop: enable different logical blocksizes

2017-06-08 Thread Hannes Reinecke
Currently the loop driver just simulates 512-byte blocks. When
creating bootable images for virtual machines it might be required
to use a different physical blocksize (e.g. 4k for S/390 DASD), as
some bootloaders (like lilo or zipl for S/390) need to know the
physical block addresses of the kernel and initrd.

With this patchset the loop driver will export the logical and
physical blocksize and the current LOOP_SET_STATUS64 ioctl is
extended to set the logical blocksize by re-using the existing
'init' fields, which are currently unused.
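
For illustration, a minimal, hypothetical userspace sketch of the
extended ioctl. It assumes LO_INFO_BLOCKSIZE() reads the first 'init'
field, lo_init[0], as in patch 2/2, and the LO_FLAGS_BLOCKSIZE value
below is a guess, so treat this as a sketch rather than a reference:

/*
 * Sketch only: request a 4k logical blocksize on a bound loop device
 * through the extended LOOP_SET_STATUS64 from this series.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

#ifndef LO_FLAGS_BLOCKSIZE
#define LO_FLAGS_BLOCKSIZE 32	/* assumed value from this series */
#endif

int main(void)
{
	struct loop_info64 info;
	int fd = open("/dev/loop0", O_RDWR);

	if (fd < 0 || ioctl(fd, LOOP_GET_STATUS64, &info) < 0) {
		perror("/dev/loop0");
		return 1;
	}
	info.lo_flags |= LO_FLAGS_BLOCKSIZE;
	info.lo_init[0] = 4096;	/* logical blocksize via the 'init' field */
	if (ioctl(fd, LOOP_SET_STATUS64, &info) < 0)
		perror("LOOP_SET_STATUS64");
	close(fd);
	return 0;
}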

As usual, comments and reviews are welcome.

Changes to v1:
- Move LO_FLAGS_BLOCKSIZE definition
- Reshuffle patches
Changes to v2:
- Include reviews from Ming Lei
Changes to v3:
- Include reviews from Christoph
- Merge patches
Changes to v4:
- Add LO_INFO_BLOCKSIZE definition
Changes to v5:
- Rediff to latest upstream
Changes to v6:
- Include review from Ming Lei
Changes to v7:
- Reset logical blocksize on loop_set_fd()

Hannes Reinecke (2):
  loop: Remove unused 'bdev' argument from loop_set_capacity
  loop: support 4k physical blocksize

 drivers/block/loop.c  | 47 +++
 drivers/block/loop.h  |  1 +
 include/uapi/linux/loop.h |  3 +++
 3 files changed, 43 insertions(+), 8 deletions(-)

-- 
1.8.5.6



Re: [PATCH] Fix loop device flush before configure v3

2017-06-08 Thread James Wang


On 06/08/2017 03:53 PM, Johannes Thumshirn wrote:
> On 06/08/2017 08:52 AM, James Wang wrote:
>> Test method:
>> modprobe loop max_loop=64
>> dd if=/dev/zero of=disk bs=512 count=200K
>> for ((i=0; i<4; i++)); do losetup -f disk; done
>> mkfs.ext4 -F /dev/loop0
>> for ((i=0; i<4; i++)); do mkdir t$i; mount /dev/loop$i t$i; done
>> for f in `ls /dev/loop[0-9]*|sort`; do \
>>  echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
>>  done
> I think Christoph already asked this, but can you send a patch for
> blktests [1] as well?
I have forked the project and I'm working on it.
The framework is fairly restrictive, so I'm adjusting my code to fit it.
> Thanks,
>   Johannes
>
> [1] https://github.com/osandov/blktests
>

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)



[PATCH 6/7] pktcdvd: use class_groups instead of class_attrs

2017-06-08 Thread Greg Kroah-Hartman
The class_attrs pointer is long deprecated and is about to be removed
for good, so move to the class_groups pointer instead.

Cc: 
Cc: Jens Axboe 
Cc: Hannes Reinecke 
Cc: Jan Kara 
Cc: Mike Christie 
Cc: Bart Van Assche 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/block/pktcdvd.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 205b865ebeb9..98939ee97476 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -348,9 +348,9 @@ static void class_pktcdvd_release(struct class *cls)
 {
kfree(cls);
 }
-static ssize_t class_pktcdvd_show_map(struct class *c,
-   struct class_attribute *attr,
-   char *data)
+
+static ssize_t device_map_show(struct class *c, struct class_attribute *attr,
+  char *data)
 {
int n = 0;
int idx;
@@ -368,11 +368,10 @@ static ssize_t class_pktcdvd_show_map(struct class *c,
	mutex_unlock(&ctl_mutex);
return n;
 }
+static CLASS_ATTR_RO(device_map);
 
-static ssize_t class_pktcdvd_store_add(struct class *c,
-   struct class_attribute *attr,
-   const char *buf,
-   size_t count)
+static ssize_t add_store(struct class *c, struct class_attribute *attr,
+const char *buf, size_t count)
 {
unsigned int major, minor;
 
@@ -390,11 +389,10 @@ static ssize_t class_pktcdvd_store_add(struct class *c,
 
return -EINVAL;
 }
+static CLASS_ATTR_WO(add);
 
-static ssize_t class_pktcdvd_store_remove(struct class *c,
- struct class_attribute *attr,
- const char *buf,
-   size_t count)
+static ssize_t remove_store(struct class *c, struct class_attribute *attr,
+   const char *buf, size_t count)
 {
unsigned int major, minor;
	if (sscanf(buf, "%u:%u", &major, &minor) == 2) {
@@ -403,14 +401,15 @@ static ssize_t class_pktcdvd_store_remove(struct class *c,
}
return -EINVAL;
 }
+static CLASS_ATTR_WO(remove);
 
-static struct class_attribute class_pktcdvd_attrs[] = {
- __ATTR(add,0200, NULL, class_pktcdvd_store_add),
- __ATTR(remove, 0200, NULL, class_pktcdvd_store_remove),
- __ATTR(device_map, 0444, class_pktcdvd_show_map, NULL),
- __ATTR_NULL
+static struct attribute *class_pktcdvd_attrs[] = {
+	&class_attr_add.attr,
+	&class_attr_remove.attr,
+	&class_attr_device_map.attr,
+   NULL,
 };
-
+ATTRIBUTE_GROUPS(class_pktcdvd);
 
 static int pkt_sysfs_init(void)
 {
@@ -426,7 +425,7 @@ static int pkt_sysfs_init(void)
class_pktcdvd->name = DRIVER_NAME;
class_pktcdvd->owner = THIS_MODULE;
class_pktcdvd->class_release = class_pktcdvd_release;
-   class_pktcdvd->class_attrs = class_pktcdvd_attrs;
+   class_pktcdvd->class_groups = class_pktcdvd_groups;
ret = class_register(class_pktcdvd);
if (ret) {
kfree(class_pktcdvd);
-- 
2.13.1



Re: [PATCH] Fix loop device flush before configure v3

2017-06-08 Thread Johannes Thumshirn
On 06/08/2017 08:52 AM, James Wang wrote:
> Test method:
> modprobe loop max_loop=64
> dd if=/dev/zero of=disk bs=512 count=200K
> for ((i=0; i<4; i++)); do losetup -f disk; done
> mkfs.ext4 -F /dev/loop0
> for ((i=0; i<4; i++)); do mkdir t$i; mount /dev/loop$i t$i; done
> for f in `ls /dev/loop[0-9]*|sort`; do \
>   echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
>   done
I think Christoph already asked this, but can you send a patch for
blktests [1] as well?

Thanks,
Johannes

[1] https://github.com/osandov/blktests

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 0/10 v11] No wait AIO

2017-06-08 Thread Christoph Hellwig
As already indicated, this whole series looks fine to me.

Al: are you going to pick this up?  Or Andrew?

On Tue, Jun 06, 2017 at 06:19:29AM -0500, Goldwyn Rodrigues wrote:
> This series adds nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed for a number of reasons:
>  - Block allocation for files
>  - Data writebacks for direct I/O
>  - Sleeping because of waiting to acquire i_rwsem
>  - Congested block device
> 
> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push as many
> write()s to the kernel as it is able to complete, and if io_submit()
> returns -EAGAIN, defer the I/O to another thread.
> 
> In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
> uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
> IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
> iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could
> not use aio_flags because it is not currently checked for invalidity
> in the kernel.
> 
> This feature is provided for asynchronous direct I/O only. I have
> tested it against xfs, ext4, and btrfs, and I intend to add more
> filesystems. The nowait feature is for request-based devices. In the
> future, I intend to add support for stacked devices such as md.
> 
> Applications will have to check for support by sending an async
> direct write; any error besides -EAGAIN means the feature is not
> supported.
> 
> First two patches are prep patches into nowait I/O.
> 
> Changes since v1:
>  + changed name from _NONBLOCKING to *_NOWAIT
>  + filemap_range_has_page call moved closer to (just before) the
> filemap_write_and_wait_range() call.
>  + BIO_NOWAIT limited to get_request()
>  + XFS fixes 
>   - included reflink 
>   - use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
>   - Translate the flag through IOMAP_NOWAIT (iomap) to check for
> block allocation for the file.
>  + ext4 coding style
> 
> Changes since v2:
>  + Using aio_reserved1 as aio_rw_flags instead of aio_flags
>  + blk-mq support
>  + xfs uptodate with kernel and reflink changes
> 
>  Changes since v3:
>   + Added FS_NOWAIT, which is set if the filesystem supports NOWAIT feature.
>   + Checks in generic_make_request() to make sure BIO_NOWAIT comes in
> for async direct writes only.
>   + Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT.
> This is added (rather not set) to block devices such as dm/md currently.
> 
>  Changes since v4:
>   + Ported AIO code to use RWF_* flags. Check for RWF_* flags in
> generic_file_write_iter().
>   + Changed IOCB_RW_FLAGS_NOWAIT to RWF_NOWAIT.
> 
>  Changes since v5:
>   + BIO_NOWAIT to REQ_NOWAIT
>   + Common helper for RWF flags.
> 
>  Changes since v6:
>   + REQ_NOWAIT will be ignored for request based devices since they
> cannot block. So, removed QUEUE_FLAG_NOWAIT since it is not
> required in the current implementation. It will be resurrected
> when we program for stacked devices.
>   + changed kiocb_rw_flags() to kiocb_set_rw_flags() in order to
> accommodate errors. Moved the checks into the function.
> 
>  Changes since v7:
>   + split patches into prep so the main patches are smaller and easier
> to understand
>   + All patches are reviewed or acked!
>  
>  Changes since v8:
>  + Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
> 
>  Changes since v9:
>  + Retract - Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
>  + XFS returns EAGAIN if extent list is not in memory
>  + Man page updates to io_submit with iocb description and nowait features.
> 
>  Changes since v10:
>  + Corrected comment and subject in "return on congested block device"
> 
> -- 
> Goldwyn
> 
> 
---end quoted text---
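
For readers unfamiliar with the interface, a rough sketch of a caller
under the semantics quoted above. The aio_rw_flags field (replacing
aio_reserved1) and RWF_NOWAIT come from this unmerged series, so a
patched linux/aio_abi.h is assumed and the flag value below is a guess:

/*
 * Sketch only: submit one async O_DIRECT write with RWF_NOWAIT and
 * fall back when submission would block.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008	/* assumed value from this series */
#endif

static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(__NR_io_setup, nr, ctx);
}

static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbpp)
{
	return syscall(__NR_io_submit, ctx, n, iocbpp);
}

int main(void)
{
	static char buf[4096] __attribute__((aligned(4096)));
	aio_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

	if (fd < 0 || sys_io_setup(1, &ctx) < 0)
		return 1;
	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_buf = (__u64)(unsigned long)buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_rw_flags = RWF_NOWAIT;	/* new field + flag in this series */
	if (sys_io_submit(ctx, 1, cbs) < 0 && errno == EAGAIN) {
		/* submission would block: defer to a worker thread */
		return 2;
	}
	return 0;
}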


Re: [PATCH] Fix loop device flush before configure

2017-06-08 Thread James Wang


On 06/08/2017 02:56 PM, Christoph Hellwig wrote:
> On Thu, Jun 08, 2017 at 08:45:31AM +0800, James Wang wrote:
>> OK, I got it: blktests is a test suite. I'd like to contribute
>> something if you need it ;-)
>> But I have to learn how to do that first, which will take some time.
> I haven't added a test to blktests myself yet either, so I'd have to
> learn it as well.  Omar can probably help you, though.
>
>
Ah, I have forked the project on GitHub and written a script in the
'loop' group.

Debugging it now.


James

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)



Re: [PATCH 7/8] blk-mq: create hctx for each present CPU

2017-06-08 Thread Christoph Hellwig
On Wed, Jun 07, 2017 at 03:04:11PM -0700, Omar Sandoval wrote:
> On Sat, Jun 03, 2017 at 04:04:02PM +0200, Christoph Hellwig wrote:
> > Currently we only create hctx for online CPUs, which can lead to a lot
> > of churn due to frequent soft offline / online operations.  Instead
> > allocate one for each present CPU to avoid this and dramatically simplify
> > the code.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Oh man, this cleanup is great. Did you run blktests on this? block/008
> does a bunch of hotplugging while I/O is running.

I haven't run blktests yet; in fact, when I did this work blktests
didn't exist yet.  But thanks for the reminder, I'll run it now.


Re: [PATCH] Fix loop device flush before configure

2017-06-08 Thread Christoph Hellwig
On Thu, Jun 08, 2017 at 08:45:31AM +0800, James Wang wrote:
> OK, I got it: blktests is a test suite. I'd like to contribute
> something if you need it ;-)
> But I have to learn how to do that first, which will take some time.

I haven't added a test to blktests myself yet either, so I'd have to
learn it as well.  Omar can probably help you, though.


[PATCH] Fix loop device flush before configure v3

2017-06-08 Thread James Wang
While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan.  The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

Restoring this check, which examines ->lo_state as set by
loop_set_fd(), eliminates the bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for ((i=0; i<4; i++)); do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for ((i=0; i<4; i++)); do mkdir t$i; mount /dev/loop$i t$i; done
for f in `ls /dev/loop[0-9]*|sort`; do \
echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
done

Test output:   stock        patched
/dev/loop0     8.1217e-05   8.3842e-05
/dev/loop1     6.1114e-05   0.000147979
/dev/loop10    0.414701     0.000116564
/dev/loop11    0.7474       6.7942e-05
/dev/loop12    0.747986     8.9082e-05
/dev/loop13    0.746532     7.4799e-05
/dev/loop14    0.480041     9.3926e-05
/dev/loop15    1.26453      7.2522e-05

Note that from loop10 onward, the device is not mounted, yet the
stock kernel consumes several orders of magnitude more wall time
than it does for a mounted device.
(Thanks to Mike Galbraith for reviewing the changelog.)

Reviewed-by: Hannes Reinecke 
Reviewed-by: Ming Lei 
Signed-off-by: James Wang 
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 48f6fa6f810e..2e5b8538760c 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct file *file)
  */
 static int loop_flush(struct loop_device *lo)
 {
+   /* loop not yet configured, no running thread, nothing to flush */
+   if (lo->lo_state != Lo_bound)
+   return 0;
return loop_switch(lo, NULL);
 }
 
-- 
2.12.3


