Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better
On 2017/9/30 11:17 AM, Michael Lyle wrote:
> Coly--
>
> What you say is correct-- it has a few changes from current behavior.
>
> - When writeback rate is low, it is more willing to do contiguous
> I/Os. This provides an opportunity for the IO scheduler to combine
> operations together. The cost of doing 5 contiguous I/Os and 1 I/O is
> usually about the same on spinning disks, because most of the cost is
> seeking and rotational latency-- the actual sequential I/O bandwidth
> is very high. This is a benefit.

Hi Mike,

Yes I can see it.

> - When writeback rate is medium, it does I/O more efficiently. e.g.
> if the current writeback rate is 10MB/sec, and there are two
> contiguous 1MB segments, they would not presently be combined. A 1MB
> write would occur, then we would increase the delay counter by 100ms,
> and then the next write would wait; this new code would issue 2 1MB
> writes one after the other, and then sleep 200ms. On a disk that does
> 150MB/sec sequential, and has a 7ms seek time, this uses the disk for
> 13ms + 7ms, compared to the old code that does 13ms + 7ms * 2. This
> is the difference between using 10% of the disk's I/O throughput and
> 13% of the disk's throughput to do the same work.

If writeback_rate is not at its minimum value, it means there are
front-end write requests in flight. In that case, back-end writeback
I/O should yield throughput to the front-end I/O; otherwise
applications will observe increased I/O latency, especially when the
dirty percentage is not very high. For enterprise workloads, this
change hurts performance.

A desired behavior for low-latency enterprise workloads is: when the
dirty percentage is low, as soon as there is front-end I/O, back-end
writeback should run at the minimum rate. This patch will introduce
unstable and unpredictable I/O latency. Unless writeback seeking is a
real performance bottleneck, enterprise users at least will care more
about front-end I/O latency.

> - When writeback rate is very high (e.g. can't be obtained), there is
> not much difference currently, BUT:
>
> Patch 5 is very important. Right now, if there are many writebacks
> happening at once, the cached blocks can be read in any order. This
> means that if we want to writeback blocks 1,2,3,4,5 we could actually
> end up issuing the write I/Os to the backing device as 3,1,4,2,5, with
> delays between them. This is likely to make the disk seek a lot.
> Patch 5 provides an ordering property to ensure that the writes get
> issued in LBA order to the backing device.

This method is helpful only when the writeback I/Os are not issued
continuously; otherwise, if they are issued within slice_idle, the
underlying elevator will reorder or merge the I/Os into larger
requests anyway.

> ***The next step in this line of development (patch 6 ;) is to link
> groups of contiguous I/Os into a list in the dirty_io structure. To
> know whether the "next I/Os" will be contiguous, we need to scan ahead
> like the new code in patch 4 does. Then, in turn, we can plug the
> block device, and issue the contiguous writes together. This allows
> us to guarantee that the I/Os will be properly merged and optimized by
> the underlying block IO scheduler. Even with patch 5, currently the
> I/Os end up imperfectly combined, and the block layer ends up issuing
> writes 1, then 2,3, then 4,5. This is great that things are combined
> some, but it could be combined into one big request.*** To get this
> benefit, it requires something like what was done in patch 4.

Hmm, if you move the dirty I/O from the btree into a dirty_io list and
then perform the I/O, there is a risk that dirty data might be lost if
the machine powers down during writeback. And if you continuously
issue dirty I/O while removing it from the btree at the same time, you
will introduce more latency to front-end I/O...

Also, the plug list will be unplugged automatically by default when a
context switch happens. If you perform read I/Os on the btree, a
context switch is likely to happen, so you won't keep a large bio
list anyway...

IMHO when the writeback rate is low, especially when the backing hard
disk is not the bottleneck, grouping contiguous I/Os in the bcache code
does not help writeback performance much. The only benefit is that
fewer I/Os are issued when front-end I/O is low or idle, but most
users do not care about that, especially enterprise users.

> I believe patch 4 is useful on its own, but I have this and other
> pieces of development that depend upon it.

Current bcache code works well in most writeback workloads. I just
worry that implementing an elevator in bcache's writeback logic is a
big investment with little return.

-- 
Coly Li
[PATCH v2] blk-throttle: fix possible io stall when upgrade to max
From: Joseph Qi 

There is a case which will lead to io stall. The case is described as
follows.

/test1
  |-subtest1
/test2
  |-subtest2

And subtest1 and subtest2 each has 32 queued bios already.

Now upgrade to max. In throtl_upgrade_state, it will try to dispatch
bios as follows:

1) tg=subtest1, do nothing;
2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending
   left, no need to schedule next dispatch;
3) tg=subtest2, do nothing;
4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending
   left, no need to schedule next dispatch;
5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from
   test2 to /, 8 queued bios from test1 to /, and 8 queued bios from
   test2 to /; note that test1 and test2 each still has 16 queued bios
   left;
6) tg=/, try to schedule next dispatch, but since disptime is now
   (updated in tg_update_disptime, wait=0), the pending timer is not
   scheduled in fact;
7) In throtl_upgrade_state it totally dispatches 32 queued bios, with
   32 left over: test1 and test2 each has 16 queued bios;
8) throtl_pending_timer_fn sees the left-over bios, but can do nothing,
   because throtl_select_dispatch returns 0, and test1/test2 has no
   pending tg.

The blktrace shows the following:

8,32 0 0 2.539007641 0 m N throtl upgrade to max
8,32 0 0 2.539072267 0 m N throtl /test2 dispatch nr_queued=16 read=0 write=16
8,32 7 0 2.539077142 0 m N throtl /test1 dispatch nr_queued=16 read=0 write=16

So force schedule dispatch if there are pending children.

Signed-off-by: Joseph Qi 
---
 block/blk-throttle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 0fea76a..17816a0 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1911,11 +1911,11 @@ static void throtl_upgrade_state(struct throtl_data *td)
 		tg->disptime = jiffies - 1;
 		throtl_select_dispatch(sq);
-		throtl_schedule_next_dispatch(sq, false);
+		throtl_schedule_next_dispatch(sq, true);
 	}
 	rcu_read_unlock();
 	throtl_select_dispatch(&td->service_queue);
-	throtl_schedule_next_dispatch(&td->service_queue, false);
+	throtl_schedule_next_dispatch(&td->service_queue, true);
 	queue_work(kthrotld_workqueue, &td->dispatch_work);
 }
--
1.9.4
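For context, the scheduling helper whose "force" argument the patch flips
looks roughly like the sketch below (paraphrased from block/blk-throttle.c
of that era; line breaks and comments are mine, so treat it as an
approximation rather than the exact source). With force == false and a
disptime that is already due, no pending timer is armed and the left-over
bios in steps 7)-8) are never dispatched; force == true arms the timer
whenever children are still pending.

    static bool throtl_schedule_next_dispatch(struct throtl_service_queue *sq,
                                              bool force)
    {
            /* nothing queued below this service queue, nothing to schedule */
            if (!sq->nr_pending)
                    return true;

            update_min_dispatch_time(sq);

            /* arm the timer if forced, or if the dispatch time is in the future */
            if (force || time_after(sq->first_pending_disptime, jiffies)) {
                    throtl_schedule_pending_timer(sq, sq->first_pending_disptime);
                    return true;
            }

            /* otherwise tell the caller to keep dispatching itself */
            return false;
    }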
[PATCH V7 3/6] block: pass flags to blk_queue_enter()
We need to pass PREEMPT flags to blk_queue_enter() for allocating request with RQF_PREEMPT in the following patch. Tested-by: Oleksandr Natalenko Tested-by: Martin Steigerwald Cc: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-core.c | 10 ++ block/blk-mq.c | 5 +++-- block/blk-timeout.c| 2 +- fs/block_dev.c | 4 ++-- include/linux/blkdev.h | 7 ++- 5 files changed, 18 insertions(+), 10 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index a5011c824ac6..7d5040a6d5a4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -766,7 +766,7 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask) } EXPORT_SYMBOL(blk_alloc_queue); -int blk_queue_enter(struct request_queue *q, bool nowait) +int blk_queue_enter(struct request_queue *q, unsigned flags) { while (true) { int ret; @@ -774,7 +774,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait) if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; - if (nowait) + if (flags & BLK_REQ_NOWAIT) return -EBUSY; /* @@ -1408,7 +1408,8 @@ static struct request *blk_old_get_request(struct request_queue *q, /* create ioc upfront */ create_io_context(gfp_mask, q->node); - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM)); + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ? + BLK_REQ_NOWAIT : 0); if (ret) return ERR_PTR(ret); spin_lock_irq(q->queue_lock); @@ -2215,7 +2216,8 @@ blk_qc_t generic_make_request(struct bio *bio) do { struct request_queue *q = bio->bi_disk->queue; - if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) { + if (likely(blk_queue_enter(q, (bio->bi_opf & REQ_NOWAIT) ? + BLK_REQ_NOWAIT : 0) == 0)) { struct bio_list lower, same; /* Create a fresh bio_list for all subordinate requests */ diff --git a/block/blk-mq.c b/block/blk-mq.c index 10c1f49f663d..45bff90e08f7 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -384,7 +384,8 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, struct request *rq; int ret; - ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT); + ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ? 
+ BLK_REQ_NOWAIT : 0); if (ret) return ERR_PTR(ret); @@ -423,7 +424,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, if (hctx_idx >= q->nr_hw_queues) return ERR_PTR(-EIO); - ret = blk_queue_enter(q, true); + ret = blk_queue_enter(q, BLK_REQ_NOWAIT); if (ret) return ERR_PTR(ret); diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 17ec83bb0900..e803106a5e5b 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work) struct request *rq, *tmp; int next_set = 0; - if (blk_queue_enter(q, true)) + if (blk_queue_enter(q, BLK_REQ_NOWAIT)) return; spin_lock_irqsave(q->queue_lock, flags); diff --git a/fs/block_dev.c b/fs/block_dev.c index 93d088ffc05c..98cf2d7ee9d3 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector, if (!ops->rw_page || bdev_get_integrity(bdev)) return result; - result = blk_queue_enter(bdev->bd_queue, false); + result = blk_queue_enter(bdev->bd_queue, 0); if (result) return result; result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false); @@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector, if (!ops->rw_page || bdev_get_integrity(bdev)) return -EOPNOTSUPP; - result = blk_queue_enter(bdev->bd_queue, false); + result = blk_queue_enter(bdev->bd_queue, 0); if (result) return result; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 02fa42d24b52..127f64c7012c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -858,6 +858,11 @@ enum { BLKPREP_INVALID,/* invalid command, kill, return -EREMOTEIO */ }; +/* passed to blk_queue_enter */ +enum { + BLK_REQ_NOWAIT = (1 << 0), +}; + extern unsigned long blk_max_low_pfn, blk_max_pfn; /* @@ -963,7 +968,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t, struct scsi_ioctl_command __user *); -extern int blk_queue_enter(struct request_queue *q, boo
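As a quick illustration of the new flag-based interface (a sketch added
here for clarity, not code taken from the patch): a caller that must not
sleep passes BLK_REQ_NOWAIT and handles -EBUSY, while a blocking caller
passes 0 and may wait on mq_freeze_wq; either way a successful enter is
paired with blk_queue_exit().

    /* non-blocking attempt to enter the queue (illustrative only) */
    if (blk_queue_enter(q, BLK_REQ_NOWAIT))
            return -EBUSY;          /* queue is frozen or dying, don't wait */

    /* ... allocate and submit work against q ... */

    blk_queue_exit(q);              /* drop the q_usage_counter reference */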
[PATCH V7 4/6] block: prepare for passing RQF_PREEMPT to request allocation
REQF_PREEMPT is a bit special because the request is required to be dispatched to lld even when SCSI device is quiesced. So this patch introduces __blk_get_request() and allows users to pass RQF_PREEMPT flag in, then we can allow to allocate request of RQF_PREEMPT when queue is in mode of PREEMPT ONLY which will be introduced in the following patch. Tested-by: Oleksandr Natalenko Tested-by: Martin Steigerwald Cc: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-core.c | 19 +-- block/blk-mq.c | 3 +-- include/linux/blk-mq.h | 7 --- include/linux/blkdev.h | 17 ++--- 4 files changed, 28 insertions(+), 18 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 7d5040a6d5a4..95b1c5e50be3 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1398,7 +1398,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op, } static struct request *blk_old_get_request(struct request_queue *q, - unsigned int op, gfp_t gfp_mask) + unsigned int op, gfp_t gfp_mask, + unsigned int flags) { struct request *rq; int ret = 0; @@ -1408,8 +1409,7 @@ static struct request *blk_old_get_request(struct request_queue *q, /* create ioc upfront */ create_io_context(gfp_mask, q->node); - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ? - BLK_REQ_NOWAIT : 0); + ret = blk_queue_enter(q, flags & BLK_REQ_BITS_MASK); if (ret) return ERR_PTR(ret); spin_lock_irq(q->queue_lock); @@ -1427,26 +1427,25 @@ static struct request *blk_old_get_request(struct request_queue *q, return rq; } -struct request *blk_get_request(struct request_queue *q, unsigned int op, - gfp_t gfp_mask) +struct request *__blk_get_request(struct request_queue *q, unsigned int op, + gfp_t gfp_mask, unsigned int flags) { struct request *req; + flags |= gfp_mask & __GFP_DIRECT_RECLAIM ? 0 : BLK_REQ_NOWAIT; if (q->mq_ops) { - req = blk_mq_alloc_request(q, op, - (gfp_mask & __GFP_DIRECT_RECLAIM) ? - 0 : BLK_MQ_REQ_NOWAIT); + req = blk_mq_alloc_request(q, op, flags); if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn) q->mq_ops->initialize_rq_fn(req); } else { - req = blk_old_get_request(q, op, gfp_mask); + req = blk_old_get_request(q, op, gfp_mask, flags); if (!IS_ERR(req) && q->initialize_rq_fn) q->initialize_rq_fn(req); } return req; } -EXPORT_SYMBOL(blk_get_request); +EXPORT_SYMBOL(__blk_get_request); /** * blk_requeue_request - put a request back on queue diff --git a/block/blk-mq.c b/block/blk-mq.c index 45bff90e08f7..90b43f607e3c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -384,8 +384,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, struct request *rq; int ret; - ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ? 
- BLK_REQ_NOWAIT : 0); + ret = blk_queue_enter(q, flags & BLK_REQ_BITS_MASK); if (ret) return ERR_PTR(ret); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 50c6485cb04f..066a676d7749 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -197,9 +197,10 @@ void blk_mq_free_request(struct request *rq); bool blk_mq_can_queue(struct blk_mq_hw_ctx *); enum { - BLK_MQ_REQ_NOWAIT = (1 << 0), /* return when out of requests */ - BLK_MQ_REQ_RESERVED = (1 << 1), /* allocate from reserved pool */ - BLK_MQ_REQ_INTERNAL = (1 << 2), /* allocate internal/sched tag */ + BLK_MQ_REQ_NOWAIT = BLK_REQ_NOWAIT, /* return when out of requests */ + BLK_MQ_REQ_PREEMPT = BLK_REQ_PREEMPT, /* allocate for RQF_PREEMPT */ + BLK_MQ_REQ_RESERVED = (1 << BLK_REQ_MQ_START_BIT), /* allocate from reserved pool */ + BLK_MQ_REQ_INTERNAL = (1 << (BLK_REQ_MQ_START_BIT + 1)), /* allocate internal/sched tag */ }; struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 127f64c7012c..68445adc8765 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -860,7 +860,10 @@ enum { /* passed to blk_queue_enter */ enum { - BLK_REQ_NOWAIT = (1 << 0), + BLK_REQ_NOWAIT = (1 << 0), + BLK_REQ_PREEMPT = (1 << 1), + BLK_REQ_MQ_START_BIT= 2, + BLK_REQ_BITS_MASK = (1U << BLK_REQ_MQ_START_BIT) - 1, }; extern unsigned long blk_max_low_pfn, blk_max_pfn; @@ -945,8 +948,9 @@ extern vo
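To show how the new entry point is meant to be used (illustration only;
the concrete caller appears in patch 6/6, where scsi_execute() is
converted): a request that must survive PREEMPT_ONLY mode is allocated
with BLK_REQ_PREEMPT, while everything else keeps using plain
blk_get_request(). Here "q" stands for the target request_queue and the
error handling is schematic.

    /* sketch of a caller that needs an RQF_PREEMPT request */
    struct request *req;

    req = __blk_get_request(q, REQ_OP_SCSI_IN, __GFP_RECLAIM,
                            BLK_REQ_PREEMPT);
    if (IS_ERR(req))
            return PTR_ERR(req);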
[PATCH V7 6/6] SCSI: set block queue at preempt only when SCSI device is put into quiesce
Simply quiescing the SCSI device and waiting for completion of the I/O
already dispatched to the SCSI queue isn't safe: it is easy to use up
the request pool, because all requests allocated beforehand can't be
dispatched while the device is put in QUIESCE. Then no request can be
allocated for RQF_PREEMPT, and the system may hang somewhere, such as
when sending sync_cache or start_stop commands in the system suspend
path.

Before quiescing SCSI, this patch sets the block queue to preempt-only
mode first, so no new normal request can enter the queue any more, and
all pending requests are drained once blk_set_preempt_only(true)
returns. Then RQF_PREEMPT requests can be allocated successfully during
SCSI quiescing.

This patch fixes a long-term I/O hang issue, in both the block legacy
path and blk-mq.

Tested-by: Oleksandr Natalenko 
Tested-by: Martin Steigerwald 
Cc: sta...@vger.kernel.org
Cc: Bart Van Assche 
Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_lib.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..82c51619f1b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = __blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+			BLK_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
 	int err;
 
+	/*
+	 * Simply quiesing SCSI device isn't safe, it is easy
+	 * to use up requests because all these allocated requests
+	 * can't be dispatched when device is put in QIUESCE.
+	 * Then no request can be allocated and we may hang
+	 * somewhere, such as system suspend/resume.
+	 *
+	 * So we set block queue in preempt only first, no new
+	 * normal request can enter queue any more, and all pending
+	 * requests are drained once blk_set_preempt_only()
+	 * returns. Only RQF_PREEMPT is allowed in preempt only mode.
+	 */
+	blk_set_preempt_only(sdev->request_queue, true);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
 	mutex_unlock(&sdev->state_mutex);
 
-	if (err)
+	if (err) {
+		blk_set_preempt_only(sdev->request_queue, false);
 		return err;
+	}
 
 	scsi_run_queue(sdev->request_queue);
 	while (atomic_read(&sdev->device_busy)) {
@@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
 	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
 		scsi_run_queue(sdev->request_queue);
 	mutex_unlock(&sdev->state_mutex);
+
+	blk_set_preempt_only(sdev->request_queue, false);
 }
 EXPORT_SYMBOL(scsi_device_resume);
-- 
2.9.5
[PATCH V7 5/6] block: support PREEMPT_ONLY
When queue is in PREEMPT_ONLY mode, only RQF_PREEMPT request can be allocated and dispatched, other requests won't be allowed to enter I/O path. This is useful for supporting safe SCSI quiesce. Part of this patch is from Bart's '[PATCH v4 4∕7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag'. Tested-by: Oleksandr Natalenko Tested-by: Martin Steigerwald Cc: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-core.c | 26 -- include/linux/blkdev.h | 5 + 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 95b1c5e50be3..bb683bfe37b2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -346,6 +346,17 @@ void blk_sync_queue(struct request_queue *q) } EXPORT_SYMBOL(blk_sync_queue); +void blk_set_preempt_only(struct request_queue *q, bool preempt_only) +{ + blk_mq_freeze_queue(q); + if (preempt_only) + queue_flag_set_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q); + else + queue_flag_clear_unlocked(QUEUE_FLAG_PREEMPT_ONLY, q); + blk_mq_unfreeze_queue(q); +} +EXPORT_SYMBOL(blk_set_preempt_only); + /** * __blk_run_queue_uncond - run a queue whether or not it has been stopped * @q: The queue to run @@ -771,9 +782,18 @@ int blk_queue_enter(struct request_queue *q, unsigned flags) while (true) { int ret; + /* +* preempt_only flag has to be set after queue is frozen, +* so it can be checked here lockless and safely +*/ + if (blk_queue_preempt_only(q)) { + if (!(flags & BLK_REQ_PREEMPT)) + goto slow_path; + } + if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; - + slow_path: if (flags & BLK_REQ_NOWAIT) return -EBUSY; @@ -787,7 +807,9 @@ int blk_queue_enter(struct request_queue *q, unsigned flags) smp_rmb(); ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || + (!atomic_read(&q->mq_freeze_depth) && + ((flags & BLK_REQ_PREEMPT) || +!blk_queue_preempt_only(q))) || blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 68445adc8765..b01a0c6bb1f0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -631,6 +631,7 @@ struct request_queue { #define QUEUE_FLAG_REGISTERED 26 /* queue has been registered to a disk */ #define QUEUE_FLAG_SCSI_PASSTHROUGH 27 /* queue supports SCSI commands */ #define QUEUE_FLAG_QUIESCED28 /* queue has been quiesced */ +#define QUEUE_FLAG_PREEMPT_ONLY29 /* only process REQ_PREEMPT requests */ #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\ (1 << QUEUE_FLAG_STACKABLE)| \ @@ -735,6 +736,10 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \ REQ_FAILFAST_DRIVER)) #define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags) +#define blk_queue_preempt_only(q) \ + test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags) + +extern void blk_set_preempt_only(struct request_queue *q, bool preempt_only); static inline bool blk_account_rq(struct request *rq) { -- 2.9.5
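Restating the admission rule that blk_queue_enter() implements after this
patch, as a simplified sketch (my own summary for readability; the dying
check and error paths are left out, and the names are the ones used in the
diff above):

    /* simplified decision loop; not an extract from the patch */
    for (;;) {
            if (!blk_queue_preempt_only(q) || (flags & BLK_REQ_PREEMPT)) {
                    if (percpu_ref_tryget_live(&q->q_usage_counter))
                            return 0;               /* entered the queue */
            }
            if (flags & BLK_REQ_NOWAIT)
                    return -EBUSY;
            /* otherwise sleep on q->mq_freeze_wq until the queue is
             * unfrozen and either preempt-only is cleared or we carry
             * BLK_REQ_PREEMPT; -ENODEV if the queue dies meanwhile */
    }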
[PATCH V7 1/6] blk-mq: only run hw queues for blk-mq
This patch just makes it explicit that hardware queues are only run for
blk-mq, by checking q->mq_ops in blk_freeze_queue_start(); this matters
once later patches in this series start freezing legacy request_queues
as well.

Tested-by: Oleksandr Natalenko 
Tested-by: Martin Steigerwald 
Reviewed-by: Johannes Thumshirn 
Cc: Bart Van Assche 
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a18609755e..6fd9f86fc86d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5
[PATCH V7 0/6] block/scsi: safe SCSI quiescing
Hi Jens,

Please consider this patchset for V4.15; it fixes one kind of long-term
I/O hang issue in both the block legacy path and blk-mq.

The current SCSI quiesce isn't safe and easily triggers I/O deadlock.

Once a SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() just simply waits for completion of the I/Os
already dispatched to the SCSI stack. That isn't enough at all.

Because new requests can still be coming in, but none of the allocated
requests can be dispatched successfully, the request pool can be
consumed up easily. Then a request with RQF_PREEMPT can't be allocated
and waits forever, and the system hangs forever, such as during system
suspend or while sending SCSI domain validation in the case of
transport_spi.

Both the I/O hang inside system suspend[1] and the one during SCSI
domain validation were reported before.

This patchset introduces a preempt-only mode and solves the issue by
allowing only RQF_PREEMPT requests during SCSI quiesce.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
them all.

V7:
	- add Reviewed-by & Tested-by
	- one line change in patch 5 for checking preempt request

V6:
	- borrow Bart's idea of preempt only, with clean
	  implementation(patch 5/patch 6)
	- needn't any external driver's dependency, such as MD's change

V5:
	- fix one tiny race by introducing blk_queue_enter_preempt_freeze()
	  given this change is small enough compared with V4, I added
	  tested-by directly

V4:
	- reorganize patch order to make it more reasonable
	- support nested preempt freeze, as required by SCSI transport spi
	- check preempt freezing in slow path of blk_queue_enter()
	- add "SCSI: transport_spi: resume a quiesced device"
	- wake up freeze queue in setting dying for both blk-mq and legacy
	- rename blk_mq_[freeze|unfreeze]_queue() in one patch
	- rename .mq_freeze_wq and .mq_freeze_depth
	- improve comment

V3:
	- introduce q->preempt_unfreezing to fix one bug of preempt freeze
	- call blk_queue_enter_live() only when queue is preempt frozen
	- cleanup a bit on the implementation of preempt freeze
	- only patch 6 and 7 are changed

V2:
	- drop the 1st patch in V1 because percpu_ref_is_dying() is enough
	  as pointed by Tejun
	- introduce preempt version of blk_[freeze|unfreeze]_queue
	- sync between preempt freeze and normal freeze
	- fix warning from percpu-refcount as reported by Oleksandr

[1] https://marc.info/?t=150340250100013&r=3&w=2

Thanks,
Ming

Ming Lei (6):
  blk-mq: only run hw queues for blk-mq
  block: tracking request allocation with q_usage_counter
  block: pass flags to blk_queue_enter()
  block: prepare for passing RQF_PREEMPT to request allocation
  block: support PREEMPT_ONLY
  SCSI: set block queue at preempt only when SCSI device is put into
    quiesce

 block/blk-core.c        | 63 +++--
 block/blk-mq.c          | 14 ---
 block/blk-timeout.c     |  2 +-
 drivers/scsi/scsi_lib.c | 25 +---
 fs/block_dev.c          |  4 ++--
 include/linux/blk-mq.h  |  7 +++---
 include/linux/blkdev.h  | 27 ++---
 7 files changed, 107 insertions(+), 35 deletions(-)

-- 
2.9.5
[PATCH V7 2/6] block: tracking request allocation with q_usage_counter
This usage is basically same with blk-mq, so that we can support to freeze legacy queue easily. Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved into blk_set_queue_dying() since both legacy and blk-mq may wait on the wait queue of .mq_freeze_wq. Tested-by: Oleksandr Natalenko Tested-by: Martin Steigerwald Reviewed-by: Hannes Reinecke Cc: Bart Van Assche Signed-off-by: Ming Lei --- block/blk-core.c | 14 ++ block/blk-mq.c | 7 --- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 048be4aa6024..a5011c824ac6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -610,6 +610,12 @@ void blk_set_queue_dying(struct request_queue *q) } spin_unlock_irq(q->queue_lock); } + + /* +* We need to ensure that processes currently waiting on +* the queue are notified as well. +*/ + wake_up_all(&q->mq_freeze_wq); } EXPORT_SYMBOL_GPL(blk_set_queue_dying); @@ -1395,16 +1401,21 @@ static struct request *blk_old_get_request(struct request_queue *q, unsigned int op, gfp_t gfp_mask) { struct request *rq; + int ret = 0; WARN_ON_ONCE(q->mq_ops); /* create ioc upfront */ create_io_context(gfp_mask, q->node); + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM)); + if (ret) + return ERR_PTR(ret); spin_lock_irq(q->queue_lock); rq = get_request(q, op, NULL, gfp_mask); if (IS_ERR(rq)) { spin_unlock_irq(q->queue_lock); + blk_queue_exit(q); return rq; } @@ -1576,6 +1587,7 @@ void __blk_put_request(struct request_queue *q, struct request *req) blk_free_request(rl, req); freed_request(rl, sync, rq_flags); blk_put_rl(rl); + blk_queue_exit(q); } } EXPORT_SYMBOL_GPL(__blk_put_request); @@ -1857,8 +1869,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio) * Grab a free request. This is might sleep but can not fail. * Returns with the queue unlocked. */ + blk_queue_enter_live(q); req = get_request(q, bio->bi_opf, bio, GFP_NOIO); if (IS_ERR(req)) { + blk_queue_exit(q); __wbt_done(q->rq_wb, wb_acct); if (PTR_ERR(req) == -ENOMEM) bio->bi_status = BLK_STS_RESOURCE; diff --git a/block/blk-mq.c b/block/blk-mq.c index 6fd9f86fc86d..10c1f49f663d 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -256,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q) queue_for_each_hw_ctx(q, hctx, i) if (blk_mq_hw_queue_mapped(hctx)) blk_mq_tag_wakeup_all(hctx->tags, true); - - /* -* If we are called because the queue has now been marked as -* dying, we need to ensure that processes currently waiting on -* the queue are notified as well. -*/ - wake_up_all(&q->mq_freeze_wq); } bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) -- 2.9.5
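The net effect for the legacy path, condensed into a sketch (my own
summary of the hunks above, not additional code): every legacy request
now pins q_usage_counter from allocation to free, which is what lets a
later freeze wait for legacy requests the same way it already waits for
blk-mq ones.

    /* allocation side (blk_old_get_request(), and blk_queue_enter_live()
     * in blk_queue_bio()): every request takes a q_usage_counter ref */
    ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
    if (ret)
            return ERR_PTR(ret);
    rq = get_request(q, op, NULL, gfp_mask);

    /* free side (__blk_put_request()): the reference is dropped again */
    blk_free_request(rl, req);
    blk_queue_exit(q);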
Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better
Coly-- What you say is correct-- it has a few changes from current behavior. - When writeback rate is low, it is more willing to do contiguous I/Os. This provides an opportunity for the IO scheduler to combine operations together. The cost of doing 5 contiguous I/Os and 1 I/O is usually about the same on spinning disks, because most of the cost is seeking and rotational latency-- the actual sequential I/O bandwidth is very high. This is a benefit. - When writeback rate is medium, it does I/O more efficiently. e.g. if the current writeback rate is 10MB/sec, and there are two contiguous 1MB segments, they would not presently be combined. A 1MB write would occur, then we would increase the delay counter by 100ms, and then the next write would wait; this new code would issue 2 1MB writes one after the other, and then sleep 200ms. On a disk that does 150MB/sec sequential, and has a 7ms seek time, this uses the disk for 13ms + 7ms, compared to the old code that does 13ms + 7ms * 2. This is the difference between using 10% of the disk's I/O throughput and 13% of the disk's throughput to do the same work. - When writeback rate is very high (e.g. can't be obtained), there is not much difference currently, BUT: Patch 5 is very important. Right now, if there are many writebacks happening at once, the cached blocks can be read in any order. This means that if we want to writeback blocks 1,2,3,4,5 we could actually end up issuing the write I/Os to the backing device as 3,1,4,2,5, with delays between them. This is likely to make the disk seek a lot. Patch 5 provides an ordering property to ensure that the writes get issued in LBA order to the backing device. ***The next step in this line of development (patch 6 ;) is to link groups of contiguous I/Os into a list in the dirty_io structure. To know whether the "next I/Os" will be contiguous, we need to scan ahead like the new code in patch 4 does. Then, in turn, we can plug the block device, and issue the contiguous writes together. This allows us to guarantee that the I/Os will be properly merged and optimized by the underlying block IO scheduler. Even with patch 5, currently the I/Os end up imperfectly combined, and the block layer ends up issuing writes 1, then 2,3, then 4,5. This is great that things are combined some, but it could be combined into one big request.*** To get this benefit, it requires something like what was done in patch 4. I believe patch 4 is useful on its own, but I have this and other pieces of development that depend upon it. Thanks, Mike
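The 10%-vs-13% claim above can be checked with a quick back-of-envelope
model (added purely to illustrate the arithmetic in the mail; the
numbers are the ones Michael quotes, nothing here is measured):

    #include <stdio.h>

    int main(void)
    {
            double seek_ms = 7.0, seq_mb_s = 150.0, chunk_mb = 1.0;
            /* two 1MB segments transfer in ~13.3ms at 150MB/s */
            double xfer_ms = 2.0 * chunk_mb / seq_mb_s * 1000.0;
            double separate = xfer_ms + 2.0 * seek_ms;      /* two seeks */
            double combined = xfer_ms + 1.0 * seek_ms;      /* one seek  */

            /* both cases move 2MB per 200ms of wall time at 10MB/s */
            printf("separate: %.1f%% busy\n", separate / 200.0 * 100.0); /* ~13.7% */
            printf("combined: %.1f%% busy\n", combined / 200.0 * 100.0); /* ~10.2% */
            return 0;
    }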
Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better
On 2017/9/27 3:32 PM, tang.jun...@zte.com.cn wrote:
> From: Tang Junhui 
>
> Hello Mike:
>
> For the second question, I thinks this modification is somewhat complex,
> cannot we do something simple to resolve it? I remember there were some
> patches trying to avoid too small writeback rate, Coly, is there any
> progress now?
>

Junhui,

That patch works well, but before I solve the latency of calculating
dirty stripe numbers, I won't push it upstream for now.

This patch does not conflict with my max-writeback-rate-when-idle
patch: this patch tries to fetch more dirty keys from the cache device
that are contiguous on the cached device, and assumes they can be
written back to the cached device contiguously.

For that purpose, if writeback_rate is high, dc->last_read already
works well. But when dc->writeback_rate is low, e.g. 8, even if
KEY_START(&w->key) == dc->last_read, the contiguous key will only be
submitted in the next delay cycle. I feel Michael wants to make larger
writeback I/Os and delay longer, so the backing cached device may be
woken up less often. This policy only works better than the current
dc->last_read behavior when writeback_rate is low, that is to say, when
front-end write I/O is low or there is no front-end write at all. I am
hesitant about whether it is worth modifying the general writeback
logic for this.

> ---
> Tang Junhui
>
>> Ah-- re #1 -- I was investigating earlier why not as much was combined
>> as I thought should be when idle. This is surely a factor. Thanks
>> for the catch-- KEY_OFFSET is correct. I will fix and retest.
>>
>> (Under heavy load, the correct thing still happens, but not under
>> light or intermediate load0.
>>
>> About #2-- I wanted to attain a bounded amount of "combining" of
>> operations. If we have 5 4k extents in a row to dispatch, it seems
>> really wasteful to issue them as 5 IOs 60ms apart, which the existing
>> code would be willing to do-- I'd rather do a 20k write IO (basically
>> the same cost as a 4k write IO) and then sleep 300ms. It is dependent
>> on the elevator/IO scheduler merging the requests. At the same time,
>> I'd rather not combine a really large request.
>>
>> It would be really neat to blk_plug the backing device during the
>> write issuance, but that requires further work.
>>
>> Thanks
>>
>> Mike
>>
>> On Tue, Sep 26, 2017 at 11:51 PM, wrote:
>>> From: Tang Junhui 
>>>
>>> Hello Lyle:
>>>
>>> Two questions:
>>> 1) In keys_contiguous(), you judge I/O contiguous in cache device, but not
>>> in backing device. I think you should judge it by backing device (remove
>>> PTR_CACHE() and use KEY_OFFSET() instead of PTR_OFFSET()?).
>>>
>>> 2) I did not see you combine samll contiguous I/Os to big I/O, so I think
>>> it is useful when writeback rate was low by avoiding single I/O write, but
>>> have no sense in high writeback rate, since previously it is also write
>>> I/Os asynchronously.
>>>
>>> ---
>>> Tang Junhui
>>>
Previously, there was some logic that attempted to immediately issue
writeback of backing-contiguous blocks when the writeback rate was
fast.

The previous logic did not have any limits on the aggregate size it
would issue, nor the number of keys it would combine at once. It would
also discard the chance to do a contiguous write when the writeback
rate was low-- e.g. at "background" writeback of target rate = 8, it
would not combine two adjacent 4k writes and would instead seek the
disk twice.

This patch imposes limits and explicitly understands the size of
contiguous I/O during issue. It also will combine contiguous I/O in all
circumstances, not just when writeback is requested to be relatively
fast.

It is a win on its own, but also lays the groundwork for skip writes to
short keys to make the I/O more sequential/contiguous.

Signed-off-by: Michael Lyle 

[snip code]

-- 
Coly Li
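For reference, the plugging idea mentioned in the quoted text above ("it
would be really neat to blk_plug the backing device") would look roughly
like the sketch below; this is an assumption about the future work being
discussed, not code from any of the posted patches.

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* submit_bio() each bio of the contiguous writeback run, back to back */
    blk_finish_plug(&plug);     /* hands the whole batch to the elevator at once */

As noted in the reply above, the plug is also flushed implicitly on a
context switch, which is why interleaving btree reads with the
submission would defeat the batching.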
[PATCH v2] null_blk: add "no_sched" module parameter
Add an option that disables the io scheduler for the null block device.

Signed-off-by: weiping zhang 
---
 drivers/block/null_blk.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index bd92286..38f4a8c 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -154,6 +154,10 @@ enum {
 	NULL_Q_MQ	= 2,
 };
 
+static int g_no_sched;
+module_param_named(no_sched, g_no_sched, int, S_IRUGO);
+MODULE_PARM_DESC(no_sched, "No io scheduler");
+
 static int g_submit_queues = 1;
 module_param_named(submit_queues, g_submit_queues, int, S_IRUGO);
 MODULE_PARM_DESC(submit_queues, "Number of submission queues");
@@ -1754,6 +1758,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->numa_node = nullb ? nullb->dev->home_node : g_home_node;
 	set->cmd_size = sizeof(struct nullb_cmd);
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
+	if (g_no_sched)
+		set->flags |= BLK_MQ_F_NO_SCHED;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
-- 
2.9.4
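For completeness, the expected usage would be something like
"modprobe null_blk queue_mode=2 no_sched=1", after which
/sys/block/nullb0/queue/scheduler should report only "none" because
BLK_MQ_F_NO_SCHED keeps an elevator from being attached; this is an
illustrative example added here, not taken from the patch or its test
notes.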
Re: [PATCH] null_blk: add "no_sched" module parameter
On Fri, Sep 29, 2017 at 11:39:03PM +0200, Jens Axboe wrote: > On 09/29/2017 07:09 PM, weiping zhang wrote: > > add an option that disable io scheduler for null block device. > > > > Signed-off-by: weiping zhang > > --- > > drivers/block/null_blk.c | 6 +- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c > > index bd92286..3c63863 100644 > > --- a/drivers/block/null_blk.c > > +++ b/drivers/block/null_blk.c > > @@ -154,6 +154,10 @@ enum { > > NULL_Q_MQ = 2, > > }; > > > > +static int g_no_sched; > > +module_param_named(no_sched, g_no_sched, int, S_IRUGO); > > +MODULE_PARM_DESC(no_sched, "No io scheduler"); > > + > > static int g_submit_queues = 1; > > module_param_named(submit_queues, g_submit_queues, int, S_IRUGO); > > MODULE_PARM_DESC(submit_queues, "Number of submission queues"); > > @@ -1753,7 +1757,7 @@ static int null_init_tag_set(struct nullb *nullb, > > struct blk_mq_tag_set *set) > > g_hw_queue_depth; > > set->numa_node = nullb ? nullb->dev->home_node : g_home_node; > > set->cmd_size = sizeof(struct nullb_cmd); > > - set->flags = BLK_MQ_F_SHOULD_MERGE; > > + set->flags = g_no_sched ? BLK_MQ_F_NO_SCHED : BLK_MQ_F_SHOULD_MERGE; > > This should be: > > set->flags = BLK_MQ_F_SHOULD_MERGE; > if (g_no_sched) > set->flags |= BLK_MQ_F_NO_SCHED; > That's right, I go through these two flags, if no io scheduler, BLK_MQ_F_SHOULD_MERGE can make sw ctx merge happen. I will send V2. Thanks weiping
How to enable multi-path on kernel 4.8.17
Hi all,

Because of my environment's requirements, the kernel must stay at
4.8.17. I would like to ask: how can I use NVMe multi-path with kernel
4.8.17? As far as I can see, multi-path support only exists in versions
above 4.13.

I'd appreciate everyone's help, thank you very much.
Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups
On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> In the NVME subsystem, we're seeing a race condition with udev where
> device_add_disk() is called (which triggers an "add" uevent), and a
> sysfs attribute group is added to the disk device afterwards.
> If udev rules access these attributes before they are created,
> udev processing of the device is incomplete, in particular, device
> WWIDs may not be determined correctly.
>
> To fix this, this patch introduces a new function
> device_add_disk_with_groups(), which takes a list of attribute groups
> and adds them to the device before sending out uevents.
>
> Signed-off-by: Martin Wilck 

Is NVMe the only one having this problem? Was putting our attributes in
the disk's kobj a bad choice?

Anyway, looks fine to me.

Reviewed-by: Keith Busch 
Re: [PATCH 2/2] nvme: use device_add_disk_with_groups()
On Thu, Sep 28, 2017 at 09:36:37PM +0200, Martin Wilck wrote: > By using device_add_disk_with_groups(), we can avoid the race > condition with udev rule processing, because no udev event will > be triggered before all attributes are available. > > Signed-off-by: Martin Wilck Looks good. Reviewed-by: Keith Busch
Re: [PATCH] null_blk: add "no_sched" module parameter
On 09/29/2017 07:09 PM, weiping zhang wrote: > add an option that disable io scheduler for null block device. > > Signed-off-by: weiping zhang > --- > drivers/block/null_blk.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c > index bd92286..3c63863 100644 > --- a/drivers/block/null_blk.c > +++ b/drivers/block/null_blk.c > @@ -154,6 +154,10 @@ enum { > NULL_Q_MQ = 2, > }; > > +static int g_no_sched; > +module_param_named(no_sched, g_no_sched, int, S_IRUGO); > +MODULE_PARM_DESC(no_sched, "No io scheduler"); > + > static int g_submit_queues = 1; > module_param_named(submit_queues, g_submit_queues, int, S_IRUGO); > MODULE_PARM_DESC(submit_queues, "Number of submission queues"); > @@ -1753,7 +1757,7 @@ static int null_init_tag_set(struct nullb *nullb, > struct blk_mq_tag_set *set) > g_hw_queue_depth; > set->numa_node = nullb ? nullb->dev->home_node : g_home_node; > set->cmd_size = sizeof(struct nullb_cmd); > - set->flags = BLK_MQ_F_SHOULD_MERGE; > + set->flags = g_no_sched ? BLK_MQ_F_NO_SCHED : BLK_MQ_F_SHOULD_MERGE; This should be: set->flags = BLK_MQ_F_SHOULD_MERGE; if (g_no_sched) set->flags |= BLK_MQ_F_NO_SCHED; -- Jens Axboe
RE: [PATCH 2/2] nvme: use device_add_disk_with_groups()
> From: Linux-nvme [mailto:linux-nvme-boun...@lists.infradead.org] On Behalf Of > Martin Wilck > Sent: Thursday, September 28, 2017 2:37 PM > To: Jens Axboe ; Christoph Hellwig ; Johannes > Thumshirn > Cc: linux-block@vger.kernel.org; Martin Wilck ; > linux-ker...@vger.kernel.org; linux-n...@lists.infradead.org; > Hannes Reinecke > Subject: [PATCH 2/2] nvme: use device_add_disk_with_groups() > Tested-by: Steve Schremmer
RE: [PATCH 1/2] block: genhd: add device_add_disk_with_groups
> From: Linux-nvme [mailto:linux-nvme-boun...@lists.infradead.org] On Behalf Of > Martin Wilck > Sent: Thursday, September 28, 2017 2:37 PM > To: Jens Axboe ; Christoph Hellwig ; Johannes > Thumshirn > Cc: linux-block@vger.kernel.org; Martin Wilck ; > linux-ker...@vger.kernel.org; linux-n...@lists.infradead.org; > Hannes Reinecke > Subject: [PATCH 1/2] block: genhd: add device_add_disk_with_groups > Tested-by: Steve Schremmer
Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing
Ming Lei - 27.09.17, 16:27: > On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote: > > Hi Ming. > > > > Ming Lei - 27.09.17, 13:48: > > > Hi, > > > > > > The current SCSI quiesce isn't safe and easy to trigger I/O deadlock. > > > > > > Once SCSI device is put into QUIESCE, no new request except for > > > RQF_PREEMPT can be dispatched to SCSI successfully, and > > > scsi_device_quiesce() just simply waits for completion of I/Os > > > dispatched to SCSI stack. It isn't enough at all. > > > > > > Because new request still can be comming, but all the allocated > > > requests can't be dispatched successfully, so request pool can be > > > consumed up easily. > > > > > > Then request with RQF_PREEMPT can't be allocated and wait forever, > > > meantime scsi_device_resume() waits for completion of RQF_PREEMPT, > > > then system hangs forever, such as during system suspend or > > > sending SCSI domain alidation. > > > > > > Both IO hang inside system suspend[1] or SCSI domain validation > > > were reported before. > > > > > > This patch introduces preempt only mode, and solves the issue > > > by allowing RQF_PREEMP only during SCSI quiesce. > > > > > > Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes > > > them all. > > > > > > V6: > > > - borrow Bart's idea of preempt only, with clean > > > > > > implementation(patch 5/patch 6) > > > > > > - needn't any external driver's dependency, such as MD's > > > change > > > > Do you want me to test with v6 of the patch set? If so, it would be nice > > if > > you´d make a v6 branch in your git repo. > > Hi Martin, > > I appreciate much if you may run V6 and provide your test result, > follows the branch: > > https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V6 > > https://github.com/ming1/linux.git #blk_safe_scsi_quiesce_V6 > > > After an uptime of almost 6 days I am pretty confident that the V5 one > > fixes the issue for me. So > > > > Tested-by: Martin Steigerwald > > > > for V5. > > Thanks for your test! Two days and almost 6 hours, no hang yet. I bet the whole thing works. (3e45474d7df3bfdabe4801b5638d197df9810a79) Tested-By: Martin Steigerwald (It could still hang after three days, but usually I got the first hang within the first two days.) Thanks, -- Martin
[PATCH] null_blk: add "no_sched" module parameter
add an option that disable io scheduler for null block device. Signed-off-by: weiping zhang --- drivers/block/null_blk.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index bd92286..3c63863 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -154,6 +154,10 @@ enum { NULL_Q_MQ = 2, }; +static int g_no_sched; +module_param_named(no_sched, g_no_sched, int, S_IRUGO); +MODULE_PARM_DESC(no_sched, "No io scheduler"); + static int g_submit_queues = 1; module_param_named(submit_queues, g_submit_queues, int, S_IRUGO); MODULE_PARM_DESC(submit_queues, "Number of submission queues"); @@ -1753,7 +1757,7 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set) g_hw_queue_depth; set->numa_node = nullb ? nullb->dev->home_node : g_home_node; set->cmd_size = sizeof(struct nullb_cmd); - set->flags = BLK_MQ_F_SHOULD_MERGE; + set->flags = g_no_sched ? BLK_MQ_F_NO_SCHED : BLK_MQ_F_SHOULD_MERGE; set->driver_data = NULL; if ((nullb && nullb->dev->blocking) || g_blocking) -- 2.9.4
Re: [PATCH 8/9] nvme: implement multipath access to nvme subsystems
Hi all,

Because of my environment's requirements, the kernel must stay at
4.8.17. I would like to ask: how can I use NVMe multi-path with kernel
4.8.17? As far as I can see, multi-path support only exists in versions
above 4.13.

I'd appreciate everyone's help, thank you very much.

2017-09-28 23:53 GMT+08:00 Keith Busch :
> On Mon, Sep 25, 2017 at 03:40:30PM +0200, Christoph Hellwig wrote:
>> The new block devices nodes for multipath access will show up as
>>
>> /dev/nvm-subXnZ
>
> Just thinking ahead ... Once this goes in, someone will want to boot their
> OS from a multipath target. It was a pain getting installers to recognize
> /dev/nvmeXnY as an install destination. I'm not sure if installers have
> gotten any better in the last 5 years about recognizing new block names.
>
> ___
> Linux-nvme mailing list
> linux-n...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme