Re: [dm-devel] [PATCH v4 0/2] block/dm: support bio polling
On Fri, Mar 04 2022 at 8:43P -0500, Ming Lei wrote: > On Fri, Mar 04, 2022 at 04:26:21PM -0500, Mike Snitzer wrote: > > Hi, > > > > I've rebased Ming's latest [1] ontop of dm-5.18 [2] (which is based on > > for-5.18/block). End result available in dm-5.18-biopoll branch [3] > > > > These changes add bio polling support to DM. Tested with linear and > > striped DM targets. > > > > IOPS improvement was ~5% on my baremetal system with a single Intel > > Optane NVMe device (555K hipri=1 vs 525K hipri=0). > > > > Ming has seen better improvement while testing within a VM: > > dm-linear: hipri=1 vs hipri=0 15~20% iops improvement > > dm-stripe: hipri=1 vs hipri=0 ~30% iops improvement > > > > I'd like to merge these changes via the DM tree when the 5.18 merge > > window opens. The first block patch that adds ->poll_bio to > > block_device_operations will need review so that I can take it > > through the DM tree. Reason for going through the DM tree is there > > have been some fairly extensive changes queued in dm-5.18 that build > > on for-5.18/block. So I think it easiest to just add the block > > depenency via DM tree since DM is first consumer of ->poll_bio > > > > FYI, Ming does have another DM patch [4] that looks to avoid using > > hlist but I only just saw it. bio_split() _is_ involved (see > > dm_split_and_process_bio) so I'm not exactly sure where he is going > > with that change. > > io_uring(polling) workloads often cares latency, so big IO request > isn't involved usually, I guess. Then bio_split() is seldom called in > dm_split_and_process_bio(), such as if 4k random IO is run on dm-linear > or dm-stripe via io_uring, bio_split() won't be run into. > > Single list is enough here, and efficient than hlist, just need > a little care to delete element from the list since linux kernel doesn't > have generic single list implementation. OK, makes sense, thanks for clarifying. But yeah its a bit fiddley for sure. > > But that is DM-implementation detail that we'll > > sort out. > > Yeah, that patch also needs more test. Yeap, sounds good. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v5 0/2] block/dm: support bio polling
Hi, I've rebased Ming's latest [1] ontop of dm-5.18 [2] (which is based on for-5.18/block). End result available in dm-5.18-biopoll branch [3] These changes add bio polling support to DM. Tested with linear and striped DM targets. IOPS improvement was ~5% on my baremetal system with a single Intel Optane NVMe device (555K hipri=1 vs 525K hipri=0). Ming has seen better improvement while testing within a VM: dm-linear: hipri=1 vs hipri=0 15~20% iops improvement dm-stripe: hipri=1 vs hipri=0 ~30% iops improvement I'd like to merge these changes via the DM tree when the 5.18 merge window opens. The first block patch that adds ->poll_bio to block_device_operations will need review so that I can take it through the DM tree. Reason for going through the DM tree is there have been some fairly extensive changes queued in dm-5.18 that build on for-5.18/block. So I think it easiest to just add the block depenency via DM tree since DM is first consumer of ->poll_bio FYI, Ming does have another DM patch [4] that looks to avoid using hlist but I only just saw it. bio_split() _is_ involved (see dm_split_and_process_bio) so I'm not exactly sure where he is going with that change. But that is DM-implementation detail that we'll sort out. Big thing is we need approval for the first block patch to go to Linus via the DM tree ;) Thanks, Mike [1] https://github.com/ming1/linux/commits/my_v5.18-dm-bio-poll [2] https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.18 [3] https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.18-biopoll [4] https://github.com/ming1/linux/commit/c107c30e15041ac1ce672f56809961406e2a3e52 v5: remove WARN_ONs in ->poll_bio interface patch. Fixed comment typo along the way (found while seeing how other block_device_operations are referenced in block's code comments). Ming Lei (2): block: add ->poll_bio to block_device_operations dm: support bio polling block/blk-core.c | 14 +++-- block/genhd.c | 4 ++ drivers/md/dm-core.h | 2 + drivers/md/dm-table.c | 27 + drivers/md/dm.c| 150 - include/linux/blkdev.h | 2 + 6 files changed, 191 insertions(+), 8 deletions(-) -- 2.15.0 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v5 2/2] dm: support bio polling
From: Ming Lei Support bio(REQ_POLLED) polling in the following approach: 1) only support io polling on normal READ/WRITE, and other abnormal IOs still fallback to IRQ mode, so the target io is exactly inside the dm io. 2) hold one refcnt on io->io_count after submitting this dm bio with REQ_POLLED 3) support dm native bio splitting, any dm io instance associated with current bio will be added into one list which head is bio->bi_end_io which will be recovered before ending this bio 4) implement .poll_bio() callback, call bio_poll() on the single target bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call dm_io_dec_pending() after the target io is done in .poll_bio() 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL, which is based on Jeffle's previous patch. Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- drivers/md/dm-core.h | 2 + drivers/md/dm-table.c | 27 + drivers/md/dm.c | 150 +- 3 files changed, 176 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 8078b6c155ef..b3d1429fba83 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -235,6 +235,8 @@ struct dm_io { bool start_io_acct:1; int was_accounted; unsigned long start_time; + void *saved_bio_end_io; + struct hlist_node node; spinlock_t endio_lock; struct dm_stats_aux stats_aux; /* last member of dm_target_io is 'struct bio' */ diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index f4ed756ab391..c0be4f60b427 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1481,6 +1481,14 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector) return >targets[(KEYS_PER_NODE * n) + k]; } +static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return !test_bit(QUEUE_FLAG_POLL, >queue_flags); +} + /* * type->iterate_devices() should be called when the sanity check needs to * iterate and check all underlying data devices. iterate_devices() will @@ -1531,6 +1539,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev, return 0; } +static int dm_table_supports_poll(struct dm_table *t) +{ + return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL); +} + /* * Check whether a table has no data devices attached using each * target's iterate_devices method. @@ -2067,6 +2080,20 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, dm_update_crypto_profile(q, t); disk_update_readahead(t->md->disk); + /* +* Check for request-based device is left to +* dm_mq_init_request_queue()->blk_mq_init_allocated_queue(). +* +* For bio-based device, only set QUEUE_FLAG_POLL when all +* underlying devices supporting polling. +*/ + if (__table_type_bio_based(t->type)) { + if (dm_table_supports_poll(t)) + blk_queue_flag_set(QUEUE_FLAG_POLL, q); + else + blk_queue_flag_clear(QUEUE_FLAG_POLL, q); + } + return 0; } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 454d39bc7745..c28d453e9930 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -40,6 +40,13 @@ #define DM_COOKIE_ENV_VAR_NAME "DM_COOKIE" #define DM_COOKIE_LENGTH 24 +/* + * For REQ_POLLED fs bio, this flag is set if we link mapped underlying + * dm_io into one list, and reuse bio->bi_end_io as the list head. Before + * ending this fs bio, we will recover its ->bi_end_io callback. + */ +#define REQ_DM_POLL_LIST REQ_DRV + static const char *_name = DM_NAME; static unsigned int major = 0; @@ -73,6 +80,7 @@ struct clone_info { struct dm_io *io; sector_t sector; unsigned sector_count; + bool submit_as_polled; }; #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone)) @@ -599,6 +607,9 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, if (!clone) return NULL; + /* REQ_DM_POLL_LIST shouldn't be inherited */ + clone->bi_opf &= ~REQ_DM_POLL_LIST; + tio = clone_to_tio(clone); tio->inside_dm_io = false; } @@ -888,8 +899,15 @@ void dm_io_dec_pending(struct dm_io *io, blk_status_t error) if (unlikely(wq_has_sleeper(>wait))) wake_up(>wait); - if (io_error == BLK_STS_DM_REQUEUE) + if (io_error == BLK_STS_DM_REQUEUE) { + /* +* Upper layer won't help us poll split bio, io->orig_bio +* may only reflect a
[dm-devel] [PATCH v5 1/2] block: add ->poll_bio to block_device_operations
From: Ming Lei Prepare for supporting IO polling for bio-based driver. Add ->poll_bio callback so that bio-based driver can provide their own logic for polling bio. Also fix ->submit_bio_bio typo in comment block above __submit_bio_noacct. Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- block/blk-core.c | 14 +- block/genhd.c | 4 include/linux/blkdev.h | 2 ++ 3 files changed, 15 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 94bf37f8e61d..ce08f0aa9dfc 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -708,7 +708,7 @@ static void __submit_bio(struct bio *bio) * * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio. * bio_list_on_stack[1] contains bios that were submitted before the current - * ->submit_bio_bio, but that haven't been processed yet. + * ->submit_bio, but that haven't been processed yet. */ static void __submit_bio_noacct(struct bio *bio) { @@ -975,7 +975,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) { struct request_queue *q = bdev_get_queue(bio->bi_bdev); blk_qc_t cookie = READ_ONCE(bio->bi_cookie); - int ret; + int ret = 0; if (cookie == BLK_QC_T_NONE || !test_bit(QUEUE_FLAG_POLL, >queue_flags)) @@ -985,10 +985,14 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT)) return 0; - if (WARN_ON_ONCE(!queue_is_mq(q))) - ret = 0;/* not yet implemented, should not happen */ - else + if (queue_is_mq(q)) { ret = blk_mq_poll(q, cookie, iob, flags); + } else { + struct gendisk *disk = q->disk; + + if (disk && disk->fops->poll_bio) + ret = disk->fops->poll_bio(bio, iob, flags); + } blk_queue_exit(q); return ret; } diff --git a/block/genhd.c b/block/genhd.c index e351fac41bf2..1ed46a6f94f5 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -410,6 +410,10 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, struct device *ddev = disk_to_dev(disk); int ret; + /* Only makes sense for bio-based to set ->poll_bio */ + if (queue_is_mq(disk->queue) && disk->fops->poll_bio) + return -EINVAL; + /* * The disk queue should now be all set with enough information about * the device for the elevator code to pick an adequate default diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f757f9c2871f..51f1b1ddbed2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1455,6 +1455,8 @@ enum blk_unique_id { struct block_device_operations { void (*submit_bio)(struct bio *bio); + int (*poll_bio)(struct bio *bio, struct io_comp_batch *iob, + unsigned int flags); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); -- 2.15.0 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 0/2] block/dm: support bio polling
On Fri, Mar 04, 2022 at 04:26:21PM -0500, Mike Snitzer wrote: > Hi, > > I've rebased Ming's latest [1] ontop of dm-5.18 [2] (which is based on > for-5.18/block). End result available in dm-5.18-biopoll branch [3] > > These changes add bio polling support to DM. Tested with linear and > striped DM targets. > > IOPS improvement was ~5% on my baremetal system with a single Intel > Optane NVMe device (555K hipri=1 vs 525K hipri=0). > > Ming has seen better improvement while testing within a VM: > dm-linear: hipri=1 vs hipri=0 15~20% iops improvement > dm-stripe: hipri=1 vs hipri=0 ~30% iops improvement > > I'd like to merge these changes via the DM tree when the 5.18 merge > window opens. The first block patch that adds ->poll_bio to > block_device_operations will need review so that I can take it > through the DM tree. Reason for going through the DM tree is there > have been some fairly extensive changes queued in dm-5.18 that build > on for-5.18/block. So I think it easiest to just add the block > depenency via DM tree since DM is first consumer of ->poll_bio > > FYI, Ming does have another DM patch [4] that looks to avoid using > hlist but I only just saw it. bio_split() _is_ involved (see > dm_split_and_process_bio) so I'm not exactly sure where he is going > with that change. io_uring(polling) workloads often cares latency, so big IO request isn't involved usually, I guess. Then bio_split() is seldom called in dm_split_and_process_bio(), such as if 4k random IO is run on dm-linear or dm-stripe via io_uring, bio_split() won't be run into. Single list is enough here, and efficient than hlist, just need a little care to delete element from the list since linux kernel doesn't have generic single list implementation. > But that is DM-implementation detail that we'll > sort out. Yeah, that patch also needs more test. Thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 1/2] block: add ->poll_bio to block_device_operations
On Fri, Mar 04 2022 at 4:39P -0500, Jens Axboe wrote: > On 3/4/22 2:26 PM, Mike Snitzer wrote: > > diff --git a/block/blk-core.c b/block/blk-core.c > > index 94bf37f8e61d..e739c6264331 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -985,10 +985,16 @@ int bio_poll(struct bio *bio, struct io_comp_batch > > *iob, unsigned int flags) > > > > if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT)) > > return 0; > > - if (WARN_ON_ONCE(!queue_is_mq(q))) > > - ret = 0;/* not yet implemented, should not happen */ > > - else > > + if (queue_is_mq(q)) { > > ret = blk_mq_poll(q, cookie, iob, flags); > > + } else { > > + struct gendisk *disk = q->disk; > > + > > + if (disk && disk->fops->poll_bio) > > + ret = disk->fops->poll_bio(bio, iob, flags); > > + else > > + ret = !WARN_ON_ONCE(1); > > This is an odd way to do it, would be a lot more readable as > > ret = 0; > WARN_ON_ONCE(1); > > if we even need that WARN_ON? Would be a pretty glaring oversight for a bio-based driver developer to forget to define ->poll_bio but remember to clear BLK_QC_T_NONE in bio->bi_cookie and set QUEUE_FLAG_POLL in queue flags. Silent failure it is! ;) > > diff --git a/block/genhd.c b/block/genhd.c > > index e351fac41bf2..eb43fa63ba47 100644 > > --- a/block/genhd.c > > +++ b/block/genhd.c > > @@ -410,6 +410,8 @@ int __must_check device_add_disk(struct device *parent, > > struct gendisk *disk, > > struct device *ddev = disk_to_dev(disk); > > int ret; > > > > + WARN_ON_ONCE(queue_is_mq(disk->queue) && disk->fops->poll_bio); > > Also seems kind of useless, maybe at least combine it with failing to > add the disk. This is a "I'm developing some new driver or feature" > failure, and would be more visible that way. And if you do that, then > the WARN_ON_ONCE() seems pointless anyway, and I'd just do: > > if (queue_is_mq(disk->queue) && disk->fops->poll_bio) > return -EINVAL; > > or something like that, with a comment saying why that doesn't make any > sense. Absolutely. The thought did cross my mind that it seemed WARN_ON heavy. Will fix it up and send v5. -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v4 1/2] block: add ->poll_bio to block_device_operations
On 3/4/22 2:26 PM, Mike Snitzer wrote: > diff --git a/block/blk-core.c b/block/blk-core.c > index 94bf37f8e61d..e739c6264331 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -985,10 +985,16 @@ int bio_poll(struct bio *bio, struct io_comp_batch > *iob, unsigned int flags) > > if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT)) > return 0; > - if (WARN_ON_ONCE(!queue_is_mq(q))) > - ret = 0;/* not yet implemented, should not happen */ > - else > + if (queue_is_mq(q)) { > ret = blk_mq_poll(q, cookie, iob, flags); > + } else { > + struct gendisk *disk = q->disk; > + > + if (disk && disk->fops->poll_bio) > + ret = disk->fops->poll_bio(bio, iob, flags); > + else > + ret = !WARN_ON_ONCE(1); This is an odd way to do it, would be a lot more readable as ret = 0; WARN_ON_ONCE(1); if we even need that WARN_ON? > diff --git a/block/genhd.c b/block/genhd.c > index e351fac41bf2..eb43fa63ba47 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -410,6 +410,8 @@ int __must_check device_add_disk(struct device *parent, > struct gendisk *disk, > struct device *ddev = disk_to_dev(disk); > int ret; > > + WARN_ON_ONCE(queue_is_mq(disk->queue) && disk->fops->poll_bio); Also seems kind of useless, maybe at least combine it with failing to add the disk. This is a "I'm developing some new driver or feature" failure, and would be more visible that way. And if you do that, then the WARN_ON_ONCE() seems pointless anyway, and I'd just do: if (queue_is_mq(disk->queue) && disk->fops->poll_bio) return -EINVAL; or something like that, with a comment saying why that doesn't make any sense. -- Jens Axboe -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v4 1/2] block: add ->poll_bio to block_device_operations
From: Ming Lei Prepare for supporting IO polling for bio based driver. Add ->poll_bio callback so that bio driver can provide their own logic for polling bio. Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- block/blk-core.c | 12 +--- block/genhd.c | 2 ++ include/linux/blkdev.h | 2 ++ 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 94bf37f8e61d..e739c6264331 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -985,10 +985,16 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT)) return 0; - if (WARN_ON_ONCE(!queue_is_mq(q))) - ret = 0;/* not yet implemented, should not happen */ - else + if (queue_is_mq(q)) { ret = blk_mq_poll(q, cookie, iob, flags); + } else { + struct gendisk *disk = q->disk; + + if (disk && disk->fops->poll_bio) + ret = disk->fops->poll_bio(bio, iob, flags); + else + ret = !WARN_ON_ONCE(1); + } blk_queue_exit(q); return ret; } diff --git a/block/genhd.c b/block/genhd.c index e351fac41bf2..eb43fa63ba47 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -410,6 +410,8 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, struct device *ddev = disk_to_dev(disk); int ret; + WARN_ON_ONCE(queue_is_mq(disk->queue) && disk->fops->poll_bio); + /* * The disk queue should now be all set with enough information about * the device for the elevator code to pick an adequate default diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f757f9c2871f..51f1b1ddbed2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1455,6 +1455,8 @@ enum blk_unique_id { struct block_device_operations { void (*submit_bio)(struct bio *bio); + int (*poll_bio)(struct bio *bio, struct io_comp_batch *iob, + unsigned int flags); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); -- 2.15.0 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v4 0/2] block/dm: support bio polling
Hi, I've rebased Ming's latest [1] ontop of dm-5.18 [2] (which is based on for-5.18/block). End result available in dm-5.18-biopoll branch [3] These changes add bio polling support to DM. Tested with linear and striped DM targets. IOPS improvement was ~5% on my baremetal system with a single Intel Optane NVMe device (555K hipri=1 vs 525K hipri=0). Ming has seen better improvement while testing within a VM: dm-linear: hipri=1 vs hipri=0 15~20% iops improvement dm-stripe: hipri=1 vs hipri=0 ~30% iops improvement I'd like to merge these changes via the DM tree when the 5.18 merge window opens. The first block patch that adds ->poll_bio to block_device_operations will need review so that I can take it through the DM tree. Reason for going through the DM tree is there have been some fairly extensive changes queued in dm-5.18 that build on for-5.18/block. So I think it easiest to just add the block depenency via DM tree since DM is first consumer of ->poll_bio FYI, Ming does have another DM patch [4] that looks to avoid using hlist but I only just saw it. bio_split() _is_ involved (see dm_split_and_process_bio) so I'm not exactly sure where he is going with that change. But that is DM-implementation detail that we'll sort out. Big thing is we need approval for the first block patch to go to Linus with the DM tree ;) Thanks, Mike [1] https://github.com/ming1/linux/commits/my_v5.18-dm-bio-poll [2] https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.18 [3] https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.18-biopoll [4] https://github.com/ming1/linux/commit/c107c30e15041ac1ce672f56809961406e2a3e52 Ming Lei (2): block: add ->poll_bio to block_device_operations dm: support bio polling block/blk-core.c | 12 +++- block/genhd.c | 2 + drivers/md/dm-core.h | 2 + drivers/md/dm-table.c | 27 + drivers/md/dm.c| 150 - include/linux/blkdev.h | 2 + 6 files changed, 189 insertions(+), 6 deletions(-) -- 2.15.0 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v4 2/2] dm: support bio polling
From: Ming Lei Support bio(REQ_POLLED) polling in the following approach: 1) only support io polling on normal READ/WRITE, and other abnormal IOs still fallback to IRQ mode, so the target io is exactly inside the dm io. 2) hold one refcnt on io->io_count after submitting this dm bio with REQ_POLLED 3) support dm native bio splitting, any dm io instance associated with current bio will be added into one list which head is bio->bi_end_io which will be recovered before ending this bio 4) implement .poll_bio() callback, call bio_poll() on the single target bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call dm_io_dec_pending() after the target io is done in .poll_bio() 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL, which is based on Jeffle's previous patch. Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- drivers/md/dm-core.h | 2 + drivers/md/dm-table.c | 27 + drivers/md/dm.c | 150 +- 3 files changed, 176 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 8078b6c155ef..b3d1429fba83 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -235,6 +235,8 @@ struct dm_io { bool start_io_acct:1; int was_accounted; unsigned long start_time; + void *saved_bio_end_io; + struct hlist_node node; spinlock_t endio_lock; struct dm_stats_aux stats_aux; /* last member of dm_target_io is 'struct bio' */ diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index f4ed756ab391..c0be4f60b427 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1481,6 +1481,14 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector) return >targets[(KEYS_PER_NODE * n) + k]; } +static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev, + sector_t start, sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return !test_bit(QUEUE_FLAG_POLL, >queue_flags); +} + /* * type->iterate_devices() should be called when the sanity check needs to * iterate and check all underlying data devices. iterate_devices() will @@ -1531,6 +1539,11 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev, return 0; } +static int dm_table_supports_poll(struct dm_table *t) +{ + return !dm_table_any_dev_attr(t, device_not_poll_capable, NULL); +} + /* * Check whether a table has no data devices attached using each * target's iterate_devices method. @@ -2067,6 +2080,20 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, dm_update_crypto_profile(q, t); disk_update_readahead(t->md->disk); + /* +* Check for request-based device is left to +* dm_mq_init_request_queue()->blk_mq_init_allocated_queue(). +* +* For bio-based device, only set QUEUE_FLAG_POLL when all +* underlying devices supporting polling. +*/ + if (__table_type_bio_based(t->type)) { + if (dm_table_supports_poll(t)) + blk_queue_flag_set(QUEUE_FLAG_POLL, q); + else + blk_queue_flag_clear(QUEUE_FLAG_POLL, q); + } + return 0; } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 454d39bc7745..c28d453e9930 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -40,6 +40,13 @@ #define DM_COOKIE_ENV_VAR_NAME "DM_COOKIE" #define DM_COOKIE_LENGTH 24 +/* + * For REQ_POLLED fs bio, this flag is set if we link mapped underlying + * dm_io into one list, and reuse bio->bi_end_io as the list head. Before + * ending this fs bio, we will recover its ->bi_end_io callback. + */ +#define REQ_DM_POLL_LIST REQ_DRV + static const char *_name = DM_NAME; static unsigned int major = 0; @@ -73,6 +80,7 @@ struct clone_info { struct dm_io *io; sector_t sector; unsigned sector_count; + bool submit_as_polled; }; #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone)) @@ -599,6 +607,9 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, if (!clone) return NULL; + /* REQ_DM_POLL_LIST shouldn't be inherited */ + clone->bi_opf &= ~REQ_DM_POLL_LIST; + tio = clone_to_tio(clone); tio->inside_dm_io = false; } @@ -888,8 +899,15 @@ void dm_io_dec_pending(struct dm_io *io, blk_status_t error) if (unlikely(wq_has_sleeper(>wait))) wake_up(>wait); - if (io_error == BLK_STS_DM_REQUEUE) + if (io_error == BLK_STS_DM_REQUEUE) { + /* +* Upper layer won't help us poll split bio, io->orig_bio +* may only reflect a
[dm-devel] [PATCH 05/10] dm-integrity: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- drivers/md/dm-integrity.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index eb4b5e52bd6ff..c58a5111cb575 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -1788,12 +1788,11 @@ static void integrity_metadata(struct work_struct *w) checksums_ptr - checksums, dio->op == REQ_OP_READ ? TAG_CMP : TAG_WRITE); if (unlikely(r)) { if (r > 0) { - char b[BDEVNAME_SIZE]; sector_t s; s = sector - ((r + ic->tag_size - 1) / ic->tag_size); - DMERR_LIMIT("%s: Checksum failed at sector 0x%llx", - bio_devname(bio, b), s); + DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx", + bio->bi_bdev, s); r = -EILSEQ; atomic64_inc(>number_of_mismatches); dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum", -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 07/10] raid1: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- drivers/md/raid1.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index c180c188da574..97574575ad0b4 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2069,15 +2069,14 @@ static int fix_sync_read_error(struct r1bio *r1_bio) } while (!success && d != r1_bio->read_disk); if (!success) { - char b[BDEVNAME_SIZE]; int abort = 0; /* Cannot read from anywhere, this block is lost. * Record a bad block on each device. If that doesn't * work just disable and interrupt the recovery. * Don't fail devices as that won't really help. */ - pr_crit_ratelimited("md/raid1:%s: %s: unrecoverable I/O read error for block %llu\n", - mdname(mddev), bio_devname(bio, b), + pr_crit_ratelimited("md/raid1:%s: %pg: unrecoverable I/O read error for block %llu\n", + mdname(mddev), bio->bi_bdev, (unsigned long long)r1_bio->sector); for (d = 0; d < conf->raid_disks * 2; d++) { rdev = conf->mirrors[d].rdev; -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 08/10] raid5-ppl: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- drivers/md/raid5-ppl.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c index 93d9364a930e3..845db0ba7c17f 100644 --- a/drivers/md/raid5-ppl.c +++ b/drivers/md/raid5-ppl.c @@ -416,12 +416,10 @@ static void ppl_log_endio(struct bio *bio) static void ppl_submit_iounit_bio(struct ppl_io_unit *io, struct bio *bio) { - char b[BDEVNAME_SIZE]; - - pr_debug("%s: seq: %llu size: %u sector: %llu dev: %s\n", + pr_debug("%s: seq: %llu size: %u sector: %llu dev: %pg\n", __func__, io->seq, bio->bi_iter.bi_size, (unsigned long long)bio->bi_iter.bi_sector, -bio_devname(bio, b)); +bio->bi_bdev); submit_bio(bio); } @@ -589,9 +587,8 @@ static void ppl_flush_endio(struct bio *bio) struct ppl_log *log = io->log; struct ppl_conf *ppl_conf = log->ppl_conf; struct r5conf *conf = ppl_conf->mddev->private; - char b[BDEVNAME_SIZE]; - pr_debug("%s: dev: %s\n", __func__, bio_devname(bio, b)); + pr_debug("%s: dev: %pg\n", __func__, bio->bi_bdev); if (bio->bi_status) { struct md_rdev *rdev; @@ -634,7 +631,6 @@ static void ppl_do_flush(struct ppl_io_unit *io) if (bdev) { struct bio *bio; - char b[BDEVNAME_SIZE]; bio = bio_alloc_bioset(bdev, 0, GFP_NOIO, REQ_OP_WRITE | REQ_PREFLUSH, @@ -642,8 +638,7 @@ static void ppl_do_flush(struct ppl_io_unit *io) bio->bi_private = io; bio->bi_end_io = ppl_flush_endio; - pr_debug("%s: dev: %s\n", __func__, -bio_devname(bio, b)); + pr_debug("%s: dev: %ps\n", __func__, bio->bi_bdev); submit_bio(bio); flushed_disks++; -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 06/10] md-multipath: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- drivers/md/md-multipath.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c index c056a7d707b09..bc38a6133cda3 100644 --- a/drivers/md/md-multipath.c +++ b/drivers/md/md-multipath.c @@ -294,7 +294,6 @@ static void multipathd(struct md_thread *thread) md_check_recovery(mddev); for (;;) { - char b[BDEVNAME_SIZE]; spin_lock_irqsave(>device_lock, flags); if (list_empty(head)) break; @@ -306,13 +305,13 @@ static void multipathd(struct md_thread *thread) bio->bi_iter.bi_sector = mp_bh->master_bio->bi_iter.bi_sector; if ((mp_bh->path = multipath_map (conf))<0) { - pr_err("multipath: %s: unrecoverable IO read error for block %llu\n", - bio_devname(bio, b), + pr_err("multipath: %pg: unrecoverable IO read error for block %llu\n", + bio->bi_bdev, (unsigned long long)bio->bi_iter.bi_sector); multipath_end_bh_io(mp_bh, BLK_STS_IOERR); } else { - pr_err("multipath: %s: redirecting sector %llu to another IO path\n", - bio_devname(bio, b), + pr_err("multipath: %pg: redirecting sector %llu to another IO path\n", + bio->bi_bdev, (unsigned long long)bio->bi_iter.bi_sector); *bio = *(mp_bh->master_bio); bio->bi_iter.bi_sector += -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 02/10] block: remove handle_bad_sector
Use the %pg format specifier instead of the stack hungry bdevname function, and remove handle_bad_sector given that it is not pointless. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 17 + 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 34e1b7fdb7c89..4d858fc08f8ba 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -540,17 +540,6 @@ bool blk_get_queue(struct request_queue *q) } EXPORT_SYMBOL(blk_get_queue); -static void handle_bad_sector(struct bio *bio, sector_t maxsector) -{ - char b[BDEVNAME_SIZE]; - - pr_info_ratelimited("%s: attempt to access beyond end of device\n" - "%s: rw=%d, want=%llu, limit=%llu\n", - current->comm, - bio_devname(bio, b), bio->bi_opf, - bio_end_sector(bio), maxsector); -} - #ifdef CONFIG_FAIL_MAKE_REQUEST static DECLARE_FAULT_ATTR(fail_make_request); @@ -612,7 +601,11 @@ static inline int bio_check_eod(struct bio *bio) if (nr_sectors && maxsector && (nr_sectors > maxsector || bio->bi_iter.bi_sector > maxsector - nr_sectors)) { - handle_bad_sector(bio, maxsector); + pr_info_ratelimited("%s: attempt to access beyond end of device\n" + "%pg: rw=%d, want=%llu, limit=%llu\n", + current->comm, + bio->bi_bdev, bio->bi_opf, + bio_end_sector(bio), maxsector); return -EIO; } return 0; -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] remove bio_devname
Hi Jens, this series removes the bio_devname helper and just switches all users to use the %pg format string directly. Diffstat block/bio.c |6 -- block/blk-core.c | 25 +++-- drivers/block/pktcdvd.c |9 + drivers/md/dm-crypt.c | 10 -- drivers/md/dm-integrity.c |5 ++--- drivers/md/md-multipath.c |9 - drivers/md/raid1.c|5 ++--- drivers/md/raid5-ppl.c| 13 - fs/ext4/page-io.c |5 ++--- include/linux/bio.h |2 -- 10 files changed, 26 insertions(+), 63 deletions(-) -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 10/10] block: remove bio_devname
All callers are gone, so remove this wrapper. Signed-off-by: Christoph Hellwig --- block/bio.c | 6 -- include/linux/bio.h | 2 -- 2 files changed, 8 deletions(-) diff --git a/block/bio.c b/block/bio.c index b15f5466ce084..151cace2dbe16 100644 --- a/block/bio.c +++ b/block/bio.c @@ -807,12 +807,6 @@ int bio_init_clone(struct block_device *bdev, struct bio *bio, } EXPORT_SYMBOL(bio_init_clone); -const char *bio_devname(struct bio *bio, char *buf) -{ - return bdevname(bio->bi_bdev, buf); -} -EXPORT_SYMBOL(bio_devname); - /** * bio_full - check if the bio is full * @bio: bio to check diff --git a/include/linux/bio.h b/include/linux/bio.h index 7523aba4ddf7c..4c21f6e69e182 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -491,8 +491,6 @@ static inline void bio_release_pages(struct bio *bio, bool mark_dirty) __bio_release_pages(bio, mark_dirty); } -extern const char *bio_devname(struct bio *bio, char *buffer); - #define bio_dev(bio) \ disk_devt((bio)->bi_bdev->bd_disk) -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 01/10] block: fix and cleanup bio_check_ro
Don't use a WARN_ON when printing a potentially user triggered condition. Also don't print the partno when the block device name already includes it, and use the %pg specifier to simplify printing the block device name. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 94bf37f8e61d2..34e1b7fdb7c89 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -580,14 +580,10 @@ late_initcall(fail_make_request_debugfs); static inline bool bio_check_ro(struct bio *bio) { if (op_is_write(bio_op(bio)) && bdev_read_only(bio->bi_bdev)) { - char b[BDEVNAME_SIZE]; - if (op_is_flush(bio->bi_opf) && !bio_sectors(bio)) return false; - - WARN_ONCE(1, - "Trying to write to read-only block-device %s (partno %d)\n", - bio_devname(bio, b), bio->bi_bdev->bd_partno); + pr_warn("Trying to write to read-only block-device %pg\n", + bio->bi_bdev); /* Older lvm-tools actually trigger this */ return false; } -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 03/10] pktcdvd: remove a pointless debug check in pkt_submit_bio
->queuedata is set up in pkt_init_queue, so it can't be NULL here. Signed-off-by: Christoph Hellwig --- drivers/block/pktcdvd.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 623df4141ff3f..a52bbedd2f33d 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2400,18 +2400,11 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio) static void pkt_submit_bio(struct bio *bio) { - struct pktcdvd_device *pd; - char b[BDEVNAME_SIZE]; + struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->queue->queuedata; struct bio *split; blk_queue_split(); - pd = bio->bi_bdev->bd_disk->queue->queuedata; - if (!pd) { - pr_err("%s incorrect request queue\n", bio_devname(bio, b)); - goto end_io; - } - pkt_dbg(2, pd, "start = %6llx stop = %6llx\n", (unsigned long long)bio->bi_iter.bi_sector, (unsigned long long)bio_end_sector(bio)); -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 09/10] ext4: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- fs/ext4/page-io.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 1253982268730..18373fa529225 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -323,10 +323,9 @@ static void ext4_end_bio(struct bio *bio) { ext4_io_end_t *io_end = bio->bi_private; sector_t bi_sector = bio->bi_iter.bi_sector; - char b[BDEVNAME_SIZE]; - if (WARN_ONCE(!io_end, "io_end is NULL: %s: sector %Lu len %u err %d\n", - bio_devname(bio, b), + if (WARN_ONCE(!io_end, "io_end is NULL: %pg: sector %Lu len %u err %d\n", + bio->bi_bdev, (long long) bio->bi_iter.bi_sector, (unsigned) bio_sectors(bio), bio->bi_status)) { -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 04/10] dm-crypt: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig --- drivers/md/dm-crypt.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index a5006cb6ee8ad..e2b0af4a2ee84 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1364,11 +1364,10 @@ static int crypt_convert_block_aead(struct crypt_config *cc, } if (r == -EBADMSG) { - char b[BDEVNAME_SIZE]; sector_t s = le64_to_cpu(*sector); - DMERR_LIMIT("%s: INTEGRITY AEAD ERROR, sector %llu", - bio_devname(ctx->bio_in, b), s); + DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu", + ctx->bio_in->bi_bdev, s); dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead", ctx->bio_in, s, 0); } @@ -2169,11 +2168,10 @@ static void kcryptd_async_done(struct crypto_async_request *async_req, error = cc->iv_gen_ops->post(cc, org_iv_of_dmreq(cc, dmreq), dmreq); if (error == -EBADMSG) { - char b[BDEVNAME_SIZE]; sector_t s = le64_to_cpu(*org_sector_of_dmreq(cc, dmreq)); - DMERR_LIMIT("%s: INTEGRITY AEAD ERROR, sector %llu", - bio_devname(ctx->bio_in, b), s); + DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu", + ctx->bio_in->bi_bdev, s); dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead", ctx->bio_in, s, 0); io->error = BLK_STS_PROTECTION; -- 2.30.2 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel