On Tue, May 14, 2019 at 07:14:41AM +0200, Christoph Hellwig wrote:
> On Tue, May 14, 2019 at 12:36:43PM +0800, Ming Lei wrote:
> > > > Some workloads need this optimization, please see 729204ef49ec00b
> > > > ("block: relax check on sg gap"):
> > >
> > > And we still allow to merge the segments with this patch. The only
> > > difference is that these merges do get accounted as extra segments.
> >
> > It is easy for .nr_phys_segments to reach the max segment limit by this
> > way, then no new bio can be merged any more.
>
> As said in my other mail we only decremented it for request merges
> in the non-gap case before and no one complained.
However we still may make it better, for example, the attached patch can
save 10~20% time when running 'mkfs.ntfs -s 512 /dev/vda', lots of small
request(63KB) can be merged to big IO(1MB).
>
> > We don't consider segment merge between two bios in ll_new_hw_segment(),
> > in my mkfs test over virtio-blk, request size can be increased to
> > ~1M(several
> > segments) from 63k(126 bios/segments) easily if the segment merge between
> > two bios is considered.
>
> With the gap devices we have unlimited segment size, see my next patch
> to actually enforce that. Which is much more efficient than using
But this patch does effect on non-gap device, and actually most of
device are non-gap type.
> multiple segments. Also instead of hacking up the merge path even more
> we can fix the block device buffered I/O path to submit large I/Os
> instead of relying on merging like we do in the direct I/O code and every
> major file system. I have that on my plate as a todo list item.
>
> > > We do that in a couple of places. For one the nvme single segment
> > > optimization that triggered this bug. Also for range discard support
> > > in nvme and virtio. Then we have loop that iterate the segments, but
> > > doesn't use the nr_phys_segments count, and plenty of others that
> > > iterate over pages at the moment but should be iterating bvecs,
> > > e.g. ubd or aoe.
> >
> > Seems discard segment doesn't consider bios merge for nvme and virtio,
> > so it should be fine in this way. Will take a close look at nvme/virtio
> > discard segment merge later.
>
> I found the bio case by looking at doing the proper accounting in the
> bio merge path and hitting KASAN warning due to the range kmalloc.
> So that issue is real as well.
I guess the following patch may fix it:
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f1d90cd3dc47..a913110de389 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -175,7 +175,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct
virtblk_req *vbr,
static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
{
- unsigned short segments = blk_rq_nr_discard_segments(req);
+ unsigned short segments = 0;
unsigned short n = 0;
struct virtio_blk_discard_write_zeroes *range;
struct bio *bio;
@@ -184,6 +184,9 @@ static int virtblk_setup_discard_write_zeroes(struct
request *req, bool unmap)
if (unmap)
flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
+ __rq_for_each_bio(bio, req)
+ segments++;
+
range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
if (!range)
return -ENOMEM;
Thanks,
Ming
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 21e87a714a73..af122efeb28d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -622,6 +622,8 @@ static inline int ll_new_hw_segment(struct request_queue *q,
struct bio *bio)
{
int nr_phys_segs = bio_phys_segments(q, bio);
+ unsigned int seg_size =
+ req->biotail->bi_seg_back_size + bio->bi_seg_front_size;
if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
goto no_merge;
@@ -629,11 +631,15 @@ static inline int ll_new_hw_segment(struct request_queue
*q,
if (blk_integrity_merge_bio(q, req, bio) == false)
goto no_merge;
- /*
- * This will form the start of a new hw segment. Bump both
- * counters.
- */
- req->nr_phys_segments += nr_phys_segs;
+ if (blk_phys_contig_segment(q, req->biotail, bio)) {
+ if (nr_phys_segs)
+ req->nr_phys_segments += nr_phys_segs - 1;
+ if (nr_phys_segs == 1)
+ bio->bi_seg_back_size = seg_size;
+ if (req->nr_phys_segments == 1)
+ req->bio->bi_seg_front_size = seg_size;
+ } else
+ req->nr_phys_segments += nr_phys_segs;
return 1;
no_merge:
--
2.20.1