RE: [PATCH v9] virtio_blk: add discard and write zeroes support
What's the status of this patch ? anybody pulled it for the branch ? > -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Friday, November 2, 2018 12:18 PM > To: Daniel Verkamp > Cc: virtualization@lists.linux-foundation.org; linux-bl...@vger.kernel.org; > Michael S. Tsirkin ; Jason Wang ; > Jens Axboe ; Paolo Bonzini ; > Christoph Hellwig ; Liu, Changpeng > > Subject: Re: [PATCH v9] virtio_blk: add discard and write zeroes support > > On Thu, Nov 01, 2018 at 03:40:35PM -0700, Daniel Verkamp wrote: > > From: Changpeng Liu > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > block specification has been extended to add VIRTIO_BLK_T_DISCARD and > > VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for > > discard and write zeroes in the virtio-blk driver when the device > > advertises the corresponding features, VIRTIO_BLK_F_DISCARD and > > VIRTIO_BLK_F_WRITE_ZEROES. > > > > Signed-off-by: Changpeng Liu > > Signed-off-by: Daniel Verkamp > > --- > > dverkamp: I've picked up this patch and made a few minor changes (as > > listed below); most notably, I changed the kmalloc back to GFP_ATOMIC, > > since it can be called from a context where sleeping is not allowed. > > To prevent large allocations, I've also clamped the maximum number of > > discard segments to 256; this results in a 4K allocation and should be > > plenty of descriptors for most use cases. > > > > I also removed most of the description from the commit message, since it > > was duplicating the comments from virtio_blk.h and quoting parts of the > > spec without adding any extra information. I have tested this iteration > > of the patch using crosvm with modifications to enable the new features: > > https://chromium.googlesource.com/chromiumos/platform/crosvm/ > > > > v9 fixes a number of review issues; I didn't attempt to optimize the > > single-element write zeroes case, so it still does an allocation per > > request (I did not see any easy place to put the payload that would > > avoid the allocation). > > > > CHANGELOG: > > v9: [dverkamp] fix LE types in discard struct; cleanups from Ming Lei > > v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant > > v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify > > descriptor flags field; comment wording cleanups. > > v6: don't set T_OUT bit to discard and write zeroes commands. > > v5: use new block layer API: blk_queue_flag_set. > > v4: several optimizations based on MST's comments, remove bit field > > usage for command descriptor. > > v3: define the virtio-blk protocol to add discard and write zeroes > > support, first version implementation based on proposed specification. > > v2: add write zeroes command support. > > v1: initial proposal implementation for discard command. > > --- > > drivers/block/virtio_blk.c | 83 - > > include/uapi/linux/virtio_blk.h | 54 + > > 2 files changed, 135 insertions(+), 2 deletions(-) > > Reviewed-by: Stefan Hajnoczi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v8] virtio_blk: add discard and write zeroes support
> -Original Message- > From: Christoph Hellwig [mailto:h...@infradead.org] > Sent: Friday, October 26, 2018 4:27 PM > To: Paolo Bonzini > Cc: Christoph Hellwig ; Daniel Verkamp > ; Jens Axboe ; Michael S. Tsirkin > ; virtualization@lists.linux-foundation.org; linux- > bl...@vger.kernel.org; Stefan Hajnoczi ; Liu, Changpeng > > Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support > > On Fri, Oct 26, 2018 at 01:28:54AM +0200, Paolo Bonzini wrote: > > On 15/10/2018 11:27, Christoph Hellwig wrote: > > > There is some issues in this spec. For one using the multiple ranges > > > also for write zeroes is rather inefficient. Write zeroes really should > > > use the same format as read and write. > > > > What makes it inefficient? > > We require a memory allocation for each write zeroes instead of encoding > the lba/len in the command. Make sense to me, but need to change the spec first. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v8] virtio_blk: add discard and write zeroes support
> -Original Message- > From: Daniel Verkamp [mailto:dverk...@chromium.org] > Sent: Tuesday, October 16, 2018 7:16 AM > To: h...@infradead.org > Cc: virtualization@lists.linux-foundation.org; linux-bl...@vger.kernel.org; > m...@redhat.com; jasow...@redhat.com; ax...@kernel.dk; > stefa...@redhat.com; Liu, Changpeng > Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support > > On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig wrote: > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > > From: Changpeng Liu > > > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > > > There is some issues in this spec. For one using the multiple ranges > > also for write zeroes is rather inefficient. Write zeroes really should > > use the same format as read and write. > > I wasn't involved in the writing of the spec, so I'll defer to Michael > and Changpeng here, but I'm not sure how "set in stone" the virtio > specification is, or if it can be updated somehow without breaking > compatibility. > > I agree that Write Zeroes would be simpler to implement as a single > LBA + length rather than a list. However, it's not really possible to > use the same format as the regular virtio block read/write commands > (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do > not specify a length explicitly; length is implied by the length of > the data buffer as defined by the virtio descriptor, but a Write > Zeroes command does not require a data buffer. At best, this could be > a separate command mirroring the layout of struct virtio_blk_req but > with data replaced with a length field; I'm not sure that buys much in > the way of consistency. Yeah, that's the consideration here. > > > Second the unmap flag isn't properly specified at all, as nothing > > says the device may not unmap without the unmap flag. Please take > > a look at the SCSI or NVMe ѕpec for some guidance. > > This could probably use some clarifying text in the specification, but > given that there is nothing in the spec describing what the device > needs to do when unmap = 0, I would assume that the device can do > whatever it likes, as long as the blocks read back as 0s afterwards. > Reading back 0s is required by the definition of the Write Zeroes > command in the same virtio spec change. It would probably be good to > clarify this and explicitly define what the device is allowed to do in > response to both settings of the unmap bit. > > My understanding of the corresponding feature in NVMe (the Deallocate > bit in the Write Zeroes command) is that the only difference between > Deallocate = 1 and 0 is that the device "should" versus "may" (no > "shall" on either side) deallocate the corresponding blocks, but only > if the device supports reading 0s back after blocks are deallocated. > If the device does not support reading 0s after deallocation, it is > not allowed to deallocate blocks as part of a Write Zeroes command > regardless of the setting of the Deallocate bit. > > Some similar wording could probably be added to the virtio spec to > clarify the meaning of unmap, although I would prefer something that > makes it a little clearer that the bit is only intended as a hint from > the driver to indicate whether the device should attempt to keep > storage allocated for the zeroed blocks, if that is indeed the > intended behavior. Yes, that's the original idea. Adding a clear description to the specification may be better. > > Is there some in-kernel doc that describes what behavior the Linux > block layer needs from a write zeroes command? > > > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > > + bool unmap) > > > > Why is this an inline function? > > I don't think there's any reason it needs to be inline; I can drop the > inline in the next revision. > > Given (as far as I can tell) your concerns seem to apply to the Write > Zeroes command specifically, would it be reasonable to start with a > patch that just adds support for the Discard command (along with fixes > for Ming's feedback)? This would be sufficient for my particular use > case (although I can't speak for Changpeng), and we can revisit Write > Zeroes once the spec concerns are worked out. > > Thanks, > -- Daniel ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v8] virtio_blk: add discard and write zeroes support
> -Original Message- > From: Christoph Hellwig [mailto:h...@infradead.org] > Sent: Monday, October 15, 2018 5:28 PM > To: Daniel Verkamp > Cc: virtualization@lists.linux-foundation.org; linux-bl...@vger.kernel.org; > Michael S. Tsirkin ; Jason Wang ; > Jens Axboe ; Stefan Hajnoczi ; Liu, > Changpeng > Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > From: Changpeng Liu > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio > > There is some issues in this spec. For one using the multiple ranges > also for write zeroes is rather inefficient. Write zeroes really should > use the same format as read and write. Because there is no length parameter for virtio block specification, adding the two extra commands will not break the existing specification and driver implementation. Also existing Linux implementation for write zeroes will not use multiple segment at all so there is always one range in practice. > > Second the unmap flag isn't properly specified at all, as nothing > says the device may not unmap without the unmap flag. Please take > a look at the SCSI or NVMe ѕpec for some guidance. The unmap flag is only used for write zeroes command, as discard command will not guarantee the spaces will be zeroed, so adding this flag means (Discard + Write Zeroes), so this definitely is backend related, the backend implementation can use same code to implement discard and write zeroes commands. > > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > + bool unmap) > > Why is this an inline function? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Friday, June 8, 2018 6:20 PM > To: Liu, Changpeng > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On Thu, Jun 07, 2018 at 11:07:06PM +, Liu, Changpeng wrote: > > > > > > > -Original Message- > > > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > > > Sent: Thursday, June 7, 2018 9:10 PM > > > To: Liu, Changpeng > > > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > > > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > > > > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands > > > support > > > > > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote: > > > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands > > > > support, this will impact the performance when using SSD backend over > > > > file systems. > > > > > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to > > > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended > > > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES > > > > commands support. > > > > > > > > While here, using 16 bytes descriptor to describe one segment of DISCARD > > > > or WRITE ZEROES commands, each command may contain one or more > > > decriptors. > > > > > > > > The following data structure shows the definition of one descriptor: > > > > > > > > struct virtio_blk_discard_write_zeroes { > > > > le64 sector; > > > > le32 num_sectors; > > > > le32 unmap; > > > > }; > > > > > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES, > > > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE > > > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap' > > > > maybe used for WRITE ZEROES command with DISCARD enabled. > > > > > > > > We also extended the virtio-blk configuration space to let backend > > > > device put DISCARD and WRITE ZEROES configuration parameters. > > > > > > > > struct virtio_blk_config { > > > > [...] > > > > > > > > le32 max_discard_sectors; > > > > le32 max_discard_seg; > > > > le32 discard_sector_alignment; > > > > le32 max_write_zeroes_sectors; > > > > le32 max_write_zeroes_seg; > > > > u8 write_zeroes_may_unmap; > > > > } > > > > > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard > > > > command, maximum discard sectors size in field 'max_discard_sectors' and > > > > maximum discard segment number in field 'max_discard_seg'. > > > > > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write > > > > zeroes command, maximum write zeroes sectors size in field > > > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in > > > > field 'max_write_zeroes_seg'. > > > > > > > > The parameters in the configuration space of the device field > > > > 'max_discard_sectors' and field 'discard_sector_alignment' are > > > > expressed in > > > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. > > > > The > > > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the > > > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. > > > > > > > > Signed-off-by: Changpeng Liu > > > > --- > > > > CHANGELOG: > > > > v6: don't set T_OUT bit to discard and write zeroes commands. > > > > > > I don't see this in the patch... > > Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT > again. > > > > > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct > > > blk_mq_hw_ctx *hctx, > > > > int qid = hctx->queue_num; > &
RE: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Thursday, June 7, 2018 9:10 PM > To: Liu, Changpeng > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote: > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands > > support, this will impact the performance when using SSD backend over > > file systems. > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES > > commands support. > > > > While here, using 16 bytes descriptor to describe one segment of DISCARD > > or WRITE ZEROES commands, each command may contain one or more > decriptors. > > > > The following data structure shows the definition of one descriptor: > > > > struct virtio_blk_discard_write_zeroes { > > le64 sector; > > le32 num_sectors; > > le32 unmap; > > }; > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES, > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap' > > maybe used for WRITE ZEROES command with DISCARD enabled. > > > > We also extended the virtio-blk configuration space to let backend > > device put DISCARD and WRITE ZEROES configuration parameters. > > > > struct virtio_blk_config { > > [...] > > > > le32 max_discard_sectors; > > le32 max_discard_seg; > > le32 discard_sector_alignment; > > le32 max_write_zeroes_sectors; > > le32 max_write_zeroes_seg; > > u8 write_zeroes_may_unmap; > > } > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard > > command, maximum discard sectors size in field 'max_discard_sectors' and > > maximum discard segment number in field 'max_discard_seg'. > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write > > zeroes command, maximum write zeroes sectors size in field > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in > > field 'max_write_zeroes_seg'. > > > > The parameters in the configuration space of the device field > > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated. > > > > Signed-off-by: Changpeng Liu > > --- > > CHANGELOG: > > v6: don't set T_OUT bit to discard and write zeroes commands. > > I don't see this in the patch... Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT again. > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > int qid = hctx->queue_num; > > int err; > > bool notify = false; > > + bool unmap = false; > > u32 type; > > > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > case REQ_OP_FLUSH: > > type = VIRTIO_BLK_T_FLUSH; > > break; > > + case REQ_OP_DISCARD: > > + type = VIRTIO_BLK_T_DISCARD; > > + break; > > + case REQ_OP_WRITE_ZEROES: > > + type = VIRTIO_BLK_T_WRITE_ZEROES; > > + unmap = !(req->cmd_flags & REQ_NOUNMAP); > > + break; > > case REQ_OP_SCSI_IN: > > case REQ_OP_SCSI_OUT: > > type = VIRTIO_BLK_T_SCSI_CMD; > > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > > > blk_mq_start_request(req); > > > > + if (type == VIRTIO_BLK_T_DISCARD || type == > VIRTIO_BLK_T_WRITE_ZEROES) { > > + err = virtblk_setup_discard_write_zeroes(req, unmap); > > + if (err) > > + return BLK_STS_RESOURCE; > > + } > > + > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > if (num) { > > if (rq_data_dir(req) == WRITE) > > ...since we still do blk_rq_map_sg() here and num should be != 0. No, while here, we should keep the original logic for READ/WRITE commands. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Monday, June 4, 2018 6:03 PM > To: Liu, Changpeng ; Stefan Hajnoczi > > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > jasow...@redhat.com; Wang, Wei W > Subject: Re: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On 04/06/2018 06:14, Liu, Changpeng wrote: > >>> But I believe the specification says VIRTIO_BLK_T_OUT means direction, so > >>> OR the two bits together should compliance with the specification. > >> I cannot find that in the specification: > >> > >> > >> http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1- > >> 2020002 > >> > >> and it would contradict the "The type of the request is either ... or > >> ..." wording that I quoted from the spec above. > >> > >> If you do find something in the spec, please let me know and we can > >> figure out how to make the spec consistent. > > > > I saw comments from file linux/usr/include/uapi/linux/virtio_blk.h, which > > says > > VIRTIO_BLK_T_OUT may be combined with other commands and means direction, > > the specification does not have such description. > > I don't think it is in the specification indeed (however, 11 and 13 were > chosen so that VIRTIO_BLK_T_OUT can still indicate direction). Correct, we don't need to OR VIRTIO_BLK_T_OUT again for DISCARD and WRITE ZEROES commands. I'll remove it in next patch set. > > Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Friday, June 1, 2018 5:00 PM > To: Liu, Changpeng > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > Subject: Re: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On Thu, May 31, 2018 at 11:53:26PM +, Liu, Changpeng wrote: > > > > > > > -Original Message- > > > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > > > Sent: Thursday, May 31, 2018 6:31 PM > > > To: Liu, Changpeng > > > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > > > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > > > > > Subject: Re: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands > > > support > > > > > > On Tue, May 29, 2018 at 09:42:24AM +0800, Changpeng Liu wrote: > > > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > > > if (num) { > > > > - if (rq_data_dir(req) == WRITE) > > > > + if (rq_data_dir(req) == WRITE || type == > > > > VIRTIO_BLK_T_DISCARD > > > || > > > > + type == VIRTIO_BLK_T_WRITE_ZEROES) > > > > vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > > > VIRTIO_BLK_T_OUT); > > > > > > The VIRTIO specification says: > > > > > > The type of the request is either a read (VIRTIO_BLK_T_IN), a write > > > (VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes > > > (VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH). > > > > > > But this patch actually uses VIRTIO_BLK_T_DISCARD | VIRTIO_BLK_T_OUT or > > > VIRTIO_BLK_T_WRITE_ZEROES | VIRTIO_BLK_T_OUT. "either A or B" is > > > exclusive, it does not mean "A and B". > > Hi Stefan, > > > > For the new add DISCARD and WRITE ZEROES commands, I defined the > > type code to 11 and 13, so the last bit always is 1, there is no difference > > in practice. > > Then the code is misleading. This is clearer: > > if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) > /* Do nothing, type already set */ > else if (rq_data_dir(req) == WRITE) > vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT); > ... This change sounds good to me, will change the patch accordingly. > > > But I believe the specification says VIRTIO_BLK_T_OUT means direction, so > > OR the two bits together should compliance with the specification. > > I cannot find that in the specification: > > http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1- > 2020002 > > and it would contradict the "The type of the request is either ... or > ..." wording that I quoted from the spec above. > > If you do find something in the spec, please let me know and we can > figure out how to make the spec consistent. I saw comments from file linux/usr/include/uapi/linux/virtio_blk.h, which says VIRTIO_BLK_T_OUT may be combined with other commands and means direction, the specification does not have such description. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands support
> -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Thursday, May 31, 2018 6:31 PM > To: Liu, Changpeng > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > jasow...@redhat.com; pbonz...@redhat.com; Wang, Wei W > > Subject: Re: [PATCH v4] virtio_blk: add DISCARD and WRIET ZEROES commands > support > > On Tue, May 29, 2018 at 09:42:24AM +0800, Changpeng Liu wrote: > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > if (num) { > > - if (rq_data_dir(req) == WRITE) > > + if (rq_data_dir(req) == WRITE || type == VIRTIO_BLK_T_DISCARD > || > > + type == VIRTIO_BLK_T_WRITE_ZEROES) > > vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > VIRTIO_BLK_T_OUT); > > The VIRTIO specification says: > > The type of the request is either a read (VIRTIO_BLK_T_IN), a write > (VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes > (VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH). > > But this patch actually uses VIRTIO_BLK_T_DISCARD | VIRTIO_BLK_T_OUT or > VIRTIO_BLK_T_WRITE_ZEROES | VIRTIO_BLK_T_OUT. "either A or B" is > exclusive, it does not mean "A and B". Hi Stefan, For the new add DISCARD and WRITE ZEROES commands, I defined the type code to 11 and 13, so the last bit always is 1, there is no difference in practice. But I believe the specification says VIRTIO_BLK_T_OUT means direction, so OR the two bits together should compliance with the specification. > > Can you clarify whether the spec needs to be changed or what the purpose > of ORing in the VIRTIO_BLK_T_OUT bit is? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v3] virtio_blk: add DISCARD and WRIET ZEROES command support
> -Original Message- > From: Michael S. Tsirkin [mailto:m...@redhat.com] > Sent: Thursday, March 29, 2018 6:07 PM > To: Liu, Changpeng > Cc: virtualization@lists.linux-foundation.org; cav...@redhat.com; > stefa...@redhat.com; jasow...@redhat.com; pbonz...@redhat.com; > Wodkowski, PawelX ; Harris, James R > ; fabio.miranda.mart...@canonical.com > Subject: Re: [PATCH v3] virtio_blk: add DISCARD and WRIET ZEROES command > support > > On Fri, Mar 30, 2018 at 08:49:34AM +0800, Changpeng Liu wrote: > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES > > command support, this will impact the performance when using SSD > > backend over file systems. > > > > The idea here is using 16 Bytes payload as one descriptor for > > DISCARD/WRITE ZEROES command, users can put several ranges into > > one command, for the purpose to support such feature, two feature > > flags VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES and two > > commands VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES are > > introduced, and some parameters are added to the configuration > > space to tell the OS the granularity of DISCARD/WRITE ZEROES > > commands. > > Pls fix grammar in this comment, I am not sure what are you > trying to say. Okay, will add more description here. > > > > > The specification change list here: > > https://github.com/oasis-tcs/virtio-spec > > Which commit? > > > CHANGELOG: > > v3: finalized the specification change. > > Changelog belongs after -- and should be complete > including changes v1 to v2. > > > Signed-off-by: Changpeng Liu > > --- > > drivers/block/virtio_blk.c | 96 > +++-- > > include/uapi/linux/virtio_blk.h | 39 + > > 2 files changed, 132 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > index 4a07593c..1943adb 100644 > > --- a/drivers/block/virtio_blk.c > > +++ b/drivers/block/virtio_blk.c > > @@ -172,10 +172,53 @@ static int virtblk_add_req(struct virtqueue *vq, > > struct > virtblk_req *vbr, > > return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); > > } > > > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req, > > bool > unmap) > > +{ > > + unsigned short segments = blk_rq_nr_discard_segments(req), n = 0; > > Split on two lines pls: > unsigned short n = 0; > > > + u32 block_size = queue_logical_block_size(req->q); > > Seems to be unused except for the sanity check below. Why? > > > + struct virtio_blk_discard_write_zeroes *range; > > + struct bio *bio; > > + > > + if (block_size < 512 || !block_size) > > Why 512? And when is it 0? Ok, actually this check is not necessary, will remove it. > > > + return -1; > > -1 isn't a normal errno code. > > > + > > + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); > > This might be pretty large: with 64K segments it looks like you are > trying to allocate ~1Mbyte with GFP_ATOMIC which is unlikely to succeed. > Can we split this up in chunks? This is already chunks now, if the backend can only support 1 segment, so the number is always 1, I can change the GFP_ATOMIC with GFP_KERNEL. > > > + if (!range) > > + return -1; > > + > > + __rq_for_each_bio(bio, req) { > > + u64 sector = bio->bi_iter.bi_sector; > > + u32 num_sectors = bio->bi_iter.bi_size >> 9; > > why 9? The sectors in virtio-blk protocol and Linux is expressed by 512-bytes. > > > + > > + range[n].flags.unmap = cpu_to_le32(unmap); > > + range[n].flags.reserved = cpu_to_le32(0); > > + range[n].num_sectors = cpu_to_le32(num_sectors); > > + range[n].sector = cpu_to_le64(sector); > > Isn't this causing sparse warnings? No warning when I complied the module. > > > > + n++; > > + } > > + > > + if (WARN_ON_ONCE(n != segments)) { > > and when does this happen? This check shouldn't happen too, will remove it. > > > + kfree(range); > > + return -1; > > + } > > + > > + req->special_vec.bv_page = virt_to_page(range); > > + req->special_vec.bv_offset = offset_in_page(range); > > + req->special_vec.bv_len = sizeof(*range) * segments; > > + req->rq_flags |= RQF_SPECIAL_PAYLOAD; > > + > > + return 0; > > +} > > + > &
RE: [PATCH v2] virtio-blk: add DISCARD support to virtio-blk driver
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Tuesday, July 4, 2017 5:24 PM > To: Liu, Changpeng ; virtualization@lists.linux- > foundation.org > Cc: stefa...@gmail.com; h...@lst.de; m...@redhat.com > Subject: Re: [PATCH v2] virtio-blk: add DISCARD support to virtio-blk driver > > > > On 05/07/2017 10:44, Changpeng Liu wrote: > > Currently virtio-blk driver does not provide discard feature flag, so the > > filesystems which built on top of the block device will not send discard > > command. This is okay for HDD backend, but it will impact the performance > > for SSD backend. > > > > Add a feature flag VIRTIO_BLK_F_DISCARD and command > VIRTIO_BLK_T_DISCARD > > to extend exist virtio-blk protocol, define 16 bytes discard descriptor > > for each discard segment, the discard segment defination aligns with > > SCSI or NVM Express protocols, virtio-blk driver will support multi-range > > discard request as well. > > > > Signed-off-by: Changpeng Liu > > Please include a patch for the specification. Since we are at it, I Thanks Paolo, do you mean include a text file which describe the changes for the specification? > would like to have three operations defined using the same descriptor: > > - discard (SCSI UNMAP) > > - write zeroes (SCSI WRITE SAME without UNMAP flag) > > - write zeroes and possibly discard (SCSI WRITE SAME with UNMAP flag) > > The last two can use the same command VIRTIO_BLK_T_WRITE_ZEROES, using > the reserved field as a flags field. Will add write zeroes feature. > > Paolo > > > --- > > drivers/block/virtio_blk.c | 76 > +++-- > > include/uapi/linux/virtio_blk.h | 19 +++ > > 2 files changed, 92 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > index 0297ad7..8f0c614 100644 > > --- a/drivers/block/virtio_blk.c > > +++ b/drivers/block/virtio_blk.c > > @@ -172,10 +172,52 @@ static int virtblk_add_req(struct virtqueue *vq, > > struct > virtblk_req *vbr, > > return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); > > } > > > > +static inline int virtblk_setup_discard(struct request *req) > > +{ > > + unsigned short segments = blk_rq_nr_discard_segments(req), n = 0; > > + u32 block_size = queue_logical_block_size(req->q); > > + struct virtio_blk_discard *range; > > + struct bio *bio; > > + > > + if (block_size < 512 || !block_size) > > + return -1; > > + > > + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); > > + if (!range) > > + return -1; > > + > > + __rq_for_each_bio(bio, req) { > > + u64 slba = (bio->bi_iter.bi_sector << 9) / block_size; > > + u32 nlb = bio->bi_iter.bi_size / block_size; > > + > > + range[n].reserved = cpu_to_le32(0); > > + range[n].nlba = cpu_to_le32(nlb); > > + range[n].slba = cpu_to_le64(slba); > > + n++; > > + } > > + > > + if (WARN_ON_ONCE(n != segments)) { > > + kfree(range); > > + return -1; > > + } > > + > > + req->special_vec.bv_page = virt_to_page(range); > > + req->special_vec.bv_offset = offset_in_page(range); > > + req->special_vec.bv_len = sizeof(*range) * segments; > > + req->rq_flags |= RQF_SPECIAL_PAYLOAD; > > + > > + return 0; > > +} > > + > > static inline void virtblk_request_done(struct request *req) > > { > > struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); > > > > + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) { > > + kfree(page_address(req->special_vec.bv_page) + > > + req->special_vec.bv_offset); > > + } > > + > > switch (req_op(req)) { > > case REQ_OP_SCSI_IN: > > case REQ_OP_SCSI_OUT: > > @@ -237,6 +279,9 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > case REQ_OP_FLUSH: > > type = VIRTIO_BLK_T_FLUSH; > > break; > > + case REQ_OP_DISCARD: > > + type = VIRTIO_BLK_T_DISCARD; > > + break; > > case REQ_OP_SCSI_IN: > > case REQ_OP_SCSI_OUT: > > type = VIRTIO_BLK_T_SCSI_CMD; > > @@ -256,9 +301,15 @@ static blk_status_t virtio_queue_rq(struct > blk_mq_hw_ctx *hctx, > > > > blk_mq_start_request(req); > > > > + if (type == VIR
RE: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@gmail.com] > Sent: Tuesday, March 28, 2017 4:20 AM > To: Liu, Changpeng > Cc: virtio-...@lists.oasis-open.org; > virtualization@lists.linux-foundation.org; linux- > ker...@vger.kernel.org; h...@lst.de; qemu-de...@nongnu.org > Subject: Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver > > On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote: > > Currently virtio-blk driver does not provide discard feature flag, so the > > filesystems which built on top of the block device will not send discard > > command. This is okay for HDD backend, but it will impact the performance > > for SSD backend. > > > > Add a feature flag VIRTIO_BLK_F_DISCARD and command > VIRTIO_BLK_T_DISCARD > > to extend exist virtio-blk protocol. virtio-blk protocol uses a single > > 8 bytes descriptor containing type,reserved and sector, currently Linux > > uses the reserved field as IO priority, here we also re-use the reserved > > field as number of discard sectors. > > > > Signed-off-by: Changpeng Liu > > --- > > drivers/block/virtio_blk.c | 38 +- > > include/uapi/linux/virtio_blk.h | 12 ++-- > > 2 files changed, 39 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > index 1d4c9f8..550cfe7 100644 > > --- a/drivers/block/virtio_blk.c > > +++ b/drivers/block/virtio_blk.c > > @@ -241,6 +241,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > > case REQ_OP_FLUSH: > > type = VIRTIO_BLK_T_FLUSH; > > break; > > + case REQ_OP_DISCARD: > > + type = VIRTIO_BLK_T_DISCARD; > > + break; > > case REQ_OP_SCSI_IN: > > case REQ_OP_SCSI_OUT: > > type = VIRTIO_BLK_T_SCSI_CMD; > > @@ -256,16 +259,24 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, > > vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, type); > > vbr->out_hdr.sector = type ? > > 0 : cpu_to_virtio64(vblk->vdev, blk_rq_pos(req)); > > - vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req)); > > + vbr->out_hdr.u.ioprio = cpu_to_virtio32(vblk->vdev, > > req_get_ioprio(req)); > > > > blk_mq_start_request(req); > > > > - num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > - if (num) { > > - if (rq_data_dir(req) == WRITE) > > - vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > VIRTIO_BLK_T_OUT); > > - else > > - vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > VIRTIO_BLK_T_IN); > > + if (type == VIRTIO_BLK_T_DISCARD) { > > + vbr->out_hdr.u.discard_nr_sectors = cpu_to_virtio32(vblk->vdev, > > + > blk_rq_sectors(req)); > > + num = 0; > > + } else { > > + num = blk_rq_map_sg(hctx->queue, req, vbr->sg); > > + if (num) { > > + if (rq_data_dir(req) == WRITE) > > + vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > > + > VIRTIO_BLK_T_OUT); > > + else > > + vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, > > + > VIRTIO_BLK_T_IN); > > + } > > } > > > > spin_lock_irqsave(&vblk->vqs[qid].lock, flags); > > @@ -775,6 +786,15 @@ static int virtblk_probe(struct virtio_device *vdev) > > if (!err && opt_io_size) > > blk_queue_io_opt(q, blk_size * opt_io_size); > > > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { > > + q->limits.discard_zeroes_data = 0; > > + q->limits.discard_alignment = blk_size; > > + q->limits.discard_granularity = blk_size; > > + blk_queue_max_discard_sectors(q, UINT_MAX); > > + blk_queue_max_discard_segments(q, 1); > > + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); > > + } > > Please add configuration space fields for these limits. Looking at the > virtio-scsi block limits code in QEMU's scsi_disk_emulate_inquiry() I > can see that the hypervisor has useful values that it wants to > communicate. They shouldn't be hardcoded to blk_size. Yes, move discard related parameters to configuration space make sense. > > > + > > virtio_device_ready(vdev); > > > > device_add_disk(&vdev->dev, vblk->disk);
RE: [virtio-dev] [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> -Original Message- > From: virtio-...@lists.oasis-open.org > [mailto:virtio-...@lists.oasis-open.org] On > Behalf Of Paolo Bonzini > Sent: Monday, March 27, 2017 7:34 PM > To: Liu, Changpeng ; virtio-...@lists.oasis-open.org; > virtualization@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > h...@lst.de > Cc: qemu-de...@nongnu.org > Subject: Re: [virtio-dev] [PATCH] virtio-blk: add DISCARD support to > virtio-blk driver > > > > On 28/03/2017 10:39, Changpeng Liu wrote: > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) { > > + q->limits.discard_zeroes_data = 0; > > Maybe you could use another feature bit to populate discard_zeroes_data. > > Paolo > Sounds good to me, Christoph Hellwig mentioned this field will be removed in next release, just removed this line makes clear. > > + q->limits.discard_alignment = blk_size; > > + q->limits.discard_granularity = blk_size; > > + blk_queue_max_discard_sectors(q, UINT_MAX); > > + blk_queue_max_discard_segments(q, 1); > > + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); > > + } > > + > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> -Original Message- > From: Christoph Hellwig [mailto:h...@lst.de] > Sent: Monday, March 27, 2017 10:56 PM > To: Liu, Changpeng > Cc: virtio-...@lists.oasis-open.org; > virtualization@lists.linux-foundation.org; linux- > ker...@vger.kernel.org; h...@lst.de; qemu-de...@nongnu.org > Subject: Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver > > On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote: > > Currently virtio-blk driver does not provide discard feature flag, so the > > filesystems which built on top of the block device will not send discard > > command. This is okay for HDD backend, but it will impact the performance > > for SSD backend. > > > > Add a feature flag VIRTIO_BLK_F_DISCARD and command > VIRTIO_BLK_T_DISCARD > > to extend exist virtio-blk protocol. virtio-blk protocol uses a single > > 8 bytes descriptor containing type,reserved and sector, currently Linux > > uses the reserved field as IO priority, here we also re-use the reserved > > field as number of discard sectors. > > Do you have a link to the specification for this feature? At least > virtio-v1.0 does not seem to specify a discard feature. Not yet, patch goes first, there are uses who don't want to migrate from virtio-blk to virtio-scsi, so this feature seems reasonable, the question is that how we should define the specification to support this feature. > > Note that Linux 4.11 and later have support for multi-range discard > ala ATA TRIM, SCSI UNMAP and NVMe deallocate which might be useful > here, too. For support multi-range discard features, the DISCARD command must have data_in buffer, similar with SCSI UNMAP and NVMe DEALLOCATE commands, maybe 16 bytes aligned descriptors are required for each DISCARD command. > > > + q->limits.discard_zeroes_data = 0; > > No need to clear this. Also hopefully this field goes away for 4.12 > > > + blk_queue_max_discard_segments(q, 1); > > No need to set this. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization