Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-11-02 Thread Daniel Verkamp
On Thu, Nov 1, 2018 at 2:25 PM Michael S. Tsirkin  wrote:
>
> On Mon, Oct 29, 2018 at 05:05:21AM +, Stefan Hajnoczi wrote:
> > On Fri, Oct 26, 2018 at 10:47:16AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote:
> > > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > > > +   range[n].flags = cpu_to_le32(flags);
> > > > > +   range[n].num_sectors = cpu_to_le32(num_sectors);
> > > > > +   range[n].sector = cpu_to_le64(sector);
> > > > ...
> > > > > +/* Discard/write zeroes range for each request. */
> > > > > +struct virtio_blk_discard_write_zeroes {
> > > > > +   /* discard/write zeroes start sector */
> > > > > +   __virtio64 sector;
> > > > > +   /* number of discard/write zeroes sectors */
> > > > > +   __virtio32 num_sectors;
> > > > > +   /* flags for this range */
> > > > > +   __virtio32 flags;
> > > >
> > > > cpu_to_le32() is being used on __virtio32 fields instead of 
> > > > cpu_to_virtio32().
> > > >
> > > > From include/uapi/linux/virtio_types.h:
> > > >
> > > >   /*
> > > >* __virtio{16,32,64} have the following meaning:
> > > >* - __u{16,32,64} for virtio devices in legacy mode, accessed in 
> > > > native endian
> > > >* - __le{16,32,64} for standard-compliant virtio devices
> > > >*/
> > > >
> > > > From the VIRTIO specification:
> > > >
> > > >   struct virtio_blk_discard_write_zeroes {
> > > >  le64 sector;
> > > >  le32 num_sectors;
> > > >  struct {
> > > >  le32 unmap:1;
> > > >  le32 reserved:31;
> > > >  } flags;
> > > >   };
> > > >
> > > >
> > > > Since the VIRTIO spec says these fields are little-endian, I think these
> > > > fields should be declared just __u32 and __u64 instead of __virtio32 and
> > > > __virtio64.
> > > >
> > > > Stefan
> > >
> > >
> > > __le32/__le64 rather?
> >
> > Yes.
> >
> > Stefan
>
> I agree. And further using bitfields for this is questionable -
> it is preferable to set bits in a full 32 bit field using "|".

The bitfield is only in the specification, not the code - the actual
implementation in this patch (quoted above from earlier in the thread)
uses a 32-bit field with a flag #define.

I did misunderstand the meaning of __virtio32 vs __le32 - I'll fix
that up (I think the spec definition and code for these is already
correct; the structure definition just needs to change to match).

Thanks,
-- Daniel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-11-02 Thread Michael S. Tsirkin
On Mon, Oct 29, 2018 at 05:05:21AM +, Stefan Hajnoczi wrote:
> On Fri, Oct 26, 2018 at 10:47:16AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote:
> > > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > > +   range[n].flags = cpu_to_le32(flags);
> > > > +   range[n].num_sectors = cpu_to_le32(num_sectors);
> > > > +   range[n].sector = cpu_to_le64(sector);
> > > ...
> > > > +/* Discard/write zeroes range for each request. */
> > > > +struct virtio_blk_discard_write_zeroes {
> > > > +   /* discard/write zeroes start sector */
> > > > +   __virtio64 sector;
> > > > +   /* number of discard/write zeroes sectors */
> > > > +   __virtio32 num_sectors;
> > > > +   /* flags for this range */
> > > > +   __virtio32 flags;
> > > 
> > > cpu_to_le32() is being used on __virtio32 fields instead of 
> > > cpu_to_virtio32().
> > > 
> > > From include/uapi/linux/virtio_types.h:
> > > 
> > >   /*
> > >* __virtio{16,32,64} have the following meaning:
> > >* - __u{16,32,64} for virtio devices in legacy mode, accessed in 
> > > native endian
> > >* - __le{16,32,64} for standard-compliant virtio devices
> > >*/
> > > 
> > > From the VIRTIO specification:
> > > 
> > >   struct virtio_blk_discard_write_zeroes {
> > >  le64 sector;
> > >  le32 num_sectors;
> > >  struct {
> > >  le32 unmap:1;
> > >  le32 reserved:31;
> > >  } flags;
> > >   };
> > > 
> > > 
> > > Since the VIRTIO spec says these fields are little-endian, I think these
> > > fields should be declared just __u32 and __u64 instead of __virtio32 and
> > > __virtio64.
> > > 
> > > Stefan
> > 
> > 
> > __le32/__le64 rather?
> 
> Yes.
> 
> Stefan

I agree. And further using bitfields for this is questionable -
it is preferable to set bits in a full 32 bit field using "|".


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-29 Thread Paolo Bonzini
On 26/10/2018 10:26, Christoph Hellwig wrote:
> On Fri, Oct 26, 2018 at 01:28:54AM +0200, Paolo Bonzini wrote:
>> On 15/10/2018 11:27, Christoph Hellwig wrote:
>>> There is some issues in this spec.  For one using the multiple ranges
>>> also for write zeroes is rather inefficient.  Write zeroes really should
>>> use the same format as read and write.
>>
>> What makes it inefficient?
> 
> We require a memory allocation for each write zeroes instead of encoding
> the lba/len in the command.

Oh, I see.  That's not a spec issue, the lba/length descriptor can be
included in the same buffer as the rest of the command; using
kmalloc_array even for a single-bio REQ_OP_WRITE_ZEROES is a choice made
by this patch, I suppose for simplicity.

It is possible to special case single-bio unmap and write zeroes so that
they don't call virtblk_setup_discard_write_zeroes and avoid
RQF_SPECIAL_PAYLOAD.

Thanks,

Paolo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-28 Thread Stefan Hajnoczi
On Fri, Oct 26, 2018 at 10:47:16AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > + range[n].flags = cpu_to_le32(flags);
> > > + range[n].num_sectors = cpu_to_le32(num_sectors);
> > > + range[n].sector = cpu_to_le64(sector);
> > ...
> > > +/* Discard/write zeroes range for each request. */
> > > +struct virtio_blk_discard_write_zeroes {
> > > + /* discard/write zeroes start sector */
> > > + __virtio64 sector;
> > > + /* number of discard/write zeroes sectors */
> > > + __virtio32 num_sectors;
> > > + /* flags for this range */
> > > + __virtio32 flags;
> > 
> > cpu_to_le32() is being used on __virtio32 fields instead of 
> > cpu_to_virtio32().
> > 
> > From include/uapi/linux/virtio_types.h:
> > 
> >   /*
> >* __virtio{16,32,64} have the following meaning:
> >* - __u{16,32,64} for virtio devices in legacy mode, accessed in native 
> > endian
> >* - __le{16,32,64} for standard-compliant virtio devices
> >*/
> > 
> > From the VIRTIO specification:
> > 
> >   struct virtio_blk_discard_write_zeroes {
> >  le64 sector;
> >  le32 num_sectors;
> >  struct {
> >  le32 unmap:1;
> >  le32 reserved:31;
> >  } flags;
> >   };
> > 
> > 
> > Since the VIRTIO spec says these fields are little-endian, I think these
> > fields should be declared just __u32 and __u64 instead of __virtio32 and
> > __virtio64.
> > 
> > Stefan
> 
> 
> __le32/__le64 rather?

Yes.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-28 Thread Liu, Changpeng



> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Friday, October 26, 2018 4:27 PM
> To: Paolo Bonzini 
> Cc: Christoph Hellwig ; Daniel Verkamp
> ; Jens Axboe ; Michael S. Tsirkin
> ; virtualization@lists.linux-foundation.org; linux-
> bl...@vger.kernel.org; Stefan Hajnoczi ; Liu, Changpeng
> 
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Fri, Oct 26, 2018 at 01:28:54AM +0200, Paolo Bonzini wrote:
> > On 15/10/2018 11:27, Christoph Hellwig wrote:
> > > There is some issues in this spec.  For one using the multiple ranges
> > > also for write zeroes is rather inefficient.  Write zeroes really should
> > > use the same format as read and write.
> >
> > What makes it inefficient?
> 
> We require a memory allocation for each write zeroes instead of encoding
> the lba/len in the command.
Make sense to me, but need to change the spec first.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-26 Thread Michael S. Tsirkin
On Fri, Oct 26, 2018 at 09:08:38AM +0100, Stefan Hajnoczi wrote:
> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > +   range[n].flags = cpu_to_le32(flags);
> > +   range[n].num_sectors = cpu_to_le32(num_sectors);
> > +   range[n].sector = cpu_to_le64(sector);
> ...
> > +/* Discard/write zeroes range for each request. */
> > +struct virtio_blk_discard_write_zeroes {
> > +   /* discard/write zeroes start sector */
> > +   __virtio64 sector;
> > +   /* number of discard/write zeroes sectors */
> > +   __virtio32 num_sectors;
> > +   /* flags for this range */
> > +   __virtio32 flags;
> 
> cpu_to_le32() is being used on __virtio32 fields instead of cpu_to_virtio32().
> 
> From include/uapi/linux/virtio_types.h:
> 
>   /*
>* __virtio{16,32,64} have the following meaning:
>* - __u{16,32,64} for virtio devices in legacy mode, accessed in native 
> endian
>* - __le{16,32,64} for standard-compliant virtio devices
>*/
> 
> From the VIRTIO specification:
> 
>   struct virtio_blk_discard_write_zeroes {
>  le64 sector;
>  le32 num_sectors;
>  struct {
>  le32 unmap:1;
>  le32 reserved:31;
>  } flags;
>   };
> 
> 
> Since the VIRTIO spec says these fields are little-endian, I think these
> fields should be declared just __u32 and __u64 instead of __virtio32 and
> __virtio64.
> 
> Stefan


__le32/__le64 rather?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-26 Thread Christoph Hellwig
On Fri, Oct 26, 2018 at 01:28:54AM +0200, Paolo Bonzini wrote:
> On 15/10/2018 11:27, Christoph Hellwig wrote:
> > There is some issues in this spec.  For one using the multiple ranges
> > also for write zeroes is rather inefficient.  Write zeroes really should
> > use the same format as read and write.
> 
> What makes it inefficient?

We require a memory allocation for each write zeroes instead of encoding
the lba/len in the command.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-26 Thread Stefan Hajnoczi
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> + range[n].flags = cpu_to_le32(flags);
> + range[n].num_sectors = cpu_to_le32(num_sectors);
> + range[n].sector = cpu_to_le64(sector);
...
> +/* Discard/write zeroes range for each request. */
> +struct virtio_blk_discard_write_zeroes {
> + /* discard/write zeroes start sector */
> + __virtio64 sector;
> + /* number of discard/write zeroes sectors */
> + __virtio32 num_sectors;
> + /* flags for this range */
> + __virtio32 flags;

cpu_to_le32() is being used on __virtio32 fields instead of cpu_to_virtio32().

From include/uapi/linux/virtio_types.h:

  /*
   * __virtio{16,32,64} have the following meaning:
   * - __u{16,32,64} for virtio devices in legacy mode, accessed in native 
endian
   * - __le{16,32,64} for standard-compliant virtio devices
   */

From the VIRTIO specification:

  struct virtio_blk_discard_write_zeroes {
 le64 sector;
 le32 num_sectors;
 struct {
 le32 unmap:1;
 le32 reserved:31;
 } flags;
  };


Since the VIRTIO spec says these fields are little-endian, I think these
fields should be declared just __u32 and __u64 instead of __virtio32 and
__virtio64.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-25 Thread Paolo Bonzini
On 15/10/2018 11:27, Christoph Hellwig wrote:
> There is some issues in this spec.  For one using the multiple ranges
> also for write zeroes is rather inefficient.  Write zeroes really should
> use the same format as read and write.

What makes it inefficient?

> Second the unmap flag isn't properly specified at all, as nothing
> says the device may not unmap without the unmap flag.  Please take
> a look at the SCSI or NVMe ѕpec for some guidance.

Thanks, I'll submit a patch for this.

Paolo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-15 Thread Liu, Changpeng


> -Original Message-
> From: Daniel Verkamp [mailto:dverk...@chromium.org]
> Sent: Tuesday, October 16, 2018 7:16 AM
> To: h...@infradead.org
> Cc: virtualization@lists.linux-foundation.org; linux-bl...@vger.kernel.org;
> m...@redhat.com; jasow...@redhat.com; ax...@kernel.dk;
> stefa...@redhat.com; Liu, Changpeng 
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig  wrote:
> > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > From: Changpeng Liu 
> > >
> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> >
> > There is some issues in this spec.  For one using the multiple ranges
> > also for write zeroes is rather inefficient.  Write zeroes really should
> > use the same format as read and write.
> 
> I wasn't involved in the writing of the spec, so I'll defer to Michael
> and Changpeng here, but I'm not sure how "set in stone" the virtio
> specification is, or if it can be updated somehow without breaking
> compatibility.
> 
> I agree that Write Zeroes would be simpler to implement as a single
> LBA + length rather than a list.  However, it's not really possible to
> use the same format as the regular virtio block read/write commands
> (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
> not specify a length explicitly; length is implied by the length of
> the data buffer as defined by the virtio descriptor, but a Write
> Zeroes command does not require a data buffer.  At best, this could be
> a separate command mirroring the layout of struct virtio_blk_req but
> with data replaced with a length field; I'm not sure that buys much in
> the way of consistency.
Yeah, that's the consideration here.
> 
> > Second the unmap flag isn't properly specified at all, as nothing
> > says the device may not unmap without the unmap flag.  Please take
> > a look at the SCSI or NVMe ѕpec for some guidance.
> 
> This could probably use some clarifying text in the specification, but
> given that there is nothing in the spec describing what the device
> needs to do when unmap = 0, I would assume that the device can do
> whatever it likes, as long as the blocks read back as 0s afterwards.
> Reading back 0s is required by the definition of the Write Zeroes
> command in the same virtio spec change.  It would probably be good to
> clarify this and explicitly define what the device is allowed to do in
> response to both settings of the unmap bit.
> 
> My understanding of the corresponding feature in NVMe (the Deallocate
> bit in the Write Zeroes command) is that the only difference between
> Deallocate = 1 and 0 is that the device "should" versus "may" (no
> "shall" on either side) deallocate the corresponding blocks, but only
> if the device supports reading 0s back after blocks are deallocated.
> If the device does not support reading 0s after deallocation, it is
> not allowed to deallocate blocks as part of a Write Zeroes command
> regardless of the setting of the Deallocate bit.
> 
> Some similar wording could probably be added to the virtio spec to
> clarify the meaning of unmap, although I would prefer something that
> makes it a little clearer that the bit is only intended as a hint from
> the driver to indicate whether the device should attempt to keep
> storage allocated for the zeroed blocks, if that is indeed the
> intended behavior.
Yes, that's the original idea.  Adding a clear description to the specification 
may be better. 
> 
> Is there some in-kernel doc that describes what behavior the Linux
> block layer needs from a write zeroes command?
> 
> > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > > + bool unmap)
> >
> > Why is this an inline function?
> 
> I don't think there's any reason it needs to be inline; I can drop the
> inline in the next revision.
> 
> Given (as far as I can tell) your concerns seem to apply to the Write
> Zeroes command specifically, would it be reasonable to start with a
> patch that just adds support for the Discard command (along with fixes
> for Ming's feedback)?  This would be sufficient for my particular use
> case (although I can't speak for Changpeng), and we can revisit Write
> Zeroes once the spec concerns are worked out.
> 
> Thanks,
> -- Daniel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-15 Thread Liu, Changpeng


> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Monday, October 15, 2018 5:28 PM
> To: Daniel Verkamp 
> Cc: virtualization@lists.linux-foundation.org; linux-bl...@vger.kernel.org;
> Michael S. Tsirkin ; Jason Wang ;
> Jens Axboe ; Stefan Hajnoczi ; Liu,
> Changpeng 
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > From: Changpeng Liu 
> >
> > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> 
> There is some issues in this spec.  For one using the multiple ranges
> also for write zeroes is rather inefficient.  Write zeroes really should
> use the same format as read and write.
Because there is no length parameter for virtio block specification, adding the
two extra commands will not break the existing specification and driver 
implementation. 
Also existing Linux implementation for write zeroes will not use multiple 
segment
at all so there is always one range in practice.
> 
> Second the unmap flag isn't properly specified at all, as nothing
> says the device may not unmap without the unmap flag.  Please take
> a look at the SCSI or NVMe ѕpec for some guidance.
The unmap flag is only used for write zeroes command, as discard command will 
not 
guarantee the spaces will be zeroed, so adding this flag means (Discard + Write 
Zeroes),
so this definitely is backend related, the backend implementation can use same 
code
to implement discard and write zeroes commands.
> 
> > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > +   bool unmap)
> 
> Why is this an inline function?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-15 Thread Daniel Verkamp
On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig  wrote:
> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > From: Changpeng Liu 
> >
> > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
>
> There is some issues in this spec.  For one using the multiple ranges
> also for write zeroes is rather inefficient.  Write zeroes really should
> use the same format as read and write.

I wasn't involved in the writing of the spec, so I'll defer to Michael
and Changpeng here, but I'm not sure how "set in stone" the virtio
specification is, or if it can be updated somehow without breaking
compatibility.

I agree that Write Zeroes would be simpler to implement as a single
LBA + length rather than a list.  However, it's not really possible to
use the same format as the regular virtio block read/write commands
(VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
not specify a length explicitly; length is implied by the length of
the data buffer as defined by the virtio descriptor, but a Write
Zeroes command does not require a data buffer.  At best, this could be
a separate command mirroring the layout of struct virtio_blk_req but
with data replaced with a length field; I'm not sure that buys much in
the way of consistency.

> Second the unmap flag isn't properly specified at all, as nothing
> says the device may not unmap without the unmap flag.  Please take
> a look at the SCSI or NVMe ѕpec for some guidance.

This could probably use some clarifying text in the specification, but
given that there is nothing in the spec describing what the device
needs to do when unmap = 0, I would assume that the device can do
whatever it likes, as long as the blocks read back as 0s afterwards.
Reading back 0s is required by the definition of the Write Zeroes
command in the same virtio spec change.  It would probably be good to
clarify this and explicitly define what the device is allowed to do in
response to both settings of the unmap bit.

My understanding of the corresponding feature in NVMe (the Deallocate
bit in the Write Zeroes command) is that the only difference between
Deallocate = 1 and 0 is that the device "should" versus "may" (no
"shall" on either side) deallocate the corresponding blocks, but only
if the device supports reading 0s back after blocks are deallocated.
If the device does not support reading 0s after deallocation, it is
not allowed to deallocate blocks as part of a Write Zeroes command
regardless of the setting of the Deallocate bit.

Some similar wording could probably be added to the virtio spec to
clarify the meaning of unmap, although I would prefer something that
makes it a little clearer that the bit is only intended as a hint from
the driver to indicate whether the device should attempt to keep
storage allocated for the zeroed blocks, if that is indeed the
intended behavior.

Is there some in-kernel doc that describes what behavior the Linux
block layer needs from a write zeroes command?

> > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > + bool unmap)
>
> Why is this an inline function?

I don't think there's any reason it needs to be inline; I can drop the
inline in the next revision.

Given (as far as I can tell) your concerns seem to apply to the Write
Zeroes command specifically, would it be reasonable to start with a
patch that just adds support for the Discard command (along with fixes
for Ming's feedback)?  This would be sufficient for my particular use
case (although I can't speak for Changpeng), and we can revisit Write
Zeroes once the spec concerns are worked out.

Thanks,
-- Daniel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-15 Thread Christoph Hellwig
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> From: Changpeng Liu 
> 
> In commit 88c85538, "virtio-blk: add discard and write zeroes features
> to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio

There is some issues in this spec.  For one using the multiple ranges
also for write zeroes is rather inefficient.  Write zeroes really should
use the same format as read and write.

Second the unmap flag isn't properly specified at all, as nothing
says the device may not unmap without the unmap flag.  Please take
a look at the SCSI or NVMe ѕpec for some guidance.

> +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> + bool unmap)

Why is this an inline function?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8] virtio_blk: add discard and write zeroes support

2018-10-14 Thread Michael S. Tsirkin
On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> From: Changpeng Liu 
> 
> In commit 88c85538, "virtio-blk: add discard and write zeroes features
> to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> block specification has been extended to add VIRTIO_BLK_T_DISCARD and
> VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
> discard and write zeroes in the virtio-blk driver when the device
> advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
> VIRTIO_BLK_F_WRITE_ZEROES.
> 
> Signed-off-by: Changpeng Liu 
> Signed-off-by: Daniel Verkamp 

Cc Paolo as well.

> ---
> dverkamp: I've picked up this patch and made a few minor changes (as
> listed below); most notably, I changed the kmalloc back to GFP_ATOMIC,
> since it can be called from a context where sleeping is not allowed.
> To prevent large allocations, I've also clamped the maximum number of
> discard segments to 256; this results in a 4K allocation and should be
> plenty of descriptors for most use cases.
> 
> I also removed most of the description from the commit message, since it
> was duplicating the comments from virtio_blk.h and quoting parts of the
> spec without adding any extra information.  I have tested this iteration
> of the patch using crosvm with modifications to enable the new features:
> https://chromium.googlesource.com/chromiumos/platform/crosvm/
> 
> CHANGELOG:
> v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant
> v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify
> descriptor flags field; comment wording cleanups.
> v6: don't set T_OUT bit to discard and write zeroes commands.
> v5: use new block layer API: blk_queue_flag_set.
> v4: several optimizations based on MST's comments, remove bit field
> usage for command descriptor.
> v3: define the virtio-blk protocol to add discard and write zeroes
> support, first version implementation based on proposed specification.
> v2: add write zeroes command support.
> v1: initial proposal implementation for discard command.
> ---
>  drivers/block/virtio_blk.c  | 95 -
>  include/uapi/linux/virtio_blk.h | 54 +++
>  2 files changed, 147 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 23752dc99b00..04a7ae602e2f 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -18,6 +18,7 @@
>  
>  #define PART_BITS 4
>  #define VQ_NAME_LEN 16
> +#define MAX_DISCARD_SEGMENTS 256
>  
>  static int major;
>  static DEFINE_IDA(vd_index_ida);
> @@ -172,10 +173,50 @@ static int virtblk_add_req(struct virtqueue *vq, struct 
> virtblk_req *vbr,
>   return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
>  }
>  
> +
> +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> + bool unmap)
> +{
> + unsigned short segments = blk_rq_nr_discard_segments(req);
> + unsigned short n = 0;
> + struct virtio_blk_discard_write_zeroes *range;
> + struct bio *bio;
> + u32 flags = 0;
> +
> + if (unmap)
> + flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
> +
> + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
> + if (!range)
> + return -ENOMEM;
> +
> + __rq_for_each_bio(bio, req) {
> + u64 sector = bio->bi_iter.bi_sector;
> + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> +
> + range[n].flags = cpu_to_le32(flags);
> + range[n].num_sectors = cpu_to_le32(num_sectors);
> + range[n].sector = cpu_to_le64(sector);
> + n++;
> + }
> +
> + req->special_vec.bv_page = virt_to_page(range);
> + req->special_vec.bv_offset = offset_in_page(range);
> + req->special_vec.bv_len = sizeof(*range) * segments;
> + req->rq_flags |= RQF_SPECIAL_PAYLOAD;
> +
> + return 0;
> +}
> +
>  static inline void virtblk_request_done(struct request *req)
>  {
>   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>  
> + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
> + kfree(page_address(req->special_vec.bv_page) +
> +   req->special_vec.bv_offset);
> + }
> +
>   switch (req_op(req)) {
>   case REQ_OP_SCSI_IN:
>   case REQ_OP_SCSI_OUT:
> @@ -225,6 +266,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx 
> *hctx,
>   int qid = hctx->queue_num;
>   int err;
>   bool notify = false;
> + bool unmap = false;
>   u32 type;
>  
>   BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> @@ -237,6 +279,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx 
> *hctx,
>   case REQ_OP_FLUSH:
>   type = VIRTIO_BLK_T_FLUSH;
>   break;
> + case REQ_OP_DISCARD:
> + type = VIRTIO_BLK_T_DISCARD;
> + break;
> + case REQ_OP_WRITE_ZEROES:
> +