Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Klaus Jensen
On Nov  3 21:37, Philippe Mathieu-Daudé wrote:
> On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
> >> -Original Message-
> >> From: Philippe Mathieu-Daudé 
> ...
> >>>  typedef struct QEMU_PACKED NvmeCqe {
> >>> -uint32_tresult;
> >>> -uint32_trsvd;
> >>> +union {
> >>> +uint64_t result64;
> >>> +uint32_t result32;
> >>> +};
> >>
> >> When using packed structure you want to define all fields to
> >> avoid alignment confusion (and I'm surprised the compiler doesn't
> >> complain...). So this would be:
> >>
> >>union {
> >>uint64_t result64;
> >>struct {
> >>uint32_tresult32;
> >>uint32_trsvd32;
> >>};
> >>};
> >>

I align (hehe...) towards this. The amount of bit-juggling we need for
commands justify the need for separate NvmeCmd's, but in this case I
think an NvmeCqeZA is just unnecessary clutter. If the result value is
complex, the approach used by AERs is better I think:

   typedef struct QEMU_PACKED NvmeAerResult {
   uint8_t event_type;
   uint8_t event_info;
   uint8_t log_page;
   uint8_t resv;
   } NvmeAerResult;

   NvmeAerResult *result = (NvmeAerResult *)>cqe.result32;

Since storing the Zone Append ALBA in the result64 isn't really a
complex operation, let's just assign it into that member directly.

(Addendum) That DW1 is command specific and no longer reserved is
defined by TP 4056 (Namespace Types) - not v1.4 or any of its revisions.


signature.asc
Description: PGP signature


Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Philippe Mathieu-Daudé
On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
>> -Original Message-
>> From: Philippe Mathieu-Daudé 
...
>>>  typedef struct QEMU_PACKED NvmeCqe {
>>> -uint32_tresult;
>>> -uint32_trsvd;
>>> +union {
>>> +uint64_t result64;
>>> +uint32_t result32;
>>> +};
>>
>> When using packed structure you want to define all fields to
>> avoid alignment confusion (and I'm surprised the compiler doesn't
>> complain...). So this would be:
>>
>>union {
>>uint64_t result64;
>>struct {
>>uint32_tresult32;
>>uint32_trsvd32;
>>};
>>};
>>
> 
> IMO, the compiler doesn't complain because it's a union. Smaller
> variants in unions are "padded" to the size of the largest variant
> regardless of whether the struct is packed or not.
> 
>> But since the ZNS is still a technical proposal and not in the spec,
>> this doesn't look correct (the spec list this field as 32-bit).
>>
>> What do you think about adding NvmeCqeZNS?
>>
>> Maybe:
>>
>>   typedef struct QEMU_PACKED NvmeCqeZNS {
>>   uint64_tresult;
>>   uint16_tsq_head;
>>   uint16_tsq_id;
>>   uint16_tcid;
>>   uint16_tstatus;
>>   } NvmeCqeZNS;
>>
>> Or clever:
>>
>>   typedef union QEMU_PACKED NvmeCqeZNS {
>>   union {
>>   struct {
>>   uint64_t result;
>>   uint32_t dw2;
>>   uint32_t dw3;
>>   };
>>   NvmeCqe  cqe;
>>   };
>>   } NvmeCqeZNS;
>>
> 
> The 1.4 base spec changes Reserved DW1 in CQE to become the
> Command Specific DW1, so it would rather make sense to define
> a command-specific CQE for Zone Append -
> 
> In include/block/nvme.h:
> 
> typedef struct QEMU_PACKED NvmeCqe {
>  uint32_tresult;
> -uint32_trsvd;
> +uint32_tdw1;
>  uint16_tsq_head;
>  uint16_tsq_id;
>  uint16_tcid;
>  uint16_tstatus;
> } NvmeCqe;
> 
> +/* Zone Append - specific CQE */
> +typedef struct QEMU_PACKED NvmeCqeZA {
> +uint64_tza_slba;
> +uint16_tsq_head;
> +uint16_tsq_id;
> +uint16_tcid;
> +uint16_tstatus;
> +} NvmeCqeZA;
> 
> ...
> 
> +QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != sizeof(NvmeCqeZA));
> 
> This will go away with all CQE unions and it will also allow the returned SLBA
> value to be properly named. What do you think?

This is cleaner, I like it :)

> 
>> I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".
> 
> NvmeCqeZA could simply be defined in include/block/nvme.h next to NvmeCqe.
> The problem with adding include/block/nvme-zns.h is that it would be hard if
> not impossible to separate all ZNS-specific content from block/nvme.h and it
> would become necessary for developers to deal with two files that present
> different parts of ZNS definitions instead of just one.

Got it.

Regards,

Phil.

> 
> Best regards,
> Dmitry




RE: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Dmitry Fomichev
> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Friday, October 30, 2020 3:16 AM
> To: Dmitry Fomichev ; Keith Busch
> ; Klaus Jensen ; Kevin Wolf
> ; Maxim Levitsky ; Fam Zheng
> 
> Cc: Alistair Francis ; Matias Bjorling
> ; Niklas Cassel ;
> Damien Le Moal ; qemu-bl...@nongnu.org;
> qemu-devel@nongnu.org
> Subject: Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> Hi Dmitry,
> 
> On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> emulator.
> > Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > Reviewed-by: Niklas Cassel 
> > ---
> >  block/nvme.c  |   2 +-
> >  hw/block/nvme-ns.c| 173 
> >  hw/block/nvme-ns.h|  54 +++
> >  hw/block/nvme.c   | 977
> +-
> >  hw/block/nvme.h   |   8 +
> >  hw/block/trace-events |  18 +-
> >  include/block/nvme.h  | 113 -
> 
> When you start modifying include/ files, it is recommended
> to start using scripts/git.orderfile as this makes review
> easier (no need to scroll back / up constantly).
>

Hi Philippe,

Thanks for the suggestion, indeed it makes browsing through
diffs much easier!

> As "block/nvme.h" is shared by 2 subsystems, keeping its
> changes in a separate patch is preferred.
>

No problem, I'll move all changes to include/block/nvme.h to
a separate patch.
 
> >  7 files changed, 1322 insertions(+), 23 deletions(-)
> >
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 05485fdd11..7a513c9a17 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const
> NvmeCqe *c)
> >  {
> >  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
> >  if (status) {
> > -trace_nvme_error(le32_to_cpu(c->result),
> > +trace_nvme_error(le32_to_cpu(c->result32),
> >   le16_to_cpu(c->sq_head),
> >   le16_to_cpu(c->sq_id),
> >   le16_to_cpu(c->cid),
> ...
> 
> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> > index 3653b4aefc..ba8a45edf5 100644
> > --- a/include/block/nvme.h
> > +++ b/include/block/nvme.h
> > @@ -489,6 +489,9 @@ enum NvmeIoCommands {
> >  NVME_CMD_COMPARE= 0x05,
> >  NVME_CMD_WRITE_ZEROES   = 0x08,
> >  NVME_CMD_DSM= 0x09,
> > +NVME_CMD_ZONE_MGMT_SEND = 0x79,
> > +NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> > +NVME_CMD_ZONE_APPEND= 0x7d,
> >  };
> >
> >  typedef struct QEMU_PACKED NvmeDeleteQ {
> > @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeA

Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Niklas Cassel
On Fri, Oct 30, 2020 at 11:32:38AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 

(snip)

> +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> +{
> +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> +NvmeNamespace *ns = req->ns;
> +/* cdw12 is zero-based number of dwords to return. Convert to bytes */
> +uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +uint32_t zone_idx, zra, zrasf, partial;
> +uint64_t max_zones, nr_zones = 0;
> +uint16_t ret;
> +uint64_t slba;
> +NvmeZoneDescr *z;
> +NvmeZone *zs;
> +NvmeZoneReportHeader *header;
> +void *buf, *buf_p;
> +size_t zone_entry_sz;
> +
> +req->status = NVME_SUCCESS;
> +
> +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> +if (ret) {
> +return ret;
> +}
> +
> +if (len < sizeof(NvmeZoneReportHeader)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}

Just like nvme_read() and nvme_write(), nvme_zone_mgmt_recv()
has to do something like:

+ret = nvme_check_mdts(n, len);
+if (ret) {
+trace_pci_nvme_err_mdts(nvme_cid(req), len);
+return ret;
+}
+

To see that we are not exceeding MDTS.


Kind regards,
Niklas


> +
> +zra = dw13 & 0xff;
> +if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +if (zra == NVME_ZONE_REPORT_EXTENDED) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +zrasf = (dw13 >> 8) & 0xff;
> +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +partial = (dw13 >> 16) & 0x01;
> +
> +zone_entry_sz = sizeof(NvmeZoneDescr);
> +
> +max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> +buf = g_malloc0(len);
> +
> +header = (NvmeZoneReportHeader *)buf;
> +buf_p = buf + sizeof(NvmeZoneReportHeader);
> +
> +while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> +zs = >zone_array[zone_idx];
> +
> +if (!nvme_zone_matches_filter(zrasf, zs)) {
> +zone_idx++;
> +continue;
> +}
> +
> +z = (NvmeZoneDescr *)buf_p;
> +buf_p += sizeof(NvmeZoneDescr);
> +nr_zones++;
> +
> +z->zt = zs->d.zt;
> +z->zs = zs->d.zs;
> +z->zcap = cpu_to_le64(zs->d.zcap);
> +z->zslba = cpu_to_le64(zs->d.zslba);
> +z->za = zs->d.za;
> +
> +if (nvme_wp_is_valid(zs)) {
> +z->wp = cpu_to_le64(zs->d.wp);
> +} else {
> +z->wp = cpu_to_le64(~0ULL);
> +}
> +
> +zone_idx++;
> +}
> +
> +if (!partial) {
> +for (; zone_idx < ns->num_zones; zone_idx++) {
> +zs = >zone_array[zone_idx];
> +if (nvme_zone_matches_filter(zrasf, zs)) {
> +nr_zones++;
> +}
> +}
> +}
> +header->nr_zones = cpu_to_le64(nr_zones);
> +
> +ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req);
> +
> +g_free(buf);
> +
> +return ret;
> +}


Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Philippe Mathieu-Daudé
Hi Dmitry,

On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 
> ---
>  block/nvme.c  |   2 +-
>  hw/block/nvme-ns.c| 173 
>  hw/block/nvme-ns.h|  54 +++
>  hw/block/nvme.c   | 977 +-
>  hw/block/nvme.h   |   8 +
>  hw/block/trace-events |  18 +-
>  include/block/nvme.h  | 113 -

When you start modifying include/ files, it is recommended
to start using scripts/git.orderfile as this makes review
easier (no need to scroll back / up constantly).

As "block/nvme.h" is shared by 2 subsystems, keeping its
changes in a separate patch is preferred.

>  7 files changed, 1322 insertions(+), 23 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  {
>  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
>  if (status) {
> -trace_nvme_error(le32_to_cpu(c->result),
> +trace_nvme_error(le32_to_cpu(c->result32),
>   le16_to_cpu(c->sq_head),
>   le16_to_cpu(c->sq_id),
>   le16_to_cpu(c->cid),
...

> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 3653b4aefc..ba8a45edf5 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -489,6 +489,9 @@ enum NvmeIoCommands {
>  NVME_CMD_COMPARE= 0x05,
>  NVME_CMD_WRITE_ZEROES   = 0x08,
>  NVME_CMD_DSM= 0x09,
> +NVME_CMD_ZONE_MGMT_SEND = 0x79,
> +NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> +NVME_CMD_ZONE_APPEND= 0x7d,
>  };
>  
>  typedef struct QEMU_PACKED NvmeDeleteQ {
> @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult {
>  } NvmeAerResult;
>  
>  typedef struct QEMU_PACKED NvmeCqe {
> -uint32_tresult;
> -uint32_trsvd;
> +union {
> +uint64_t result64;
> +uint32_t result32;
> +};

When using packed structure you want to define all fields to
avoid alignment confusion (and I'm surprised the compiler doesn't
complain...). So this would be:

   union {
   uint64_t result64;
   struct {
   uint32_tresult32;
   uint32_trsvd32;
   };
   };

But since the ZNS is still a technical proposal and not in the spec,
this doesn't look correct (the spec list this field as 32-bit).

What do you think about adding NvmeCqeZNS?

Maybe:

  typedef struct QEMU_PACKED NvmeCqeZNS {
  uint64_tresult;
  uint16_tsq_head;
  uint16_tsq_id;
  uint16_tcid;
  uint16_tstatus;
  } NvmeCqeZNS;

Or clever:

  typedef union QEMU_PACKED NvmeCqeZNS {
  union {
  struct {
  uint64_t result;
  uint32_t dw2;
  uint32_t dw3;
  };
  NvmeCqe  cqe;
  };
  } NvmeCqeZNS;

I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".

>  uint16_tsq_head;
>  uint16_tsq_id;
>  uint16_tcid;
> @@ -678,6 +683,7 @@ enum NvmeStatusCodes {
>  NVME_SGL_DESCR_TYPE_INVALID =