Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set
On Nov 3 21:37, Philippe Mathieu-Daudé wrote:
> On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
> >> -----Original Message-----
> >> From: Philippe Mathieu-Daudé
> ...
> >>>  typedef struct QEMU_PACKED NvmeCqe {
> >>> -    uint32_t    result;
> >>> -    uint32_t    rsvd;
> >>> +    union {
> >>> +        uint64_t result64;
> >>> +        uint32_t result32;
> >>> +    };
> >>
> >> When using packed structure you want to define all fields to
> >> avoid alignment confusion (and I'm surprised the compiler doesn't
> >> complain...). So this would be:
> >>
> >>     union {
> >>         uint64_t result64;
> >>         struct {
> >>             uint32_t    result32;
> >>             uint32_t    rsvd32;
> >>         };
> >>     };
> >>

I align (hehe...) towards this. The amount of bit-juggling we need for
commands justifies the need for separate NvmeCmd's, but in this case I
think an NvmeCqeZA is just unnecessary clutter.

If the result value is complex, the approach used by AERs is better I think:

    typedef struct QEMU_PACKED NvmeAerResult {
        uint8_t event_type;
        uint8_t event_info;
        uint8_t log_page;
        uint8_t resv;
    } NvmeAerResult;

    NvmeAerResult *result = (NvmeAerResult *)&req->cqe.result32;

Since storing the Zone Append ALBA in the result64 isn't really a
complex operation, let's just assign it into that member directly.

(Addendum) That DW1 is command specific and no longer reserved is
defined by TP 4056 (Namespace Types) - not v1.4 or any of its revisions.
Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set
On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
>> -----Original Message-----
>> From: Philippe Mathieu-Daudé
...
>>>  typedef struct QEMU_PACKED NvmeCqe {
>>> -    uint32_t    result;
>>> -    uint32_t    rsvd;
>>> +    union {
>>> +        uint64_t result64;
>>> +        uint32_t result32;
>>> +    };
>>
>> When using packed structure you want to define all fields to
>> avoid alignment confusion (and I'm surprised the compiler doesn't
>> complain...). So this would be:
>>
>>     union {
>>         uint64_t result64;
>>         struct {
>>             uint32_t    result32;
>>             uint32_t    rsvd32;
>>         };
>>     };
>>
>
> IMO, the compiler doesn't complain because it's a union. Smaller
> variants in unions are "padded" to the size of the largest variant
> regardless of whether the struct is packed or not.
>
>> But since the ZNS is still a technical proposal and not in the spec,
>> this doesn't look correct (the spec lists this field as 32-bit).
>>
>> What do you think about adding NvmeCqeZNS?
>>
>> Maybe:
>>
>> typedef struct QEMU_PACKED NvmeCqeZNS {
>>     uint64_t    result;
>>     uint16_t    sq_head;
>>     uint16_t    sq_id;
>>     uint16_t    cid;
>>     uint16_t    status;
>> } NvmeCqeZNS;
>>
>> Or clever:
>>
>> typedef union QEMU_PACKED NvmeCqeZNS {
>>     union {
>>         struct {
>>             uint64_t result;
>>             uint32_t dw2;
>>             uint32_t dw3;
>>         };
>>         NvmeCqe cqe;
>>     };
>> } NvmeCqeZNS;
>>
>
> The 1.4 base spec changes Reserved DW1 in CQE to become the
> Command Specific DW1, so it would rather make sense to define
> a command-specific CQE for Zone Append -
>
> In include/block/nvme.h:
>
>  typedef struct QEMU_PACKED NvmeCqe {
>      uint32_t    result;
> -    uint32_t    rsvd;
> +    uint32_t    dw1;
>      uint16_t    sq_head;
>      uint16_t    sq_id;
>      uint16_t    cid;
>      uint16_t    status;
>  } NvmeCqe;
>
> +/* Zone Append - specific CQE */
> +typedef struct QEMU_PACKED NvmeCqeZA {
> +    uint64_t    za_slba;
> +    uint16_t    sq_head;
> +    uint16_t    sq_id;
> +    uint16_t    cid;
> +    uint16_t    status;
> +} NvmeCqeZA;
>
> ...
>
> +QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != sizeof(NvmeCqeZA));
>
> This will go away with all CQE unions and it will also allow the
> returned SLBA value to be properly named.
> What do you think?

This is cleaner, I like it :)

>
>> I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".
>
> NvmeCqeZA could simply be defined in include/block/nvme.h next to NvmeCqe.
> The problem with adding include/block/nvme-zns.h is that it would be hard
> if not impossible to separate all ZNS-specific content from block/nvme.h
> and it would become necessary for developers to deal with two files that
> present different parts of ZNS definitions instead of just one.

Got it.

Regards,

Phil.

>
> Best regards,
> Dmitry
RE: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set
> -----Original Message-----
> From: Philippe Mathieu-Daudé
> Sent: Friday, October 30, 2020 3:16 AM
> To: Dmitry Fomichev ; Keith Busch ; Klaus Jensen ; Kevin Wolf ;
> Maxim Levitsky ; Fam Zheng
> Cc: Alistair Francis ; Matias Bjorling ; Niklas Cassel ;
> Damien Le Moal ; qemu-bl...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace
> Command Set
>
> Hi Dmitry,
>
> On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> > when "zoned" device property is not set (default) and Zoned
> > Namespace Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> > emulator. Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> > Signed-off-by: Niklas Cassel
> > Signed-off-by: Hans Holmberg
> > Signed-off-by: Ajay Joshi
> > Signed-off-by: Chaitanya Kulkarni
> > Signed-off-by: Matias Bjorling
> > Signed-off-by: Aravind Ramesh
> > Signed-off-by: Shin'ichiro Kawasaki
> > Signed-off-by: Adam Manzanares
> > Signed-off-by: Dmitry Fomichev
> > Reviewed-by: Niklas Cassel
> > ---
> >  block/nvme.c          |   2 +-
> >  hw/block/nvme-ns.c    | 173
> >  hw/block/nvme-ns.h    |  54 +++
> >  hw/block/nvme.c       | 977 +-
> >  hw/block/nvme.h       |   8 +
> >  hw/block/trace-events |  18 +-
> >  include/block/nvme.h  | 113 -
>
> When you start modifying include/ files, it is recommended
> to start using scripts/git.orderfile as this makes review
> easier (no need to scroll back / up constantly).
>

Hi Philippe,

Thanks for the suggestion, indeed it makes browsing through diffs
much easier!

> As "block/nvme.h" is shared by 2 subsystems, keeping its
> changes in a separate patch is preferred.
>

No problem, I'll move all changes to include/block/nvme.h to a
separate patch.

> >  7 files changed, 1322 insertions(+), 23 deletions(-)
> >
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 05485fdd11..7a513c9a17 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
> >  {
> >      uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
> >      if (status) {
> > -        trace_nvme_error(le32_to_cpu(c->result),
> > +        trace_nvme_error(le32_to_cpu(c->result32),
> >                           le16_to_cpu(c->sq_head),
> >                           le16_to_cpu(c->sq_id),
> >                           le16_to_cpu(c->cid),
> ...
> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> > index 3653b4aefc..ba8a45edf5 100644
> > --- a/include/block/nvme.h
> > +++ b/include/block/nvme.h
> > @@ -489,6 +489,9 @@ enum NvmeIoCommands {
> >      NVME_CMD_COMPARE        = 0x05,
> >      NVME_CMD_WRITE_ZEROES   = 0x08,
> >      NVME_CMD_DSM            = 0x09,
> > +    NVME_CMD_ZONE_MGMT_SEND = 0x79,
> > +    NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> > +    NVME_CMD_ZONE_APPEND    = 0x7d,
> >  };
> >
> >  typedef struct QEMU_PACKED NvmeDeleteQ {
> > @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeA
Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set
On Fri, Oct 30, 2020 at 11:32:38AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
>
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
>
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
>
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
>
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
>
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
>
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> Signed-off-by: Niklas Cassel
> Signed-off-by: Hans Holmberg
> Signed-off-by: Ajay Joshi
> Signed-off-by: Chaitanya Kulkarni
> Signed-off-by: Matias Bjorling
> Signed-off-by: Aravind Ramesh
> Signed-off-by: Shin'ichiro Kawasaki
> Signed-off-by: Adam Manzanares
> Signed-off-by: Dmitry Fomichev
> Reviewed-by: Niklas Cassel

(snip)

> +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> +{
> +    NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
> +    NvmeNamespace *ns = req->ns;
> +    /* cdw12 is zero-based number of dwords to return. Convert to bytes */
> +    uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> +    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +    uint32_t zone_idx, zra, zrasf, partial;
> +    uint64_t max_zones, nr_zones = 0;
> +    uint16_t ret;
> +    uint64_t slba;
> +    NvmeZoneDescr *z;
> +    NvmeZone *zs;
> +    NvmeZoneReportHeader *header;
> +    void *buf, *buf_p;
> +    size_t zone_entry_sz;
> +
> +    req->status = NVME_SUCCESS;
> +
> +    ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    if (len < sizeof(NvmeZoneReportHeader)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }

Just like nvme_read() and nvme_write(), nvme_zone_mgmt_recv() has to do
something like:

+    ret = nvme_check_mdts(n, len);
+    if (ret) {
+        trace_pci_nvme_err_mdts(nvme_cid(req), len);
+        return ret;
+    }
+

To see that we are not exceeding MDTS.
Kind regards,
Niklas

> +
> +    zra = dw13 & 0xff;
> +    if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    if (zra == NVME_ZONE_REPORT_EXTENDED) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    zrasf = (dw13 >> 8) & 0xff;
> +    if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    partial = (dw13 >> 16) & 0x01;
> +
> +    zone_entry_sz = sizeof(NvmeZoneDescr);
> +
> +    max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> +    buf = g_malloc0(len);
> +
> +    header = (NvmeZoneReportHeader *)buf;
> +    buf_p = buf + sizeof(NvmeZoneReportHeader);
> +
> +    while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> +        zs = &ns->zone_array[zone_idx];
> +
> +        if (!nvme_zone_matches_filter(zrasf, zs)) {
> +            zone_idx++;
> +            continue;
> +        }
> +
> +        z = (NvmeZoneDescr *)buf_p;
> +        buf_p += sizeof(NvmeZoneDescr);
> +        nr_zones++;
> +
> +        z->zt = zs->d.zt;
> +        z->zs = zs->d.zs;
> +        z->zcap = cpu_to_le64(zs->d.zcap);
> +        z->zslba = cpu_to_le64(zs->d.zslba);
> +        z->za = zs->d.za;
> +
> +        if (nvme_wp_is_valid(zs)) {
> +            z->wp = cpu_to_le64(zs->d.wp);
> +        } else {
> +            z->wp = cpu_to_le64(~0ULL);
> +        }
> +
> +        zone_idx++;
> +    }
> +
> +    if (!partial) {
> +        for (; zone_idx < ns->num_zones; zone_idx++) {
> +            zs = &ns->zone_array[zone_idx];
> +            if (nvme_zone_matches_filter(zrasf, zs)) {
> +                nr_zones++;
> +            }
> +        }
> +    }
> +    header->nr_zones = cpu_to_le64(nr_zones);
> +
> +    ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req);
> +
> +    g_free(buf);
> +
> +    return ret;
> +}
Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set
Hi Dmitry,

On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
>
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
>
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
>
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
>
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
>
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
>
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> Signed-off-by: Niklas Cassel
> Signed-off-by: Hans Holmberg
> Signed-off-by: Ajay Joshi
> Signed-off-by: Chaitanya Kulkarni
> Signed-off-by: Matias Bjorling
> Signed-off-by: Aravind Ramesh
> Signed-off-by: Shin'ichiro Kawasaki
> Signed-off-by: Adam Manzanares
> Signed-off-by: Dmitry Fomichev
> Reviewed-by: Niklas Cassel
> ---
>  block/nvme.c          |   2 +-
>  hw/block/nvme-ns.c    | 173
>  hw/block/nvme-ns.h    |  54 +++
>  hw/block/nvme.c       | 977 +-
>  hw/block/nvme.h       |   8 +
>  hw/block/trace-events |  18 +-
>  include/block/nvme.h  | 113 -

When you start modifying include/ files, it is recommended
to start using scripts/git.orderfile as this makes review
easier (no need to scroll back / up constantly).

As "block/nvme.h" is shared by 2 subsystems, keeping its
changes in a separate patch is preferred.

>  7 files changed, 1322 insertions(+), 23 deletions(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  {
>      uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
>      if (status) {
> -        trace_nvme_error(le32_to_cpu(c->result),
> +        trace_nvme_error(le32_to_cpu(c->result32),
>                           le16_to_cpu(c->sq_head),
>                           le16_to_cpu(c->sq_id),
>                           le16_to_cpu(c->cid),
...
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 3653b4aefc..ba8a45edf5 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -489,6 +489,9 @@ enum NvmeIoCommands {
>      NVME_CMD_COMPARE        = 0x05,
>      NVME_CMD_WRITE_ZEROES   = 0x08,
>      NVME_CMD_DSM            = 0x09,
> +    NVME_CMD_ZONE_MGMT_SEND = 0x79,
> +    NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> +    NVME_CMD_ZONE_APPEND    = 0x7d,
>  };
>
>  typedef struct QEMU_PACKED NvmeDeleteQ {
> @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult {
>  } NvmeAerResult;
>
>  typedef struct QEMU_PACKED NvmeCqe {
> -    uint32_t    result;
> -    uint32_t    rsvd;
> +    union {
> +        uint64_t result64;
> +        uint32_t result32;
> +    };

When using packed structure you want to define all fields to
avoid alignment confusion (and I'm surprised the compiler doesn't
complain...). So this would be:

    union {
        uint64_t result64;
        struct {
            uint32_t    result32;
            uint32_t    rsvd32;
        };
    };

But since the ZNS is still a technical proposal and not in the spec,
this doesn't look correct (the spec lists this field as 32-bit).

What do you think about adding NvmeCqeZNS?

Maybe:

typedef struct QEMU_PACKED NvmeCqeZNS {
    uint64_t    result;
    uint16_t    sq_head;
    uint16_t    sq_id;
    uint16_t    cid;
    uint16_t    status;
} NvmeCqeZNS;

Or clever:

typedef union QEMU_PACKED NvmeCqeZNS {
    union {
        struct {
            uint64_t result;
            uint32_t dw2;
            uint32_t dw3;
        };
        NvmeCqe cqe;
    };
} NvmeCqeZNS;

I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".

>      uint16_t    sq_head;
>      uint16_t    sq_id;
>      uint16_t    cid;
> @@ -678,6 +683,7 @@ enum NvmeStatusCodes {
>      NVME_SGL_DESCR_TYPE_INVALID =