> -----Original Message-----
> From: Niklas Cassel <niklas.cas...@wdc.com>
> Sent: Friday, November 6, 2020 6:59 AM
> To: Dmitry Fomichev <dmitry.fomic...@wdc.com>
> Cc: Keith Busch <kbu...@kernel.org>; Klaus Jensen <k.jen...@samsung.com>;
> Kevin Wolf <kw...@redhat.com>; Philippe Mathieu-Daudé <phi...@redhat.com>;
> Max Reitz <mre...@redhat.com>; Maxim Levitsky <mlevi...@redhat.com>;
> Fam Zheng <f...@euphon.net>; Alistair Francis <alistair.fran...@wdc.com>;
> Matias Bjorling <matias.bjorl...@wdc.com>; Damien Le Moal <damien.lem...@wdc.com>;
> qemu-block@nongnu.org; qemu-de...@nongnu.org
> Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set
>
> On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise the NVM Command Set
> > when the "zoned" device property is not set (the default) and the
> > Zoned Namespace Command Set otherwise.
> >
> > Define the values and structures that are needed to support the Zoned
> > Namespace Command Set (NVMe TP 4053) in the PCI NVMe controller
> > emulator. Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning the entire zone array
> > (which can potentially be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for the three new NVMe commands introduced in the Zoned
> > Namespace Command Set specification are added, namely Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > The Read/Write command handler is modified to only allow writes at
> > the write pointer if the namespace is zoned.
> > For the Zone Append command, writes implicitly happen at the write
> > pointer and the starting write pointer value is returned as the
> > result of the command. The Write Zeroes handler is modified to add
> > zoned checks that are identical to those done as part of the Write
> > flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel <niklas.cas...@wdc.com>
> > Signed-off-by: Hans Holmberg <hans.holmb...@wdc.com>
> > Signed-off-by: Ajay Joshi <ajay.jo...@wdc.com>
> > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulka...@wdc.com>
> > Signed-off-by: Matias Bjorling <matias.bjorl...@wdc.com>
> > Signed-off-by: Aravind Ramesh <aravind.ram...@wdc.com>
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawas...@wdc.com>
> > Signed-off-by: Adam Manzanares <adam.manzana...@wdc.com>
> > Signed-off-by: Dmitry Fomichev <dmitry.fomic...@wdc.com>
> > Reviewed-by: Niklas Cassel <niklas.cas...@wdc.com>
> > ---
> >  hw/block/nvme-ns.h    |  54 +++
> >  hw/block/nvme.h       |   8 +
> >  hw/block/nvme-ns.c    | 173 ++++++++
> >  hw/block/nvme.c       | 971 +++++++++++++++++++++++++++++++++++++++++-
> >  hw/block/trace-events |  18 +-
> >  5 files changed, 1209 insertions(+), 15 deletions(-)
> >
> > (snip)
> >
> > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> > +{
> > +    NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
> > +    NvmeNamespace *ns = req->ns;
> > +    /* cdw12 is the zero-based number of dwords to return.
Convert to bytes > */ > > + uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2; > > + uint32_t dw13 = le32_to_cpu(cmd->cdw13); > > + uint32_t zone_idx, zra, zrasf, partial; > > + uint64_t max_zones, nr_zones = 0; > > + uint16_t ret; > > + uint64_t slba; > > + NvmeZoneDescr *z; > > + NvmeZone *zs; > > + NvmeZoneReportHeader *header; > > + void *buf, *buf_p; > > + size_t zone_entry_sz; > > + > > + req->status = NVME_SUCCESS; > > + > > + ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); > > + if (ret) { > > + return ret; > > + } > > + > > + zra = dw13 & 0xff; > > + if (zra != NVME_ZONE_REPORT) { > > + return NVME_INVALID_FIELD | NVME_DNR; > > + } > > + > > + zrasf = (dw13 >> 8) & 0xff; > > + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { > > + return NVME_INVALID_FIELD | NVME_DNR; > > + } > > + > > + if (data_size < sizeof(NvmeZoneReportHeader)) { > > + return NVME_INVALID_FIELD | NVME_DNR; > > + } > > + > > + ret = nvme_map_dptr(n, data_size, req); > > nvme_map_dptr() call was not here in v8 patch set. > > On v7 I commented that you were missing a call to nvme_check_mdts(). > I think you still need to call nvme_check_mdts in order to verify > that data_size < mdts, no?
Ugh, I've added nvme_map_dptr() instead of nvme_check_mdts() :o
Will send the corrected version shortly...

Cheers,
DF

>
> This function already has a call to nvme_dma(). nvme_dma() already
> calls nvme_map_dptr(). If you use nvme_dma(), you cannot use
> nvme_map_dptr().
>
> It will call nvme_map_addr() (which calls qemu_sglist_add()) on the
> same buffer twice, causing the qsg->size to be twice what the user
> sent in. Which will cause the:
>
>     if (unlikely(residual)) {
>
> check in nvme_dma() to fail.
>
> Looking at nvme_read()/nvme_write(), they use nvme_map_dptr()
> (without any call to nvme_dma()), and then use dma_blk_read() or
> dma_blk_write(). (And they both call nvme_check_mdts().)
>
> Kind regards,
> Niklas
>
> > +    if (ret) {
> > +        return ret;
> > +    }
> > +
> > +    partial = (dw13 >> 16) & 0x01;
> > +
> > +    zone_entry_sz = sizeof(NvmeZoneDescr);
> > +
> > +    max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> > +    buf = g_malloc0(data_size);
> > +
> > +    header = (NvmeZoneReportHeader *)buf;
> > +    buf_p = buf + sizeof(NvmeZoneReportHeader);
> > +
> > +    while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> > +        zs = &ns->zone_array[zone_idx];
> > +
> > +        if (!nvme_zone_matches_filter(zrasf, zs)) {
> > +            zone_idx++;
> > +            continue;
> > +        }
> > +
> > +        z = (NvmeZoneDescr *)buf_p;
> > +        buf_p += sizeof(NvmeZoneDescr);
> > +        nr_zones++;
> > +
> > +        z->zt = zs->d.zt;
> > +        z->zs = zs->d.zs;
> > +        z->zcap = cpu_to_le64(zs->d.zcap);
> > +        z->zslba = cpu_to_le64(zs->d.zslba);
> > +        z->za = zs->d.za;
> > +
> > +        if (nvme_wp_is_valid(zs)) {
> > +            z->wp = cpu_to_le64(zs->d.wp);
> > +        } else {
> > +            z->wp = cpu_to_le64(~0ULL);
> > +        }
> > +
> > +        zone_idx++;
> > +    }
> > +
> > +    if (!partial) {
> > +        for (; zone_idx < ns->num_zones; zone_idx++) {
> > +            zs = &ns->zone_array[zone_idx];
> > +            if (nvme_zone_matches_filter(zrasf, zs)) {
> > +                nr_zones++;
> > +            }
> > +        }
> > +    }
> > +    header->nr_zones = cpu_to_le64(nr_zones);
> > +
> > +    ret =
> > +        nvme_dma(n, (uint8_t *)buf, data_size,
> > +                 DMA_DIRECTION_FROM_DEVICE, req);
> > +
> > +    g_free(buf);
> > +
> > +    return ret;
> > +}
> > +
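[Editor's note: the double-mapping failure mode Niklas describes can be modeled with a toy scatter-gather list. The types below are hypothetical, not QEMU's actual QEMUSGList API; they only show how mapping the same buffer twice doubles the list's total size, so the residual comparison that nvme_dma() performs no longer comes out to zero.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy scatter-gather list: only tracks the total number of bytes queued. */
typedef struct {
    uint64_t size;
} SgListModel;

/* Each map call appends a buffer segment and grows the total. */
static void sg_add(SgListModel *sg, uint64_t len)
{
    sg->size += len;
}

/* Non-zero residual (mapped vs. requested bytes) makes the DMA path fail. */
static int64_t dma_residual(const SgListModel *sg, uint64_t requested)
{
    return (int64_t)sg->size - (int64_t)requested;
}
```

In this model, calling sg_add() once per buffer (as nvme_dma() expects) leaves a residual of zero, while an extra explicit map of the same buffer beforehand doubles the mapped size and trips the check.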