date:20210406

Re: [PATCH for-6.0 v2 4/8] hw/block/nvme: fix controller namespaces array indexing

2021-04-06 Thread Philippe Mathieu-Daudé

On 4/5/21 7:54 PM, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The controller namespaces array being 0-indexed requires 'nsid - 1'
> everywhere. Something that is easy to miss. Align the controller
> namespaces array with the subsystem namespaces array such that both are
> 1-indexed.

TBH I don't understand the justification. Assuming you hit a
bug and try to protect yourself, maybe now you should also
check for

  assert(n->namespaces[0] == NULL);

somewhere. In nvme_ns() maybe?

> 
> Signed-off-by: Klaus Jensen 
> Reviewed-by: Gollu Appalanaidu 
> ---
>  hw/block/nvme.h | 8 
>  hw/block/nvme.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 9edc86d79e98..c610ab30dc5c 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -217,7 +217,7 @@ typedef struct NvmeCtrl {
>   * Attached namespaces to this controller.  If subsys is not given, all
>   * namespaces in this list will always be attached.
>   */
> -NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
> +NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
>  NvmeSQueue  **sq;
>  NvmeCQueue  **cq;
>  NvmeSQueue  admin_sq;
> @@ -232,7 +232,7 @@ static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, 
> uint32_t nsid)
>  return NULL;
>  }
>  
> -return n->namespaces[nsid - 1];
> +return n->namespaces[nsid];
>  }
>  
>  static inline bool nvme_ns_is_attached(NvmeCtrl *n, NvmeNamespace *ns)
> @@ -253,7 +253,7 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
> NvmeNamespace *ns)
>  uint32_t nsid = nvme_nsid(ns);
>  assert(nsid && nsid <= NVME_MAX_NAMESPACES);
>  
> -n->namespaces[nsid - 1] = ns;
> +n->namespaces[nsid] = ns;
>  }
>  
>  static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
> @@ -261,7 +261,7 @@ static inline void nvme_ns_detach(NvmeCtrl *n, 
> NvmeNamespace *ns)
>  uint32_t nsid = nvme_nsid(ns);
>  assert(nsid && nsid <= NVME_MAX_NAMESPACES);
>  
> -n->namespaces[nsid - 1] = NULL;
> +n->namespaces[nsid] = NULL;
>  }
>  
>  static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index c54ec3c9523c..6d31d5b17a0b 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -5915,7 +5915,7 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
> *ns, Error **errp)
>  return -1;
>  }
>  } else {
> -if (n->namespaces[nsid - 1]) {
> +if (n->namespaces[nsid]) {
>  error_setg(errp, "namespace id '%d' is already in use", nsid);
>  return -1;
>  }
>

Re: [PATCH for-6.0 v2 6/8] hw/block/nvme: update dmsrl limit on namespace detachment

2021-04-06 Thread Philippe Mathieu-Daudé

On 4/5/21 7:54 PM, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The Non-MDTS DMSRL limit must be recomputed when namespaces are
> detached.
> 
> Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
> Signed-off-by: Klaus Jensen 
> Reviewed-by: Gollu Appalanaidu 
> ---
>  hw/block/nvme.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index de0e726dfdd8..3dc51f407671 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -4876,6 +4876,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
>  return NVME_NO_COMPLETE;
>  }
>  
> +static void __nvme_update_dmrsl(NvmeCtrl *n)
> +{
> +int nsid;
> +
> +for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
> +NvmeNamespace *ns = nvme_ns(n, nsid);
> +if (!ns) {
> +continue;
> +}
> +
> +n->dmrsl = MIN_NON_ZERO(n->dmrsl,
> +BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
> +}
> +}
> +
>  static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
>  static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
>  {
> @@ -4925,6 +4940,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
> NvmeRequest *req)
>  }
>  
>  nvme_ns_detach(ctrl, ns);
> +
> +__nvme_update_dmrsl(ctrl);
>  }

Why the '__' prefix? It doesn't seem clearer (I'm not sure there is
a convention, it makes me think of a internal macro expansion use
for preprocessor).

There are very few uses of this prefix:

hw/9pfs/cofs.c:21:static ssize_t __readlink(V9fsState *s, V9fsPath
*path, V9fsString *buf)
hw/block/nvme.c:1683:static uint16_t __nvme_zrm_open(NvmeNamespace *ns,
NvmeZone *zone,
hw/block/nvme.c:1742:static void __nvme_advance_zone_wp(NvmeNamespace
*ns, NvmeZone *zone,
hw/block/nvme.c:5213:static void __nvme_select_ns_iocs(NvmeCtrl *n,
NvmeNamespace *ns)
hw/i386/amd_iommu.c:1160:static int __amdvi_int_remap_msi(AMDVIState *iommu,
hw/intc/s390_flic_kvm.c:255:static int __get_all_irqs(KVMS390FLICState
*flic,
hw/net/rocker/rocker_desc.c:199:static bool
__desc_ring_post_desc(DescRing *ring, int err)
hw/net/sungem.c:766:static uint16_t __sungem_mii_read(SunGEMState *s,
uint8_t phy_addr,
hw/ppc/ppc.c:867:static void __cpu_ppc_store_decr(PowerPCCPU *cpu,
uint64_t *nextp,
hw/s390x/pv.c:25:static int __s390_pv_cmd(uint32_t cmd, const char
*cmdname, void *data)
pc-bios/s390-ccw/cio.c:315:static int __do_cio(SubChannelId schid,
uint32_t ccw_addr, int fmt, Irb *irb)
target/ppc/mmu-hash64.c:170:static void __helper_slbie(CPUPPCState *env,
target_ulong addr,

Thomas, Eric, is it worth cleaning these and updating the
'CODESTYLE.rst'?

Phil.

Re: [PATCH for-6.0 v2 3/8] hw/block/nvme: fix the nsid 'invalid' value

2021-04-06 Thread Klaus Jensen

On Apr  6 08:53, Philippe Mathieu-Daudé wrote:
> Hi Klaus,
> 
> On 4/5/21 7:54 PM, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > The `nvme_nsid()` function returns '-1' (h) when the given
> > namespace is NULL. Since h is actually a valid namespace
> > identifier (the "broadcast" value), change this to be '0' since that
> > actually *is* the invalid value.
> > 
> > Signed-off-by: Klaus Jensen 
> > Reviewed-by: Gollu Appalanaidu 
> > ---
> >  hw/block/nvme-ns.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> > index 9ab7894fc83e..82340c4b2574 100644
> > --- a/hw/block/nvme-ns.h
> > +++ b/hw/block/nvme-ns.h
> > @@ -96,7 +96,7 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
> >  return ns->params.nsid;
> >  }
> >  
> > -return -1;
> > +return 0;
> 
> For 6.1 can you add a NVME_NSID_INVALID definition along
> NVME_NSID_BROADCAST and use it here?
> 

Good idea Philippe, I'll write that up, thanks!


signature.asc
Description: PGP signature

Re: [PATCH for-6.0 v2 6/8] hw/block/nvme: update dmsrl limit on namespace detachment

2021-04-06 Thread Klaus Jensen

On Apr  6 09:10, Philippe Mathieu-Daudé wrote:
> On 4/5/21 7:54 PM, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > The Non-MDTS DMSRL limit must be recomputed when namespaces are
> > detached.
> > 
> > Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
> > Signed-off-by: Klaus Jensen 
> > Reviewed-by: Gollu Appalanaidu 
> > ---
> >  hw/block/nvme.c | 17 +
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index de0e726dfdd8..3dc51f407671 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -4876,6 +4876,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest 
> > *req)
> >  return NVME_NO_COMPLETE;
> >  }
> >  
> > +static void __nvme_update_dmrsl(NvmeCtrl *n)
> > +{
> > +int nsid;
> > +
> > +for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
> > +NvmeNamespace *ns = nvme_ns(n, nsid);
> > +if (!ns) {
> > +continue;
> > +}
> > +
> > +n->dmrsl = MIN_NON_ZERO(n->dmrsl,
> > +BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
> > +}
> > +}
> > +
> >  static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
> >  static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
> >  {
> > @@ -4925,6 +4940,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
> > NvmeRequest *req)
> >  }
> >  
> >  nvme_ns_detach(ctrl, ns);
> > +
> > +__nvme_update_dmrsl(ctrl);
> >  }
> 
> Why the '__' prefix? It doesn't seem clearer (I'm not sure there is
> a convention, it makes me think of a internal macro expansion use
> for preprocessor).
> 
> There are very few uses of this prefix:
> 
> hw/9pfs/cofs.c:21:static ssize_t __readlink(V9fsState *s, V9fsPath
> *path, V9fsString *buf)
> hw/block/nvme.c:1683:static uint16_t __nvme_zrm_open(NvmeNamespace *ns,
> NvmeZone *zone,
> hw/block/nvme.c:1742:static void __nvme_advance_zone_wp(NvmeNamespace
> *ns, NvmeZone *zone,
> hw/block/nvme.c:5213:static void __nvme_select_ns_iocs(NvmeCtrl *n,
> NvmeNamespace *ns)
> hw/i386/amd_iommu.c:1160:static int __amdvi_int_remap_msi(AMDVIState *iommu,
> hw/intc/s390_flic_kvm.c:255:static int __get_all_irqs(KVMS390FLICState
> *flic,
> hw/net/rocker/rocker_desc.c:199:static bool
> __desc_ring_post_desc(DescRing *ring, int err)
> hw/net/sungem.c:766:static uint16_t __sungem_mii_read(SunGEMState *s,
> uint8_t phy_addr,
> hw/ppc/ppc.c:867:static void __cpu_ppc_store_decr(PowerPCCPU *cpu,
> uint64_t *nextp,
> hw/s390x/pv.c:25:static int __s390_pv_cmd(uint32_t cmd, const char
> *cmdname, void *data)
> pc-bios/s390-ccw/cio.c:315:static int __do_cio(SubChannelId schid,
> uint32_t ccw_addr, int fmt, Irb *irb)
> target/ppc/mmu-hash64.c:170:static void __helper_slbie(CPUPPCState *env,
> target_ulong addr,
> 
> Thomas, Eric, is it worth cleaning these and updating the
> 'CODESTYLE.rst'?
> 

Yeah ok, I think you are right that there is no clear convention on when
to use this or not. I typically just use it for functions that are
normally not supposed to be called directly.

But I don't even think its consistent in the nvme device. For my sake,
we can clean it up, I'll drop it in this case since there is no good
reason for it other than my own idea of "style".


signature.asc
Description: PGP signature

Re: [PATCH for-6.0 v2 4/8] hw/block/nvme: fix controller namespaces array indexing

2021-04-06 Thread Klaus Jensen

On Apr  6 09:01, Philippe Mathieu-Daudé wrote:
> On 4/5/21 7:54 PM, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > The controller namespaces array being 0-indexed requires 'nsid - 1'
> > everywhere. Something that is easy to miss. Align the controller
> > namespaces array with the subsystem namespaces array such that both are
> > 1-indexed.
> 
> TBH I don't understand the justification.

Justification is mostly to align with the subsystem device. I like the
'1-indexed' approach better. And the -1 causes Coverity to complain
before the assert was added.

> Assuming you hit a
> bug and try to protect yourself, maybe now you should also
> check for
> 
>   assert(n->namespaces[0] == NULL);
> 
> somewhere. In nvme_ns() maybe?
> 

That is definitely a state that should always hold, I guess we can do
that, but we do already guard all "insertions" into the namespace array
by an assert on the nsid. Then again, asserting here makes sure that we
don't introduce something else that inserts on this invalid position.

So, good point, I'll add it.

> > 
> > Signed-off-by: Klaus Jensen 
> > Reviewed-by: Gollu Appalanaidu 
> > ---
> >  hw/block/nvme.h | 8 
> >  hw/block/nvme.c | 2 +-
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> > index 9edc86d79e98..c610ab30dc5c 100644
> > --- a/hw/block/nvme.h
> > +++ b/hw/block/nvme.h
> > @@ -217,7 +217,7 @@ typedef struct NvmeCtrl {
> >   * Attached namespaces to this controller.  If subsys is not given, all
> >   * namespaces in this list will always be attached.
> >   */
> > -NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
> > +NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
> >  NvmeSQueue  **sq;
> >  NvmeCQueue  **cq;
> >  NvmeSQueue  admin_sq;
> > @@ -232,7 +232,7 @@ static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, 
> > uint32_t nsid)
> >  return NULL;
> >  }
> >  
> > -return n->namespaces[nsid - 1];
> > +return n->namespaces[nsid];
> >  }
> >  
> >  static inline bool nvme_ns_is_attached(NvmeCtrl *n, NvmeNamespace *ns)
> > @@ -253,7 +253,7 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
> > NvmeNamespace *ns)
> >  uint32_t nsid = nvme_nsid(ns);
> >  assert(nsid && nsid <= NVME_MAX_NAMESPACES);
> >  
> > -n->namespaces[nsid - 1] = ns;
> > +n->namespaces[nsid] = ns;
> >  }
> >  
> >  static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
> > @@ -261,7 +261,7 @@ static inline void nvme_ns_detach(NvmeCtrl *n, 
> > NvmeNamespace *ns)
> >  uint32_t nsid = nvme_nsid(ns);
> >  assert(nsid && nsid <= NVME_MAX_NAMESPACES);
> >  
> > -n->namespaces[nsid - 1] = NULL;
> > +n->namespaces[nsid] = NULL;
> >  }
> >  
> >  static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index c54ec3c9523c..6d31d5b17a0b 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -5915,7 +5915,7 @@ int nvme_register_namespace(NvmeCtrl *n, 
> > NvmeNamespace *ns, Error **errp)
> >  return -1;
> >  }
> >  } else {
> > -if (n->namespaces[nsid - 1]) {
> > +if (n->namespaces[nsid]) {
> >  error_setg(errp, "namespace id '%d' is already in use", nsid);
> >  return -1;
> >  }
> > 
> 
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.


signature.asc
Description: PGP signature

Re: issuing [block-]job-complete to jobs in STANDBY state

2021-04-06 Thread Peter Krempa

On Thu, Apr 01, 2021 at 15:02:35 -0400, John Snow wrote:
> Hi; downstream we've run into an issue where VMs under heavy load with many
> simultaneously concurrent block jobs running might occasionally flicker into
> the STANDBY state, during which time they will be unable to receive JOB
> COMPLETE commands. I assume this flicker is due to
> child_job_drained_begin().
> 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> 
> It's safe to just retry this operation again, but it may be difficult to
> understand WHY the job is paused at the application level, since the flush
> event may be asynchronous and unpredictable.
> 
> We could define a transition to allow COMPLETE to be applied to STANDBY
> jobs, but is there any risk or drawback to doing so? On QMP's side, we do
> know the difference between a temporary pause and a user pause/error pause
> (Both use the user_pause flag.)

I think qemu should allow that, because otherwise it's like playing
whack-a-mole. Libvirt could delay the completiion until the 'standby'
state is over and the job is running but that would be racy.

Additionally, we could even hit the standby state before the state
transition would be processed internally, and the error message is too
generic to attempt to decide what to do based on it. (We do want to
report real errors)

Re: [PATCH 1/2] block/rbd: fix memory leak in qemu_rbd_connect()

2021-04-06 Thread Markus Armbruster

Stefano Garzarella  writes:

> In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host()
> using g_strjoinv(), but it's only freed in the error path, leaking
> memory in the success path as reported by valgrind:
>
>   80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
>  at 0x4839809: malloc (vg_replace_malloc.c:307)
>  by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
>  by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
>  by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
>  by 0x87D07E: qemu_rbd_connect (rbd.c:562)
>  by 0x87E1CE: qemu_rbd_open (rbd.c:740)
>  by 0x840EB1: bdrv_open_driver (block.c:1528)
>  by 0x8453A9: bdrv_open_common (block.c:1802)
>  by 0x8453A9: bdrv_open_inherit (block.c:3444)
>  by 0x8464C2: bdrv_open (block.c:3537)
>  by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
>  by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
>  by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
>  by 0x907EA4: aio_bh_poll (async.c:164)
>
> Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly.
>
> Signed-off-by: Stefano Garzarella 

I believe this
Fixes: 0a55679b4a5061f4d74bdb1a0e81611ba3390b00

Reviewed-by: Markus Armbruster

Re: [PATCH 2/2] block/rbd: fix memory leak in qemu_rbd_co_create_opts()

2021-04-06 Thread Markus Armbruster

Stefano Garzarella  writes:

> When we allocate 'q_namespace', we forgot to set 'has_q_namespace'
> to true. This can cause several issues, including a memory leak,
> since qapi_free_BlockdevCreateOptions() does not deallocate that
> memory, as reported by valgrind:
>
>   13 bytes in 1 blocks are definitely lost in loss record 7 of 96
>  at 0x4839809: malloc (vg_replace_malloc.c:307)
>  by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
>  by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8)
>  by 0x180010: qemu_rbd_co_create_opts (rbd.c:446)
>  by 0x1AE72C: bdrv_create_co_entry (block.c:492)
>  by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173)
>  by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so)
>  by 0x1FFEFFFA6F: ???
>
> Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'.
>
> Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
> Signed-off-by: Stefano Garzarella 
> ---
>  block/rbd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 24cefcd0dc..f098a89c7b 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -444,6 +444,7 @@ static int coroutine_fn 
> qemu_rbd_co_create_opts(BlockDriver *drv,
>  loc->user= g_strdup(qdict_get_try_str(options, "user"));
>  loc->has_user= !!loc->user;
>  loc->q_namespace = g_strdup(qdict_get_try_str(options, "namespace"));
> +loc->has_q_namespace = !!loc->q_namespace;
>  loc->image   = g_strdup(qdict_get_try_str(options, "image"));
>  keypairs = qdict_get_try_str(options, "=keyvalue-pairs");

Reviewed-by: Markus Armbruster

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang

On 4/5/21 6:55 PM, Jason Wang wrote:
> 
> 在 2021/4/6 上午4:00, Dongli Zhang 写道:
>>
>> On 4/1/21 8:47 PM, Jason Wang wrote:
>>> 在 2021/3/30 下午3:29, Dongli Zhang 写道:
 On 3/28/21 8:56 PM, Jason Wang wrote:
> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>> Hi Jason,
>>
>> On 3/26/21 12:24 AM, Jason Wang wrote:
>>> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
 The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
 to
 the loss of doorbell kick, e.g.,

 https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$

 ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
 fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

 This patch introduces a new debug interface 'DeviceEvent' to 
 DeviceClass
 to help narrow down if the issue is due to loss of irq/kick. So far 
 the new
 interface handles only two events: 'call' and 'kick'. Any device (e.g.,
 virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
 MSI-X
 or legacy IRQ).

 The 'call' is to inject irq on purpose by admin for a specific device
 (e.g.,
 vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
 doorbell
 on purpose by admin at QEMU/host side for a specific device.

 This device can be used as a workaround if call/kick is lost due to
 virtualization software (e.g., kernel or QEMU) issue.

 We may also implement the interface for VFIO PCI, e.g., to write to
 VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
 VM
 on purpose. This is considered future work once the virtio part is 
 done.

 Below is from live crash analysis. Initially, the queue=2 has count=15 
 for
 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
 is no
 used available. We suspect this is because vhost-scsi was not notified 
 by
 VM. In order to narrow down and analyze the issue, we use live crash to
 dump the current counter of eventfd for queue=2.

 crash> eventfd_ctx 8f67f6bbe700
 struct eventfd_ctx {
   kref = {
     refcount = {
   refs = {
     counter = 4
   }
     }
   },
   wqh = {
     lock = {
   {
     rlock = {
   raw_lock = {
     val = {
   counter = 0
     }
   }
     }
   }
     },
     head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
     }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
 }

 Now we kick the doorbell for vhost-scsi queue=2 on purpose for 
 diagnostic
 with this interface.

 { "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

 The counter is increased to 16. Suppose the hang issue is resolved, it
 indicates something bad is in software that the 'kick' is lost.
>>> What do you mean by "software" here? And it looks to me you're testing
>>> whether
>>> event_notifier_set() is called by virtio_queue_notify() here. If so, 
>>> I'm not
>>> sure how much value could we gain from a dedicated debug interface like 
>>> this
>>> consider there're a lot of exisinting general purpose debugging method 
>>> like
>>> tracing or gdb. I'd say the path from virtio_queue_notify() to
>>> event_notifier_set() is only a very small fraction of the process of
>>> virtqueue
>>> kick which is unlikey to be buggy. Consider usually the ioeventfd will 
>>> be
>>> offloaded to KVM, it's more a chance that something is wrong in setuping
>>> ioeventfd instead of here. Irq is even more complicated.
>> Thank you very much!
>>
>> I am not testing whether event_notifier_set() is called by
>> virtio_queue_notify().
>>
>> The 'software' indicates the data processing and event notification 
>> mechanism
>> involved with virtio/vhost PV driver frontend. E.g., while VM is waiting
>> for an
>> extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
>> erroneously returns false due to corrupted ring buffer status.
> So there coul

Re: [PULL for-6.0 0/2] emulated nvme fixes

2021-04-06 Thread Peter Maydell

On Mon, 5 Apr 2021 at 18:34, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> The following changes since commit 25d75c99b2e5941c67049ee776efdb226414f4c6:
>
>   Merge remote-tracking branch 'remotes/xtensa/tags/20210403-xtensa' into 
> staging (2021-04-04 21:48:45 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-for-6.0-pull-request
>
> for you to fetch changes up to 498114b37bc99fddcfc24b92bff7f1323fb32672:
>
>   hw/block/nvme: expose 'bootindex' property (2021-04-05 19:33:04 +0200)
>
> 
> emulated nvme fixes
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.

-- PMM

Re: [PATCH v2 2/2] block/rbd: Add an escape-aware strchr helper

2021-04-06 Thread Max Reitz


On 01.04.21 23:01, Connor Kuehl wrote:

Sometimes the parser needs to further split a token it has collected
from the token input stream. Right now, it does a cursory check to see
if the relevant characters appear in the token to determine if it should
break it down further.

However, qemu_rbd_next_tok() will escape characters as it removes tokens
from the token stream and plain strchr() won't. This can make the
initial strchr() check slightly misleading since it implies
qemu_rbd_next_tok() will find the token and split on it, except the
reality is that qemu_rbd_next_tok() will pass over it if it is escaped.

Use a custom strchr to avoid mixing escaped and unescaped string
operations.

Reported-by: Han Han 
Fixes: https://bugzilla.redhat.com/1873913
Signed-off-by: Connor Kuehl 
---
  block/rbd.c| 20 ++--
  tests/qemu-iotests/231 |  4 
  tests/qemu-iotests/231.out |  3 +++
  3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 9071a00e3f..c0e4d4a952 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -134,6 +134,22 @@ static char *qemu_rbd_next_tok(char *src, char delim, char 
**p)
  return src;
  }
  
+static char *qemu_rbd_strchr(char *src, char delim)

+{
+char *p;
+
+for (p = src; *p; ++p) {
+if (*p == delim) {
+return p;
+}
+if (*p == '\\') {
+++p;
+}
+}
+
+return NULL;
+}
+


So I thought you could make qemu_rbd_do_next_tok() to do this.  (I 
didn’t say you should, but bear with me.)  That would be possible by 
giving it a new parameter (e.g. @find), and if that is set, return @end 
if *end == delim after the loop, and NULL otherwise.


Now, if you add wrapper functions to make it nice, there’s not much more 
difference in lines added compared to just adding a new function, but it 
does mean your function should basically be the same as 
qemu_rbd_next_tok(), except that no splitting happens, that there is no 
*p, and that @end is returned instead of @src.


So there is one difference, and that is that qemu_rbd_next_tok() has 
this condition to skip escaped characters:


if (*end == '\\' && end[1] != '\0') {

where qemu_rbd_strchr() has only:

if (*p == '\\') {

And I think qemu_rbd_next_tok() is right; if the string in question has 
a trailing backslash, qemu_rbd_strchr() will ignore the final NUL and 
continue searching past the end of the string.


Max

[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

2021-04-06 Thread Vladimir Sementsov-Ogievskiy

If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.

Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:

 pre-patch, on first hunk we'll just crash if thr is NULL,
 on second hunk it's safe to return -1, and using thr when
 s->connect_thread is already zeroed is obviously wrong.

 block/nbd.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 
+if (!thr) {
+/* detached */
+return -1;
+}
+
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
@@ -486,6 +491,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 s->wait_connect = true;
 qemu_coroutine_yield();
 
+if (!s->connect_thread) {
+/* detached */
+return -1;
+}
+assert(thr == s->connect_thread);
+
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
-- 
2.29.2

Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread

2021-04-06 Thread Vladimir Sementsov-Ogievskiy


15.03.2021 09:06, Roman Kagan wrote:

When the NBD connection is being torn down, the connection thread gets
canceled and "detached", meaning it is about to get freed.

If this happens while the connection coroutine yielded waiting for the
connection thread to complete, when it resumes it may access the
invalidated connection thread data.

To prevent this, revalidate the ->connect_thread pointer in
nbd_co_establish_connection_cancel before using after the the yield.

Signed-off-by: Roman Kagan 
---
  block/nbd.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..447d176b76 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -486,6 +486,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
  s->wait_connect = true;
  qemu_coroutine_yield();
  
+/*

+ * If nbd_co_establish_connection_cancel had a chance to run it may have
+ * invalidated ->connect_thread.
+ */
+thr = s->connect_thread;
+if (!thr) {
+return -ECONNABORTED;


nbd_co_establish_connection() tends to return -1 or 0, not -errno, so -1 is 
better here. Still it doesn't really matter.


+}
+
  qemu_mutex_lock(&thr->mutex);
  
  switch (thr->state) {





--
Best regards,
Vladimir

Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

2021-04-06 Thread Vladimir Sementsov-Ogievskiy


06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:

If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---

Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.

Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:

  pre-patch, on first hunk we'll just crash if thr is NULL,
  on second hunk it's safe to return -1, and using thr when
  s->connect_thread is already zeroed is obviously wrong.


Ha, occasionally I reinvented what Roman already does in "[PATCH 1/7] block/nbd: 
avoid touching freed connect_thread".

My additional first hunk actually is not needed, as nbd_co_establish_connection is 
called after if (!nbd_clisent_connecting(s)) { return; }, so we should not be here 
after  nbd_co_establish_connection_cancel(bs, true); which is called with 
s->state set to NBD_CLIENT_QUIT.

So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd: avoid 
touching freed connect_thread" :)

Eric, could you take a look? If there no more pending block patches, I can try 
to send pull-request myself

--
Best regards,
Vladimir

Re: [PATCH 0/2] block/rbd: fix memory leaks

2021-04-06 Thread Max Reitz


On 29.03.21 17:01, Stefano Garzarella wrote:

This series fixes two memory leaks, found through valgrind, in the
rbd driver.

Stefano Garzarella (2):
   block/rbd: fix memory leak in qemu_rbd_connect()
   block/rbd: fix memory leak in qemu_rbd_co_create_opts()

  block/rbd.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Max Reitz 

I’m not quite sure whether this is fit for 6.0...  I think it’s too late 
for rc2, so I don’t know.


Max

Re: [PATCH for-6.0 v2 4/8] hw/block/nvme: fix controller namespaces array indexing

2021-04-06 Thread Klaus Jensen

On Apr  6 09:28, Klaus Jensen wrote:
> On Apr  6 09:01, Philippe Mathieu-Daudé wrote:
> > On 4/5/21 7:54 PM, Klaus Jensen wrote:
> > > From: Klaus Jensen 
> > > 
> > > The controller namespaces array being 0-indexed requires 'nsid - 1'
> > > everywhere. Something that is easy to miss. Align the controller
> > > namespaces array with the subsystem namespaces array such that both are
> > > 1-indexed.
> > 
> > TBH I don't understand the justification.
> 
> Justification is mostly to align with the subsystem device. I like the
> '1-indexed' approach better. And the -1 causes Coverity to complain
> before the assert was added.
> 
> > Assuming you hit a
> > bug and try to protect yourself, maybe now you should also
> > check for
> > 
> >   assert(n->namespaces[0] == NULL);
> > 
> > somewhere. In nvme_ns() maybe?
> > 
> 
> That is definitely a state that should always hold, I guess we can do
> that, but we do already guard all "insertions" into the namespace array
> by an assert on the nsid. Then again, asserting here makes sure that we
> don't introduce something else that inserts on this invalid position.
> 
> So, good point, I'll add it.
> 

Then again again.

I don't see the reason for the assert. Even if something ends up there 
by mistake we will never return it. If something ends up there due to 
new code, that nvme_ns() will always return NULL when nsid is zero and 
that should weed out the bug easily.

I'll update the commit message to make it clear that this is about 
making both the subsystem and controller namespaces arrays 1-indexed. 
Them being indexed differently is a recipe for disaster I'd say.

In anycase, I it actually a stretch to call this a bug fix, so I'll drop 
it and queue it up for v6.1.

signature.asc
Description: PGP signature

[PULL for-6.0 0/7] emulated nvme fixes for -rc3

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Hi Peter,

My apologies that these didn't make it for -rc2!


The following changes since commit d0d3dd401b70168a353450e031727affee828527:

  Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-2021-04-06-pull-request

for you to fetch changes up to 9553f8deebe0e443a0b006aa9d881269fd251a2c:

  hw/block/nvme: add missing copyright headers (2021-04-06 21:14:25 +0200)


emulated nvme fixes for -rc3



Klaus Jensen (7):
  hw/block/nvme: fix pi constraint check
  hw/block/nvme: fix missing string representation for ns attachment
  hw/block/nvme: fix the nsid 'invalid' value
  hw/block/nvme: fix warning about legacy namespace configuration
  hw/block/nvme: update dmsrl limit on namespace detachment
  hw/block/nvme: fix handling of private namespaces
  hw/block/nvme: add missing copyright headers

 hw/block/nvme-dif.h|  10 +++
 hw/block/nvme-ns.h |  12 ++--
 hw/block/nvme-subsys.h |   7 +-
 hw/block/nvme.h|  41 +---
 include/block/nvme.h   |   1 +
 hw/block/nvme-dif.c|  10 +++
 hw/block/nvme-ns.c |  78 ++
 hw/block/nvme-subsys.c |  28 
 hw/block/nvme.c| 142 +
 hw/block/trace-events  |   1 -
 10 files changed, 156 insertions(+), 174 deletions(-)

-- 
2.31.1

[PULL for-6.0 4/7] hw/block/nvme: fix warning about legacy namespace configuration

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Remove the unused BlockConf from the controller structure and fix the
constraint checking to actually check the right BlockConf and issue the
warning.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 1 -
 hw/block/nvme.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9edc86d79e98..8d1806cc942f 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -166,7 +166,6 @@ typedef struct NvmeCtrl {
 NvmeBar  bar;
 NvmeParams   params;
 NvmeBus  bus;
-BlockConfconf;
 
 uint16_tcntlid;
 boolqs_created;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7244534a89e9..09c38fb35e0d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -5805,7 +5805,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 params->max_ioqpairs = params->num_queues - 1;
 }
 
-if (n->conf.blk) {
+if (n->namespace.blkconf.blk) {
 warn_report("drive property is deprecated; "
 "please use an nvme-ns device instead");
 }
-- 
2.31.1

[PULL for-6.0 3/7] hw/block/nvme: fix the nsid 'invalid' value

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

The `nvme_nsid()` function returns '-1' (h) when the given
namespace is NULL. Since h is actually a valid namespace
identifier (the "broadcast" value), change this to be '0' since that
actually *is* the invalid value.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 9ab7894fc83e..82340c4b2574 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -96,7 +96,7 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return ns->params.nsid;
 }
 
-return -1;
+return 0;
 }
 
 static inline bool nvme_ns_shared(NvmeNamespace *ns)
-- 
2.31.1

[PULL for-6.0 7/7] hw/block/nvme: add missing copyright headers

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Add missing license/copyright headers to the nvme-dif.{c,h} files.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-dif.h | 10 ++
 hw/block/nvme-dif.c | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
index 5a8e37c8525b..524faffbd7a0 100644
--- a/hw/block/nvme-dif.h
+++ b/hw/block/nvme-dif.h
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #ifndef HW_NVME_DIF_H
 #define HW_NVME_DIF_H
 
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index e6f04faafb5f..81b0a4cb1382 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #include "qemu/osdep.h"
 #include "hw/block/block.h"
 #include "sysemu/dma.h"
-- 
2.31.1

[PULL for-6.0 2/7] hw/block/nvme: fix missing string representation for ns attachment

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Add the missing nvme_adm_opc_str entry for the Namespace Attachment
command.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5b0031b11db2..9edc86d79e98 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -86,6 +86,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
 case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES";
 case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
 case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
+case NVME_ADM_CMD_NS_ATTACHMENT:return "NVME_ADM_CMD_NS_ATTACHMENT";
 case NVME_ADM_CMD_FORMAT_NVM:   return "NVME_ADM_CMD_FORMAT_NVM";
 default:return "NVME_ADM_CMD_UNKNOWN";
 }
-- 
2.31.1

[PULL for-6.0 1/7] hw/block/nvme: fix pi constraint check

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Protection Information can only be enabled if there is at least 8 bytes
of metadata.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7f8d139a8663..ca04ee1bacfb 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -394,7 +394,7 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
-if (ns->params.pi && !ns->params.ms) {
+if (ns->params.pi && ns->params.ms < 8) {
 error_setg(errp, "at least 8 bytes of metadata required to enable "
"protection information");
 return -1;
-- 
2.31.1

[PULL for-6.0 5/7] hw/block/nvme: update dmsrl limit on namespace detachment

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 09c38fb35e0d..0898ece2af31 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4868,6 +4868,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void nvme_update_dmrsl(NvmeCtrl *n)
+{
+int nsid;
+
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (!ns) {
+continue;
+}
+
+n->dmrsl = MIN_NON_ZERO(n->dmrsl,
+BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+}
+}
+
 static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
@@ -4917,6 +4932,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 nvme_ns_detach(ctrl, ns);
+
+nvme_update_dmrsl(ctrl);
 }
 
 /*
-- 
2.31.1

[PULL for-6.0 6/7] hw/block/nvme: fix handling of private namespaces

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".

Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".

Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.

Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parameter) or implicitly
attaching to.

Fixes: e570768566b3 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme-ns.h |  10 +---
 hw/block/nvme-subsys.h |   7 +--
 hw/block/nvme.h|  39 +
 include/block/nvme.h   |   1 +
 hw/block/nvme-ns.c |  76 +
 hw/block/nvme-subsys.c |  28 --
 hw/block/nvme.c| 123 ++---
 hw/block/trace-events  |   1 -
 8 files changed, 115 insertions(+), 170 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 82340c4b2574..fb0a41f912e7 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeZone {
 
 typedef struct NvmeNamespaceParams {
 bool detached;
+bool shared;
 uint32_t nsid;
 QemuUUID uuid;
 
@@ -60,8 +61,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 uint16_t status;
+int  attached;
 
-NvmeSubsystem   *subsys;
 QTAILQ_ENTRY(NvmeNamespace) entry;
 
 NvmeIdNsZoned   *id_ns_zoned;
@@ -99,11 +100,6 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return 0;
 }
 
-static inline bool nvme_ns_shared(NvmeNamespace *ns)
-{
-return !!ns->subsys;
-}
-
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = &ns->id_ns;
@@ -225,7 +221,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index aafa04b84829..24132edd005c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  256
+#define NVME_MAX_NAMESPACES 256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
+NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1];
 
 struct {
 char *nqn;
@@ -32,7 +32,6 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 uint32_t cntlid)
@@ -54,7 +53,7 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem 
*subsys,
 return NULL;
 }
 
-assert(nsid && nsid <= NVME_SUBSYS_MAX_NAMESPACES);
+assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
 return subsys->namespaces[nsid];
 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 8d1806c

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Dongli Zhang




On 4/6/21 1:43 AM, Dongli Zhang wrote:
> 
> 
> On 4/5/21 6:55 PM, Jason Wang wrote:
>>
>> 在 2021/4/6 上午4:00, Dongli Zhang 写道:
>>>
>>> On 4/1/21 8:47 PM, Jason Wang wrote:
 在 2021/3/30 下午3:29, Dongli Zhang 写道:
> On 3/28/21 8:56 PM, Jason Wang wrote:
>> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
>>> Hi Jason,
>>>
>>> On 3/26/21 12:24 AM, Jason Wang wrote:
 在 2021/3/26 下午1:44, Dongli Zhang 写道:
> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due 
> to
> the loss of doorbell kick, e.g.,
>
> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>
>
>
>
> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>
> This patch introduces a new debug interface 'DeviceEvent' to 
> DeviceClass
> to help narrow down if the issue is due to loss of irq/kick. So far 
> the new
> interface handles only two events: 'call' and 'kick'. Any device 
> (e.g.,
> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
> MSI-X
> or legacy IRQ).
>
> The 'call' is to inject irq on purpose by admin for a specific device
> (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
> doorbell
> on purpose by admin at QEMU/host side for a specific device.
>
>
> This device can be used as a workaround if call/kick is lost due to
> virtualization software (e.g., kernel or QEMU) issue.
>
> We may also implement the interface for VFIO PCI, e.g., to write to
> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to 
> VM
> on purpose. This is considered future work once the virtio part is 
> done.
>
>
> Below is from live crash analysis. Initially, the queue=2 has 
> count=15 for
> 'kick' eventfd_ctx. Suppose there is data in vring avail while there 
> is no
> used available. We suspect this is because vhost-scsi was not 
> notified by
> VM. In order to narrow down and analyze the issue, we use live crash 
> to
> dump the current counter of eventfd for queue=2.
>
> crash> eventfd_ctx 8f67f6bbe700
> struct eventfd_ctx {
>   kref = {
>     refcount = {
>   refs = {
>     counter = 4
>   }
>     }
>   },
>   wqh = {
>     lock = {
>   {
>     rlock = {
>   raw_lock = {
>     val = {
>   counter = 0
>     }
>   }
>     }
>   }
>     },
>     head = {
>   next = 0x8f841dc08e18,
>   prev = 0x8f841dc08e18
>     }
>   },
>   count = 15, ---> eventfd is 15 !!!
>   flags = 526336
> }
>
> Now we kick the doorbell for vhost-scsi queue=2 on purpose for 
> diagnostic
> with this interface.
>
> { "execute": "x-debug-device-event",
>   "arguments": { "dev": "/machine/peripheral/vscsi0",
>  "event": "kick", "queue": 2 } }
>
> The counter is increased to 16. Suppose the hang issue is resolved, it
> indicates something bad is in software that the 'kick' is lost.
 What do you mean by "software" here? And it looks to me you're testing
 whether
 event_notifier_set() is called by virtio_queue_notify() here. If so, 
 I'm not
 sure how much value could we gain from a dedicated debug interface 
 like this
 consider there're a lot of exisinting general purpose debugging method 
 like
 tracing or gdb. I'd say the path from virtio_queue_notify() to
 event_notifier_set() is only a very small fraction of the process of
 virtqueue
 kick which is unlikey to be buggy. Consider usually the ioeventfd will 
 be
 offloaded to KVM, it's more a chance that something is wrong in 
 setuping
 ioeventfd instead of here. Irq is even more complicated.
>>> Thank you very much!
>>>
>>> I am not testing whether event_notifier_set() is called by
>>> virtio_queue_notify().
>>>
>>> The 'software' indicates the data processing and event notification 
>>> mechanism
>>> involved with virtio/vhost PV driver frontend. E.g., w

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang




在 2021/4/6 下午4:43, Dongli Zhang 写道:


On 4/5/21 6:55 PM, Jason Wang wrote:

在 2021/4/6 上午4:00, Dongli Zhang 写道:

On 4/1/21 8:47 PM, Jason Wang wrote:

在 2021/3/30 下午3:29, Dongli Zhang 写道:

On 3/28/21 8:56 PM, Jason Wang wrote:

在 2021/3/27 上午5:16, Dongli Zhang 写道:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

在 2021/3/26 下午1:44, Dongli Zhang 写道:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$




... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device
(e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
   kref = {
     refcount = {
   refs = {
     counter = 4
   }
     }
   },
   wqh = {
     lock = {
   {
     rlock = {
   raw_lock = {
     val = {
   counter = 0
     }
   }
     }
   }
     },
     head = {
   next = 0x8f841dc08e18,
   prev = 0x8f841dc08e18
     }
   },
   count = 15, ---> eventfd is 15 !!!
   flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
   "arguments": { "dev": "/machine/peripheral/vscsi0",
  "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? And it looks to me you're testing
whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value could we gain from a dedicated debug interface like this
consider there're a lot of exisinting general purpose debugging method like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of
virtqueue
kick which is unlikey to be buggy. Consider usually the ioeventfd will be
offloaded to KVM, it's more a chance that something is wrong in setuping
ioeventfd instead of here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting
for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following virtio spec. It is closely related to many of other kernel
components.

Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we
will be able to analyze what may happen along the IO completion path starting
from when /where the IRQ is injected ... perhaps the root cause is not with
virtio but blk-mq/scsi (this is just an example).


In addition, this idea should help for vfio-pci. Suppose the developer for a
specific device driver suspects IO/networking hangs because of loss if IRQ, we

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-06 Thread Jason Wang




在 2021/4/7 上午7:27, Dongli Zhang 写道:

This will answer your question that "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, to write to eventfd is
not able to bypass masking because masking is to unregister the eventfd. To
write to eventfd does not take effect.

However, it is possible to bypass masking for vhost-net because vhost-net
registered a specific masked_notifier eventfd in order to mask irq. To write to
original eventfd still takes effect.

We may leave the user to decide whether to write to 'masked_notifier' or
original 'guest_notifier' for vhost-net.

My fault here. To write to masked_notifier will always be masked:(



Only when there's no bug in the qemu.




If it is EventNotifier level, we will not care whether the EventNotifier is
masked or not. It just provides an interface to write to EventNotifier.



Yes.




To dump the MSI-x table for both virtio and vfio will help confirm if the vector
is masked.



That would be helpful as well. It's probably better to extend "info pci" 
command.


Thanks




Thank you very much!

Dongli Zhang

[PULL for-6.0 v2 00/10] emulated nvme fixes for -rc3

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Hi Peter,

My apologies that these didn't make it for -rc2!

I botched v1, so please pull this v2 instead.


The following changes since commit d0d3dd401b70168a353450e031727affee828527:

  Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-2021-04-07-pull-request

for you to fetch changes up to 5dd79300df47f07d0e9d6a7bda43b23ff26001dc:

  hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 
07:27:09 +0200)


emulated nvme fixes for -rc3

v2:
  - added missing patches



Klaus Jensen (10):
  hw/block/nvme: fix pi constraint check
  hw/block/nvme: fix missing string representation for ns attachment
  hw/block/nvme: fix the nsid 'invalid' value
  hw/block/nvme: fix warning about legacy namespace configuration
  hw/block/nvme: update dmsrl limit on namespace detachment
  hw/block/nvme: fix handling of private namespaces
  hw/block/nvme: add missing copyright headers
  hw/block/nvme: fix ns attachment out-of-bounds read
  hw/block/nvme: fix assert crash in nvme_subsys_ns
  hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl

 hw/block/nvme-dif.h|  10 +++
 hw/block/nvme-ns.h |  12 ++--
 hw/block/nvme-subsys.h |  11 ++--
 hw/block/nvme.h|  41 +---
 include/block/nvme.h   |   1 +
 hw/block/nvme-dif.c|  10 +++
 hw/block/nvme-ns.c |  78 ++
 hw/block/nvme-subsys.c |  28 
 hw/block/nvme.c| 143 +
 hw/block/trace-events  |   1 -
 10 files changed, 158 insertions(+), 177 deletions(-)

-- 
2.31.1

[PULL for-6.0 v2 02/10] hw/block/nvme: fix missing string representation for ns attachment

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Add the missing nvme_adm_opc_str entry for the Namespace Attachment
command.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5b0031b11db2..9edc86d79e98 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -86,6 +86,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
 case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES";
 case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
 case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
+case NVME_ADM_CMD_NS_ATTACHMENT:return "NVME_ADM_CMD_NS_ATTACHMENT";
 case NVME_ADM_CMD_FORMAT_NVM:   return "NVME_ADM_CMD_FORMAT_NVM";
 default:return "NVME_ADM_CMD_UNKNOWN";
 }
-- 
2.31.1

[PULL for-6.0 v2 01/10] hw/block/nvme: fix pi constraint check

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Protection Information can only be enabled if there is at least 8 bytes
of metadata.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7f8d139a8663..ca04ee1bacfb 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -394,7 +394,7 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
-if (ns->params.pi && !ns->params.ms) {
+if (ns->params.pi && ns->params.ms < 8) {
 error_setg(errp, "at least 8 bytes of metadata required to enable "
"protection information");
 return -1;
-- 
2.31.1

[PULL for-6.0 v2 04/10] hw/block/nvme: fix warning about legacy namespace configuration

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Remove the unused BlockConf from the controller structure and fix the
constraint checking to actually check the right BlockConf and issue the
warning.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 1 -
 hw/block/nvme.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9edc86d79e98..8d1806cc942f 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -166,7 +166,6 @@ typedef struct NvmeCtrl {
 NvmeBar  bar;
 NvmeParams   params;
 NvmeBus  bus;
-BlockConfconf;
 
 uint16_tcntlid;
 boolqs_created;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7244534a89e9..09c38fb35e0d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -5805,7 +5805,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 params->max_ioqpairs = params->num_queues - 1;
 }
 
-if (n->conf.blk) {
+if (n->namespace.blkconf.blk) {
 warn_report("drive property is deprecated; "
 "please use an nvme-ns device instead");
 }
-- 
2.31.1

[PULL for-6.0 v2 05/10] hw/block/nvme: update dmsrl limit on namespace detachment

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 09c38fb35e0d..0898ece2af31 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4868,6 +4868,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void nvme_update_dmrsl(NvmeCtrl *n)
+{
+int nsid;
+
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (!ns) {
+continue;
+}
+
+n->dmrsl = MIN_NON_ZERO(n->dmrsl,
+BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+}
+}
+
 static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
@@ -4917,6 +4932,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 nvme_ns_detach(ctrl, ns);
+
+nvme_update_dmrsl(ctrl);
 }
 
 /*
-- 
2.31.1

[PULL for-6.0 v2 07/10] hw/block/nvme: add missing copyright headers

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Add missing license/copyright headers to the nvme-dif.{c,h} files.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-dif.h | 10 ++
 hw/block/nvme-dif.c | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
index 5a8e37c8525b..524faffbd7a0 100644
--- a/hw/block/nvme-dif.h
+++ b/hw/block/nvme-dif.h
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #ifndef HW_NVME_DIF_H
 #define HW_NVME_DIF_H
 
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index e6f04faafb5f..81b0a4cb1382 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #include "qemu/osdep.h"
 #include "hw/block/block.h"
 #include "sysemu/dma.h"
-- 
2.31.1

[PULL for-6.0 v2 03/10] hw/block/nvme: fix the nsid 'invalid' value

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

The `nvme_nsid()` function returns '-1' (h) when the given
namespace is NULL. Since h is actually a valid namespace
identifier (the "broadcast" value), change this to be '0' since that
actually *is* the invalid value.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 9ab7894fc83e..82340c4b2574 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -96,7 +96,7 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return ns->params.nsid;
 }
 
-return -1;
+return 0;
 }
 
 static inline bool nvme_ns_shared(NvmeNamespace *ns)
-- 
2.31.1

[PULL for-6.0 v2 06/10] hw/block/nvme: fix handling of private namespaces

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".

Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".

Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.

Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parameter) or implicitly
attaching to.

Fixes: e570768566b3 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Keith Busch 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme-ns.h |  10 +---
 hw/block/nvme-subsys.h |   7 +--
 hw/block/nvme.h|  39 +
 include/block/nvme.h   |   1 +
 hw/block/nvme-ns.c |  76 +
 hw/block/nvme-subsys.c |  28 --
 hw/block/nvme.c| 123 ++---
 hw/block/trace-events  |   1 -
 8 files changed, 115 insertions(+), 170 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 82340c4b2574..fb0a41f912e7 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeZone {
 
 typedef struct NvmeNamespaceParams {
 bool detached;
+bool shared;
 uint32_t nsid;
 QemuUUID uuid;
 
@@ -60,8 +61,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 uint16_t status;
+int  attached;
 
-NvmeSubsystem   *subsys;
 QTAILQ_ENTRY(NvmeNamespace) entry;
 
 NvmeIdNsZoned   *id_ns_zoned;
@@ -99,11 +100,6 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return 0;
 }
 
-static inline bool nvme_ns_shared(NvmeNamespace *ns)
-{
-return !!ns->subsys;
-}
-
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = &ns->id_ns;
@@ -225,7 +221,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index aafa04b84829..24132edd005c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  256
+#define NVME_MAX_NAMESPACES 256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
+NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1];
 
 struct {
 char *nqn;
@@ -32,7 +32,6 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 uint32_t cntlid)
@@ -54,7 +53,7 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem 
*subsys,
 return NULL;
 }
 
-assert(nsid && nsid <= NVME_SUBSYS_MAX_NAMESPACES);
+assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
 return subsys->namespaces[nsid];
 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 8d1806c

[PULL for-6.0 v2 09/10] hw/block/nvme: fix assert crash in nvme_subsys_ns

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

nvme_subsys_ns() is used in contexts where the namespace identifier is
taken from an untrusted source. Commit 3921756dee6d ("hw/block/nvme:
assert namespaces array indices") tried to guard against this by
introducing an assert on the namespace identifier.

This is wrong since it is perfectly valid to call the function with an
invalid namespace identifier and like nvme_ns(), nvme_subsys_ns() should
simply return NULL.

Fixes: 3921756dee6d ("hw/block/nvme: assert namespaces array indices")
Fixes: 94d8d6d16781 ("hw/block/nvme: support allocated namespace type")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 24132edd005c..1cbcad9be23e 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -49,12 +49,10 @@ static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem 
*subsys,
 static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys,
 uint32_t nsid)
 {
-if (!subsys) {
+if (!subsys || !nsid || nsid > NVME_MAX_NAMESPACES) {
 return NULL;
 }
 
-assert(nsid && nsid <= NVME_MAX_NAMESPACES);
-
 return subsys->namespaces[nsid];
 }
 
-- 
2.31.1

[PULL for-6.0 v2 08/10] hw/block/nvme: fix ns attachment out-of-bounds read

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

nvme_ns_attachment() does not verify the contents of the host-supplied
16 bit "Number of Identifiers" field in the command payload.

Make sure the value is capped at 2047 and fix the out-of-bounds read.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d2dd82496790..87891d4d0f3b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4920,6 +4920,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
 }
 
+*nr_ids = MIN(*nr_ids, NVME_CONTROLLER_LIST_SIZE - 1);
 for (i = 0; i < *nr_ids; i++) {
 ctrl = nvme_subsys_ctrl(n->subsys, ids[i]);
 if (!ctrl) {
-- 
2.31.1

[PULL for-6.0 v2 10/10] hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl

2021-04-06 Thread Klaus Jensen

From: Klaus Jensen 

nvme_subsys_ctrl() is used in contexts where the given controller
identifier is from an untrusted source. Like its friends nvme_ns() and
nvme_subsys_ns(), nvme_subsys_ctrl() should just return NULL if an
invalid identifier is given.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme-subsys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 1cbcad9be23e..7d7ef5f7f12b 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -36,7 +36,7 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 uint32_t cntlid)
 {
-if (!subsys) {
+if (!subsys || cntlid >= NVME_SUBSYS_MAX_CTRLS) {
 return NULL;
 }
 
-- 
2.31.1

38 matches

Mail list logo