Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race

2023-02-16 Thread Stefan Hajnoczi
On Thu, Feb 16, 2023 at 04:27:42PM +0100, Kevin Wolf wrote:
> Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> > dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> > means the rest of dma_blk_cb() is not protected. In particular, the
> > DMAAIOCB field accesses happen outside the lock.
> > 
> > There is a race when the main loop thread holds the AioContext lock and
> > invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> > dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
> > field determines how cancellation proceeds. If dma_aio_cancel() see
> > dbs->acb == NULL while dma_blk_cb() is still running, the request can be
> > completed twice (-ECANCELED and the actual return value).
> > 
> > The following assertion can occur with virtio-scsi when an IOThread is
> > used:
> > 
> >   ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != 
> > NULL' failed.
> > 
> > Fix the race by holding the AioContext across dma_blk_cb(). Now
> > dma_aio_cancel() under the AioContext lock will not see
> > inconsistent/intermediate states.
> > 
> > Cc: Paolo Bonzini 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Two things that seem to need attention in the review:
> 
> > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> > index 7820fec54c..2463964805 100644
> > --- a/softmmu/dma-helpers.c
> > +++ b/softmmu/dma-helpers.c
> > @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> >  static void dma_blk_cb(void *opaque, int ret)
> >  {
> >  DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> > +AioContext *ctx = dbs->ctx;
> >  dma_addr_t cur_addr, cur_len;
> >  void *mem;
> >  
> >  trace_dma_blk_cb(dbs, ret);
> >  
> > +aio_context_acquire(ctx);
> 
> During the first iteration, the caller may already hold the AioContext
> lock. In the case of scsi-disk, it does. Locking a second time is fine
> in principle because it's a recursive lock, but we have to be careful
> not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.
> 
> Except for the dbs->common.cb (see below) I don't see any functions that
> would be problematic in this respect. In fact, the one I would be most
> worried about is dbs->io_func, but it was already locked before.
> 
> >  dbs->acb = NULL;
> >  dbs->offset += dbs->iov.size;
> >  
> >  if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
> >  dma_complete(dbs, ret);
> 
> All request callbacks hold the AioContext lock now when they didn't
> before. I wonder if it's worth documenting the locking rules for
> dma_blk_io() in a comment. Could be a separate patch, though.
> 
> You remove the locking in scsi_dma_complete() to compensate. All the
> other callers come from IDE and nvme, which don't take the lock
> internally. Taking the main AioContext lock once is fine, so this looks
> good.
> 
> If it is possible that we already complete on the first iteration, this
> could however also be affected by the case above so that the AioContext
> is locked twice. In this case, the invoked callback must not call
> AIO_WAIT_WHILE() and we would need to audit IDE and nvme.
> 
> Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
> assert and document it?

In the nsg == 0 case there's another problem: the completion callback
function is invoked and the AIOCB is unref'd before dma_blk_io() returns
the stale AIOCB pointer.

That would lead to problems later because the pattern is typically:

  r->aiocb = dma_blk_io(...);
  ...
  use r and r->aiocb later

So I don't think nsg = 0 makes sense.

Functions I looked at invoke dma_blk_io() only when there are still
bytes to transfer. I think we're safe but I'll admit I'm not 100% sure.

> > -return;
> > +goto out;
> >  }
> >  dma_blk_unmap(dbs);
> >  
> > @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
> >  
> >  if (dbs->iov.size == 0) {
> >  trace_dma_map_wait(dbs);
> > -dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> > +dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
> 
> Does copying dbs->ctx to a local variable imply that it may change
> during the function? I didn't think so, but if it may, then why is
> calling aio_bh_new() with the old value right?

I changed this line for consistency, not to change behavior or fix a bug.

Regarding AioContext changes, they can't happen because no function that
changes the AioContext is called between this line and the earlier
aio_context_acquire().

(Having to worry about AioContext changes is a pain though and I look
forward to when we can get rid of this lock.)

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 5/5] hw/nvme: flexible data placement emulation

2023-02-16 Thread Keith Busch
On Thu, Feb 16, 2023 at 05:48:06PM +0100, Jesper Devantier wrote:
> +static bool nvme_ns_init_fdp(NvmeNamespace *ns, Error **errp)
> +{
> +NvmeEnduranceGroup *endgrp = ns->endgrp;
> +NvmeRuHandle *ruh;
> +uint8_t lbafi = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
> +unsigned int *ruhid, *ruhids;
> +char *r, *p, *token;
> +uint16_t *ph;
> +
> +if (!ns->params.fdp.ruhs) {
> +ns->fdp.nphs = 1;
> +ph = ns->fdp.phs = g_new(uint16_t, 1);
> +
> +ruh = nvme_find_ruh_by_attr(endgrp, NVME_RUHA_CTRL, ph);
> +if (!ruh) {
> +ruh = nvme_find_ruh_by_attr(endgrp, NVME_RUHA_UNUSED, ph);
> +if (!ruh) {
> +error_setg(errp, "no unused reclaim unit handles left");
> +return false;
> +}
> +
> +ruh->ruha = NVME_RUHA_CTRL;
> +ruh->lbafi = lbafi;
> +ruh->ruamw = endgrp->fdp.runs >> ns->lbaf.ds;
> +
> +for (uint16_t rg = 0; rg < endgrp->fdp.nrg; rg++) {
> +ruh->rus[rg].ruamw = ruh->ruamw;
> +}
> +} else if (ruh->lbafi != lbafi) {
> +error_setg(errp, "lba format index of controller assigned "
> +   "reclaim unit handle does not match namespace lba "
> +   "format index");
> +return false;
> +}
> +
> +return true;
> +}
> +
> +ruhid = ruhids = g_new0(unsigned int, endgrp->fdp.nruh);
> +r = p = strdup(ns->params.fdp.ruhs);
> +
> +/* parse the reclaim unit handle identifiers */
> +while ((token = qemu_strsep(&p, ";")) != NULL) {
> +if (++ns->fdp.nphs == endgrp->fdp.nruh) {

Since a namespace can't have more than 128 placement handles, and the endurance
group can have more, I think the 128 limit needs to be checked here.

> +error_setg(errp, "too many placement handles");
> +free(r);
> +return false;
> +}
> +
> +if (qemu_strtoui(token, NULL, 0, ruhid++) < 0) {
> +error_setg(errp, "cannot parse reclaim unit handle identifier");
> +free(r);
> +return false;
> +}
> +}



Re: [RFC PATCH 5/5] hw/ppc/pnv_bmc: Simplify pnv_bmc_find()

2023-02-16 Thread Philippe Mathieu-Daudé

On 16/2/23 19:12, Cédric Le Goater wrote:

On 2/16/23 13:25, Philippe Mathieu-Daudé wrote:

ForeachArgs::name is only used once as TYPE_IPMI_BMC.
Since the penultimate commit, object_child_foreach_recursive()'s
handler takes an Error* argument and return a boolean.
We can directly pass ForeachArgs::obj as context, removing the
ForeachArgs structure.

Signed-off-by: Philippe Mathieu-Daudé 

---
RFC: please double-check...


  hw/ppc/pnv_bmc.c | 25 +
  1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c
index 05acc88a55..566284469f 100644
--- a/hw/ppc/pnv_bmc.c
+++ b/hw/ppc/pnv_bmc.c
@@ -278,36 +278,29 @@ IPMIBmc *pnv_bmc_create(PnvPnor *pnor)
  return IPMI_BMC(obj);
  }
-typedef struct ForeachArgs {
-    const char *name;
-    Object *obj;
-} ForeachArgs;
-
  static bool bmc_find(Object *child, void *opaque, Error **errp)
  {
-    ForeachArgs *args = opaque;
+    Object **obj = opaque;
-    if (object_dynamic_cast(child, args->name)) {
-    if (args->obj) {
-    return false;


The purpose of this test was to catch multiple bmc devices and it's removed
now.


Great.


+    if (object_dynamic_cast(child, TYPE_IPMI_BMC)) {
+    if (*obj) {
+    return true;
  }
-    args->obj = child;
+    *obj = child;
  }
  return true;
  }
  IPMIBmc *pnv_bmc_find(Error **errp)
  {
-    ForeachArgs args = { TYPE_IPMI_BMC, NULL };
-    int ret;
+    Object *obj = NULL;
-    ret = object_child_foreach_recursive(object_get_root(), bmc_find,
- &args, NULL);
-    if (ret) {
+    if (!object_child_foreach_recursive(object_get_root(), bmc_find, 
&obj,

+    NULL)) {
  error_setg(errp, "machine should have only one BMC device. "
 "Use '-nodefaults'");>   return NULL;
  }


We don't test obj against NULL any more. This could break the QOM cast 
below.


IIUC QOM cast-macros are NULL-safe, see
https://lore.kernel.org/qemu-devel/20210107121304.1db97...@bahia.lan/

If you concur I'll try to update the QOM API doc where relevant.


-    return args.obj ? IPMI_BMC(args.obj) : NULL;
+    return IPMI_BMC(obj);
  }







Re: [PATCH 5/5] hw/nvme: flexible data placement emulation

2023-02-16 Thread Jesper Devantier
Correct. We deliberately allow enabling fdp as a startup parameter to side-step 
the need for NS mgmt.

You are also correct on the second point - the NS placement handle list can at 
most hold 128 elements.

> On 16 Feb 2023, at 19.26, Keith Busch  wrote:
> 
> This mostly looks fine. I need to clarify some spec decisions, though.
> 
> By default, FDP feature is disabled: "The default value of this Feature shall
> be 0h." You can't change the value as long as namespaces exist within the
> group, so FDP requires NS Mgmt be supported if you're going to enable it. The
> spec, however, doesn't seem to explicitly say that, and NS Mgmt isn't 
> currently
> supported in this controller.
> 
> We can look past that for this implementation to allow static settings even if
> that doesn't perfectly align with this feature (NS Mgmt is kind of difficult
> here). In order to match what the spec says is possible though, we can't have 
> a
> namespace with more than 128 placement handles since that's the most you can
> provide when you create the namespace.
> 
> Does that make sense, or am I totally misunderstanding the details?




Re: [PATCH 5/5] hw/nvme: flexible data placement emulation

2023-02-16 Thread Keith Busch
This mostly looks fine. I need to clarify some spec decisions, though.

By default, FDP feature is disabled: "The default value of this Feature shall
be 0h." You can't change the value as long as namespaces exist within the
group, so FDP requires NS Mgmt be supported if you're going to enable it. The
spec, however, doesn't seem to explicitly say that, and NS Mgmt isn't currently
supported in this controller.

We can look past that for this implementation to allow static settings even if
that doesn't perfectly align with this feature (NS Mgmt is kind of difficult
here). In order to match what the spec says is possible though, we can't have a
namespace with more than 128 placement handles since that's the most you can
provide when you create the namespace.

Does that make sense, or am I totally misunderstanding the details?



Re: [RFC PATCH 5/5] hw/ppc/pnv_bmc: Simplify pnv_bmc_find()

2023-02-16 Thread Cédric Le Goater

On 2/16/23 13:25, Philippe Mathieu-Daudé wrote:

ForeachArgs::name is only used once as TYPE_IPMI_BMC.
Since the penultimate commit, object_child_foreach_recursive()'s
handler takes an Error* argument and return a boolean.
We can directly pass ForeachArgs::obj as context, removing the
ForeachArgs structure.

Signed-off-by: Philippe Mathieu-Daudé 

---
RFC: please double-check...


  hw/ppc/pnv_bmc.c | 25 +
  1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c
index 05acc88a55..566284469f 100644
--- a/hw/ppc/pnv_bmc.c
+++ b/hw/ppc/pnv_bmc.c
@@ -278,36 +278,29 @@ IPMIBmc *pnv_bmc_create(PnvPnor *pnor)
  return IPMI_BMC(obj);
  }
  
-typedef struct ForeachArgs {

-const char *name;
-Object *obj;
-} ForeachArgs;
-
  static bool bmc_find(Object *child, void *opaque, Error **errp)
  {
-ForeachArgs *args = opaque;
+Object **obj = opaque;
  
-if (object_dynamic_cast(child, args->name)) {

-if (args->obj) {
-return false;


The purpose of this test was to catch multiple bmc devices and it's removed
now.
 

+if (object_dynamic_cast(child, TYPE_IPMI_BMC)) {
+if (*obj) {
+return true;
  }
-args->obj = child;
+*obj = child;
  }
  return true;
  }
  
  IPMIBmc *pnv_bmc_find(Error **errp)

  {
-ForeachArgs args = { TYPE_IPMI_BMC, NULL };
-int ret;
+Object *obj = NULL;
  
-ret = object_child_foreach_recursive(object_get_root(), bmc_find,

- &args, NULL);
-if (ret) {
+if (!object_child_foreach_recursive(object_get_root(), bmc_find, &obj,
+NULL)) {
  error_setg(errp, "machine should have only one BMC device. "
 "Use '-nodefaults'");>   return NULL;
  }


We don't test obj against NULL any more. This could break the QOM cast below.

Thanks,


C.


  
-return args.obj ? IPMI_BMC(args.obj) : NULL;

+return IPMI_BMC(obj);
  }





Re: [PATCH 4/5] hw/nvme: basic directives support

2023-02-16 Thread Keith Busch
On Thu, Feb 16, 2023 at 06:35:27PM +0100, Klaus Jensen wrote:
> On Thu, Feb 16, 2023, at 18:23, Keith Busch wrote:
> > On Thu, Feb 16, 2023 at 05:48:05PM +0100, Jesper Devantier wrote:
> >> +enum NvmeDirective {
> >> +NVME_DIRECTIVE_SUPPORTED = 0x0,
> >> +NVME_DIRECTIVE_ENABLED   = 0x1,
> >> +};
> >
> > What's this?
> 
> That’s a left-over from my rebase. I’ll fix that one up.

Okay, other than that, this one looks good.



Re: [PATCH 4/5] hw/nvme: basic directives support

2023-02-16 Thread Klaus Jensen



On Thu, Feb 16, 2023, at 18:23, Keith Busch wrote:
> On Thu, Feb 16, 2023 at 05:48:05PM +0100, Jesper Devantier wrote:
>> +enum NvmeDirective {
>> +NVME_DIRECTIVE_SUPPORTED = 0x0,
>> +NVME_DIRECTIVE_ENABLED   = 0x1,
>> +};
>
> What's this?

That’s a left-over from my rebase. I’ll fix that one up.



[PATCH 2/5] hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace

2023-02-16 Thread Jesper Devantier
From: Niklas Cassel 

Each NvmeNamespace can be used by serveral controllers,
but a NvmeNamespace can at most belong to a single NvmeSubsystem.
Store a pointer to the NvmeSubsystem, if the namespace was realized
with a NvmeSubsystem.

Signed-off-by: Niklas Cassel 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ns.c   | 1 +
 hw/nvme/nvme.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..8ebab4fbce 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -561,6 +561,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
 return;
 }
+ns->subsys = subsys;
 }
 
 if (nvme_ns_setup(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 16da27a69b..5d221073ac 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -167,6 +167,7 @@ typedef struct NvmeNamespace {
 int32_t nr_active_zones;
 
 NvmeNamespaceParams params;
+NvmeSubsystem *subsys;
 
 struct {
 uint32_t err_rec;
-- 
2.39.1




[PATCH 5/5] hw/nvme: flexible data placement emulation

2023-02-16 Thread Jesper Devantier
From: Jesper Devantier 

Add emulation of TP4146 ("Flexible Data Placement").

Signed-off-by: Jesper Devantier 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 703 ++-
 hw/nvme/ns.c | 141 +
 hw/nvme/nvme.h   |  85 +-
 hw/nvme/subsys.c |  94 +-
 hw/nvme/trace-events |   5 +
 include/block/nvme.h | 168 ++-
 6 files changed, 1186 insertions(+), 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 17e6b430e2..b894bda326 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -238,6 +238,8 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
 [NVME_TIMESTAMP]= true,
 [NVME_HOST_BEHAVIOR_SUPPORT]= true,
 [NVME_COMMAND_SET_PROFILE]  = true,
+[NVME_FDP_MODE] = true,
+[NVME_FDP_EVENTS]   = true,
 };
 
 static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
@@ -249,6 +251,8 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
 [NVME_HOST_BEHAVIOR_SUPPORT]= NVME_FEAT_CAP_CHANGE,
 [NVME_COMMAND_SET_PROFILE]  = NVME_FEAT_CAP_CHANGE,
+[NVME_FDP_MODE] = NVME_FEAT_CAP_CHANGE,
+[NVME_FDP_EVENTS]   = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
 };
 
 static const uint32_t nvme_cse_acs[256] = {
@@ -281,6 +285,8 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_VERIFY]   = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_IO_MGMT_RECV] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_IO_MGMT_SEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
 static const uint32_t nvme_cse_iocs_zoned[256] = {
@@ -299,12 +305,66 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 
 static void nvme_process_sq(void *opaque);
 static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst);
+static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n);
 
 static uint16_t nvme_sqid(NvmeRequest *req)
 {
 return le16_to_cpu(req->sq->sqid);
 }
 
+static inline uint16_t nvme_make_pid(NvmeNamespace *ns, uint16_t rg,
+ uint16_t ph)
+{
+uint16_t rgif = ns->endgrp->fdp.rgif;
+
+if (!rgif) {
+return ph;
+}
+
+return (rg << (16 - rgif)) | ph;
+}
+
+static inline bool nvme_ph_valid(NvmeNamespace *ns, uint16_t ph)
+{
+return ph < ns->fdp.nphs;
+}
+
+static inline bool nvme_rg_valid(NvmeEnduranceGroup *endgrp, uint16_t rg)
+{
+return rg < endgrp->fdp.nrg;
+}
+
+static inline uint16_t nvme_pid2ph(NvmeNamespace *ns, uint16_t pid)
+{
+uint16_t rgif = ns->endgrp->fdp.rgif;
+
+if (!rgif) {
+return pid;
+}
+
+return pid & ((1 << (15 - rgif)) - 1);
+}
+
+static inline uint16_t nvme_pid2rg(NvmeNamespace *ns, uint16_t pid)
+{
+uint16_t rgif = ns->endgrp->fdp.rgif;
+
+if (!rgif) {
+return 0;
+}
+
+return pid >> (16 - rgif);
+}
+
+static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid,
+  uint16_t *ph, uint16_t *rg)
+{
+*rg = nvme_pid2rg(ns, pid);
+*ph = nvme_pid2ph(ns, pid);
+
+return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
+}
+
 static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
NvmeZoneState state)
 {
@@ -378,6 +438,69 @@ static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t 
act, uint32_t opn)
 return nvme_zns_check_resources(ns, act, opn, 0);
 }
 
+static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer 
*ebuf)
+{
+NvmeFdpEvent *ret = NULL;
+bool is_full = ebuf->next == ebuf->start && ebuf->nelems;
+
+ret = &ebuf->events[ebuf->next++];
+if (unlikely(ebuf->next == NVME_FDP_MAX_EVENTS)) {
+ebuf->next = 0;
+}
+if (is_full) {
+ebuf->start = ebuf->next;
+} else {
+ebuf->nelems++;
+}
+
+memset(ret, 0, sizeof(NvmeFdpEvent));
+ret->timestamp = nvme_get_timestamp(n);
+
+return ret;
+}
+
+static inline int log_event(NvmeRuHandle *ruh, uint8_t event_type)
+{
+return (ruh->event_filter >> nvme_fdp_evf_shifts[event_type]) & 0x1;
+}
+
+static bool nvme_update_ruh(NvmeCtrl *n, NvmeNamespace *ns, uint16_t pid)
+{
+NvmeEnduranceGroup *endgrp = ns->endgrp;
+NvmeRuHandle *ruh;
+NvmeReclaimUnit *ru;
+NvmeFdpEvent *e = NULL;
+uint16_t ph, rg, ruhid;
+
+if (!nvme_parse_pid(ns, pid, &ph, &rg)) {
+return false;
+}
+
+ruhid = ns->fdp.phs[ph];
+
+ruh = &endgrp->fdp.ruhs[ruhid];
+ru = &ruh->rus[rg];
+
+if (ru->ruamw) {
+if (log_event(ruh, FDP_EVT_RU_NOT_FULLY_WRITTEN)) {
+e = nvme_fdp_alloc_event(n, &endgrp->fdp.host_events);
+e->type = FDP_EVT_RU_NOT_FULLY_WRITTEN;
+e->flags = FDPEF_PIV | FDPEF_NSIDV | FDPEF_

[PATCH 3/5] hw/nvme: add basic endurance group support

2023-02-16 Thread Jesper Devantier
From: Klaus Jensen 

Add the mandatory Endurance Group identify data structures and log
pages.

For now, all namespaces in a subsystem belongs to a single Endurance
Group.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 48 +++-
 hw/nvme/ns.c |  3 +++
 hw/nvme/nvme.h   |  4 
 include/block/nvme.h | 42 +++---
 4 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 99b92ff20b..729ed9adc5 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4454,6 +4454,44 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 return nvme_c2h(n, (uint8_t *) &smart + off, trans_len, req);
 }
 
+static uint16_t nvme_endgrp_info(NvmeCtrl *n,  uint8_t rae, uint32_t buf_len,
+ uint64_t off, NvmeRequest *req)
+{
+uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
+uint16_t endgrpid = (dw11 >> 16) & 0x;
+struct nvme_stats stats = {};
+NvmeEndGrpLog info = {};
+int i;
+
+if (!n->subsys || endgrpid != 0x1) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+if (off >= sizeof(info)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
+if (!ns) {
+continue;
+}
+
+nvme_set_blk_stats(ns, &stats);
+}
+
+info.data_units_read[0] = cpu_to_le64(stats.units_read / 10);
+info.data_units_written[0] = cpu_to_le64(stats.units_written / 10);
+info.media_units_written[0] = cpu_to_le64(stats.units_written / 
10);
+info.host_read_commands[0] = cpu_to_le64(stats.read_commands);
+info.host_write_commands[0] = cpu_to_le64(stats.write_commands);
+
+buf_len = MIN(sizeof(info) - off, buf_len);
+
+return nvme_c2h(n, (uint8_t *)&info + off, buf_len, req);
+}
+
+
 static uint16_t nvme_fw_log_info(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
  NvmeRequest *req)
 {
@@ -4626,6 +4664,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_changed_nslist(n, rae, len, off, req);
 case NVME_LOG_CMD_EFFECTS:
 return nvme_cmd_effects(n, csi, len, off, req);
+case NVME_LOG_ENDGRP:
+return nvme_endgrp_info(n, rae, len, off, req);
 default:
 trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid);
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -7382,6 +7422,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 uint8_t *pci_conf = pci_dev->config;
 uint64_t cap = ldq_le_p(&n->bar.cap);
 NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
+uint32_t ctratt;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -7392,7 +7433,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->cntlid = cpu_to_le16(n->cntlid);
 
 id->oaes = cpu_to_le32(NVME_OAES_NS_ATTR);
-id->ctratt |= cpu_to_le32(NVME_CTRATT_ELBAS);
+ctratt = NVME_CTRATT_ELBAS;
 
 id->rab = 6;
 
@@ -7459,8 +7500,13 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 
 if (n->subsys) {
 id->cmic |= NVME_CMIC_MULTI_CTRL;
+ctratt |= NVME_CTRATT_ENDGRPS;
+
+id->endgidmax = cpu_to_le16(0x1);
 }
 
+id->ctratt = cpu_to_le32(ctratt);
+
 NVME_CAP_SET_MQES(cap, 0x7ff);
 NVME_CAP_SET_CQR(cap, 1);
 NVME_CAP_SET_TO(cap, 0xf);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 8ebab4fbce..23872a174c 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -592,6 +592,8 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 if (subsys) {
 subsys->namespaces[nsid] = ns;
 
+ns->id_ns.endgid = cpu_to_le16(0x1);
+
 if (ns->params.detached) {
 return;
 }
@@ -607,6 +609,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 
 return;
 }
+
 }
 
 nvme_attach_ns(n, ns);
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5d221073ac..a88e0dea5c 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -45,6 +45,10 @@ typedef struct NvmeBus {
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 #define SUBSYS_SLOT_RSVD (void *)0x
 
+typedef struct NvmeEnduranceGroup {
+uint8_t event_conf;
+} NvmeEnduranceGroup;
+
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 NvmeBus bus;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8027b7126b..4abc1bbfa5 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -58,6 +58,24 @@ enum NvmeBarRegs {
 NVME_REG_PMRMSCU = offsetof(NvmeBar, pmrmscu),
 };
 
+typedef struct QEMU_PACKED NvmeEndGrpLog {
+uint8_t  critical_warning;
+uint8_t  rsvd[2];
+uint8_t  avail_spare;
+uint8_t  avail_spare_thres;
+uint8_t  percet_used

[PATCH 0/5] Support Flexible Data Placement (FDP)

2023-02-16 Thread Jesper Devantier
From: Jesper Wendel Devantier 

Flexible Data Placement (FDP) is a newly introduced enhancement
of the NVM command set introduced by the NVM Express, Inc.
organization as TP 4146. FDP aims to extend the NVM command set
to enable host-guided data placement. FDP-enabled namespaces
can be used as before, but writes may now reference a specific
placement id which in turn points to a reclaim unit (RU). RUs
are defined as some amount of physical, non-volatile storage which
can be erased/reused/repurposed without disturbing any other
reclaim units.

For further details on FDP, consult the specification, which is
available as "TP4146 Flexible Data Placement 2022.11.30 Ratified.pdf"
in the following link:
https://nvmexpress.org/wp-content/uploads/NVM-Express-2.0-Ratified-TPs_20230111.zip

The FDP work builds on 4 preparatory patches, chiefly to add support for
endurance groups and directives.
The final patch adds FDP support itself.

Gollu Appalanaidu (1):
  hw/nvme: basic directives support

Jesper Devantier (1):
  hw/nvme: flexible data placement emulation

Joel Granados (1):
  hw/nvme: move adjustment of data_units{read,written}

Klaus Jensen (1):
  hw/nvme: add basic endurance group support

Niklas Cassel (1):
  hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace

 hw/nvme/ctrl.c   | 803 ++-
 hw/nvme/ns.c | 145 
 hw/nvme/nvme.h   |  92 -
 hw/nvme/subsys.c |  94 -
 hw/nvme/trace-events |   5 +
 include/block/nvme.h | 241 -
 6 files changed, 1353 insertions(+), 27 deletions(-)

-- 
2.39.1




[PATCH 4/5] hw/nvme: basic directives support

2023-02-16 Thread Jesper Devantier
From: Gollu Appalanaidu 

Add support for the Directive Send and Recv commands and the Identify
directive.

Signed-off-by: Klaus Jensen 
Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c   | 40 +++-
 hw/nvme/nvme.h   |  2 ++
 include/block/nvme.h | 35 ++-
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 729ed9adc5..17e6b430e2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -266,6 +266,8 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_VIRT_MNGMT]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_DBBUF_CONFIG] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_ADM_CMD_DIRECTIVE_RECV]   = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_DIRECTIVE_SEND]   = NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_none[256];
@@ -6146,6 +6148,37 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 return NVME_SUCCESS;
 }
 
+static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
+{
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+static uint16_t nvme_directive_receive(NvmeCtrl *n, NvmeRequest *req)
+{
+uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint8_t doper, dtype;
+uint32_t numd, trans_len;
+NvmeDirectiveIdentify id = {
+.supported = 1 << NVME_DIRECTIVE_IDENTIFY,
+.enabled = 1 << NVME_DIRECTIVE_IDENTIFY,
+};
+
+numd = dw10 + 1;
+doper = dw11 & 0xff;
+dtype = (dw11 >> 8) & 0xff;
+
+trans_len = MIN(sizeof(NvmeDirectiveIdentify), numd << 2);
+
+if (nsid == NVME_NSID_BROADCAST || dtype != NVME_DIRECTIVE_IDENTIFY ||
+doper != NVME_DIRECTIVE_RETURN_PARAMS) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+return nvme_c2h(n, (uint8_t *)&id, trans_len, req);
+}
+
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
@@ -6194,6 +6227,10 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_dbbuf_config(n, req);
 case NVME_ADM_CMD_FORMAT_NVM:
 return nvme_format(n, req);
+case NVME_ADM_CMD_DIRECTIVE_SEND:
+return nvme_directive_send(n, req);
+case NVME_ADM_CMD_DIRECTIVE_RECV:
+return nvme_directive_receive(n, req);
 default:
 assert(false);
 }
@@ -7450,7 +7487,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->mdts = n->params.mdts;
 id->ver = cpu_to_le32(NVME_SPEC_VER);
 id->oacs =
-cpu_to_le16(NVME_OACS_NS_MGMT | NVME_OACS_FORMAT | NVME_OACS_DBBUF);
+cpu_to_le16(NVME_OACS_NS_MGMT | NVME_OACS_FORMAT | NVME_OACS_DBBUF |
+NVME_OACS_DIRECTIVES);
 id->cntrltype = 0x1;
 
 /*
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index a88e0dea5c..e489830478 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -345,7 +345,9 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
 case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
 case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
 case NVME_ADM_CMD_NS_ATTACHMENT:return "NVME_ADM_CMD_NS_ATTACHMENT";
+case NVME_ADM_CMD_DIRECTIVE_SEND:   return "NVME_ADM_CMD_DIRECTIVE_SEND";
 case NVME_ADM_CMD_VIRT_MNGMT:   return "NVME_ADM_CMD_VIRT_MNGMT";
+case NVME_ADM_CMD_DIRECTIVE_RECV:   return "NVME_ADM_CMD_DIRECTIVE_RECV";
 case NVME_ADM_CMD_DBBUF_CONFIG: return "NVME_ADM_CMD_DBBUF_CONFIG";
 case NVME_ADM_CMD_FORMAT_NVM:   return "NVME_ADM_CMD_FORMAT_NVM";
 default:return "NVME_ADM_CMD_UNKNOWN";
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 4abc1bbfa5..d463d0ef5f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -613,7 +613,9 @@ enum NvmeAdminCommands {
 NVME_ADM_CMD_ACTIVATE_FW= 0x10,
 NVME_ADM_CMD_DOWNLOAD_FW= 0x11,
 NVME_ADM_CMD_NS_ATTACHMENT  = 0x15,
+NVME_ADM_CMD_DIRECTIVE_SEND = 0x19,
 NVME_ADM_CMD_VIRT_MNGMT = 0x1c,
+NVME_ADM_CMD_DIRECTIVE_RECV = 0x1a,
 NVME_ADM_CMD_DBBUF_CONFIG   = 0x7c,
 NVME_ADM_CMD_FORMAT_NVM = 0x80,
 NVME_ADM_CMD_SECURITY_SEND  = 0x81,
@@ -1161,11 +1163,12 @@ enum NvmeIdCtrlCtratt {
 };
 
 enum NvmeIdCtrlOacs {
-NVME_OACS_SECURITY  = 1 << 0,
-NVME_OACS_FORMAT= 1 << 1,
-NVME_OACS_FW= 1 << 2,
-NVME_OACS_NS_MGMT   = 1 << 3,
-NVME_OACS_DBBUF = 1 << 8,
+NVME_OACS_SECURITY  = 1 << 0,
+NVME_OACS_FORMAT= 1 << 1,
+NVME_OACS_FW= 1 << 2,
+NVME_OACS_NS_MGMT   = 1 << 3,
+NVME_OACS_DIRECTIVES= 1 << 5,
+NVME_OACS_DBBUF = 1 << 8,
 };
 
 enum NvmeIdCtrlOncs {
@@ -1644,6 +1647,27 @@ typedef enum NvmeVirtualResource

[PATCH 1/5] hw/nvme: move adjustment of data_units{read,written}

2023-02-16 Thread Jesper Devantier
From: Joel Granados 

In order to return the units_{read/written} required by the SMART log we
need to shift the number of bytes value by BDRV_SECTORS_BITS and multiply
by 1000. This is a prep patch that moves this adjustment to where the SMART
log is calculated in order to use the stats struct for calculating OCP
extended smart log values.

Reviewed-by: Klaus Jensen 
Signed-off-by: Joel Granados 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f25cc2c235..99b92ff20b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4386,8 +4386,8 @@ static void nvme_set_blk_stats(NvmeNamespace *ns, struct 
nvme_stats *stats)
 {
 BlockAcctStats *s = blk_get_stats(ns->blkconf.blk);
 
-stats->units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
-stats->units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
+stats->units_read += s->nr_bytes[BLOCK_ACCT_READ];
+stats->units_written += s->nr_bytes[BLOCK_ACCT_WRITE];
 stats->read_commands += s->nr_ops[BLOCK_ACCT_READ];
 stats->write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
 }
@@ -4401,6 +4401,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, 
uint32_t buf_len,
 uint32_t trans_len;
 NvmeNamespace *ns;
 time_t current_ms;
+uint64_t u_read, u_written;
 
 if (off >= sizeof(smart)) {
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -4427,10 +4428,11 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 trans_len = MIN(sizeof(smart) - off, buf_len);
 smart.critical_warning = n->smart_critical_warning;
 
-smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
-1000));
-smart.data_units_written[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_written,
-   1000));
+u_read = DIV_ROUND_UP(stats.units_read >> BDRV_SECTOR_BITS, 1000);
+u_written = DIV_ROUND_UP(stats.units_written >> BDRV_SECTOR_BITS, 1000);
+
+smart.data_units_read[0] = cpu_to_le64(u_read);
+smart.data_units_written[0] = cpu_to_le64(u_written);
 smart.host_read_commands[0] = cpu_to_le64(stats.read_commands);
 smart.host_write_commands[0] = cpu_to_le64(stats.write_commands);
 
-- 
2.39.1




Re: [PATCH 4/5] hw/nvme: basic directives support

2023-02-16 Thread Keith Busch
On Thu, Feb 16, 2023 at 05:48:05PM +0100, Jesper Devantier wrote:
> +enum NvmeDirective {
> +NVME_DIRECTIVE_SUPPORTED = 0x0,
> +NVME_DIRECTIVE_ENABLED   = 0x1,
> +};

What's this?



Re: [PULL 0/5] Migration 20230215 patches

2023-02-16 Thread Peter Maydell
On Wed, 15 Feb 2023 at 20:06, Juan Quintela  wrote:
>
> The following changes since commit 6a50f64ca01d0a7b97f14f069762bfd88160f31e:
>
>   Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into 
> staging (2023-02-14 14:46:10 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git 
> tags/migration-20230215-pull-request
>
> for you to fetch changes up to 24beea4efe6e6b65fd6248ede936cd3278b2bf8a:
>
>   migration: Rename res_{postcopy,precopy}_only (2023-02-15 20:04:30 +0100)
>
> 
> Migration Pull request
>
> This pull request contains:
>
> * Add qemu_file_get_to_fd() a.k.a. make vfio happy(Avihai)
> * migration/block is now DPRINTF() free zone (Philippe)
> * remove res_compat and improve docs (me)
>
> Please apply.


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH v2 07/18] hw/ide/piix: Ensure IDE output IRQs are wired at realization

2023-02-16 Thread Bernhard Beschow



Am 16. Februar 2023 15:33:47 UTC schrieb "Philippe Mathieu-Daudé" 
:
>On 16/2/23 15:43, Bernhard Beschow wrote:
>> 
>> 
>> On Wed, Feb 15, 2023 at 5:20 PM Philippe Mathieu-Daudé > > wrote:
>> 
>> Ensure both IDE output IRQ lines are wired.
>> 
>> We can remove the last use of isa_get_irq(NULL).
>> 
>> Signed-off-by: Philippe Mathieu-Daudé > >
>> ---
>>   hw/ide/piix.c | 13 -
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>> 
>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>> index 9d876dd4a7..b75a4ddcca 100644
>> --- a/hw/ide/piix.c
>> +++ b/hw/ide/piix.c
>> @@ -133,14 +133,17 @@ static bool pci_piix_init_bus(PCIIDEState *d,
>> unsigned i, Error **errp)
>>       static const struct {
>>           int iobase;
>>           int iobase2;
>> -        int isairq;
>>       } port_info[] = {
>> -        {0x1f0, 0x3f6, 14},
>> -        {0x170, 0x376, 15},
>> +        {0x1f0, 0x3f6},
>> +        {0x170, 0x376},
>>       };
>>       int ret;
>> 
>> -    qemu_irq irq_out = d->irq[i] ? : isa_get_irq(NULL,
>> port_info[i].isairq);
>> +    if (!d->irq[i]) {
>> +        error_setg(errp, "output IDE IRQ %u not connected", i);
>> +        return false;
>> +    }
>> +
>>       ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
>>       ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
>>                             port_info[i].iobase2);
>> @@ -149,7 +152,7 @@ static bool pci_piix_init_bus(PCIIDEState *d,
>> unsigned i, Error **errp)
>>                            object_get_typename(OBJECT(d)), i);
>>           return false;
>>       }
>> -    ide_bus_init_output_irq(&d->bus[i], irq_out);
>> +    ide_bus_init_output_irq(&d->bus[i], d->irq[i]);
>> 
>>       bmdma_init(&d->bus[i], &d->bmdma[i], d);
>>       d->bmdma[i].bus = &d->bus[i];
>> -- 2.38.1
>> 
>> 
>> This now breaks user-created  piix3-ide:
>> 
>>    qemu-system-x86_64 -M q35 -device piix3-ide
>> 
>> Results in:
>> 
>>    qemu-system-x86_64: -device piix3-ide: output IDE IRQ 0 not connected
>
>Thank you for this real-life-impossible-but-exists-in-QEMU test-case!
>
>Should we cover it in qtests?

Yes, why not. Preferably along with the x-remote machine.



Re: Lost partition tables on ide-hd + ahci drive

2023-02-16 Thread Mike Maslenkin
Makes sense for disks without partition table.
But wouldn't Linux or any other OS write at least 4K bytes in that case?
Who may want to write 512 bytes for any purposes except for boot
sector nowadays..
In dump mentioned before only 512 bytes were not zeroed, so I guess it
was caused by IO from guest OS.
In other cases it can be caused by misconfigured IDE registers state
or broken FIS memory area.


On Thu, Feb 16, 2023 at 6:25 PM Fiona Ebner  wrote:
>
> Am 16.02.23 um 15:17 schrieb Mike Maslenkin:
> > Does additional comparison make a sense here: check for LBA == 0 and
> > then check MBR signature bytes.
> > Additionally it’s easy to check buffer_is_zero() result or even print
> > FIS contents under these conditions.
> > Data looks like a part of guest memory of 64bit Windows.
>
> Thank you for the suggestion! I'll think about adding such a check and
> dumping of FIS contents in a custom build for affected users. But in
> general it would be too much noise for non-MBR cases: e.g. on a disk
> formatted with ext4 (without any partitions), Linux will write to sector
> 0 on every startup and shutdown.
>
> Best Regards,
> Fiona
>



Re: [PATCH v2 07/18] hw/ide/piix: Ensure IDE output IRQs are wired at realization

2023-02-16 Thread Philippe Mathieu-Daudé

On 16/2/23 15:43, Bernhard Beschow wrote:



On Wed, Feb 15, 2023 at 5:20 PM Philippe Mathieu-Daudé 
mailto:phi...@linaro.org>> wrote:


Ensure both IDE output IRQ lines are wired.

We can remove the last use of isa_get_irq(NULL).

Signed-off-by: Philippe Mathieu-Daudé mailto:phi...@linaro.org>>
---
  hw/ide/piix.c | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 9d876dd4a7..b75a4ddcca 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -133,14 +133,17 @@ static bool pci_piix_init_bus(PCIIDEState *d,
unsigned i, Error **errp)
      static const struct {
          int iobase;
          int iobase2;
-        int isairq;
      } port_info[] = {
-        {0x1f0, 0x3f6, 14},
-        {0x170, 0x376, 15},
+        {0x1f0, 0x3f6},
+        {0x170, 0x376},
      };
      int ret;

-    qemu_irq irq_out = d->irq[i] ? : isa_get_irq(NULL,
port_info[i].isairq);
+    if (!d->irq[i]) {
+        error_setg(errp, "output IDE IRQ %u not connected", i);
+        return false;
+    }
+
      ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
      ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
                            port_info[i].iobase2);
@@ -149,7 +152,7 @@ static bool pci_piix_init_bus(PCIIDEState *d,
unsigned i, Error **errp)
                           object_get_typename(OBJECT(d)), i);
          return false;
      }
-    ide_bus_init_output_irq(&d->bus[i], irq_out);
+    ide_bus_init_output_irq(&d->bus[i], d->irq[i]);

      bmdma_init(&d->bus[i], &d->bmdma[i], d);
      d->bmdma[i].bus = &d->bus[i];
-- 
2.38.1



This now breaks user-created  piix3-ide:

   qemu-system-x86_64 -M q35 -device piix3-ide

Results in:

   qemu-system-x86_64: -device piix3-ide: output IDE IRQ 0 not connected


Thank you for this real-life-impossible-but-exists-in-QEMU test-case!

Should we cover it in qtests?



Re: [PATCH v2 17/18] hw/isa: Un-inline isa_bus_from_device()

2023-02-16 Thread Bernhard Beschow



Am 15. Februar 2023 16:16:40 UTC schrieb "Philippe Mathieu-Daudé" 
:
>No point in inlining isa_bus_from_device() which is only
>used at device realization time.
>
>Reviewed-by: Richard Henderson 
>Signed-off-by: Philippe Mathieu-Daudé 
>---
> hw/isa/isa-bus.c | 5 +
> include/hw/isa/isa.h | 6 +-
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
>diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
>index 719f2e96f2..f44817b88b 100644
>--- a/hw/isa/isa-bus.c
>+++ b/hw/isa/isa-bus.c
>@@ -167,6 +167,11 @@ bool isa_realize_and_unref(ISADevice *dev, ISABus *bus, 
>Error **errp)
> return qdev_realize_and_unref(&dev->parent_obj, &bus->parent_obj, errp);
> }
> 
>+ISABus *isa_bus_from_device(ISADevice *dev)
>+{
>+return ISA_BUS(qdev_get_parent_bus(DEVICE(dev)));
>+}
>+
> ISADevice *isa_vga_init(ISABus *bus)
> {
> vga_interface_created = true;
>diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
>index 1084d68ead..c9954a7d99 100644
>--- a/include/hw/isa/isa.h
>+++ b/include/hw/isa/isa.h
>@@ -96,6 +96,7 @@ ISADevice *isa_vga_init(ISABus *bus);
> 
> /*  isa_get_irq() is deprecated, please use isa_bus_get_irq() instead. */
> qemu_irq isa_get_irq(ISADevice *dev, unsigned irqnum);
>+ISABus *isa_bus_from_device(ISADevice *dev);
> 
> /**
>  * isa_register_ioport: Install an I/O port region on the ISA bus.
>@@ -133,9 +134,4 @@ int isa_register_portio_list(ISADevice *dev,
>  const MemoryRegionPortio *portio,
>  void *opaque, const char *name);
> 
>-static inline ISABus *isa_bus_from_device(ISADevice *d)
>-{
>-return ISA_BUS(qdev_get_parent_bus(DEVICE(d)));
>-}
>-
> #endif

Reviewed-by: Bernhard Beschow 



Re: [PATCH v2 18/18] hw/isa: Remove empty ISADeviceClass structure

2023-02-16 Thread Bernhard Beschow



Am 15. Februar 2023 16:16:41 UTC schrieb "Philippe Mathieu-Daudé" 
:
>ISADeviceClass is an empty class and just increase code
>complexity. Remove it, directly embedding DeviceClass in
>classes expanding TYPE_ISA_DEVICE.
>
>Signed-off-by: Philippe Mathieu-Daudé 
>---
> hw/isa/isa-bus.c  | 1 -
> hw/rtc/m48t59-isa.c   | 2 +-
> include/hw/isa/i8259_internal.h   | 2 +-
> include/hw/isa/isa.h  | 6 +-
> include/hw/isa/superio.h  | 2 +-
> include/hw/timer/i8254_internal.h | 2 +-
> 6 files changed, 5 insertions(+), 10 deletions(-)
>
>diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
>index f44817b88b..1276f31826 100644
>--- a/hw/isa/isa-bus.c
>+++ b/hw/isa/isa-bus.c
>@@ -221,7 +221,6 @@ static const TypeInfo isa_device_type_info = {
> .parent = TYPE_DEVICE,
> .instance_size = sizeof(ISADevice),
> .abstract = true,
>-.class_size = sizeof(ISADeviceClass),
> .class_init = isa_device_class_init,
> };
> 
>diff --git a/hw/rtc/m48t59-isa.c b/hw/rtc/m48t59-isa.c
>index e61f7ec370..5bb46f2383 100644
>--- a/hw/rtc/m48t59-isa.c
>+++ b/hw/rtc/m48t59-isa.c
>@@ -47,7 +47,7 @@ struct M48txxISAState {
> };
> 
> struct M48txxISADeviceClass {
>-ISADeviceClass parent_class;
>+DeviceClass parent_class;
> M48txxInfo info;
> };
> 
>diff --git a/include/hw/isa/i8259_internal.h b/include/hw/isa/i8259_internal.h
>index d272d879fb..155b098452 100644
>--- a/include/hw/isa/i8259_internal.h
>+++ b/include/hw/isa/i8259_internal.h
>@@ -35,7 +35,7 @@
> OBJECT_DECLARE_TYPE(PICCommonState, PICCommonClass, PIC_COMMON)
> 
> struct PICCommonClass {
>-ISADeviceClass parent_class;
>+DeviceClass parent_class;
> 
> void (*pre_save)(PICCommonState *s);
> void (*post_load)(PICCommonState *s);
>diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
>index c9954a7d99..411d12330b 100644
>--- a/include/hw/isa/isa.h
>+++ b/include/hw/isa/isa.h
>@@ -11,7 +11,7 @@
> #define ISA_NUM_IRQS 16
> 
> #define TYPE_ISA_DEVICE "isa-device"
>-OBJECT_DECLARE_TYPE(ISADevice, ISADeviceClass, ISA_DEVICE)
>+OBJECT_DECLARE_SIMPLE_TYPE(ISADevice, ISA_DEVICE)
> 
> #define TYPE_ISA_BUS "ISA"
> OBJECT_DECLARE_SIMPLE_TYPE(ISABus, ISA_BUS)
>@@ -48,10 +48,6 @@ struct IsaDmaClass {
>  void *opaque);
> };
> 
>-struct ISADeviceClass {
>-DeviceClass parent_class;
>-};
>-
> struct ISABus {
> /*< private >*/
> BusState parent_obj;
>diff --git a/include/hw/isa/superio.h b/include/hw/isa/superio.h
>index b9f5c19155..0dc45104d4 100644
>--- a/include/hw/isa/superio.h
>+++ b/include/hw/isa/superio.h
>@@ -44,7 +44,7 @@ typedef struct ISASuperIOFuncs {
> 
> struct ISASuperIOClass {
> /*< private >*/
>-ISADeviceClass parent_class;
>+DeviceClass parent_class;
> /*< public >*/
> DeviceRealize parent_realize;
> 
>diff --git a/include/hw/timer/i8254_internal.h 
>b/include/hw/timer/i8254_internal.h
>index a9a600d941..1761deb4cf 100644
>--- a/include/hw/timer/i8254_internal.h
>+++ b/include/hw/timer/i8254_internal.h
>@@ -58,7 +58,7 @@ struct PITCommonState {
> };
> 
> struct PITCommonClass {
>-ISADeviceClass parent_class;
>+DeviceClass parent_class;
> 
> void (*set_channel_gate)(PITCommonState *s, PITChannelState *sc, int val);
> void (*get_channel_info)(PITCommonState *s, PITChannelState *sc,

Reviewed-by: Bernhard Beschow 



Re: [PATCH v2 2/3] dma-helpers: prevent dma_blk_cb() vs dma_aio_cancel() race

2023-02-16 Thread Kevin Wolf
Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> means the rest of dma_blk_cb() is not protected. In particular, the
> DMAAIOCB field accesses happen outside the lock.
> 
> There is a race when the main loop thread holds the AioContext lock and
> invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
> field determines how cancellation proceeds. If dma_aio_cancel() see
> dbs->acb == NULL while dma_blk_cb() is still running, the request can be
> completed twice (-ECANCELED and the actual return value).
> 
> The following assertion can occur with virtio-scsi when an IOThread is
> used:
> 
>   ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != 
> NULL' failed.
> 
> Fix the race by holding the AioContext across dma_blk_cb(). Now
> dma_aio_cancel() under the AioContext lock will not see
> inconsistent/intermediate states.
> 
> Cc: Paolo Bonzini 
> Signed-off-by: Stefan Hajnoczi 

Two things that seem to need attention in the review:

> diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> index 7820fec54c..2463964805 100644
> --- a/softmmu/dma-helpers.c
> +++ b/softmmu/dma-helpers.c
> @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
>  static void dma_blk_cb(void *opaque, int ret)
>  {
>  DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> +AioContext *ctx = dbs->ctx;
>  dma_addr_t cur_addr, cur_len;
>  void *mem;
>  
>  trace_dma_blk_cb(dbs, ret);
>  
> +aio_context_acquire(ctx);

During the first iteration, the caller may already hold the AioContext
lock. In the case of scsi-disk, it does. Locking a second time is fine
in principle because it's a recursive lock, but we have to be careful
not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.

Except for the dbs->common.cb (see below) I don't see any functions that
would be problematic in this respect. In fact, the one I would be most
worried about is dbs->io_func, but it was already locked before.

>  dbs->acb = NULL;
>  dbs->offset += dbs->iov.size;
>  
>  if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
>  dma_complete(dbs, ret);

All request callbacks hold the AioContext lock now when they didn't
before. I wonder if it's worth documenting the locking rules for
dma_blk_io() in a comment. Could be a separate patch, though.

You remove the locking in scsi_dma_complete() to compensate. All the
other callers come from IDE and nvme, which don't take the lock
internally. Taking the main AioContext lock once is fine, so this looks
good.

If it is possible that we already complete on the first iteration, this
could however also be affected by the case above so that the AioContext
is locked twice. In this case, the invoked callback must not call
AIO_WAIT_WHILE() and we would need to audit IDE and nvme.

Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
assert and document it?

> -return;
> +goto out;
>  }
>  dma_blk_unmap(dbs);
>  
> @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
>  
>  if (dbs->iov.size == 0) {
>  trace_dma_map_wait(dbs);
> -dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> +dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);

Does copying dbs->ctx to a local variable imply that it may change
during the function? I didn't think so, but if it may, then why is
calling aio_bh_new() with the old value right?

>  cpu_register_map_client(dbs->bh);
> -return;
> +goto out;
>  }
>  
>  if (!QEMU_IS_ALIGNED(dbs->iov.size, dbs->align)) {
> @@ -174,11 +176,11 @@ static void dma_blk_cb(void *opaque, int ret)
>  QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
>  }
>  
> -aio_context_acquire(dbs->ctx);
>  dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
>  dma_blk_cb, dbs, dbs->io_func_opaque);
> -aio_context_release(dbs->ctx);
>  assert(dbs->acb);
> +out:
> +aio_context_release(ctx);
>  }

Kevin




Re: Lost partition tables on ide-hd + ahci drive

2023-02-16 Thread Fiona Ebner
Am 16.02.23 um 15:17 schrieb Mike Maslenkin:
> Does additional comparison make a sense here: check for LBA == 0 and
> then check MBR signature bytes.
> Additionally it’s easy to check buffer_is_zero() result or even print
> FIS contents under these conditions.
> Data looks like a part of guest memory of 64bit Windows.

Thank you for the suggestion! I'll think about adding such a check and
dumping of FIS contents in a custom build for affected users. But in
general it would be too much noise for non-MBR cases: e.g. on a disk
formatted with ext4 (without any partitions), Linux will write to sector
0 on every startup and shutdown.

Best Regards,
Fiona




Re: [PATCH v3 01/14] hw/char/serial-pci: Replace DO_UPCAST(PCISerialState) by PCI_SERIAL()

2023-02-16 Thread Philippe Mathieu-Daudé

On 16/2/23 15:17, BALATON Zoltan wrote:

On Thu, 16 Feb 2023, Philippe Mathieu-Daudé wrote:

On 16/2/23 12:20, Thomas Huth wrote:

On 13/02/2023 19.43, Philippe Mathieu-Daudé wrote:

Use the PCI_SERIAL() QOM type-checking macro to avoid DO_UPCAST().

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/char/serial-pci.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 801b769aba..9689645cac 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -36,7 +36,10 @@
  #include "qom/object.h"
  struct PCISerialState {
+    /*< private >*/
  PCIDevice dev;
+    /*< public >*/
+


I'm not sure about this part of the patch. It does not seem to be 
related to the other changes at all, and are you sure about which 
parts are really "public" and which parts are "private"? If so, I'd 
like to see a description about this in the commit message, 
preferably in a separate patch. Also, why an empty line after the 
"public" comment?


This is how QOM style separates the object 'private' part -- the
inherited parent, used by the QOM-cast macros -- and the fields
specific to this object.
The private field *must* be the first one in the structure for the
cast macros to work.

Maybe this isn't a convention and we could make one, to unify the
API style. I'm open to better suggestion :)

I suppose I got custom to see it to distinct the QOM hierarchy and
now it helps me to detect what is QOM and what isn't.
Anyway I'll remove from this patch.


I also dislike these comments and empty lines in these struct 
definitions. 


Well this and the new line somehow helps to not reorder this field
elsewhere in the structure.

I think it should be enough to document this QOM convention 
in the docs saying that each QOM object state has to have it's parent's 
state as first member and you're not supposed to access it directly 
(except maybe from very closely related sub class) but do a QOM cast 
instead. If this is clearly stated in the docs then there's no need to 
add comments about this in every object. You could tell QOM objects from 
other structs by the first member also being a QOM object and usually 
called parent or similar but sometimes just dev.


Maybe "first_field_is_always_the_qom_parent" so nobody will be tempted
to access it directly? :)

If you really want to 
get fancy maybe you could hide it in a macro, something like:


OBJECT_STATE(PCISerialState, PCIDevice)
...
END_OBJECT_STATE

but I'm not sure I like that because it has to hide the braces in the 
macro so it's not clear it's just a struct. So just describing it in the 
docs if it's not already is probably enough.


Regards,
BALATON Zoltan





Re: [PATCH v2 07/18] hw/ide/piix: Ensure IDE output IRQs are wired at realization

2023-02-16 Thread Bernhard Beschow
On Wed, Feb 15, 2023 at 5:20 PM Philippe Mathieu-Daudé 
wrote:

> Ensure both IDE output IRQ lines are wired.
>
> We can remove the last use of isa_get_irq(NULL).
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/ide/piix.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index 9d876dd4a7..b75a4ddcca 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -133,14 +133,17 @@ static bool pci_piix_init_bus(PCIIDEState *d,
> unsigned i, Error **errp)
>  static const struct {
>  int iobase;
>  int iobase2;
> -int isairq;
>  } port_info[] = {
> -{0x1f0, 0x3f6, 14},
> -{0x170, 0x376, 15},
> +{0x1f0, 0x3f6},
> +{0x170, 0x376},
>  };
>  int ret;
>
> -qemu_irq irq_out = d->irq[i] ? : isa_get_irq(NULL,
> port_info[i].isairq);
> +if (!d->irq[i]) {
> +error_setg(errp, "output IDE IRQ %u not connected", i);
> +return false;
> +}
> +
>  ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
>  ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
>port_info[i].iobase2);
> @@ -149,7 +152,7 @@ static bool pci_piix_init_bus(PCIIDEState *d, unsigned
> i, Error **errp)
>   object_get_typename(OBJECT(d)), i);
>  return false;
>  }
> -ide_bus_init_output_irq(&d->bus[i], irq_out);
> +ide_bus_init_output_irq(&d->bus[i], d->irq[i]);
>
>  bmdma_init(&d->bus[i], &d->bmdma[i], d);
>  d->bmdma[i].bus = &d->bus[i];
> --
> 2.38.1
>
>
> This now breaks user-created  piix3-ide:

  qemu-system-x86_64 -M q35 -device piix3-ide

Results in:

  qemu-system-x86_64: -device piix3-ide: output IDE IRQ 0 not connected

Best regards,
Bernhard


Re: [PATCH v3 01/14] hw/char/serial-pci: Replace DO_UPCAST(PCISerialState) by PCI_SERIAL()

2023-02-16 Thread BALATON Zoltan

On Thu, 16 Feb 2023, Philippe Mathieu-Daudé wrote:

On 16/2/23 12:20, Thomas Huth wrote:

On 13/02/2023 19.43, Philippe Mathieu-Daudé wrote:

Use the PCI_SERIAL() QOM type-checking macro to avoid DO_UPCAST().

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/char/serial-pci.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 801b769aba..9689645cac 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -36,7 +36,10 @@
  #include "qom/object.h"
  struct PCISerialState {
+    /*< private >*/
  PCIDevice dev;
+    /*< public >*/
+


I'm not sure about this part of the patch. It does not seem to be related 
to the other changes at all, and are you sure about which parts are really 
"public" and which parts are "private"? If so, I'd like to see a 
description about this in the commit message, preferably in a separate 
patch. Also, why an empty line after the "public" comment?


This is how QOM style separates the object 'private' part -- the
inherited parent, used by the QOM-cast macros -- and the fields
specific to this object.
The private field *must* be the first one in the structure for the
cast macros to work.

Maybe this isn't a convention and we could make one, to unify the
API style. I'm open to better suggestion :)

I suppose I got custom to see it to distinct the QOM hierarchy and
now it helps me to detect what is QOM and what isn't.
Anyway I'll remove from this patch.


I also dislike these comments and empty lines in these struct definitions. 
I think it should be enough to document this QOM convention in the docs 
saying that each QOM object state has to have it's parent's state as first 
member and you're not supposed to access it directly (except maybe from 
very closely related sub class) but do a QOM cast instead. If this is 
clearly stated in the docs then there's no need to add comments about this 
in every object. You could tell QOM objects from other structs by the 
first member also being a QOM object and usually called parent or similar 
but sometimes just dev. If you really want to get fancy maybe you could 
hide it in a macro, something like:


OBJECT_STATE(PCISerialState, PCIDevice)
...
END_OBJECT_STATE

but I'm not sure I like that because it has to hide the braces in the 
macro so it's not clear it's just a struct. So just describing it in the 
docs if it's not already is probably enough.


Regards,
BALATON Zoltan

Re: [PATCH 04/20] hw/ide/isa: Extract TYPE_ISA_IDE declarations to 'hw/ide/isa.h'

2023-02-16 Thread Bernhard Beschow



Am 15. Februar 2023 11:26:56 UTC schrieb "Philippe Mathieu-Daudé" 
:
>"hw/ide.h" is a mixed bag of lost IDE declarations.
>
>Extract isa_ide_init() and the TYPE_ISA_IDE QOM declarations
>to a new "hw/ide/isa.h" header.
>
>Rename ISAIDEState::isairq as 'irqnum' to emphasize this is
>not a qemu_irq object but the number (index) of an ISA IRQ.
>
>Signed-off-by: Philippe Mathieu-Daudé 
>---
> hw/i386/pc_piix.c|  1 +
> hw/ide/isa.c | 14 ++
> include/hw/ide.h |  5 -
> include/hw/ide/isa.h | 20 
> 4 files changed, 27 insertions(+), 13 deletions(-)
> create mode 100644 include/hw/ide/isa.h
>
>diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>index df64dd8dcc..7085b4bc58 100644
>--- a/hw/i386/pc_piix.c
>+++ b/hw/i386/pc_piix.c
>@@ -39,6 +39,7 @@
> #include "hw/pci/pci_ids.h"
> #include "hw/usb.h"
> #include "net/net.h"
>+#include "hw/ide/isa.h"
> #include "hw/ide/pci.h"
> #include "hw/ide/piix.h"
> #include "hw/irq.h"
>diff --git a/hw/ide/isa.c b/hw/ide/isa.c
>index 8bedbd13f1..5c3e83a0fc 100644
>--- a/hw/ide/isa.c
>+++ b/hw/ide/isa.c
>@@ -31,22 +31,20 @@
> #include "qemu/module.h"
> #include "sysemu/dma.h"
> 
>+#include "hw/ide/isa.h"
> #include "hw/ide/internal.h"
> #include "qom/object.h"
> 
> /***/
> /* ISA IDE definitions */
> 
>-#define TYPE_ISA_IDE "isa-ide"
>-OBJECT_DECLARE_SIMPLE_TYPE(ISAIDEState, ISA_IDE)
>-
> struct ISAIDEState {
> ISADevice parent_obj;
> 
> IDEBusbus;
> uint32_t  iobase;
> uint32_t  iobase2;
>-uint32_t  isairq;
>+uint32_t  irqnum;
> qemu_irq  irq;
> };
> 
>@@ -75,13 +73,13 @@ static void isa_ide_realizefn(DeviceState *dev, Error 
>**errp)
> 
> ide_bus_init(&s->bus, sizeof(s->bus), dev, 0, 2);
> ide_init_ioport(&s->bus, isadev, s->iobase, s->iobase2);
>-s->irq = isa_get_irq(isadev, s->isairq);
>+s->irq = isa_get_irq(isadev, s->irqnum);
> ide_init2(&s->bus, s->irq);
> vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_isa, s);
> ide_register_restart_cb(&s->bus);
> }
> 
>-ISADevice *isa_ide_init(ISABus *bus, int iobase, int iobase2, int isairq,
>+ISADevice *isa_ide_init(ISABus *bus, int iobase, int iobase2, int irqnum,
> DriveInfo *hd0, DriveInfo *hd1)
> {
> DeviceState *dev;
>@@ -92,7 +90,7 @@ ISADevice *isa_ide_init(ISABus *bus, int iobase, int 
>iobase2, int isairq,
> dev = DEVICE(isadev);
> qdev_prop_set_uint32(dev, "iobase",  iobase);
> qdev_prop_set_uint32(dev, "iobase2", iobase2);
>-qdev_prop_set_uint32(dev, "irq", isairq);
>+qdev_prop_set_uint32(dev, "irq", irqnum);
> isa_realize_and_unref(isadev, bus, &error_fatal);
> 
> s = ISA_IDE(dev);
>@@ -108,7 +106,7 @@ ISADevice *isa_ide_init(ISABus *bus, int iobase, int 
>iobase2, int isairq,
> static Property isa_ide_properties[] = {
> DEFINE_PROP_UINT32("iobase",  ISAIDEState, iobase,  0x1f0),
> DEFINE_PROP_UINT32("iobase2", ISAIDEState, iobase2, 0x3f6),
>-DEFINE_PROP_UINT32("irq",ISAIDEState, isairq,  14),
>+DEFINE_PROP_UINT32("irq", ISAIDEState, irqnum,  14),
> DEFINE_PROP_END_OF_LIST(),
> };
> 
>diff --git a/include/hw/ide.h b/include/hw/ide.h
>index 5f8c36b2aa..24a7aa2925 100644
>--- a/include/hw/ide.h
>+++ b/include/hw/ide.h
>@@ -1,13 +1,8 @@
> #ifndef HW_IDE_H
> #define HW_IDE_H
> 
>-#include "hw/isa/isa.h"
> #include "exec/memory.h"
> 
>-/* ide-isa.c */
>-ISADevice *isa_ide_init(ISABus *bus, int iobase, int iobase2, int isairq,
>-DriveInfo *hd0, DriveInfo *hd1);
>-
> int ide_get_geometry(BusState *bus, int unit,
>  int16_t *cyls, int8_t *heads, int8_t *secs);
> int ide_get_bios_chs_trans(BusState *bus, int unit);
>diff --git a/include/hw/ide/isa.h b/include/hw/ide/isa.h
>new file mode 100644
>index 00..1cd0ff1fa6
>--- /dev/null
>+++ b/include/hw/ide/isa.h
>@@ -0,0 +1,20 @@
>+/*
>+ * QEMU IDE Emulation: ISA Bus support.
>+ *
>+ * Copyright (c) 2003 Fabrice Bellard
>+ * Copyright (c) 2006 Openedhand Ltd.
>+ *
>+ * SPDX-License-Identifier: MIT
>+ */
>+#ifndef HW_IDE_ISA_H
>+#define HW_IDE_ISA_H
>+
>+#include "qom/object.h"
>+
>+#define TYPE_ISA_IDE "isa-ide"
>+OBJECT_DECLARE_SIMPLE_TYPE(ISAIDEState, ISA_IDE)
>+
>+ISADevice *isa_ide_init(ISABus *bus, int iobase, int iobase2, int irqnum,
>+DriveInfo *hd0, DriveInfo *hd1);
>+
>+#endif

Reviewed-by: Bernhard Beschow 



Re: Lost partition tables on ide-hd + ahci drive

2023-02-16 Thread Mike Maslenkin
Does additional comparison make a sense here: check for LBA == 0 and
then check MBR signature bytes.
Additionally it’s easy to check buffer_is_zero() result or even print
FIS contents under these conditions.
Data looks like a part of guest memory of 64bit Windows.

On Wed, Feb 15, 2023 at 1:53 PM Fiona Ebner  wrote:
>
> Am 14.02.23 um 19:21 schrieb John Snow:
> > On Thu, Feb 2, 2023 at 7:08 AM Fiona Ebner  wrote:
> >>
> >> Hi,
> >> over the years we've got 1-2 dozen reports[0] about suddenly
> >> missing/corrupted MBR/partition tables. The issue seems to be very rare
> >> and there was no success in trying to reproduce it yet. I'm asking here
> >> in the hope that somebody has seen something similar.
> >>
> >> The only commonality seems to be the use of an ide-hd drive with ahci bus.
> >>
> >> It does seem to happen with both Linux and Windows guests (one of the
> >> reports even mentions FreeBSD) and backing storages for the VMs include
> >> ZFS, RBD, LVM-Thin as well as file-based storages.
> >>
> >> Relevant part of an example configuration:
> >>
> >>>   -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
> >>>   -drive 
> >>> 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on'
> >>>  \
> >>>   -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
> >>
> >> The first reports are from before io_uring was used and there are also
> >> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
> >>
> >> Some reports say that the issue occurred under high IO load.
> >>
> >> Many reports suspect backups causing the issue. Our backup mechanism
> >> uses backup_job_create() for each drive and runs the jobs sequentially.
> >> It uses a custom block driver as the backup target which just forwards
> >> the writes to the actual target which can be a file or our backup server.
> >> (If you really want to see the details, apply the patches in [1] and see
> >> pve-backup.c and block/backup-dump.c).
> >>
> >> Of course, the backup job will read sector 0 of the source disk, but I
> >> really can't see where a stray write would happen, why the issue would
> >> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
> >>
> >> So again, just asking if somebody has seen something similar or has a
> >> hunch of what the cause might be.
> >>
> >
> > Hi Floria;
> >
> > I'm sorry to say that I haven't worked on the block devices (or
> > backup) for a little while now, so I am not immediately sure what
> > might be causing this problem. In general, I advise against using AHCI
> > in production as better performance (and dev support) can be achieved
> > through virtio.
>
> Yes, we also recommend using virtio-{scsi,blk}-pci to our users and most
> do. Still, some use AHCI, I'd guess mostly for Windows, but not only.
>
> > Still, I am not sure why the combination of AHCI with
> > backup_job_create() would be corrupting the early sectors of the disk.
>
> It's not clear that backup itself is causing the issue. Some of the
> reports do correlate it with backup, but there are no precise timestamps
> when the corruption happened. It might be that the additional IO during
> backup is somehow triggering the issue.
>
> > Do you have any analysis on how much data gets corrupted? Is it the
> > first sector only, the first few? Has anyone taken a peek at the
> > backing storage to see if there are any interesting patterns that can
> > be observed? (Zeroes, garbage, old data?)
>
> It does seem to be the first sector only, but it's not entirely clear.
> Many of the affected users said that after fixing the partition table
> with TestDisk, the VMs booted/worked normally again. We only have dumps
> for the first MiB of three images. In this case, all Windows with Ceph
> RBD images.
>
> See below[0] for the dumps. One was a valid MBR and matched the latest
> good backup, so that VM didn't boot for some other reason, not sure if
> even related to this bug. I did not include this one. One was completely
> empty and one contained other data in the first 512 Bytes, then again
> zeroes, but those zeroes are nothing special AFAIK.
>
> > Have any errors or warnings been observed in either the guest or the
> > host that might offer some clues?
>
> There is a single user who seemed to have hardware issues, and I'd be
> inclined to blame those in that case. But none of the other users
> reported any errors or warnings, though I can't say if any checked
> inside the guests.
>
> > Is there any commonality in the storage format being used? Is it
> > qcow2? Is it network-backed?
>
> There are reports with local ZFS volumes, local LVM-Thin volumes, RBD
> images, qcow2 on NFS. So no pattern to be seen.
>
> > Apologies for the "tier 1" questions.
>
> Thank you for your time!
>
> Best Regards,
> Fiona
>
> @Aaron (had access to the broken images): please correct me/add anything
> relevant I missed. Are the broken VMs/backups still present? If yes, can
> we ask the u

Re: [PATCH v3 01/14] hw/char/serial-pci: Replace DO_UPCAST(PCISerialState) by PCI_SERIAL()

2023-02-16 Thread Philippe Mathieu-Daudé

On 16/2/23 12:20, Thomas Huth wrote:

On 13/02/2023 19.43, Philippe Mathieu-Daudé wrote:

Use the PCI_SERIAL() QOM type-checking macro to avoid DO_UPCAST().

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/char/serial-pci.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 801b769aba..9689645cac 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -36,7 +36,10 @@
  #include "qom/object.h"
  struct PCISerialState {
+    /*< private >*/
  PCIDevice dev;
+    /*< public >*/
+


I'm not sure about this part of the patch. It does not seem to be 
related to the other changes at all, and are you sure about which parts 
are really "public" and which parts are "private"? If so, I'd like to 
see a description about this in the commit message, preferably in a 
separate patch. Also, why an empty line after the "public" comment?


This is how QOM style separates the object 'private' part -- the
inherited parent, used by the QOM-cast macros -- and the fields
specific to this object.
The private field *must* be the first one in the structure for the
cast macros to work.

Maybe this isn't a convention and we could make one, to unify the
API style. I'm open to better suggestion :)

I suppose I got custom to see it to distinct the QOM hierarchy and
now it helps me to detect what is QOM and what isn't.
Anyway I'll remove from this patch.



Ack for the DO_UPCAST removal.


Thanks!




Re: [PATCH v2 1/3] scsi: protect req->aiocb with AioContext lock

2023-02-16 Thread Kevin Wolf
Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> If requests are being processed in the IOThread when a SCSIDevice is
> unplugged, scsi_device_purge_requests() -> scsi_req_cancel_async() races
> with I/O completion callbacks. Both threads load and store req->aiocb.
> This can lead to assert(r->req.aiocb == NULL) failures and undefined
> behavior.
> 
> Protect r->req.aiocb with the AioContext lock to prevent the race.
> 
> Signed-off-by: Stefan Hajnoczi 

I tried to check that all accesses of .aiocb are actually protected by
the AioContext lock. I stopped at scsi_read_data(), which asserts that
it is non-NULL, but it is only called as a SCSIReqOps function. I
couldn't find any information on what the locking rules are with
SCSIReqOps functions and didn't feel like reverse engineering scsi-bus
etc. without just asking first.

The same question applies to:
- scsi_read_data
- scsi_write_data
- scsi_disk_emulate_write_data
- scsi_disk_emulate_command

Since these are not callbacks scheduled in the AioContext by scsi-disk
itself, I expect that they are indeed covered. The acquire/release pair
in virtio_scsi_handle_cmd() might actually indirectly cover all of them,
but I haven't checked that.

Either way, the changes that you're making look good:

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 0/7] Python: Drop support for Python 3.6

2023-02-16 Thread Markus Armbruster
Peter Maydell  writes:

> On Wed, 15 Feb 2023 at 19:05, Markus Armbruster  wrote:
>>
>> The discussion under PATCH 6 makes me think there's a bit of confusion
>> about the actual impact of dropping support for Python 3.6.  Possibly
>> because it's spelled out in the commit message of PATCH 7.  Let me
>> summarize it in one sentence:
>>
>> *** All supported host systems continue to work ***
>>
>> Evidence: CI remains green.
>>
>> On some supported host systems, different packages need to be installed.
>> On CentOS 8, for instance, we need to install Python 3.8.13 or 3.9.16
>> instead of 3.6.8.  Let me stress again: same repository, different
>> package.  No downsides I can see.
>
> Yes; I never had any issues with this part of it. If there was
> a "Sphinx that also used that Python" in that repo, the answer
> would be easy.
>
>> The *one* exception is Sphinx on CentOS 8.  CentOS 8 does not ship a
>> version of Sphinx that works with Python 3.7 or newer.  This series
>> proposes to simply stop building the docs there, unless the user
>> provides a suitable version of Sphinx (which is easy enough with pip).
>> That's PATCH 7.
>
> Yes; this brings CentOS 8 down from "fully supported" to "kinda
> supported but not for everything", which is less than ideal.

I acknowledge there's a difference between "you need to dnf install
python-sphinx to be able to build docs" and "you need to pip install
sphinx to be able to build docs", and that the difference is negligible
in some scenarios, and a show stopper in others.  I wasn't fully aware
of the latter.

> I think the level of not-idealness of that side of the scales
> is probably clear enough. The difficulty I think for those who
> haven't had their arms deep in QEMU's Python code is not having
> the background info to be able to weigh up how heavy the other side
> of the tradeoff scales is (since the naive "well, just keep writing
> Python 3.6 for the moment" take is clearly wrong).

Fair point.  I hope the situation is more clear now.




Re: [PATCH 3/5] bulk: Have object_child_foreach() take Error* and return boolean

2023-02-16 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 01:25:22PM +0100, Philippe Mathieu-Daudé wrote:
> Following the Error API best practices documented in commit
> e3fe3988d7 ("error: Document Error API usage rules"), have the
> object_child_foreach[_recursive]() handler take a Error* argument
> and return a boolean indicating whether this error is set or not.
> Convert all handler implementations.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Michael S. Tsirkin 

> ---
>  chardev/char-mux.c  |  6 ++--
>  chardev/char.c  |  6 ++--
>  gdbstub/gdbstub.c   |  8 +++---
>  hw/acpi/cxl.c   |  7 +++--
>  hw/acpi/viot.c  |  6 ++--
>  hw/arm/boot.c   |  6 ++--
>  hw/arm/virt-acpi-build.c|  8 +++---
>  hw/core/machine-qmp-cmds.c  | 18 ++--
>  hw/core/nmi.c   |  8 +++---
>  hw/core/qdev.c  |  8 --
>  hw/core/sysbus.c| 10 +++
>  hw/cpu/cluster.c|  7 +++--
>  hw/display/virtio-gpu-udmabuf.c |  7 +++--
>  hw/i386/acpi-build.c| 16 +--
>  hw/i386/intel_iommu.c   |  7 +++--
>  hw/i386/pc.c| 13 -
>  hw/i386/sgx.c   |  7 ++---
>  hw/mem/memory-device.c  | 25 -
>  hw/mem/pc-dimm.c|  8 +++---
>  hw/misc/mos6522.c   |  6 ++--
>  hw/ppc/pnv.c| 15 +-
>  hw/ppc/pnv_bmc.c|  9 +++---
>  hw/ppc/pnv_xscom.c  |  6 ++--
>  hw/ppc/spapr_pci.c  | 50 -
>  hw/ppc/spapr_rtas_ddw.c | 16 +++
>  hw/rdma/vmw/pvrdma_main.c   |  7 +++--
>  hw/virtio/virtio-balloon.c  |  7 ++---
>  include/qom/object.h| 28 ++
>  iothread.c  |  8 +++---
>  monitor/hmp-cmds.c  |  6 ++--
>  qom/object.c| 36 
>  qom/qom-hmp-cmds.c  |  7 +++--
>  scsi/pr-manager.c   |  8 +++---
>  softmmu/physmem.c   | 13 +
>  softmmu/qdev-monitor.c  |  9 +++---
>  tests/qtest/fuzz/generic_fuzz.c |  6 ++--
>  ui/dbus-chardev.c   | 14 -
>  util/nvdimm-utils.c |  8 +++---
>  38 files changed, 225 insertions(+), 215 deletions(-)
> 
> diff --git a/chardev/char-mux.c b/chardev/char-mux.c
> index ee2d47b20d..17b5854e4c 100644
> --- a/chardev/char-mux.c
> +++ b/chardev/char-mux.c
> @@ -384,7 +384,7 @@ void suspend_mux_open(void)
>  muxes_opened = false;
>  }
>  
> -static int chardev_options_parsed_cb(Object *child, void *opaque)
> +static bool chardev_options_parsed_cb(Object *child, void *opaque, Error 
> **errp)
>  {
>  Chardev *chr = (Chardev *)child;
>  
> @@ -392,14 +392,14 @@ static int chardev_options_parsed_cb(Object *child, 
> void *opaque)
>  open_muxes(chr);
>  }
>  
> -return 0;
> +return true;
>  }
>  
>  void resume_mux_open(void)
>  {
>  muxes_opened = true;
>  object_child_foreach(get_chardevs_root(),
> - chardev_options_parsed_cb, NULL);
> + chardev_options_parsed_cb, NULL, NULL);
>  }
>  
>  static void char_mux_class_init(ObjectClass *oc, void *data)
> diff --git a/chardev/char.c b/chardev/char.c
> index 11eab7764c..542b835477 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -742,7 +742,7 @@ Chardev *qemu_chr_new_mux_mon(const char *label, const 
> char *filename,
>  return qemu_chr_new_permit_mux_mon(label, filename, true, context);
>  }
>  
> -static int qmp_query_chardev_foreach(Object *obj, void *data)
> +static bool qmp_query_chardev_foreach(Object *obj, void *data, Error **errp)
>  {
>  Chardev *chr = CHARDEV(obj);
>  ChardevInfoList **list = data;
> @@ -754,7 +754,7 @@ static int qmp_query_chardev_foreach(Object *obj, void 
> *data)
>  
>  QAPI_LIST_PREPEND(*list, value);
>  
> -return 0;
> +return true;
>  }
>  
>  ChardevInfoList *qmp_query_chardev(Error **errp)
> @@ -762,7 +762,7 @@ ChardevInfoList *qmp_query_chardev(Error **errp)
>  ChardevInfoList *chr_list = NULL;
>  
>  object_child_foreach(get_chardevs_root(),
> - qmp_query_chardev_foreach, &chr_list);
> + qmp_query_chardev_foreach, &chr_list, NULL);
>  
>  return chr_list;
>  }
> diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
> index be88ca0d71..73d6352be5 100644
> --- a/gdbstub/gdbstub.c
> +++ b/gdbstub/gdbstub.c
> @@ -3379,7 +3379,7 @@ static const TypeInfo char_gdb_type_info = {
>  .class_init = char_gdb_class_init,
>  };
>  
> -static int find_cpu_clusters(Object *child, void *opaque)
> +static bool find_cpu_clusters(Object *child, void *opaque, Error **errp)
>  {
>  if (object_dynamic_cast(child, TYPE_CPU_CLUSTER)) {
>  GDBState *s = (GDBState *) opaque;
> @@ -3400,10 +3400,10 @@ static int find_cpu_clusters(Object *child

[RFC PATCH 4/5] hw/nmi: Simplify nmi_monitor_handle() and do_nmi()

2023-02-16 Thread Philippe Mathieu-Daudé
Since the previous commit, object_child_foreach_recursive()'s
handler takes an Error* argument and return a boolean. We can
remove these fields from the do_nmi_s structure, which then
only contains the 'int cpu_index' field.
Directly pass 'cpu_index' as context, removing 'struct do_nmi_s'.

Signed-off-by: Philippe Mathieu-Daudé 
---
RFC: please double-check...

 hw/core/nmi.c | 41 ++---
 1 file changed, 6 insertions(+), 35 deletions(-)

diff --git a/hw/core/nmi.c b/hw/core/nmi.c
index fa74c405f7..f5a6483e89 100644
--- a/hw/core/nmi.c
+++ b/hw/core/nmi.c
@@ -26,51 +26,22 @@
 #include "qemu/module.h"
 #include "monitor/monitor.h"
 
-struct do_nmi_s {
-int cpu_index;
-Error *err;
-bool handled;
-};
-
-static void nmi_children(Object *o, struct do_nmi_s *ns);
-
 static bool do_nmi(Object *o, void *opaque, Error **errp)
 {
-struct do_nmi_s *ns = opaque;
+int *cpu_index = opaque;
 NMIState *n = (NMIState *) object_dynamic_cast(o, TYPE_NMI);
 
-if (n) {
-NMIClass *nc = NMI_GET_CLASS(n);
-
-ns->handled = true;
-if (!nc->nmi_monitor_handler(n, ns->cpu_index, &ns->err)) {
-return false;
-}
+if (!n) {
+error_setg(errp, QERR_UNSUPPORTED);
+return false;
 }
-nmi_children(o, ns);
 
-return true;
-}
-
-static void nmi_children(Object *o, struct do_nmi_s *ns)
-{
-object_child_foreach(o, do_nmi, ns, NULL);
+return NMI_GET_CLASS(n)->nmi_monitor_handler(n, *cpu_index, errp);
 }
 
 void nmi_monitor_handle(int cpu_index, Error **errp)
 {
-struct do_nmi_s ns = {
-.cpu_index = cpu_index,
-.err = NULL,
-.handled = false
-};
-
-nmi_children(object_get_root(), &ns);
-if (ns.handled) {
-error_propagate(errp, ns.err);
-} else {
-error_setg(errp, QERR_UNSUPPORTED);
-}
+object_child_foreach_recursive(object_get_root(), do_nmi, &cpu_index, 
errp);
 }
 
 static const TypeInfo nmi_info = {
-- 
2.38.1




[PATCH 2/5] spapr/ddw: Remove confuse return value in spapr_phb_get_free_liobn()

2023-02-16 Thread Philippe Mathieu-Daudé
The '1' returned value isn't used because
spapr_phb_get_free_liobn_cb() isn't called recursively
(it is only called once in spapr_phb_get_free_liobn()).

The next commit will convert object_child_foreach()
handlers to return a boolean indicating error.
Remove this value to avoid confusion.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ppc/spapr_rtas_ddw.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
index 7ba11382bc..98f1310c6e 100644
--- a/hw/ppc/spapr_rtas_ddw.c
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -51,7 +51,6 @@ static int spapr_phb_get_free_liobn_cb(Object *child, void 
*opaque)
 tcet = (SpaprTceTable *) object_dynamic_cast(child, TYPE_SPAPR_TCE_TABLE);
 if (tcet && !tcet->nb_table) {
 *(uint32_t *)opaque = tcet->liobn;
-return 1;
 }
 return 0;
 }
-- 
2.38.1




[PATCH 3/5] bulk: Have object_child_foreach() take Error* and return boolean

2023-02-16 Thread Philippe Mathieu-Daudé
Following the Error API best practices documented in commit
e3fe3988d7 ("error: Document Error API usage rules"), have the
object_child_foreach[_recursive]() handler take a Error* argument
and return a boolean indicating whether this error is set or not.
Convert all handler implementations.

Signed-off-by: Philippe Mathieu-Daudé 
---
 chardev/char-mux.c  |  6 ++--
 chardev/char.c  |  6 ++--
 gdbstub/gdbstub.c   |  8 +++---
 hw/acpi/cxl.c   |  7 +++--
 hw/acpi/viot.c  |  6 ++--
 hw/arm/boot.c   |  6 ++--
 hw/arm/virt-acpi-build.c|  8 +++---
 hw/core/machine-qmp-cmds.c  | 18 ++--
 hw/core/nmi.c   |  8 +++---
 hw/core/qdev.c  |  8 --
 hw/core/sysbus.c| 10 +++
 hw/cpu/cluster.c|  7 +++--
 hw/display/virtio-gpu-udmabuf.c |  7 +++--
 hw/i386/acpi-build.c| 16 +--
 hw/i386/intel_iommu.c   |  7 +++--
 hw/i386/pc.c| 13 -
 hw/i386/sgx.c   |  7 ++---
 hw/mem/memory-device.c  | 25 -
 hw/mem/pc-dimm.c|  8 +++---
 hw/misc/mos6522.c   |  6 ++--
 hw/ppc/pnv.c| 15 +-
 hw/ppc/pnv_bmc.c|  9 +++---
 hw/ppc/pnv_xscom.c  |  6 ++--
 hw/ppc/spapr_pci.c  | 50 -
 hw/ppc/spapr_rtas_ddw.c | 16 +++
 hw/rdma/vmw/pvrdma_main.c   |  7 +++--
 hw/virtio/virtio-balloon.c  |  7 ++---
 include/qom/object.h| 28 ++
 iothread.c  |  8 +++---
 monitor/hmp-cmds.c  |  6 ++--
 qom/object.c| 36 
 qom/qom-hmp-cmds.c  |  7 +++--
 scsi/pr-manager.c   |  8 +++---
 softmmu/physmem.c   | 13 +
 softmmu/qdev-monitor.c  |  9 +++---
 tests/qtest/fuzz/generic_fuzz.c |  6 ++--
 ui/dbus-chardev.c   | 14 -
 util/nvdimm-utils.c |  8 +++---
 38 files changed, 225 insertions(+), 215 deletions(-)

diff --git a/chardev/char-mux.c b/chardev/char-mux.c
index ee2d47b20d..17b5854e4c 100644
--- a/chardev/char-mux.c
+++ b/chardev/char-mux.c
@@ -384,7 +384,7 @@ void suspend_mux_open(void)
 muxes_opened = false;
 }
 
-static int chardev_options_parsed_cb(Object *child, void *opaque)
+static bool chardev_options_parsed_cb(Object *child, void *opaque, Error 
**errp)
 {
 Chardev *chr = (Chardev *)child;
 
@@ -392,14 +392,14 @@ static int chardev_options_parsed_cb(Object *child, void 
*opaque)
 open_muxes(chr);
 }
 
-return 0;
+return true;
 }
 
 void resume_mux_open(void)
 {
 muxes_opened = true;
 object_child_foreach(get_chardevs_root(),
- chardev_options_parsed_cb, NULL);
+ chardev_options_parsed_cb, NULL, NULL);
 }
 
 static void char_mux_class_init(ObjectClass *oc, void *data)
diff --git a/chardev/char.c b/chardev/char.c
index 11eab7764c..542b835477 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -742,7 +742,7 @@ Chardev *qemu_chr_new_mux_mon(const char *label, const char 
*filename,
 return qemu_chr_new_permit_mux_mon(label, filename, true, context);
 }
 
-static int qmp_query_chardev_foreach(Object *obj, void *data)
+static bool qmp_query_chardev_foreach(Object *obj, void *data, Error **errp)
 {
 Chardev *chr = CHARDEV(obj);
 ChardevInfoList **list = data;
@@ -754,7 +754,7 @@ static int qmp_query_chardev_foreach(Object *obj, void 
*data)
 
 QAPI_LIST_PREPEND(*list, value);
 
-return 0;
+return true;
 }
 
 ChardevInfoList *qmp_query_chardev(Error **errp)
@@ -762,7 +762,7 @@ ChardevInfoList *qmp_query_chardev(Error **errp)
 ChardevInfoList *chr_list = NULL;
 
 object_child_foreach(get_chardevs_root(),
- qmp_query_chardev_foreach, &chr_list);
+ qmp_query_chardev_foreach, &chr_list, NULL);
 
 return chr_list;
 }
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index be88ca0d71..73d6352be5 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -3379,7 +3379,7 @@ static const TypeInfo char_gdb_type_info = {
 .class_init = char_gdb_class_init,
 };
 
-static int find_cpu_clusters(Object *child, void *opaque)
+static bool find_cpu_clusters(Object *child, void *opaque, Error **errp)
 {
 if (object_dynamic_cast(child, TYPE_CPU_CLUSTER)) {
 GDBState *s = (GDBState *) opaque;
@@ -3400,10 +3400,10 @@ static int find_cpu_clusters(Object *child, void 
*opaque)
 process->attached = false;
 process->target_xml[0] = '\0';
 
-return 0;
+return true;
 }
 
-return object_child_foreach(child, find_cpu_clusters, opaque);
+return object_child_foreach(child, find_cpu_clusters, opaque, errp);
 }
 
 static int pid_order(const void *a, const void *b)
@@ -3422,7 +3422,

[RFC PATCH 5/5] hw/ppc/pnv_bmc: Simplify pnv_bmc_find()

2023-02-16 Thread Philippe Mathieu-Daudé
ForeachArgs::name is only used once as TYPE_IPMI_BMC.
Since the penultimate commit, object_child_foreach_recursive()'s
handler takes an Error* argument and return a boolean.
We can directly pass ForeachArgs::obj as context, removing the
ForeachArgs structure.

Signed-off-by: Philippe Mathieu-Daudé 
---
RFC: please double-check...

 hw/ppc/pnv_bmc.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/pnv_bmc.c b/hw/ppc/pnv_bmc.c
index 05acc88a55..566284469f 100644
--- a/hw/ppc/pnv_bmc.c
+++ b/hw/ppc/pnv_bmc.c
@@ -278,36 +278,29 @@ IPMIBmc *pnv_bmc_create(PnvPnor *pnor)
 return IPMI_BMC(obj);
 }
 
-typedef struct ForeachArgs {
-const char *name;
-Object *obj;
-} ForeachArgs;
-
 static bool bmc_find(Object *child, void *opaque, Error **errp)
 {
-ForeachArgs *args = opaque;
+Object **obj = opaque;
 
-if (object_dynamic_cast(child, args->name)) {
-if (args->obj) {
-return false;
+if (object_dynamic_cast(child, TYPE_IPMI_BMC)) {
+if (*obj) {
+return true;
 }
-args->obj = child;
+*obj = child;
 }
 return true;
 }
 
 IPMIBmc *pnv_bmc_find(Error **errp)
 {
-ForeachArgs args = { TYPE_IPMI_BMC, NULL };
-int ret;
+Object *obj = NULL;
 
-ret = object_child_foreach_recursive(object_get_root(), bmc_find,
- &args, NULL);
-if (ret) {
+if (!object_child_foreach_recursive(object_get_root(), bmc_find, &obj,
+NULL)) {
 error_setg(errp, "machine should have only one BMC device. "
"Use '-nodefaults'");
 return NULL;
 }
 
-return args.obj ? IPMI_BMC(args.obj) : NULL;
+return IPMI_BMC(obj);
 }
-- 
2.38.1




[PATCH 1/5] hw/nmi: Have nmi_monitor_handler() return a boolean indicating error

2023-02-16 Thread Philippe Mathieu-Daudé
Following the example documented since commit e3fe3988d7 ("error:
Document Error API usage rules"), have the nmi_monitor_handler
return a boolean indicating whether an error is set or not and
convert its implementations.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/nmi.c  | 3 +--
 hw/hppa/machine.c  | 3 ++-
 hw/i386/x86.c  | 3 ++-
 hw/intc/m68k_irqc.c| 4 +++-
 hw/m68k/q800.c | 4 +++-
 hw/misc/macio/gpio.c   | 4 +++-
 hw/ppc/pnv.c   | 3 ++-
 hw/ppc/spapr.c | 3 ++-
 hw/s390x/s390-virtio-ccw.c | 4 +++-
 include/hw/nmi.h   | 3 ++-
 10 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/hw/core/nmi.c b/hw/core/nmi.c
index 481c4b3c7e..76cb3ba3b0 100644
--- a/hw/core/nmi.c
+++ b/hw/core/nmi.c
@@ -43,8 +43,7 @@ static int do_nmi(Object *o, void *opaque)
 NMIClass *nc = NMI_GET_CLASS(n);
 
 ns->handled = true;
-nc->nmi_monitor_handler(n, ns->cpu_index, &ns->err);
-if (ns->err) {
+if (!nc->nmi_monitor_handler(n, ns->cpu_index, &ns->err)) {
 return -1;
 }
 }
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index 7ac68c943f..da7c36c554 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -437,13 +437,14 @@ static void hppa_machine_reset(MachineState *ms, 
ShutdownCause reason)
 cpu[0]->env.gr[19] = FW_CFG_IO_BASE;
 }
 
-static void hppa_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool hppa_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 CPUState *cs;
 
 CPU_FOREACH(cs) {
 cpu_interrupt(cs, CPU_INTERRUPT_NMI);
 }
+return true;
 }
 
 static void hppa_machine_init_class_init(ObjectClass *oc, void *data)
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index eaff4227bd..8bd0691705 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -501,7 +501,7 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState 
*ms)
 return ms->possible_cpus;
 }
 
-static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
 CPUState *cs;
@@ -515,6 +515,7 @@ static void x86_nmi(NMIState *n, int cpu_index, Error 
**errp)
 apic_deliver_nmi(cpu->apic_state);
 }
 }
+return true;
 }
 
 static long get_file_size(FILE *f)
diff --git a/hw/intc/m68k_irqc.c b/hw/intc/m68k_irqc.c
index 0c515e4ecb..e05083e756 100644
--- a/hw/intc/m68k_irqc.c
+++ b/hw/intc/m68k_irqc.c
@@ -70,9 +70,11 @@ static void m68k_irqc_instance_init(Object *obj)
 qdev_init_gpio_in(DEVICE(obj), m68k_set_irq, M68K_IRQC_LEVEL_NUM);
 }
 
-static void m68k_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool m68k_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 m68k_set_irq(n, M68K_IRQC_LEVEL_7, 1);
+
+return true;
 }
 
 static const VMStateDescription vmstate_m68k_irqc = {
diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index 9d52ca6613..8631a226cd 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -227,13 +227,15 @@ static void glue_auxmode_set_irq(void *opaque, int irq, 
int level)
 s->auxmode = level;
 }
 
-static void glue_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool glue_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 GLUEState *s = GLUE(n);
 
 /* Hold NMI active for 100ms */
 GLUE_set_irq(s, GLUE_IRQ_IN_NMI, 1);
 timer_mod(s->nmi_release, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
+
+return true;
 }
 
 static void glue_nmi_release(void *opaque)
diff --git a/hw/misc/macio/gpio.c b/hw/misc/macio/gpio.c
index c8ac5633b2..0a7214421c 100644
--- a/hw/misc/macio/gpio.c
+++ b/hw/misc/macio/gpio.c
@@ -182,10 +182,12 @@ static void macio_gpio_reset(DeviceState *dev)
 macio_set_gpio(s, 1, true);
 }
 
-static void macio_gpio_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool macio_gpio_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 macio_set_gpio(MACIO_GPIO(n), 9, true);
 macio_set_gpio(MACIO_GPIO(n), 9, false);
+
+return true;
 }
 
 static void macio_gpio_class_init(ObjectClass *oc, void *data)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 44b1fbbc93..38e69f3b39 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -2309,13 +2309,14 @@ static void pnv_cpu_do_nmi_on_cpu(CPUState *cs, 
run_on_cpu_data arg)
 }
 }
 
-static void pnv_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool pnv_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 CPUState *cs;
 
 CPU_FOREACH(cs) {
 async_run_on_cpu(cs, pnv_cpu_do_nmi_on_cpu, RUN_ON_CPU_NULL);
 }
+return true;
 }
 
 static void pnv_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4921198b9d..d298068169 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3464,13 +3464,14 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, 
run_on_cpu_data arg)
 }
 }
 
-static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
+static bool spapr_nmi(NMIState *n

[PATCH 0/5] bulk: Have object_child_foreach() take Error* and return boolean

2023-02-16 Thread Philippe Mathieu-Daudé
The objective of this series is to allow object_child_foreach()
and object_child_foreach_recursive() propagate Error to their
callers.
Instead of adding temporary helpers with the new prototype, and
converting per subsystem, a bulk conversion is done as a single
patch... The changes aren't mechanical, but quite obvious.

Philippe Mathieu-Daudé (5):
  hw/nmi: Have nmi_monitor_handler() return a boolean indicating error
  spapr/ddw: Remove confuse return value in spapr_phb_get_free_liobn()
  bulk: Have object_child_foreach() take Error* and return boolean
  hw/nmi: Simplify nmi_monitor_handle() and do_nmi()
  hw/ppc/pnv_bmc: Simplify pnv_bmc_find()

 chardev/char-mux.c  |  6 ++--
 chardev/char.c  |  6 ++--
 gdbstub/gdbstub.c   |  8 +++---
 hw/acpi/cxl.c   |  7 +++--
 hw/acpi/viot.c  |  6 ++--
 hw/arm/boot.c   |  6 ++--
 hw/arm/virt-acpi-build.c|  8 +++---
 hw/core/machine-qmp-cmds.c  | 18 ++--
 hw/core/nmi.c   | 44 +
 hw/core/qdev.c  |  8 --
 hw/core/sysbus.c| 10 +++
 hw/cpu/cluster.c|  7 +++--
 hw/display/virtio-gpu-udmabuf.c |  7 +++--
 hw/hppa/machine.c   |  3 +-
 hw/i386/acpi-build.c| 16 +--
 hw/i386/intel_iommu.c   |  7 +++--
 hw/i386/pc.c| 13 -
 hw/i386/sgx.c   |  7 ++---
 hw/i386/x86.c   |  3 +-
 hw/intc/m68k_irqc.c |  4 ++-
 hw/m68k/q800.c  |  4 ++-
 hw/mem/memory-device.c  | 25 -
 hw/mem/pc-dimm.c|  8 +++---
 hw/misc/macio/gpio.c|  4 ++-
 hw/misc/mos6522.c   |  6 ++--
 hw/ppc/pnv.c| 18 ++--
 hw/ppc/pnv_bmc.c| 28 --
 hw/ppc/pnv_xscom.c  |  6 ++--
 hw/ppc/spapr.c  |  3 +-
 hw/ppc/spapr_pci.c  | 50 -
 hw/ppc/spapr_rtas_ddw.c | 17 ++-
 hw/rdma/vmw/pvrdma_main.c   |  7 +++--
 hw/s390x/s390-virtio-ccw.c  |  4 ++-
 hw/virtio/virtio-balloon.c  |  7 ++---
 include/hw/nmi.h|  3 +-
 include/qom/object.h| 28 ++
 iothread.c  |  8 +++---
 monitor/hmp-cmds.c  |  6 ++--
 qom/object.c| 36 
 qom/qom-hmp-cmds.c  |  7 +++--
 scsi/pr-manager.c   |  8 +++---
 softmmu/physmem.c   | 13 +
 softmmu/qdev-monitor.c  |  9 +++---
 tests/qtest/fuzz/generic_fuzz.c |  6 ++--
 ui/dbus-chardev.c   | 14 -
 util/nvdimm-utils.c |  8 +++---
 46 files changed, 256 insertions(+), 271 deletions(-)

-- 
2.38.1




Re: [PATCH v3 01/14] hw/char/serial-pci: Replace DO_UPCAST(PCISerialState) by PCI_SERIAL()

2023-02-16 Thread Thomas Huth

On 13/02/2023 19.43, Philippe Mathieu-Daudé wrote:

Use the PCI_SERIAL() QOM type-checking macro to avoid DO_UPCAST().

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/char/serial-pci.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 801b769aba..9689645cac 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -36,7 +36,10 @@
  #include "qom/object.h"
  
  struct PCISerialState {

+/*< private >*/
  PCIDevice dev;
+/*< public >*/
+


I'm not sure about this part of the patch. It does not seem to be related to 
the other changes at all, and are you sure about which parts are really 
"public" and which parts are "private"? If so, I'd like to see a description 
about this in the commit message, preferably in a separate patch. Also, why 
an empty line after the "public" comment?



  SerialState state;
  uint8_t prog_if;
  };
@@ -46,7 +49,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCISerialState, PCI_SERIAL)
  
  static void serial_pci_realize(PCIDevice *dev, Error **errp)

  {
-PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev);
+PCISerialState *pci = PCI_SERIAL(dev);
  SerialState *s = &pci->state;
  
  if (!qdev_realize(DEVICE(s), NULL, errp)) {

@@ -63,7 +66,7 @@ static void serial_pci_realize(PCIDevice *dev, Error **errp)
  
  static void serial_pci_exit(PCIDevice *dev)

  {
-PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev);
+PCISerialState *pci = PCI_SERIAL(dev);
  SerialState *s = &pci->state;
  
  qdev_unrealize(DEVICE(s));


Ack for the DO_UPCAST removal.

 Thomas




Re: [PULL 0/5] Misc next patches

2023-02-16 Thread Peter Maydell
On Wed, 15 Feb 2023 at 17:48, Daniel P. Berrangé  wrote:
>
> The following changes since commit 6a50f64ca01d0a7b97f14f069762bfd88160f31e:
>
>   Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into 
> staging (2023-02-14 14:46:10 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/berrange/qemu tags/misc-next-pull-request
>
> for you to fetch changes up to 36debafddd788066be10b33c5f11b984a08e5c85:
>
>   ui: remove deprecated 'password' option for SPICE (2023-02-15 11:14:58 
> -0500)
>
> 
>  * Document 'password-secret' option for -iscsi
>  * Deprecate iSCSI 'password' in favour of 'password-secret'
>  * Remove deprecated 'password' option for SPICE
>  * Fix handling of cached read buffers with TLS
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH v2 6/7] CI: Stop building docs on centos8

2023-02-16 Thread Daniel P . Berrangé
On Tue, Feb 14, 2023 at 03:03:54PM +0100, Paolo Bonzini wrote:
> On Tue, Feb 14, 2023 at 12:49 PM Daniel P. Berrangé  
> wrote:
> > [quote]
> > The motivation for this series is that Python 3.6 was EOL at the end of
> > 2021; upstream tools are beginning to drop support for it, including
> > setuptools, pylint, mypy, etc. As time goes by, it becomes more
> > difficult to support and test against the full range of Python versions
> > that QEMU supports. The closer we get to Python 3.12, the harder it will
> > be to cover that full spread of versions.
> > [/quote]
> >
> > this is all about new/eol versions of software upstream, and I don't
> > think that's a justification. QEMU explicitly aims to use distro provided
> > versions and upstream EOL status is not relevant in that context. Even
> > if using "pip" to install it is possible to limit yourself to upstream
> > releases which still support 3.6.
> >
> > There is the separate issue of Meson dropping python 3.6 which motivates
> > Paolo's series. Again though, we don't have to increase our minimum meson
> > version, because meson is working today. It is our choice to to increase
> > it to use latest available meson features. At some point we can decide
> > what we have is good enough and we don't have to keep chasing the latest
> > features. Maybe we're not there yet, but we should think about when that
> > would be.
> 
> In the case of Meson, the main advantage is moving _all_ of the
> emulator configury out of the configure script.  This requires
> add_global_dependencies which was added in 0.63.  So in that case it
> is indeed mostly about shiny new features and it's not absolutely
> necessary.

I forgot to mention in my previous reply, I feel like the ability
to finish the configure -> meson conversion is a pretty compelling
motivation to adopt the new meson. The hybrid configure/meson
state we've been in has worked better than I expected it would
at first, but none the less, the more we can get out of configure
the better it will be for ongoing maint burden.

So yes, its shiny new features, but they're pretty compelling
features as they allow us to finish the job we started. We've
clear precedent all over QEMU codebase that half-finished
conversions harm our ability to maintain the project.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 6/7] CI: Stop building docs on centos8

2023-02-16 Thread Daniel P . Berrangé
On Thu, Feb 16, 2023 at 02:46:18AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Tue, Feb 14, 2023 at 09:52:44PM +0100, Paolo Bonzini wrote:
> >> Il mar 14 feb 2023, 18:26 Kevin Wolf  ha scritto:
> >> 
> >> > Am 14.02.2023 um 15:03 hat Paolo Bonzini geschrieben:
> >> > > In the case of Python the issue is not the interpreter per se, though
> >> > > there are a couple new feature in Python 3.7 that are quite nice (for
> >> > > example improved data classes[1] or context variables[2]). The main
> >> > > problem as far as I understood (and have seen in my experience) is
> >> > > linting tools. New versions fix bugs that caused false positives, but
> >> > > also become more strict at the same time. The newer versions at the
> >> > > same time are very quick at dropping support for old versions of
> >> > > Python; while older versions sometimes throw deprecation warnings on
> >> > > new versions of Python. This makes it very hard to support a single
> >> > > version of, say, mypy that works on all versions from RHEL8 and SLE15
> >> > > to Fedora 38 and Ubuntu 23.04.
> >> >
> >> > Why do we have to support a single version of mypy? What is wrong with
> >> > running an old mypy version with old Python version, and a newer mypy
> >> > with newer Python versions?
> >> >
> >> > Sure, they will complain about different things, but it doesn't feel
> >> > that different from supporting multiple C compilers in various versions.
> >> >
> >> 
> >> It's more like having to support only C++03 on RHEL 8 and only C++20 in
> >> Fedora 37, without even being able to use a preprocessor.
> >> 
> >> For example old versions might not understand some type annotations and
> >> will fail mypy altogether, therefore even with newer versions you can't
> >> annotate the whole source and have to fall back to non-strict mode.
> >
> > In terms of back compatibility, is there a distinction to be
> > made between mypy compat and the python runtime compat ?
> >
> > If we add annotations wanted by new mypy, and old mypy doesn't
> > understand them, that's not a huge problem. We can simply declare
> > that we don't support old mypy, and skip the validation tests if
> > old mypy is installed.
> 
> In theory, type hints are transparent at run time.  In practice, use of
> modern type hints is known to break 3.6 at run time.  I don't remember
> the details offhand, but John should be able to dig them up if you're
> interested.
> 
> So, it should not be a problem, but it is.

Ok, this is a pretty compelling motivating factor for dropping
3.6, as it is clear demonstration that we're being held back
by the unfortunate lack of runtime compatibility.

For most of the other problems we can simply mandate a new
enough version of the add on tool, but if using new mypy
requires annotations that break at runtime on 3.6 that's
a painful blocker.



With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] hbitmap: fix hbitmap_status() return value for first dirty bit case

2023-02-16 Thread Kevin Wolf
Am 02.02.2023 um 19:15 hat Andrey Zhadchenko via geschrieben:
> The last return statement should return true, as we already evaluated that
> start == next_dirty
> 
> Also, fix hbitmap_status() description in header
> 
> Cc: qemu-sta...@nongnu.org
> Fixes: a6426475a75 ("block/dirty-bitmap: introduce 
> bdrv_dirty_bitmap_status()")
> Signed-off-by: Andrey Zhadchenko 

Thanks, applied to the block branch.

Kevin




Re: [PATCH v2 6/7] CI: Stop building docs on centos8

2023-02-16 Thread Daniel P . Berrangé
On Thu, Feb 16, 2023 at 02:08:33AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> > Our support policy is written from the POV of the C world, and
> > merely reducing the length of time we support a distro does not
> > address the different world view of Python.
> >
> > Should we instead try to be more explicit about the different
> > needs of the non-C dependencies ?
> >
> > We could for example say
> >
> >  * For native library/application dependancies we aim to
> >support the two most recent distro versions, for 2 years
> >overlap
> >
> >  * For non-native library/applications dependancies we aim
> >to support only the most recent distro version. Users
> >of older distros may need to dynamically fetch newer
> >deps.
> 
> Who does the fetching, users manually, or the build process
> automatically?

I expect both cases need supporting.

In the case of distro builds, they will have to fetch the
deps ahead of time, because the build environments usually
won't have any network access. Some contributors have also
previously objected to the build system fetching stuff off
the network, but they're a small minority.

For friendliness to developers, the build process would be
best to fetch automatically, if they haven't been provided
upfront.

> > The python 3.8 runtime would be considered a native dep, so fall
> > under the 2 distro versions rule. This is fine with CentOS 8,
> > since it provides newer python runtime versions.
> >
> > The python libraries, or tools written in python (meson), would
> > fall under the second rule, and so only need to target one distro
> > version. This would be compatible with CentOS 8, as the users would
> > be expected to download extra python components (or we do it on
> > their behalf).
> >
> > For the second rule, rather than saying most recent distro versions,
> > possibly we might want to carve out an exclusion for LTS distros too.
> > ie, explicitly don't care about versions of non-native bits in RHEL
> > at all, beyond availability of the base (python) runtime.
> 
> Interesting idea.  Can anyone think of reasons not to do this?

It is probably even more compelling when looking at SLES, which is
having an even larger gap between releases than RHEL does.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 0/7] Python: Drop support for Python 3.6

2023-02-16 Thread Thomas Huth

On 15/02/2023 20.05, Markus Armbruster wrote:

The discussion under PATCH 6 makes me think there's a bit of confusion
about the actual impact of dropping support for Python 3.6.  Possibly
because it's spelled out in the commit message of PATCH 7.  Let me
summarize it in one sentence:

 *** All supported host systems continue to work ***

Evidence: CI remains green.


The CI remains green since one of the patches disabled the building of the 
docs on CentOS 8. That's not how I'd describe "continue to work", at least 
not in the same extend as before.



On some supported host systems, different packages need to be installed.
On CentOS 8, for instance, we need to install Python 3.8.13 or 3.9.16
instead of 3.6.8.  Let me stress again: same repository, different
package.  No downsides I can see.

The *one* exception is Sphinx on CentOS 8.  CentOS 8 does not ship a
version of Sphinx that works with Python 3.7 or newer.  This series
proposes to simply stop building the docs there, unless the user
provides a suitable version of Sphinx (which is easy enough with pip).


I think we've all understood that. The thing that you obviously did not 
understood: This breaks our support statement.
I'm pretty sure that you could also build the whole QEMU suite successfully 
on an ancient CentOS 7 or Ubuntu 18.04 system if you manually install a 
newer version of GCC and some of the required libraries first. But that's 
not how we understand our support statement.


Sure, you can argue that you can use "pip install" to get a newer version of 
Sphinx on RHEL 8 / CentOS 8 to continue building the docs there - but is 
that really that much different from installing a newer version of GCC and 
libraries on an ancient distro that we do not officially support anymore? 
I'd say no. You also have to consider that not every build host has access 
to the internet, maybe some companies only have an internal mirror of the 
distro packages in their intranet (I remember some discussion about such a 
case in the past) - so while you were perfectly fine to build the whole of 
QEMU on a CentOS 8 there before this change, you could now not build parts 
of QEMU anymore there due to the missing possibility to run "pip install" 
without full internet connection.


And sure, you can argue that it's "just" the documentation. But IMHO that's 
still an essential part of the QEMU build, and it used to work before, so it 
feels wrong to cut that now out. And also, if we start with the 
documentation now, what's next? If for example scripts/shaderinclude.py 
stops working with Python 3.6, do we then simply say: "Oh, it's fine, you 
can still build all the other targets that work without this script, just 
not the ones anymore that need it"?



All the angst about CentOS falling off the end of our "supported build
platforms" list is not actually warranted by this series :)


Using the term "angst" for the concerns of your fellows here is quite 
cheeky. It's not about "angst", it's about a discussion that our support 
policy might need to be adjusted if we do this step. So instead of writing 
such sentences, I'd rather would like to see you posting a patch for 
docs/about/build-platforms.rst for constructive further discussion instead.


 Thanks,
  Thomas




Re: [PATCH 1/1] block: improve error logging in bdrv_reopen_prepare()

2023-02-16 Thread Kevin Wolf
Am 13.02.2023 um 11:31 hat Denis V. Lunev geschrieben:
> The error generated when the option could not be changed inside
> bdrv_reopen_prepare() does not give a clue about problematic
> BlockDriverState as we could get very long tree of devices.
> 
> The patch adds node name to the error report in the same way as done
> above.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Kevin Wolf 
> CC: Hanna Reitz 
> CC: Vladimir Sementsov-Ogievskiy 

Truly fascinating how inconsistent error reporting is in
bdrv_reopen_prepare(). Some places use the node name, some places device
or node name, some places filename and some places nothing. Your choice
is as good as any.

The only problem I can see with this patch is that qemu-iotests 245
needs an update, too.

Kevin




Re: [PATCH v2 6/7] CI: Stop building docs on centos8

2023-02-16 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Tue, Feb 14, 2023 at 08:40:20AM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> 
>> [...]
>> 
>> > We don't have to drop python 3.6. It is a choice because
>> > of a desire to be able to use some shiny new python
>> > features without caring about back compat.
>> 
>> I read this on Friday, and decided to let it sit until after the
>> weekend.  Well, it's now Tuesday, and to be frank, it's still as
>> offensively flippant as it was on Friday.  It shows either ignorance of
>> or cavalier disregard for the sheer amount of work some of us have had
>> to put into keeping old versions of Python viable.
>> 
>> The latter would be quite unlike you, so it must be the former.
>
> I'm sorry, I don't mean it to be offensive. I'm genuinely not seeing
> from the descriptions in the series what the functional benefits are
> from dropping 3.6.

Apology accepted, and point well taken.

>> John has sunk *man-months* into keeping old versions of Python viable.
>> I've put in a lot less myself, but still enough to be painfully aware of
>> it.  I figure Cleber and Beraldo are somewhere in between
>> 
>> Insinuating John's proposal is motivated by "a desire to be able to use
>> some shiny new python features without caring about back compat"
>> disrespects all this work.
>
> I'm writing my comments based on what is described in the cover letter
> as the motivations for the change:
>
> [quote]
> The motivation for this series is that Python 3.6 was EOL at the end of
> 2021; upstream tools are beginning to drop support for it, including
> setuptools, pylint, mypy, etc. As time goes by, it becomes more
> difficult to support and test against the full range of Python versions
> that QEMU supports. The closer we get to Python 3.12, the harder it will
> be to cover that full spread of versions.
> [/quote]
>
> this is all about new/eol versions of software upstream, and I don't
> think that's a justification. QEMU explicitly aims to use distro provided
> versions and upstream EOL status is not relevant in that context. Even
> if using "pip" to install it is possible to limit yourself to upstream
> releases which still support 3.6.
>
> There is the separate issue of Meson dropping python 3.6 which motivates
> Paolo's series. Again though, we don't have to increase our minimum meson
> version, because meson is working today. It is our choice to to increase
> it to use latest available meson features. At some point we can decide
> what we have is good enough and we don't have to keep chasing the latest
> features. Maybe we're not there yet, but we should think about when that
> would be.
>
> [quote]
> The qemu.qmp library and the avocado testing framework both have
> motivations for dropping 3.6 support, but are committed to not doing so
> until QEMU drops support.
> [/quote]
>
> I suspect that this is more of a driver for the drop of 3.6, but I
> don't see any details.
>
> IOW overall justification come across as wanting to use new features,
> and follow upstream EOL, without any real detail of what we're going
> to gain from a functional POV.

I think what we gain is there: "As time goes by, it becomes more
difficult to support and test against the full range of Python versions
that QEMU supports. The closer we get to Python 3.12, the harder it will
be to cover that full spread of versions."  In other words, the gain is
"we make something that's hard and getting harder easier".

What isn't in the cover letter, only in commit message 6+7/7, i.e. this
patch and the next one, is what we pay for it: basically nothing (next
patch's commit message) except for the issue of Sphinx in CentOS 8 (this
patch's commit message).  Putting at least a summary it in the cover
letter would have been better.

John did explain this again and in more detail in reply to Peter's "This
confuses me.  We work fine with Python 3.6 today."  Quote:

That won't last - Many tools such as mypy, pylint and flake8 which I use to
manage our python codebase have been dropping support for 3.6 and I've had
to implement an increasing number of workarounds to help keep it possible
to test 3.6 via CI while also ensuring our newest platforms work as dev
environments.

Our testing matrix for Python is novel and thorough enough that it's
revealed  several bugs in other downstream Python distributions for Debian
and Fedora, and dozens of bugs for the linters themselves.

I'm concerned that when 3.7 is EOL'd in just a few months that the support
and testing gap is going to get even uglier.

[...]

The argument I'm making is:

- CentOS 8 is a supported build platform
- All platforms *do* have a Python modern enough to allow us to drop 3.6
- CentOS's repo version of sphinx is hardcoded to use the older 3.6, though
- You expressed a preference to me in the past to NOT use a pip installed
version of sphinx in preference to the system version in "configure"
- It's 

Re: [PATCH v2 0/7] Python: Drop support for Python 3.6

2023-02-16 Thread Peter Maydell
On Wed, 15 Feb 2023 at 19:05, Markus Armbruster  wrote:
>
> The discussion under PATCH 6 makes me think there's a bit of confusion
> about the actual impact of dropping support for Python 3.6.  Possibly
> because it's spelled out in the commit message of PATCH 7.  Let me
> summarize it in one sentence:
>
> *** All supported host systems continue to work ***
>
> Evidence: CI remains green.
>
> On some supported host systems, different packages need to be installed.
> On CentOS 8, for instance, we need to install Python 3.8.13 or 3.9.16
> instead of 3.6.8.  Let me stress again: same repository, different
> package.  No downsides I can see.

Yes; I never had any issues with this part of it. If there was
a "Sphinx that also used that Python" in that repo, the answer
would be easy.

> The *one* exception is Sphinx on CentOS 8.  CentOS 8 does not ship a
> version of Sphinx that works with Python 3.7 or newer.  This series
> proposes to simply stop building the docs there, unless the user
> provides a suitable version of Sphinx (which is easy enough with pip).
> That's PATCH 7.

Yes; this brings CentOS 8 down from "fully supported" to "kinda
supported but not for everything", which is less than ideal.
I think the level of not-idealness of that side of the scales
is probably clear enough. The difficulty I think for those who
haven't had their arms deep in QEMU's Python code is not having
the background info to be able to weigh up how heavy the other side
of the tradeoff scales is (since the naive "well, just keep writing
Python 3.6 for the moment" take is clearly wrong).

thanks
-- PMM



Re: Lost partition tables on ide-hd + ahci drive

2023-02-16 Thread Fiona Ebner
Am 15.02.23 um 22:47 schrieb John Snow:
> Hm, I'm not sure I see any pattern that might help. Could be that AHCI
> is just bugged during load, but it's tough to know in what way.

If we ever get a backtrace where the bad write actually goes through
QEMU, I'll let you know.

We are considering providing a custom build to affected users (using
GDB-hooks leads to too much slowdown in these performance-critical
paths) in the hope to catch it if it triggers again. We can't really
roll it out to all users, because most writes to sector zero are
legitimate after all and most users are not affected.

> What versions of QEMU are in use here? Is there a date on which you
> noticed an increased frequency of these reports?

There were a few reports around the time we rolled out 4.2 and 5.0
(Q2/Q3 of 2020), but the frequency was always very low. AFAICT, there's
about 20-40 reports that could be this issue in total. The earliest I
know of with lost partitions, but not much more information, are forum
threads from 2017/2018.

With 4.2, there was a rework with our backup patches so naturally, I
suspected that. Before 4.2, we had extended the backup job to allow
using a callback to handle the writes instead of the BlockDriverState
target. But starting from 4.2, we are not messing with that anymore and
using a custom driver as the backup target. That custom driver doesn't
even know about the source. The source is handled by the usual backup
job mechanisms.

If there was some general mix-up there, I'd not expect it to work for
>99.99% of backups and only trigger in combination with AHCI, but who knows?

Best Regards,
Fiona




Re: [PATCH 1/8] m25p80: Improve error when the backend file size does not match the device

2023-02-16 Thread Philippe Mathieu-Daudé

On 15/2/23 20:52, Peter Delevoryas wrote:

On Tue, Feb 14, 2023 at 06:18:23PM +0100, Cédric Le Goater wrote:

Currently, when a block backend is attached to a m25p80 device and the
associated file size does not match the flash model, QEMU complains
with the error message "failed to read the initial flash content".
This is confusing for the user.

Use blk_check_size_and_read_all() instead of blk_pread() to improve
the reported error.


Could you reference commit 06f1521795?


Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Delevoryas 
Reviewed-by: Alistair Francis 
Message-Id: <20221115151000.2080833-1-...@kaod.org>
Signed-off-by: Cédric Le Goater 
---

   breakage with commit a4b15a8b9e ("pflash: Only read non-zero parts
   of backend image") when using -snaphot.



I guess it's not obvious to me, what broke?

1. BlockBackend *blk = -drive file=blob.mtd,format=raw,if=mtd,snapshot=on
2. uint8_t *storage = malloc(...)
3. blk_check_size_and_read_all(blk, storage, size)
4. commit a4b15a8b9e
 for sector in blk:
 if any(b != 0 for b in sector):
 memcpy(&storage[...], sector, sizeof(sector))
 else:
 # Skip memcpy of zeroes

But this is probably a regression for us because we actually expect the zeroes
to be copied?


Yes... Commit a4b15a8b9e mostly considered cloud payload optimization.

Since EEPROMs are usually "small", this could be kludged as:

-- >8 --
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1615,6 +1615,7 @@ static void m25p80_realize(SSIPeripheral *ss, 
Error **errp)

 trace_m25p80_binding(s);
 s->storage = blk_blockalign(s->blk, s->size);

+memset(s->storage, ERASED_BYTE, s->size);
 if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) {
 error_setg(errp, "failed to read the initial flash content");
 return;
---

On emulation, it is simpler to ask the user to provide a block image
with the same size of the emulated device. See commit 06f1521795
("pflash: Require backend size to match device, improve errors") for
more information.


diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 802d2eb021..dc5ffbc4ff 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -24,6 +24,7 @@
  #include "qemu/osdep.h"
  #include "qemu/units.h"
  #include "sysemu/block-backend.h"
+#include "hw/block/block.h"
  #include "hw/qdev-properties.h"
  #include "hw/qdev-properties-system.h"
  #include "hw/ssi/ssi.h"
@@ -1615,8 +1616,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error 
**errp)
  trace_m25p80_binding(s);
  s->storage = blk_blockalign(s->blk, s->size);
  
-if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) {

-error_setg(errp, "failed to read the initial flash content");
+if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) {
  return;
  }
  } else {
--
2.39.1