Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu

2022-04-05 Thread Markus Armbruster
John Snow  writes:

> On Tue, Apr 5, 2022, 5:03 AM Damien Hedde 
> wrote:

[...]

>> If it stays in the QEMU tree, what licensing should I use? LGPL does not
>> hurt, no?
>>
>
> Whichever you please. GPLv2+ would be convenient and harmonizes well with
> other tools. LGPL is only something I started doing so that the "qemu.qmp"
> package would be LGPL. Licensing the tools as LGPL was just a sin of
> convenience so I could claim a single license for the whole wheel/egg/tgz.
>
> (I didn't want to make separate qmp and qmp-tools packages.)
>
> Go with what you feel is best.

Any license other than GPLv2+ needs justification in the commit message.

[...]




[PATCH qemu] ppc/vof: Fix uninitialized string tracing

2022-04-05 Thread Alexey Kardashevskiy
There are error paths which do not initialize propname, but the trace_exit
label prints it anyway. This initializes the problematic string.

Spotted by Coverity CID 1487241.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/vof.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index 2b63a6287561..5ce3ca32c998 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -294,7 +294,7 @@ static uint32_t vof_setprop(MachineState *ms, void *fdt, Vof *vof,
 uint32_t nodeph, uint32_t pname,
 uint32_t valaddr, uint32_t vallen)
 {
-char propname[OF_PROPNAME_LEN_MAX + 1];
+char propname[OF_PROPNAME_LEN_MAX + 1] = "";
 uint32_t ret = PROM_ERROR;
 int offset, rc;
 char trval[64] = "";
-- 
2.30.2
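For readers unfamiliar with the failure mode: an early goto into a cleanup/trace label can print a stack buffer that no code path ever wrote. A minimal sketch of why the `= ""` initializer is sufficient (the function and names here are made up for illustration, this is not the QEMU code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OF_PROPNAME_LEN_MAX 31

/* Hypothetical sketch: an early "goto trace_exit" before propname is
 * filled would otherwise hand stack garbage to the tracepoint. */
static const char *setprop_sketch(int fail_early, char *out)
{
    char propname[OF_PROPNAME_LEN_MAX + 1] = ""; /* the fix: zero-init */

    if (fail_early) {
        goto trace_exit;          /* error path: propname never written */
    }
    snprintf(propname, sizeof(propname), "compatible");

trace_exit:
    strcpy(out, propname);        /* stands in for trace_vof_setprop(...) */
    return out;
}
```

With the initializer, the error path traces an empty string instead of uninitialized bytes, which is what Coverity was flagging.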




RE: [PATCH V2 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-04-05 Thread Tian, Kevin
> From: Jason Wang 
> Sent: Wednesday, April 6, 2022 11:33 AM
> To: Tian, Kevin 
> Cc: Liu, Yi L ; m...@redhat.com; pet...@redhat.com;
> yi.y@linux.intel.com; qemu-devel@nongnu.org
> Subject: Re: [PATCH V2 1/4] intel-iommu: don't warn guest errors when
> getting rid2pasid entry
> 
> On Sat, Apr 2, 2022 at 3:34 PM Tian, Kevin  wrote:
> >
> > > From: Jason Wang 
> > > Sent: Wednesday, March 30, 2022 4:37 PM
> > > On Wed, Mar 30, 2022 at 4:16 PM Tian, Kevin 
> wrote:
> > > >
> > > > > From: Jason Wang 
> > > > > Sent: Tuesday, March 29, 2022 12:52 PM
> > > > > >
> > > > > >>>
> > > > > >>> Currently the implementation of vtd_ce_get_rid2pasid_entry() is
> also
> > > > > >>> problematic. According to VT-d spec, RID2PASID field is effective
> only
> > > > > >>> when ecap.rps is true otherwise PASID#0 is used for RID2PASID. I
> > > didn't
> > > > > >>> see ecap.rps is set, neither is it checked in that function. It
> > > > > >>> works possibly
> > > > > >>> just because Linux currently programs 0 to RID2PASID...
> > > > > >>
> > > > > >> This seems to be another issue since the introduction of scalable
> mode.
> > > > > >
> > > > > > yes. this is not introduced in this series. The current scalable 
> > > > > > mode
> > > > > > vIOMMU support was following 3.0 spec, while RPS is added in 3.1.
> > > Needs
> > > > > > to be fixed.
> > > > >
> > > > >
> > > > > Interesting, so this is more complicated when dealing with migration
> > > > > compatibility. So what I suggest is probably something like:
> > > > >
> > > > > -device intel-iommu,version=$version
> > > > >
> > > > > Then we can maintain migration compatibility correctly. For 3.0 we
> can
> > > > > go without RPS and 3.1 and above we need to implement RPS.
> > > >
> > > > This is sensible. Probably a new version number is created only when
> > > > it breaks compatibility with an old version, i.e. not necessarily to 
> > > > follow
> > > > every release from VT-d spec. In this case we definitely need one from
> > > > 3.0 to 3.1+ given RID2PASID working on a 3.0 implementation will
> > > > trigger a reserved fault due to RPS not set on a 3.1 implementation.
> > >
> > > 3.0 should be fine, but I need to check whether there's another
> > > difference for PASID mode.
> > >
> > > It would be helpful if there's a chapter in the spec to describe the
> > > difference of behaviours.
> >
> > There is a section called 'Revision History' in the start of the VT-d spec.
> > It talks about changes in each revision, e.g.:
> > --
> >   June 2019, 3.1:
> >
> >   Added support for RID-PASID capability (RPS field in ECAP_REG).
> 
> Good to know. Does it mean that, apart from this revision history, all
> the other semantics keep backward compatibility across versions?

Yes, and if you find anything that isn't clarified properly, I can help
forward it to the spec owner.

Thanks
Kevin
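The ecap.rps rule discussed above (RID2PASID is honored only when ECAP_REG.RPS is set; otherwise PASID 0 is used) can be captured in a few lines. A sketch for illustration only: the helper name and the RPS bit position are assumptions, not taken from QEMU's intel_iommu code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define VTD_ECAP_RPS (1ULL << 49)

/* Per the spec text quoted above: with RPS clear, the RID2PASID field
 * of the scalable-mode context entry is ignored and PASID 0 is used. */
static uint32_t effective_rid2pasid(uint64_t ecap, uint32_t ce_rid2pasid)
{
    return (ecap & VTD_ECAP_RPS) ? ce_rid2pasid : 0;
}
```

This also shows why Linux programming 0 into RID2PASID happens to mask the missing check: both branches then return the same value.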


Re: [PATCH V2 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-04-05 Thread Jason Wang
On Sat, Apr 2, 2022 at 3:34 PM Tian, Kevin  wrote:
>
> > From: Jason Wang 
> > Sent: Wednesday, March 30, 2022 4:37 PM
> > On Wed, Mar 30, 2022 at 4:16 PM Tian, Kevin  wrote:
> > >
> > > > From: Jason Wang 
> > > > Sent: Tuesday, March 29, 2022 12:52 PM
> > > > >
> > > > >>>
> > > > >>> Currently the implementation of vtd_ce_get_rid2pasid_entry() is also
> > > > >>> problematic. According to VT-d spec, RID2PASID field is effective 
> > > > >>> only
> > > > >>> when ecap.rps is true otherwise PASID#0 is used for RID2PASID. I
> > didn't
> > > > >>> see ecap.rps is set, neither is it checked in that function. It
> > > > >>> works possibly
> > > > >>> just because Linux currently programs 0 to RID2PASID...
> > > > >>
> > > > >> This seems to be another issue since the introduction of scalable 
> > > > >> mode.
> > > > >
> > > > > yes. this is not introduced in this series. The current scalable mode
> > > > > vIOMMU support was following 3.0 spec, while RPS is added in 3.1.
> > Needs
> > > > > to be fixed.
> > > >
> > > >
> > > > Interesting, so this is more complicated when dealing with migration
> > > > compatibility. So what I suggest is probably something like:
> > > >
> > > > -device intel-iommu,version=$version
> > > >
> > > > Then we can maintain migration compatibility correctly. For 3.0 we can
> > > > go without RPS and 3.1 and above we need to implement RPS.
> > >
> > > This is sensible. Probably a new version number is created only when
> > > it breaks compatibility with an old version, i.e. not necessarily to 
> > > follow
> > > every release from VT-d spec. In this case we definitely need one from
> > > 3.0 to 3.1+ given RID2PASID working on a 3.0 implementation will
> > > trigger a reserved fault due to RPS not set on a 3.1 implementation.
> >
> > 3.0 should be fine, but I need to check whether there's another
> > difference for PASID mode.
> >
> > It would be helpful if there's a chapter in the spec to describe the
> > difference of behaviours.
>
> There is a section called 'Revision History' in the start of the VT-d spec.
> It talks about changes in each revision, e.g.:
> --
>   June 2019, 3.1:
>
>   Added support for RID-PASID capability (RPS field in ECAP_REG).

Good to know. Does it mean that, apart from this revision history, all
the other semantics keep backward compatibility across versions?

> --
>
> >
> > >
> > > >
> > > > Since most of the advanced features has not been implemented, we may
> > > > probably start just from 3.4 (assuming it's the latest version). And all
> > > > of the following effort should be done for 3.4 in order to productize 
> > > > it.
> > > >
> > >
> > > Agree. btw in your understanding is intel-iommu in a production quality
> > > now?
> >
> > Red Hat supports vIOMMU for the guest DPDK path now.
> >
> > For scalable-mode we need to see some use cases then we can evaluate.
> > virtio SVA could be a possible use case, but it requires more work e.g
> > PRS queue.
>
> Yes it's not ready for full evaluation yet.
>
> The current state before your change is exactly feature-on-par with the
> legacy mode, except using scalable format in certain structures. That alone
> is not worthy of a formal evaluation.

Right.

Thanks

>
> >
> > > If not, do we want to apply this version scheme only when it
> > > reaches the production quality or also in the experimental phase?
> >
> > Yes. E.g if we think scalable mode is mature, we can enable 3.0.
> >
>
> Nice to know.
>
> Thanks
> Kevin
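The "-device intel-iommu,version=$version" idea discussed above boils down to deriving capability bits from the emulated spec revision, so that a 3.0 machine never advertises what 3.1 added. A hypothetical sketch (the enum values and the RPS bit position are assumptions for illustration, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

enum { VTD_SPEC_3_0 = 30, VTD_SPEC_3_1 = 31 };

/* Assumed bit position, for illustration only. */
#define ECAP_RPS_BIT (1ULL << 49)

/* Capabilities derived from the emulated revision: migrating between
 * QEMU binaries stays compatible because the version fixes the caps. */
static uint64_t build_ecap(int version, uint64_t base_ecap)
{
    uint64_t ecap = base_ecap;
    if (version >= VTD_SPEC_3_1) {
        ecap |= ECAP_RPS_BIT;   /* RPS was added in the 3.1 revision */
    }
    return ecap;
}
```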




Re: [PATCH V2 4/4] intel-iommu: PASID support

2022-04-05 Thread Jason Wang
On Sat, Apr 2, 2022 at 3:27 PM Tian, Kevin  wrote:
>
> > From: Jason Wang 
> > Sent: Wednesday, March 30, 2022 4:32 PM
> >
> > >
> > > >
> > > > > If there is certain fault
> > > > > triggered by a request with PASID, we do want to report this
> > information
> > > > > upward.
> > > >
> > > > I tend to do it incrementally on top of this series (anyhow at least
> > > > RID2PASID is introduced before this series)
> > >
> > > Yes, RID2PASID should have been recorded too but it's not done correctly.
> > >
> > > If you do it in separate series, it implies that you will introduce 
> > > another
> > > "x-pasid-fault' to guard the new logic related to PASID fault recording?
> >
> > Something like this, as said previously, if it's a real problem, it
> > exists since the introduction of rid2pasid, not specific to this
> > patch.
> >
> > But I can add the fault recording if you insist.
>
> I prefer including the fault recording, given it's simple and makes this
> change more complete in concept.

That's fine.

Thanks

>
> > > > >
> > > > > Earlier when Yi proposed Qemu changes for guest SVA [1] he aimed for
> > a
> > > > > coarse-grained knob design:
> > > > > --
> > > > >   Intel VT-d 3.0 introduces scalable mode, and it has a bunch of
> > capabilities
> > > > >   related to scalable mode translation, thus there are multiple
> > combinations.
> > > > >   While this vIOMMU implementation wants simplify it for user by
> > providing
> > > > >   typical combinations. User could config it by "x-scalable-mode" 
> > > > > option.
> > > > The
> > > > >   usage is as below:
> > > > > "-device intel-iommu,x-scalable-mode=["legacy"|"modern"]"
> > > > >
> > > > > - "legacy": gives support for SL page table
> > > > > - "modern": gives support for FL page table, pasid, virtual 
> > > > > command
> > > > > -  if not configured, means no scalable mode support, if not 
> > > > > proper
> > > > >configured, will throw error
> > > > > --
> > > > >
> > > > > Which way do you prefer to?
> > > > >
> > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2020-
> > 02/msg02805.html
> > > >
> > > > My understanding is that, if we want to deploy Qemu in a production
> > > > environment, we can't use the "x-" prefix. We need a full
> > > > implementation of each cap.
> > > >
> > > > E.g
> > > > -device intel-iommu,first-level=on,scalable-mode=on etc.
> > > >
> > >
> > > You meant each cap will get a separate control option?
> > >
> > > But that way requires the management stack or admin to have deep
> > > knowledge about how combinations of different capabilities work, e.g.
> > > if just turning on scalable mode w/o first-level cannot support vSVA
> > > on assigned devices. Is this a common practice when defining Qemu
> > > parameters?
> >
> > We can have a safe and good default value for each cap. E.g
> >
> > In qemu 8.0 we think scalable is mature, we can make scalable to be
> > enabled by default
> > in qemu 8.1 we think first-level is mature, we can make first level to
> > be enabled by default.
> >
>
> OK, that is a workable way.
>
> Thanks
> Kevin
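The per-release default flipping Jason describes ("in qemu 8.0 we think scalable is mature...") is conventionally done in QEMU with machine-type compat properties: new machine types get the new default while older ones pin the old value for migration. A hedged sketch, where the struct mirrors the shape of QEMU's GlobalProperty but the entries are hypothetical, not actual QEMU defaults:

```c
#include <assert.h>
#include <string.h>

/* Mirrors the shape of QEMU's GlobalProperty (sketch only). */
typedef struct CompatProp {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Hypothetical: registered for machine types older than the release
 * that flipped the default, so they keep the old behavior. */
static const CompatProp hw_compat_pre_8_0[] = {
    { "intel-iommu", "scalable-mode", "off" },
};
```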




Re: [PATCH V2 4/4] intel-iommu: PASID support

2022-04-05 Thread Jason Wang
On Sat, Apr 2, 2022 at 3:24 PM Tian, Kevin  wrote:
>
> > From: Jason Wang 
> > Sent: Wednesday, March 30, 2022 4:32 PM
> >
> > On Wed, Mar 30, 2022 at 4:02 PM Tian, Kevin  wrote:
> > >
> > > > From: Jason Wang 
> > > > Sent: Tuesday, March 29, 2022 12:49 PM
> > > >
> > > > On Mon, Mar 28, 2022 at 3:03 PM Tian, Kevin 
> > wrote:
> > > > >
> > > > > > From: Jason Wang
> > > > > > Sent: Monday, March 21, 2022 1:54 PM
> > > > > >
> > > > > > +/*
> > > > > > + * vtd-spec v3.4 3.14:
> > > > > > + *
> > > > > > + * """
> > > > > > + * Requests-with-PASID with input address in range 0xFEEx_
> > are
> > > > > > + * translated normally like any other request-with-PASID 
> > > > > > through
> > > > > > + * DMA-remapping hardware. However, if such a request is
> > processed
> > > > > > + * using pass-through translation, it will be blocked as 
> > > > > > described
> > > > > > + * in the paragraph below.
> > > > >
> > > > > While PASID+PT is blocked as described in the below paragraph, the
> > > > > paragraph itself applies to all situations:
> > > > >
> > > > >   1) PT + noPASID
> > > > >   2) translation + noPASID
> > > > >   3) PT + PASID
> > > > >   4) translation + PASID
> > > > >
> > > > > because...
> > > > >
> > > > > > + *
> > > > > > + * Software must not program paging-structure entries to remap
> > any
> > > > > > + * address to the interrupt address range. Untranslated 
> > > > > > requests
> > > > > > + * and translation requests that result in an address in the
> > > > > > + * interrupt range will be blocked with condition code LGN.4 or
> > > > > > + * SGN.8.
> > > > >
> > > > > ... if you look at the definition of LGN.4 or SGN.8:
> > > > >
> > > > > LGN.4:  When legacy mode (RTADDR_REG.TTM=00b) is enabled,
> > hardware
> > > > > detected an output address (i.e. address after remapping) in 
> > > > > the
> > > > > interrupt address range (0xFEEx_). For Translated 
> > > > > requests and
> > > > > requests with pass-through translation type (TT=10), the 
> > > > > output
> > > > > address is the same as the address in the request
> > > > >
> > > > > The last sentence in the first paragraph above just highlights the 
> > > > > fact
> > that
> > > > > when input address of PT is in interrupt range then it is blocked by
> > LGN.4
> > > > > or SGN.8 due to output address also in interrupt range.
> > > > >
> > > > > > + * """
> > > > > > + *
> > > > > > + * We enable the per-AS memory region (iommu_ir_fault) for catching
> > > > > > + * the translation for the interrupt range through PASID + PT.
> > > > > > + */
> > > > > > +    if (pt && as->pasid != PCI_NO_PASID) {
> > > > > > +        memory_region_set_enabled(&as->iommu_ir_fault, true);
> > > > > > +    } else {
> > > > > > +        memory_region_set_enabled(&as->iommu_ir_fault, false);
> > > > > > +    }
> > > > > > +
> > > > >
> > > > > Given above this should be a bug fix for nopasid first and then apply 
> > > > > it
> > > > > to pasid path too.
> > > >
> > > > Actually, nopasid path patches were posted here.
> > > >
> > > > https://www.mail-archive.com/qemu-
> > de...@nongnu.org/msg867878.html
> > > >
> > > > Thanks
> > > >
> > >
> > > Can you elaborate why they are handled differently?
> >
> > It's because that patch is for the case where pasid mode is not
> > implemented. We might need it for -stable.
> >
>
> So will that patch be replaced after this one goes in?

That patch will be merged first if I understand correctly. Then this
patch could be applied on top.

> By any means
> the new iommu_ir_fault region could be applied to both nopasid
> and pasid i.e. no need toggle it when address space is switched.

Actually it's needed only when PT is enabled. When PT is disabled, the
translation is done via iommu_translate.

Considering the previous patch will be merged, I will fix this !PT in
the next version.

Thanks

>
> Thanks
> Kevin
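For reference, the interrupt address range written as 0xFEEx_ in the quoted spec text is 0xFEE00000..0xFEEFFFFF. A minimal checker for the condition being discussed (QEMU has an equivalent helper internally; this standalone one is just for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr falls in the 0xFEEx_xxxx interrupt address range
 * that LGN.4/SGN.8 guard against. */
static bool vtd_is_interrupt_addr(uint64_t addr)
{
    return addr >= 0xFEE00000ULL && addr <= 0xFEEFFFFFULL;
}
```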




Re: [PATCH] vdpa: Add missing tracing to batch mapping functions

2022-04-05 Thread Jason Wang



On 2022/4/5 at 14:36, Eugenio Pérez wrote:

These functions were not traced properly.

Signed-off-by: Eugenio Pérez 



Acked-by: Jason Wang 



---
  hw/virtio/vhost-vdpa.c | 2 ++
  hw/virtio/trace-events | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8adf7c0b92..9e5fe15d03 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -129,6 +129,7 @@ static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };
 
+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
@@ -163,6 +164,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;
 
+    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)",
                      fd, errno, strerror(errno));
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a5102eac9e..48d9d5 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,6 +25,8 @@ vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"
 # vhost-vdpa.c
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
--
2.27.0
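The trace-events format strings above rely on C string-literal concatenation with the <inttypes.h> PRI macros. This is roughly how the new begin_batch format expands at the printf level (a sketch, not the generated tracepoint code; the helper name is made up):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Renders the same format string as the new trace-events line:
 * adjacent literals and PRIu32/PRIu8 concatenate into one format. */
static int format_batch(char *buf, size_t n, void *v, int fd,
                        uint32_t msg_type, uint8_t type)
{
    return snprintf(buf, n,
                    "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8,
                    v, fd, msg_type, type);
}
```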






Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()

2022-04-05 Thread Si-Wei Liu




On 4/1/2022 7:20 PM, Jason Wang wrote:

Adding Michael.

On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu  wrote:



On 3/31/2022 7:53 PM, Jason Wang wrote:

On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu  wrote:

Currently, when VM poweroff, it will trigger vdpa
device(such as mlx bluefield2 VF) reset many times(with 1 datapath
queue pair and one control queue, triggered 3 times), this
leads to below issue:

vhost VQ 2 ring restore failed: -22: Invalid argument (22)

This because in vhost_net_stop(), it will stop all vhost device bind to
this virtio device, and in vhost_dev_stop(), qemu tries to stop the device
, then stop the queue: vhost_virtqueue_stop().

In vhost_dev_stop(), it resets the device, which clear some flags
in low level driver, and in next loop(stop other vhost backends),
qemu try to stop the queue corresponding to the vhost backend,
   the driver finds that the VQ is invalid; this is the root cause.

To solve the issue, vdpa should set vring unready, and
remove reset ops in device stop: vhost_dev_start(hdev, false).

and implement a new function vhost_dev_reset, only reset backend
device after all vhost(per-queue) stoped.

Typo.


Signed-off-by: Michael Qiu
Acked-by: Jason Wang 

Rethinking this patch: considering there are devices that don't support
set_vq_ready(), I wonder if we need

1) uAPI to tell the user space whether or not it supports set_vq_ready()

I guess what's more relevant here is to define the uAPI semantics for
unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing,
as starting vq is comparatively less ambiguous.

Yes.


Considering the
likelihood that this interface may be used for live migration, it would
be nice to come up with variants such as 1) discard inflight requests
vs. 2) wait for inflight processing to be done,

Or inflight descriptor reporting (which seems to be tricky). But we
can start from net, where discarding may just work.


and 3) timeout in
waiting.

Actually, that's the plan and Eugenio is proposing something like this
via virtio spec:

https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
Thanks for the pointer. I seem to recall seeing it some time back, though 
I wonder if there's a follow-up to the v3? My impression was that this is 
still a work-in-progress spec proposal, while the semantics of the various 
F_STOP scenarios are unclear yet and not all of the requirements (ex: 
STOP_FAILED, rewind & !IN_ORDER) for live migration seem to be 
accommodated?





2) userspace will call SET_VRING_ENABLE() when the device supports
otherwise it will use RESET.

Are you looking to making virtqueue resume-able through the new
SET_VRING_ENABLE() uAPI?

I think RESET is inevitable in some case, i.e. when guest initiates
device reset by writing 0 to the status register.

Yes, that's all my plan.


For suspend/resume and
live migration use cases, indeed RESET can be substituted with
SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
accommodate this change. Although I'm all for it, it'd be best to
lay out the plan in multiple phases rather than overload this single
patch too much. You can count my time on this endeavor if you don't mind. :)

You're welcome, I agree we should choose a way to go first:

1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
For networking devices and the vq suspend/resume and live migration use 
cases to be supported, I thought it might suffice? We may drop inflight or 
unused descriptors for Ethernet... What other part do you think may limit 
its extension to become a general uAPI, or add a new uAPI to address a 
similar VQ stop requirement if need be? Or we might well define 
subsystem-specific uAPI to stop the virtqueue, for vdpa devices 
specifically. I think the point here is that, given we would like to avoid 
guest-side modification to support live migration, we can define specific 
uAPI for specific live migration requirements without having to involve 
guest driver changes. It'd be easy to get started this way and generalize 
them all to a full-blown _S_STOP when things are eventually settled.



2) go with virtio-spec (may take a while)
I feel it might still be quite early for now to get to a full-blown 
_S_STOP spec-level amendment that works for all types of virtio (vendor) 
devices. Generally there can be very specific subsystem-dependent ways 
to stop each type of virtio device that satisfy the live migration of 
virtio subsystem devices. For now the discussion mostly concerns vq 
index rewind, inflight handling, notification interrupt and 
configuration space, such kinds of virtio-level things, but the real device 
backend has implications on the other parts, such as the order of IO/DMA 
quiescing and interrupt masking. If the subsystem virtio guest drivers 
today somehow don't support any of those _S_STOP new behaviors, I guess 
there's little point to introduce the same 
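The ordering the patch in this thread argues for, stop every vring first, reset the backend exactly once at the end, can be sketched as follows (illustrative types and names, not QEMU's):

```c
#include <assert.h>
#include <stdbool.h>

struct vq { bool ready; };

static int stopped_count;
static bool was_reset_before_all_stopped;

static void vq_set_unready(struct vq *vq)
{
    vq->ready = false;
    stopped_count++;
}

static void device_reset(int nvqs)
{
    /* Resetting before all queues are stopped is exactly the reported
     * bug: later per-queue stops then see an invalid VQ. */
    was_reset_before_all_stopped = (stopped_count < nvqs);
}

static void vhost_net_stop_sketch(struct vq *vqs, int nvqs)
{
    for (int i = 0; i < nvqs; i++) {
        vq_set_unready(&vqs[i]);   /* per-queue stop, no reset here */
    }
    device_reset(nvqs);            /* single reset at the very end */
}
```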

Re: [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd

2022-04-05 Thread Michael Roth
On Thu, Mar 10, 2022 at 10:09:09PM +0800, Chao Peng wrote:
> KVM gets notified when memory pages changed in the memory backing store.
> When userspace allocates the memory with fallocate() or frees memory
> with fallocate(FALLOC_FL_PUNCH_HOLE), memory backing store calls into
> KVM fallocate/invalidate callbacks respectively. To ensure KVM never
> maps both the private and shared variants of a GPA into the guest, in
> the fallocate callback, we should zap the existing shared mapping and
> in the invalidate callback we should zap the existing private mapping.
> 
> In the callbacks, KVM firstly converts the offset range into the
> gfn_range and then calls existing kvm_unmap_gfn_range() which will zap
> the shared or private mapping. Both callbacks pass in a memslot
> reference but we need 'kvm' so add a reference in memslot structure.
> 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> ---
>  include/linux/kvm_host.h |  3 ++-
>  virt/kvm/kvm_main.c  | 36 
>  2 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9b175aeca63f..186b9b981a65 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -236,7 +236,7 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t 
> cr2_or_gpa,
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>  
> -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
> +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_MEMFILE_NOTIFIER)
>  struct kvm_gfn_range {
>   struct kvm_memory_slot *slot;
>   gfn_t start;
> @@ -568,6 +568,7 @@ struct kvm_memory_slot {
>   loff_t private_offset;
>   struct memfile_pfn_ops *pfn_ops;
>   struct memfile_notifier notifier;
> + struct kvm *kvm;
>  };
>  
>  static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 67349421eae3..52319f49d58a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -841,8 +841,43 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
>  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
>  
>  #ifdef CONFIG_MEMFILE_NOTIFIER
> +static void kvm_memfile_notifier_handler(struct memfile_notifier *notifier,
> +  pgoff_t start, pgoff_t end)
> +{
> + int idx;
> + struct kvm_memory_slot *slot = container_of(notifier,
> + struct kvm_memory_slot,
> + notifier);
> + struct kvm_gfn_range gfn_range = {
> + .slot   = slot,
> + .start  = start - (slot->private_offset >> PAGE_SHIFT),
> + .end= end - (slot->private_offset >> PAGE_SHIFT),
> + .may_block  = true,
> + };
> + struct kvm *kvm = slot->kvm;
> +
> + gfn_range.start = max(gfn_range.start, slot->base_gfn);
> + gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages);
> +
> + if (gfn_range.start >= gfn_range.end)
> + return;
> +
> + idx = srcu_read_lock(>srcu);
> + KVM_MMU_LOCK(kvm);
> + kvm_unmap_gfn_range(kvm, _range);
> + kvm_flush_remote_tlbs(kvm);
> + KVM_MMU_UNLOCK(kvm);
> + srcu_read_unlock(>srcu, idx);

Should this also invalidate gfn_to_pfn_cache mappings? Otherwise it seems
possible the kernel might end up inadvertently writing to now-private guest
memory via a now-stale gfn_to_pfn_cache entry.
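The offset-to-gfn arithmetic in the quoted callback, shift by private_offset, then clamp to the memslot, can be isolated into a small testable helper (simplified sketch with made-up names, following the fields in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
typedef uint64_t gfn_t;

struct slot { uint64_t private_offset; gfn_t base_gfn; uint64_t npages; };

/* Mirrors the callback above: adjust a file page-offset range by
 * private_offset, clamp to the slot, and report whether anything is
 * left to zap. */
static int offsets_to_gfn_range(const struct slot *s, uint64_t start_pg,
                                uint64_t end_pg, gfn_t *gstart, gfn_t *gend)
{
    gfn_t lo = start_pg - (s->private_offset >> PAGE_SHIFT);
    gfn_t hi = end_pg - (s->private_offset >> PAGE_SHIFT);

    lo = lo > s->base_gfn ? lo : s->base_gfn;
    hi = hi < s->base_gfn + s->npages ? hi : s->base_gfn + s->npages;
    if (lo >= hi) {
        return -1;              /* nothing to zap */
    }
    *gstart = lo;
    *gend = hi;
    return 0;
}
```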



Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa

2022-04-05 Thread Si-Wei Liu




On 4/1/2022 7:10 PM, Jason Wang wrote:

On Sat, Apr 2, 2022 at 6:32 AM Si-Wei Liu  wrote:



On 3/31/2022 1:39 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu  wrote:


On 3/30/2022 2:00 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:

With MQ enabled vdpa device and non-MQ supporting guest e.g.
booting vdpa with mq=on over OVMF of single vqp, below assert
failure is seen:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, 
vdev=0x558f568f91f0, n=2, mask=) at ../hw/virtio/vhost.c:1557
7  0x558f52c6b89a in virtio_pci_set_guest_notifier 
(d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, 
with_irqfd=with_irqfd@entry=false)
  at ../hw/virtio/virtio-pci.c:974
8  0x558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, 
nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, 
ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
  at ../hw/net/vhost_net.c:361
10 0x558f52d4e5e7 in virtio_net_set_status (status=, 
n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 
'\017') at ../hw/net/virtio-net.c:370
12 0x558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, 
val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=, val=, size=) at ../hw/virtio/virtio-pci.c:1292
14 0x558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, 
value=, size=1, shift=, mask=, 
attrs=...)
  at ../softmmu/memory.c:492
15 0x558f52d127de in access_with_adjusted_size (addr=addr@entry=20, 
value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=, 
access_size_max=, access_fn=0x558f52d15cf0 
, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, 
data=, op=, attrs=attrs@entry=...)
  at ../softmmu/memory.c:1504
17 0x558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, 
addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, 
addr1=, l=, mr=0x558f568f19d0) at 
/home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, 
attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x558f52d0b36b in address_space_write (as=, addr=, attrs=..., buf=buf@entry=0x7f8ce6300028, len=)
  at ../softmmu/physmem.c:2914
20 0x558f52d0b3da in address_space_rw (as=, addr=, attrs=...,
  attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=, 
is_write=) at ../softmmu/physmem.c:2924
21 0x558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at 
../accel/kvm/kvm-all.c:2903
22 0x558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at 
../accel/kvm/kvm-accel-ops.c:49
23 0x558f52f9f04a in qemu_thread_start (args=) at 
../util/qemu-thread-posix.c:556
24 0x7f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x7f8ce40bb9fd in clone () at /lib64/libc.so.6

The cause of the assert failure is that the vhost_dev index
for the ctrl vq was not aligned with the actual one in use by the guest.
Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
if guest doesn't support multiqueue, the guest vq layout would shrink
to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
This results in ctrl_vq taking a different vhost_dev group index than
the default. We can map vq to the correct vhost_dev group by checking
if MQ is supported by guest and successfully negotiated. Since the
MQ feature is only present along with CTRL_VQ, we make sure the index
2 is only meant for the control vq while MQ is not supported by guest.

Note that if QEMU or the guest doesn't support the control vq, there's no
point exposing a vhost_dev and guest notifier for the control vq. Since
vhost_net_start/stop implies DRIVER_OK is set in the device status, feature
negotiation should be completed when reaching virtio_net_vhost_status().

Fixes: 22288fe ("virtio-net: vhost control virtqueue support")
Suggested-by: Jason Wang 
Signed-off-by: Si-Wei Liu 
---
hw/net/virtio-net.c | 19 ---
1 file changed, 16 insertions(+), 3 
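The index fix described in the commit message can be sketched as a mapping function (simplified and with made-up names, not the patch's actual code): with a non-MQ guest the layout shrinks to rx=0, tx=1, ctrl=2, so vq index 2 must resolve to the control vhost_dev rather than to a second data queue pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Maps a guest vq index to its vhost_dev group. Each data queue pair
 * shares one vhost_dev; the control vq gets the group right after the
 * data queue pairs. */
static int vq_to_dev_group(int vq_index, bool mq_negotiated, int data_qps)
{
    if (!mq_negotiated && vq_index == 2) {
        return data_qps;        /* non-MQ guest: index 2 is the ctrl vq */
    }
    return vq_index / 2;        /* rx/tx of qp N both land in group N */
}
```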

[PATCH for-7.1 09/11] pc-bios: Add NPCM8xx Bootrom

2022-04-05 Thread Hao Wu
The bootrom is a minimal bootrom that can be used to bring up
an NPCM845 Linux kernel. Its source code can be found at
github.com/google/vbootrom/tree/master/npcm8xx

Signed-off-by: Hao Wu 
Reviewed-by: Titus Rwantare 
---
 pc-bios/npcm8xx_bootrom.bin | Bin 0 -> 608 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 pc-bios/npcm8xx_bootrom.bin

diff --git a/pc-bios/npcm8xx_bootrom.bin b/pc-bios/npcm8xx_bootrom.bin
new file mode 100644
index ..6370d6475635c4d445d2b927311edcd591949c82
GIT binary patch
literal 608
zcmdUrKTE?<6vfX=0{*3B5ET?nwWA^;qEk()n=Xb9-4dxoSBrz#p|QJQL~zokn{Eyc
z?PBXUkU+aB?k?IbNQftG5ej|*FC2c{bKkr7zLy3jhNxj`gc_y5h=Ru)PgZC)Y`f
zTqA9Am28qLHlr*^#;re-)dpxT0U42|O+cWOcx=B;{6xXH04vx?cjm
z+%U{oFx!aPpV3>ZKz0i$XA-yq{f}x4;|pbw;l#@9zGd|z-rs*H@V-o%PEV)D-)8n2%DyH5@w_^Y8
LH5R3RMV#gjxYTW}

literal 0
HcmV?d1

-- 
2.35.1.1094.g7c7d902a7c-goog




[PATCH for-7.1 11/11] hw/arm: Add NPCM845 Evaluation board

2022-04-05 Thread Hao Wu
Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
---
 hw/arm/meson.build   |   2 +-
 hw/arm/npcm8xx_boards.c  | 257 +++
 include/hw/arm/npcm8xx.h |  20 +++
 3 files changed, 278 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/npcm8xx_boards.c

diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index cf824241c5..e813cd72fa 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -14,7 +14,7 @@ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: 
files('musicpal.c'))
 arm_ss.add(when: 'CONFIG_NETDUINO2', if_true: files('netduino2.c'))
 arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
 arm_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx.c', 
'npcm7xx_boards.c'))
-arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c'))
+arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c', 
'npcm8xx_boards.c'))
 arm_ss.add(when: 'CONFIG_NSERIES', if_true: files('nseries.c'))
 arm_ss.add(when: 'CONFIG_SX1', if_true: files('omap_sx1.c'))
 arm_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c'))
diff --git a/hw/arm/npcm8xx_boards.c b/hw/arm/npcm8xx_boards.c
new file mode 100644
index 00..2290473d12
--- /dev/null
+++ b/hw/arm/npcm8xx_boards.c
@@ -0,0 +1,257 @@
+/*
+ * Machine definitions for boards featuring an NPCM8xx SoC.
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+
+#include "chardev/char.h"
+#include "hw/arm/npcm8xx.h"
+#include "hw/core/cpu.h"
+#include "hw/loader.h"
+#include "hw/qdev-core.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "qemu/datadir.h"
+#include "qemu/units.h"
+#include "sysemu/block-backend.h"
+
+#define NPCM845_EVB_POWER_ON_STRAPS 0x17ff
+
+static const char npcm8xx_default_bootrom[] = "npcm8xx_bootrom.bin";
+
+static void npcm8xx_load_bootrom(MachineState *machine, NPCM8xxState *soc)
+{
+const char *bios_name = machine->firmware ?: npcm8xx_default_bootrom;
+g_autofree char *filename = NULL;
+int ret;
+
+filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+if (!filename) {
+error_report("Could not find ROM image '%s'", bios_name);
+if (!machine->kernel_filename) {
+/* We can't boot without a bootrom or a kernel image. */
+exit(1);
+}
+return;
+}
+ret = load_image_mr(filename, machine->ram);
+if (ret < 0) {
+error_report("Failed to load ROM image '%s'", filename);
+exit(1);
+}
+}
+
+static void npcm8xx_connect_flash(NPCM7xxFIUState *fiu, int cs_no,
+  const char *flash_type, DriveInfo *dinfo)
+{
+DeviceState *flash;
+qemu_irq flash_cs;
+
+flash = qdev_new(flash_type);
+if (dinfo) {
+qdev_prop_set_drive(flash, "drive", blk_by_legacy_dinfo(dinfo));
+}
+qdev_realize_and_unref(flash, BUS(fiu->spi), _fatal);
+
+flash_cs = qdev_get_gpio_in_named(flash, SSI_GPIO_CS, 0);
+qdev_connect_gpio_out_named(DEVICE(fiu), "cs", cs_no, flash_cs);
+}
+
+static void npcm8xx_connect_dram(NPCM8xxState *soc, MemoryRegion *dram)
+{
+memory_region_add_subregion(get_system_memory(), NPCM8XX_DRAM_BA, dram);
+
+object_property_set_link(OBJECT(soc), "dram-mr", OBJECT(dram),
+ _abort);
+}
+
+static NPCM8xxState *npcm8xx_create_soc(MachineState *machine,
+uint32_t hw_straps)
+{
+NPCM8xxMachineClass *nmc = NPCM8XX_MACHINE_GET_CLASS(machine);
+MachineClass *mc = MACHINE_CLASS(nmc);
+Object *obj;
+
+if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
+error_report("This board can only be used with %s",
+ mc->default_cpu_type);
+exit(1);
+}
+
+obj = object_new_with_props(nmc->soc_type, OBJECT(machine), "soc",
+_abort, NULL);
+object_property_set_uint(obj, "power-on-straps", hw_straps, _abort);
+
+return NPCM8XX(obj);
+}
+
+static I2CBus *npcm8xx_i2c_get_bus(NPCM8xxState *soc, uint32_t num)
+{
+g_assert(num < ARRAY_SIZE(soc->smbus));
+return I2C_BUS(qdev_get_child_bus(DEVICE(>smbus[num]), "i2c-bus"));
+}
+
+static void npcm8xx_init_pwm_splitter(NPCM8xxMachine *machine,
+  NPCM8xxState *soc, const int *fan_counts)
+{
+SplitIRQ *splitters = machine->fan_splitter;
+
+/*
+ * PWM 0~3 belong to module 0 output 0~3.
+ * PWM 4~7 belong to 

[PATCH for-7.1 05/11] hw/misc: Store DRAM size in NPCM8XX GCR Module

2022-04-05 Thread Hao Wu
NPCM8XX boot block stores the DRAM size in the SCRPAD_B register in the
GCR module. Since we don't simulate a detailed memory controller, we
need to store this information directly, similar to the NPCM7XX's
INTCR3 register.
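As a rough sketch of the two encodings involved (illustrative code, not the QEMU implementation; the minimum DRAM size constant mirrors NPCM7XX_GCR_MIN_DRAM_SIZE): the 7xx INTCR3 stores log2 of the DRAM size relative to the minimum in bits 8 and up, while the 8xx boot block simply stores the raw size in SCRPAD_B.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024 * 1024)
#define MIN_DRAM_SIZE ((uint64_t)128 * MiB)   /* assumed minimum DRAM size */

/* 7xx style: log2(dram / min) encoded in INTCR3 starting at bit 8. */
static uint32_t intcr3_dram_bits(uint64_t dram_size)
{
    return (uint32_t)(__builtin_ctzll(dram_size / MIN_DRAM_SIZE) << 8);
}

/* 8xx style: the boot block stores the raw DRAM size in SCRPAD_B. */
static uint32_t scrpad_b_value(uint64_t dram_size)
{
    return (uint32_t)dram_size;
}
```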

Signed-off-by: Hao Wu 
Reviewed-by: Titus Rwantare 
---
 hw/misc/npcm_gcr.c | 33 ++---
 include/hw/misc/npcm_gcr.h |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/misc/npcm_gcr.c b/hw/misc/npcm_gcr.c
index 2349949599..14c298602a 100644
--- a/hw/misc/npcm_gcr.c
+++ b/hw/misc/npcm_gcr.c
@@ -267,7 +267,7 @@ static const struct MemoryRegionOps npcm_gcr_ops = {
 },
 };
 
-static void npcm_gcr_enter_reset(Object *obj, ResetType type)
+static void npcm7xx_gcr_enter_reset(Object *obj, ResetType type)
 {
 NPCMGCRState *s = NPCM_GCR(obj);
 NPCMGCRClass *c = NPCM_GCR_GET_CLASS(obj);
@@ -283,6 +283,23 @@ static void npcm_gcr_enter_reset(Object *obj, ResetType 
type)
 }
 }
 
+static void npcm8xx_gcr_enter_reset(Object *obj, ResetType type)
+{
+NPCMGCRState *s = NPCM_GCR(obj);
+NPCMGCRClass *c = NPCM_GCR_GET_CLASS(obj);
+
+switch (type) {
+case RESET_TYPE_COLD:
+memcpy(s->regs, c->cold_reset_values, c->nr_regs * sizeof(uint32_t));
+/* These 3 registers are at the same location in both 7xx and 8xx. */
+s->regs[NPCM8XX_GCR_PWRON] = s->reset_pwron;
+s->regs[NPCM8XX_GCR_MDLR] = s->reset_mdlr;
+s->regs[NPCM8XX_GCR_INTCR3] = s->reset_intcr3;
+s->regs[NPCM8XX_GCR_SCRPAD_B] = s->reset_scrpad_b;
+break;
+}
+}
+
 static void npcm_gcr_realize(DeviceState *dev, Error **errp)
 {
 ERRP_GUARD();
@@ -326,6 +343,14 @@ static void npcm_gcr_realize(DeviceState *dev, Error 
**errp)
  * 
https://github.com/Nuvoton-Israel/u-boot/blob/2aef993bd2aafeb5408dbaad0f3ce099ee40c4aa/board/nuvoton/poleg/poleg.c#L244
  */
 s->reset_intcr3 |= ctz64(dram_size / NPCM7XX_GCR_MIN_DRAM_SIZE) << 8;
+
+/*
+ * The boot block starting from 0.0.6 for NPCM8xx SoCs stores the DRAM size
+ * in the SCRPAD2 registers. We need to set this field correctly since
+ * the initialization is skipped as we mentioned above.
+ * 
https://github.com/Nuvoton-Israel/u-boot/blob/npcm8mnx-v2019.01_tmp/board/nuvoton/arbel/arbel.c#L737
+ */
+s->reset_scrpad_b = dram_size;
 }
 
 static void npcm_gcr_init(Object *obj)
@@ -355,12 +380,10 @@ static Property npcm_gcr_properties[] = {
 
 static void npcm_gcr_class_init(ObjectClass *klass, void *data)
 {
-ResettableClass *rc = RESETTABLE_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 dc->realize = npcm_gcr_realize;
 dc->vmsd = _npcm_gcr;
-rc->phases.enter = npcm_gcr_enter_reset;
 
 device_class_set_props(dc, npcm_gcr_properties);
 }
@@ -369,24 +392,28 @@ static void npcm7xx_gcr_class_init(ObjectClass *klass, 
void *data)
 {
 NPCMGCRClass *c = NPCM_GCR_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 
 QEMU_BUILD_BUG_ON(NPCM7XX_GCR_REGS_END > NPCM_GCR_MAX_NR_REGS);
 QEMU_BUILD_BUG_ON(NPCM7XX_GCR_REGS_END != NPCM7XX_GCR_NR_REGS);
 dc->desc = "NPCM7xx System Global Control Registers";
 c->nr_regs = NPCM7XX_GCR_NR_REGS;
 c->cold_reset_values = npcm7xx_cold_reset_values;
+rc->phases.enter = npcm7xx_gcr_enter_reset;
 }
 
 static void npcm8xx_gcr_class_init(ObjectClass *klass, void *data)
 {
 NPCMGCRClass *c = NPCM_GCR_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 
 QEMU_BUILD_BUG_ON(NPCM8XX_GCR_REGS_END > NPCM_GCR_MAX_NR_REGS);
 QEMU_BUILD_BUG_ON(NPCM8XX_GCR_REGS_END != NPCM8XX_GCR_NR_REGS);
 dc->desc = "NPCM8xx System Global Control Registers";
 c->nr_regs = NPCM8XX_GCR_NR_REGS;
 c->cold_reset_values = npcm8xx_cold_reset_values;
+rc->phases.enter = npcm8xx_gcr_enter_reset;
 }
 
 static const TypeInfo npcm_gcr_info[] = {
diff --git a/include/hw/misc/npcm_gcr.h b/include/hw/misc/npcm_gcr.h
index ac3d781c2e..bd69199d51 100644
--- a/include/hw/misc/npcm_gcr.h
+++ b/include/hw/misc/npcm_gcr.h
@@ -39,6 +39,7 @@ typedef struct NPCMGCRState {
 uint32_t reset_pwron;
 uint32_t reset_mdlr;
 uint32_t reset_intcr3;
+uint32_t reset_scrpad_b;
 } NPCMGCRState;
 
 typedef struct NPCMGCRClass {
-- 
2.35.1.1094.g7c7d902a7c-goog




[PATCH for-7.1 08/11] hw/net: Add NPCM8XX PCS Module

2022-04-05 Thread Hao Wu
The PCS exists in NPCM8XX's GMAC1 and is used to control the SGMII
PHY. This implementation contains all the default registers and
the soft reset feature that are required to load the Linux kernel
driver. Further features have not been implemented yet.
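As a rough illustration of the soft-reset behavior described above (hypothetical helper names; the reset bit position is taken from the patch, the rest is an assumed sketch, not the actual device model):

```c
#include <assert.h>
#include <stdint.h>

#define SR_MII_CTRL_RST (1u << 15)   /* soft reset bit, per the patch */

/*
 * Sketch: writing the RST bit to the control register restores the
 * whole register file to its defaults; the bit self-clears, so it
 * reads back as zero afterwards.
 */
static void pcs_ctrl_write(uint16_t *regs, const uint16_t *defaults,
                           int nregs, uint16_t value)
{
    int i;

    if (value & SR_MII_CTRL_RST) {
        for (i = 0; i < nregs; i++) {
            regs[i] = defaults[i];   /* reset to default values */
        }
        return;                      /* RST is not latched */
    }
    regs[0] = value;
}
```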

Signed-off-by: Hao Wu 
Reviewed-by: Titus Rwantare 
---
 hw/net/meson.build|   1 +
 hw/net/npcm_pcs.c | 409 ++
 hw/net/trace-events   |   4 +
 include/hw/net/npcm_pcs.h |  42 
 4 files changed, 456 insertions(+)
 create mode 100644 hw/net/npcm_pcs.c
 create mode 100644 include/hw/net/npcm_pcs.h

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 685b75badb..4cba3e66db 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -37,6 +37,7 @@ softmmu_ss.add(when: 'CONFIG_SUNHME', if_true: 
files('sunhme.c'))
 softmmu_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 softmmu_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
 softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+softmmu_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm_pcs.c'))
 
 softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
diff --git a/hw/net/npcm_pcs.c b/hw/net/npcm_pcs.c
new file mode 100644
index 00..efe5f68d9c
--- /dev/null
+++ b/hw/net/npcm_pcs.c
@@ -0,0 +1,409 @@
+/*
+ * Nuvoton NPCM8xx PCS Module
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+/*
+ * Disclaimer:
+ * Currently we only implemented the default values of the registers and
+ * the soft reset feature. These are required to boot up the GMAC module
+ * in Linux kernel for NPCM845 boards. Other functionalities are not modeled.
+ */
+
+#include "qemu/osdep.h"
+
+#include "exec/hwaddr.h"
+#include "hw/registerfields.h"
+#include "hw/net/npcm_pcs.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "trace.h"
+
+#define NPCM_PCS_IND_AC_BA  0x1fe
+#define NPCM_PCS_IND_SR_CTL 0x1e00
+#define NPCM_PCS_IND_SR_MII 0x1f00
+#define NPCM_PCS_IND_SR_TIM 0x1f07
+#define NPCM_PCS_IND_VR_MII 0x1f80
+
+REG16(NPCM_PCS_SR_CTL_ID1, 0x08)
+REG16(NPCM_PCS_SR_CTL_ID2, 0x0a)
+REG16(NPCM_PCS_SR_CTL_STS, 0x10)
+
+REG16(NPCM_PCS_SR_MII_CTRL, 0x00)
+REG16(NPCM_PCS_SR_MII_STS, 0x02)
+REG16(NPCM_PCS_SR_MII_DEV_ID1, 0x04)
+REG16(NPCM_PCS_SR_MII_DEV_ID2, 0x06)
+REG16(NPCM_PCS_SR_MII_AN_ADV, 0x08)
+REG16(NPCM_PCS_SR_MII_LP_BABL, 0x0a)
+REG16(NPCM_PCS_SR_MII_AN_EXPN, 0x0c)
+REG16(NPCM_PCS_SR_MII_EXT_STS, 0x1e)
+
+REG16(NPCM_PCS_SR_TIM_SYNC_ABL, 0x10)
+REG16(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x12)
+REG16(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0x14)
+REG16(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x16)
+REG16(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0x18)
+REG16(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x1a)
+REG16(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0x1c)
+REG16(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x1e)
+REG16(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0x20)
+
+REG16(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x000)
+REG16(NPCM_PCS_VR_MII_AN_CTRL, 0x002)
+REG16(NPCM_PCS_VR_MII_AN_INTR_STS, 0x004)
+REG16(NPCM_PCS_VR_MII_TC, 0x006)
+REG16(NPCM_PCS_VR_MII_DBG_CTRL, 0x00a)
+REG16(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x00c)
+REG16(NPCM_PCS_VR_MII_EEE_TXTIMER, 0x010)
+REG16(NPCM_PCS_VR_MII_EEE_RXTIMER, 0x012)
+REG16(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0x014)
+REG16(NPCM_PCS_VR_MII_EEE_MCTRL1, 0x016)
+REG16(NPCM_PCS_VR_MII_DIG_STS, 0x020)
+REG16(NPCM_PCS_VR_MII_ICG_ERRCNT1, 0x022)
+REG16(NPCM_PCS_VR_MII_MISC_STS, 0x030)
+REG16(NPCM_PCS_VR_MII_RX_LSTS, 0x040)
+REG16(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x070)
+REG16(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x074)
+REG16(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x07a)
+REG16(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0x07c)
+REG16(NPCM_PCS_VR_MII_MP_TX_STS, 0x090)
+REG16(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0b0)
+REG16(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x0b2)
+REG16(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x0ba)
+REG16(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0f0)
+REG16(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0f2)
+REG16(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x110)
+REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0x126)
+REG16(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x130)
+REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0x132)
+REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0x134)
+REG16(NPCM_PCS_VR_MII_DIG_CTRL2, 0x1c2)
+REG16(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0x1c4)
+
+/* Register Fields */
+#define NPCM_PCS_SR_MII_CTRL_RSTBIT(15)
+
+static const uint16_t 

[PATCH for-7.1 07/11] hw/misc: Support 8-bytes memop in NPCM GCR module

2022-04-05 Thread Hao Wu
The NPCM8xx GCR device can be accessed with 64-bit memory operations.
This patch supports that.
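The 64-bit access spans two consecutive 32-bit registers, low word first, as in the patch below. A self-contained sketch of that composition (illustrative helpers, not the QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* Compose a 64-bit read from two consecutive 32-bit registers. */
static uint64_t gcr_read64(const uint32_t *regs, unsigned reg)
{
    return regs[reg] + (((uint64_t)regs[reg + 1]) << 32);
}

/* Split a 64-bit write into the low and high 32-bit registers. */
static void gcr_write64(uint32_t *regs, unsigned reg, uint64_t v)
{
    regs[reg] = (uint32_t)v;
    regs[reg + 1] = v >> 32;
}
```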

Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
---
 hw/misc/npcm_gcr.c   | 98 +---
 hw/misc/trace-events |  4 +-
 2 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/hw/misc/npcm_gcr.c b/hw/misc/npcm_gcr.c
index 14c298602a..aa81db23d7 100644
--- a/hw/misc/npcm_gcr.c
+++ b/hw/misc/npcm_gcr.c
@@ -201,6 +201,7 @@ static uint64_t npcm_gcr_read(void *opaque, hwaddr offset, 
unsigned size)
 uint32_t reg = offset / sizeof(uint32_t);
 NPCMGCRState *s = opaque;
 NPCMGCRClass *c = NPCM_GCR_GET_CLASS(s);
+uint64_t value;
 
 if (reg >= c->nr_regs) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -209,9 +210,23 @@ static uint64_t npcm_gcr_read(void *opaque, hwaddr offset, 
unsigned size)
 return 0;
 }
 
-trace_npcm_gcr_read(offset, s->regs[reg]);
+switch (size) {
+case 4:
+value = s->regs[reg];
+break;
+
+case 8:
+value = s->regs[reg] + (((uint64_t)s->regs[reg + 1]) << 32);
+break;
+
+default:
+g_assert_not_reached();
+}
 
-return s->regs[reg];
+if (s->regs[reg] != 0) {
+trace_npcm_gcr_read(offset, value);
+}
+return value;
 }
 
 static void npcm_gcr_write(void *opaque, hwaddr offset,
@@ -222,7 +237,7 @@ static void npcm_gcr_write(void *opaque, hwaddr offset,
 NPCMGCRClass *c = NPCM_GCR_GET_CLASS(s);
 uint32_t value = v;
 
-trace_npcm_gcr_write(offset, value);
+trace_npcm_gcr_write(offset, v);
 
 if (reg >= c->nr_regs) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -231,29 +246,65 @@ static void npcm_gcr_write(void *opaque, hwaddr offset,
 return;
 }
 
-switch (reg) {
-case NPCM7XX_GCR_PDID:
-case NPCM7XX_GCR_PWRON:
-case NPCM7XX_GCR_INTSR:
-qemu_log_mask(LOG_GUEST_ERROR,
-  "%s: register @ 0x%04" HWADDR_PRIx " is read-only\n",
-  __func__, offset);
-return;
-
-case NPCM7XX_GCR_RESSR:
-case NPCM7XX_GCR_CP2BST:
-/* Write 1 to clear */
-value = s->regs[reg] & ~value;
+switch (size) {
+case 4:
+switch (reg) {
+case NPCM7XX_GCR_PDID:
+case NPCM7XX_GCR_PWRON:
+case NPCM7XX_GCR_INTSR:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: register @ 0x%04" HWADDR_PRIx " is read-only\n",
+  __func__, offset);
+return;
+
+case NPCM7XX_GCR_RESSR:
+case NPCM7XX_GCR_CP2BST:
+/* Write 1 to clear */
+value = s->regs[reg] & ~value;
+break;
+
+case NPCM7XX_GCR_RLOCKR1:
+case NPCM7XX_GCR_MDLR:
+/* Write 1 to set */
+value |= s->regs[reg];
+break;
+};
+s->regs[reg] = value;
 break;
 
-case NPCM7XX_GCR_RLOCKR1:
-case NPCM7XX_GCR_MDLR:
-/* Write 1 to set */
-value |= s->regs[reg];
+case 8:
+s->regs[reg] = value;
+s->regs[reg + 1] = v >> 32;
 break;
-};
 
-s->regs[reg] = value;
+default:
+g_assert_not_reached();
+}
+}
+
+static bool npcm_gcr_check_mem_op(void *opaque, hwaddr offset,
+  unsigned size, bool is_write,
+  MemTxAttrs attrs)
+{
+NPCMGCRClass *c = NPCM_GCR_GET_CLASS(opaque);
+
+if (offset >= c->nr_regs * sizeof(uint32_t)) {
+return false;
+}
+
+switch (size) {
+case 4:
+return true;
+case 8:
+if (offset >= NPCM8XX_GCR_SCRPAD_00 * sizeof(uint32_t) &&
+offset < (NPCM8XX_GCR_NR_REGS - 1) * sizeof(uint32_t)) {
+return true;
+} else {
+return false;
+}
+default:
+return false;
+}
 }
 
 static const struct MemoryRegionOps npcm_gcr_ops = {
@@ -262,7 +313,8 @@ static const struct MemoryRegionOps npcm_gcr_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 .valid  = {
 .min_access_size= 4,
-.max_access_size= 4,
+.max_access_size= 8,
+.accepts= npcm_gcr_check_mem_op,
 .unaligned  = false,
 },
 };
diff --git a/hw/misc/trace-events b/hw/misc/trace-events
index 02650acfff..2ffec963e7 100644
--- a/hw/misc/trace-events
+++ b/hw/misc/trace-events
@@ -103,8 +103,8 @@ npcm_clk_read(uint64_t offset, uint32_t value) " offset: 
0x%04" PRIx64 " value:
 npcm_clk_write(uint64_t offset, uint32_t value) "offset: 0x%04" PRIx64 " 
value: 0x%08" PRIx32
 
 # npcm_gcr.c
-npcm_gcr_read(uint64_t offset, uint32_t value) " offset: 0x%04" PRIx64 " 
value: 0x%08" PRIx32
-npcm_gcr_write(uint64_t offset, uint32_t value) "offset: 0x%04" PRIx64 " 
value: 0x%08" PRIx32
+npcm_gcr_read(uint64_t offset, uint64_t value) " offset: 0x%04" PRIx64 " 
value: 0x%08" PRIx64
+npcm_gcr_write(uint64_t offset, uint64_t 

[PATCH for-7.1 06/11] hw/intc: Add a property to allow GIC to reset into non secure mode

2022-04-05 Thread Hao Wu
This property allows certain boards like the NPCM8xx to boot the kernel
directly into non-secure mode. This is necessary since we do not
support secure boot features for the NPCM8xx yet.

Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
---
 hw/intc/arm_gic_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c
index 7b44d5625b..7ddc5cfbd0 100644
--- a/hw/intc/arm_gic_common.c
+++ b/hw/intc/arm_gic_common.c
@@ -358,6 +358,8 @@ static Property arm_gic_common_properties[] = {
 /* True if the GIC should implement the virtualization extensions */
 DEFINE_PROP_BOOL("has-virtualization-extensions", GICState, virt_extn, 0),
 DEFINE_PROP_UINT32("num-priority-bits", GICState, n_prio_bits, 8),
+/* True if we want to directly boot a kernel into NonSecure */
+DEFINE_PROP_BOOL("irq-reset-nonsecure", GICState, irq_reset_nonsecure, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.35.1.1094.g7c7d902a7c-goog




[PATCH for-7.1 03/11] hw/misc: Support NPCM8XX GCR module

2022-04-05 Thread Hao Wu
NPCM8XX has a different set of global control registers than 7XX.
This patch supports that.

Signed-off-by: Hao Wu 
Reviewed-by: Titus Rwantare 
---
 MAINTAINERS   |   9 +-
 hw/misc/meson.build   |   2 +-
 hw/misc/npcm7xx_gcr.c | 269 
 hw/misc/npcm_gcr.c| 413 ++
 hw/misc/trace-events  |   6 +-
 include/hw/arm/npcm7xx.h  |   4 +-
 include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} |  29 +-
 7 files changed, 445 insertions(+), 287 deletions(-)
 delete mode 100644 hw/misc/npcm7xx_gcr.c
 create mode 100644 hw/misc/npcm_gcr.c
 rename include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} (56%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ad2451e03..c31ed09527 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -791,14 +791,15 @@ F: hw/net/mv88w8618_eth.c
 F: include/hw/net/mv88w8618_eth.h
 F: docs/system/arm/musicpal.rst
 
-Nuvoton NPCM7xx
+Nuvoton NPCM
 M: Havard Skinnemoen 
 M: Tyrone Ting 
+M: Hao Wu 
 L: qemu-...@nongnu.org
 S: Supported
-F: hw/*/npcm7xx*
-F: include/hw/*/npcm7xx*
-F: tests/qtest/npcm7xx*
+F: hw/*/npcm*
+F: include/hw/*/npcm*
+F: tests/qtest/npcm*
 F: pc-bios/npcm7xx_bootrom.bin
 F: roms/vbootrom
 F: docs/system/arm/nuvoton.rst
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 6fb69612e0..13f8fee5b6 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -61,7 +61,7 @@ softmmu_ss.add(when: 'CONFIG_IMX', if_true: files(
 softmmu_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mst_fpga.c'))
 softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files(
   'npcm7xx_clk.c',
-  'npcm7xx_gcr.c',
+  'npcm_gcr.c',
   'npcm7xx_mft.c',
   'npcm7xx_pwm.c',
   'npcm7xx_rng.c',
diff --git a/hw/misc/npcm7xx_gcr.c b/hw/misc/npcm7xx_gcr.c
deleted file mode 100644
index eace9e1967..00
--- a/hw/misc/npcm7xx_gcr.c
+++ /dev/null
@@ -1,269 +0,0 @@
-/*
- * Nuvoton NPCM7xx System Global Control Registers.
- *
- * Copyright 2020 Google LLC
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
- * for more details.
- */
-
-#include "qemu/osdep.h"
-
-#include "hw/misc/npcm7xx_gcr.h"
-#include "hw/qdev-properties.h"
-#include "migration/vmstate.h"
-#include "qapi/error.h"
-#include "qemu/cutils.h"
-#include "qemu/log.h"
-#include "qemu/module.h"
-#include "qemu/units.h"
-
-#include "trace.h"
-
-#define NPCM7XX_GCR_MIN_DRAM_SIZE   (128 * MiB)
-#define NPCM7XX_GCR_MAX_DRAM_SIZE   (2 * GiB)
-
-enum NPCM7xxGCRRegisters {
-NPCM7XX_GCR_PDID,
-NPCM7XX_GCR_PWRON,
-NPCM7XX_GCR_MFSEL1  = 0x0c / sizeof(uint32_t),
-NPCM7XX_GCR_MFSEL2,
-NPCM7XX_GCR_MISCPE,
-NPCM7XX_GCR_SPSWC   = 0x038 / sizeof(uint32_t),
-NPCM7XX_GCR_INTCR,
-NPCM7XX_GCR_INTSR,
-NPCM7XX_GCR_HIFCR   = 0x050 / sizeof(uint32_t),
-NPCM7XX_GCR_INTCR2  = 0x060 / sizeof(uint32_t),
-NPCM7XX_GCR_MFSEL3,
-NPCM7XX_GCR_SRCNT,
-NPCM7XX_GCR_RESSR,
-NPCM7XX_GCR_RLOCKR1,
-NPCM7XX_GCR_FLOCKR1,
-NPCM7XX_GCR_DSCNT,
-NPCM7XX_GCR_MDLR,
-NPCM7XX_GCR_SCRPAD3,
-NPCM7XX_GCR_SCRPAD2,
-NPCM7XX_GCR_DAVCLVLR= 0x098 / sizeof(uint32_t),
-NPCM7XX_GCR_INTCR3,
-NPCM7XX_GCR_VSINTR  = 0x0ac / sizeof(uint32_t),
-NPCM7XX_GCR_MFSEL4,
-NPCM7XX_GCR_CPBPNTR = 0x0c4 / sizeof(uint32_t),
-NPCM7XX_GCR_CPCTL   = 0x0d0 / sizeof(uint32_t),
-NPCM7XX_GCR_CP2BST,
-NPCM7XX_GCR_B2CPNT,
-NPCM7XX_GCR_CPPCTL,
-NPCM7XX_GCR_I2CSEGSEL,
-NPCM7XX_GCR_I2CSEGCTL,
-NPCM7XX_GCR_VSRCR,
-NPCM7XX_GCR_MLOCKR,
-NPCM7XX_GCR_SCRPAD  = 0x013c / sizeof(uint32_t),
-NPCM7XX_GCR_USB1PHYCTL,
-NPCM7XX_GCR_USB2PHYCTL,
-NPCM7XX_GCR_REGS_END,
-};
-
-static const uint32_t cold_reset_values[NPCM7XX_GCR_NR_REGS] = {
-[NPCM7XX_GCR_PDID]  = 0x04a92750,   /* Poleg A1 */
-[NPCM7XX_GCR_MISCPE]= 0x,
-[NPCM7XX_GCR_SPSWC] = 0x0003,
-[NPCM7XX_GCR_INTCR] = 0x035e,
-[NPCM7XX_GCR_HIFCR] = 0x004e,
-[NPCM7XX_GCR_INTCR2]= (1U << 19),   /* DDR initialized */
-[NPCM7XX_GCR_RESSR] = 0x8000,
-[NPCM7XX_GCR_DSCNT] = 0x00c0,
-[NPCM7XX_GCR_DAVCLVLR]  = 0x5a00f3cf,
-[NPCM7XX_GCR_SCRPAD]= 0x0008,
-[NPCM7XX_GCR_USB1PHYCTL]= 0x034730e4,
-[NPCM7XX_GCR_USB2PHYCTL]= 0x034730e4,
-};
-
-static uint64_t npcm7xx_gcr_read(void *opaque, hwaddr offset, unsigned size)
-{
-uint32_t reg = offset / 

[PATCH for-7.1 01/11] docs/system/arm: Add Description for NPCM8XX SoC

2022-04-05 Thread Hao Wu
NPCM8XX SoC is the successor of the NPCM7XX. It features quad-core
Cortex-A35 (Armv8, 64-bit) CPUs and some additional peripherals.

Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
---
 docs/system/arm/nuvoton.rst | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index ef2792076a..bead17fa7e 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -1,12 +1,13 @@
 Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
 
 
-The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
+The `Nuvoton iBMC`_ chips are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
-servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
-assortment of peripherals targeted for either Enterprise or Data Center /
-Hyperscale applications. The former is a superset of the latter, so NPCM750 has
-all the peripherals of NPCM730 and more.
+servers. Currently there are two families: the NPCM7XX series and the
+NPCM8XX series. The NPCM7XX series features one or two ARM Cortex-A9 CPU
+cores, while the NPCM8XX features four ARM Cortex-A35 CPU cores. Both series contain a
+different assortment of peripherals targeted for either Enterprise or Data
+Center / Hyperscale applications.
 
 .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 
@@ -27,6 +28,8 @@ There are also two more SoCs, NPCM710 and NPCM705, which are 
single-core
 variants of NPCM750 and NPCM730, respectively. These are currently not
 supported by QEMU.
 
+The NPCM8xx SoC is the successor of the NPCM7xx SoC.
+
 Supported devices
 -
 
@@ -61,6 +64,8 @@ Missing devices
* System Wake-up Control (SWC)
* Shared memory (SHM)
* eSPI slave interface
+   * Block-transfer interface (8XX only)
+   * Virtual UART (8XX only)
 
  * Ethernet controller (GMAC)
  * USB device (USBD)
@@ -76,6 +81,11 @@ Missing devices
  * Video capture
  * Encoding compression engine
  * Security features
+ * I3C buses (8XX only)
+ * Temperature sensor interface (8XX only)
+ * Virtual UART (8XX only)
+ * Flash monitor (8XX only)
+ * JTAG master (8XX only)
 
 Boot options
 
-- 
2.35.1.1094.g7c7d902a7c-goog




[PATCH for-7.1 04/11] hw/misc: Support NPCM8XX CLK Module Registers

2022-04-05 Thread Hao Wu
NPCM8XX adds a few new registers and has a different set of reset
values for the CLK module. This patch supports them.

This patch doesn't support the new clock values generated by these
registers. Currently no modules use these new clock values, so they
are not necessary at this point.
Implementing these clocks may become necessary when those modules
are added.

Signed-off-by: Hao Wu 
Reviewed-by: Titus Rwantare
---
 hw/misc/meson.build   |   2 +-
 hw/misc/{npcm7xx_clk.c => npcm_clk.c} | 238 ++
 hw/misc/trace-events  |   6 +-
 include/hw/arm/npcm7xx.h  |   4 +-
 include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} |  43 ++--
 5 files changed, 219 insertions(+), 74 deletions(-)
 rename hw/misc/{npcm7xx_clk.c => npcm_clk.c} (81%)
 rename include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} (83%)

diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 13f8fee5b6..b4e9d3f857 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -60,7 +60,7 @@ softmmu_ss.add(when: 'CONFIG_IMX', if_true: files(
 ))
 softmmu_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mst_fpga.c'))
 softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files(
-  'npcm7xx_clk.c',
+  'npcm_clk.c',
   'npcm_gcr.c',
   'npcm7xx_mft.c',
   'npcm7xx_pwm.c',
diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm_clk.c
similarity index 81%
rename from hw/misc/npcm7xx_clk.c
rename to hw/misc/npcm_clk.c
index bc2b879feb..f4601a3e9a 100644
--- a/hw/misc/npcm7xx_clk.c
+++ b/hw/misc/npcm_clk.c
@@ -1,5 +1,5 @@
 /*
- * Nuvoton NPCM7xx Clock Control Registers.
+ * Nuvoton NPCM7xx/8xx Clock Control Registers.
  *
  * Copyright 2020 Google LLC
  *
@@ -16,7 +16,7 @@
 
 #include "qemu/osdep.h"
 
-#include "hw/misc/npcm7xx_clk.h"
+#include "hw/misc/npcm_clk.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/qdev-clock.h"
 #include "migration/vmstate.h"
@@ -75,13 +75,65 @@ enum NPCM7xxCLKRegisters {
 NPCM7XX_CLK_REGS_END,
 };
 
+enum NPCM8xxCLKRegisters {
+NPCM8XX_CLK_CLKEN1,
+NPCM8XX_CLK_CLKSEL,
+NPCM8XX_CLK_CLKDIV1,
+NPCM8XX_CLK_PLLCON0,
+NPCM8XX_CLK_PLLCON1,
+NPCM8XX_CLK_SWRSTR,
+NPCM8XX_CLK_IPSRST1 = 0x20 / sizeof(uint32_t),
+NPCM8XX_CLK_IPSRST2,
+NPCM8XX_CLK_CLKEN2,
+NPCM8XX_CLK_CLKDIV2,
+NPCM8XX_CLK_CLKEN3,
+NPCM8XX_CLK_IPSRST3,
+NPCM8XX_CLK_WD0RCR,
+NPCM8XX_CLK_WD1RCR,
+NPCM8XX_CLK_WD2RCR,
+NPCM8XX_CLK_SWRSTC1,
+NPCM8XX_CLK_SWRSTC2,
+NPCM8XX_CLK_SWRSTC3,
+NPCM8XX_CLK_TIPRSTC,
+NPCM8XX_CLK_PLLCON2,
+NPCM8XX_CLK_CLKDIV3,
+NPCM8XX_CLK_CORSTC,
+NPCM8XX_CLK_PLLCONG,
+NPCM8XX_CLK_AHBCKFI,
+NPCM8XX_CLK_SECCNT,
+NPCM8XX_CLK_CNTR25M,
+/* Registers unique to NPCM8XX SoC */
+NPCM8XX_CLK_CLKEN4,
+NPCM8XX_CLK_IPSRST4,
+NPCM8XX_CLK_BUSTO,
+NPCM8XX_CLK_CLKDIV4,
+NPCM8XX_CLK_WD0RCRB,
+NPCM8XX_CLK_WD1RCRB,
+NPCM8XX_CLK_WD2RCRB,
+NPCM8XX_CLK_SWRSTC1B,
+NPCM8XX_CLK_SWRSTC2B,
+NPCM8XX_CLK_SWRSTC3B,
+NPCM8XX_CLK_TIPRSTCB,
+NPCM8XX_CLK_CORSTCB,
+NPCM8XX_CLK_IPSRSTDIS1,
+NPCM8XX_CLK_IPSRSTDIS2,
+NPCM8XX_CLK_IPSRSTDIS3,
+NPCM8XX_CLK_IPSRSTDIS4,
+NPCM8XX_CLK_CLKENDIS1,
+NPCM8XX_CLK_CLKENDIS2,
+NPCM8XX_CLK_CLKENDIS3,
+NPCM8XX_CLK_CLKENDIS4,
+NPCM8XX_CLK_THRTL_CNT,
+NPCM8XX_CLK_REGS_END,
+};
+
 /*
  * These reset values were taken from version 0.91 of the NPCM750R data sheet.
  *
  * All are loaded on power-up reset. CLKENx and SWRSTR should also be loaded on
  * core domain reset, but this reset type is not yet supported by QEMU.
  */
-static const uint32_t cold_reset_values[NPCM7XX_CLK_NR_REGS] = {
+static const uint32_t npcm7xx_cold_reset_values[NPCM7XX_CLK_NR_REGS] = {
 [NPCM7XX_CLK_CLKEN1]= 0x,
 [NPCM7XX_CLK_CLKSEL]= 0x004a,
 [NPCM7XX_CLK_CLKDIV1]   = 0x5413f855,
@@ -103,6 +155,46 @@ static const uint32_t 
cold_reset_values[NPCM7XX_CLK_NR_REGS] = {
 [NPCM7XX_CLK_AHBCKFI]   = 0x00c8,
 };
 
+/*
+ * These reset values were taken from version 0.92 of the NPCM8xx data sheet.
+ */
+static const uint32_t npcm8xx_cold_reset_values[NPCM8XX_CLK_NR_REGS] = {
+[NPCM8XX_CLK_CLKEN1]= 0x,
+[NPCM8XX_CLK_CLKSEL]= 0x154a,
+[NPCM8XX_CLK_CLKDIV1]   = 0x5413f855,
+[NPCM8XX_CLK_PLLCON0]   = 0x00222101 | PLLCON_LOKI,
+[NPCM8XX_CLK_PLLCON1]   = 0x00202101 | PLLCON_LOKI,
+[NPCM8XX_CLK_IPSRST1]   = 0x1000,
+[NPCM8XX_CLK_IPSRST2]   = 0x8000,
+[NPCM8XX_CLK_CLKEN2]= 0x,
+[NPCM8XX_CLK_CLKDIV2]   = 0xaa4f8f9f,
+[NPCM8XX_CLK_CLKEN3]= 0x,
+[NPCM8XX_CLK_IPSRST3]   = 0x0300,
+[NPCM8XX_CLK_WD0RCR]= 0x,
+[NPCM8XX_CLK_WD1RCR]= 0x,
+[NPCM8XX_CLK_WD2RCR]= 0x,
+[NPCM8XX_CLK_SWRSTC1]   = 0x0003,
+[NPCM8XX_CLK_SWRSTC2]   = 0x0001,
+

[PATCH for-7.1 10/11] hw/arm: Add NPCM8XX SoC

2022-04-05 Thread Hao Wu
This file contains a basic NPCM8XX SoC implementation. It's forked
from the NPCM7XX SoC with some changes.

Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
Reviewed-by: Titus Rwantare 
---
 configs/devices/aarch64-softmmu/default.mak |   1 +
 hw/arm/Kconfig  |  11 +
 hw/arm/meson.build  |   1 +
 hw/arm/npcm8xx.c| 806 
 include/hw/arm/npcm8xx.h| 106 +++
 5 files changed, 925 insertions(+)
 create mode 100644 hw/arm/npcm8xx.c
 create mode 100644 include/hw/arm/npcm8xx.h

diff --git a/configs/devices/aarch64-softmmu/default.mak 
b/configs/devices/aarch64-softmmu/default.mak
index cf43ac8da1..1c3cf6dda1 100644
--- a/configs/devices/aarch64-softmmu/default.mak
+++ b/configs/devices/aarch64-softmmu/default.mak
@@ -6,3 +6,4 @@ include ../arm-softmmu/default.mak
 CONFIG_XLNX_ZYNQMP_ARM=y
 CONFIG_XLNX_VERSAL=y
 CONFIG_SBSA_REF=y
+CONFIG_NPCM8XX=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 97f3b38019..ed5d37ba01 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -408,6 +408,17 @@ config NPCM7XX
 select UNIMP
 select PCA954X
 
+config NPCM8XX
+bool
+select ARM_GIC
+select SMBUS
+select PL310  # cache controller
+select NPCM7XX
+select SERIAL
+select SSI
+select UNIMP
+
+
 config FSL_IMX25
 bool
 imply I2C_DEVICES
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 721a8eb8be..cf824241c5 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -14,6 +14,7 @@ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: 
files('musicpal.c'))
 arm_ss.add(when: 'CONFIG_NETDUINO2', if_true: files('netduino2.c'))
 arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
 arm_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx.c', 
'npcm7xx_boards.c'))
+arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c'))
 arm_ss.add(when: 'CONFIG_NSERIES', if_true: files('nseries.c'))
 arm_ss.add(when: 'CONFIG_SX1', if_true: files('omap_sx1.c'))
 arm_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c'))
diff --git a/hw/arm/npcm8xx.c b/hw/arm/npcm8xx.c
new file mode 100644
index 00..afcf8330d5
--- /dev/null
+++ b/hw/arm/npcm8xx.c
@@ -0,0 +1,806 @@
+/*
+ * Nuvoton NPCM8xx SoC family.
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/arm/boot.h"
+#include "hw/arm/npcm8xx.h"
+#include "hw/char/serial.h"
+#include "hw/intc/arm_gic.h"
+#include "hw/loader.h"
+#include "hw/misc/unimp.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qemu/units.h"
+#include "sysemu/sysemu.h"
+
+#define ARM_PHYS_TIMER_PPI  30
+#define ARM_VIRT_TIMER_PPI  27
+#define ARM_HYP_TIMER_PPI   26
+#define ARM_SEC_TIMER_PPI   29
+
+/*
+ * This covers the whole MMIO space. We'll use this to catch any MMIO accesses
+ * that aren't handled by a device.
+ */
+#define NPCM8XX_MMIO_BA (0x8000)
+#define NPCM8XX_MMIO_SZ (0x7ffd)
+
+/* OTP fuse array */
+#define NPCM8XX_OTP_BA  (0xf0189000)
+
+/* GIC Distributor */
+#define NPCM8XX_GICD_BA (0xdfff9000)
+#define NPCM8XX_GICC_BA (0xdfffa000)
+
+/* Core system modules. */
+#define NPCM8XX_CPUP_BA (0xf03fe000)
+#define NPCM8XX_GCR_BA  (0xf080)
+#define NPCM8XX_CLK_BA  (0xf0801000)
+#define NPCM8XX_MC_BA   (0xf0824000)
+#define NPCM8XX_RNG_BA  (0xf000b000)
+
+/* ADC Module */
+#define NPCM8XX_ADC_BA  (0xf000c000)
+
+/* Internal AHB SRAM */
+#define NPCM8XX_RAM3_BA (0xc0008000)
+#define NPCM8XX_RAM3_SZ (4 * KiB)
+
+/* Memory blocks at the end of the address space */
+#define NPCM8XX_RAM2_BA (0xfffb)
+#define NPCM8XX_RAM2_SZ (256 * KiB)
+#define NPCM8XX_ROM_BA  (0x0100)
+#define NPCM8XX_ROM_SZ  (64 * KiB)
+
+/* SDHCI Modules */
+#define NPCM8XX_MMC_BA  (0xf0842000)
+
+/* Run PLL1 at 1600 MHz */
+#define NPCM8XX_PLLCON1_FIXUP_VAL   (0x00402101)
+/* Run the CPU from PLL1 and UART from PLL2 */
+#define NPCM8XX_CLKSEL_FIXUP_VAL(0x004aaba9)
+
+/* Clock configuration values to be fixed up when bypassing bootloader */
+
+/*
+ * Interrupt lines going into the GIC. This does not include internal Cortex-A9
+ * interrupts.
+ */
+enum NPCM8xxInterrupt {
+NPCM8XX_ADC_IRQ = 0,
+NPCM8XX_KCS_HIB_IRQ = 9,
+NPCM8XX_MMC_IRQ = 26,
+NPCM8XX_TIMER0_IRQ  = 

[PATCH for-7.1 02/11] hw/ssi: Make flash size a property in NPCM7XX FIU

2022-04-05 Thread Hao Wu
This allows different FIUs to have different flash sizes, which is
useful for the NPCM8XX, whose FIU modules come in multiple sizes.

Signed-off-by: Hao Wu 
Reviewed-by: Patrick Venture 
---
 hw/arm/npcm7xx.c | 6 ++
 hw/ssi/npcm7xx_fiu.c | 6 ++
 include/hw/ssi/npcm7xx_fiu.h | 1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index d85cc02765..9946b94120 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -274,17 +274,21 @@ static const struct {
 hwaddr regs_addr;
 int cs_count;
 const hwaddr *flash_addr;
+size_t flash_size;
 } npcm7xx_fiu[] = {
 {
 .name = "fiu0",
 .regs_addr = 0xfb00,
 .cs_count = ARRAY_SIZE(npcm7xx_fiu0_flash_addr),
 .flash_addr = npcm7xx_fiu0_flash_addr,
+.flash_size = 128 * MiB,
+
 }, {
 .name = "fiu3",
 .regs_addr = 0xc000,
 .cs_count = ARRAY_SIZE(npcm7xx_fiu3_flash_addr),
 .flash_addr = npcm7xx_fiu3_flash_addr,
+.flash_size = 128 * MiB,
 },
 };
 
@@ -686,6 +690,8 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 
 object_property_set_int(OBJECT(sbd), "cs-count",
 npcm7xx_fiu[i].cs_count, &error_abort);
+object_property_set_int(OBJECT(sbd), "flash-size",
+npcm7xx_fiu[i].flash_size, &error_abort);
 sysbus_realize(sbd, &error_abort);
 
 sysbus_mmio_map(sbd, 0, npcm7xx_fiu[i].regs_addr);
diff --git a/hw/ssi/npcm7xx_fiu.c b/hw/ssi/npcm7xx_fiu.c
index 4eedb2927e..ea490f1332 100644
--- a/hw/ssi/npcm7xx_fiu.c
+++ b/hw/ssi/npcm7xx_fiu.c
@@ -28,9 +28,6 @@
 
 #include "trace.h"
 
-/* Up to 128 MiB of flash may be accessed directly as memory. */
-#define NPCM7XX_FIU_FLASH_WINDOW_SIZE (128 * MiB)
-
 /* Each module has 4 KiB of register space. Only a fraction of it is used. */
 #define NPCM7XX_FIU_CTRL_REGS_SIZE (4 * KiB)
 
@@ -525,7 +522,7 @@ static void npcm7xx_fiu_realize(DeviceState *dev, Error 
**errp)
 flash->fiu = s;
 memory_region_init_io(&flash->direct_access, OBJECT(s),
   &npcm7xx_fiu_flash_ops, &s->flash[i], "flash",
-  NPCM7XX_FIU_FLASH_WINDOW_SIZE);
+  s->flash_size);
 sysbus_init_mmio(sbd, &flash->direct_access);
 }
 }
@@ -543,6 +540,7 @@ static const VMStateDescription vmstate_npcm7xx_fiu = {
 
 static Property npcm7xx_fiu_properties[] = {
 DEFINE_PROP_INT32("cs-count", NPCM7xxFIUState, cs_count, 0),
+DEFINE_PROP_SIZE("flash-size", NPCM7xxFIUState, flash_size, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/ssi/npcm7xx_fiu.h b/include/hw/ssi/npcm7xx_fiu.h
index a3a1704289..1785ea16f4 100644
--- a/include/hw/ssi/npcm7xx_fiu.h
+++ b/include/hw/ssi/npcm7xx_fiu.h
@@ -60,6 +60,7 @@ struct NPCM7xxFIUState {
 int32_t cs_count;
 int32_t active_cs;
 qemu_irq *cs_lines;
+size_t flash_size;
 NPCM7xxFIUFlash *flash;
 
 SSIBus *spi;
-- 
2.35.1.1094.g7c7d902a7c-goog




[PATCH for-7.1 00/11] hw/arm: Add NPCM8XX support

2022-04-05 Thread Hao Wu
NPCM8XX BMCs are the successors of the NPCM7XX BMCs. They feature a
quad-core ARM Cortex-A35 that supports both 32-bit and 64-bit
operation. This patch set aims to support the basic functionality
of the NPCM8XX BMCs. The patch set includes:

1. We derived most devices from the NPCM7XX models and
made some modifications.
2. We constructed a minimal vBootROM similar to the NPCM7XX one at
https://github.com/google/vbootrom/tree/master/npcm8xx
and included it in the patch set.
3. We added a new NPCM8XX SoC and an evaluation
board machine, npcm845-evb.

The OpenBMC for NPCM845 evaluation board can be found at:
https://github.com/Nuvoton-Israel/openbmc/tree/npcm-v2.10/meta-evb/meta-evb-nuvoton/meta-evb-npcm845

The patch set can boot the evaluation board image built from the source
above to the login prompt.

Hao Wu (11):
  docs/system/arm: Add Description for NPCM8XX SoC
  hw/ssi: Make flash size a property in NPCM7XX FIU
  hw/misc: Support NPCM8XX GCR module
  hw/misc: Support NPCM8XX CLK Module Registers
  hw/misc: Store DRAM size in NPCM8XX GCR Module
  hw/intc: Add a property to allow GIC to reset into non secure mode
  hw/misc: Support 8-bytes memop in NPCM GCR module
  hw/net: Add NPCM8XX PCS Module
  pc-bios: Add NPCM8xx Bootrom
  hw/arm: Add NPCM8XX SoC
  hw/arm: Add NPCM845 Evaluation board

 MAINTAINERS   |   9 +-
 configs/devices/aarch64-softmmu/default.mak   |   1 +
 docs/system/arm/nuvoton.rst   |  20 +-
 hw/arm/Kconfig|  11 +
 hw/arm/meson.build|   1 +
 hw/arm/npcm7xx.c  |   6 +
 hw/arm/npcm8xx.c  | 806 ++
 hw/arm/npcm8xx_boards.c   | 257 ++
 hw/intc/arm_gic_common.c  |   2 +
 hw/misc/meson.build   |   4 +-
 hw/misc/npcm7xx_gcr.c | 269 --
 hw/misc/{npcm7xx_clk.c => npcm_clk.c} | 238 --
 hw/misc/npcm_gcr.c| 492 +++
 hw/misc/trace-events  |  12 +-
 hw/net/meson.build|   1 +
 hw/net/npcm_pcs.c | 409 +
 hw/net/trace-events   |   4 +
 hw/ssi/npcm7xx_fiu.c  |   6 +-
 include/hw/arm/npcm7xx.h  |   8 +-
 include/hw/arm/npcm8xx.h  | 126 +++
 include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} |  43 +-
 include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} |  30 +-
 include/hw/net/npcm_pcs.h |  42 +
 include/hw/ssi/npcm7xx_fiu.h  |   1 +
 pc-bios/npcm8xx_bootrom.bin   | Bin 0 -> 608 bytes
 25 files changed, 2428 insertions(+), 370 deletions(-)
 create mode 100644 hw/arm/npcm8xx.c
 create mode 100644 hw/arm/npcm8xx_boards.c
 delete mode 100644 hw/misc/npcm7xx_gcr.c
 rename hw/misc/{npcm7xx_clk.c => npcm_clk.c} (81%)
 create mode 100644 hw/misc/npcm_gcr.c
 create mode 100644 hw/net/npcm_pcs.c
 create mode 100644 include/hw/arm/npcm8xx.h
 rename include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} (83%)
 rename include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} (55%)
 create mode 100644 include/hw/net/npcm_pcs.h
 create mode 100644 pc-bios/npcm8xx_bootrom.bin

-- 
2.35.1.1094.g7c7d902a7c-goog




Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Vladimir Sementsov-Ogievskiy

05.04.2022 17:41, Kevin Wolf wrote:

Am 05.04.2022 um 14:12 hat Vladimir Sementsov-Ogievskiy geschrieben:

Thanks Kevin! On the mailing list, I have already run out of arguments
in the battle against using subtree-drains to isolate graph modification
operations from each other in different threads. :)

(Note also that the top-most version of this patch is "[PATCH v2]
block/stream: Drain subtree around graph change".)


Oops, I completely missed the v2. Thanks!


About avoiding polling during graph-modifying operations, there is a
problem: some IO operations are involved into block-graph modifying
operations. At least it's rewriting "backing_file_offset" and
"backing_file_size" fields in qcow2 header.

We can't just separate rewriting metadata from graph modifying
operation: this way another graph-modifying operation may interleave
and we'll write outdated metadata.


Hm, generally we don't update image metadata when we reconfigure the
graph. Most changes are temporary (like insertion of filter nodes) and
the image header only contains a "default configuration" to be used on
the next start.

There are only a few places that update the image header; I think it's
generally block job completions. They obviously update the in-memory
graph, too, but they don't write to the image file (and therefore
potentially poll) in the middle of updating the in-memory graph, but
they do both in separate steps.

I think this is okay. We must just avoid polling in the middle of graph
updates because if something else changes the graph there, it's not
clear any more that we're really doing what the caller had in mind.


Hmm, interesting. Where is the polling in the described case?

The first possible place I can find is bdrv_parent_drained_begin_single() in
bdrv_replace_child_noperm().

Another is bdrv_apply_subtree_drain() in bdrv_child_cb_attach().

No idea how to get rid of them. Hmm.

I think the core problem here is that when we wait in drained_begin(), nobody
protects us from attaching one more node to the drained subgraph. We have to
handle that, and that's where the complexity comes from.




So I still think we need a kind of global lock for graph-modifying
operations, or per-BDS locks as you propose. But in the latter case
we need to be sure that taking all the needed per-BDS locks avoids
deadlocks.


I guess this depends on the exact granularity of the locks we're using.
If you take the lock only while updating a single edge, I don't think
you could easily deadlock. If you hold it for more complex operations,
it becomes harder to tell without checking the code.



I think keeping the whole operation (like reopen_multiple, or some job's
.prepare(), etc.) under one critical section is simplest to analyze.

Could this be something like this?

  uint8_t graph_locked;

  void graph_lock(AioContext *ctx) {
      AIO_POLL_WHILE(ctx, qatomic_cmpxchg(&graph_locked, 0, 1) == 1);
  }

  void graph_unlock() {
      qatomic_set(&graph_locked, 0);
      aio_wait_kick();
  }

--
Best regards,
Vladimir



Re: [PATCH] acpi: Bodge acpi_index migration

2022-04-05 Thread Michael S. Tsirkin
On Tue, Apr 05, 2022 at 08:06:58PM +0100, Dr. David Alan Gilbert (git) wrote:

The patch is fine but pls repost as text not as
application/octet-stream.

Thanks!

-- 
MST




Re: [PATCH] acpi: Bodge acpi_index migration

2022-04-05 Thread Alex Williamson
On Tue,  5 Apr 2022 20:06:58 +0100
"Dr. David Alan Gilbert (git)"  wrote:

> From: "Dr. David Alan Gilbert" 
> 
> The 'acpi_index' field is a statically configured field, which for
> some reason is migrated; this never makes much sense because it's
> command line static.
> 
> However, on piix4 it's conditional, and the condition/test function
> ends up having the wrong pointer passed to it (it gets a PIIX4PMState
> not the AcpiPciHpState it was expecting, because VMSTATE_PCI_HOTPLUG
> is a macro and not another struct).  This means the field is randomly
> loaded/saved based on a random pointer.  In 6.x this random pointer
> randomly seems to get 0 for everyone (!); in 7.0rc it's getting junk
> and trying to load a field that the source didn't send.

FWIW, after some hunting and pecking, 6.2 (64bit):

(gdb) p &((struct AcpiPciHpState *)0)->acpi_index
$1 = (uint32_t *) 0xc04

(gdb) p &((struct PIIX4PMState *)0)->ar.tmr.io.addr
$2 = (hwaddr *) 0xc00

f53faa70bb63:

(gdb) p &((struct AcpiPciHpState *)0)->acpi_index
$1 = (uint32_t *) 0xc04

(gdb) p &((struct PIIX4PMState *)0)->io_gpe.coalesced.tqh_circ.tql_prev
$2 = (struct QTailQLink **) 0xc00

So yeah, it seems 0xc04 will always be part of a pointer on current
mainline.  I can't really speak to the ACPIPMTimer MemoryRegion in the
PIIX4PMState, maybe if there's a hwaddr it's always 32bit and the upper
dword is reliably zero?  Thanks,

Alex

>  The migration
> stream gets out of line and hits the section footer.
> 
> The bodge is on piix4 never to load the field:
>   a) Most 6.x builds never send it, so most of the time the migration
> will work.
>   b) We can backport this fix to 6.x to remove the boobytrap.
>   c) It should never have made a difference anyway since the acpi-index
> is command line configured and should be correct on the destination
> anyway
>   d) ich9 is still sending/receiving this (unconditionally all the time)
> but due to (c) should never notice.  We could follow up to make it
> skip.
> 
> It worries me just when (a) actually happens.
> 
> Fixes: b32bd76 ("pci: introduce acpi-index property for PCI device")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/932
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  hw/acpi/acpi-pci-hotplug-stub.c |  4 
>  hw/acpi/pcihp.c |  6 --
>  hw/acpi/piix4.c | 11 ++-
>  include/hw/acpi/pcihp.h |  2 --
>  4 files changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/acpi/acpi-pci-hotplug-stub.c b/hw/acpi/acpi-pci-hotplug-stub.c
> index 734e4c5986..a43f6dafc9 100644
> --- a/hw/acpi/acpi-pci-hotplug-stub.c
> +++ b/hw/acpi/acpi-pci-hotplug-stub.c
> @@ -41,7 +41,3 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool 
> acpihp_root_off)
>  return;
>  }
>  
> -bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id)
> -{
> -return false;
> -}
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 6351bd3424..bf65bbea49 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -554,12 +554,6 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
> PCIBus *root_bus,
> OBJ_PROP_FLAG_READ);
>  }
>  
> -bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id)
> -{
> - AcpiPciHpState *s = opaque;
> - return s->acpi_index;
> -}
> -
>  const VMStateDescription vmstate_acpi_pcihp_pci_status = {
>  .name = "acpi_pcihp_pci_status",
>  .version_id = 1,
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index cc37fa3416..48aeedd5f0 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -267,6 +267,15 @@ static bool piix4_vmstate_need_smbus(void *opaque, int 
> version_id)
>  return pm_smbus_vmstate_needed();
>  }
>  
> +/*
> + * This is a fudge to turn off the acpi_index field, whose
> + * test was always broken on piix4.
> + */
> +static bool vmstate_test_never(void *opaque, int version_id)
> +{
> +return false;
> +}
> +
>  /* qemu-kvm 1.2 uses version 3 but advertised as 2
>   * To support incoming qemu-kvm 1.2 migration, change version_id
>   * and minimum_version_id to 2 below (which breaks migration from
> @@ -297,7 +306,7 @@ static const VMStateDescription vmstate_acpi = {
>  struct AcpiPciHpPciStatus),
>  VMSTATE_PCI_HOTPLUG(acpi_pci_hotplug, PIIX4PMState,
>  vmstate_test_use_acpi_hotplug_bridge,
> -vmstate_acpi_pcihp_use_acpi_index),
> +vmstate_test_never),
>  VMSTATE_END_OF_LIST()
>  },
>  .subsections = (const VMStateDescription*[]) {
> diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
> index af1a169fc3..7e268c2c9c 100644
> --- a/include/hw/acpi/pcihp.h
> +++ b/include/hw/acpi/pcihp.h
> @@ -73,8 +73,6 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool 
> acpihp_root_off);
>  
>  extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
>  
> -bool 

Re: [RFC PATCH 0/4] hw/i2c: i2c slave mode support

2022-04-05 Thread Peter Delevoryas


> On Mar 31, 2022, at 9:57 AM, Klaus Jensen  wrote:
> 
> From: Klaus Jensen 
> 
> Hi all,
> 
> This RFC series adds I2C "slave mode" support for the Aspeed I2C
> controller as well as the necessary infrastructure in the i2c core to
> support this.
> 
> Background
> ~~
> We are working on an emulated NVM Express Management Interface[1] for
> testing and validation purposes. NVMe-MI is based on the MCTP
> protocol[2] which may use a variety of underlying transports. The one we
> are interested in is I2C[3].
> 
> The first general trickery here is that all MCTP transactions are based
> on the SMBus Block Write bus protocol[4]. This means that the slave must
> be able to master the bus to communicate. As you know, hw/i2c/core.c
> currently does not support this use case.

This is great, I’m attempting to use your changes right now for the same thing 
(MCTP).

> 
> The second issue is how to interact with these mastering devices. Jeremy
> and Matt (CC'ed) have been working on an MCTP stack for the Linux Kernel
> (already upstream) and an I2C binding driver[5] is currently under
> review. This binding driver relies on I2C slave mode support in the I2C
> controller.
> 
> This series
> ~~~
> Patch 1 adds support for multiple masters in the i2c core, allowing
> slaves to master the bus and safely issue i2c_send/recv(). Patch 2 adds
> an asynchronous send i2c_send_async(I2CBus *, uint8) on the bus that
> must be paired with an explicit ack using i2c_ack(I2CBus *).
> 
> Patch 3 adds the slave mode functionality to the emulated Aspeed I2C
> controller. The implementation is probably buggy since I had to rely on
> the implementation of the kernel driver to reverse engineer the behavior
> of the controller slave mode (I do not have access to a spec sheet for
> the Aspeed, but maybe someone can help me out with that?).
> 
> Finally, patch 4 adds an example device using this new API. The device
> is a simple "echo" device that upon being sent a set of bytes uses the
> first byte as the address of the slave to echo to.
> 
> With this combined I am able to boot up Linux on an emulated Aspeed 2600
> evaluation board and have the i2c echo device write into a Linux slave
> EEPROM. Assuming the echo device is on address 0x42:
> 
>  # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-15/new_device
>  i2c i2c-15: new_device: Instantiated device slave-24c02 at 0x64
>  # i2cset -y 15 0x42 0x64 0x00 0xaa i
>  # hexdump /sys/bus/i2c/devices/15-1064/slave-eeprom
>  000 ffaa       
>  010        
>  *
>  100

When I try this with my system, it seems like the i2c-echo device takes over
the bus but never echoes the data to the EEPROM. Am I missing something to
make this work? It seems like the “i2c_send_async” calls aren’t happening,
which must be because the bottom half isn’t being scheduled, right? After
the i2c_do_start_transfer, how is the bottom half supposed to be scheduled
again? Is the slave receiving (the EEPROM) supposed to call i2c_ack or 
something?

root@bmc-oob:~# echo 24c02 0x1064 > /sys/bus/i2c/devices/i2c-8/new_device
[  135.559719] at24 8-1064: 256 byte 24c02 EEPROM, writable, 1 bytes/write
[  135.562661] i2c i2c-8: new_device: Instantiated device 24c02 at 0x64
root@bmc-oob:~# i2cset -y 8 0x42 0x64 0x00 0xaa i
i2c_echo_event: start send
i2c_echo_send: data[0] = 0x64
i2c_echo_send: data[1] = 0x00
i2c_echo_send: data[2] = 0xaa
i2c_echo_event: scheduling bottom-half
i2c_echo_bh: attempting to gain mastery of bus
i2c_echo_bh: starting a send to address 0x64
root@bmc-oob:~# hexdump -C /sys/bus/i2c/devices/8-1064/eeprom
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0100

Thanks again for this, it’s exactly what I needed.

> 
>  [1]: https://nvmexpress.org/developers/nvme-mi-specification/ 
>  [2]: 
> https://www.dmtf.org/sites/default/files/standards/documents/DSP0236_1.3.1.pdf
>  
>  [3]: 
> https://www.dmtf.org/sites/default/files/standards/documents/DSP0237_1.2.0.pdf
>  
>  [4]: http://www.smbus.org/specs/SMBus_3_1_20180319.pdf 
>  [5]: 
> https://lore.kernel.org/linux-i2c/20220218055106.1944485-1-m...@codeconstruct.com.au/
> 
> Klaus Jensen (4):
>  hw/i2c: support multiple masters
>  hw/i2c: add async send
>  hw/i2c: add slave mode for aspeed_i2c
>  hw/misc: add a toy i2c echo device
> 
> hw/i2c/aspeed_i2c.c |  95 +---
> hw/i2c/core.c   |  57 +-
> hw/i2c/trace-events |   2 +-
> hw/misc/i2c-echo.c  | 144 
> hw/misc/meson.build |   2 +
> include/hw/i2c/aspeed_i2c.h |   8 ++
> include/hw/i2c/i2c.h|  19 +
> 7 files changed, 316 insertions(+), 11 deletions(-)
> create mode 100644 hw/misc/i2c-echo.c
> 
> -- 
> 2.35.1
> 
> 



[PATCH for-7.1 1/1] hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c

2022-04-05 Thread Daniel Henrique Barboza
spapr_nvdimm_flush_completion_cb() and flush_worker_cb() are using the
DRC object returned by spapr_drc_by_index() without checking it for NULL.
In that case we would be dereferencing a NULL pointer when doing
SPAPR_NVDIMM(drc->dev) and PC_DIMM(drc->dev).

This can happen if, during a scm_flush(), the DRC object is wrongly
freed/released by another part of the code (e.g. hotunplugging the device).
spapr_drc_by_index() would then return NULL in the callbacks.

Fixes: Coverity CID 1487108, 1487178
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_nvdimm.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index c4c97da5de..e92d92fdae 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -447,9 +447,19 @@ static int flush_worker_cb(void *opaque)
 {
 SpaprNVDIMMDeviceFlushState *state = opaque;
 SpaprDrc *drc = spapr_drc_by_index(state->drcidx);
-PCDIMMDevice *dimm = PC_DIMM(drc->dev);
-HostMemoryBackend *backend = MEMORY_BACKEND(dimm->hostmem);
-int backend_fd = memory_region_get_fd(&backend->mr);
+PCDIMMDevice *dimm;
+HostMemoryBackend *backend;
+int backend_fd;
+
+if (!drc) {
+error_report("papr_scm: Could not find nvdimm device with DRC 0x%u",
+ state->drcidx);
+return H_HARDWARE;
+}
+
+dimm = PC_DIMM(drc->dev);
+backend = MEMORY_BACKEND(dimm->hostmem);
+backend_fd = memory_region_get_fd(&backend->mr);
 
 if (object_property_get_bool(OBJECT(backend), "pmem", NULL)) {
 MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem);
@@ -475,7 +485,15 @@ static void spapr_nvdimm_flush_completion_cb(void *opaque, 
int hcall_ret)
 {
 SpaprNVDIMMDeviceFlushState *state = opaque;
 SpaprDrc *drc = spapr_drc_by_index(state->drcidx);
-SpaprNVDIMMDevice *s_nvdimm = SPAPR_NVDIMM(drc->dev);
+SpaprNVDIMMDevice *s_nvdimm;
+
+if (!drc) {
+error_report("papr_scm: Could not find nvdimm device with DRC 0x%u",
+ state->drcidx);
+return;
+}
+
+s_nvdimm = SPAPR_NVDIMM(drc->dev);
 
 state->hcall_ret = hcall_ret;
 QLIST_REMOVE(state, node);
-- 
2.35.1




[PATCH for-7.1 0/1] Coverity fixes in hw/ppc/spapr_nvdimm.c

2022-04-05 Thread Daniel Henrique Barboza
Hi,

This is a simple patch to fix 2 Coverity issues in
hw/ppc/spapr_nvdimm.c. Aiming it at 7.1 because it's not critical enough
for 7.0.

Daniel Henrique Barboza (1):
  hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c

 hw/ppc/spapr_nvdimm.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

-- 
2.35.1




[PATCH v2 8/9] target/ppc: Implemented vector module word/doubleword

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vmodsw: Vector Modulo Signed Word
vmoduw: Vector Modulo Unsigned Word
vmodsd: Vector Modulo Signed Doubleword
vmodud: Vector Modulo Unsigned Doubleword

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/insn32.decode|  5 +
 target/ppc/translate/vmx-impl.c.inc | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 3eb920ac76..36b42e41d2 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -719,3 +719,8 @@ VDIVESD 000100 . . . 0001011@VX
 VDIVEUD 000100 . . . 01011001011@VX
 VDIVESQ 000100 . . . 0111011@VX
 VDIVEUQ 000100 . . . 0101011@VX
+
+VMODSW  000100 . . . 0001011@VX
+VMODUW  000100 . . . 11010001011@VX
+VMODSD  000100 . . . 1001011@VX
+VMODUD  000100 . . . 11011001011@VX
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index 23f215dbea..c5178a0f1e 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3340,6 +3340,11 @@ static void do_diveu_i32(TCGv_i32 t, TCGv_i32 a, 
TCGv_i32 b)
 DO_VDIV_VMOD(do_divesw, 32, do_dives_i32, true)
 DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false)
 
+DO_VDIV_VMOD(do_modsw, 32, tcg_gen_rem_i32, true)
+DO_VDIV_VMOD(do_moduw, 32, tcg_gen_remu_i32, false)
+DO_VDIV_VMOD(do_modsd, 64, tcg_gen_rem_i64, true)
+DO_VDIV_VMOD(do_modud, 64, tcg_gen_remu_i64, false)
+
 TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL)
 TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL)
 TRANS_FLAGS2(ISA310, VDIVESD, do_vx_helper, gen_helper_VDIVESD)
@@ -3347,6 +3352,11 @@ TRANS_FLAGS2(ISA310, VDIVEUD, do_vx_helper, 
gen_helper_VDIVEUD)
 TRANS_FLAGS2(ISA310, VDIVESQ, do_vx_helper, gen_helper_VDIVESQ)
 TRANS_FLAGS2(ISA310, VDIVEUQ, do_vx_helper, gen_helper_VDIVEUQ)
 
+TRANS_VDIV_VMOD(ISA310, VMODSW, MO_32, do_modsw , NULL)
+TRANS_VDIV_VMOD(ISA310, VMODUW, MO_32, do_moduw, NULL)
+TRANS_VDIV_VMOD(ISA310, VMODSD, MO_64, NULL, do_modsd)
+TRANS_VDIV_VMOD(ISA310, VMODUD, MO_64, NULL, do_modud)
+
 #undef DO_VDIV_VMOD
 
 #undef GEN_VR_LDX
-- 
2.31.1




[PATCH v2 9/9] target/ppc: Implemented vector module quadword

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vmodsq: Vector Modulo Signed Quadword
vmoduq: Vector Modulo Unsigned Quadword

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/helper.h |  2 ++
 target/ppc/insn32.decode|  2 ++
 target/ppc/int_helper.c | 21 +
 target/ppc/translate/vmx-impl.c.inc |  2 ++
 4 files changed, 27 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 67ecff2c9a..881e03959a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -177,6 +177,8 @@ DEF_HELPER_FLAGS_3(VDIVESD, TCG_CALL_NO_RWG, void, avr, 
avr, avr)
 DEF_HELPER_FLAGS_3(VDIVEUD, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VDIVESQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VDIVEUQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VMODSQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VMODUQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 36b42e41d2..b53efe1915 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -724,3 +724,5 @@ VMODSW  000100 . . . 0001011@VX
 VMODUW  000100 . . . 11010001011@VX
 VMODSD  000100 . . . 1001011@VX
 VMODUD  000100 . . . 11011001011@VX
+VMODSQ  000100 . . . 1111011@VX
+VMODUQ  000100 . . . 1101011@VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 17a10c4412..72b2b06078 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1121,6 +1121,27 @@ void helper_VDIVEUQ(ppc_avr_t *t, ppc_avr_t *a, 
ppc_avr_t *b)
 }
 }
 
+void helper_VMODSQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+Int128 neg1 = int128_makes64(-1);
+Int128 int128_min = int128_make128(0, INT64_MIN);
+if (likely(int128_nz(b->s128) &&
+  (int128_ne(a->s128, int128_min) || int128_ne(b->s128, neg1)))) {
+t->s128 = int128_rems(a->s128, b->s128);
+} else {
+t->s128 = int128_zero(); /* Undefined behavior */
+}
+}
+
+void helper_VMODUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+if (likely(int128_nz(b->s128))) {
+t->s128 = int128_remu(a->s128, b->s128);
+} else {
+t->s128 = int128_zero(); /* Undefined behavior */
+}
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 ppc_avr_t result;
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index c5178a0f1e..7ced7ad655 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3356,6 +3356,8 @@ TRANS_VDIV_VMOD(ISA310, VMODSW, MO_32, do_modsw , NULL)
 TRANS_VDIV_VMOD(ISA310, VMODUW, MO_32, do_moduw, NULL)
 TRANS_VDIV_VMOD(ISA310, VMODSD, MO_64, NULL, do_modsd)
 TRANS_VDIV_VMOD(ISA310, VMODUD, MO_64, NULL, do_modud)
+TRANS_FLAGS2(ISA310, VMODSQ, do_vx_helper, gen_helper_VMODSQ)
+TRANS_FLAGS2(ISA310, VMODUQ, do_vx_helper, gen_helper_VMODUQ)
 
 #undef DO_VDIV_VMOD
 
-- 
2.31.1




[PATCH v2 7/9] target/ppc: Implemented remaining vector divide extended

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vdivesd: Vector Divide Extended Signed Doubleword
vdiveud: Vector Divide Extended Unsigned Doubleword
vdivesq: Vector Divide Extended Signed Quadword
vdiveuq: Vector Divide Extended Unsigned Quadword

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/helper.h |  4 ++
 target/ppc/insn32.decode|  4 ++
 target/ppc/int_helper.c | 64 +
 target/ppc/translate/vmx-impl.c.inc |  4 ++
 4 files changed, 76 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 4cfdf7b3ec..67ecff2c9a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -173,6 +173,10 @@ DEF_HELPER_FLAGS_3(VMULOUH, TCG_CALL_NO_RWG, void, avr, 
avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUW, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VDIVSQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VDIVUQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVESD, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVEUD, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVESQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVEUQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 8c115c9c60..3eb920ac76 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -715,3 +715,7 @@ VDIVUQ  000100 . . . 0001011@VX
 
 VDIVESW 000100 . . . 01110001011@VX
 VDIVEUW 000100 . . . 01010001011@VX
+VDIVESD 000100 . . . 0001011@VX
+VDIVEUD 000100 . . . 01011001011@VX
+VDIVESQ 000100 . . . 0111011@VX
+VDIVEUQ 000100 . . . 0101011@VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ba5d4193ff..17a10c4412 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1057,6 +1057,70 @@ void helper_VDIVUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t 
*b)
 }
 }
 
+void helper_VDIVESD(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+int i;
+int64_t high;
+uint64_t low;
+for (i = 0; i < 2; i++) {
+high = a->s64[i];
+low = 0;
+if (unlikely((high == INT64_MIN && b->s64[i] == -1) || !b->s64[i])) {
+t->s64[i] = a->s64[i]; /* Undefined behavior */
+} else {
+divs128(&low, &high, b->s64[i]);
+t->s64[i] = low;
+}
+}
+}
+
+void helper_VDIVEUD(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+int i;
+uint64_t high, low;
+for (i = 0; i < 2; i++) {
+high = a->u64[i];
+low = 0;
+if (unlikely(!b->u64[i])) {
+t->u64[i] = a->u64[i]; /* Undefined behavior */
+} else {
+divu128(&low, &high, b->u64[i]);
+t->u64[i] = low;
+}
+}
+}
+
+void helper_VDIVESQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+Int128 high, low;
+Int128 int128_min = int128_make128(0, INT64_MIN);
+Int128 neg1 = int128_makes64(-1);
+
+high = a->s128;
+low = int128_zero();
+if (unlikely(!int128_nz(b->s128) ||
+ (int128_eq(b->s128, neg1) && int128_eq(high, int128_min)))) {
+t->s128 = a->s128; /* Undefined behavior */
+} else {
+divs256(&low, &high, b->s128);
+t->s128 = low;
+}
+}
+
+void helper_VDIVEUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+Int128 high, low;
+
+high = a->s128;
+low = int128_zero();
+if (unlikely(!int128_nz(b->s128))) {
+t->s128 = a->s128; /* Undefined behavior */
+} else {
+divu256(&low, &high, b->s128);
+t->s128 = low;
+}
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 ppc_avr_t result;
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 8799e945bd..23f215dbea 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3342,6 +3342,10 @@ DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false)
 
 TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL)
 TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL)
+TRANS_FLAGS2(ISA310, VDIVESD, do_vx_helper, gen_helper_VDIVESD)
+TRANS_FLAGS2(ISA310, VDIVEUD, do_vx_helper, gen_helper_VDIVEUD)
+TRANS_FLAGS2(ISA310, VDIVESQ, do_vx_helper, gen_helper_VDIVESQ)
+TRANS_FLAGS2(ISA310, VDIVEUQ, do_vx_helper, gen_helper_VDIVEUQ)
 
 #undef DO_VDIV_VMOD
 
-- 
2.31.1




[PATCH v2 4/9] target/ppc: Implemented vector divide extended word

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vdivesw: Vector Divide Extended Signed Word
vdiveuw: Vector Divide Extended Unsigned Word

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/insn32.decode|  3 ++
 target/ppc/translate/vmx-impl.c.inc | 48 +
 2 files changed, 51 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 3a88a0b5bc..8c115c9c60 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -712,3 +712,6 @@ VDIVSD  000100 . . . 00111001011@VX
 VDIVUD  000100 . . . 00011001011@VX
 VDIVSQ  000100 . . . 0011011@VX
 VDIVUQ  000100 . . . 0001011@VX
+
+VDIVESW 000100 . . . 01110001011@VX
+VDIVEUW 000100 . . . 01010001011@VX
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index bac0db7128..8799e945bd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3295,6 +3295,54 @@ TRANS_VDIV_VMOD(ISA310, VDIVUD, MO_64, NULL, do_divud)
 TRANS_FLAGS2(ISA310, VDIVSQ, do_vx_helper, gen_helper_VDIVSQ)
 TRANS_FLAGS2(ISA310, VDIVUQ, do_vx_helper, gen_helper_VDIVUQ)
 
+static void do_dives_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+TCGv_i64 val1, val2;
+
+val1 = tcg_temp_new_i64();
+val2 = tcg_temp_new_i64();
+
+tcg_gen_ext_i32_i64(val1, a);
+tcg_gen_ext_i32_i64(val2, b);
+
+/* (a << 32)/b */
+tcg_gen_shli_i64(val1, val1, 32);
+tcg_gen_div_i64(val1, val1, val2);
+
+/* if quotient doesn't fit in 32 bits the result is undefined */
+tcg_gen_extrl_i64_i32(t, val1);
+
+tcg_temp_free_i64(val1);
+tcg_temp_free_i64(val2);
+}
+
+static void do_diveu_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
+{
+TCGv_i64 val1, val2;
+
+val1 = tcg_temp_new_i64();
+val2 = tcg_temp_new_i64();
+
+tcg_gen_extu_i32_i64(val1, a);
+tcg_gen_extu_i32_i64(val2, b);
+
+/* (a << 32)/b */
+tcg_gen_shli_i64(val1, val1, 32);
+tcg_gen_divu_i64(val1, val1, val2);
+
+/* if quotient doesn't fit in 32 bits the result is undefined */
+tcg_gen_extrl_i64_i32(t, val1);
+
+tcg_temp_free_i64(val1);
+tcg_temp_free_i64(val2);
+}
+
+DO_VDIV_VMOD(do_divesw, 32, do_dives_i32, true)
+DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false)
+
+TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL)
+TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL)
+
 #undef DO_VDIV_VMOD
 
 #undef GEN_VR_LDX
-- 
2.31.1
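The extended-divide semantics in the patch above, where the dividend is the element value pre-shifted left by the element width, can be checked against a plain scalar C model. This is an illustrative sketch only; the name diveuw_model is invented and is not part of the patch:

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of vdiveuw on one element: returns the low 32 bits of
 * ((a << 32) / b). Per the ISA the result is undefined when b is zero
 * or when the quotient does not fit in 32 bits; mirror the helpers'
 * fallback of returning the dividend in the zero-divisor case. */
static uint32_t diveuw_model(uint32_t a, uint32_t b)
{
    if (b == 0) {
        return a; /* undefined per the ISA; avoid a host trap */
    }
    return (uint32_t)(((uint64_t)a << 32) / b);
}
```

The same shape applies per doubleword element for vdiveud, with divu128 supplying the 128-by-64 division.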




[PATCH v2 5/9] host-utils: Implemented unsigned 256-by-128 division

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Based on the already existing QEMU implementation, create an unsigned
256-bit by 128-bit division, needed to implement the vector divide
extended unsigned instruction from PowerISA v3.1.

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 include/qemu/host-utils.h |  15 +
 include/qemu/int128.h |  20 ++
 util/host-utils.c | 128 ++
 3 files changed, 163 insertions(+)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index ca979dc6cc..6da6a93f69 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -32,6 +32,7 @@
 
 #include "qemu/compiler.h"
 #include "qemu/bswap.h"
+#include "qemu/int128.h"
 
 #ifdef CONFIG_INT128
 static inline void mulu64(uint64_t *plow, uint64_t *phigh,
@@ -153,6 +154,19 @@ static inline int clo64(uint64_t val)
 return clz64(~val);
 }
 
+/*
+ * clz128 - count leading zeros in a 128-bit value.
+ * @val: The value to search
+ */
+static inline int clz128(Int128 a)
+{
+if (int128_gethi(a)) {
+return clz64(int128_gethi(a));
+} else {
+return clz64(int128_getlo(a)) + 64;
+}
+}
+
 /**
  * ctz32 - count trailing zeros in a 32-bit value.
  * @val: The value to search
@@ -849,4 +863,5 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
 #endif
 }
 
+Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor);
 #endif
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 3af01f38cd..2a9ee956aa 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -128,11 +128,21 @@ static inline bool int128_ge(Int128 a, Int128 b)
 return a >= b;
 }
 
+static inline bool int128_uge(Int128 a, Int128 b)
+{
+return ((__uint128_t)a) >= ((__uint128_t)b);
+}
+
 static inline bool int128_lt(Int128 a, Int128 b)
 {
 return a < b;
 }
 
+static inline bool int128_ult(Int128 a, Int128 b)
+{
+return (__uint128_t)a < (__uint128_t)b;
+}
+
 static inline bool int128_le(Int128 a, Int128 b)
 {
 return a <= b;
@@ -373,11 +383,21 @@ static inline bool int128_ge(Int128 a, Int128 b)
 return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
 }
 
+static inline bool int128_uge(Int128 a, Int128 b)
+{
+return (uint64_t)a.hi > (uint64_t)b.hi || (a.hi == b.hi && a.lo >= b.lo);
+}
+
 static inline bool int128_lt(Int128 a, Int128 b)
 {
 return !int128_ge(a, b);
 }
 
+static inline bool int128_ult(Int128 a, Int128 b)
+{
+return !int128_uge(a, b);
+}
+
 static inline bool int128_le(Int128 a, Int128 b)
 {
 return int128_ge(b, a);
diff --git a/util/host-utils.c b/util/host-utils.c
index bcc772b8ec..c6a01638c7 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -266,3 +266,131 @@ void ulshift(uint64_t *plow, uint64_t *phigh, int32_t shift, bool *overflow)
 *plow = *plow << shift;
 }
 }
+/*
+ * Unsigned 256-by-128 division.
+ * Returns the remainder via r.
+ * Returns lower 128 bit of quotient.
+ * Needs a normalized divisor (most significant bit set to 1).
+ *
+ * Adapted from include/qemu/host-utils.h udiv_qrnnd,
+ * from the GNU Multi Precision Library - longlong.h __udiv_qrnnd
+ * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
+ *
+ * Licensed under the GPLv2/LGPLv3
+ */
+static Int128 udiv256_qrnnd(Int128 *r, Int128 n1, Int128 n0, Int128 d)
+{
+Int128 d0, d1, q0, q1, r1, r0, m;
+uint64_t mp0, mp1;
+
+d0 = int128_make64(int128_getlo(d));
+d1 = int128_make64(int128_gethi(d));
+
+r1 = int128_remu(n1, d1);
+q1 = int128_divu(n1, d1);
+mp0 = int128_getlo(q1);
+mp1 = int128_gethi(q1);
+mulu128(&mp0, &mp1, int128_getlo(d0));
+m = int128_make128(mp0, mp1);
+r1 = int128_make128(int128_gethi(n0), int128_getlo(r1));
+if (int128_ult(r1, m)) {
+q1 = int128_sub(q1, int128_one());
+r1 = int128_add(r1, d);
+if (int128_uge(r1, d)) {
+if (int128_ult(r1, m)) {
+q1 = int128_sub(q1, int128_one());
+r1 = int128_add(r1, d);
+}
+}
+}
+r1 = int128_sub(r1, m);
+
+r0 = int128_remu(r1, d1);
+q0 = int128_divu(r1, d1);
+mp0 = int128_getlo(q0);
+mp1 = int128_gethi(q0);
+mulu128(&mp0, &mp1, int128_getlo(d0));
+m = int128_make128(mp0, mp1);
+r0 = int128_make128(int128_getlo(n0), int128_getlo(r0));
+if (int128_ult(r0, m)) {
+q0 = int128_sub(q0, int128_one());
+r0 = int128_add(r0, d);
+if (int128_uge(r0, d)) {
+if (int128_ult(r0, m)) {
+q0 = int128_sub(q0, int128_one());
+r0 = int128_add(r0, d);
+}
+}
+}
+r0 = int128_sub(r0, m);
+
+*r = r0;
+return int128_or(int128_lshift(q1, 64), q0);
+}
+
+/*
+ * Unsigned 256-by-128 division.
+ * Returns the remainder.
+ * Returns quotient via plow and phigh.
+ * Also returns the remainder via the function return value.
+ */
+Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor)
+{
+Int128 dhi = *phigh;
+Int128 dlo = *plow;
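The contract that divu256 implements (quotient returned through the low/high pointers, remainder as the return value) can be sanity-checked on the smaller 128-by-64 analogue using the compiler's 128-bit integer extension. This assumes GCC or Clang on a 64-bit host and is not the patch's code path:

```c
#include <stdint.h>
#include <assert.h>

/* 128-by-64 analogue of divu256's contract, using the compiler's
 * unsigned __int128 (GCC/Clang extension): the quotient goes back
 * through *plow/*phigh, the remainder is returned. */
static uint64_t divu128_demo(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
{
    unsigned __int128 n = ((unsigned __int128)*phigh << 64) | *plow;
    unsigned __int128 q = n / divisor;

    *plow = (uint64_t)q;
    *phigh = (uint64_t)(q >> 64);
    return (uint64_t)(n % divisor);
}
```

The patch cannot rely on a 256-bit compiler type, which is why udiv256_qrnnd does the same work limb by limb with a normalized divisor.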

[PATCH v2 2/9] target/ppc: Implemented vector divide instructions

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vdivsw: Vector Divide Signed Word
vdivuw: Vector Divide Unsigned Word
vdivsd: Vector Divide Signed Doubleword
vdivud: Vector Divide Unsigned Doubleword

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/insn32.decode|  7 
 target/ppc/translate/vmx-impl.c.inc | 59 +
 2 files changed, 66 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index ac2d3da9a7..597768558b 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -703,3 +703,10 @@ XVTLSBB 00 ... -- 00010 . 111011011 . - @XX2_bf_xb
 _s   s:uint8_t
 @XL_s   ..-- s:1 .. -   _s
 RFEBB   010011-- .   0010010010 -   @XL_s
+
+## Vector Division Instructions
+
+VDIVSW  000100 . . . 00110001011@VX
+VDIVUW  000100 . . . 00010001011@VX
+VDIVSD  000100 . . . 00111001011@VX
+VDIVUD  000100 . . . 00011001011@VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 6101bca3fd..be35d6fdf3 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3236,6 +3236,65 @@ TRANS(VMULHSD, do_vx_mulh, true , do_vx_vmulhd_i64)
 TRANS(VMULHUW, do_vx_mulh, false, do_vx_vmulhw_i64)
 TRANS(VMULHUD, do_vx_mulh, false, do_vx_vmulhd_i64)
 
+#define TRANS_VDIV_VMOD(FLAGS, NAME, VECE, FNI4_FUNC, FNI8_FUNC)\
+static bool trans_##NAME(DisasContext *ctx, arg_VX *a)  \
+{   \
+static const GVecGen3 op = {\
+.fni4 = FNI4_FUNC,  \
+.fni8 = FNI8_FUNC,  \
+.vece = VECE\
+};  \
+\
+REQUIRE_VECTOR(ctx);\
+REQUIRE_INSNS_FLAGS2(ctx, FLAGS);   \
+\
+tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra),\
+   avr_full_offset(a->vrb), 16, 16, &op);  \
+\
+return true;\
+}
+
+#define DO_VDIV_VMOD(NAME, SZ, DIV, SIGNED) \
+static void NAME(TCGv_i##SZ t, TCGv_i##SZ a, TCGv_i##SZ b)  \
+{   \
+/*  \
+ *  If N/0 the instruction used by the backend might deliver\
+ *  an invalid division signal to the process, so if b = 0 return   \
+ *  N/1 and if signed instruction, the same for a = int_min, b = -1 \
+ */ \
+if (SIGNED) {   \
+TCGv_i##SZ t0 = tcg_temp_new_i##SZ();   \
+TCGv_i##SZ t1 = tcg_temp_new_i##SZ();   \
+tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t0, a, INT##SZ##_MIN);  \
+tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t1, b, -1); \
+tcg_gen_and_i##SZ(t0, t0, t1);  \
+tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t1, b, 0);  \
+tcg_gen_or_i##SZ(t0, t0, t1);   \
+tcg_gen_movi_i##SZ(t1, 0);  \
+tcg_gen_movcond_i##SZ(TCG_COND_NE, b, t0, t1, t0, b);   \
+DIV(t, a, b);   \
+tcg_temp_free_i##SZ(t0);\
+tcg_temp_free_i##SZ(t1);\
+} else {\
+TCGv_i##SZ zero = tcg_constant_i##SZ(0);\
+TCGv_i##SZ one = tcg_constant_i##SZ(1); \
+tcg_gen_movcond_i##SZ(TCG_COND_EQ, b, b, zero, one, b); \
+DIV(t, a, b);   \
+}   \
+}
+
+DO_VDIV_VMOD(do_divsw, 32, tcg_gen_div_i32, true)
+DO_VDIV_VMOD(do_divuw, 32, tcg_gen_divu_i32, false)
+DO_VDIV_VMOD(do_divsd, 64, tcg_gen_div_i64, true)
+DO_VDIV_VMOD(do_divud, 64, tcg_gen_divu_i64, false)
+
+TRANS_VDIV_VMOD(ISA310, VDIVSW, MO_32, do_divsw, NULL)
+TRANS_VDIV_VMOD(ISA310, VDIVUW, MO_32, do_divuw, NULL)
+TRANS_VDIV_VMOD(ISA310, VDIVSD, MO_64, NULL, do_divsd)
+TRANS_VDIV_VMOD(ISA310, VDIVUD, MO_64, NULL, do_divud)
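The guard generated by DO_VDIV_VMOD exists because a host divide instruction may trap on the two cases the ISA leaves undefined. The same check written as a plain scalar function, for illustration only (the name vdivsw_model is invented):

```c
#include <stdint.h>
#include <limits.h>
#include <assert.h>

/* Scalar model of the trap-avoiding guard in DO_VDIV_VMOD: division
 * by zero and INT32_MIN / -1 are undefined per the ISA, so return the
 * dividend (equivalent to forcing the divisor to 1) instead of
 * letting the host divide trap. */
static int32_t vdivsw_model(int32_t a, int32_t b)
{
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        return a;
    }
    return a / b;
}
```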

[PATCH v2 3/9] target/ppc: Implemented vector divide quadword

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Implement the following PowerISA v3.1 instructions:
vdivsq: Vector Divide Signed Quadword
vdivuq: Vector Divide Unsigned Quadword

Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 target/ppc/helper.h |  2 ++
 target/ppc/insn32.decode|  2 ++
 target/ppc/int_helper.c | 21 +
 target/ppc/translate/vmx-impl.c.inc |  2 ++
 4 files changed, 27 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 57da11c77e..4cfdf7b3ec 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -171,6 +171,8 @@ DEF_HELPER_FLAGS_3(VMULOSW, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUB, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUH, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUW, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVSQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(VDIVUQ, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 597768558b..3a88a0b5bc 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -710,3 +710,5 @@ VDIVSW  000100 . . . 00110001011@VX
 VDIVUW  000100 . . . 00010001011@VX
 VDIVSD  000100 . . . 00111001011@VX
 VDIVUD  000100 . . . 00011001011@VX
+VDIVSQ  000100 . . . 0011011@VX
+VDIVUQ  000100 . . . 0001011@VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 492f34c499..ba5d4193ff 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1036,6 +1036,27 @@ void helper_XXPERMX(ppc_vsr_t *t, ppc_vsr_t *s0, ppc_vsr_t *s1, ppc_vsr_t *pcv,
 *t = tmp;
 }
 
+void helper_VDIVSQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+Int128 neg1 = int128_makes64(-1);
+Int128 int128_min = int128_make128(0, INT64_MIN);
+if (likely(int128_nz(b->s128) &&
+  (int128_ne(a->s128, int128_min) || int128_ne(b->s128, neg1)))) {
+t->s128 = int128_divs(a->s128, b->s128);
+} else {
+t->s128 = a->s128; /* Undefined behavior */
+}
+}
+
+void helper_VDIVUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+if (int128_nz(b->s128)) {
+t->s128 = int128_divu(a->s128, b->s128);
+} else {
+t->s128 = a->s128; /* Undefined behavior */
+}
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 ppc_avr_t result;
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index be35d6fdf3..bac0db7128 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3292,6 +3292,8 @@ TRANS_VDIV_VMOD(ISA310, VDIVSW, MO_32, do_divsw, NULL)
 TRANS_VDIV_VMOD(ISA310, VDIVUW, MO_32, do_divuw, NULL)
 TRANS_VDIV_VMOD(ISA310, VDIVSD, MO_64, NULL, do_divsd)
 TRANS_VDIV_VMOD(ISA310, VDIVUD, MO_64, NULL, do_divud)
+TRANS_FLAGS2(ISA310, VDIVSQ, do_vx_helper, gen_helper_VDIVSQ)
+TRANS_FLAGS2(ISA310, VDIVUQ, do_vx_helper, gen_helper_VDIVUQ)
 
 #undef DO_VDIV_VMOD
 
-- 
2.31.1




[PATCH v2 1/9] qemu/int128: add int128_urshift

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: Matheus Ferst 

Implement an unsigned right shift for Int128 values and add the same
test cases as int128_rshift to the unit tests.

Signed-off-by: Matheus Ferst 
Signed-off-by: Lucas Mateus Castro (alqotel) 
---
 include/qemu/int128.h| 19 +++
 tests/unit/test-int128.c | 32 
 2 files changed, 51 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 2c4064256c..3af01f38cd 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -83,6 +83,11 @@ static inline Int128 int128_rshift(Int128 a, int n)
 return a >> n;
 }
 
+static inline Int128 int128_urshift(Int128 a, int n)
+{
+return (__uint128_t)a >> n;
+}
+
 static inline Int128 int128_lshift(Int128 a, int n)
 {
 return a << n;
@@ -299,6 +304,20 @@ static inline Int128 int128_rshift(Int128 a, int n)
 }
 }
 
+static inline Int128 int128_urshift(Int128 a, int n)
+{
+uint64_t h = a.hi;
+if (!n) {
+return a;
+}
+h = h >> (n & 63);
+if (n >= 64) {
+return int128_make64(h);
+} else {
+return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h);
+}
+}
+
 static inline Int128 int128_lshift(Int128 a, int n)
 {
 uint64_t l = a.lo << (n & 63);
diff --git a/tests/unit/test-int128.c b/tests/unit/test-int128.c
index b86a3c76e6..ae0f552193 100644
--- a/tests/unit/test-int128.c
+++ b/tests/unit/test-int128.c
@@ -206,6 +206,37 @@ static void test_rshift(void)
 test_rshift_one(0xFFFE8000U,  0, 0xFFFEULL, 0x8000ULL);
 }
 
+static void __attribute__((__noinline__)) ATTRIBUTE_NOCLONE
+test_urshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
+{
+Int128 a = expand(x);
+Int128 r = int128_urshift(a, n);
+g_assert_cmpuint(int128_getlo(r), ==, l);
+g_assert_cmpuint(int128_gethi(r), ==, h);
+}
+
+static void test_urshift(void)
+{
+test_urshift_one(0x0001U, 64, 0xULL, 0x0001ULL);
+test_urshift_one(0x8001U, 64, 0xULL, 0x8001ULL);
+test_urshift_one(0x7FFEU, 64, 0xULL, 0x7FFEULL);
+test_urshift_one(0xFFFEU, 64, 0xULL, 0xFFFEULL);
+test_urshift_one(0x0001U, 60, 0xULL, 0x0010ULL);
+test_urshift_one(0x8001U, 60, 0x0008ULL, 0x0010ULL);
+test_urshift_one(0x00018000U, 60, 0xULL, 0x0018ULL);
+test_urshift_one(0x80018000U, 60, 0x0008ULL, 0x0018ULL);
+test_urshift_one(0x7FFEU, 60, 0x0007ULL, 0xFFE0ULL);
+test_urshift_one(0xFFFEU, 60, 0x000FULL, 0xFFE0ULL);
+test_urshift_one(0x7FFE8000U, 60, 0x0007ULL, 0xFFE8ULL);
+test_urshift_one(0xFFFE8000U, 60, 0x000FULL, 0xFFE8ULL);
+test_urshift_one(0x00018000U,  0, 0x0001ULL, 0x8000ULL);
+test_urshift_one(0x80018000U,  0, 0x8001ULL, 0x8000ULL);
+test_urshift_one(0x7FFEU,  0, 0x7FFEULL, 0xULL);
+test_urshift_one(0xFFFEU,  0, 0xFFFEULL, 0xULL);
+test_urshift_one(0x7FFE8000U,  0, 0x7FFEULL, 0x8000ULL);
+test_urshift_one(0xFFFE8000U,  0, 0xFFFEULL, 0x8000ULL);
+}
+
 int main(int argc, char **argv)
 {
 g_test_init(&argc, &argv, NULL);
@@ -219,5 +250,6 @@ int main(int argc, char **argv)
 g_test_add_func("/int128/int128_ge", test_ge);
 g_test_add_func("/int128/int128_gt", test_gt);
 g_test_add_func("/int128/int128_rshift", test_rshift);
+g_test_add_func("/int128/int128_urshift", test_urshift);
 return g_test_run();
 }
-- 
2.31.1
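The fallback (non-CONFIG_INT128) branch of int128_urshift above splits the shift into the cases n == 0, n >= 64, and 0 < n < 64. The same logic on a plain two-limb struct, purely for illustration (the struct and function names are invented):

```c
#include <stdint.h>
#include <assert.h>

typedef struct { uint64_t lo, hi; } u128;

/* Two-limb unsigned right shift for 0 <= n < 128, mirroring the
 * structure of the non-CONFIG_INT128 int128_urshift. */
static u128 urshift_model(u128 a, int n)
{
    u128 r;

    if (n == 0) {
        return a;
    }
    if (n >= 64) {
        /* Only bits of the high limb survive. */
        r.lo = a.hi >> (n & 63);
        r.hi = 0;
    } else {
        /* Low limb receives the bits shifted out of the high limb. */
        r.lo = (a.lo >> n) | (a.hi << (64 - n));
        r.hi = a.hi >> n;
    }
    return r;
}
```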




[PATCH v2 6/9] host-utils: Implemented signed 256-by-128 division

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

Based on the already existing QEMU implementation, create a signed
256-bit by 128-bit division, needed to implement the vector divide
extended signed quadword instruction from PowerISA 3.1.

Signed-off-by: Lucas Mateus Castro (alqotel) 
Reviewed-by: Richard Henderson 
---
 include/qemu/host-utils.h |  1 +
 util/host-utils.c | 51 +++
 2 files changed, 52 insertions(+)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 6da6a93f69..d0b444a40f 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -864,4 +864,5 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
 }
 
 Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor);
+Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor);
 #endif
diff --git a/util/host-utils.c b/util/host-utils.c
index c6a01638c7..d221657e43 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -394,3 +394,54 @@ Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor)
 return rem;
 }
 }
+
+/*
+ * Signed 256-by-128 division.
+ * Returns quotient via plow and phigh.
+ * Also returns the remainder via the function return value.
+ */
+Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor)
+{
+bool neg_quotient = false, neg_remainder = false;
+Int128 unsig_hi = *phigh, unsig_lo = *plow;
+Int128 rem;
+
+if (!int128_nonneg(*phigh)) {
+neg_quotient = !neg_quotient;
+neg_remainder = !neg_remainder;
+
+if (!int128_nz(unsig_lo)) {
+unsig_hi = int128_neg(unsig_hi);
+} else {
+unsig_hi = int128_not(unsig_hi);
+unsig_lo = int128_neg(unsig_lo);
+}
+}
+
+if (!int128_nonneg(divisor)) {
+neg_quotient = !neg_quotient;
+
+divisor = int128_neg(divisor);
+}
+
+rem = divu256(&unsig_lo, &unsig_hi, divisor);
+
+if (neg_quotient) {
+if (!int128_nz(unsig_lo)) {
+*phigh = int128_neg(unsig_hi);
+*plow = int128_zero();
+} else {
+*phigh = int128_not(unsig_hi);
+*plow = int128_neg(unsig_lo);
+}
+} else {
+*phigh = unsig_hi;
+*plow = unsig_lo;
+}
+
+if (neg_remainder) {
+return int128_neg(rem);
+} else {
+return rem;
+}
+}
-- 
2.31.1
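The sign-fixup pattern in divs256 (divide magnitudes unsigned, then negate quotient and remainder as needed) is the standard reduction of signed to unsigned division. A 64-bit scalar sketch of the same pattern, illustrative only, with an invented name:

```c
#include <stdint.h>
#include <assert.h>

/* Reduce signed division to unsigned division plus sign fixups,
 * mirroring the structure of divs256. Truncating semantics: the
 * quotient is negative iff the operand signs differ, and the
 * remainder takes the sign of the dividend. */
static int64_t divs_model(int64_t a, int64_t b, int64_t *rem)
{
    int neg_quotient = (a < 0) != (b < 0);
    int neg_remainder = (a < 0);
    uint64_t ua = a < 0 ? -(uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? -(uint64_t)b : (uint64_t)b;
    uint64_t q = ua / ub;
    uint64_t r = ua % ub;

    *rem = neg_remainder ? -(int64_t)r : (int64_t)r;
    return neg_quotient ? -(int64_t)q : (int64_t)q;
}
```

Doing the negations in unsigned arithmetic also covers the INT64_MIN magnitude without overflow, the same reason divs256 negates its limbs with int128_not/int128_neg.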




[PATCH v2 0/9] VDIV/VMOD Implementation

2022-04-05 Thread Lucas Mateus Castro(alqotel)
From: "Lucas Mateus Castro (alqotel)" 

This patch series is an implementation of the vector divide, vector
divide extended and vector modulo instructions from PowerISA 3.1

The first patch is Matheus's, used here since the divs256 and
divu256 functions use int128_urshift.

v2 changes:
- Dropped int128_lshift patch
- Added missing int_min/-1 check
- Changed invalid division to a division by 1
- Created new macro responsible for invalid division check
  (replacing DIV_VEC, REM_VEC and the check in dives_i32/diveu_i32)
- Turned GVecGen3 array into single element

Lucas Mateus Castro (alqotel) (8):
  target/ppc: Implemented vector divide instructions
  target/ppc: Implemented vector divide quadword
  target/ppc: Implemented vector divide extended word
  host-utils: Implemented unsigned 256-by-128 division
  host-utils: Implemented signed 256-by-128 division
  target/ppc: Implemented remaining vector divide extended
  target/ppc: Implemented vector modulo word/doubleword
  target/ppc: Implemented vector modulo quadword

Matheus Ferst (1):
  qemu/int128: add int128_urshift

 include/qemu/host-utils.h   |  16 +++
 include/qemu/int128.h   |  39 ++
 target/ppc/helper.h |   8 ++
 target/ppc/insn32.decode|  23 
 target/ppc/int_helper.c | 106 
 target/ppc/translate/vmx-impl.c.inc | 125 +++
 tests/unit/test-int128.c|  32 +
 util/host-utils.c   | 179 
 8 files changed, 528 insertions(+)

-- 
2.31.1




Re: [PATCH v2 2/4] target/ppc: init 'lpcr' in kvmppc_enable_cap_large_decr()

2022-04-05 Thread Daniel Henrique Barboza




On 4/1/22 00:40, David Gibson wrote:

On Thu, Mar 31, 2022 at 03:46:57PM -0300, Daniel Henrique Barboza wrote:



On 3/31/22 14:36, Richard Henderson wrote:

On 3/31/22 11:17, Daniel Henrique Barboza wrote:

Hmm... this is seeming a bit like whack-a-mole.  Could we instead use
one of the valgrind hinting mechanisms to inform it that
kvm_get_one_reg() writes the variable at *target?


I didn't find a way of doing that looking in the memcheck helpers
(https://valgrind.org/docs/manual/mc-manual.html section 4.7). That would be a
good way of solving this warning because we would put stuff inside a specific
function X and all callers of X would be covered by it.

What I did find instead is a memcheck macro called VALGRIND_MAKE_MEM_DEFINED 
that
tells Valgrind that the var was initialized.

This patch would then be something as follows:


diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index dc93b99189..b0e22fa283 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -56,6 +56,10 @@
   #define DEBUG_RETURN_GUEST 0
   #define DEBUG_RETURN_GDB   1

+#ifdef CONFIG_VALGRIND_H
+#include <valgrind/memcheck.h>
+#endif
+
   const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
   KVM_CAP_LAST_INFO
   };
@@ -2539,6 +2543,10 @@ int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
   CPUState *cs = CPU(cpu);
   uint64_t lpcr;

+#ifdef CONFIG_VALGRIND_H
+    VALGRIND_MAKE_MEM_DEFINED(&lpcr, sizeof(uint64_t));
+#endif
+
   kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
   /* Do we need to modify the LPCR? */


CONFIG_VALGRIND_H needs 'valgrind-devel´ installed.

I agree that this "Valgrind is complaining about variable initialization" is a 
whack-a-mole
situation that will keep happening in the future if we keep adding this same 
code pattern
(passing as reference an uninitialized var). For now, given that we have only 4 
instances
to fix it in ppc code (as far as I'm aware of), and we don't have a better way 
of telling
Valgrind that we know what we're doing, I think we're better of initializing 
these vars.


I would instead put this annotation inside kvm_get_one_reg, so that it covers 
all kvm hosts.  But it's too late to do this for 7.0.


I wasn't planning on pushing these changes for 7.0 since they aren't fixing mem
leaks or anything really bad. It's more of a quality of life improvement when
using Valgrind.

I also tried to put this annotation in kvm_get_one_reg() and it didn't solve the
warning.


That's weird, I'm pretty sure that should work.  I'd double check to
make sure you had all the parameters right (e.g. could you have marked
the pointer itself as initialized, rather than the memory it points
to).



You're right. I got confused with different setups here and there and thought 
that
it didn't work.

I sent a patch to kvm-all.c that tries to do that:


https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg00507.html


As for this series, for now I'm willing to take it since it improves the 
situation with
simple initializations. We can reconsider it if we make good progress through 
the common
code. At any rate these are 7.1 patches, so we have time.



Thanks,


Daniel






I didn't find a way of telling Valgrind "consider that every time this
function is called with parameter X it initializes X". That would be a good 
solution
to put in the common KVM files and fix the problem for everybody.


Daniel






r~








Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

2022-04-05 Thread Daniel Henrique Barboza




On 4/5/22 11:30, Peter Maydell wrote:

On Tue, 5 Apr 2022 at 14:07, Daniel Henrique Barboza
 wrote:


There are a lot of Valgrind warnings about conditional jumps depending on
uninitialized values, like this one (taken from a pSeries guest):

  Conditional jump or move depends on uninitialised value(s)
 at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544)
 by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
 by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
 by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
(...)
   Uninitialised value was created by a stack allocation
 at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538)

In this case, the alleged uninitialized value is the 'lpcr' variable that
is written by kvm_get_one_reg() and then used in an if clause:

int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
{
 CPUState *cs = CPU(cpu);
 uint64_t lpcr;

 kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
 /* Do we need to modify the LPCR? */
 if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here
(...)

A quick fix is to init the variable that kvm_get_one_reg() is going to
write ('lpcr' in the example above). Another idea is to convince
Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case
the ioctl() is successful. This will put some boilerplate in the
function but it will bring benefit for its other callers.


Doesn't Valgrind have a way of modelling ioctls where it
knows what data is read and written ? In general
ioctl-using programs don't need to have special case
"I am running under valgrind" handling, so this seems to
me like valgrind is missing support for this particular ioctl.


I don't know if Valgrind is capable of doing that. Guess it's worth a look.



More generally, how much use is running QEMU with KVM enabled
under valgrind anyway? Valgrind has no way of knowing about
writes to memory that the guest vCPUs do...


At least in the hosts I have access to, I wasn't able to get a pSeries guest
booting up to prompt with Valgrind + TCG. It was painfully slow. Valgrind + KVM
is slow but doable. Granted, vCPUs reads/writes can't be profiled with it when
using KVM, but for everything else is alright.


Thanks,


Daniel




thanks
-- PMM
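The annotation approach discussed in this thread can be prototyped outside QEMU. The sketch below compiles whether or not valgrind-devel is installed; VALGRIND_MAKE_MEM_DEFINED is the real memcheck client request, while the wrapper name and success convention are invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#if defined(__has_include)
# if __has_include(<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
/* No-op fallback when the valgrind headers are not available. */
# define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)0)
#endif

/* Hypothetical helper: after a successful get-register call, tell
 * memcheck that the target buffer is now initialized, so that every
 * caller of the wrapper is covered at once rather than annotating
 * each call site. */
static int mark_reg_read(int ret, void *target, size_t len)
{
    if (ret == 0) {
        VALGRIND_MAKE_MEM_DEFINED(target, len);
    }
    return ret;
}
```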




Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled

2022-04-05 Thread Si-Wei Liu




On 4/1/2022 7:00 PM, Jason Wang wrote:

On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu  wrote:



On 3/31/2022 1:36 AM, Jason Wang wrote:

On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu  wrote:


On 3/30/2022 2:14 AM, Jason Wang wrote:

On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu  wrote:

Previous commit prevents vhost-user and vhost-vdpa from using
userland vq handler via disable_ioeventfd_handler. The same
needs to be done for host notifier cleanup too, as the
virtio_queue_host_notifier_read handler still tends to read
pending event left behind on ioeventfd and attempts to handle
outstanding kicks from QEMU userland vq.

If vq handler is not disabled on cleanup, it may lead to sigsegv
with recursive virtio_net_set_status call on the control vq:

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557

I feel it's probably a bug elsewhere e.g when we fail to start
vhost-vDPA, it's the charge of the Qemu to poll host notifier and we
will fallback to the userspace vq handler.

Apologies, an incorrect stack trace was pasted which actually came from
patch #1. I will post a v2 with the corresponding one as below:

0  0x55f800df1780 in qdev_get_parent_bus (dev=0x0) at ../hw/core/qdev.c:376
1  0x55f800c68ad8 in virtio_bus_device_iommu_enabled (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331
2  0x55f800d70d7f in vhost_memory_unmap (dev=<optimized out>) at ../hw/virtio/vhost.c:318
3  0x55f800d70d7f in vhost_memory_unmap (dev=<optimized out>, buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at ../hw/virtio/vhost.c:336
4  0x55f800d71867 in vhost_virtqueue_stop (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590, vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241
5  0x55f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839
6  0x55f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315
7  0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:423
8  0x55f800d4e628 in virtio_net_set_status (status=<optimized out>, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
9  0x55f800d4e628 in virtio_net_set_status (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370

I don't understand why virtio_net_handle_ctrl() calls virtio_net_set_status()...

The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ
command, i.e. in virtio_net_handle_mq():

Completely forgot that the code was actually written by me :\


1413 n->curr_queue_pairs = queue_pairs;
1414 /* stop the backend before changing the number of queue_pairs
to avoid handling a
1415  * disabled queue */
1416 virtio_net_set_status(vdev, vdev->status);
1417 virtio_net_set_queue_pairs(n);

Note that before the vdpa multiqueue support, there was never a vhost_dev
for ctrl_vq exposed, i.e. there's no host notifier set up for the
ctrl_vq on vhost_kernel as it is emulated in QEMU software.


10 0x55f800d534d8 in virtio_net_handle_ctrl (iov_cnt=<optimized out>, iov=<optimized out>, cmd=0 '\000', n=0x55f8044ec590) at ../hw/net/virtio-net.c:1408
11 0x55f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590, vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452
12 0x55f800d69f37 in virtio_queue_host_notifier_read (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331
13 0x55f800d69f37 in virtio_queue_host_notifier_read (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575
14 0x55f800c688e6 in virtio_bus_cleanup_host_notifier (bus=<optimized out>, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312
15 0x55f800d73106 in vhost_dev_disable_notifiers (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590) at ../../../include/hw/virtio/virtio-bus.h:35
16 0x55f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316
17 0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:423
18 0x55f800d4e628 in virtio_net_set_status (status=<optimized out>, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296
19 0x55f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370
20 0x55f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590, val=<optimized out>) at ../hw/virtio/virtio.c:1945
21 0x55f800d11d9d in vm_state_notify 
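The loop shown in the trace above (ctrl-vq handler -> set_status -> vhost stop -> host-notifier cleanup -> ctrl-vq handler again) can be sketched with a re-entrancy guard. This is a hypothetical stand-in, not QEMU code: handle_ctrl_vq(), stop_backend() and cleanup_host_notifier() are invented names modeling virtio_net_handle_ctrl(), vhost_net_stop() and virtio_bus_cleanup_host_notifier() respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the re-entrancy in the backtrace above.
 * Without the guard, cleanup_host_notifier() would re-enter the
 * ctrl-vq handler while it is still stopping the backend. */
static bool in_ctrl_handler;
static int backend_stops;

static void handle_ctrl_vq(void);

static void cleanup_host_notifier(void)
{
    /* notifier cleanup processes pending requests, which re-runs the
     * ctrl-vq handler unless we guard against it */
    if (!in_ctrl_handler) {
        handle_ctrl_vq();
    }
}

static void stop_backend(void)
{
    backend_stops++;
    cleanup_host_notifier();   /* vhost_dev_disable_notifiers() in the trace */
}

static void handle_ctrl_vq(void)
{
    in_ctrl_handler = true;
    stop_backend();            /* virtio_net_set_status() -> vhost_net_stop() */
    in_ctrl_handler = false;
}
```

Whether a guard flag, a bottom half, or avoiding the set_status call from the handler is the right QEMU-level fix is exactly what this thread is discussing; the sketch only illustrates the shape of the cycle.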

[PATCH] acpi: Bodge acpi_index migration

2022-04-05 Thread Dr. David Alan Gilbert (git)




Re: [PULL 0/3] Misc changes for 2022-04-05

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 10:25, Paolo Bonzini  wrote:
>
> The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5:
>
>   Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into 
> staging (2022-04-04 15:48:55 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 776a6a32b4982a68d3b7a77cbfaae6c2b363a0b8:
>
>   docs/system/i386: Add measurement calculation details to 
> amd-memory-encryption (2022-04-05 10:42:06 +0200)
>
> 
> * fix vss-win32 compilation with clang++
>
> * update Coverity model
>
> * add measurement calculation to amd-memory-encryption docs
>
> 
> Dov Murik (1):
>   docs/system/i386: Add measurement calculation details to 
> amd-memory-encryption
>
> Helge Konetzka (1):
>   qga/vss-win32: fix compilation with clang++
>
> Paolo Bonzini (1):
>   coverity: update model for latest tools


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-05 Thread Sean Christopherson
On Tue, Apr 05, 2022, Andy Lutomirski wrote:
> On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> >> The best I can come up with is a special type of shared page that is not
> >> GUP-able and maybe not even mmappable, having a clear option for
> >> transitions to fail, and generally preventing the nasty cases from
> >> happening in the first place.
> >
> > Right, that sounds reasonable to me.
> 
> At least as a v1, this is probably more straightforward than allowing mmap().
> Also, there's much to be said for a simpler, limited API, to be expanded if
> genuinely needed, as opposed to starting out with a very featureful API.

Regarding "genuinely needed", IMO the same applies to supporting this at all.
Without numbers from something at least approximating a real use case, we're 
just
speculating on which will be the most performant approach.

> >> Maybe there could be a special mode for the private memory fds in which
> >> specific pages are marked as "managed by this fd but actually shared".
> >> pread() and pwrite() would work on those pages, but not mmap().  (Or maybe
> >> mmap() but the resulting mappings would not permit GUP.)  And
> >> transitioning them would be a special operation on the fd that is specific
> >> to pKVM and wouldn't work on TDX or SEV.
> >
> > Aha, didn't think of pread()/pwrite(). Very interesting.
> 
> There are plenty of use cases for which pread()/pwrite()/splice() will be as
> fast or even much faster than mmap()+memcpy().

...

> resume guest
> *** host -> hypervisor -> guest ***
> Guest unshares the page.
> *** guest -> hypervisor ***
> Hypervisor removes PTE.  TLBI.
> *** hypervisor -> guest ***
> 
> Obviously considerable cleverness is needed to make a virt IOMMU like this
> work well, but still.
> 
> Anyway, my suggestion is that the fd backing proposal get slightly modified
> to get it ready for multiple subtypes of backing object, which should be a
> pretty minimal change.  Then, if someone actually needs any of this
> cleverness, it can be added later.  In the mean time, the
> pread()/pwrite()/splice() scheme is pretty good.

Tangentially related to getting private-fd ready for multiple things, what about
implementing the pread()/pwrite()/splice() scheme in pKVM itself?  I.e. read() 
on
the VM fd, with the offset corresponding to gfn in some way.

Ditto for mmap() on the VM fd, though that would require additional changes 
outside
of pKVM.

That would allow pKVM to support in-place conversions without the private-fd 
having
to differentiate between the type of protected VM, and without having to provide
new APIs from the private-fd.  TDX, SNP, etc... Just Work by not supporting the 
pKVM
APIs.

And assuming we get multiple consumers down the road, pKVM will need to be able 
to
communicate the "true" state of a page to other consumers, because in addition 
to
being a consumer, pKVM is also an owner/enforcer analogous to the TDX Module and
the SEV PSP.



Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Emanuele Giuseppe Esposito



Am 05/04/2022 um 19:53 schrieb Emanuele Giuseppe Esposito:
> 
> 
> Am 05/04/2022 um 17:04 schrieb Kevin Wolf:
>> Am 05.04.2022 um 15:09 hat Emanuele Giuseppe Esposito geschrieben:
>>> Am 05/04/2022 um 12:14 schrieb Kevin Wolf:
 I think all of this is really relevant for Emanuele's work, which
 involves adding AIO_WAIT_WHILE() deep inside graph update functions. I
 fully expect that we would see very similar problems, and just stacking
 drain sections over drain sections that might happen to usually fix
 things, but aren't guaranteed to, doesn't look like a good solution.
>>>
>>> Yes, I think at this point we all agreed to drop subtree_drain as
>>> replacement for AioContext.
>>>
>>> The alternative is what Paolo proposed in the other thread " Removal of
>>> AioContext lock, bs->parents and ->children: proof of concept"
>>> I am not sure which thread you replied first :)
>>
>> This one, I think. :-)
>>
>>> I think that proposal is not far from your idea, and it avoids
>>> introducing or even using drains at all.
>>> Not sure why you called it a "step backwards even from AioContext locks".
>>
>> I was only referring to the lock locality there. AioContext locks are
>> really coarse, but still a finer granularity than a single global lock.
>>
>> In the big picture, it's still better than the AioContext lock, but
>> that's because it's a different type of lock, not because it has better
>> locality.
>>
>> So I was just wondering if we can't have the different type of lock and
>> make it local to the BDS, too.
> 
> I guess this is the right time to discuss this.
> 
> I think that a global lock will be easier to handle, and we already have
> a concrete implementation (cpus-common).
> 
> I think that the reads in some sense are already BDS-specific, because
> each BDS that is reading has an internal flag.
> Writes, on the other hand, are global. If a write is happening, no other
> read at all can run, even if it has nothing to do with it.
> 
> The question then is: how difficult would be to implement a BDS-specific
> write?
> From the API perspective, change
> bdrv_graph_wrlock(void);
> into
> bdrv_graph_wrlock(BlockDriverState *parent, BlockDriverState *child);
> I am not sure if/how complicated it will be. For sure all the global
> variables would end up in the BDS struct.
> 
> On the other side, also making instead read generic could be interesting.
> Think about drain: it is a recursive function, and it doesn't really
> make sense to take the rdlock for each node it traverses.

Otherwise, a simple solution for drains that requires no change at all is
to just take the rdlock on the bs calling drain, and since each write
waits for all reads to complete, it will work anyways.

The only detail is that assert_bdrv_graph_readable() will then need to
iterate through all nodes to be sure that at least one of them is
actually reading.

So yeah I know this might be hard to realize without an implementation,
but my conclusion is to leave the lock as it is for now.

> Even though I don't know an easy way to replace ->has_waiter and
> ->reading_graph flags...
> 
> Emanuele
> 




Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu

2022-04-05 Thread John Snow
On Tue, Apr 5, 2022, 5:03 AM Damien Hedde 
wrote:

>
>
> On 4/4/22 22:34, John Snow wrote:
> > On Wed, Mar 16, 2022 at 5:55 AM Damien Hedde 
> wrote:
> >>
> >> It takes an input file containing raw qmp commands (concatenated json
> >> dicts) and sends all commands one by one to a qmp server. When one
> >> command fails, it exits.
> >>
> >> As a convenience, it can also wrap the qemu process to avoid having
> >> to start qemu in background. When wrapping qemu, the program returns
> >> only when the qemu process terminates.
> >>
> >> Signed-off-by: Damien Hedde 
> >> ---
> >>
> >> Hi all,
> >>
> >> Following our discussion, I've started this. What do you think ?
> >>
> >> I tried to follow Daniel's qmp-shell-wrap. I think it is
> >> better to have similar options (eg: logging). There is also room
> >> for factorizing code if we want to keep them aligned and ease
> >> maintenance.
> >>
> >> There are still some pylint issues (too many branches in main and it
> >> does not like my context manager if-else line). But it's kind of a
> >> mess to fix these, so I think it's enough for a first version.
> >
> > Yeah, don't worry about these. You can just tell pylint to shut up
> > while you prototype. Sometimes it's just not worth spending more time
> > on a more beautiful factoring. Oh well.
> >
> >>
> >> I name that qmp-send as Daniel proposed, maybe qmp-test matches better
> >> what I'm doing there ?
> >>
> >
> > I think I agree with Dan's response.
> >
> >> Thanks,
> >> Damien
> >> ---
> >>   python/qemu/aqmp/qmp_send.py | 229 +++
> >
> > I recommend putting this in qemu/util/qmp_send.py instead.
> >
> > I'm in the process of pulling out the AQMP lib and hosting it
> > separately. Scripts like this I think should stay in the QEMU tree, so
> > moving it to util instead is probably best. Otherwise, I'll *really*
> > have to commit to the syntax, and that's probably a bigger hurdle than
> > you want to deal with.
>
> If it stays in QEMU tree, what licensing should I use ? LGPL does not
> hurt, no ?
>

Whichever you please. GPLv2+ would be convenient and harmonizes well with
other tools. LGPL is only something I started doing so that the "qemu.qmp"
package would be LGPL. Licensing the tools as LGPL was just a sin of
convenience so I could claim a single license for the whole wheel/egg/tgz.

(I didn't want to make separate qmp and qmp-tools packages.)

Go with what you feel is best.


> >
> >>   scripts/qmp/qmp-send |  11 ++
> >>   2 files changed, 240 insertions(+)
> >>   create mode 100644 python/qemu/aqmp/qmp_send.py
> >>   create mode 100755 scripts/qmp/qmp-send
> >>
> >> diff --git a/python/qemu/aqmp/qmp_send.py b/python/qemu/aqmp/qmp_send.py
> >> new file mode 100644
> >> index 00..cbca1d0205
> >> --- /dev/null
> >> +++ b/python/qemu/aqmp/qmp_send.py
> >
> > Seems broadly fine to me, but I didn't review closely this time. If it
> > works for you, it works for me.
> >
> > As for making QEMU hang: there's a few things you could do, take a
> > look at iotests and see how they handle timeout blocks in synchronous
> > code -- iotests.py line 696 or so, "class Timeout". When writing async
> > code, you can also do stuff like this:
> >
> > async def foo():
> >  await asyncio.wait_for(qmp.execute("some-command", args_etc),
> timeout=30)
> >
> > See https://docs.python.org/3/library/asyncio-task.html#asyncio.wait_for
> >
> > --js
> >
>
> Thanks for the tip,
> --
> Damien
>

Oh, and one more. the legacy.py bindings for AQMP also support a
configurable timeout that applies to most API calls by default.

see https://gitlab.com/jsnow/qemu.qmp/-/blob/main/qemu/qmp/legacy.py#L285

(Branch still in limbo here, but it should still be close to the same in
qemu.git)

I believe this is used by iotests.py when it sets up its machine.py
subclass ("VM", iirc) so that most qmp invocations in iotests have a
default timeout and won't hang tests indefinitely.
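The synchronous Timeout block mentioned for iotests is roughly the following pattern — a sketch assuming POSIX SIGALRM and main-thread use only; the real iotests.py class differs in detail:

```python
import signal


class Timeout:
    """Raise TimeoutError if the with-block runs longer than `seconds`.

    Rough sketch of the SIGALRM-based pattern (POSIX-only, main thread
    only); not the exact iotests.py implementation.
    """
    def __init__(self, seconds, errmsg="Timeout"):
        self.seconds = seconds
        self.errmsg = errmsg

    def _timed_out(self, signum, frame):
        raise TimeoutError(self.errmsg)

    def __enter__(self):
        self._old = signal.signal(signal.SIGALRM, self._timed_out)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel the pending alarm
        signal.signal(signal.SIGALRM, self._old)  # restore previous handler
        return False                              # don't swallow the exception
```

Wrapping each blocking QMP call in such a block (or using asyncio.wait_for in async code, as quoted above) is what keeps a hung QEMU from hanging the whole test run.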

--js

>


Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-05 Thread Sean Christopherson
On Tue, Apr 05, 2022, Quentin Perret wrote:
> On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> > >>  - it can be very useful for protected VMs to do shared=>private
> > >>conversions. Think of a VM receiving some data from the host in a
> > >>shared buffer, and then it wants to operate on that buffer without
> > >>risking to leak confidential information in a transient state. In
> > >>that case the most logical thing to do is to convert the buffer back
> > >>to private, do whatever needs to be done on that buffer (decrypting a
> > >>frame, ...), and then share it back with the host to consume it;
> > >
> > > If performance is a motivation, why would the guest want to do two
> > > conversions instead of just doing internal memcpy() to/from a private
> > > page?  I would be quite surprised if multiple exits and TLB shootdowns is
> > > actually faster, especially at any kind of scale where zapping stage-2
> > > PTEs will cause lock contention and IPIs.
> > 
> > I don't know the numbers or all the details, but this is arm64, which is a
> > rather better architecture than x86 in this regard.  So maybe it's not so
> > bad, at least in very simple cases, ignoring all implementation details.
> > (But see below.)  Also the systems in question tend to have fewer CPUs than
> > some of the massive x86 systems out there.
> 
> Yep. I can try and do some measurements if that's really necessary, but
> I'm really convinced the cost of the TLBI for the shared->private
> conversion is going to be significantly smaller than the cost of memcpy
> the buffer twice in the guest for us.

It's not just the TLB shootdown, the VM-Exits aren't free.   And barring 
non-trivial
improvements to KVM's MMU, e.g. sharding of mmu_lock, modifying the page tables 
will
block all other updates and MMU operations.  Taking mmu_lock for read, should 
arm64
ever convert to a rwlock, is not an option because KVM needs to block other
conversions to avoid races.

Hmm, though batching multiple pages into a single request would mitigate most of
the overhead.

> There are variations of that idea: e.g. allow userspace to mmap the
> entire private fd but w/o taking a reference on pages mapped with
> PROT_NONE. And then the VMM can use mprotect() in response to
> share/unshare requests. I think Marc liked that idea as it keeps the
> userspace API closer to normal KVM -- there actually is a
> straightforward gpa->hva relation. Not sure how much that would impact
> the implementation at this point.
> 
> For the shared=>private conversion, this would be something like so:
> 
>  - the guest issues a hypercall to unshare a page;
> 
>  - the hypervisor forwards the request to the host;
> 
>  - the host kernel forwards the request to userspace;
> 
>  - userspace then munmap()s the shared page;
> 
>  - KVM then tries to take a reference to the page. If it succeeds, it
>re-enters the guest with a flag of some sort saying that the share
>succeeded, and the hypervisor will adjust pgtables accordingly. If
>KVM failed to take a reference, it flags this and the hypervisor will
>be responsible for communicating that back to the guest. This means
>the guest must handle failures (possibly fatal).
> 
> (There are probably many ways in which we can optimize this, e.g. by
> having the host proactively munmap() pages it no longer needs so that
> the unshare hypercall from the guest doesn't need to exit all the way
> back to host userspace.)

...

> > Maybe there could be a special mode for the private memory fds in which
> > specific pages are marked as "managed by this fd but actually shared".
> > pread() and pwrite() would work on those pages, but not mmap().  (Or maybe
> > mmap() but the resulting mappings would not permit GUP.)

Unless I misunderstand what you intend by pread()/pwrite(), I think we'd need to
allow mmap(), otherwise e.g. uaccess from the kernel wouldn't work.

> > And transitioning them would be a special operation on the fd that is
> > specific to pKVM and wouldn't work on TDX or SEV.

To keep things feature agnostic (IMO, baking TDX vs SEV vs pKVM info into 
private-fd
is a really bad idea), this could be handled by adding a flag and/or callback 
into
the notifier/client stating whether or not it supports mapping a private-fd, 
and then
mapping would be allowed if and only if all consumers support/allow mapping.

> > Hmm.  Sean and Chao, are we making a bit of a mistake by making these fds
> > technology-agnostic?  That is, would we want to distinguish between a TDX
> > backing fd, a SEV backing fd, a software-based backing fd, etc?  API-wise
> > this could work by requiring the fd to be bound to a KVM VM instance and
> > possibly even configured a bit before any other operations would be
> > allowed.

I really don't want to distinguish between each exact feature, but I've
no objection to adding flags/callbacks to track specific properties of the
downstream consumers, e.g. "can this 

Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Emanuele Giuseppe Esposito



Am 05/04/2022 um 17:04 schrieb Kevin Wolf:
> Am 05.04.2022 um 15:09 hat Emanuele Giuseppe Esposito geschrieben:
>> Am 05/04/2022 um 12:14 schrieb Kevin Wolf:
>>> I think all of this is really relevant for Emanuele's work, which
>>> involves adding AIO_WAIT_WHILE() deep inside graph update functions. I
>>> fully expect that we would see very similar problems, and just stacking
>>> drain sections over drain sections that might happen to usually fix
>>> things, but aren't guaranteed to, doesn't look like a good solution.
>>
>> Yes, I think at this point we all agreed to drop subtree_drain as
>> replacement for AioContext.
>>
>> The alternative is what Paolo proposed in the other thread " Removal of
>> AioContext lock, bs->parents and ->children: proof of concept"
>> I am not sure which thread you replied first :)
> 
> This one, I think. :-)
> 
>> I think that proposal is not far from your idea, and it avoids
>> introducing or even using drains at all.
>> Not sure why you called it a "step backwards even from AioContext locks".
> 
> I was only referring to the lock locality there. AioContext locks are
> really coarse, but still a finer granularity than a single global lock.
> 
> In the big picture, it's still better than the AioContext lock, but
> that's because it's a different type of lock, not because it has better
> locality.
> 
> So I was just wondering if we can't have the different type of lock and
> make it local to the BDS, too.

I guess this is the right time to discuss this.

I think that a global lock will be easier to handle, and we already have
a concrete implementation (cpus-common).

I think that the reads in some sense are already BDS-specific, because
each BDS that is reading has an internal flag.
Writes, on the other hand, are global. If a write is happening, no other
read at all can run, even if it has nothing to do with it.

The question then is: how difficult would be to implement a BDS-specific
write?
From the API perspective, change
bdrv_graph_wrlock(void);
into
bdrv_graph_wrlock(BlockDriverState *parent, BlockDriverState *child);
I am not sure if/how complicated it will be. For sure all the global
variables would end up in the BDS struct.

On the other side, also making instead read generic could be interesting.
Think about drain: it is a recursive function, and it doesn't really
make sense to take the rdlock for each node it traverses.
Even though I don't know an easy way to replace ->has_waiter and
->reading_graph flags...

Emanuele
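The semantics under discussion — many per-node readers, a global writer that waits for all reads and blocks new ones — can be sketched minimally as follows. Hypothetical names modeled on bdrv_graph_rdlock()/wrlock(); a single-threaded illustration only, since the real implementation needs coroutine yielding plus the has_waiter/reading_graph machinery:

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded sketch of the global graph-lock semantics (hypothetical,
 * not QEMU's API): *_try() returns false where the real lock would wait. */
static int readers;    /* how many nodes are currently reading the graph */
static bool writer;    /* a graph modification is in progress */

static bool graph_rdlock_try(void)
{
    if (writer) {
        return false;              /* reads wait while a write is running */
    }
    readers++;
    return true;
}

static void graph_rdunlock(void)
{
    assert(readers > 0);
    readers--;
}

static bool graph_wrlock_try(void)
{
    if (readers > 0 || writer) {
        return false;              /* a write waits for all reads to finish */
    }
    writer = true;
    return true;
}

static void graph_wrunlock(void)
{
    assert(writer);
    writer = false;
}
```

A BDS-local variant would move `readers`/`writer` into BlockDriverState and scope the write to a (parent, child) edge, at the cost of the cross-node bookkeeping discussed above.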




Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-05 Thread Andy Lutomirski



On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
>> 
>> 
>> On Mon, Apr 4, 2022, at 10:06 AM, Sean Christopherson wrote:
>> > On Mon, Apr 04, 2022, Quentin Perret wrote:
>> >> On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
>> >> FWIW, there are a couple of reasons why I'd like to have in-place
>> >> conversions:
>> >> 
>> >>  - one goal of pKVM is to migrate some things away from the Arm
>> >>Trustzone environment (e.g. DRM and the likes) and into protected VMs
>> >>instead. This will give Linux a fighting chance to defend itself
>> >>against these things -- they currently have access to _all_ memory.
>> >>And transitioning pages between Linux and Trustzone (donations and
>> >>shares) is fast and non-destructive, so we really do not want pKVM to
>> >>regress by requiring the hypervisor to memcpy things;
>> >
>> > Is there actually a _need_ for the conversion to be non-destructive?  
>> > E.g. I assume
>> > the "trusted" side of things will need to be reworked to run as a pKVM 
>> > guest, at
>> > which point reworking its logic to understand that conversions are 
>> > destructive and
>> > slow-ish doesn't seem too onerous.
>> >
>> >>  - it can be very useful for protected VMs to do shared=>private
>> >>conversions. Think of a VM receiving some data from the host in a
>> >>shared buffer, and then it wants to operate on that buffer without
>> >>risking to leak confidential informations in a transient state. In
>> >>that case the most logical thing to do is to convert the buffer back
>> >>to private, do whatever needs to be done on that buffer (decrypting a
>> >>frame, ...), and then share it back with the host to consume it;
>> >
>> > If performance is a motivation, why would the guest want to do two 
>> > conversions
>> > instead of just doing internal memcpy() to/from a private page?  I 
>> > would be quite
>> > surprised if multiple exits and TLB shootdowns is actually faster, 
>> > especially at
>> > any kind of scale where zapping stage-2 PTEs will cause lock contention 
>> > and IPIs.
>> 
>> I don't know the numbers or all the details, but this is arm64, which is a 
>> rather better architecture than x86 in this regard.  So maybe it's not so 
>> bad, at least in very simple cases, ignoring all implementation details.  
>> (But see below.)  Also the systems in question tend to have fewer CPUs than 
>> some of the massive x86 systems out there.
>
> Yep. I can try and do some measurements if that's really necessary, but
> I'm really convinced the cost of the TLBI for the shared->private
> conversion is going to be significantly smaller than the cost of memcpy
> the buffer twice in the guest for us. To be fair, although the cost for
> the CPU update is going to be low, the cost for IOMMU updates _might_ be
> higher, but that very much depends on the hardware. On systems that use
> e.g. the Arm SMMU, the IOMMUs can use the CPU page-tables directly, and
> the iotlb invalidation is done on the back of the CPU invalidation. So,
> on systems with sane hardware the overhead is *really* quite small.
>
> Also, memcpy requires double the memory, it is pretty bad for power, and
> it causes memory traffic which can't be a good thing for things running
> concurrently.
>
>> If we actually wanted to support transitioning the same page between shared 
>> and private, though, we have a bit of an awkward situation.  Private to 
>> shared is conceptually easy -- do some bookkeeping, reconstitute the direct 
>> map entry, and it's done.  The other direction is a mess: all existing uses 
>> of the page need to be torn down.  If the page has been recently used for 
>> DMA, this includes IOMMU entries.
>>
>> Quentin: let's ignore any API issues for now.  Do you have a concept of how 
>> a nondestructive shared -> private transition could work well, even in 
>> principle?
>
> I had a high level idea for the workflow, but I haven't looked into the
> implementation details.
>
> The idea would be to allow KVM *or* userspace to take a reference
> to a page in the fd in an exclusive manner. KVM could take a reference
> on a page (which would be necessary before to donating it to a guest)
> using some kind of memfile_notifier as proposed in this series, and
> userspace could do the same some other way (mmap presumably?). In both
> cases, the operation might fail.
>
> I would imagine the boot and private->shared flow as follow:
>
>  - the VMM uses fallocate on the private fd, and associates the <offset, size> with a memslot;
>
>  - the guest boots, and as part of that KVM takes references to all the
>pages that are donated to the guest. If userspace happens to have a
>mapping to a page, KVM will fail to take the reference, which would
>be fatal for the guest.
>
>  - once the guest has booted, it issues a hypercall to share a page back
>with the host;
>
>  - KVM is notified, 

Re: [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2

2022-04-05 Thread David Miller
Recommendation for comment?

/* vri-d encoding matches vrr for 4b imm.
  .insn does not handle this encoding variant.
*/

Christian: I will push another patch version as soon as that's decided.
(unless you prefer to choose the comment and edit during staging)

On Tue, Apr 5, 2022 at 6:13 AM David Hildenbrand  wrote:
>
> On 01.04.22 17:25, Christian Borntraeger wrote:
> > Am 01.04.22 um 17:02 schrieb David Miller:
> >> vrr is almost a perfect match (it is for this, larger than imm4 would
> >> need to be split).
> >>
> >> .long : this would be uglier.
> >> use enough to be filled with nops after ?
> >> or use a 32b and 16b instead if it's in .text it should make no difference.
> >
> > I will let Richard or David decide what they prefer.
> >
>
> I don't particularly care as long as there is a comment stating why we
> need this hack.
>
> --
> Thanks,
>
> David / dhildenb
>



Re: [PATCH v5 0/9] Add support for AST1030 SoC

2022-04-05 Thread Cédric Le Goater

Hello Jamin,

On 4/1/22 10:38, Jamin Lin wrote:

Changes from v5:
- remove TYPE_ASPEED_MINIBMC_MACHINE and ASPEED_MINIBMC_MACHINE
- remove ast1030_machine_instance_init function

Changes from v4:
- drop the ASPEED_SMC_FEATURE_WDT_CONTROL flag in hw/ssi/aspeed_smc.c

Changes from v3:
- remove AspeedMiniBmcMachineState state structure and
   AspeedMiniBmcMachineClass class
- remove redundant new line in hw/arm/aspeed_ast10xx.c


Do we want to be in sync with the zephyr naming and use ast10x0.c ?

   https://github.com/zephyrproject-rtos/zephyr/tree/main/soc/arm/aspeed

This is just a question. Don't resend for this.

Thanks,

C.



Re: [PATCH v3 3/3] qcow2: Add errp to rebuild_refcount_structure()

2022-04-05 Thread Eric Blake
On Tue, Apr 05, 2022 at 03:46:52PM +0200, Hanna Reitz wrote:
> Instead of fprintf()-ing error messages in rebuild_refcount_structure()
> and its rebuild_refcounts_write_refblocks() helper, pass them through an
> Error object to qcow2_check_refcounts() (which will then print it).
> 
> Suggested-by: Eric Blake 
> Signed-off-by: Hanna Reitz 
> ---
>  block/qcow2-refcount.c | 33 +++--
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index c5669eaa51..ed0ecfaa89 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -2465,7 +2465,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>  static int rebuild_refcounts_write_refblocks(
>  BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters,
>  int64_t first_cluster, int64_t end_cluster,
> -uint64_t **on_disk_reftable_ptr, uint32_t 
> *on_disk_reftable_entries_ptr
> +uint64_t **on_disk_reftable_ptr, uint32_t 
> *on_disk_reftable_entries_ptr,
> +Error **errp
>  )
>  {
>  BDRVQcow2State *s = bs->opaque;
> @@ -2516,8 +2517,8 @@ static int rebuild_refcounts_write_refblocks(
>nb_clusters,
>_free_cluster);
>  if (refblock_offset < 0) {
> -fprintf(stderr, "ERROR allocating refblock: %s\n",
> -strerror(-refblock_offset));
> +error_setg_errno(errp, -refblock_offset,
> + "ERROR allocating refblock");

Most uses of error_setg* don't ALL_CAPS the first word.  But this is
pre-existing, so I'm not insisting you change it here.

>  return refblock_offset;
>  }
>  
> @@ -2539,6 +2540,7 @@ static int rebuild_refcounts_write_refblocks(
>on_disk_reftable_entries *
>REFTABLE_ENTRY_SIZE);
>  if (!on_disk_reftable) {
> +error_setg(errp, "ERROR allocating reftable memory");
>  return -ENOMEM;

Ah, so this is also a corner case bug fix, where we didn't have a
message on all error paths.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
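The conversion the patch makes — leaf functions filling an Error object instead of printing to stderr, the caller deciding how to report — can be sketched with a minimal stand-in for QEMU's Error API. Hypothetical code: the real error_setg_errno() lives in qapi/error.h and is GLib-based; only the calling convention is illustrated here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for QEMU's Error object (not the real API). */
typedef struct Error {
    char msg[256];
} Error;

static void error_setg_errno_sketch(Error **errp, int os_error, const char *msg)
{
    if (!errp) {
        return;                 /* caller opted out of error details */
    }
    *errp = malloc(sizeof(Error));
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             msg, strerror(os_error));
}

/* Leaf function in the style of rebuild_refcounts_write_refblocks():
 * it no longer prints; it fills *errp and returns a negative errno. */
static int rebuild_refblocks_sketch(Error **errp)
{
    /* pretend the refblock allocation failed with ENOSPC */
    error_setg_errno_sketch(errp, ENOSPC, "ERROR allocating refblock");
    return -ENOSPC;
}
```

The corner-case fix Eric points out falls out naturally: once every failing return also sets *errp, no path can fail silently.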




Re: [qemu.qmp PATCH 10/13] docs: add versioning policy to README

2022-04-05 Thread John Snow
On Tue, Apr 5, 2022, 5:16 AM Damien Hedde 
wrote:

>
>
> On 3/30/22 20:24, John Snow wrote:
> > The package is in an alpha state, but there's a method to the madness.
> >
> > Signed-off-by: John Snow 
> > ---
> >   README.rst | 21 +
> >   1 file changed, 21 insertions(+)
> >
> > diff --git a/README.rst b/README.rst
> > index 8593259..88efe84 100644
> > --- a/README.rst
> > +++ b/README.rst
> > @@ -154,6 +154,27 @@ fail. These checks use their own virtual
> environments and won't pollute
> >   your working space.
> >
> >
> > +Stability and Versioning
> > +
> > +
> > +This package uses a major.minor.micro SemVer versioning, with the
> > +following additional semantics during the alpha/beta period (Major
> > +version 0):
> > +
> > +This package treats 0.0.z versions as "alpha" versions. Each micro
> > +version update may change the API incompatibly. Early users are advised
> > +to pin against explicit versions, but check for updates often.
> > +
> > +A planned 0.1.z version will introduce the first "beta", whereafter each
> > +micro update will be backwards compatible, but each minor update will
> > +not be. The first beta version will be released after legacy.py is
> > +removed, and the API is tentatively "stable".
> > +
> > +Thereafter, normal SemVer/PEP440 rules will apply; micro updates will
> > +always be bugfixes, and minor updates will be reserved for backwards
> > +compatible feature changes.
> > +
> > +
> >   Changelog
> >   -
> >
>
> Looks reasonable to me.
> Reviewed-by: Damien Hedde 
>

Thanks! I'm hoping to make it easier to spin up more dev tooling outside of
the qemu tree. If you've got any wishlist items, feel free to let me know.

It's still early days for Python packages outside of the qemu tree, so
nearly everything is on the table still.

(the jsnow/python staging branch has some 17 patches in it that will be
checked in to QEMU when development re-opens. The forked qemu.qmp repo will
be based off of qemu.git after those patches go in. There's a bit of
shakeup where I delete the old qmp lib and replace it with what's currently
aqmp. It should hopefully not be a huge nuisance to your work, but if
there's issues, let me know.)

Thanks,
--John Snow
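The versioning rules quoted above can be restated as a small pin-checking helper. Hypothetical code, not part of qemu.qmp — it only encodes the policy from the README text: 0.0.z treats every micro bump as potentially breaking, 0.y.z (y >= 1) keeps micro bumps compatible but not minor bumps, and x.y.z (x >= 1) follows normal SemVer.

```python
def is_compatible_upgrade(old, new):
    """Return True if upgrading old -> new is API-compatible under the
    quoted qemu.qmp policy. Versions are (major, minor, micro) tuples.
    Hypothetical helper for illustration only."""
    omaj, omin, omic = old
    nmaj, nmin, nmic = new
    if omaj != nmaj:
        return False              # major bumps are never compatible
    if omaj == 0:
        if omin == 0:
            # alpha: each micro update may break the API
            return (omin, omic) == (nmin, nmic)
        # beta: micro updates compatible, minor updates not
        return omin == nmin and nmic >= omic
    # stable: same major, any newer minor/micro is compatible
    return (nmin, nmic) >= (omin, omic)
```

This is why the README advises alpha users to pin exact versions (`==0.0.z`) but check for updates often.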


Re: [RFC PATCH] docs/devel: start documenting writing VirtIO devices

2022-04-05 Thread Alex Bennée


Cornelia Huck  writes:

> On Wed, Mar 16 2022, Alex Bennée  wrote:
>
>> Cornelia Huck  writes:
>>
>>> On Wed, Mar 09 2022, Alex Bennée  wrote:
>
 +Writing VirtIO backends for QEMU
 +
 +
 +This document attempts to outline the information a developer needs to
 +know to write backends for QEMU. It is specifically focused on
 +implementing VirtIO devices.
>>>
>>> I think you first need to define a bit more clearly what you consider a
>>> "backend". For virtio, it is probably "everything a device needs to
>>> function as a specific device type like net, block, etc., which may be
>>> implemented by different methods" (as you describe further below).
>>
>> How about:
>>
>>   This document attempts to outline the information a developer needs to
>>   know to write device emulations in QEMU. It is specifically focused on
>>   implementing VirtIO devices. For VirtIO the frontend is the driver
>>   running on the guest. The backend is everything that QEMU needs to
>>   do to handle the emulation of the VirtIO device. This can be done
>>   entirely in QEMU, divided between QEMU and the kernel (vhost) or
>>   handled by a separate process which is configured by QEMU
>>   (vhost-user).
>
> I'm afraid that confuses me even more :)
>
> This sounds to me like frontend == driver (in virtio spec terminology)
> and backend == device. Is that really what you meant?

I think so. To be honest it's the different types of backend (in QEMU,
vhost and vhost-user) I'm trying to be clear about here. The
frontend/driver is just mentioned for completeness.

>
>>
>>>
 +
 +Front End Transports
 +
 +
 +VirtIO supports a number of different front end transports. The
 +details of the device remain the same but there are differences in
 +command line for specifying the device (e.g. -device virtio-foo
 +and -device virtio-foo-pci). For example:
 +
 +.. code:: c
 +
 +  static const TypeInfo vhost_user_blk_info = {
 +  .name = TYPE_VHOST_USER_BLK,
 +  .parent = TYPE_VIRTIO_DEVICE,
 +  .instance_size = sizeof(VHostUserBlk),
 +  .instance_init = vhost_user_blk_instance_init,
 +  .class_init = vhost_user_blk_class_init,
 +  };
 +
 +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
 +``TYPE_VIRTIO_DEVICE``.
>>>
>>> That's not what I'd consider a "front end", though?
>>
>> Yeah clumsy wording. I'm trying to find a good example to show how
>> QOM can be used to abstract the core device operation and the wrappers
>> for different transports. However in the code base there seems to be
>> considerable variation about how this is done. Any advice as to the
>> best exemplary device to follow is greatly welcomed.
>
> I'm not sure which of the example we can really consider a "good"
> device; the normal modus operandi when writing a new device seems to be
> "pick the first device you can think of and copy whatever it
> does".

Yeah the QEMU curse. Hence trying to document the "best" approach or at
least make the picking of a reference a little less random ;-)

> Personally, I usually look at blk or net, but those carry a lot of
> legacy baggage; so maybe a modern virtio-1 only device like gpu? That
> one also has the advantage of not being pci-only.
>
> Does anyone else have a good suggestion here?

Sorry I totally forgot to include you in the Cc of the v1 posting:

  Subject: [PATCH  v1 09/13] docs/devel: start documenting writing VirtIO 
devices
  Date: Mon, 21 Mar 2022 15:30:33 +
  Message-Id: <20220321153037.3622127-10-alex.ben...@linaro.org>

although expect a v2 soonish (once I can get a reasonable qos-test
vhost-user test working).

>
>>
 And then for the PCI device it wraps around the
 +base device (although explicitly initialising via
 +virtio_instance_init_common):
 +
 +.. code:: c
 +
 +  struct VHostUserBlkPCI {
 +  VirtIOPCIProxy parent_obj;
 +  VHostUserBlk vdev;
 +  };
>>>
>>> The VirtIOPCIProxy seems to materialize a bit out of thin air
>>> here... maybe the information simply needs to be structured in a
>>> different way? Perhaps:
>>>
>>> - describe that virtio devices consist of a part that implements the
>>>   device functionality, which ultimately derives from VirtIODevice (the
>>>   "backend"), and a part that exposes a way for the operating system to
>>>   discover and use the device (the "frontend", what the virtio spec
>>>   calls a "transport")
>>> - decribe how the "frontend" part works (maybe mention VirtIOPCIProxy,
>>>   VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
>>>   PCI, MMIO, and CCW devices)
>>> - list the different types of "backends" (as you did below), and give
>>>   two examples of how VirtIODevice is extended (a plain one, and a
>>>   vhost-user one)
>>> - explain how frontend and backend together create an actual device
>>>   (with the two device 

Re: [PATCH] docs/ccid: convert to restructuredText

2022-04-05 Thread Damien Hedde




On 4/5/22 16:29, oxr...@gmx.us wrote:

From: Lucas Ramage 

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/527
Signed-off-by: Lucas Ramage 


Provided 2 minor tweaks (see below: missing empty line, and empty line 
at EOF),

Reviewed-by: Damien Hedde 

Note that I'm not competent regarding the content of this doc. But it 
corresponds to the previous version and the doc generation works.



---
  docs/ccid.txt| 182 ---
  docs/system/device-emulation.rst |   1 +
  docs/system/devices/ccid.rst | 171 +
  3 files changed, 172 insertions(+), 182 deletions(-)
  delete mode 100644 docs/ccid.txt
  create mode 100644 docs/system/devices/ccid.rst

diff --git a/docs/ccid.txt b/docs/ccid.txt
deleted file mode 100644
index 2b85b1bd42..00
--- a/docs/ccid.txt
+++ /dev/null
@@ -1,182 +0,0 @@
-QEMU CCID Device Documentation.
-
-Contents
-1. USB CCID device
-2. Building
-3. Using ccid-card-emulated with hardware
-4. Using ccid-card-emulated with certificates
-5. Using ccid-card-passthru with client side hardware
-6. Using ccid-card-passthru with client side certificates
-7. Passthrough protocol scenario
-8. libcacard
-
-1. USB CCID device
-
-The USB CCID device is a USB device implementing the CCID specification, which
-lets one connect smart card readers that implement the same spec. For more
-information see the specification:
-
- Universal Serial Bus
- Device Class: Smart Card
- CCID
- Specification for
- Integrated Circuit(s) Cards Interface Devices
- Revision 1.1
- April 22rd, 2005
-
-Smartcards are used for authentication, single sign on, decryption in
-public/private schemes and digital signatures. A smartcard reader on the client
-cannot be used on a guest with simple usb passthrough since it will then not be
-available on the client, possibly locking the computer when it is "removed". On
-the other hand this device can let you use the smartcard on both the client and
-the guest machine. It is also possible to have a completely virtual smart card
-reader and smart card (i.e. not backed by a physical device) using this device.
-
-2. Building
-
-The cryptographic functions and access to the physical card is done via the
-libcacard library, whose development package must be installed prior to
-building QEMU:
-
-In redhat/fedora:
-yum install libcacard-devel
-In ubuntu:
-apt-get install libcacard-dev
-
-Configuring and building:
-./configure --enable-smartcard && make
-
-
-3. Using ccid-card-emulated with hardware
-
-Assuming you have a working smartcard on the host with the current
-user, using libcacard, QEMU acts as another client using ccid-card-emulated:
-
-qemu -usb -device usb-ccid -device ccid-card-emulated
-
-
-4. Using ccid-card-emulated with certificates stored in files
-
-You must create the CA and card certificates. This is a one time process.
-We use NSS certificates:
-
-mkdir fake-smartcard
-cd fake-smartcard
-certutil -N -d sql:$PWD
-certutil -S -d sql:$PWD -s "CN=Fake Smart Card CA" -x -t TC,TC,TC -n 
fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe" -n id-cert -c 
fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (signing)" --nsCertType 
smime -n signing-cert -c fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (encryption)" --nsCertType 
sslClient -n encryption-cert -c fake-smartcard-ca
-
-Note: you must have exactly three certificates.
-
-You can use the emulated card type with the certificates backend:
-
-qemu -usb -device usb-ccid -device 
ccid-card-emulated,backend=certificates,db=sql:$PWD,cert1=id-cert,cert2=signing-cert,cert3=encryption-cert
-
-To use the certificates in the guest, export the CA certificate:
-
-certutil -L -r -d sql:$PWD -o fake-smartcard-ca.cer -n fake-smartcard-ca
-
-and import it in the guest:
-
-certutil -A -d /etc/pki/nssdb -i fake-smartcard-ca.cer -t TC,TC,TC -n 
fake-smartcard-ca
-
-In a Linux guest you can then use the CoolKey PKCS #11 module to access
-the card:
-
-certutil -d /etc/pki/nssdb -L -h all
-
-It will prompt you for the PIN (which is the password you assigned to the
-certificate database early on), and then show you all three certificates
-together with the manually imported CA cert:
-
-Certificate NicknameTrust Attributes
-fake-smartcard-ca   CT,C,C
-John Doe:CAC ID Certificate u,u,u
-John Doe:CAC Email Signature Certificateu,u,u
-John Doe:CAC Email Encryption Certificate   u,u,u
-
-If this does not happen, CoolKey is not installed or not registered with
-NSS.  Registration can be done from Firefox or the command line:
-
-modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile 
/usr/lib64/pkcs11/libcoolkeypk11.so
-modutil -dbdir /etc/pki/nssdb -list
-
-
-5. Using ccid-card-passthru with client side hardware
-
-on the host specify the ccid-card-passthru device 

[RFC v2 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization

2022-04-05 Thread Stefan Hajnoczi
Avoid bounce buffers when QEMUIOVector elements are within previously
registered bdrv_register_buf() buffers.

The idea is that emulated storage controllers will register guest RAM
using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
requests. Therefore no blkio_add_mem_region() calls are necessary in the
performance-critical I/O code path.

This optimization doesn't apply if the I/O buffer is internally
allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
path because BDRV_REQ_REGISTERED_BUF is not set.

Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 106 +++---
 1 file changed, 101 insertions(+), 5 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 562e972003..41894c7015 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -1,7 +1,9 @@
 #include "qemu/osdep.h"
 #include <blkio.h>
 #include "block/block_int.h"
+#include "exec/memory.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
 
@@ -25,6 +27,9 @@ typedef struct {
 
 /* Can we skip adding/deleting blkio_mem_regions? */
 bool needs_mem_regions;
+
+/* Are file descriptors necessary for blkio_mem_regions? */
+bool needs_mem_region_fd;
 } BDRVBlkioState;
 
 static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret)
@@ -157,6 +162,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, 
int64_t offset,
 BlockCompletionFunc *cb, void *opaque)
 {
 BDRVBlkioState *s = bs->opaque;
+bool needs_mem_regions =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
 BlkioAIOCB *acb;
@@ -166,7 +173,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, 
int64_t offset,
 
 acb = blkio_aiocb_get(bs, cb, opaque);
 
-if (s->needs_mem_regions) {
+if (needs_mem_regions) {
 if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
qemu_aio_unref(&acb->common);
 return NULL;
@@ -181,7 +188,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, 
int64_t offset,
 
 ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0);
 if (ret < 0) {
-if (s->needs_mem_regions) {
+if (needs_mem_regions) {
blkio_free_mem_region(s->blkio, &acb->mem_region);
qemu_iovec_destroy(&acb->qiov);
 }
@@ -202,6 +209,8 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, 
int64_t offset,
 {
 uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ? BLKIO_REQ_FUA : 0;
 BDRVBlkioState *s = bs->opaque;
+bool needs_mem_regions =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
 BlkioAIOCB *acb;
@@ -211,7 +220,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, 
int64_t offset,
 
 acb = blkio_aiocb_get(bs, cb, opaque);
 
-if (s->needs_mem_regions) {
+if (needs_mem_regions) {
 if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) {
qemu_aio_unref(&acb->common);
 return NULL;
@@ -225,7 +234,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, 
int64_t offset,
 
 ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags);
 if (ret < 0) {
-if (s->needs_mem_regions) {
+if (needs_mem_regions) {
blkio_free_mem_region(s->blkio, &acb->mem_region);
 }
qemu_aio_unref(&acb->common);
@@ -273,6 +282,80 @@ static void blkio_io_unplug(BlockDriverState *bs)
 }
 }
 
+static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size)
+{
+BDRVBlkioState *s = bs->opaque;
+int ret;
+struct blkio_mem_region region = (struct blkio_mem_region){
+.addr = host,
+.len = size,
+.fd = -1,
+};
+
+if (((uintptr_t)host | size) % s->mem_region_alignment) {
+error_report_once("%s: skipping unaligned buf %p with size %zu",
+  __func__, host, size);
+return; /* skip unaligned */
+}
+
+/* Attempt to find the fd for a MemoryRegion */
+if (s->needs_mem_region_fd) {
+int fd = -1;
+ram_addr_t offset;
+MemoryRegion *mr;
+
+/*
+ * bdrv_register_buf() is called with the BQL held so mr lives at least
+ * until this function returns.
+ */
+mr = memory_region_from_host(host, &offset);
+if (mr) {
+fd = memory_region_get_fd(mr);
+}
+if (fd == -1) {
+error_report_once("%s: skipping fd-less buf %p with size %zu",
+  __func__, host, size);
+return; /* skip if there is no fd */
+}
+
+region.fd = fd;
+region.fd_offset = offset;
+}
+
+WITH_QEMU_LOCK_GUARD(&s->lock) {
+ret = blkio_add_mem_region(s->blkio, &region);
+}
+
+if (ret < 0) {
+error_report_once("Failed to add blkio mem 

[RFC v2 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

2022-04-05 Thread Stefan Hajnoczi
Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.

This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h |  2 ++
 hw/block/virtio-blk.c  | 13 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d311c57cca..7f589b4146 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -19,6 +19,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
 #include "qom/object.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
@@ -64,6 +65,7 @@ struct VirtIOBlock {
 struct VirtIOBlockDataPlane *dataplane;
 uint64_t host_features;
 size_t config_size;
+BlockRAMRegistrar blk_ram_registrar;
 };
 
 typedef struct VirtIOBlockReq {
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 540c38f829..a18cf05f14 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/block-ram-registrar.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "hw/virtio/virtio-blk.h"
@@ -421,11 +422,13 @@ static inline void submit_requests(BlockBackend *blk, 
MultiReqBuffer *mrb,
 }
 
 if (is_write) {
-blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete,
+mrb->reqs[start]);
 } else {
-blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-   virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+   BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete,
+   mrb->reqs[start]);
 }
 }
 
@@ -1228,6 +1231,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s);
+blk_ram_registrar_init(&s->blk_ram_registrar, s->blk);
 blk_set_dev_ops(s->blk, _block_ops, s);
 blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
 
@@ -1255,6 +1259,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev)
 }
 qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size
 / 2);
+blk_ram_registrar_destroy(&s->blk_ram_registrar);
 qemu_del_vm_change_state_handler(s->change);
 blockdev_mark_auto_del(s->blk);
 virtio_cleanup(vdev);
-- 
2.35.1




[RFC v2 5/8] block: add BlockRAMRegistrar

2022-04-05 Thread Stefan Hajnoczi
Emulated devices and other BlockBackend users wishing to take advantage
of blk_register_buf() all have the same repetitive job: register
RAMBlocks with the BlockBackend using RAMBlockNotifier.

Add a BlockRAMRegistrar API to do this. A later commit will use this
from hw/block/virtio-blk.c.

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |  1 +
 include/sysemu/block-ram-registrar.h | 30 +
 block/block-ram-registrar.c  | 39 
 block/meson.build|  1 +
 4 files changed, 71 insertions(+)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/block-ram-registrar.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d839301f68..655f79c9f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2462,6 +2462,7 @@ F: block*
 F: block/
 F: hw/block/
 F: include/block/
+F: include/sysemu/block-*.h
 F: qemu-img*
 F: docs/tools/qemu-img.rst
 F: qemu-io*
diff --git a/include/sysemu/block-ram-registrar.h 
b/include/sysemu/block-ram-registrar.h
new file mode 100644
index 00..09d63f64b2
--- /dev/null
+++ b/include/sysemu/block-ram-registrar.h
@@ -0,0 +1,30 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BLOCK_RAM_REGISTRAR_H
+#define BLOCK_RAM_REGISTRAR_H
+
+#include "exec/ramlist.h"
+
+/**
+ * struct BlockRAMRegistrar:
+ *
+ * Keeps RAMBlock memory registered with a BlockBackend using
+ * blk_register_buf() including hotplugged memory.
+ *
+ * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar
+ * with blk_ram_registrar_init() before submitting I/O requests with the
+ * BLK_REQ_REGISTERED_BUF flag set.
+ */
+typedef struct {
+BlockBackend *blk;
+RAMBlockNotifier notifier;
+} BlockRAMRegistrar;
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk);
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r);
+
+#endif /* BLOCK_RAM_REGISTRAR_H */
diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c
new file mode 100644
index 00..32a14b69ae
--- /dev/null
+++ b/block/block-ram-registrar.c
@@ -0,0 +1,39 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
+
+static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+blk_register_buf(r->blk, host, max_size);
+}
+
+static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+  size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+blk_unregister_buf(r->blk, host, max_size);
+}
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
+{
+r->blk = blk;
+r->notifier = (RAMBlockNotifier){
+.ram_block_added = ram_block_added,
+.ram_block_removed = ram_block_removed,
+};
+
+ram_block_notifier_add(&r->notifier);
+}
+
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r)
+{
+ram_block_notifier_remove(&r->notifier);
+}
diff --git a/block/meson.build b/block/meson.build
index 787667384a..b315593054 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,7 @@ block_ss.add(files(
 ), zstd, zlib, gnutls)
 
 softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
+softmmu_ss.add(files('block-ram-registrar.c'))
 
 if get_option('qcow1').allowed()
   block_ss.add(files('qcow.c'))
-- 
2.35.1




[RFC v2 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove()

2022-04-05 Thread Stefan Hajnoczi
When a RAMBlockNotifier is added, ->ram_block_added() is called with all
existing RAMBlocks. There is no equivalent ->ram_block_removed() call
when a RAMBlockNotifier is removed.

The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine
with this asymmetry because it does not rely on RAMBlockNotifier for
cleanup. It walks its internal list of DMA mappings and unmaps them by
itself.

Future users of RAMBlockNotifier may not have an internal data structure
that records added RAMBlocks so they will need ->ram_block_removed()
callbacks.

This patch makes ram_block_notifier_remove() symmetric with respect to
callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings
after ram_block_notifier_remove() has been called. This is necessary
since users like block/nvme.c may create additional DMA mappings that do
not originate from the RAMBlockNotifier.

Reviewed-by: David Hildenbrand 
Signed-off-by: Stefan Hajnoczi 
---
 hw/core/numa.c  | 17 +
 util/vfio-helpers.c |  5 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 1aa05dcf42..6bf9694d20 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -822,6 +822,19 @@ static int ram_block_notify_add_single(RAMBlock *rb, void 
*opaque)
 return 0;
 }
 
+static int ram_block_notify_remove_single(RAMBlock *rb, void *opaque)
+{
+const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+const ram_addr_t size = qemu_ram_get_used_length(rb);
+void *host = qemu_ram_get_host_addr(rb);
+RAMBlockNotifier *notifier = opaque;
+
+if (host) {
+notifier->ram_block_removed(notifier, host, size, max_size);
+}
+return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
@@ -835,6 +848,10 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 QLIST_REMOVE(n, next);
+
+if (n->ram_block_removed) {
+qemu_ram_foreach_block(ram_block_notify_remove_single, n);
+}
 }
 
 void ram_block_notify_add(void *host, size_t size, size_t max_size)
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index b037d5faa5..dc90496592 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -847,10 +847,13 @@ void qemu_vfio_close(QEMUVFIOState *s)
 if (!s) {
 return;
 }
+
+ram_block_notifier_remove(&s->ram_notifier);
+
 for (i = 0; i < s->nr_mappings; ++i) {
qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
 }
-ram_block_notifier_remove(&s->ram_notifier);
+
 g_free(s->usable_iova_ranges);
 s->nb_iova_ranges = 0;
 qemu_vfio_reset(s);
-- 
2.35.1




[RFC v2 6/8] stubs: add memory_region_from_host() and memory_region_get_fd()

2022-04-05 Thread Stefan Hajnoczi
The blkio block driver will need to look up the file descriptor for a
given pointer. This is possible in softmmu builds where the memory API
is available for querying guest RAM.

Add stubs so tools like qemu-img that link the block layer still build
successfully. In this case there is no guest RAM but that is fine.
Bounce buffers and their file descriptors will be allocated with
libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's
memory_region_get_fd() in that case.

Signed-off-by: Stefan Hajnoczi 
---
 stubs/memory.c| 13 +
 stubs/meson.build |  1 +
 2 files changed, 14 insertions(+)
 create mode 100644 stubs/memory.c

diff --git a/stubs/memory.c b/stubs/memory.c
new file mode 100644
index 00..e9ec4e384b
--- /dev/null
+++ b/stubs/memory.c
@@ -0,0 +1,13 @@
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+
+MemoryRegion *memory_region_from_host(void *host, ram_addr_t *offset)
+{
+return NULL;
+}
+
+int memory_region_get_fd(MemoryRegion *mr)
+{
+return -1;
+}
+
diff --git a/stubs/meson.build b/stubs/meson.build
index 6f80fec761..1e274d2db2 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -25,6 +25,7 @@ stub_ss.add(files('is-daemonized.c'))
 if libaio.found()
   stub_ss.add(files('linux-aio.c'))
 endif
+stub_ss.add(files('memory.c'))
 stub_ss.add(files('migr-blocker.c'))
 stub_ss.add(files('module-opts.c'))
 stub_ss.add(files('monitor.c'))
-- 
2.35.1




[RFC v2 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-04-05 Thread Stefan Hajnoczi
Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.

bdrv_aligned_preadv() is strict in validating supported read flags and
its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no
harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not
support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF.

Care must be taken to clear the flag when the block layer or filter
drivers replace QEMUIOVector elements with bounce buffers since these
have not been registered with bdrv_register_buf(). A lot of the changes
in this commit deal with clearing the flag in those cases.

Ensuring that the flag is cleared properly is somewhat invasive to
implement across the block layer and it's hard to spot when future code
changes accidentally break it. Another option might be to add a flag to
QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify
elements. That is more robust but somewhat of a layering violation, so I
haven't attempted that.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h |  9 +
 block/blkverify.c|  4 ++--
 block/crypto.c   |  2 ++
 block/io.c   | 30 +++---
 block/mirror.c   |  2 ++
 block/raw-format.c   |  2 ++
 6 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..061606e867 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -80,6 +80,15 @@ typedef enum {
  */
 BDRV_REQ_MAY_UNMAP  = 0x4,
 
+/*
+ * An optimization hint when all QEMUIOVector elements are within
+ * previously registered bdrv_register_buf() memory ranges.
+ *
+ * Code that replaces the user's QEMUIOVector elements with bounce buffers
+ * must take care to clear this flag.
+ */
+BDRV_REQ_REGISTERED_BUF = 0x8,
+
 BDRV_REQ_FUA= 0x10,
 BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
diff --git a/block/blkverify.c b/block/blkverify.c
index e4a37af3b2..d624f4fd05 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
qemu_iovec_init(&raw_qiov, qiov->niov);
qemu_iovec_clone(&raw_qiov, qiov, buf);
 
-ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov, flags,
-false);
+ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov,
+flags & ~BDRV_REQ_REGISTERED_BUF, false);
 
cmp_offset = qemu_iovec_compare(qiov, &raw_qiov);
 if (cmp_offset != -1) {
diff --git a/block/crypto.c b/block/crypto.c
index 1ba82984ef..c900355adb 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -473,6 +473,8 @@ block_crypto_co_pwritev(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
 uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
 
+flags &= ~BDRV_REQ_REGISTERED_BUF;
+
 assert(!(flags & ~BDRV_REQ_FUA));
 assert(payload_offset < INT64_MAX);
 assert(QEMU_IS_ALIGNED(offset, sector_size));
diff --git a/block/io.c b/block/io.c
index a8a7920e29..139e36c2e1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1556,11 +1556,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
align);
 
-/* TODO: We would need a per-BDS .supported_read_flags and
+/*
+ * TODO: We would need a per-BDS .supported_read_flags and
  * potential fallback support, if we ever implement any read flags
  * to pass through to drivers.  For now, there aren't any
- * passthrough flags.  */
-assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH)));
+ * passthrough flags except the BDRV_REQ_REGISTERED_BUF optimization hint.
+ */
+assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH |
+   BDRV_REQ_REGISTERED_BUF)));
 
 /* Handle Copy on Read and associated serialisation */
 if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1601,7 +1604,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 goto out;
 }
 
-assert(!(flags & ~bs->supported_read_flags));
+assert(!(flags & ~(bs->supported_read_flags | BDRV_REQ_REGISTERED_BUF)));
 
 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
 if (bytes <= max_bytes && bytes <= max_transfer) {
@@ -1790,7 +1793,8 @@ static void 

[RFC v2 3/8] block: pass size to bdrv_unregister_buf()

2022-04-05 Thread Stefan Hajnoczi
The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same <host, size>
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h  | 5 -
 include/block/block_int-common.h| 2 +-
 include/sysemu/block-backend-global-state.h | 2 +-
 block/block-backend.c   | 4 ++--
 block/io.c  | 6 +++---
 block/nvme.c| 2 +-
 qemu-img.c  | 4 ++--
 7 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index 25bb69bbef..2295a7c767 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -244,9 +244,12 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild 
*child, Error **errp);
  * Register/unregister a buffer for I/O. For example, VFIO drivers are
  * interested to know the memory areas that would later be used for I/O, so
  * that they can prepare IOMMU mapping etc., to get better performance.
+ *
+ * Buffers must not overlap and they must be unregistered with the same <host,
+ * size> values that they were registered with.
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
-void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..b7a7cbd3a5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -435,7 +435,7 @@ struct BlockDriver {
  * DMA mapping for hot buffers.
  */
 void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
-void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
 /*
  * This field is modified only under the BQL, and is part of
diff --git a/include/sysemu/block-backend-global-state.h 
b/include/sysemu/block-backend-global-state.h
index 2e93a74679..989ec0364b 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -107,7 +107,7 @@ void blk_io_limits_update_group(BlockBackend *blk, const 
char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
-void blk_unregister_buf(BlockBackend *blk, void *host);
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..8af00d8a36 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2591,10 +2591,10 @@ void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 bdrv_register_buf(blk_bs(blk), host, size);
 }
 
-void blk_unregister_buf(BlockBackend *blk, void *host)
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
 {
 GLOBAL_STATE_CODE();
-bdrv_unregister_buf(blk_bs(blk), host);
+bdrv_unregister_buf(blk_bs(blk), host, size);
 }
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
diff --git a/block/io.c b/block/io.c
index 3280144a17..a8a7920e29 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3365,16 +3365,16 @@ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
 }
 }
 
-void bdrv_unregister_buf(BlockDriverState *bs, void *host)
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size)
 {
 BdrvChild *child;
 
 GLOBAL_STATE_CODE();
 if (bs->drv && bs->drv->bdrv_unregister_buf) {
-bs->drv->bdrv_unregister_buf(bs, host);
+bs->drv->bdrv_unregister_buf(bs, host, size);
 }
QLIST_FOREACH(child, &bs->children, next) {
-bdrv_unregister_buf(child->bs, host);
+bdrv_unregister_buf(child->bs, host, size);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index 552029931d..88485e77f1 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1592,7 +1592,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
 }
 }
 
-static void nvme_unregister_buf(BlockDriverState *bs, void *host)
+static void nvme_unregister_buf(BlockDriverState *bs, void *host, size_t 

[RFC v2 1/8] blkio: add io_uring block driver using libblkio

2022-04-05 Thread Stefan Hajnoczi
libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring with
additional drivers planned.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
vhost-user-blk which applications may wish to use for connecting to
qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds an io_uring BlockDriver to QEMU using libblkio. For now
I/O buffers are copied through bounce buffers if the libblkio driver
requires it. Later commits add an optimization for pre-registering guest
RAM to avoid bounce buffers. It will be easy to add other libblkio
drivers since they will share the majority of code.

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   6 +
 meson_options.txt |   2 +
 qapi/block-core.json  |  18 +-
 meson.build   |   9 +
 block/blkio.c | 537 ++
 tests/qtest/modules-test.c|   3 +
 block/meson.build |   1 +
 scripts/meson-buildoptions.sh |   3 +
 8 files changed, 578 insertions(+), 1 deletion(-)
 create mode 100644 block/blkio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ad2451e03..d839301f68 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3349,6 +3349,12 @@ L: qemu-bl...@nongnu.org
 S: Maintained
 F: block/vdi.c
 
+blkio
+M: Stefan Hajnoczi 
+L: qemu-bl...@nongnu.org
+S: Maintained
+F: block/blkio.c
+
 iSCSI
 M: Ronnie Sahlberg 
 M: Paolo Bonzini 
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..1e82e770e7 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -101,6 +101,8 @@ option('bzip2', type : 'feature', value : 'auto',
description: 'bzip2 support for DMG images')
 option('cap_ng', type : 'feature', value : 'auto',
description: 'cap_ng support')
+option('blkio', type : 'feature', value : 'auto',
+   description: 'libblkio block device driver')
 option('bpf', type : 'feature', value : 'auto',
 description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4a7a6940a3..c04e1e325b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2924,7 +2924,9 @@
 'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
-'http', 'https', 'iscsi',
+'http', 'https',
+{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
+'iscsi',
 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
 { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
@@ -3656,6 +3658,18 @@
 '*debug': 'int',
 '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsIoUring:
+#
+# Driver specific block device options for the io_uring backend.
+#
+# @filename: path to the image file
+#
+# Since: 6.3
+##
+{ 'struct': 'BlockdevOptionsIoUring',
+  'data': { 'filename': 'str' } }
+
 ##
 # @IscsiTransport:
 #
@@ -4254,6 +4268,8 @@
'if': 'HAVE_HOST_BLOCK_DEVICE' },
   'http':   'BlockdevOptionsCurlHttp',
   'https':  'BlockdevOptionsCurlHttps',
+  'io_uring':   { 'type': 'BlockdevOptionsIoUring',
+  'if': 'CONFIG_BLKIO' },
   'iscsi':  'BlockdevOptionsIscsi',
   'luks':   'BlockdevOptionsLUKS',
   'nbd':'BlockdevOptionsNbd',
diff --git a/meson.build b/meson.build
index 861de93c4f..0ab17c8767 100644
--- a/meson.build
+++ b/meson.build
@@ -636,6 +636,13 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu
  required: get_option('virglrenderer'),
  kwargs: static_kwargs)
 endif
+blkio = not_found
+if not get_option('blkio').auto() or have_block
+  blkio = dependency('blkio',
+ method: 'pkg-config',
+ required: get_option('blkio'),
+ kwargs: static_kwargs)
+endif
 curl = not_found
 if not get_option('curl').auto() or have_block
   curl = dependency('libcurl', version: '>=7.29.0',
@@ -1519,6 +1526,7 @@ config_host_data.set('CONFIG_LIBUDEV', libudev.found())
 config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
+config_host_data.set('CONFIG_BLKIO', blkio.found())
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
@@ -3672,6 +3680,7 @@ summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 

[RFC v2 0/8] blkio: add libblkio BlockDriver

2022-04-05 Thread Stefan Hajnoczi
v2:
- Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
- Use new blkioq_get_num_completions() API
- Implement .bdrv_refresh_limits()

This patch series adds a QEMU BlockDriver for libblkio
(https://gitlab.com/libblkio/libblkio/), a library for high-performance block
device I/O. Currently libblkio has basic io_uring support with additional
drivers in development.

The first patch adds the core BlockDriver and most of the libblkio API usage.
The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
API so that virtio-blk emulation can efficiently map guest RAM for libblkio - some
libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
vhost, etc).

This block driver is functional enough to boot guests. See the BlockDriver
struct in block/blkio.c for a list of APIs that still need to be implemented
(write_zeroes and discard are in development, the others are not). I'm also
waiting for libblkio to define queuing behavior and iovec lifetime requirements
before sending this as a non-RFC patch.

Regarding the design: each libblkio driver is a separately named BlockDriver.
That means there is an "io_uring" BlockDriver and not a generic "libblkio"
BlockDriver. In the future there will be additional BlockDrivers, all defined
in block/blkio.c. This way QAPI and open parameters are type-safe and mandatory
parameters can be checked by QEMU.
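As a concrete illustration of that type-safety, attaching an image through the new driver over QMP would presumably look like this, given the QAPI schema in patch 1 (the node-name and image path are made-up values):

```json
{ "execute": "blockdev-add",
  "arguments": {
      "driver": "io_uring",
      "node-name": "drv0",
      "filename": "/var/tmp/test.img"
  } }
```

On the command line this would map to something like `-blockdev driver=io_uring,node-name=drv0,filename=/var/tmp/test.img`, with any missing or mistyped parameter rejected by the schema rather than by the driver at open time.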

Stefan Hajnoczi (8):
  blkio: add io_uring block driver using libblkio
  numa: call ->ram_block_removed() in ram_block_notifer_remove()
  block: pass size to bdrv_unregister_buf()
  block: add BDRV_REQ_REGISTERED_BUF request flag
  block: add BlockRAMRegistrar
  stubs: add memory_region_from_host() and memory_region_get_fd()
  blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

 MAINTAINERS |   7 +
 meson_options.txt   |   2 +
 qapi/block-core.json|  18 +-
 meson.build |   9 +
 include/block/block-common.h|   9 +
 include/block/block-global-state.h  |   5 +-
 include/block/block_int-common.h|   2 +-
 include/hw/virtio/virtio-blk.h  |   2 +
 include/sysemu/block-backend-global-state.h |   2 +-
 include/sysemu/block-ram-registrar.h|  30 +
 block/blkio.c   | 633 
 block/blkverify.c   |   4 +-
 block/block-backend.c   |   4 +-
 block/block-ram-registrar.c |  39 ++
 block/crypto.c  |   2 +
 block/io.c  |  36 +-
 block/mirror.c  |   2 +
 block/nvme.c|   2 +-
 block/raw-format.c  |   2 +
 hw/block/virtio-blk.c   |  13 +-
 hw/core/numa.c  |  17 +
 qemu-img.c  |   4 +-
 stubs/memory.c  |  13 +
 tests/qtest/modules-test.c  |   3 +
 util/vfio-helpers.c |   5 +-
 block/meson.build   |   2 +
 scripts/meson-buildoptions.sh   |   3 +
 stubs/meson.build   |   1 +
 28 files changed, 845 insertions(+), 26 deletions(-)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/blkio.c
 create mode 100644 block/block-ram-registrar.c
 create mode 100644 stubs/memory.c

-- 
2.35.1





Re: [qemu.qmp PATCH 02/13] fork qemu.qmp from qemu.git

2022-04-05 Thread John Snow
On Tue, Apr 5, 2022, 4:51 AM Kashyap Chamarthy  wrote:

> On Mon, Apr 04, 2022 at 02:56:10PM -0400, John Snow wrote:
> > On Mon, Apr 4, 2022 at 2:54 PM John Snow  wrote:
>
> [...]
>
> > > > >  .gitignore |  2 +-
> > > > >  Makefile   | 16 
> > > > >  setup.cfg  | 24 +---
> > > > >  setup.py   |  2 +-
> > > > >  4 files changed, 11 insertions(+), 33 deletions(-)
> > > >
> > > > The changes here look fine to me (and thanks for making it a "micro
> > > > change").  I'll let sharper eyes than mine to give a closer look at
> the
> > > > `git filter-repo` surgery.  Although, that looks fine to me too.
> > > >
> > > > [...]
> > > >
> > > > >  .PHONY: distclean
> > > > >  distclean: clean
> > > > > - rm -rf qemu.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> > > > > + rm -rf qemu.qmp.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/
> > > > >   rm -f .coverage .coverage.*
> > > > >   rm -rf htmlcov/
> > > > > diff --git a/setup.cfg b/setup.cfg
> > > > > index e877ea5..4ffab73 100644
> > > > > --- a/setup.cfg
> > > > > +++ b/setup.cfg
> > > > > @@ -1,5 +1,5 @@
> > > > >  [metadata]
> > > > > -name = qemu
> > > > > +name = qemu.qmp
> > > > >  version = file:VERSION
> > > > >  maintainer = QEMU Developer Team
> > > >
> > > > In the spirit of patch 04 ("update maintainer metadata"), do you also
> > > > want to update here too? s/QEMU Developer Team/QEMU Project?
> > > >
> > >
> > > Good spot.
> >
> > ...Or, uh. That's exactly what I update in patch 04. Are you asking me
> > to fold in that change earlier? I'm confused now.
>
> Oops, perils of reviewing late in the day.  I failed to notice it's the
> same file.  You're right; please ignore my remark.  Sorry for the noise.
>

I made the same mistake upon reading the feedback, so we're both guilty 

Thanks Kashyap, I appreciate the review.

There's three more series here to apply to the new forked package (not yet
re-sent to the ML):

(2) Adding GitLab CI configuration. Not relevant for you, probably.

(3) Adding Sphinx documentation. This builds jsnow.gitlab.io/qemu.qmp/ -
I'd be appreciative of your feedback on this. I'm interested both in
proofreading and in design feedback here. All comments welcome.

[Though more rigorous changes to the design might be a "later" thing, but
the feedback is welcomed all the same.]

(4) Adding automatic package builds and git-based versioning to GitLab.
Maybe also not too relevant for you. 


>
>
> --
> /kashyap
>

Thanks for your time!


Re: [PULL 00/10] QAPI patches patches for 2022-04-05

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 11:35, Markus Armbruster  wrote:
>
> I double-checked these patches affect *only* generated documentation.
> Safe enough for 7.0, I think.  But I'm quite content to hold on to
> them until after the release, if that's preferred.
>
> The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5:
>
>   Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into 
> staging (2022-04-04 15:48:55 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-qapi-2022-04-05
>
> for you to fetch changes up to 8230f3389c7d7215d0c3946d415f54b3e9c07f73:
>
>   qapi: Fix calc-dirty-rate example (2022-04-05 12:30:45 +0200)
>
> 
> QAPI patches patches for 2022-04-05
>
> 



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [RFC PATCH] docs/devel: start documenting writing VirtIO devices

2022-04-05 Thread Cornelia Huck
On Wed, Mar 16 2022, Alex Bennée  wrote:

> Cornelia Huck  writes:
>
>> On Wed, Mar 09 2022, Alex Bennée  wrote:

>>> +Writing VirtIO backends for QEMU
>>> +
>>> +
>>> +This document attempts to outline the information a developer needs to
>>> +know to write backends for QEMU. It is specifically focused on
>>> +implementing VirtIO devices.
>>
>> I think you first need to define a bit more clearly what you consider a
>> "backend". For virtio, it is probably "everything a device needs to
>> function as a specific device type like net, block, etc., which may be
>> implemented by different methods" (as you describe further below).
>
> How about:
>
>   This document attempts to outline the information a developer needs to
>   know to write device emulations in QEMU. It is specifically focused on
>   implementing VirtIO devices. For VirtIO the frontend is the driver
>   running on the guest. The backend is the everything that QEMU needs to
>   do to handle the emulation of the VirtIO device. This can be done
>   entirely in QEMU, divided between QEMU and the kernel (vhost) or
>   handled by a separate process which is configured by QEMU
>   (vhost-user).

I'm afraid that confuses me even more :)

This sounds to me like frontend == driver (in virtio spec terminology)
and backend == device. Is that really what you meant?

>
>>
>>> +
>>> +Front End Transports
>>> +
>>> +
>>> +VirtIO supports a number of different front end transports. The
>>> +details of the device remain the same but there are differences in
>>> +command line for specifying the device (e.g. -device virtio-foo
>>> +and -device virtio-foo-pci). For example:
>>> +
>>> +.. code:: c
>>> +
>>> +  static const TypeInfo vhost_user_blk_info = {
>>> +  .name = TYPE_VHOST_USER_BLK,
>>> +  .parent = TYPE_VIRTIO_DEVICE,
>>> +  .instance_size = sizeof(VHostUserBlk),
>>> +  .instance_init = vhost_user_blk_instance_init,
>>> +  .class_init = vhost_user_blk_class_init,
>>> +  };
>>> +
>>> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
>>> +``TYPE_VIRTIO_DEVICE``.
>>
>> That's not what I'd consider a "front end", though?
>
> Yeah clumsy wording. I'm trying to find a good example to show how
> QOM can be used to abstract the core device operation and the wrappers
> for different transports. However in the code base there seems to be
> considerable variation about how this is done. Any advice as to the
> best exemplary device to follow is greatly welcomed.

I'm not sure which of the example we can really consider a "good"
device; the normal modus operandi when writing a new device seems to be
"pick the first device you can think of and copy whatever it
does". Personally, I usually look at blk or net, but those carry a lot of
legacy baggage; so maybe a modern virtio-1 only device like gpu? That
one also has the advantage of not being pci-only.

Does anyone else have a good suggestion here?

>
>>> And then for the PCI device it wraps around the
>>> +base device (although explicitly initialising via
>>> +virtio_instance_init_common):
>>> +
>>> +.. code:: c
>>> +
>>> +  struct VHostUserBlkPCI {
>>> +  VirtIOPCIProxy parent_obj;
>>> +  VHostUserBlk vdev;
>>> +  };
>>
>> The VirtIOPCIProxy seems to materialize a bit out of thin air
>> here... maybe the information simply needs to be structured in a
>> different way? Perhaps:
>>
>> - describe that virtio devices consist of a part that implements the
>>   device functionality, which ultimately derives from VirtIODevice (the
>>   "backend"), and a part that exposes a way for the operating system to
>>   discover and use the device (the "frontend", what the virtio spec
>>   calls a "transport")
>> - decribe how the "frontend" part works (maybe mention VirtIOPCIProxy,
>>   VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
>>   PCI, MMIO, and CCW devices)
>> - list the different types of "backends" (as you did below), and give
>>   two examples of how VirtIODevice is extended (a plain one, and a
>>   vhost-user one)
>> - explain how frontend and backend together create an actual device
>>   (with the two device examples, and maybe also with the plain one
>>   plugged as both PCI and CCW?); maybe also mention that MMIO is a bit
>>   different? (it always confuses me)
>
> OK I'll see how I can restructure things to make it clearer. Do we also
> have to take into account the object heirarchy for different types of
> device (i.e. block or net)? Or is that all plumbing into QEMUs
> sub-system internals done in the VirtIO device objects?

An example of how a device plugs into a bigger infrastructure like the
block layer might be helpful, but it also might complicate the
documentation (as you probably won't need to do anything like that if
you write a device that does not use any established infrastructure.)
Maybe just gloss over it for now?

>
>>> +
>>> +Back End Implementations
>>> +
>>> +
>>> 

Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Kevin Wolf
Am 05.04.2022 um 15:09 hat Emanuele Giuseppe Esposito geschrieben:
> Am 05/04/2022 um 12:14 schrieb Kevin Wolf:
> > I think all of this is really relevant for Emanuele's work, which
> > involves adding AIO_WAIT_WHILE() deep inside graph update functions. I
> > fully expect that we would see very similar problems, and just stacking
> > drain sections over drain sections that might happen to usually fix
> > things, but aren't guaranteed to, doesn't look like a good solution.
> 
> Yes, I think at this point we all agreed to drop subtree_drain as
> replacement for AioContext.
> 
> The alternative is what Paolo proposed in the other thread " Removal of
> AioContext lock, bs->parents and ->children: proof of concept"
> I am not sure which thread you replied first :)

This one, I think. :-)

> I think that proposal is not far from your idea, and it avoids to
> introduce or even use drains at all.
> Not sure why you called it a "step backwards even from AioContext locks".

I was only referring to the lock locality there. AioContext locks are
really coarse, but still a finer granularity than a single global lock.

In the big picture, it'd still be better than the AioContext lock, but
that's because it's a different type of lock, not because it has better
locality.

So I was just wondering if we can't have the different type of lock and
make it local to the BDS, too.

Kevin




Re: [PATCH] ui/cursor: fix integer overflow in cursor_alloc (CVE-2022-4206)

2022-04-05 Thread Mauro Matteo Cascella
On Tue, Apr 5, 2022 at 1:10 PM Gerd Hoffmann  wrote:
>
> > > +++ b/ui/cursor.c
> > > @@ -46,6 +46,13 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[])
> > >
> > >  /* parse pixel data */
> > >  c = cursor_alloc(width, height);
> > > +
> > > +if (!c) {
> > > +fprintf(stderr, "%s: cursor %ux%u alloc error\n",
> > > +__func__, width, height);
> > > +return NULL;
> > > +}
> > >
> >
> > I think you could simply abort() in this function. It is used with static
> > data (ui/cursor*.xpm)
>
> Yes, that should never happen.
>
> Missing: vmsvga_cursor_define() calls cursor_alloc() with guest-supplied
> values too.

I skipped that because the check (cursor.width > 256 ||
cursor.height > 256) is already done in vmsvga_fifo_run before calling
vmsvga_cursor_define. You want me to add another check in
vmsvga_cursor_define and return NULL if cursor_alloc fails?

> take care,
>   Gerd
>


--
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0




[PATCH v1] configure: judge build dir permission

2022-04-05 Thread Guo Zhi
If this patch is applied, issue:

https://gitlab.com/qemu-project/qemu/-/issues/321

can be closed.

Signed-off-by: Guo Zhi 
---
 configure | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 7c08c18358..9cfa78efd2 100755
--- a/configure
+++ b/configure
@@ -24,7 +24,13 @@ then
 then
 if test -f $MARKER
 then
-   rm -rf build
+if test -w $MARKER
+then
+rm -rf build
+else
+echo "ERROR: ./build dir already exists and can not be removed due to permission"
+exit 1
+fi
 else
echo "ERROR: ./build dir already exists and was not previously created by configure"
 exit 1
-- 
2.35.1




Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Kevin Wolf
Am 05.04.2022 um 14:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Thanks Kevin! I have already run out of arguments in the battle
> against using subtree-drains to isolate graph modification operations
> from each other in different threads in the mailing list)
> 
> (Note also, that the top-most version of this patch is "[PATCH v2]
> block/stream: Drain subtree around graph change")

Oops, I completely missed the v2. Thanks!

> About avoiding polling during graph-modifying operations, there is a
> problem: some IO operations are involved into block-graph modifying
> operations. At least it's rewriting "backing_file_offset" and
> "backing_file_size" fields in qcow2 header.
> 
> We can't just separate rewriting metadata from graph modifying
> operation: this way another graph-modifying operation may interleave
> and we'll write outdated metadata.

Hm, generally we don't update image metadata when we reconfigure the
graph. Most changes are temporary (like insertion of filter nodes) and
the image header only contains a "default configuration" to be used on
the next start.

There are only a few places that update the image header; I think it's
generally block job completions. They obviously update the in-memory
graph, too, but they don't write to the image file (and therefore
potentially poll) in the middle of updating the in-memory graph;
they do both in separate steps.

I think this is okay. We must just avoid polling in the middle of graph
updates because if something else changes the graph there, it's not
clear any more that we're really doing what the caller had in mind.

> So I still think, we need a kind of global lock for graph modifying
> operations. Or a kind per-BDS locks as you propose. But in this case
> we need to be sure that taking all needed per-BDS locks we'll avoid
> deadlocking.

I guess this depends on the exact granularity of the locks we're using.
If you take the lock only while updating a single edge, I don't think
you could easily deadlock. If you hold it for more complex operations,
it becomes harder to tell without checking the code.

Kevin




[PATCH v1] hw/ppc: change indentation to spaces from TABs

2022-04-05 Thread Guo Zhi
There are still some files in the QEMU PPC code base that use TABs for 
indentation instead of using spaces.
The TABs should be replaced so that we have a consistent coding style.

If this patch is applied, issue:

https://gitlab.com/qemu-project/qemu/-/issues/374

can be closed.

Signed-off-by: Guo Zhi 
---
 hw/core/uboot_image.h  | 185 -
 hw/ppc/ppc440_bamboo.c |   6 +-
 hw/ppc/spapr_rtas.c|  18 ++--
 include/hw/ppc/ppc.h   |  10 +--
 4 files changed, 109 insertions(+), 110 deletions(-)

diff --git a/hw/core/uboot_image.h b/hw/core/uboot_image.h
index 608022de6e..980e9cc014 100644
--- a/hw/core/uboot_image.h
+++ b/hw/core/uboot_image.h
@@ -12,7 +12,7 @@
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public License along
@@ -32,128 +32,127 @@
 /*
  * Operating System Codes
  */
-#define IH_OS_INVALID  0   /* Invalid OS   */
-#define IH_OS_OPENBSD  1   /* OpenBSD  */
-#define IH_OS_NETBSD   2   /* NetBSD   */
-#define IH_OS_FREEBSD  3   /* FreeBSD  */
-#define IH_OS_4_4BSD   4   /* 4.4BSD   */
-#define IH_OS_LINUX5   /* Linux*/
-#define IH_OS_SVR4 6   /* SVR4 */
-#define IH_OS_ESIX 7   /* Esix */
-#define IH_OS_SOLARIS  8   /* Solaris  */
-#define IH_OS_IRIX 9   /* Irix */
-#define IH_OS_SCO  10  /* SCO  */
-#define IH_OS_DELL 11  /* Dell */
-#define IH_OS_NCR  12  /* NCR  */
-#define IH_OS_LYNXOS   13  /* LynxOS   */
-#define IH_OS_VXWORKS  14  /* VxWorks  */
-#define IH_OS_PSOS 15  /* pSOS */
-#define IH_OS_QNX  16  /* QNX  */
-#define IH_OS_U_BOOT   17  /* Firmware */
-#define IH_OS_RTEMS18  /* RTEMS*/
-#define IH_OS_ARTOS19  /* ARTOS*/
-#define IH_OS_UNITY20  /* Unity OS */
+#define IH_OS_INVALID 0 /* Invalid OS */
+#define IH_OS_OPENBSD 1 /* OpenBSD */
+#define IH_OS_NETBSD  2 /* NetBSD */
+#define IH_OS_FREEBSD 3 /* FreeBSD */
+#define IH_OS_4_4BSD  4 /* 4.4BSD */
+#define IH_OS_LINUX   5 /* Linux */
+#define IH_OS_SVR46 /* SVR4 */
+#define IH_OS_ESIX7 /* Esix */
+#define IH_OS_SOLARIS 8 /* Solaris */
+#define IH_OS_IRIX9 /* Irix */
+#define IH_OS_SCO 10 /* SCO */
+#define IH_OS_DELL11 /* Dell */
+#define IH_OS_NCR 12 /* NCR */
+#define IH_OS_LYNXOS  13 /* LynxOS */
+#define IH_OS_VXWORKS 14 /* VxWorks */
+#define IH_OS_PSOS15 /* pSOS */
+#define IH_OS_QNX 16 /* QNX */
+#define IH_OS_U_BOOT  17 /* Firmware */
+#define IH_OS_RTEMS   18 /* RTEMS */
+#define IH_OS_ARTOS   19 /* ARTOS */
+#define IH_OS_UNITY   20 /* Unity OS */
 
 /*
  * CPU Architecture Codes (supported by Linux)
  */
-#define IH_CPU_INVALID 0   /* Invalid CPU  */
-#define IH_CPU_ALPHA   1   /* Alpha*/
-#define IH_CPU_ARM 2   /* ARM  */
-#define IH_CPU_I3863   /* Intel x86*/
-#define IH_CPU_IA644   /* IA64 */
-#define IH_CPU_MIPS5   /* MIPS */
-#define IH_CPU_MIPS64  6   /* MIPS  64 Bit */
-#define IH_CPU_PPC 7   /* PowerPC  */
-#define IH_CPU_S3908   /* IBM S390 */
-#define IH_CPU_SH  9   /* SuperH   */
-#define IH_CPU_SPARC   10  /* Sparc*/
-#define IH_CPU_SPARC64 11  /* Sparc 64 Bit */
-#define IH_CPU_M68K12  /* M68K */
-#define IH_CPU_NIOS13  /* Nios-32  */
-#define IH_CPU_MICROBLAZE  14  /* MicroBlaze   */
-#define IH_CPU_NIOS2   15  /* Nios-II  */
-#define IH_CPU_BLACKFIN16  /* Blackfin */
-#define IH_CPU_AVR32   17  /* AVR32*/
+#define IH_CPU_INVALID0 /* Invalid CPU */
+#define IH_CPU_ALPHA  1 /* Alpha */
+#define IH_CPU_ARM2 /* ARM */
+#define IH_CPU_I386   3 /* Intel x86 */
+#define IH_CPU_IA64   4 /* IA64 */
+#define IH_CPU_MIPS   5 /* MIPS */
+#define IH_CPU_MIPS64 6 /* MIPS  64 Bit */
+#define IH_CPU_PPC7 /* PowerPC */
+#define IH_CPU_S390   8 /* IBM S390 */
+#define IH_CPU_SH 9 /* SuperH */
+#define IH_CPU_SPARC  10 /* Sparc */
+#define IH_CPU_SPARC6411 /* Sparc 64 Bit */
+#define IH_CPU_M68K   12 /* M68K */
+#define IH_CPU_NIOS   13 /* Nios-32 */
+#define IH_CPU_MICROBLAZE 14 /* MicroBlaze   */

Re: [PATCH v3 3/5] tests/qtest/libqos: Skip hotplug tests if pci root bus is not hotpluggable

2022-04-05 Thread Alex Bennée


Eric Auger  writes:

> ARM does not support hotplug on pcie.0. Add a flag on the bus
> which tells if devices can be hotplugged and skip hotplug tests
> if the bus cannot be hotplugged. This is a temporary solution to
> enable the other pci tests on aarch64.
>
> Signed-off-by: Eric Auger 
> Acked-by: Thomas Huth 

Reviewed-by: Alex Bennée 


-- 
Alex Bennée



Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 14:07, Daniel Henrique Barboza
 wrote:
>
> There are a lot of Valgrind warnings about conditional jumps depending on
> uninitialized values like this one (taken from a pSeries guest):
>
>  Conditional jump or move depends on uninitialised value(s)
> at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544)
> by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
> by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
> by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
> (...)
>   Uninitialised value was created by a stack allocation
> at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538)
>
> In this case, the alleged uninitialized value is the 'lpcr' variable that
> is written by kvm_get_one_reg() and then used in an if clause:
>
> int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
> {
> CPUState *cs = CPU(cpu);
> uint64_t lpcr;
>
> kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> /* Do we need to modify the LPCR? */
> if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here
> (...)
>
> A quick fix is to init the variable that kvm_get_one_reg() is going to
> write ('lpcr' in the example above). Another idea is to convince
> Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case
> the ioctl() is successful. This will put some boilerplate in the
> function but it will bring benefit for its other callers.

Doesn't Valgrind have a way of modelling ioctls where it
knows what data is read and written? In general
ioctl-using programs don't need to have special case
"I am running under valgrind" handling, so this seems to
me like valgrind is missing support for this particular ioctl.

More generally, how much use is running QEMU with KVM enabled
under valgrind anyway? Valgrind has no way of knowing about
writes to memory that the guest vCPUs do...

thanks
-- PMM



[PATCH] docs/ccid: convert to restructuredText

2022-04-05 Thread oxr463
From: Lucas Ramage 

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/527
Signed-off-by: Lucas Ramage 
---
 docs/ccid.txt| 182 ---
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/ccid.rst | 171 +
 3 files changed, 172 insertions(+), 182 deletions(-)
 delete mode 100644 docs/ccid.txt
 create mode 100644 docs/system/devices/ccid.rst

diff --git a/docs/ccid.txt b/docs/ccid.txt
deleted file mode 100644
index 2b85b1bd42..00
--- a/docs/ccid.txt
+++ /dev/null
@@ -1,182 +0,0 @@
-QEMU CCID Device Documentation.
-
-Contents
-1. USB CCID device
-2. Building
-3. Using ccid-card-emulated with hardware
-4. Using ccid-card-emulated with certificates
-5. Using ccid-card-passthru with client side hardware
-6. Using ccid-card-passthru with client side certificates
-7. Passthrough protocol scenario
-8. libcacard
-
-1. USB CCID device
-
-The USB CCID device is a USB device implementing the CCID specification, which
-lets one connect smart card readers that implement the same spec. For more
-information see the specification:
-
- Universal Serial Bus
- Device Class: Smart Card
- CCID
- Specification for
- Integrated Circuit(s) Cards Interface Devices
- Revision 1.1
- April 22rd, 2005
-
-Smartcards are used for authentication, single sign on, decryption in
-public/private schemes and digital signatures. A smartcard reader on the client
-cannot be used on a guest with simple usb passthrough since it will then not be
-available on the client, possibly locking the computer when it is "removed". On
-the other hand this device can let you use the smartcard on both the client and
-the guest machine. It is also possible to have a completely virtual smart card
-reader and smart card (i.e. not backed by a physical device) using this device.
-
-2. Building
-
-The cryptographic functions and access to the physical card is done via the
-libcacard library, whose development package must be installed prior to
-building QEMU:
-
-In redhat/fedora:
-yum install libcacard-devel
-In ubuntu:
-apt-get install libcacard-dev
-
-Configuring and building:
-./configure --enable-smartcard && make
-
-
-3. Using ccid-card-emulated with hardware
-
-Assuming you have a working smartcard on the host with the current
-user, using libcacard, QEMU acts as another client using ccid-card-emulated:
-
-qemu -usb -device usb-ccid -device ccid-card-emulated
-
-
-4. Using ccid-card-emulated with certificates stored in files
-
-You must create the CA and card certificates. This is a one time process.
-We use NSS certificates:
-
-mkdir fake-smartcard
-cd fake-smartcard
-certutil -N -d sql:$PWD
-certutil -S -d sql:$PWD -s "CN=Fake Smart Card CA" -x -t TC,TC,TC -n fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe" -n id-cert -c fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (signing)" --nsCertType smime -n signing-cert -c fake-smartcard-ca
-certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (encryption)" --nsCertType sslClient -n encryption-cert -c fake-smartcard-ca
-
-Note: you must have exactly three certificates.
-
-You can use the emulated card type with the certificates backend:
-
-qemu -usb -device usb-ccid -device ccid-card-emulated,backend=certificates,db=sql:$PWD,cert1=id-cert,cert2=signing-cert,cert3=encryption-cert
-
-To use the certificates in the guest, export the CA certificate:
-
-certutil -L -r -d sql:$PWD -o fake-smartcard-ca.cer -n fake-smartcard-ca
-
-and import it in the guest:
-
-certutil -A -d /etc/pki/nssdb -i fake-smartcard-ca.cer -t TC,TC,TC -n fake-smartcard-ca
-
-In a Linux guest you can then use the CoolKey PKCS #11 module to access
-the card:
-
-certutil -d /etc/pki/nssdb -L -h all
-
-It will prompt you for the PIN (which is the password you assigned to the
-certificate database early on), and then show you all three certificates
-together with the manually imported CA cert:
-
-Certificate NicknameTrust Attributes
-fake-smartcard-ca   CT,C,C
-John Doe:CAC ID Certificate u,u,u
-John Doe:CAC Email Signature Certificateu,u,u
-John Doe:CAC Email Encryption Certificate   u,u,u
-
-If this does not happen, CoolKey is not installed or not registered with
-NSS.  Registration can be done from Firefox or the command line:
-
-modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile /usr/lib64/pkcs11/libcoolkeypk11.so
-modutil -dbdir /etc/pki/nssdb -list
-
-
-5. Using ccid-card-passthru with client side hardware
-
-on the host specify the ccid-card-passthru device with a suitable chardev:
-
-qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \
- -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid
-
-on the client run vscclient, built when you built QEMU:
-
-vscclient <qemu-host> 2001
-
-
-6. Using ccid-card-passthru with client 

Re: [PATCH 2/2] hw/xen/xen_pt: Resolve igd_passthrough_isa_bridge_create() indirection

2022-04-05 Thread Anthony PERARD via
On Sat, Mar 26, 2022 at 05:58:24PM +0100, Bernhard Beschow wrote:
> Now that igd_passthrough_isa_bridge_create() is implemented within the
> xen context it may use Xen* data types directly and become
> xen_igd_passthrough_isa_bridge_create(). This resolves an indirection.
> 
> Signed-off-by: Bernhard Beschow 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN

2022-04-05 Thread Anthony PERARD via
On Sat, Mar 26, 2022 at 05:58:23PM +0100, Bernhard Beschow wrote:
> igd-passthrough-isa-bridge is only requested in xen_pt but was
> implemented in pc_piix.c. This caused xen_pt to depend on i386/pc,
> which is hereby resolved.
> 
> Signed-off-by: Bernhard Beschow 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Kevin Wolf
Am 05.04.2022 um 13:47 hat Hanna Reitz geschrieben:
> On 05.04.22 12:14, Kevin Wolf wrote:
> > Am 24.03.2022 um 13:57 hat Hanna Reitz geschrieben:
> > > When the stream block job cuts out the nodes between top and base in
> > > stream_prepare(), it does not drain the subtree manually; it fetches the
> > > base node, and tries to insert it as the top node's backing node with
> > > bdrv_set_backing_hd().  bdrv_set_backing_hd() however will drain, and so
> > > the actual base node might change (because the base node is actually not
> > > part of the stream job) before the old base node passed to
> > > bdrv_set_backing_hd() is installed.
> > > 
> > > This has two implications:
> > > 
> > > First, the stream job does not keep a strong reference to the base node.
> > > Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
> > > because some other block job is drained to finish), we will get a
> > > use-after-free.  We should keep a strong reference to that node.
> > > 
> > > Second, even with such a strong reference, the problem remains that the
> > > base node might change before bdrv_set_backing_hd() actually runs and as
> > > a result the wrong base node is installed.
> > > 
> > > Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
> > > case, which has five nodes, and simultaneously streams from the middle
> > > node to the top node, and commits the middle node down to the base node.
> > > As it is, this will sometimes crash, namely when we encounter the
> > > above-described use-after-free.
> > > 
> > > Taking a strong reference to the base node, we no longer get a crash,
> but the resulting block graph is less than ideal: The expected result is
> > > obviously that all middle nodes are cut out and the base node is the
> > > immediate backing child of the top node.  However, if stream_prepare()
> > > takes a strong reference to its base node (the middle node), and then
> > > the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
> > > that middle node, the stream job will just reinstall it again.
> > > 
> > > Therefore, we need to keep the whole subtree drained in
> > > stream_prepare()
> > That doesn't sound right. I think in reality it's "if we take the really
> > big hammer and drain the whole subtree, then the bit that we really need
> > usually happens to be covered, too".
> > 
> > When you have a long backing chain and merge the two topmost overlays
> > with streaming, then it's none of the stream job's business whether
> > there is I/O going on for the base image way down the chain. Subtree
> > drains do much more than they should in this case.
> 
> Yes, see the discussion I had with Vladimir.  He convinced me that this
> can’t be an indefinite solution, but that we need locking for graph changes
> that’s separate from draining, because (1) those are different things, and
> (2) changing the graph should influence I/O as little as possible.
> 
> I found this the best solution to fix a known case of a use-after-free for
> 7.1, though.

I'm not arguing against a short-term band-aid solution (I assume you
mean for 7.0?) as long as we agree that this is what it is. The commit
message just sounded as if this were the right solution rather than a
hack, so I wanted to make the point.

> > At the same time they probably do too little, because what you're
> > describing you're protecting against is not I/O, but graph modifications
> > done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the
> > backing file. The callback could be invoked by I/O on an entirely
> different subgraph (maybe if the other thing is a mirror job) or it
> > could be a BH or anything else really. bdrv_drain_all() would increase
> > your chances, but I'm not sure if even that would be guaranteed to be
> > enough - because it's really another instance of abusing drain for
> > locking, we're not really interested in the _I/O_ of the node.
> 
> The most common instances of graph modification I see are QMP and block jobs
> finishing.  The former will not be deterred by draining, and we do know of
> one instance where that is a problem (see the bdrv_next() discussion). 
> Generally, it isn’t though.  (If it is, this case here won’t be the only
> thing that breaks.)

To be honest, I would be surprised if other things weren't broken if QMP
commands come in with unfortunate timing.

> As for the latter, most block jobs are parents of the nodes they touch
> (stream is one notable exception with how it handles its base, and changing
> that did indeed cause us headache before), and so will at least be paused
> when a drain occurs on a node they touch.  Since pausing doesn’t affect jobs
> that have exited their main loop, there might be some problem with
> concurrent jobs that are also finished but yielding, but I couldn’t find
> such a case.

True, the way that we implement drain in the block job actually means
that they fully pause and therefore can't complete even if they wouldn't

Re: [PATCH v3 2/5] tests/qtest/libqos/pci: Introduce pio_limit

2022-04-05 Thread Alex Bennée


Eric Auger  writes:

> At the moment the IO space limit is hardcoded to
> QPCI_PIO_LIMIT = 0x10000. When accesses are performed to a bar,
> the base address of this latter is compared against the limit
> to decide whether we perform an IO or a memory access.
>
> On ARM, we cannot keep this PIO limit as the arm-virt machine
> uses [0x3eff0000, 0x3f000000] for the IO space map and we
> are mandated to allocate at 0x0.
>
> Add a new flag in QPCIBar indicating whether it is an IO bar
> or a memory bar. This flag is set on QPCIBar allocation and
> provisioned based on the BAR configuration. Then the new flag
> is used in access functions and in iomap() function.
>
> Signed-off-by: Eric Auger 
> Reviewed-by: Thomas Huth 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)

2022-04-05 Thread Alex Bennée


"Dr. David Alan Gilbert"  writes:

> * Alex Bennée (alex.ben...@linaro.org) wrote:
>> 
>> (expanding the CC list for help, anyone have a better idea about how
>> vhost-user qtests should work/see obvious issues with this patch?)
>
> How exactly does it fail?

➜  env QTEST_QEMU_BINARY=./qemu-system-aarch64 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
G_TEST_DBUS_DAEMON=/home/alex/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh 
QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=137 
./tests/qtest/qos-test -p 
/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile
# random seed: R02S5d7667675b4f6dd3b8559f8db621296c
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-1245871.sock 
-qtest-log /dev/null -chardev socket,path=/tmp/qtest-1245871.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -machine none -accel qtest
# Start of aarch64 tests
# Start of virt tests
# Start of generic-pcihost tests
# Start of pci-bus-generic tests
# Start of pci-bus tests
# Start of vhost-user-gpio-pci tests
# Start of vhost-user-gpio tests
# Start of vhost-user-gpio-tests tests
# Start of read-guest-mem tests
# child process 
(/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
 [1245877])
 exit status: 1 (error)
# child process 
(/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
 [1245877]) stdout: ""
# child process 
(/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
 [1245877]) stderr: "qemu-system-aarch64: -device 
vhost-user-gpio-pci,id=gpio0,chardev=chr-vhost-user-test,vhostforce=on: 
Duplicate ID 'gpio0' for device\nsocket_accept failed: Resource temporarily 
unavailable\n**\nERROR:../../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
 assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)\n"
**
ERROR:../../tests/qtest/qos-test.c:189:subprocess_run_one_test: child process 
(/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
 [1245877]) failed unexpectedly
Bail out! ERROR:../../tests/qtest/qos-test.c:189:subprocess_run_one_test: child 
process 
(/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess
 [1245877]) failed unexpectedly
fish: “env QTEST_QEMU_BINARY=./qemu-sy…” terminated by signal SIGABRT
(Abort)

Although it would be nice if I could individually run qos-tests with all
the make machinery setting things up.


>
> DAve
>
>> Alex Bennée  writes:
>> 
>> > We don't have a virtio-gpio implementation in QEMU and only
>> > support a vhost-user backend. The QEMU side of the code is minimal so
>> > it should be enough to instantiate the device and pass some vhost-user
>> > messages over the control socket. To do this we hook into the existing
>> > vhost-user-test code and just add the bits required for gpio.
>> >
>> > Based-on: 20220118203833.316741-1-eric.au...@redhat.com
>> > Signed-off-by: Alex Bennée 
>> > Cc: Viresh Kumar 
>> > Cc: Paolo Bonzini 
>> >
>> > ---
>> >
>> > This goes as far as to add things to the QOS tree but so far it's
>> > failing to properly start QEMU with the chardev socket needed to
>> > communicate between the mock vhost-user daemon and QEMU itself.
>> > ---
>> >  tests/qtest/libqos/virtio-gpio.h | 34 +++
>> >  tests/qtest/libqos/virtio-gpio.c | 98 
>> >  tests/qtest/vhost-user-test.c| 34 +++
>> >  tests/qtest/libqos/meson.build   |  1 +
>> >  4 files changed, 167 insertions(+)
>> >  create mode 100644 tests/qtest/libqos/virtio-gpio.h
>> >  create mode 100644 tests/qtest/libqos/virtio-gpio.c
>> >
>> > diff --git a/tests/qtest/libqos/virtio-gpio.h 
>> > b/tests/qtest/libqos/virtio-gpio.h
>> > new file mode 100644
>> > index 00..abe6967ae9
>> > --- /dev/null
>> > +++ b/tests/qtest/libqos/virtio-gpio.h
>> > @@ -0,0 +1,34 @@
>> > +/*
>> > + * virtio-gpio structures
>> > + *
>> > + * Copyright (c) 2022 Linaro Ltd
>> > + *
>> > + * SPDX-License-Identifier: GPL-2.0-or-later
>> > + */
>> > +
>> > +#ifndef TESTS_LIBQOS_VIRTIO_GPIO_H
>> > +#define TESTS_LIBQOS_VIRTIO_GPIO_H
>> > +
>> > +#include "qgraph.h"
>> > +#include "virtio.h"
>> > +#include "virtio-pci.h"
>> > +
>> > +typedef struct QVhostUserGPIO QVhostUserGPIO;
>> > +typedef struct QVhostUserGPIOPCI QVhostUserGPIOPCI;
>> > +typedef struct QVhostUserGPIODevice QVhostUserGPIODevice;
>> > +
>> > +struct QVhostUserGPIO {
>> > +QVirtioDevice *vdev;
>> > +};
>> > +
>> > +struct QVhostUserGPIOPCI {
>> > +QVirtioPCIDevice pci_vdev;
>> > +QVhostUserGPIO gpio;
>> > +};
>> > +
>> > +struct 

[PATCH v3 2/3] iotests/108: Test new refcount rebuild algorithm

2022-04-05 Thread Hanna Reitz
One clear problem with how qcow2's refcount structure rebuild algorithm
used to be before "qcow2: Improve refcount structure rebuilding" was
that it is prone to failure for qcow2 images on block devices: There is
generally unused space after the actual image, and if that exceeds what
one refblock covers, the old algorithm would invariably write the
reftable past the block device's end, which cannot work.  The new
algorithm does not have this problem.

Test it with three tests:
(1) Create an image with more empty space at the end than what one
refblock covers, see whether rebuilding the refcount structures
results in a change in the image file length.  (It should not.)

(2) Leave precisely enough space somewhere at the beginning of the image
for the new reftable (and the refblock for that place), see whether
the new algorithm puts the reftable there.  (It should.)

(3) Test the original problem: Create (something like) a block device
with a fixed size, then create a qcow2 image in there, write some
data, and then have qemu-img check rebuild the refcount structures.
Before HEAD^, the reftable would have been written past the image
file end, i.e. outside of what the block device provides, which
cannot work.  HEAD^ should have fixed that.
("Something like a block device" means a loop device if we can use
one ("sudo -n losetup" works), or a FUSE block export with
growable=false otherwise.)

Reviewed-by: Eric Blake 
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/108 | 259 -
 tests/qemu-iotests/108.out |  81 
 2 files changed, 339 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 56339ab2c5..ed02b3267b 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
 status=1# failure is the default!
 
 _cleanup()
 {
-   _cleanup_test_img
+_cleanup_test_img
+if [ -f "$TEST_DIR/qsd.pid" ]; then
+qsd_pid=$(cat "$TEST_DIR/qsd.pid")
+kill -KILL "$qsd_pid"
+fusermount -u "$TEST_DIR/fuse-export" &>/dev/null
+fi
+rm -f "$TEST_DIR/fuse-export"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
+. ./common.qemu
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
@@ -47,6 +54,22 @@ _supported_os Linux
 # files
 _unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' data_file
 
+# This test either needs sudo -n losetup or FUSE exports to work
+if sudo -n losetup &>/dev/null; then
+loopdev=true
+else
+loopdev=false
+
+# QSD --export fuse will either yield "Parameter 'id' is missing"
+# or "Invalid parameter 'fuse'", depending on whether there is
+# FUSE support or not.
+error=$($QSD --export fuse 2>&1)
+if [[ $error = *"Invalid parameter 'fuse'" ]]; then
+_notrun 'Passwordless sudo for losetup or FUSE support required, but' \
+'neither is available'
+fi
+fi
+
 echo
 echo '=== Repairing an image without any refcount table ==='
 echo
@@ -138,6 +161,240 @@ _make_test_img 64M
 poke_file "$TEST_IMG" $((0x10008)) "\xff\xff\xff\xff\xff\xff\x00\x00"
 _check_test_img -r all
 
+echo
+echo '=== Check rebuilt reftable location ==='
+
+# In an earlier version of the refcount rebuild algorithm, the
+# reftable was generally placed at the image end (unless something was
+# allocated in the area covered by the refblock right before the image
+# file end, then we would try to place the reftable in that refblock).
+# This was later changed so the reftable would be placed in the
+# earliest possible location.  Test this.
+
+echo
+echo '--- Does the image size increase? ---'
+echo
+
+# First test: Just create some image, write some data to it, and
+# resize it so there is free space at the end of the image (enough
+# that it spans at least one full refblock, which for cluster_size=512
+# images, spans 128k).  With the old algorithm, the reftable would
+# have then been placed at the end of the image file, but with the new
+# one, it will be put in that free space.
+# We want to check whether the size of the image file increases due to
+# rebuilding the refcount structures (it should not).
+
+_make_test_img -o 'cluster_size=512' 1M
+# Write something
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# Add free space
+file_len=$(stat -c '%s' "$TEST_IMG")
+truncate -s $((file_len + 256 * 1024)) "$TEST_IMG"
+
+# Corrupt the image by saying the image header was not allocated
+rt_offset=$(peek_file_be "$TEST_IMG" 48 8)
+rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8)
+poke_file "$TEST_IMG" $rb_offset "\x00\x00"
+
+# Check whether rebuilding the refcount structures increases the image
+# file size
+file_len=$(stat -c '%s' "$TEST_IMG")
+echo
+# The only leaks there can be are the old refcount structures that are
+# leaked during rebuilding, no need to clutter 

[PATCH v3 3/3] qcow2: Add errp to rebuild_refcount_structure()

2022-04-05 Thread Hanna Reitz
Instead of fprint()-ing error messages in rebuild_refcount_structure()
and its rebuild_refcounts_write_refblocks() helper, pass them through an
Error object to qcow2_check_refcounts() (which will then print it).

Suggested-by: Eric Blake 
Signed-off-by: Hanna Reitz 
---
 block/qcow2-refcount.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c5669eaa51..ed0ecfaa89 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2465,7 +2465,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
 static int rebuild_refcounts_write_refblocks(
 BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters,
 int64_t first_cluster, int64_t end_cluster,
-uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr
+uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr,
+Error **errp
 )
 {
 BDRVQcow2State *s = bs->opaque;
@@ -2516,8 +2517,8 @@ static int rebuild_refcounts_write_refblocks(
   nb_clusters,
  &first_free_cluster);
 if (refblock_offset < 0) {
-fprintf(stderr, "ERROR allocating refblock: %s\n",
-strerror(-refblock_offset));
+error_setg_errno(errp, -refblock_offset,
+ "ERROR allocating refblock");
 return refblock_offset;
 }
 
@@ -2539,6 +2540,7 @@ static int rebuild_refcounts_write_refblocks(
   on_disk_reftable_entries *
   REFTABLE_ENTRY_SIZE);
 if (!on_disk_reftable) {
+error_setg(errp, "ERROR allocating reftable memory");
 return -ENOMEM;
 }
 
@@ -2562,7 +2564,7 @@ static int rebuild_refcounts_write_refblocks(
 ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
 s->cluster_size, false);
 if (ret < 0) {
-fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+error_setg_errno(errp, -ret, "ERROR writing refblock");
 return ret;
 }
 
@@ -2578,7 +2580,7 @@ static int rebuild_refcounts_write_refblocks(
 ret = bdrv_pwrite(bs->file, refblock_offset, on_disk_refblock,
   s->cluster_size);
 if (ret < 0) {
-fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+error_setg_errno(errp, -ret, "ERROR writing refblock");
 return ret;
 }
 
@@ -2601,7 +2603,8 @@ static int rebuild_refcounts_write_refblocks(
 static int rebuild_refcount_structure(BlockDriverState *bs,
   BdrvCheckResult *res,
   void **refcount_table,
-  int64_t *nb_clusters)
+  int64_t *nb_clusters,
+  Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
 int64_t reftable_offset = -1;
@@ -2652,7 +2655,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
 rebuild_refcounts_write_refblocks(bs, refcount_table, nb_clusters,
   0, *nb_clusters,
  &on_disk_reftable,
-  &on_disk_reftable_entries);
+  &on_disk_reftable_entries, errp);
 if (reftable_size_changed < 0) {
 res->check_errors++;
 ret = reftable_size_changed;
@@ -2676,8 +2679,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
   refcount_table, nb_clusters,
  &first_free_cluster);
 if (reftable_offset < 0) {
-fprintf(stderr, "ERROR allocating reftable: %s\n",
-strerror(-reftable_offset));
+error_setg_errno(errp, -reftable_offset,
+ "ERROR allocating reftable");
 res->check_errors++;
 ret = reftable_offset;
 goto fail;
@@ -2695,7 +2698,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
   reftable_start_cluster,
   reftable_end_cluster,
  &on_disk_reftable,
-  &on_disk_reftable_entries);
+  &on_disk_reftable_entries, errp);
 if (reftable_size_changed < 0) {
 res->check_errors++;
 ret = reftable_size_changed;
@@ -2725,7 +2728,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
 ret = 

[PATCH v3 1/3] qcow2: Improve refcount structure rebuilding

2022-04-05 Thread Hanna Reitz
When rebuilding the refcount structures (when qemu-img check -r found
errors with refcount = 0, but reference count > 0), the new refcount
table defaults to being put at the image file end[1].  There is no good
reason for that except that it means we will not have to rewrite any
refblocks we already wrote to disk.

Changing the code to rewrite those refblocks is not too difficult,
though, so let us do that.  That is beneficial for images on block
devices, where we cannot really write beyond the end of the image file.

Use this opportunity to add extensive comments to the code, and refactor
it a bit, getting rid of the backwards-jumping goto.

[1] Unless there is something allocated in the area pointed to by the
last refblock, so we have to write that refblock.  In that case, we
try to put the reftable in there.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Closes: https://gitlab.com/qemu-project/qemu/-/issues/941
Reviewed-by: Eric Blake 
Signed-off-by: Hanna Reitz 
---
 block/qcow2-refcount.c | 332 +
 1 file changed, 235 insertions(+), 97 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b91499410c..c5669eaa51 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2438,111 +2438,140 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
 }
 
 /*
- * Creates a new refcount structure based solely on the in-memory information
- * given through *refcount_table. All necessary allocations will be reflected
- * in that array.
+ * Helper function for rebuild_refcount_structure().
  *
- * On success, the old refcount structure is leaked (it will be covered by the
- * new refcount structure).
+ * Scan the range of clusters [first_cluster, end_cluster) for allocated
+ * clusters and write all corresponding refblocks to disk.  The refblock
+ * and allocation data is taken from the in-memory refcount table
+ * *refcount_table[] (of size *nb_clusters), which is basically one big
+ * (unlimited size) refblock for the whole image.
+ *
+ * For these refblocks, clusters are allocated using said in-memory
+ * refcount table.  Care is taken that these allocations are reflected
+ * in the refblocks written to disk.
+ *
+ * The refblocks' offsets are written into a reftable, which is
+ * *on_disk_reftable_ptr[] (of size *on_disk_reftable_entries_ptr).  If
+ * that reftable is of insufficient size, it will be resized to fit.
+ * This reftable is not written to disk.
+ *
+ * (If *on_disk_reftable_ptr is not NULL, the entries within are assumed
+ * to point to existing valid refblocks that do not need to be allocated
+ * again.)
+ *
+ * Return whether the on-disk reftable array was resized (true/false),
+ * or -errno on error.
  */
-static int rebuild_refcount_structure(BlockDriverState *bs,
-  BdrvCheckResult *res,
-  void **refcount_table,
-  int64_t *nb_clusters)
+static int rebuild_refcounts_write_refblocks(
+BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters,
+int64_t first_cluster, int64_t end_cluster,
+uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr
+)
 {
 BDRVQcow2State *s = bs->opaque;
-int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
+int64_t cluster;
 int64_t refblock_offset, refblock_start, refblock_index;
-uint32_t reftable_size = 0;
-uint64_t *on_disk_reftable = NULL;
+int64_t first_free_cluster = 0;
+uint64_t *on_disk_reftable = *on_disk_reftable_ptr;
+uint32_t on_disk_reftable_entries = *on_disk_reftable_entries_ptr;
 void *on_disk_refblock;
-int ret = 0;
-struct {
-uint64_t reftable_offset;
-uint32_t reftable_clusters;
-} QEMU_PACKED reftable_offset_and_clusters;
-
-qcow2_cache_empty(bs, s->refcount_block_cache);
+bool reftable_grown = false;
+int ret;
 
-write_refblocks:
-for (; cluster < *nb_clusters; cluster++) {
+for (cluster = first_cluster; cluster < end_cluster; cluster++) {
+/* Check all clusters to find refblocks that contain non-zero entries */
 if (!s->get_refcount(*refcount_table, cluster)) {
 continue;
 }
 
+/*
+ * This cluster is allocated, so we need to create a refblock
+ * for it.  The data we will write to disk is just the
+ * respective slice from *refcount_table, so it will contain
+ * accurate refcounts for all clusters belonging to this
+ * refblock.  After we have written it, we will therefore skip
+ * all remaining clusters in this refblock.
+ */
+
 refblock_index = cluster >> s->refcount_block_bits;
 refblock_start = refblock_index << s->refcount_block_bits;
 
-/* Don't allocate a cluster in a refblock already written to disk */
-if (first_free_cluster < refblock_start) 

[PATCH v3 0/3] qcow2: Improve refcount structure rebuilding

2022-04-05 Thread Hanna Reitz
Hi,

v2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2022-03/msg01260.html

v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg00651.html

This series fixes the qcow2 refcount structure rebuilding mechanism for
when the qcow2 image file doesn’t allow writes beyond the end of file
(e.g. because it’s on an LVM block device).

v3:
- Added patch 3 (didn’t squash this into patch 1, because (a) Eric gave
  his R-b on 1 as-is, and (b) I ended up retouching
  rebuild_refcount_structure() as a whole, not just the new helper, so a
  dedicated patch made more sense)
- In patch 1: Changed `assert(reftable_size_changed == true)` to just
  `assert(reftable_size_changed)`
- In patch 2: In comments, replaced “were” by “was”


git-backport-diff against v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[0002] [FC] 'qcow2: Improve refcount structure rebuilding'
002/3:[0006] [FC] 'iotests/108: Test new refcount rebuild algorithm'
003/3:[down] 'qcow2: Add errp to rebuild_refcount_structure()'


Hanna Reitz (3):
  qcow2: Improve refcount structure rebuilding
  iotests/108: Test new refcount rebuild algorithm
  qcow2: Add errp to rebuild_refcount_structure()

 block/qcow2-refcount.c | 353 ++---
 tests/qemu-iotests/108 | 259 ++-
 tests/qemu-iotests/108.out |  81 +
 3 files changed, 587 insertions(+), 106 deletions(-)

-- 
2.35.1




Re: [PATCH v9 27/45] hw/cxl/host: Add support for CXL Fixed Memory Windows.

2022-04-05 Thread Markus Armbruster
Jonathan Cameron  writes:

> From: Jonathan Cameron 
>
> The concept of these is introduced in [1] in terms of the
> description of the CEDT ACPI table. The principle is more general.
> Unlike once traffic hits the CXL root bridges, the host system
> memory address routing is implementation defined and effectively
> static once observable by standard / generic system software.
> Each CXL Fixed Memory Windows (CFMW) is a region of PA space
> which has fixed system dependent routing configured so that
> accesses can be routed to the CXL devices below a set of target
> root bridges. The accesses may be interleaved across multiple
> root bridges.
>
> For QEMU we could have fully specified these regions in terms
> of a base PA + size, but as the absolute address does not matter
> it is simpler to let individual platforms place the memory regions.
>
> Examples:
> -cxl-fixed-memory-window targets.0=cxl.0,size=128G
> -cxl-fixed-memory-window targets.0=cxl.1,size=128G
> -cxl-fixed-memory-window targets.0=cxl.0,targets.1=cxl.1,size=256G,interleave-granularity=2k
>
> Specifies
> * 2x 128G regions not interleaved across root bridges, one for each of
>   the root bridges with ids cxl.0 and cxl.1
> * 256G region interleaved across root bridges with ids cxl.0 and cxl.1
> with a 2k interleave granularity.
>
> When system software enumerates the devices below a given root bridge
> it can then decide which CFMW to use. If non-interleaved access is desired
> (or possible) it can use the appropriate CFMW for the root bridge in
> question.  If there are suitable devices to interleave across the
> two root bridges then it may use the 3rd CFMW.
>
> A number of other designs were considered but the following constraints
> made it hard to adapt existing QEMU approaches to this particular problem.
> 1) The size must be known before a specific architecture / board brings
>up it's PA memory map.  We need to set up an appropriate region.
> 2) Using links to the host bridges provides a clean command line interface
>but these links cannot be established until command line devices have
>been added.
>
> Hence the two step process used here of first establishing the size,
> interleave-ways and granularity + caching the ids of the host bridges
> and then, once available finding the actual host bridges so they can
> be used later to support interleave decoding.
>
> [1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / 
> specifications)
>
> Signed-off-by: Jonathan Cameron 

QAPI schema
Acked-by: Markus Armbruster 
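The two-step setup described in the commit message above (record sizes and host-bridge ids at command-line parsing time, then resolve the actual bridge objects once all devices exist) is a general pattern. Below is a small self-contained sketch of it; every name here (FixedWindow, Bridge, resolve_targets) is invented for illustration and is not QEMU's actual CXL code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Step 1: at option-parsing time only the size and the *names* of the
 * target host bridges are known, so just record them. */
typedef struct {
    uint64_t size;
    const char *target_ids[4];
    int num_targets;
    void *resolved[4];          /* filled in by step 2 */
} FixedWindow;

typedef struct {
    const char *id;
    void *obj;                  /* the actual bridge object */
} Bridge;

/* Step 2: once every device has been created, turn the recorded ids
 * into pointers to the real objects. */
static int resolve_targets(FixedWindow *w, const Bridge *bridges, int n)
{
    for (int i = 0; i < w->num_targets; i++) {
        w->resolved[i] = NULL;
        for (int j = 0; j < n; j++) {
            if (strcmp(w->target_ids[i], bridges[j].id) == 0) {
                w->resolved[i] = bridges[j].obj;
                break;
            }
        }
        if (w->resolved[i] == NULL) {
            return -1;          /* unknown target id */
        }
    }
    return 0;
}
```

The same shape appears in the series: the machine option caches sizes, interleave parameters and host-bridge ids early, and a later pass looks up the named bridges so they can be used for interleave decoding.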




Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Emanuele Giuseppe Esposito



On 05/04/2022 at 12:14, Kevin Wolf wrote:
> I think all of this is really relevant for Emanuele's work, which
> involves adding AIO_WAIT_WHILE() deep inside graph update functions. I
> fully expect that we would see very similar problems, and just stacking
> drain sections over drain sections that might happen to usually fix
> things, but aren't guaranteed to, doesn't look like a good solution.

Yes, I think at this point we all agreed to drop subtree_drain as a
replacement for the AioContext lock.

The alternative is what Paolo proposed in the other thread "Removal of
AioContext lock, bs->parents and ->children: proof of concept".
I am not sure which thread you replied to first :)

I think that proposal is not far from your idea, and it avoids
introducing or even using drains at all.
Not sure why you called it a "step backwards even from AioContext locks".

Emanuele




[RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

2022-04-05 Thread Daniel Henrique Barboza
There are a lot of Valgrind warnings about conditional jumps depending on
uninitialized values, like this one (taken from a pSeries guest):

 Conditional jump or move depends on uninitialised value(s)
at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544)
by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
(...)
  Uninitialised value was created by a stack allocation
at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538)

In this case, the alleged uninitialized value is the 'lpcr' variable that
is written by kvm_get_one_reg() and then used in an if clause:

int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
{
CPUState *cs = CPU(cpu);
uint64_t lpcr;

kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
/* Do we need to modify the LPCR? */
if (!!(lpcr & LPCR_LD) != !!enable) { <--- Valgrind warns here
(...)

A quick fix is to init the variable that kvm_get_one_reg() is going to
write ('lpcr' in the example above). Another idea is to convince
Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case
the ioctl() is successful. This will put some boilerplate in the
function but it will bring benefits to its other callers.

This patch uses the memcheck VALGRIND_MAKE_MEM_DEFINED() macro to mark
the 'target' variable as initialized if the ioctl is successful.

Cc: Paolo Bonzini 
Signed-off-by: Daniel Henrique Barboza 
---
 accel/kvm/kvm-all.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5f1377ca04..d9acba23c7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -53,6 +53,10 @@
 #include <sys/eventfd.h>
 #endif
 
+#ifdef CONFIG_VALGRIND_H
+#include <valgrind/memcheck.h>
+#endif
+
 /* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
  * need to use the real host PAGE_SIZE, as that's what KVM will use.
  */
@@ -3504,6 +3508,19 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
 if (r) {
 trace_kvm_failed_reg_get(id, strerror(-r));
 }
+
+#ifdef CONFIG_VALGRIND_H
+if (r == 0) {
+switch (id & KVM_REG_SIZE_MASK) {
+case KVM_REG_SIZE_U32:
+VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint32_t));
+break;
+case KVM_REG_SIZE_U64:
+VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint64_t));
+break;
+}
+}
+#endif
 return r;
 }
 
-- 
2.35.1




[RFC PATCH 0/1] add Valgrind hint in kvm_get_one_reg()

2022-04-05 Thread Daniel Henrique Barboza
Hi,

Valgrind is not happy with how we're using KVM functions that receive a
parameter by reference and write to it. This results in a lot of
complaints about uninitialized values when using these functions
because, by default, Valgrind doesn't know that the variable is being
initialized in the function.

This is the overall pattern that Valgrind does not like:

---
uint64_t val;
(...)
kvm_get_one_reg(cs, id, &val);

if (val) {...}
---

Valgrind complains that the 'if' clause is using an uninitialized
variable.

A quick fix is to init 'val' and be done with it. The drawback is that
every single caller of kvm_get_one_reg() must also be bothered with
initializing these variables to avoid the warnings.

David suggested in [1] that, instead, we should add a Valgrind hint in
the common KVM functions to fix this issue for everyone. This is what
this patch accomplishes. kvm_get_one_reg() has 20+ callers so I believe
this extra boilerplate is worth the benefits.

There are more common instances of KVM functions that Valgrind complains
about. If we're good with the approach taken here we can think about
adding this hint for more functions.


[1] https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg07351.html

Daniel Henrique Barboza (1):
  kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

 accel/kvm/kvm-all.c | 17 +
 1 file changed, 17 insertions(+)

-- 
2.35.1




Re: [PULL 0/2] target-arm queue

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 10:26, Peter Maydell  wrote:
>
> Couple of trivial fixes for rc3...
>
> The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5:
>
>   Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into 
> staging (2022-04-04 15:48:55 +0100)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20220405
>
> for you to fetch changes up to 80b952bb694a90f7e530d407b01066894e64a443:
>
>   docs/system/devices/can.rst: correct links to CTU CAN FD IP core 
> documentation. (2022-04-05 09:29:28 +0100)
>
> 
> target-arm queue:
>  * docs/system/devices/can.rst: correct links to CTU CAN FD IP core 
> documentation.
>  * xlnx-bbram: hw/nvram: Fix uninitialized Error *
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu

2022-04-05 Thread Damien Hedde




On 4/5/22 07:41, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Wed, Mar 16, 2022 at 10:54:55AM +0100, Damien Hedde wrote:

It takes an input file containing raw qmp commands (concatenated json
dicts) and send all commands one by one to a qmp server. When one
command fails, it exits.

As a convenience, it can also wrap the qemu process to avoid having
to start qemu in background. When wrapping qemu, the program returns
only when the qemu process terminates.

Signed-off-by: Damien Hedde 


[...]


I named it qmp-send as Daniel proposed; maybe qmp-test better matches
what I'm doing there?


'qmp-test' is a use case specific name. I think it is better to
name it based on functionality provided rather than anticipated
use case, since use cases evolve over time, hence 'qmp-send'.


Well, it doesn't just send, it also receives.

qmpcat, like netcat and socat?



Anyone against qmpcat?
--
Damien



Re: [PATCH v2] hw/ppc/ppc405_boards: Initialize g_autofree pointer

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 13:40, Bernhard Beschow  wrote:
>
> Resolves the only compiler warning when building a full QEMU under Arch Linux:
>
>   Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o
>   In file included from /usr/include/glib-2.0/glib.h:114,
>from qemu/include/glib-compat.h:32,
>from qemu/include/qemu/osdep.h:132,
>from ../src/hw/ppc/ppc405_boards.c:25:
>   ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’:
>   /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ 
> may be used uninitialized in this function [-Wmaybe-uninitialized]
>  28 |   g_free (*pp);
> |   ^~~~
>   ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here
> 265 | g_autofree char *filename;
> |  ^~~~
>
> Signed-off-by: Bernhard Beschow 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM



[PATCH v2] hw/ppc/ppc405_boards: Initialize g_autofree pointer

2022-04-05 Thread Bernhard Beschow
Resolves the only compiler warning when building a full QEMU under Arch Linux:

  Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o
  In file included from /usr/include/glib-2.0/glib.h:114,
   from qemu/include/glib-compat.h:32,
   from qemu/include/qemu/osdep.h:132,
   from ../src/hw/ppc/ppc405_boards.c:25:
  ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’:
  /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ may 
be used uninitialized in this function [-Wmaybe-uninitialized]
 28 |   g_free (*pp);
|   ^~~~
  ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here
265 | g_autofree char *filename;
|  ^~~~

Signed-off-by: Bernhard Beschow 
---
 hw/ppc/ppc405_boards.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 7e1a4ac955..3bed7002d2 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -262,13 +262,13 @@ static void ref405ep_init(MachineState *machine)
 /* allocate and load BIOS */
 if (machine->firmware) {
 MemoryRegion *bios = g_new(MemoryRegion, 1);
-g_autofree char *filename;
+g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
+   machine->firmware);
 long bios_size;
 
 memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE,
                               &error_fatal);
 
-filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware);
 if (!filename) {
 error_report("Could not find firmware '%s'", machine->firmware);
 exit(1);
-- 
2.35.1




Re: [PATCH] hw/ppc/ppc405_boards: Initialize g_autofree pointer

2022-04-05 Thread Bernhard Beschow
On 5 April 2022 12:00:19 UTC, Peter Maydell wrote:
>On Tue, 5 Apr 2022 at 12:32, Bernhard Beschow  wrote:
>>
>> Resolves the only compiler warning when building a full QEMU under Arch 
>> Linux:
>>
>>   Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o
>>   In file included from /usr/include/glib-2.0/glib.h:114,
>>from qemu/include/glib-compat.h:32,
>>from qemu/include/qemu/osdep.h:132,
>>from ../src/hw/ppc/ppc405_boards.c:25:
>>   ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’:
>>   /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ 
>> may be used uninitialized in this function [-Wmaybe-uninitialized]
>>  28 |   g_free (*pp);
>> |   ^~~~
>>   ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here
>> 265 | g_autofree char *filename;
>> |  ^~~~
>>
>> Signed-off-by: Bernhard Beschow 
>> ---
>>  hw/ppc/ppc405_boards.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
>> index 7e1a4ac955..326353ea25 100644
>> --- a/hw/ppc/ppc405_boards.c
>> +++ b/hw/ppc/ppc405_boards.c
>> @@ -262,7 +262,7 @@ static void ref405ep_init(MachineState *machine)
>>  /* allocate and load BIOS */
>>  if (machine->firmware) {
>>  MemoryRegion *bios = g_new(MemoryRegion, 1);
>> -g_autofree char *filename;
>> +g_autofree char *filename = NULL;
>>  long bios_size;
>>
>>  memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE,
>
>The compiler's wrong here, because there's no way to get to the free
>without passing through the actual initialization:

Yep. It breaks compilation with -Werror, though, which is useful for 
development.

>
>filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware);
>
>I think I would prefer a fix which hoisted that up to the declaration,
>rather than setting it to NULL and then unconditionally overwriting that
>(which some future compiler version might notice and warn about):
>
> g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
>machine->firmware);

Ack - I prefer that solution and I'll submit v2.

I'm often confused as to when to use RAII in QEMU and when not to.

Best regards,
Bernhard

>
>thanks
>-- PMM




[PATCH] [PATCH RFC v3] Implements Backend Program conventions for vhost-user-scsi

2022-04-05 Thread Sakshi Kaushik
Signed-off-by: Sakshi Kaushik 
---
 contrib/vhost-user-scsi/vhost-user-scsi.c | 76 +++
 1 file changed, 51 insertions(+), 25 deletions(-)

diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c 
b/contrib/vhost-user-scsi/vhost-user-scsi.c
index 4f6e3e2a24..74ec44d190 100644
--- a/contrib/vhost-user-scsi/vhost-user-scsi.c
+++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
@@ -351,34 +351,58 @@ fail:
 
 /** vhost-user-scsi **/
 
+int opt_fdnum = -1;
+char *opt_socket_path;
+gboolean opt_print_caps;
+char *iscsi_uri;
+
+static GOptionEntry entries[] = {
+{ "print-capabilities", 'c', 0, G_OPTION_ARG_NONE, &opt_print_caps,
+  "Print capabilities", NULL },
+{ "fd", 'f', 0, G_OPTION_ARG_INT, &opt_fdnum,
+  "Use inherited fd socket", "FDNUM" },
+{ "iscsi_uri", 'i', 0, G_OPTION_ARG_FILENAME, &iscsi_uri,
+  "iSCSI URI for lun 0", "URI" },
+{ "socket-path", 's', 0, G_OPTION_ARG_FILENAME, &opt_socket_path,
+  "Use UNIX socket path", "PATH" }
+};
+
 int main(int argc, char **argv)
 {
 VusDev *vdev_scsi = NULL;
-char *unix_fn = NULL;
-char *iscsi_uri = NULL;
-int lsock = -1, csock = -1, opt, err = EXIT_SUCCESS;
-
-while ((opt = getopt(argc, argv, "u:i:")) != -1) {
-switch (opt) {
-case 'h':
-goto help;
-case 'u':
-unix_fn = g_strdup(optarg);
-break;
-case 'i':
-iscsi_uri = g_strdup(optarg);
-break;
-default:
-goto help;
-}
+int lsock = -1, csock = -1, err = EXIT_SUCCESS;
+
+GError *error = NULL;
+GOptionContext *context;
+
+context = g_option_context_new(NULL);
+g_option_context_add_main_entries(context, entries, NULL);
+if (!g_option_context_parse(context, &argc, &argv, &error)) {
+g_printerr("Option parsing failed: %s\n", error->message);
+exit(EXIT_FAILURE);
 }
-if (!unix_fn || !iscsi_uri) {
+
+if (opt_print_caps) {
+g_print("{\n");
+g_print("  \"type\": \"scsi\",\n");
+g_print("}\n");
+goto out;
+}
+
+if (!opt_socket_path || !iscsi_uri) {
 goto help;
 }
 
-lsock = unix_sock_new(unix_fn);
-if (lsock < 0) {
-goto err;
+if (opt_socket_path) {
+lsock = unix_sock_new(opt_socket_path);
+if (lsock < 0) {
+exit(EXIT_FAILURE);
+}
+} else if (opt_fdnum < 0) {
+g_print("%s\n", g_option_context_get_help(context, true, NULL));
+exit(EXIT_FAILURE);
+} else {
+lsock = opt_fdnum;
 }
 
 csock = accept(lsock, NULL, NULL);
@@ -408,7 +432,7 @@ out:
 if (vdev_scsi) {
 g_main_loop_unref(vdev_scsi->loop);
 g_free(vdev_scsi);
-unlink(unix_fn);
+unlink(opt_socket_path);
 }
 if (csock >= 0) {
 close(csock);
@@ -416,7 +440,7 @@ out:
 if (lsock >= 0) {
 close(lsock);
 }
-g_free(unix_fn);
+g_free(opt_socket_path);
 g_free(iscsi_uri);
 
 return err;
@@ -426,10 +450,12 @@ err:
 goto out;
 
 help:
-fprintf(stderr, "Usage: %s [ -u unix_sock_path -i iscsi_uri ] | [ -h ]\n",
+fprintf(stderr, "Usage: %s [ -s socket-path -i iscsi_uri -f fd -p 
print-capabilities ] | [ -h ]\n",
 argv[0]);
-fprintf(stderr, "  -u path to unix socket\n");
+fprintf(stderr, "  -s path to unix socket\n");
 fprintf(stderr, "  -i iscsi uri for lun 0\n");
+fprintf(stderr, "  -f fd, file-descriptor\n");
+fprintf(stderr, "  -p denotes print-capabilities\n");
 fprintf(stderr, "  -h print help and quit\n");
 
 goto err;
-- 
2.17.1




Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Vladimir Sementsov-Ogievskiy

05.04.2022 13:14, Kevin Wolf wrote:

On 24.03.2022 at 13:57, Hanna Reitz wrote:

When the stream block job cuts out the nodes between top and base in
stream_prepare(), it does not drain the subtree manually; it fetches the
base node, and tries to insert it as the top node's backing node with
bdrv_set_backing_hd().  bdrv_set_backing_hd() however will drain, and so
the actual base node might change (because the base node is actually not
part of the stream job) before the old base node passed to
bdrv_set_backing_hd() is installed.

This has two implications:

First, the stream job does not keep a strong reference to the base node.
Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
because some other block job is drained to finish), we will get a
use-after-free.  We should keep a strong reference to that node.

Second, even with such a strong reference, the problem remains that the
base node might change before bdrv_set_backing_hd() actually runs and as
a result the wrong base node is installed.

Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
case, which has five nodes, and simultaneously streams from the middle
node to the top node, and commits the middle node down to the base node.
As it is, this will sometimes crash, namely when we encounter the
above-described use-after-free.

Taking a strong reference to the base node, we no longer get a crash,
but the resulting block graph is less than ideal: The expected result is
obviously that all middle nodes are cut out and the base node is the
immediate backing child of the top node.  However, if stream_prepare()
takes a strong reference to its base node (the middle node), and then
the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
that middle node, the stream job will just reinstall it again.

Therefore, we need to keep the whole subtree drained in
stream_prepare()


That doesn't sound right. I think in reality it's "if we take the really
big hammer and drain the whole subtree, then the bit that we really need
usually happens to be covered, too".

When you have a long backing chain and merge the two topmost overlays
with streaming, then it's none of the stream job's business whether
there is I/O going on for the base image way down the chain. Subtree
drains do much more than they should in this case.

At the same time they probably do too little, because what you're
describing you're protecting against is not I/O, but graph modifications
done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the
backing file. The callback could be invoked by I/O on an entirely
different subgraph (maybe if the other thing is a mirror job) or it
could be a BH or anything else really. bdrv_drain_all() would increase
your chances, but I'm not sure if even that would be guaranteed to be
enough - because it's really another instance of abusing drain for
locking, we're not really interested in the _I/O_ of the node.


so that the graph modification it performs is effectively atomic,
i.e. that the base node it fetches is still the base node when
bdrv_set_backing_hd() sets it as the top node's backing node.


I think the way to keep graph modifications atomic is avoid polling in
the middle. Not even running any callbacks is a lot safer than trying to
make sure there can't be undesired callbacks that want to run.

So probably adding drain (or anything else that polls) in
bdrv_set_backing_hd() was a bad idea. It could assert that the parent
node is drained, but it should be the caller's responsibility to do so.

What streaming completion should look like is probably something like
this:

 1. Drain above_base, this also drains all parents up to the top node
(needed because in-flight I/O using an edge that is removed isn't
going to end well)

 2. Without any polling involved:
 a. Find base (it can't change without polling)
 b. Update top->backing to point to base

 3. End of drain.

You don't have to keep extra references or deal with surprise removals
of nodes because the whole thing is atomic when you don't poll. Other
threads can't interfere either because graph modification requires the
BQL.

There is no reason to keep base drained because its I/O doesn't
interfere with the incoming edge that we're changing.

I think all of this is really relevant for Emanuele's work, which
involves adding AIO_WAIT_WHILE() deep inside graph update functions. I
fully expect that we would see very similar problems, and just stacking
drain sections over drain sections that might happen to usually fix
things, but aren't guaranteed to, doesn't look like a good solution.



Thanks Kevin! I have already run out of arguments in the battle against using 
subtree-drains to isolate graph modification operations from each other in 
different threads on the mailing list :)

(Note also, that the top-most version of this patch is "[PATCH v2] block/stream: 
Drain subtree around graph change")


About 

Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Hanna Reitz

On 05.04.22 13:47, Hanna Reitz wrote:

On 05.04.22 12:14, Kevin Wolf wrote:


[...]


At the same time they probably do too little, because what you're
describing you're protecting against is not I/O, but graph modifications
done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the
backing file. The callback could be invoked by I/O on an entirely
different subgraph (maybe if the other thing is a mirror job) or it
could be a BH or anything else really. bdrv_drain_all() would increase
your chances, but I'm not sure if even that would be guaranteed to be
enough - because it's really another instance of abusing drain for
locking, we're not really interested in the _I/O_ of the node.


[...]

I’m not sure what you’re arguing for, so I can only assume. Perhaps 
you’re arguing for reverting this patch, which I wouldn’t want to do, 
because at least it fixes the one known use-after-free case. Perhaps 
you’re arguing that we need something better, and then I completely agree.


Perhaps I should also note that what actually fixes the use-after-free 
is the bdrv_ref()/unref() pair.  The drained section is just there to 
ensure that the graph is actually correct (i.e. if a concurrently 
finishing job removes @base before the stream job’s 
bdrv_set_backing_hd() can set it as the top node’s backing node, that we 
won’t reinstate this @base that the other job just removed).  So even if 
this does too little, at least there won’t be a use-after-free.


OTOH, if it does much too much, we can drop the drain and keep the 
ref/unref.  I don’t want to have a release with a use-after-free that I 
know of, but I’d be fine if the block graph is “just” outdated.


Hanna




Re: [PATCH] hw/ppc/ppc405_boards: Initialize g_autofree pointer

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 12:32, Bernhard Beschow  wrote:
>
> Resolves the only compiler warning when building a full QEMU under Arch Linux:
>
>   Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o
>   In file included from /usr/include/glib-2.0/glib.h:114,
>from qemu/include/glib-compat.h:32,
>from qemu/include/qemu/osdep.h:132,
>from ../src/hw/ppc/ppc405_boards.c:25:
>   ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’:
>   /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ 
> may be used uninitialized in this function [-Wmaybe-uninitialized]
>  28 |   g_free (*pp);
> |   ^~~~
>   ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here
> 265 | g_autofree char *filename;
> |  ^~~~
>
> Signed-off-by: Bernhard Beschow 
> ---
>  hw/ppc/ppc405_boards.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
> index 7e1a4ac955..326353ea25 100644
> --- a/hw/ppc/ppc405_boards.c
> +++ b/hw/ppc/ppc405_boards.c
> @@ -262,7 +262,7 @@ static void ref405ep_init(MachineState *machine)
>  /* allocate and load BIOS */
>  if (machine->firmware) {
>  MemoryRegion *bios = g_new(MemoryRegion, 1);
> -g_autofree char *filename;
> +g_autofree char *filename = NULL;
>  long bios_size;
>
>  memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE,

The compiler's wrong here, because there's no way to get to the free
without passing through the actual initialization:

filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware);

I think I would prefer a fix which hoisted that up to the declaration,
rather than setting it to NULL and then unconditionally overwriting that
(which some future compiler version might notice and warn about):

 g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
machine->firmware);

thanks
-- PMM



Re: [PATCH] ui/cursor: fix integer overflow in cursor_alloc (CVE-2022-4206)

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 11:50, Mauro Matteo Cascella  wrote:
>
> Prevent potential integer overflow by limiting 'width' and 'height' to
> 512x512. Also change 'datasize' type to size_t. Refer to security
> advisory https://starlabs.sg/advisories/22-4206/ for more information.
>
> Fixes: CVE-2022-4206
> Signed-off-by: Mauro Matteo Cascella 

> diff --git a/ui/cursor.c b/ui/cursor.c
> index 1d62ddd4d0..7cfb08a030 100644
> --- a/ui/cursor.c
> +++ b/ui/cursor.c
> @@ -46,6 +46,13 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[])
>
>  /* parse pixel data */
>  c = cursor_alloc(width, height);
> +
> +if (!c) {
> +fprintf(stderr, "%s: cursor %ux%u alloc error\n",
> +__func__, width, height);
> +return NULL;

Side note, we should probably clean up the error handling in
this file to not be "print to stderr" at some point...

> +}
> +
>  for (pixel = 0, y = 0; y < height; y++, line++) {
>  for (x = 0; x < height; x++, pixel++) {
>  idx = xpm[line][x];
> @@ -91,7 +98,10 @@ QEMUCursor *cursor_builtin_left_ptr(void)
>  QEMUCursor *cursor_alloc(int width, int height)
>  {
>  QEMUCursor *c;
> -int datasize = width * height * sizeof(uint32_t);
> +size_t datasize = width * height * sizeof(uint32_t);
> +
> +if (width > 512 || height > 512)
> +return NULL;

Coding style requires braces on if statements.

thanks
-- PMM



Re: [PATCH] block/stream: Drain subtree around graph change

2022-04-05 Thread Hanna Reitz

On 05.04.22 12:14, Kevin Wolf wrote:

On 24.03.2022 at 13:57, Hanna Reitz wrote:

When the stream block job cuts out the nodes between top and base in
stream_prepare(), it does not drain the subtree manually; it fetches the
base node, and tries to insert it as the top node's backing node with
bdrv_set_backing_hd().  bdrv_set_backing_hd() however will drain, and so
the actual base node might change (because the base node is actually not
part of the stream job) before the old base node passed to
bdrv_set_backing_hd() is installed.

This has two implications:

First, the stream job does not keep a strong reference to the base node.
Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
because some other block job is drained to finish), we will get a
use-after-free.  We should keep a strong reference to that node.

Second, even with such a strong reference, the problem remains that the
base node might change before bdrv_set_backing_hd() actually runs and as
a result the wrong base node is installed.

Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
case, which has five nodes, and simultaneously streams from the middle
node to the top node, and commits the middle node down to the base node.
As it is, this will sometimes crash, namely when we encounter the
above-described use-after-free.

Taking a strong reference to the base node, we no longer get a crash,
but the resulting block graph is less than ideal: The expected result is
obviously that all middle nodes are cut out and the base node is the
immediate backing child of the top node.  However, if stream_prepare()
takes a strong reference to its base node (the middle node), and then
the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
that middle node, the stream job will just reinstall it again.

Therefore, we need to keep the whole subtree drained in
stream_prepare()

That doesn't sound right. I think in reality it's "if we take the really
big hammer and drain the whole subtree, then the bit that we really need
usually happens to be covered, too".

When you have a long backing chain and merge the two topmost overlays
with streaming, then it's none of the stream job's business whether
there is I/O going on for the base image way down the chain. Subtree
drains do much more than they should in this case.


Yes, see the discussion I had with Vladimir.  He convinced me that this 
can’t be an indefinite solution, but that we need locking for graph 
changes that’s separate from draining, because (1) those are different 
things, and (2) changing the graph should influence I/O as little as 
possible.


I found this the best solution to fix a known case of a use-after-free 
for 7.1, though.



At the same time they probably do too little, because what you're
describing you're protecting against is not I/O, but graph modifications
done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the
backing file. The callback could be invoked by I/O on an entirely
different subgraph (maybe if the other thing is a mirror job) or it
could be a BH or anything else really. bdrv_drain_all() would increase
your chances, but I'm not sure if even that would be guaranteed to be
enough - because it's really another instance of abusing drain for
locking, we're not really interested in the _I/O_ of the node.


The most common instances of graph modification I see are QMP and block 
jobs finishing.  The former will not be deterred by draining, and we do 
know of one instance where that is a problem (see the bdrv_next() 
discussion).  Generally, it isn’t though.  (If it is, this case here 
won’t be the only thing that breaks.)


As for the latter, most block jobs are parents of the nodes they touch 
(stream is one notable exception with how it handles its base, and 
changing that did indeed cause us headache before), and so will at least 
be paused when a drain occurs on a node they touch.  Since pausing 
doesn’t affect jobs that have exited their main loop, there might be 
some problem with concurrent jobs that are also finished but yielding, 
but I couldn’t find such a case.


I’m not sure what you’re arguing for, so I can only assume.  Perhaps 
you’re arguing for reverting this patch, which I wouldn’t want to do, 
because at least it fixes the one known use-after-free case. Perhaps 
you’re arguing that we need something better, and then I completely agree.



so that the graph modification it performs is effectively atomic,
i.e. that the base node it fetches is still the base node when
bdrv_set_backing_hd() sets it as the top node's backing node.

I think the way to keep graph modifications atomic is avoid polling in
the middle. Not even running any callbacks is a lot safer than trying to
make sure there can't be undesired callbacks that want to run.

So probably adding drain (or anything else that polls) in
bdrv_set_backing_hd() was a bad idea. It could assert that the parent
node is drained, but it should be the caller's responsibility to do so.

Re: [PULL 0/3] Misc changes for 2022-04-05

2022-04-05 Thread Peter Maydell
On Tue, 5 Apr 2022 at 10:25, Paolo Bonzini  wrote:
>
> The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5:
>
>   Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into 
> staging (2022-04-04 15:48:55 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 776a6a32b4982a68d3b7a77cbfaae6c2b363a0b8:
>
>   docs/system/i386: Add measurement calculation details to 
> amd-memory-encryption (2022-04-05 10:42:06 +0200)
>
> 
> * fix vss-win32 compilation with clang++
>
> * update Coverity model
>
> * add measurement calculation to amd-memory-encryption docs
>
> 

Hi; this tag doesn't match what your pullreq cover letter claims
it is -- it is pointing at 267b85d4e3d15, not 776a6a32b49, and
it has way more than 3 patches in it.

thanks
-- PMM


