date:20230620

Re: [RFC PATCH 6/9] ui/gtk: Add a new parameter to assign connectors/monitors to GFX VCs

2023-06-20 Thread Markus Armbruster

Dongwon Kim  writes:

> From: Vivek Kasireddy 
>
> The new parameter named "connector" can be used to assign physical
> monitors/connectors to individual GFX VCs such that when the monitor
> is connected or hotplugged, the associated GTK window would be
> moved to it. If the monitor is disconnected or unplugged, the
> associated GTK window would be hidden and a relevant disconnect
> event would be sent to the Guest.
>
> Usage: -device virtio-gpu-pci,max_outputs=2,blob=true,...
>-display gtk,gl=on,connectors.0=eDP-1,connectors.1=DP-1.
>
> Cc: Gerd Hoffmann 
> Cc: Daniel P. Berrangé 
> Cc: Markus Armbruster 
> Cc: Philippe Mathieu-Daudé 
> Cc: Marc-André Lureau 
> Signed-off-by: Vivek Kasireddy 
> Signed-off-by: Dongwon Kim 

[...]

> --- a/qapi/ui.json
> +++ b/qapi/ui.json
> @@ -1315,13 +1315,22 @@
>  # @show-menubar: Display the main window menubar.  Defaults to "on".
>  # (Since 8.0)
>  #
> +# @connectors:  List of physical monitor/connector names where the GTK
> +#   windows containing the respective graphics virtual consoles
> +#   (VCs) are to be placed. If a mapping exists for a VC, it
> +#   will be moved to that specific monitor or else it would
> +#   not be displayed anywhere and would appear disconnected
> +#   to the guest.
> +#   (Since 8.1)

Please format like

   # @connectors: List of physical monitor/connector names where the GTK
   # windows containing the respective graphics virtual consoles
   # (VCs) are to be placed.  If a mapping exists for a VC, it will
   # be moved to that specific monitor or else it would not be
   # displayed anywhere and would appear disconnected to the guest.
   # (Since 8.1)

to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
to conform to current conventions).

The meaning of @connectors is less than clear.  The phrase "If a mapping
exists for a VC" suggests it is a mapping of sorts.  "List of physical
monitor/connector names" indicates it maps to physical monitor /
connector name.  What does it map from?  VC number?  How are VCs
numbered?  Is it the same number we use in QOM /backend/console[NUM]?

Using a list for the mapping means the mapping must be dense, e.g. I
can't map #0 and #2 but not #1.  Is this what we want?

The sentence "If a mapping exists" has a dangling else
ambiguity of sorts.  I can interpret it as

    If a mapping exists for a VC:
        the window will be moved to that specific monitor
    or else it would not be displayed anywhere and would appear ...

or as

    If a mapping exists for a VC:
        the window will be moved to that specific monitor
        or else it would not be displayed anywhere and would appear ...

I think we have three cases:

0. No mapping provided

1. Mapping provided, and the named monitor / connector exists

2. Mapping provided, and the named monitor / connector does not exist

We can go from case 1 to 2 (disconnect) and vice versa (connect) at any
time.

Please spell out behavior for each case, and for the transitions between
case 1 and 2.
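
The three cases above, and the connect/disconnect transitions between cases 1 and 2, can be written down as a small classifier. This is only a sketch for discussion: the enum, the function name, and passing the monitor's presence as a boolean are assumptions for illustration, not code from the patch. It also shows why a plain list forces a dense mapping: leaving a VC in the middle unmapped needs some placeholder, represented here by NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the patch does not define any of these. */
typedef enum {
    VC_NO_MAPPING,       /* case 0: no connector given for this VC */
    VC_MAPPED_PRESENT,   /* case 1: connector named and monitor exists */
    VC_MAPPED_ABSENT,    /* case 2: connector named, monitor missing */
} DemoVcCase;

/*
 * connectors[] is indexed by VC number.  A NULL entry stands for
 * "no mapping", which a plain list of strings cannot express.
 */
static DemoVcCase demo_classify_vc(const char *const *connectors,
                                   size_t n_connectors, size_t vc,
                                   bool monitor_present)
{
    if (vc >= n_connectors || connectors[vc] == NULL) {
        return VC_NO_MAPPING;
    }
    return monitor_present ? VC_MAPPED_PRESENT : VC_MAPPED_ABSENT;
}
```

A hotplug event for a mapped VC then only flips monitor_present, moving the VC between VC_MAPPED_PRESENT and VC_MAPPED_ABSENT; what the window and the guest should see in each state is what the documentation still needs to spell out.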

> +#
>  # Since: 2.12
>  ##
>  { 'struct'  : 'DisplayGTK',
>'data': { '*grab-on-hover' : 'bool',
>  '*zoom-to-fit'   : 'bool',
>  '*show-tabs' : 'bool',
> -'*show-menubar'  : 'bool'  } }
> +'*show-menubar'  : 'bool',
> +'*connectors': ['str'] } }
>  
>  ##
>  # @DisplayEGLHeadless:

[...]

Request for Assistance: Adding I2C Support in QEMU for Raspberry Pi (BCM2835 Peripherals)

2023-06-20 Thread Shivam

Hi,

I hope this email finds you well. I am reaching out to seek guidance and
assistance regarding a project I am working on involving the addition of
I2C support in QEMU for the Raspberry Pi, specifically targeting the
BCM2835 peripherals.

I have been studying the BCM2835 datasheet to familiarize myself with the
I2C device registers and their functionality. I have started implementing
the I2C controller for the BCM2835, but I am unsure how to integrate it
with the BCM2835 SoC (bcm2835_peripheral.c).

I am attaching bcm2835_i2c.c, which contains a basic template for the BSC0
controller.


Thanks & Regards
Shivam Vijay


bcm2835_i2c.c
Description: Binary data

Re: [PATCH v2 03/10] target/i386: TCG supports RDSEED

2023-06-20 Thread Paolo Bonzini

Il mar 20 giu 2023, 18:24 Richard Henderson 
ha scritto:

> On 6/20/23 17:16, Paolo Bonzini wrote:
> > TCG implements RDSEED, and in fact uses qcrypto_random_bytes which is
> > secure enough to match hardware behavior.  Expose it to guests.
> >
> > Reviewed-by: Richard Henderson 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   target/i386/cpu.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index ff3dcd02dcb..fc4246223d4 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -657,11 +657,10 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
> vendor1,
> > CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX
> | \
> > CPUID_7_0_EBX_PCOMMIT | CPUID_7_0_EBX_CLFLUSHOPT |
>   \
> > CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_MPX |
> CPUID_7_0_EBX_FSGSBASE | \
> > -  CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_AVX2)
> > +  CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_AVX2 |
> CPUID_7_0_EBX_RDSEED)
> > /* missing:
> > CPUID_7_0_EBX_HLE
> > -  CPUID_7_0_EBX_INVPCID, CPUID_7_0_EBX_RTM,
> > -  CPUID_7_0_EBX_RDSEED */
> > +  CPUID_7_0_EBX_INVPCID, CPUID_7_0_EBX_RTM */
> >   #define TCG_7_0_ECX_FEATURES (CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
> \
> > /* CPUID_7_0_ECX_OSPKE is dynamic */ \
> > CPUID_7_0_ECX_LA57 | CPUID_7_0_ECX_PKS | CPUID_7_0_ECX_VAES)
>
> Still missing the check for CPUID_7_0_EBX_RDSEED at the RDSEED insn.
>

Sorry, I missed that remark. It's more of a separate patch IMO, I will add
it.

Paolo


> r~
>
>
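
The missing check Richard refers to is a gate on the guest's CPUID bit at the point where the instruction is decoded. A minimal sketch of that pattern follows. The function name and the way the feature word is passed in are assumptions for illustration, not the TCG decoder's actual code; the bit position (CPUID leaf 7, subleaf 0, EBX bit 18) is the architectural one for RDSEED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RDSEED is reported in CPUID leaf 7, subleaf 0, EBX bit 18. */
#define DEMO_CPUID_7_0_EBX_RDSEED (1U << 18)

/*
 * Hypothetical helper: returns true if the guest CPU model advertises
 * RDSEED.  A decoder would raise #UD when this returns false.
 */
static bool demo_rdseed_allowed(uint32_t guest_cpuid_7_0_ebx)
{
    return (guest_cpuid_7_0_ebx & DEMO_CPUID_7_0_EBX_RDSEED) != 0;
}
```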

Re: [PATCH 1/4] target/ppc: Fix instruction loading endianness in alignment interrupt

2023-06-20 Thread Nicholas Piggin

On Wed Jun 21, 2023 at 2:54 AM AEST, Nicholas Piggin wrote:
> On Wed Jun 21, 2023 at 12:26 AM AEST, BALATON Zoltan wrote:
> > On Tue, 20 Jun 2023, Nicholas Piggin wrote:
> > > powerpc ifetch endianness depends on MSR[LE] so it has to byteswap
> > > after cpu_ldl_code(). This corrects DSISR bits in alignment
> > > interrupts when running in little endian mode.
> > >
> > > Reviewed-by: Fabiano Rosas 
> > > Signed-off-by: Nicholas Piggin 
> > > ---
> > > target/ppc/excp_helper.c | 22 +-
> > > 1 file changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > > index 12d8a7257b..a2801f6e6b 100644
> > > --- a/target/ppc/excp_helper.c
> > > +++ b/target/ppc/excp_helper.c
> > > @@ -133,6 +133,26 @@ static void dump_hcall(CPUPPCState *env)
> > >   env->nip);
> > > }
> > >
> > > +#ifdef CONFIG_TCG
> > > +/* Return true iff byteswap is needed to load instruction */
> > > +static inline bool insn_need_byteswap(CPUArchState *env)
> > > +{
> > > +/* SYSTEM builds TARGET_BIG_ENDIAN. Need to swap when MSR[LE] is set 
> > > */
> > > +return !!(env->msr & ((target_ulong)1 << MSR_LE));
> > > +}
> >
> > Don't other places typically use FIELD_EX64 to test for msr bits now? If 
>
> Yeah I should use that, good point. There's at least another case in
> that file that doesn't use it but I probably added that too :/

This incremental patch fixes it:

Thanks,
Nick

---
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index ff7166adf9..cfdbeb0da5 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -138,7 +138,7 @@ static void dump_hcall(CPUPPCState *env)
 static inline bool insn_need_byteswap(CPUArchState *env)
 {
 /* SYSTEM builds TARGET_BIG_ENDIAN. Need to swap when MSR[LE] is set */
-return !!(env->msr & ((target_ulong)1 << MSR_LE));
+return FIELD_EX64(env->msr, MSR, LE);
 }
 
 static uint32_t ppc_ldl_code(CPUArchState *env, hwaddr addr)
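
For readers comparing the two forms: both return whether MSR[LE] is set, and the field-extraction form simply names the bit instead of open-coding the shift. The standalone sketch below uses stand-in helpers (demo_extract64, DEMO_MSR_LE and demo_bswap32 are illustrative names, not QEMU's macros) and assumes LE is bit 0 of the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MSR_LE 0   /* assumed bit position of MSR[LE] */

/* Extract 'length' bits starting at bit 'start' (length in 1..64). */
static uint64_t demo_extract64(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Open-coded test, in the style of the original patch. */
static bool demo_need_byteswap_shift(uint64_t msr)
{
    return !!(msr & ((uint64_t)1 << DEMO_MSR_LE));
}

/* Field-extraction test, in the style of the incremental fix. */
static bool demo_need_byteswap_field(uint64_t msr)
{
    return demo_extract64(msr, DEMO_MSR_LE, 1) != 0;
}

/* Reverse the four bytes of an instruction word. */
static uint32_t demo_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Return the instruction word in the order the CPU would decode it. */
static uint32_t demo_fixup_insn(uint64_t msr, uint32_t raw)
{
    return demo_need_byteswap_field(msr) ? demo_bswap32(raw) : raw;
}
```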

Re: [PATCH] Revert "cputlb: Restrict SavedIOTLB to system emulation"

2023-06-20 Thread Richard Henderson


On 6/20/23 19:57, Peter Maydell wrote:

This reverts commit d7ee93e24359703debf4137f4cc632563aa4e8d1.

That commit tries to make a field in the CPUState struct not be
present when CONFIG_USER_ONLY is set.  Unfortunately, you can't
conditionally omit fields in structs like this based on ifdefs that
are set per-target.  If you try it, then code in files compiled
per-target (where CONFIG_USER_ONLY is or can be set) will disagree
about the struct layout with files that are compiled once-only (where
this kind of ifdef is never set).

This manifests specifically in 'make check-tcg' failing, because code
in cpus-common.c that sets up the CPUState::cpu_index field puts it
at a different offset from the code in plugins/core.c in
qemu_plugin_vcpu_init_hook() which reads the cpu_index field.  The
latter then hits an assert because from its point of view every
thread has a 0 cpu_index. There might be other weird behaviour too.

Mostly we catch this kind of bug because the CONFIG_whatever is
listed in include/exec/poison.h and so the reference to it in
build-once source files will then cause a compiler error.
Unfortunately CONFIG_USER_ONLY is an exception to that: we have some
places where we use it in "safe" ways in headers that will be seen by
once-only source files (e.g.  ifdeffing out function prototypes) and
it would be a lot of refactoring to be able to get to a position
where we could poison it.  This leaves us in a "you have to be
careful to walk around the bear trap" situation...

Fixes: d7ee93e243597 ("cputlb: Restrict SavedIOTLB to system emulation")
Signed-off-by: Peter Maydell 
---
  include/hw/core/cpu.h | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)


Ho hum, thanks.  I'll apply this directly.

r~
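
The layout disagreement described in the commit message can be reproduced in miniature. The sketch below declares the same logical struct twice, once with and once without a conditionally compiled field, the way a build-once file and a per-target user-only file would each see it. All names are made up for illustration; the real CPUState is far larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* View from a build-once file: the ifdef is never set, field present. */
struct DemoCPUStateBuildOnce {
    uint64_t saved_iotlb[4];   /* stand-in for the conditional field */
    int cpu_index;
};

/* View from a per-target user-only file: field compiled out. */
struct DemoCPUStateUserOnly {
    int cpu_index;
};

/* Offset at which each view expects to find cpu_index. */
static size_t demo_cpu_index_offset_build_once(void)
{
    return offsetof(struct DemoCPUStateBuildOnce, cpu_index);
}

static size_t demo_cpu_index_offset_user_only(void)
{
    return offsetof(struct DemoCPUStateUserOnly, cpu_index);
}
```

Code using one view to write cpu_index and the other view to read it touches different bytes of the same object, which matches the symptom of the reader seeing a cpu_index of 0 for every thread.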

Re: [PULL 49/52] exec/poison: Do not poison CONFIG_SOFTMMU

2023-06-20 Thread Richard Henderson


On 6/20/23 20:01, Peter Maydell wrote:

On Mon, 5 Jun 2023 at 21:23, Richard Henderson
 wrote:


If CONFIG_USER_ONLY is ok generically, so is CONFIG_SOFTMMU,
because they are exactly opposite.


This isn't quite right. CONFIG_USER_ONLY is theoretically
something we should poison, because it's unsafe in the general
case to use it in compiled-once source files. But in practice
we make quite a lot of use of it in "we know this specific
use of it is OK" situations, like ifdeffing out function
prototypes. So we'd like to poison it, but we can't poison
it without a huge amount of refactoring which isn't really
worth the effort.


Yes, a similar amount of refactoring would have been required within tcg/ to retain the 
poison of CONFIG_SOFTMMU.



So it's not a good model for "therefore it's OK not to poison
CONFIG_SOFTMMU" -- we should leave that poisoned if we can,
so we don't introduce either new buggy uses of CONFIG_SOFTMMU,
or new "we know this is safe" uses of it which will make
it difficult to put it back into the poison-list later...


My plan is to remove it as a define entirely.  But not this cycle.


r~

Re: [PATCH v4] hw/pci: enforce use of slot only slot 0 when devices have an upstream PCIE port

2023-06-20 Thread Michael S. Tsirkin

On Wed, Jun 21, 2023 at 08:09:55AM +0530, Ani Sinha wrote:
> 
> 
> > On 20-Jun-2023, at 4:13 PM, Michael S. Tsirkin  wrote:
> > 
> > On Tue, Jun 20, 2023 at 12:48:05PM +0530, Ani Sinha wrote:
> >> When a device has an upstream PCIE port, we can only use slot 0.
> > 
> > Actually, it's when device is plugged into a PCIE port.
> > So maybe:
> > 
> > PCI Express ports only have one slot, so
> > PCI Express devices can only be plugged into
> > slot 0 on a PCIE port
> > 
> >> Non-zero slots
> >> are invalid. This change ensures that we throw an error if the user
> >> tries to hotplug a device with an upstream PCIE port to a non-zero slot.
> > 
> > it also adds a comment explaining why function 0 must not exist
> > when function != 0 is added. or maybe split that part out.
> > 
> >> CC: jus...@redhat.com
> >> CC: imamm...@redhat.com
> >> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> >> Signed-off-by: Ani Sinha 
> >> ---
> >> hw/pci/pci.c | 18 ++
> >> 1 file changed, 18 insertions(+)
> >> 
> >> changelog:
> >> v2: addressed issue with multifunction pcie root ports. Should allow
> >> hotplug on functions other than function 0.
> >> v3: improved commit message.
> >> v4: improve commit message and code comments further. Some more
> >> improvements might come in v5. No claims made here that this is
> >> the final one :-)
> >> 
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index bf38905b7d..30ce6a78cb 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -64,6 +64,7 @@ bool pci_available = true;
> >> static char *pcibus_get_dev_path(DeviceState *dev);
> >> static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >> static void pcibus_reset(BusState *qbus);
> >> +static bool pcie_has_upstream_port(PCIDevice *dev);
> >> 
> >> static Property pci_props[] = {
> >> DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> >> @@ -1182,6 +1183,11 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> >> *pci_dev,
> >> } else if (dev->hotplugged &&
> >>!pci_is_vf(pci_dev) &&
> >>pci_get_function_0(pci_dev)) {
> >> +/*
> >> + * populating function 0 triggers a bus scan from the guest that
> >> + * exposes other non-zero functions. Hence we need to ensure that
> >> + * function 0 wasn't added yet.
> >> + */
> > 
> > Pls capitalize populating. Also, comments like this should come
> > before the logic they document, not after. By the way it doesn't
> > have to be a bus scan - I'd just say "a scan" - with ACPI
> > guest knows what was added and can just probe the device functions.
> > 
> >> error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
> >>" new func %s cannot be exposed to guest.",
> >>PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
> >> @@ -1189,6 +1195,18 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> >> *pci_dev,
> >>name);
> >> 
> >>return NULL;
> >> +} else if (dev->hotplugged &&
> > 
> > why hotplugged? Doesn't the same rule apply to all devices?
> > 
> >> +   !pci_is_vf(pci_dev) &&
> > 
> > Hmm. I think you copied it from here:
> >} else if (dev->hotplugged &&
> >   !pci_is_vf(pci_dev) &&
> >   pci_get_function_0(pci_dev)) {
> > 
> > it makes sense there because VFs are added later
> > after PF exists.
> 
> I thought PFs are handled only in the host OS and only VFs are
> passthrough into the guest?

This is emulated SRIOV. host and guest would be nested L2.

> I thought this check was because VFs have
> a different domain address separate from other non-vf devices in the
> guest PCI tree. 

Maybe take a look at the SRIOV spec then.

> > 
> > But here it makes no sense that I can see.
> > 
> > 
> >> +   pcie_has_upstream_port(pci_dev) && PCI_SLOT(devfn)) {
> >> +/*
> >> + * If the device has an upstream PCIE port, like a pcie root port,
> > 
> > no, a root port can not be an upstream port.
> > 
> > 
> >> + * we only support functions on slot 0.
> >> + */
> >> +error_setg(errp, "PCI: slot %d is not valid for %s,"
> >> +   " only functions on slot 0 is supported for devices"
> >> +   " with an upstream PCIE port.",
> > 
> > 
> > something like:
> > 
> >error_setg(errp, "PCI: slot %d is not valid for %s:"
> >   " PCI Express devices can only be plugged into slot 0")
> > 
> > and then you don't really need a comment.
> > 
> > 
> >> +   PCI_SLOT(devfn), name);
> >> +return NULL;
> >> }
> >> 
> >> pci_dev->devfn = devfn;
> >> -- 
> >> 2.39.1
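
For reference, the slot and function numbers discussed in this thread are both packed into the 8-bit devfn value: the upper five bits are the slot (device) number and the lower three bits are the function number. The sketch below restates that encoding and the proposed rule in standalone form. The macro and function names are illustrative, and the boolean parameter is an assumed stand-in for however the final patch decides that the device is plugged into a PCI Express port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* devfn layout: bits 7..3 = slot, bits 2..0 = function. */
#define DEMO_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical check: a device plugged into a PCI Express port may use
 * any function number, but only slot 0.
 */
static bool demo_devfn_valid(uint8_t devfn, bool plugged_into_pcie_port)
{
    return !plugged_into_pcie_port || DEMO_PCI_SLOT(devfn) == 0;
}
```

With this encoding, devfn 0x03 is slot 0 function 3 (accepted on a PCIe port), while devfn 0x08 is slot 1 function 0 (rejected).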

Re: [PATCH v4 1/1] hw/arm/sbsa-ref: use XHCI to replace EHCI

2023-06-20 Thread Yuquan Wang

On 2023-06-21 01:24,  Leif wrote:

> Leif, do you think we should bump the minor version here?
 
I think that makes sense, yes.
 
/
 Leif

Thanks for everyone's guidance.
I have one more question: which minor version should I bump to, 2 or 3?
I ask because Marcin’s latest patch (add ITS support in SBSA GIC,
https://lists.nongnu.org/archive/html/qemu-arm/2023-06/msg00709.html)
already increases the minor version to 2.


Many thanks
Yuquan

[PATCH v2] hw/pci: add comment explaining the reason for checking function 0 in hotplug

2023-06-20 Thread Ani Sinha

This change is cosmetic. A comment is added explaining why we need to check for
the availability of function 0 when we hotplug a device.

CC: m...@redhat.com
Signed-off-by: Ani Sinha 
---
 hw/pci/pci.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

changelog:
v2: moved comment location as per mst suggestion.

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index bf38905b7d..459c7123a8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1179,7 +1179,13 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev,
PCI_SLOT(devfn), PCI_FUNC(devfn), name,
bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
 return NULL;
-} else if (dev->hotplugged &&
+}
+/*
+ * Populating function 0 triggers a scan from the guest that
+ * exposes other non-zero functions. Hence we need to ensure that
+ * function 0 wasn't added yet.
+ */
+else if (dev->hotplugged &&
!pci_is_vf(pci_dev) &&
pci_get_function_0(pci_dev)) {
 error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
-- 
2.39.1

Re: [PATCH v4] hw/pci: enforce use of slot only slot 0 when devices have an upstream PCIE port

2023-06-20 Thread Ani Sinha




> On 20-Jun-2023, at 4:13 PM, Michael S. Tsirkin  wrote:
> 
> On Tue, Jun 20, 2023 at 12:48:05PM +0530, Ani Sinha wrote:
>> When a device has an upstream PCIE port, we can only use slot 0.
> 
> Actually, it's when device is plugged into a PCIE port.
> So maybe:
> 
>   PCI Express ports only have one slot, so
>   PCI Express devices can only be plugged into
>   slot 0 on a PCIE port
> 
>> Non-zero slots
>> are invalid. This change ensures that we throw an error if the user
>> tries to hotplug a device with an upstream PCIE port to a non-zero slot.
> 
> it also adds a comment explaining why function 0 must not exist
> when function != 0 is added. or maybe split that part out.
> 
>> CC: jus...@redhat.com
>> CC: imamm...@redhat.com
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
>> Signed-off-by: Ani Sinha 
>> ---
>> hw/pci/pci.c | 18 ++
>> 1 file changed, 18 insertions(+)
>> 
>> changelog:
>> v2: addressed issue with multifunction pcie root ports. Should allow
>> hotplug on functions other than function 0.
>> v3: improved commit message.
>> v4: improve commit message and code comments further. Some more
>> improvements might come in v5. No claims made here that this is
>> the final one :-)
>> 
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index bf38905b7d..30ce6a78cb 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -64,6 +64,7 @@ bool pci_available = true;
>> static char *pcibus_get_dev_path(DeviceState *dev);
>> static char *pcibus_get_fw_dev_path(DeviceState *dev);
>> static void pcibus_reset(BusState *qbus);
>> +static bool pcie_has_upstream_port(PCIDevice *dev);
>> 
>> static Property pci_props[] = {
>> DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>> @@ -1182,6 +1183,11 @@ static PCIDevice *do_pci_register_device(PCIDevice 
>> *pci_dev,
>> } else if (dev->hotplugged &&
>>!pci_is_vf(pci_dev) &&
>>pci_get_function_0(pci_dev)) {
>> +/*
>> + * populating function 0 triggers a bus scan from the guest that
>> + * exposes other non-zero functions. Hence we need to ensure that
>> + * function 0 wasn't added yet.
>> + */
> 
> Pls capitalize populating. Also, comments like this should come
> before the logic they document, not after. By the way it doesn't
> have to be a bus scan - I'd just say "a scan" - with ACPI
> guest knows what was added and can just probe the device functions.
> 
>> error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
>>" new func %s cannot be exposed to guest.",
>>PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
>> @@ -1189,6 +1195,18 @@ static PCIDevice *do_pci_register_device(PCIDevice 
>> *pci_dev,
>>name);
>> 
>>return NULL;
>> +} else if (dev->hotplugged &&
> 
> why hotplugged? Doesn't the same rule apply to all devices?
> 
>> +   !pci_is_vf(pci_dev) &&
> 
> Hmm. I think you copied it from here:
>} else if (dev->hotplugged &&
>   !pci_is_vf(pci_dev) &&
>   pci_get_function_0(pci_dev)) {
> 
> it makes sense there because VFs are added later
> after PF exists.

I thought PFs are handled only in the host OS and only VFs are passthrough into 
the guest? I thought this check was because VFs have a different domain address 
separate from other non-vf devices in the guest PCI tree. 

> 
> But here it makes no sense that I can see.
> 
> 
>> +   pcie_has_upstream_port(pci_dev) && PCI_SLOT(devfn)) {
>> +/*
>> + * If the device has an upstream PCIE port, like a pcie root port,
> 
> no, a root port can not be an upstream port.
> 
> 
>> + * we only support functions on slot 0.
>> + */
>> +error_setg(errp, "PCI: slot %d is not valid for %s,"
>> +   " only functions on slot 0 is supported for devices"
>> +   " with an upstream PCIE port.",
> 
> 
> something like:
> 
>error_setg(errp, "PCI: slot %d is not valid for %s:"
>   " PCI Express devices can only be plugged into slot 0")
> 
> and then you don't really need a comment.
> 
> 
>> +   PCI_SLOT(devfn), name);
>> +return NULL;
>> }
>> 
>> pci_dev->devfn = devfn;
>> -- 
>> 2.39.1

[PATCH RESEND v2 1/2] target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE

2023-06-20 Thread Dongli Zhang

The "perf stat" at the VM side still works even if we set "-cpu host,-pmu" in
the QEMU command line. That is, neither "-cpu host,-pmu" nor "-cpu EPYC"
can disable the pmu virtualization in an AMD environment.

We still see the message below at the VM kernel side ...

[0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like below.

[0.596381] Performance Events: PMU not available due to virtualization, 
using software events only.
[0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This is because the AMD pmu (v1) does not rely on cpuid to decide if the
pmu virtualization is supported.

We introduce a new property 'pmu-cap-disabled' for KVM accel to set
KVM_PMU_CAP_DISABLE if KVM_CAP_PMU_CAPABILITY is supported. Only x86 host
is supported because currently KVM uses KVM_CAP_PMU_CAPABILITY only for
x86.

Cc: Joe Jin 
Cc: Like Xu 
Signed-off-by: Dongli Zhang 
---
Changed since v1:
- In version 1 we did not introduce the new property. We ioctl
  KVM_PMU_CAP_DISABLE only before the creation of the 1st vcpu. We had
  introduced a helper function to do this job before creating the 1st
  KVM vcpu in v1.

 accel/kvm/kvm-all.c  |  1 +
 include/sysemu/kvm_int.h |  1 +
 qemu-options.hx  |  7 ++
 target/i386/kvm/kvm.c| 46 
 4 files changed, 55 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7679f397ae..238098e991 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3763,6 +3763,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->xen_version = 0;
 s->xen_gnttab_max_frames = 64;
 s->xen_evtchn_max_pirq = 256;
+s->pmu_cap_disabled = false;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 511b42bde5..cbbe08ec54 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -123,6 +123,7 @@ struct KVMState
 uint32_t xen_caps;
 uint16_t xen_gnttab_max_frames;
 uint16_t xen_evtchn_max_pirq;
+bool pmu_cap_disabled;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/qemu-options.hx b/qemu-options.hx
index b57489d7ca..1976c0ca3e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
 "tb-size=n (TCG translation block cache size)\n"
 "dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
 "notify-vmexit=run|internal-error|disable,notify-window=n 
(enable notify VM exit and set notify window, x86 only)\n"
+"pmu-cap-disabled=true|false (disable 
KVM_CAP_PMU_CAPABILITY, x86 only, default false)\n"
 "thread=single|multi (enable multi-threaded TCG)\n", 
QEMU_ARCH_ALL)
 SRST
 ``-accel name[,prop=value[,...]]``
@@ -254,6 +255,12 @@ SRST
 open up for a specified of time (i.e. notify-window).
 Default: notify-vmexit=run,notify-window=0.
 
+``pmu-cap-disabled=true|false``
+When the KVM accelerator is used, it controls whether to disable the
+KVM_CAP_PMU_CAPABILITY via KVM_PMU_CAP_DISABLE. When disabled, the
+PMU virtualization is disabled at the KVM module side. This is for
+x86 host only.
+
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index de531842f6..bf4136fa1b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -129,6 +129,7 @@ static bool has_msr_ucode_rev;
 static bool has_msr_vmx_procbased_ctls2;
 static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
+static bool has_pmu_cap;
 
 static uint32_t has_architectural_pmu_version;
 static uint32_t num_architectural_pmu_gp_counters;
@@ -2767,6 +2768,23 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+has_pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
+if (s->pmu_cap_disabled) {
+if (has_pmu_cap) {
+ret = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0,
+KVM_PMU_CAP_DISABLE);
+if (ret < 0) {
+s->pmu_cap_disabled = false;
+error_report("kvm: Failed to disable pmu cap: %s",
+ strerror(-ret));
+}
+} else {
+s->pmu_cap_disabled = false;
+error_report("kvm: KVM_CAP_PMU_CAPABILITY is not supported");
+}
+}
+
 return 0;
 }
 
@@ -5951,6 +5969,28 @@ static void kvm_arch_set_xen_evtchn_max_pirq(Object 
*obj, Visitor *v,
 s->xen_evtchn_max_pirq = value;
 }
 
+static void kvm_set_pmu_cap_disabled(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+bool pmu_cap_disabled;
+Error *error = NULL;
+
+if (s->fd != -1) {
+error_setg(errp, "Cannot set properties after

[PATCH RESEND v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs

2023-06-20 Thread Dongli Zhang

This is to rebase the patchset on top of the most recent QEMU.

This patchset is to fix two svm pmu virtualization bugs, x86 only.

version 1:
https://lore.kernel.org/all/20221119122901.2469-1-dongli.zh...@oracle.com/

1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.

Neither "-cpu EPYC" nor "-cpu host,-pmu" disables the pmu
virtualization. The VM Linux side still shows the message below ...

[0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like below.

[0.596381] Performance Events: PMU not available due to virtualization, 
using software events only.
[0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

The 1st patch has introduced a new x86 only accel/kvm property
"pmu-cap-disabled=true" to disable the pmu virtualization via
KVM_PMU_CAP_DISABLE.

I considered 'KVM_X86_SET_MSR_FILTER' initially before patchset v1.
Both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctls; I
finally used the latter because it is easier to use.


2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
at the KVM side may inject random unwanted/unknown NMIs to the VM.

The svm pmu registers are not reset during QEMU system_reset.

(1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

(2). Although the x86_cpu_reset() resets many registers to zero, the
kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
result, some pmu events are still enabled at the KVM side.

(3). The KVM pmc_speculative_in_use() always returns true so that the events
will not be reclaimed. The kvm_pmc->perf_event is still active.

(4). After the reboot, the VM kernel reports below error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, 
complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 
c0010200 is 530076)

(5). In a worse case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The 2nd patch is to fix the issue by resetting AMD pmu registers as well as
Intel registers.


This patchset does not cover PerfMonV2, until the below patchset is merged
into the KVM side. It has been queued to kvm-x86 next by Sean.

[PATCH v7 00/12] KVM: x86: Add AMD Guest PerfMonV2 PMU support
https://lore.kernel.org/all/168609790857.1417369.13152633386083458084.b4...@google.com/


Dongli Zhang (2):
  target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
  target/i386/kvm: get and put AMD pmu registers

 accel/kvm/kvm-all.c  |   1 +
 include/sysemu/kvm_int.h |   1 +
 qemu-options.hx  |   7 +++
 target/i386/cpu.h|   5 ++
 target/i386/kvm/kvm.c| 129 +-
 5 files changed, 141 insertions(+), 2 deletions(-)


Thank you very much!

Dongli Zhang

[PATCH RESEND v2 2/2] target/i386/kvm: get and put AMD pmu registers

2023-06-20 Thread Dongli Zhang

The QEMU side calls kvm_get_msrs() to save the pmu registers from the KVM
side to QEMU, and calls kvm_put_msrs() to store the pmu registers back to
the KVM side.

However, only the Intel gp/fixed/global pmu registers are involved. There
is not any implementation for AMD pmu registers. The
'has_architectural_pmu_version' and 'num_architectural_pmu_gp_counters' are
calculated at kvm_arch_init_vcpu() via cpuid(0xa). This does not work for
AMD. Before AMD PerfMonV2, the number of gp registers is decided based on
the CPU version.

This patch is to add the support for AMD version=1 pmu, to get and put AMD
pmu registers. Otherwise, there will be a bug:

1. The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

2. Although the x86_cpu_reset() resets many registers to zero, the
kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
result, some pmu events are still enabled at the KVM side.

3. The KVM pmc_speculative_in_use() always returns true so that the events
will not be reclaimed. The kvm_pmc->perf_event is still active.

4. After the reboot, the VM kernel reports below error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, 
complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 
c0010200 is 530076)

5. In a worse case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The patch is to fix the issue by resetting AMD pmu registers during the
reset.
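
The counter-count and MSR-layout selection described above can be sketched in standalone form. The helper names below are illustrative, not the patch's; the logic assumes, as in the description, that a 6-counter CPU uses the interleaved F15H control/counter MSR pairs and a 4-counter CPU uses the legacy K7 banks. The MSR base addresses are the ones documented for AMD processors.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MSR_K7_EVNTSEL0    0xc0010000u
#define DEMO_MSR_K7_PERFCTR0    0xc0010004u
#define DEMO_MSR_F15H_PERF_CTL0 0xc0010200u
#define DEMO_MSR_F15H_PERF_CTR0 0xc0010201u

/* CPU family: base family, plus extended family when base is 0xf. */
static uint32_t demo_amd_family(uint32_t cpuid_version)
{
    uint32_t family = (cpuid_version >> 8) & 0xf;

    if (family == 0xf) {
        family += (cpuid_version >> 20) & 0xff;
    }
    return family;
}

/* Event-select MSR for counter i, given the number of counters. */
static uint32_t demo_amd_sel_msr(uint32_t num_counters, uint32_t i)
{
    return num_counters == 6 ? DEMO_MSR_F15H_PERF_CTL0 + 2 * i
                             : DEMO_MSR_K7_EVNTSEL0 + i;
}

/* Counter MSR for counter i, given the number of counters. */
static uint32_t demo_amd_ctr_msr(uint32_t num_counters, uint32_t i)
{
    return num_counters == 6 ? DEMO_MSR_F15H_PERF_CTR0 + 2 * i
                             : DEMO_MSR_K7_PERFCTR0 + i;
}
```

The stride of 2 in the 6-counter layout reflects that control and counter registers alternate in that MSR range, whereas the legacy layout keeps four selects followed by four counters.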

Cc: Joe Jin 
Cc: Like Xu 
Signed-off-by: Dongli Zhang 
---
 target/i386/cpu.h |  5 +++
 target/i386/kvm/kvm.c | 83 +--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cd047e0410..b8ba72e87a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -471,6 +471,11 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL   0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0 0xc0010000
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0  0xc0010200
+#define MSR_F15H_PERF_CTR0  0xc0010201
+
 #define MSR_MC0_CTL 0x400
 #define MSR_MC0_STATUS  0x401
 #define MSR_MC0_ADDR0x402
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index bf4136fa1b..a0f7273dad 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2084,6 +2084,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
+/*
+ * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+ * disable the AMD pmu virtualization.
+ *
+ * If KVM_CAP_PMU_CAPABILITY is supported, kvm_state->pmu_cap_disabled
+ * indicates the KVM side has already disabled the pmu virtualization.
+ */
+if (IS_AMD_CPU(env) && !cs->kvm_state->pmu_cap_disabled) {
+int64_t family;
+
+family = (env->cpuid_version >> 8) & 0xf;
+if (family == 0xf) {
+family += (env->cpuid_version >> 20) & 0xff;
+}
+
+if (family >= 6) {
+has_architectural_pmu_version = 1;
+
+if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE) {
+num_architectural_pmu_gp_counters = 6;
+} else {
+num_architectural_pmu_gp_counters = 4;
+}
+}
+}
+
 cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
 
 for (i = 0x80000000; i <= limit; i++) {
@@ -3438,7 +3464,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
 }
 
-if (has_architectural_pmu_version > 0) {
+if (has_architectural_pmu_version > 0 && IS_INTEL_CPU(env)) {
 if (has_architectural_pmu_version > 1) {
 /* Stop the counter.  */
 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -3469,6 +3495,26 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_global_ctrl);
 }
 }
+
+if (has_architectural_pmu_version > 0 && IS_AMD_CPU(env)) {
+uint32_t sel_base = MSR_K7_EVNTSEL0;
+uint32_t ctr_base = MSR_K7_PERFCTR0;
+uint32_t step = 1;
+
+if (num_architectural_pmu_gp_counters == 6) {
+sel_base = MSR_F15H_PERF_CTL0;
+ctr_base = MSR_F15H_PERF_CTR0;
+step = 2;
+}
+
+for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+kvm_msr_entry_add(cpu, ctr_base + i * step,
+  env->msr_gp_counters[i]);
+kvm_msr_entry_add(cpu, sel_base + i * step,
+

Re: [PATCH] target/riscv/cpu.c: fix veyron-v1 CPU properties

2023-06-20 Thread LIU Zhiwei




On 2023/6/20 23:24, Daniel Henrique Barboza wrote:

Commit 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from
riscv_cpu_init()") removed code that was enabling mmu, pmp, ext_ifencei
and ext_icsr from riscv_cpu_init(), the init() function of
TYPE_RISCV_CPU, parent type of all RISC-V CPUs. This was done to force
CPUs to explicitly enable all extensions and features they require,
without any 'magic values' inherited from the parent type.

This commit failed to make appropriate changes in the 'veyron-v1' CPU,
added earlier by commit e1d084a8524a. The result is that the veyron-v1
CPU has ext_ifencei, ext_icsr and pmp set to 'false', which is not the
case.

The reason why it took this long to notice (thanks LIU Zhiwei for
reporting it) is because Linux doesn't mind 'ifencei' and 'icsr' being
absent in the 'riscv,isa' DT, implying that they're both present if the
'i' extension is enabled. OpenSBI also doesn't error out or warn about
the lack of 'pmp'; it'll just not protect memory pages.

Fix it by setting them to 'true' in rv64_veyron_v1_cpu_init() like
7f0bdfb5bfc2 already did with other CPUs.
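
The effect described above can be illustrated with a small model. It is a sketch only: struct cpu_cfg and the three functions are hypothetical stand-ins and not the QEMU types; the only thing taken from the commit message is that the parent init no longer enables anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the RISC-V CPU config. */
struct cpu_cfg {
    bool mmu;
    bool pmp;
    bool ext_ifencei;
    bool ext_icsr;
};

/* Parent init after 7f0bdfb5bfc2: nothing is enabled implicitly. */
static void parent_init(struct cpu_cfg *cfg)
{
    memset(cfg, 0, sizeof(*cfg));
}

/* veyron-v1 before the fix: only mmu is set explicitly. */
static void veyron_v1_init_before_fix(struct cpu_cfg *cfg)
{
    parent_init(cfg);
    cfg->mmu = true;
}

/* veyron-v1 after the fix: every required feature is set explicitly. */
static void veyron_v1_init_after_fix(struct cpu_cfg *cfg)
{
    parent_init(cfg);
    cfg->mmu = true;
    cfg->ext_ifencei = true;
    cfg->ext_icsr = true;
    cfg->pmp = true;
}
```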

Reported-by: LIU Zhiwei 
Fixes: 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()")
Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 881bddf393..707f62b592 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -444,6 +444,9 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
  
  /* Enable ISA extensions */

  cpu->cfg.mmu = true;
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.pmp = true;


Reviewed-by: LIU Zhiwei 

Zhiwei


  cpu->cfg.ext_icbom = true;
  cpu->cfg.cbom_blocksize = 64;
  cpu->cfg.cboz_blocksize = 64;

RE: [PATCH v2] vfio/migration: Refactor and fix print of "Migration disabled"

2023-06-20 Thread Duan, Zhenzhong



>-----Original Message-----
>From: Joao Martins 
>Sent: Tuesday, June 20, 2023 5:28 PM
>To: Duan, Zhenzhong ; Avihai Horon
>; qemu-devel@nongnu.org
>Cc: alex.william...@redhat.com; c...@redhat.com; Peng, Chao P
>
>Subject: Re: [PATCH v2] vfio/migration: Refactor and fix print of "Migration
>disabled"
>
>On 20/06/2023 09:55, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins 
>>> Sent: Tuesday, June 20, 2023 4:23 PM
>>> To: Duan, Zhenzhong ; Avihai Horon
>>> ; qemu-devel@nongnu.org
>>> Cc: alex.william...@redhat.com; c...@redhat.com; Peng, Chao P
>>> 
>>> Subject: Re: [PATCH v2] vfio/migration: Refactor and fix print of
>>> "Migration disabled"
>>>
>>> On 20/06/2023 04:04, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Avihai Horon 
>>>>> Sent: Monday, June 19, 2023 7:14 PM
>>>> ...
>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>> 6b58dddb8859..bc51aa765cb8 100644
>>>>>> --- a/hw/vfio/migration.c
>>>>>> +++ b/hw/vfio/migration.c
>>>>>> @@ -632,42 +632,41 @@ int64_t vfio_mig_bytes_transferred(void)
>>>>>>   return bytes_transferred;
>>>>>>   }
>>>>>>
>>>>>> -int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>>>>> +bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>>>>>>   {
>>>>>> -int ret = -ENOTSUP;
>>>>>> +int ret;
>>>>>>
>>>>>> -if (!vbasedev->enable_migration) {
>>>>>> +if (!vbasedev->enable_migration || vfio_migration_init(vbasedev)) {
>>>>>> +error_setg(&vbasedev->migration_blocker,
>>>>>> +   "VFIO device doesn't support migration");
>>>>>>   goto add_blocker;
>>>>>>   }
>>>>>>
>>>>>> -ret = vfio_migration_init(vbasedev);
>>>>>> -if (ret) {
>>>>>> +if (vfio_block_multiple_devices_migration(errp)) {
>>>>>> +error_setg(&vbasedev->migration_blocker,
>>>>>> +   "Migration is currently not supported with multiple "
>>>>>> +   "VFIO devices");
>>>>>>   goto add_blocker;
>>>>>>   }
>>>>>
>>>>> Here you are tying the multiple devices blocker to a specific device.
>>>>> This could be problematic:
>>>>> If you add vfio device #1 and then device #2 then the blocker will
>>>>> be added to device #2. If you then remove device #1, migration will
>>>>> still be blocked although it shouldn't.
>>>>>
>>>>> I think we should keep it as a global blocker and not a per-device
>>>>> blocker.
>>>>
>>>> Thanks for point out, you are right, seems I need to restore the
>>>> multiple devices part code.
>>>
>>> It's the same for vIOMMU migration blocker. You could have a machine
>>> with default_bus_bypass_iommu=on and add device #1 with
>>> bypass_iommu=off attribute in pxb PCI port, and then add device #2
>>> with bypass_iommu=on. The blocker is added because of device #1 but
>>> then it will remain blocked if you remove it.
>>
>> Right, thanks for point out, I'm thinking about changing
>> vfio_viommu_preset() to check corresponding device's address space rather
>> than all vfio devices'.
>>
>> Let me know if you prefer to restore vIOMMU blocker as global too,
>> then I'll not try with my idea furtherly.
>
>The vIOMMU migration blocker doesn't need to be global, true, as it doesn't
>care about others' address space -- if each device has its own blocker, then
>as long as that one device's blocker is removed the VM should become
>migratable again (but atm we will be blocked by the multi device blocker
>anyway). This should consolidate things into a single migration blocker and
>avoid the special path. I am not entirely sure if the refactor will give
>*that* much gain but that's probably because I haven't seen the final result.

OK, let me write one for discussion, with a per-device vIOMMU blocker, a global
multiple-devices blocker, etc.
>
>IIUC the problem with this patch is that you remove what unblocks the
>migration, and I guess that need to stay there for the global case.
Yes.

Thanks
Zhenzhong
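
A minimal model of the point raised in this thread about the multiple-devices blocker, assuming nothing about the real VFIO code: struct mig_state, device_add() and device_del() are hypothetical names. The blocker is derived from the global device count, so it is dropped again when the count falls back to one, no matter which device was removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one global blocker derived from the device count. */
struct mig_state {
    unsigned vfio_devices;
    bool multi_dev_blocker;
};

/* Re-evaluate the global blocker after every add or remove. */
static void update_blocker(struct mig_state *s)
{
    s->multi_dev_blocker = s->vfio_devices > 1;
}

static void device_add(struct mig_state *s)
{
    s->vfio_devices++;
    update_blocker(s);
}

static void device_del(struct mig_state *s)
{
    assert(s->vfio_devices > 0);
    s->vfio_devices--;
    update_blocker(s);
}
```

With a per-device blocker attached to device #2 instead, removing device #1 would leave the blocker in place, which is the problem described in the quoted review.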

[RFC PATCH 6/9] ui/gtk: Add a new parameter to assign connectors/monitors to GFX VCs

2023-06-20 Thread Dongwon Kim

From: Vivek Kasireddy 

The new parameter named "connectors" can be used to assign physical
monitors/connectors to individual GFX VCs such that when the monitor
is connected or hotplugged, the associated GTK window would be
moved to it. If the monitor is disconnected or unplugged, the
associated GTK window would be hidden and a relevant disconnect
event would be sent to the Guest.

Usage: -device virtio-gpu-pci,max_outputs=2,blob=true,...
   -display gtk,gl=on,connectors.0=eDP-1,connectors.1=DP-1.
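
The lookup implied by this usage can be sketched as follows. This is an illustration under stated assumptions and not the code in ui/gtk.c: vc_find_monitor() is a hypothetical helper, and the monitor names are assumed to be available as plain strings. A VC without a mapping, or whose connector is not currently connected, yields -1, which corresponds to the "hidden / disconnected" case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index into monitors[] of the monitor
 * assigned to the given VC, or -1 if the VC has no mapping or the mapped
 * connector is not among the currently connected monitors.
 */
static int vc_find_monitor(const char *const *connectors, size_t n_connectors,
                           const char *const *monitors, size_t n_monitors,
                           size_t vc_index)
{
    size_t i;

    if (vc_index >= n_connectors || !connectors[vc_index]) {
        return -1;
    }
    for (i = 0; i < n_monitors; i++) {
        if (strcmp(monitors[i], connectors[vc_index]) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With connectors.0=eDP-1,connectors.1=DP-1 and only HDMI-1 and eDP-1 connected, VC 0 resolves to the eDP-1 monitor while VC 1 stays unassigned until DP-1 is hotplugged.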

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Signed-off-by: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 include/ui/gtk.h |   1 +
 qapi/ui.json |  11 +-
 qemu-options.hx  |   5 +-
 ui/gtk.c | 271 +++
 4 files changed, 263 insertions(+), 25 deletions(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index e7c4726aad..189817ab88 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -84,6 +84,7 @@ typedef struct VirtualConsole {
 GtkWidget *menu_item;
 GtkWidget *tab_item;
 GtkWidget *focus;
+GdkMonitor *monitor;
 VirtualConsoleType type;
 union {
 VirtualGfxConsole gfx;
diff --git a/qapi/ui.json b/qapi/ui.json
index 2755395483..0f5ab35bae 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1315,13 +1315,22 @@
 # @show-menubar: Display the main window menubar.  Defaults to "on".
 # (Since 8.0)
 #
+# @connectors:  List of physical monitor/connector names where the GTK
+#   windows containing the respective graphics virtual consoles
+#   (VCs) are to be placed. If a mapping exists for a VC, it
+#   will be moved to that specific monitor or else it would
+#   not be displayed anywhere and would appear disconnected
+#   to the guest.
+#   (Since 8.1)
+#
 # Since: 2.12
 ##
 { 'struct'  : 'DisplayGTK',
   'data': { '*grab-on-hover' : 'bool',
 '*zoom-to-fit'   : 'bool',
 '*show-tabs' : 'bool',
-'*show-menubar'  : 'bool'  } }
+'*show-menubar'  : 'bool',
+'*connectors': ['str'] } }
 
 ##
 # @DisplayEGLHeadless:
diff --git a/qemu-options.hx b/qemu-options.hx
index b57489d7ca..2eb0d6a129 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2044,7 +2044,7 @@ DEF("display", HAS_ARG, QEMU_OPTION_display,
 #if defined(CONFIG_GTK)
 "-display gtk[,full-screen=on|off][,gl=on|off][,grab-on-hover=on|off]\n"
 "
[,show-tabs=on|off][,show-cursor=on|off][,window-close=on|off]\n"
-"[,show-menubar=on|off]\n"
+"[,show-menubar=on|off][,connectors.=]\n"
 #endif
 #if defined(CONFIG_VNC)
 "-display vnc=[,]\n"
@@ -2139,6 +2139,9 @@ SRST
 
 ``show-menubar=on|off`` : Display the main window menubar, defaults to "on"
 
+``connectors=`` : VC to connector mappings to display the VC
+ window on a specific monitor
+
 ``curses[,charset=]``
 Display video output via curses. For graphics device models
 which support a text mode, QEMU can display this output using a
diff --git a/ui/gtk.c b/ui/gtk.c
index d8323a3a9d..f4c71454a3 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -38,6 +38,7 @@
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "qemu/option.h"
 
 #include "ui/console.h"
 #include "ui/gtk.h"
@@ -741,6 +742,39 @@ static void gd_set_ui_size(VirtualConsole *vc, gint width, gint height)
 dpy_set_ui_info(vc->gfx.dcl.con, &info, true);
 }
 
+static void gd_ui_hide(VirtualConsole *vc)
+{
+QemuUIInfo info;
+
+vc->gfx.visible = false;
+info = *dpy_get_ui_info(vc->gfx.dcl.con);
+info.width = 0;
+info.height = 0;
+dpy_set_ui_info(vc->gfx.dcl.con, &info, false);
+}
+
+static void gd_ui_show(VirtualConsole *vc)
+{
+QemuUIInfo info;
+GtkDisplayState *s = vc->s;
+GdkWindow *window = gtk_widget_get_window(vc->gfx.drawing_area);
+
+info = *dpy_get_ui_info(vc->gfx.dcl.con);
+info.width = gdk_window_get_width(window);
+info.height = gdk_window_get_height(window);
+dpy_set_ui_info(vc->gfx.dcl.con, &info, false);
+
+if (gd_is_grab_active(s)) {
+gd_grab_keyboard(vc, "user-request-main-window");
+gd_grab_pointer(vc, "user-request-main-window");
+} else {
+gd_ungrab_keyboard(s);
+gd_ungrab_pointer(s);
+}
+
+vc->gfx.visible = true;
+}
+
 #if defined(CONFIG_OPENGL)
 
 static gboolean gd_render_event(GtkGLArea *area, GdkGLContext *context,
@@ -1352,12 +1386,10 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 GtkDisplayState *s = opaque;
 VirtualConsole *vc;
 GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
-GdkWindow *window;
 gint page;
 
 vc = gd_vc_find_current(s);
-vc->gfx.visible = false;
-gd_set_ui_size(vc, 0, 0);
+

[RFC PATCH 4/9] ui/gtk: Disable the scanout when a detached tab is closed

2023-06-20 Thread Dongwon Kim

From: Vivek Kasireddy 

When a detached tab window is closed, the underlying (EGL) context
is destroyed; therefore, disable the scanout which also destroys the
underlying framebuffer (id) and other objects. Also add calls to
make the context current in disable scanout and other missing places.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Signed-off-by: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk-egl.c | 3 +++
 ui/gtk-gl-area.c | 2 ++
 ui/gtk.c | 1 +
 3 files changed, 6 insertions(+)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 443873e266..aa22ebbd98 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -214,6 +214,9 @@ void gd_egl_scanout_disable(DisplayChangeListener *dcl)
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
+   vc->gfx.esurface, vc->gfx.ectx);
+
 vc->gfx.w = 0;
 vc->gfx.h = 0;
 gtk_egl_set_scanout_mode(vc, false);
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 68b16a5ff1..8228cc9f3f 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -270,6 +270,7 @@ void gd_gl_area_scanout_disable(DisplayChangeListener *dcl)
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 gtk_gl_area_set_scanout_mode(vc, false);
 }
 
@@ -282,6 +283,7 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
 return;
 }
 
+gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
 graphic_hw_gl_block(vc->gfx.dcl.con, true);
 vc->gfx.guest_fb.dmabuf->draw_submitted = true;
diff --git a/ui/gtk.c b/ui/gtk.c
index f9096aea14..90ecb8b82f 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1400,6 +1400,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 
 vc->gfx.visible = false;
 gd_set_ui_size(vc, 0, 0);
+dpy_gl_scanout_disable(vc->gfx.dcl.con);
 gtk_widget_set_sensitive(vc->menu_item, true);
 gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
 gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
-- 
2.34.1

[RFC PATCH 2/9] ui/gtk: set the ui size to 0 when invisible

2023-06-20 Thread Dongwon Kim

Disconnect guest displays by setting the UI size to 0 when the VC becomes
invisible. When the VC becomes visible again, the UI size is restored to
the current window size to reconnect the guest displays.
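
The intended behaviour can be modelled in a few lines. This is a sketch, not the GTK code: struct vc_model and the vc_*() functions are hypothetical, with ui_w/ui_h standing for the size reported to the guest and win_w/win_h for the size of the host window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one virtual console. */
struct vc_model {
    bool visible;
    int win_w, win_h;   /* current host window size */
    int ui_w, ui_h;     /* size reported to the guest; 0x0 = disconnected */
};

/* Hiding reports a 0x0 display, which the guest sees as a disconnect. */
static void vc_hide(struct vc_model *vc)
{
    vc->visible = false;
    vc->ui_w = 0;
    vc->ui_h = 0;
}

/* Showing reports the current window size again. */
static void vc_show(struct vc_model *vc)
{
    vc->ui_w = vc->win_w;
    vc->ui_h = vc->win_h;
    vc->visible = true;
}

/* Resize events only reach the guest while the VC is visible. */
static void vc_configure(struct vc_model *vc, int w, int h)
{
    vc->win_w = w;
    vc->win_h = h;
    if (vc->visible) {
        vc->ui_w = w;
        vc->ui_h = h;
    }
}
```

The check in vc_configure() mirrors the gd_configure() hunk in the diff below: without it, a resize event arriving while the VC is hidden would silently reconnect the guest display.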

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 84c50d835e..ff4a5c58ea 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1352,10 +1352,12 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 GtkDisplayState *s = opaque;
 VirtualConsole *vc;
 GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
+GdkWindow *window;
 gint page;
 
 vc = gd_vc_find_current(s);
 vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
 
 vc = gd_vc_find_by_menu(s);
 gtk_release_modifiers(s);
@@ -1363,6 +1365,9 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 page = gtk_notebook_page_num(nb, vc->tab_item);
 gtk_notebook_set_current_page(nb, page);
 gtk_widget_grab_focus(vc->focus);
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
 vc->gfx.visible = true;
 }
 }
@@ -1394,6 +1399,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 GtkDisplayState *s = vc->s;
 
 vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
 gtk_widget_set_sensitive(vc->menu_item, true);
 gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
 gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
@@ -1429,6 +1435,7 @@ static gboolean gd_win_grab(void *opaque)
 static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 {
 GtkDisplayState *s = opaque;
+GdkWindow *window;
 VirtualConsole *vc = gd_vc_find_current(s);
 
 if (vc->type == GD_VC_GFX &&
@@ -1467,6 +1474,10 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 gd_update_geometry_hints(vc);
 gd_update_caption(s);
 }
+
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
 vc->gfx.visible = true;
 }
 
@@ -1791,7 +1802,9 @@ static gboolean gd_configure(GtkWidget *widget,
 {
 VirtualConsole *vc = opaque;
 
-gd_set_ui_size(vc, cfg->width, cfg->height);
+if (vc->gfx.visible) {
+gd_set_ui_size(vc, cfg->width, cfg->height);
+}
 return FALSE;
 }
 
-- 
2.34.1

[RFC PATCH 1/9] ui/gtk: skip drawing guest scanout when associated VC is invisible

2023-06-20 Thread Dongwon Kim

Add a new flag, "visible", that records the visibility status of the gfx
console. The value of the flag determines whether the drawing surface should
be continuously updated upon scanout flush. The flag is set to 'true' when
the window bound to the VC is visible and to 'false' when the window is
inactivated or closed. While the VC is invisible, QEMU skips all draw events.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 include/ui/gtk.h |  1 +
 ui/gtk-egl.c |  8 
 ui/gtk-gl-area.c |  8 
 ui/gtk.c | 10 +-
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index ae0f53740d..e7c4726aad 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -57,6 +57,7 @@ typedef struct VirtualGfxConsole {
 bool y0_top;
 bool scanout_mode;
 bool has_dmabuf;
+bool visible;
 #endif
 } VirtualGfxConsole;
 
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 19130041bc..443873e266 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -247,6 +247,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 #ifdef CONFIG_GBM
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
vc->gfx.esurface, vc->gfx.ectx);
 
@@ -341,6 +345,10 @@ void gd_egl_flush(DisplayChangeListener *dcl,
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 GtkWidget *area = vc->gfx.drawing_area;
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
 graphic_hw_gl_block(vc->gfx.dcl.con, true);
 vc->gfx.guest_fb.dmabuf->draw_submitted = true;
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index c384a1516b..68b16a5ff1 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -278,6 +278,10 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
 graphic_hw_gl_block(vc->gfx.dcl.con, true);
 vc->gfx.guest_fb.dmabuf->draw_submitted = true;
@@ -291,6 +295,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
 #ifdef CONFIG_GBM
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 egl_dmabuf_import_texture(dmabuf);
 if (!dmabuf->texture) {
diff --git a/ui/gtk.c b/ui/gtk.c
index e50f950f2b..84c50d835e 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1350,15 +1350,20 @@ static void gd_menu_quit(GtkMenuItem *item, void *opaque)
 static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 {
 GtkDisplayState *s = opaque;
-VirtualConsole *vc = gd_vc_find_by_menu(s);
+VirtualConsole *vc;
 GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
 gint page;
 
+vc = gd_vc_find_current(s);
+vc->gfx.visible = false;
+
+vc = gd_vc_find_by_menu(s);
 gtk_release_modifiers(s);
 if (vc) {
 page = gtk_notebook_page_num(nb, vc->tab_item);
 gtk_notebook_set_current_page(nb, page);
 gtk_widget_grab_focus(vc->focus);
+vc->gfx.visible = true;
 }
 }
 
@@ -1388,6 +1393,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 VirtualConsole *vc = opaque;
 GtkDisplayState *s = vc->s;
 
+vc->gfx.visible = false;
 gtk_widget_set_sensitive(vc->menu_item, true);
 gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
 gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
@@ -1461,6 +1467,7 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 gd_update_geometry_hints(vc);
 gd_update_caption(s);
 }
+vc->gfx.visible = true;
 }
 
 static void gd_menu_show_menubar(GtkMenuItem *item, void *opaque)
@@ -2499,6 +2506,7 @@ static void gtk_display_init(DisplayState *ds, DisplayOptions *opts)
 #ifdef CONFIG_GTK_CLIPBOARD
 gd_clipboard_init(s);
 #endif /* CONFIG_GTK_CLIPBOARD */
+vc->gfx.visible = true;
 }
 
 static void early_gtk_display_init(DisplayOptions *opts)
-- 
2.34.1

[RFC PATCH 7/9] ui/gtk: unblock gl if draw submitted already or fence is not yet signaled

2023-06-20 Thread Dongwon Kim

Removing a monitor while a guest frame is still being processed could block
the guest (virtio-gpu) scanout pipeline. The pipeline must be flushed
manually to prevent a permanent lockup.
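
The unblock logic can be summarised with the following model. It is a sketch under assumptions: struct dmabuf_model, struct pipe_model and gl_flushed() are hypothetical, block_count stands in for the effect of graphic_hw_gl_block(), and closing the fence descriptor is only indicated by a comment. Unlike the patch, which tests fence_fd > 0, this sketch treats every non-negative descriptor as valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dmabuf and the blocked pipeline. */
struct dmabuf_model {
    int fence_fd;          /* -1 when no fence is pending */
    bool draw_submitted;   /* a draw was queued but not rendered yet */
};

struct pipe_model {
    struct dmabuf_model *dmabuf;
    int block_count;       /* > 0 while the guest pipeline is blocked */
};

static void gl_flushed(struct pipe_model *p)
{
    struct dmabuf_model *d = p->dmabuf;

    if (!d) {
        return;                    /* nothing in flight */
    }
    if (d->fence_fd >= 0) {
        /* Real code would remove the fd handler and close the fd here. */
        d->fence_fd = -1;
        p->block_count--;          /* unblock: the fence has been handled */
    } else if (d->draw_submitted) {
        /* Frame submitted but render event not scheduled: cancel it. */
        d->draw_submitted = false;
        p->block_count--;
    }
}
```

Calling gl_flushed() a second time is a no-op in this model, which is what makes it safe to invoke from the hide path as a forced cancel.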

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index f4c71454a3..e4ef1f7173 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -598,10 +598,21 @@ void gd_hw_gl_flushed(void *vcon)
 VirtualConsole *vc = vcon;
 QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
 
-qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
-close(dmabuf->fence_fd);
-dmabuf->fence_fd = -1;
-graphic_hw_gl_block(vc->gfx.dcl.con, false);
+if (!dmabuf) {
+return;
+}
+
+if (dmabuf->fence_fd > 0) {
+qemu_set_fd_handler(dmabuf->fence_fd, NULL, NULL, NULL);
+close(dmabuf->fence_fd);
+dmabuf->fence_fd = -1;
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+} else if (dmabuf->draw_submitted) {
+/* if called after a frame is submitted but render event
+ * is not scheduled yet, cancel submitted draw. */
+dmabuf->draw_submitted = false;
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
 }
 
 /** DisplayState Callbacks (opengl version) **/
@@ -742,6 +753,9 @@ static void gd_set_ui_size(VirtualConsole *vc, gint width, gint height)
 dpy_set_ui_info(vc->gfx.dcl.con, &info, true);
 }
 
+static gboolean gd_window_state_event(GtkWidget *widget, GdkEvent *event,
+  void *opaque);
+
 static void gd_ui_hide(VirtualConsole *vc)
 {
 QemuUIInfo info;
@@ -751,6 +765,8 @@ static void gd_ui_hide(VirtualConsole *vc)
 info.width = 0;
 info.height = 0;
 dpy_set_ui_info(vc->gfx.dcl.con, &info, false);
+/* forcefully cancel rendering sequence */
+gd_hw_gl_flushed(vc);
 }
 
 static void gd_ui_show(VirtualConsole *vc)
@@ -1460,11 +1476,6 @@ static gboolean gd_window_state_event(GtkWidget *widget, GdkEvent *event,
 
 if (event->window_state.new_window_state & GDK_WINDOW_STATE_ICONIFIED) {
 gd_ui_hide(vc);
-if (vc->gfx.guest_fb.dmabuf &&
-vc->gfx.guest_fb.dmabuf->draw_submitted) {
-vc->gfx.guest_fb.dmabuf->draw_submitted = false;
-graphic_hw_gl_block(vc->gfx.dcl.con, false);
-}
 } else {
 gd_ui_show(vc);
 }
-- 
2.34.1

[RFC PATCH 9/9] ui/gtk: skip refresh/rendering if VC is invisible

2023-06-20 Thread Dongwon Kim

Skip all drawing activity while the VC is invisible, because it cannot complete.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk-egl.c | 4 
 ui/gtk-gl-area.c | 4 
 ui/gtk.c | 4 
 3 files changed, 12 insertions(+)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 8eae2b4b1f..63bfad1f06 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -148,6 +148,10 @@ void gd_egl_refresh(DisplayChangeListener *dcl)
 gd_update_monitor_refresh_rate(
 vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (!vc->gfx.esurface) {
 gd_egl_init(vc);
 if (!vc->gfx.esurface) {
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 8228cc9f3f..8d01addb3b 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -123,6 +123,10 @@ void gd_gl_area_refresh(DisplayChangeListener *dcl)
 
 gd_update_monitor_refresh_rate(vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (!vc->gfx.gls) {
 if (!gtk_widget_get_realized(vc->gfx.drawing_area)) {
 return;
diff --git a/ui/gtk.c b/ui/gtk.c
index e4ef1f7173..0bc35b64e0 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -847,6 +847,10 @@ static gboolean gd_draw_event(GtkWidget *widget, cairo_t *cr, void *opaque)
 
 #if defined(CONFIG_OPENGL)
 if (vc->gfx.gls) {
+if (!vc->gfx.visible) {
+return TRUE;
+}
+
 if (gtk_use_gl_area) {
 /* invoke render callback please */
 return FALSE;
-- 
2.34.1

[RFC PATCH 0/9] ui: guest displays multiple connectors support and hotplug in

2023-06-20 Thread Dongwon Kim

(This series replaces two patch series,
[PATCH v2 0/6] ui/gtk: Add a new parameter to assign connectors/monitors (v2)
https://lists.gnu.org/archive/html/qemu-devel/2022-11/msg03098.html and
[RFC PATCH 0/3] ui/gtk: no render event when vc is invisible
https://lists.gnu.org/archive/html/qemu-devel/2022-10/msg04926.html) 

There is a need (expressed by several customers/users) to assign
ownership of one or more physical monitors/connectors to individual
Guests such that there is a clear notion of which Guest's contents
are being displayed on any given monitor. Given that there is always
a Display Server/Compositor running on the Host, monitor ownership
can never truly be transferred to Guests. However, the closest we
can come to realizing this concept is to request the Host compositor
to fullscreen the Guest's windows on individual monitors. This way,
it would become possible to have 4 different Guests' windows be
displayed on 4 different monitors or a single Guest's windows (or
virtual consoles/outputs) be displayed on 4 monitors or any such
combination.

This patch series attempts to accomplish this by introducing a new
parameter named "connectors" to assign the monitors to the GFX VCs
associated with a Guest. If the assigned monitor is not connected,
then the Guest's window would not be displayed anywhere similar to
how a Host compositor would behave when the connectors are not
connected. Once the monitor is hotplugged, the Guest's window(s)
would be positioned on the assigned monitor.

The first 4 patches (0001-0004) are prep work: they add a flag called
'visible' to the VC that indicates the visibility of the associated GTK
window, and skip drawing operations for invisible VCs.

0005 and 0006 are the actual implementation of the monitor/connector mapping
for the guests. 0007 through 0009 are additional code changes that prevent a
deadlock caused by asynchronous display hotplug events when the guest
scanout is shared as blobs (zero-copy display sharing).

Example Usage: -device virtio-gpu-pci,max_outputs=2,blob=true..
   -display gtk,gl=on,connectors.0=eDP-1,connectors.1=DP-1.

Dongwon Kim (6):
  ui/gtk: skip drawing guest scanout when associated VC is invisible
  ui/gtk: set the ui size to 0 when invisible
  ui/gtk: reset visible flag when window is minimized
  ui/gtk: unblock gl if draw submitted already or fence is not yet
signaled
  ui/gtk: skip drawing if any of ctx/surface/image don't exist
  ui/gtk: skip refresh/rendering if VC is invisible

Vivek Kasireddy (3):
  ui/gtk: Disable the scanout when a detached tab is closed
  ui/gtk: Factor out tab window creation into a separate function
  ui/gtk: Add a new parameter to assign connectors/monitors to GFX VCs

 include/ui/gtk.h |   2 +
 qapi/ui.json |  11 +-
 qemu-options.hx  |   5 +-
 ui/gtk-egl.c |  20 +++
 ui/gtk-gl-area.c |  14 ++
 ui/gtk.c | 362 +++
 6 files changed, 384 insertions(+), 30 deletions(-)

-- 
2.34.1

[RFC PATCH 8/9] ui/gtk: skip drawing if any of ctx/surface/image don't exist

2023-06-20 Thread Dongwon Kim

Rendering of scanout could be skipped if ctx/surface/image don't
exist due to an asynchronous event such as monitors being disconnected.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk-egl.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index aa22ebbd98..8eae2b4b1f 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -106,6 +106,11 @@ void gd_egl_draw(VirtualConsole *vc)
 if (!vc->gfx.ds) {
 return;
 }
+
+if (!vc->gfx.esurface || !vc->gfx.ectx || !vc->gfx.ds->image) {
+return;
+}
+
 eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
vc->gfx.esurface, vc->gfx.ectx);
 
-- 
2.34.1

[RFC PATCH 5/9] ui/gtk: Factor out tab window creation into a separate function

2023-06-20 Thread Dongwon Kim

From: Vivek Kasireddy 

Pull the code that creates a new window associated with a notebook
tab into a separate function. This new function can be useful not
just when user wants to detach a tab but also in the future when
a new window creation is needed in other scenarios.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Signed-off-by: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 71 +++-
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 90ecb8b82f..d8323a3a9d 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1462,6 +1462,44 @@ static gboolean gd_win_grab(void *opaque)
 return TRUE;
 }
 
+static void gd_tab_window_create(VirtualConsole *vc)
+{
+GtkDisplayState *s = vc->s;
+
+gtk_widget_set_sensitive(vc->menu_item, false);
+vc->window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
+gtk_window_set_default_size(GTK_WINDOW(vc->window),
+surface_width(vc->gfx.ds),
+surface_height(vc->gfx.ds));
+#if defined(CONFIG_OPENGL)
+if (vc->gfx.esurface) {
+   eglDestroySurface(qemu_egl_display, vc->gfx.esurface);
+   vc->gfx.esurface = NULL;
+}
+if (vc->gfx.ectx) {
+   eglDestroyContext(qemu_egl_display, vc->gfx.ectx);
+   vc->gfx.ectx = NULL;
+}
+#endif
+gd_widget_reparent(s->notebook, vc->window, vc->tab_item);
+
+g_signal_connect(vc->window, "delete-event",
+G_CALLBACK(gd_tab_window_close), vc);
+gtk_widget_show_all(vc->window);
+
+if (qemu_console_is_graphic(vc->gfx.dcl.con)) {
+   GtkAccelGroup *ag = gtk_accel_group_new();
+   gtk_window_add_accel_group(GTK_WINDOW(vc->window), ag);
+
+   GClosure *cb = g_cclosure_new_swap(G_CALLBACK(gd_win_grab),
+  vc, NULL);
+   gtk_accel_group_connect(ag, GDK_KEY_g, HOTKEY_MODIFIERS, 0, cb);
+}
+
+gd_update_geometry_hints(vc);
+gd_update_caption(s);
+}
+
 static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 {
 GtkDisplayState *s = opaque;
@@ -1474,38 +1512,7 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
FALSE);
 }
 if (!vc->window) {
-gtk_widget_set_sensitive(vc->menu_item, false);
-vc->window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
-#if defined(CONFIG_OPENGL)
-if (vc->gfx.esurface) {
-eglDestroySurface(qemu_egl_display, vc->gfx.esurface);
-vc->gfx.esurface = NULL;
-}
-if (vc->gfx.esurface) {
-eglDestroyContext(qemu_egl_display, vc->gfx.ectx);
-vc->gfx.ectx = NULL;
-}
-#endif
-gd_widget_reparent(s->notebook, vc->window, vc->tab_item);
-
-g_signal_connect(vc->window, "delete-event",
- G_CALLBACK(gd_tab_window_close), vc);
-g_signal_connect(vc->window, "window-state-event",
- G_CALLBACK(gd_window_state_event), vc);
-
-gtk_widget_show_all(vc->window);
-
-if (qemu_console_is_graphic(vc->gfx.dcl.con)) {
-GtkAccelGroup *ag = gtk_accel_group_new();
-gtk_window_add_accel_group(GTK_WINDOW(vc->window), ag);
-
-GClosure *cb = g_cclosure_new_swap(G_CALLBACK(gd_win_grab),
-   vc, NULL);
-gtk_accel_group_connect(ag, GDK_KEY_g, HOTKEY_MODIFIERS, 0, cb);
-}
-
-gd_update_geometry_hints(vc);
-gd_update_caption(s);
+gd_tab_window_create(vc);
 }
 
 window = gtk_widget_get_window(vc->gfx.drawing_area);
-- 
2.34.1

[RFC PATCH 3/9] ui/gtk: reset visible flag when window is minimized

2023-06-20 Thread Dongwon Kim

Add a callback for window-state-event that resets vc->gfx.visible when
associated window is minimized or restored.

In case of virtio-gpu blob scanout, if the window is minimized before
the rendering event for the last guest scanout frame is finished, it cancels
the draw submission and unblocks the pipeline to prevent a permanent lockup.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrangé 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Marc-André Lureau 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index ff4a5c58ea..f9096aea14 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1419,6 +1419,35 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 return TRUE;
 }
 
+static gboolean gd_window_state_event(GtkWidget *widget, GdkEvent *event,
+  void *opaque)
+{
+VirtualConsole *vc = opaque;
+
+if (!vc) {
+return TRUE;
+}
+
+if (event->window_state.new_window_state & GDK_WINDOW_STATE_ICONIFIED) {
+vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
+if (vc->gfx.guest_fb.dmabuf &&
+vc->gfx.guest_fb.dmabuf->draw_submitted) {
+vc->gfx.guest_fb.dmabuf->draw_submitted = false;
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
+} else {
+GdkWindow *window;
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
+
+vc->gfx.visible = true;
+}
+
+return TRUE;
+}
+
 static gboolean gd_win_grab(void *opaque)
 {
 VirtualConsole *vc = opaque;
@@ -1460,6 +1489,9 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 
 g_signal_connect(vc->window, "delete-event",
  G_CALLBACK(gd_tab_window_close), vc);
+g_signal_connect(vc->window, "window-state-event",
+ G_CALLBACK(gd_window_state_event), vc);
+
 gtk_widget_show_all(vc->window);
 
 if (qemu_console_is_graphic(vc->gfx.dcl.con)) {
@@ -2498,6 +2530,11 @@ static void gtk_display_init(DisplayState *ds, DisplayOptions *opts)
 }
 
 vc = gd_vc_find_current(s);
+
+g_signal_connect(s->window, "window-state-event",
+ G_CALLBACK(gd_window_state_event),
+ vc);
+
 gtk_widget_set_sensitive(s->view_menu, vc != NULL);
 #ifdef CONFIG_VTE
 gtk_widget_set_sensitive(s->copy_item,
-- 
2.34.1

Re: [PATCH] target/riscv/cpu.c: fix veyron-v1 CPU properties

2023-06-20 Thread Alistair Francis

On Wed, Jun 21, 2023 at 1:25 AM Daniel Henrique Barboza
 wrote:
>
> Commit 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from
> riscv_cpu_init()") removed code that was enabling mmu, pmp, ext_ifencei
> and ext_icsr from riscv_cpu_init(), the init() function of
> TYPE_RISCV_CPU, parent type of all RISC-V CPUs. This was done to force
> CPUs to explicitly enable all extensions and features they require,
> without any 'magic values' that were inherited from the parent type.
>
> This commit failed to make appropriate changes in the 'veyron-v1' CPU,
> added earlier by commit e1d084a8524a. The result is that the veyron-v1
> CPU has ext_ifencei, ext_icsr and pmp set to 'false', which is not the
> case.
>
> The reason why it took this long to notice (thanks LIU Zhiwei for
> reporting it) is because Linux doesn't mind 'ifencei' and 'icsr' being
> absent in the 'riscv,isa' DT, implying that they're both present if the
> 'i' extension is enabled. OpenSBI also doesn't error out or warn about
> the lack of 'pmp'; it'll just not protect memory pages.
>
> Fix it by setting them to 'true' in rv64_veyron_v1_cpu_init() like
> 7f0bdfb5bfc2 already did with other CPUs.
>
> Reported-by: LIU Zhiwei 
> Fixes: 7f0bdfb5bfc2 ("target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()")
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 881bddf393..707f62b592 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -444,6 +444,9 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
>
>  /* Enable ISA extensions */
>  cpu->cfg.mmu = true;
> +cpu->cfg.ext_ifencei = true;
> +cpu->cfg.ext_icsr = true;
> +cpu->cfg.pmp = true;
>  cpu->cfg.ext_icbom = true;
>  cpu->cfg.cbom_blocksize = 64;
>  cpu->cfg.cboz_blocksize = 64;
> --
> 2.41.0
>
>

[PATCH RFC 4/6] iotests: use the correct python to run linters

2023-06-20 Thread John Snow

Whichever python is used to run iotest 297 should be the one used to
actually run the linters.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/linters.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/linters.py b/tests/qemu-iotests/linters.py
index 65c4c4e827..9fb3fd1449 100644
--- a/tests/qemu-iotests/linters.py
+++ b/tests/qemu-iotests/linters.py
@@ -68,7 +68,7 @@ def run_linter(
 :raise CalledProcessError: If the linter process exits with failure.
 """
 subprocess.run(
-('python3', '-m', tool, *args),
+(sys.executable, '-m', tool, *args),
 env=env,
 check=True,
 stdout=subprocess.PIPE if suppress_output else None,
-- 
2.40.1
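
A minimal sketch of why sys.executable matters here: a child launched through it runs the same interpreter as the parent, whereas a bare 'python3' is whatever PATH resolves to. The helper name child_python_version is made up for this example.

```python
import subprocess
import sys


def child_python_version(interpreter):
    """Run `interpreter` and report the (major, minor) version it sees."""
    result = subprocess.run(
        (interpreter, '-c',
         'import sys; print(sys.version_info[0], sys.version_info[1])'),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    major, minor = result.stdout.split()
    return int(major), int(minor)


# The child started via sys.executable matches the parent interpreter;
# a child started via 'python3' only matches if PATH happens to agree.
same = child_python_version(sys.executable) == tuple(sys.version_info[:2])
```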

[PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-20 Thread John Snow

Hi, this is ... a fairly incomplete series about trying to get iotests
to run out of the configure-time venv. I'm looking for some feedback, so
out to the list it goes.

Primarily, I'm having doubts about these points:

1) I think I need something like "mkvenv install" in the first patch,
   but mkvenv.py is getting pretty long...

2) Is there a way to optimize the speed for patch #2? Maybe installing
   this package can be skipped until it's needed, but that means that
   things like iotest's ./check might get complicated to support that.

3) I cheated quite a bit in patch 4 to use the correct Python to launch
   iotests, but I'm wondering if there's a nicer way to solve this
   more *completely*.

John Snow (6):
  experiment: add mkvenv install
  build, tests: Add qemu in-tree packages to pyvenv at configure time.
  iotests: get rid of '..' in path environment output
  iotests: use the correct python to run linters
  iotests: use pyvenv/bin/python3 to launch child test processes
  iotests: don't add qemu.git/python to PYTHONPATH

 configure | 31 +++
 python/scripts/mkvenv.py  | 40 +++
 tests/qemu-iotests/linters.py |  2 +-
 tests/qemu-iotests/testenv.py | 21 --
 4 files changed, 87 insertions(+), 7 deletions(-)

-- 
2.40.1

[PATCH RFC 3/6] iotests: get rid of '..' in path environment output

2023-06-20 Thread John Snow

Resolve the build_root before we append more items onto it so that the
environment output is more concise with less parent directory confetti
in it.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/testenv.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 9a37ad9152..e67ebd254b 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -216,7 +216,7 @@ def __init__(self, source_dir: str, build_dir: str,
 self.source_iotests = source_dir
 self.build_iotests = build_dir
 
-self.build_root = os.path.join(self.build_iotests, '..', '..')
+self.build_root = Path(self.build_iotests).parent.parent
 
 self.init_directories()
 
-- 
2.40.1
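
To illustrate the difference, here is a small sketch using a made-up build directory. It uses the pure POSIX path classes so the result does not depend on the host. Note that .parent is a lexical operation and does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical build directory, for illustration only.
build_iotests = '/home/user/qemu/build/tests/qemu-iotests'


def build_root_joined(path):
    """Old behaviour: append '..' components verbatim."""
    return posixpath.join(path, '..', '..')


def build_root_parent(path):
    """New behaviour: strip the last two components lexically."""
    return str(PurePosixPath(path).parent.parent)


old = build_root_joined(build_iotests)
new = build_root_parent(build_iotests)
# old is '/home/user/qemu/build/tests/qemu-iotests/../..'
# new is '/home/user/qemu/build'
```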

[PATCH RFC 6/6] iotests: don't add qemu.git/python to PYTHONPATH

2023-06-20 Thread John Snow

qemu.* should be provided by the configure-time venv, now.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/testenv.py | 4 
 1 file changed, 4 deletions(-)

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 1b095d70f2..6441145701 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -108,12 +108,8 @@ def init_directories(self) -> None:
  SAMPLE_IMG_DIR
 """
 
-# Path where qemu goodies live in this source tree.
-qemu_srctree_path = Path(__file__, '../../../python').resolve()
-
 self.pythonpath = os.pathsep.join(filter(None, (
 self.source_iotests,
-str(qemu_srctree_path),
 os.getenv('PYTHONPATH'),
 )))
 
-- 
2.40.1
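
For reference, the filter(None, ...) idiom that remains after this change simply drops empty or unset entries before joining. A minimal sketch, with build_pythonpath as a made-up helper name and the inherited value passed in explicitly instead of read from the environment:

```python
import os


def build_pythonpath(source_iotests, inherited):
    """Join the non-empty entries with the platform path separator."""
    return os.pathsep.join(filter(None, (source_iotests, inherited)))


# An unset (None) or empty PYTHONPATH leaves no dangling separator.
only_iotests = build_pythonpath('/src/tests/qemu-iotests', None)
with_extra = build_pythonpath('/src/tests/qemu-iotests', '/opt/extra')
```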

[PATCH RFC 2/6] build, tests: Add qemu in-tree packages to pyvenv at configure time.

2023-06-20 Thread John Snow

though, ouch: on my machine this takes 3-4 entire seconds to do. I wish
it wasn't so slow, but we can't rely on these packages not having any
dependencies any more.

We could theoretically use a .pth hack when creating the venv to
automatically include this directory as an "installed packages"
location, but when we go to drop qemu.qmp in the future, that will break
- I think we need to *install* this package.
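
For readers unfamiliar with the .pth mechanism mentioned above, a rough sketch: a file ending in .pth inside a site directory lists extra directories that get appended to sys.path. The file name qemu-intree.pth and the temporary directories are made up for this demonstration; this is not what the patch does.

```python
import os
import site
import sys
import tempfile


def pth_adds_directory():
    """Show that a .pth file in a site dir extends sys.path."""
    with tempfile.TemporaryDirectory() as site_dir, \
            tempfile.TemporaryDirectory() as pkg_dir:
        pkg_dir = os.path.abspath(pkg_dir)
        pth_file = os.path.join(site_dir, 'qemu-intree.pth')
        with open(pth_file, 'w') as handle:
            handle.write(pkg_dir + '\n')

        saved = list(sys.path)
        try:
            # addsitedir() processes every *.pth file in site_dir.
            site.addsitedir(site_dir)
            return pkg_dir in sys.path
        finally:
            # Restore sys.path so the demonstration has no side effects.
            sys.path[:] = saved
```

This only makes a directory importable; it records no package metadata, which is one reason an actual install behaves differently.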

Signed-off-by: John Snow 
---
 configure | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/configure b/configure
index 01a53576a7..d2e0abc068 100755
--- a/configure
+++ b/configure
@@ -250,6 +250,7 @@ git_submodules_action="update"
 git="git"
 debug_tcg="no"
 docs="auto"
+tests="enabled"
 EXESUF=""
 prefix="/usr/local"
 qemu_suffix="qemu"
@@ -639,6 +640,10 @@ for opt do
   ;;
   --disable-docs) docs=disabled
   ;;
+  --enable-tests) tests=enabled
+  ;;
+  --disable-tests) tests=disabled
+  ;;
   --cpu=*)
   ;;
   --target-list=*) target_list="$optarg"
@@ -985,6 +990,32 @@ if test "$docs" != "disabled" ; then
 fi
 fi
 
+# Optionally pre-load the testing pre-requisites. This is for iotests,
+# vmtests, and anything else that uses Python qemu.* packages. Note that
+# our in-tree qemu packages are currently pure python with zero external
+# dependencies. For this reason, it excludes the Avocado dependencies
+# which are installed on-demand at time of use instead.
+
+mkvenv_flags=""
+if test "$pypi" = "enabled" ; then
+mkvenv_flags="--online"
+fi
+
+if test "$tests" = "enabled" ; then
+if ! $mkvenv install \
+ $mkvenv_flags \
+ --editable \
+ --dir "${source_path}/python/wheels" \
+ "${source_path}/python/";
+then
+echo "There was a problem installing the in-tree python packages for testing."
+exit 1
+fi
+touch pyvenv/tests.group
+fi
+
+echo "mkvenv: done for now, ciao!"
+
 # Probe for ninja
 
 if test -z "$ninja"; then
-- 
2.40.1

[PATCH RFC 5/6] iotests: use pyvenv/bin/python3 to launch child test processes

2023-06-20 Thread John Snow

Now that there's a fancy venv set up for us by configure, we should take
care to use it even when check is invoked directly.

./check will now use the pyvenv environment when launching python tests,
which allows those tests to find and access the 'qemu.*' packages
without PYTHONPATH modifications.

RFC: This patch now means that ./check may launch test subprocesses
using a different Python than the one used to launch it. If that isn't
acceptable, we might need a launcher shim whose job it is to sit above
"check" and just choose the correct Python.

...Or maybe it's fine the way it is.

Comments welcome, sorry for my indecision.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/testenv.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index e67ebd254b..1b095d70f2 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -138,7 +138,20 @@ def init_binaries(self) -> None:
  PYTHON (for bash tests)
  QEMU_PROG, QEMU_IMG_PROG, QEMU_IO_PROG, QEMU_NBD_PROG, QSD_PROG
 """
-self.python = sys.executable
+# The python we want to use to launch tests.
+self.python: str = str(
+Path(self.build_root).joinpath('pyvenv', 'bin', 'python3')
+)
+# RFC: Do I need to amend '.exe' for windows, or nah?
+
+if self.python != sys.executable:
+print(
+"Note: "
+f"check was launched with a Python ({sys.executable}) "
+f"that doesn't match QEMU's configured Python ({self.python})."
+" QEMU's Python will be used for individual test processes.",
+file=sys.stderr
+)
 
 def root(*names: str) -> str:
 return os.path.join(self.build_root, *names)
-- 
2.40.1
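
On the '.exe' question in the RFC comment, one possible shape for the lookup is sketched below. It assumes the layout that the standard-library venv module uses on native Windows (Scripts\python.exe); whether QEMU's configure-time venv follows that layout, for example under MSYS2, is an assumption that would need checking. venv_python is a made-up helper.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(build_root, windows=False):
    """Return the interpreter path inside <build_root>/pyvenv.

    The Windows branch assumes the stock venv layout; this is an
    assumption, not something the patch establishes.
    """
    if windows:
        return str(PureWindowsPath(build_root, 'pyvenv',
                                   'Scripts', 'python.exe'))
    return str(PurePosixPath(build_root, 'pyvenv', 'bin', 'python3'))
```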

[PATCH RFC 1/6] experiment: add mkvenv install

2023-06-20 Thread John Snow

This is just so I can do "mkvenv install './python'" or "mkvenv install
file:python" to install the in-tree packages to pyvenv.

It probably isn't quite appropriate to bypass do_ensure in its entirety
like this because we miss out on a lot of error handling, but as a quick
proof of concept it works just fine.

Signed-off-by: John Snow 
---
 python/scripts/mkvenv.py | 40 
 1 file changed, 40 insertions(+)

diff --git a/python/scripts/mkvenv.py b/python/scripts/mkvenv.py
index a47f1eaf5d..ea8df34111 100644
--- a/python/scripts/mkvenv.py
+++ b/python/scripts/mkvenv.py
@@ -940,6 +940,35 @@ def _add_ensure_subcommand(subparsers: Any) -> None:
 )
 
 
+def _add_install_subcommand(subparsers: Any) -> None:
+subparser = subparsers.add_parser(
+"install", help="Install the specified package."
+)
+subparser.add_argument(
+"--online",
+action="store_true",
+help="Install packages from PyPI, if necessary.",
+)
+subparser.add_argument(
+"--dir",
+type=str,
+action="store",
+help="Path to vendored packages where we may install from.",
+)
+subparser.add_argument(
+'--editable',
+action="store_true",
+help="Should package(s) be installed in editable mode?"
+)
+subparser.add_argument(
+"dep_specs",
+type=str,
+action="store",
+help="PEP 508 Dependency specification, e.g. 'meson>=0.61.5'",
+nargs="+",
+)
+
+
 def main() -> int:
 """CLI interface to make_qemu_venv. See module docstring."""
 if os.environ.get("DEBUG") or os.environ.get("GITLAB_CI"):
@@ -964,6 +993,7 @@ def main() -> int:
 _add_create_subcommand(subparsers)
 _add_post_init_subcommand(subparsers)
 _add_ensure_subcommand(subparsers)
+_add_install_subcommand(subparsers)
 
 args = parser.parse_args()
 try:
@@ -982,6 +1012,16 @@ def main() -> int:
 wheels_dir=args.dir,
 prog=args.diagnose,
 )
+if args.command == "install":
+print(f"mkvenv: installing {', '.join(args.dep_specs)}", file=sys.stderr)
+pip_args = list(args.dep_specs)
+if args.editable:
+pip_args.insert(0, "--editable")
+pip_install(
+args=pip_args,
+online=args.online,
+wheels_dir=args.dir
+)
 logger.debug("mkvenv.py %s: exiting", args.command)
 except Ouch as exc:
 print("\n*** Ouch! ***\n", file=sys.stderr)
-- 
2.40.1

Re: [PATCH] hw/pci: add comment explaining the reason for checking function 0 in hotplug

2023-06-20 Thread Michael S. Tsirkin

On Tue, Jun 20, 2023 at 07:55:51PM +0530, Ani Sinha wrote:
> This change is cosmetic. A comment is added explaining why we need to check
> for the availability of function 0 when we hotplug a device.
> 
> CC: m...@redhat.com
> Signed-off-by: Ani Sinha 
> ---
>  hw/pci/pci.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index bf38905b7d..847e534f68 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1179,6 +1179,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
> PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
>  return NULL;
> +/*
> + * Populating function 0 triggers a scan from the guest that
> + * exposes other non-zero functions. Hence we need to ensure that
> + * function 0 wasn't added yet.
> + */

bad place for the comment

>  }


stick the comment here

> else

or here

> if (dev->hotplugged &&
> !pci_is_vf(pci_dev) &&
> pci_get_function_0(pci_dev)) {
> -- 
> 2.39.1
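
To make the function 0 rule concrete, here is a simplified Python model of the check. The slot/function arithmetic follows the usual devfn encoding (5 bits of slot, 3 bits of function); may_hotplug is a made-up helper that ignores the VF and ARI cases the real code handles.

```python
def pci_slot(devfn):
    """Slot number: upper five bits of devfn."""
    return (devfn >> 3) & 0x1f


def pci_func(devfn):
    """Function number: lower three bits of devfn."""
    return devfn & 0x07


def pci_devfn(slot, func):
    """Combine slot and function into a devfn value."""
    return ((slot & 0x1f) << 3) | (func & 0x07)


def may_hotplug(occupied, devfn):
    """Allow hotplug only while function 0 of the slot is still empty.

    Once function 0 is present the guest has already scanned the slot,
    so a function added later would not be noticed.
    """
    return pci_devfn(pci_slot(devfn), 0) not in occupied
```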

Re: [PATCH v2 15/18] target/riscv: make riscv_isa_string_ext() KVM compatible

2023-06-20 Thread Daniel Henrique Barboza





On 6/19/23 06:54, Andrew Jones wrote:

On Tue, Jun 13, 2023 at 05:58:54PM -0300, Daniel Henrique Barboza wrote:

The isa_edata_arr[] is used by riscv_isa_string_ext() to create the
riscv,isa DT attribute. isa_edata_arr[] is kept in sync with the TCG
property vector riscv_cpu_extensions[], i.e. all TCG properties from
this vector that have a riscv,isa representation are included in
isa_edata_arr[].

KVM doesn't implement all TCG properties, but allows them to be created
anyway to not force an API change between TCG and KVM guests. Some of
these TCG-only extensions are defaulted to 'true', and users are also
allowed to enable them. KVM doesn't care, but riscv_isa_string_ext()
does. The result is that these TCG-only enabled extensions will appear
in the riscv,isa DT string under KVM.

To avoid code repetition and re-use riscv_isa_string_ext() for KVM
guests we'll make a couple of tweaks:

- set env->priv_ver to 'LATEST' for the KVM 'host' CPU. This is needed
   because riscv_isa_string_ext() makes a priv check for each extension
   before including them in the ISA string. KVM doesn't care about
   env->priv_ver, since it's part of the TCG-only CPU validation, so this
   change is benign for KVM;

- add a new 'kvm_available' flag in isa_ext_data struct. This flag is
   set via a new ISA_EXT_DATA_ENTRY_KVM macro to report that, for a given
   extension, KVM also supports it. riscv_isa_string_ext() then can check
   if a given extension is known by KVM and skip it if it's not.

This will allow us to re-use riscv_isa_string_ext() for KVM guests.
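
The two tweaks can be pictured with a small Python model of the filtering. IsaExt, the integer version numbers and the '_' separator are illustrative choices for this sketch, not the QEMU data structures.

```python
from typing import NamedTuple


class IsaExt(NamedTuple):
    """Simplified stand-in for one isa_edata_arr[] entry."""
    name: str
    min_version: int
    kvm_available: bool = False


def isa_string_ext(table, enabled, priv_ver, kvm):
    """Return the enabled multi-letter extensions joined by '_'."""
    parts = []
    for ext in table:
        if ext.name not in enabled:
            continue
        if priv_ver < ext.min_version:
            continue  # the per-extension priv check
        if kvm and not ext.kvm_available:
            continue  # TCG-only extension: hide it from KVM guests
        parts.append(ext.name)
    return '_'.join(parts)


TABLE = [
    IsaExt('zicbom', 12, kvm_available=True),
    IsaExt('zicond', 12),
    IsaExt('zbb', 12, kvm_available=True),
]
ENABLED = {'zicbom', 'zicond', 'zbb'}
```

With priv_ver forced to the latest version under KVM, only the kvm_available flag decides what is reported.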

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 28 
  1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a4f3ed0c17..a773c09645 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -44,11 +44,15 @@ struct isa_ext_data {
  const char *name;
  int min_version;
  int ext_enable_offset;
+bool kvm_available;
  };
  
  #define ISA_EXT_DATA_ENTRY(_name, _min_ver, _prop) \
  {#_name, _min_ver, offsetof(struct RISCVCPUConfig, _prop)}
  
+#define ISA_EXT_DATA_ENTRY_KVM(_name, _min_ver, _prop) \
+{#_name, _min_ver, offsetof(struct RISCVCPUConfig, _prop), true}
+
  /*
   * Here are the ordering rules of extension naming defined by RISC-V
   * specification :
@@ -68,14 +72,17 @@ struct isa_ext_data {
   *
   * Single letter extensions are checked in riscv_cpu_validate_misa_priv()
   * instead.
+ *
+ * ISA_EXT_DATA_ENTRY_KVM() is used to indicate that the extension is
+ * also known by the KVM driver. If unsure, use ISA_EXT_DATA_ENTRY().
   */
  static const struct isa_ext_data isa_edata_arr[] = {
-ISA_EXT_DATA_ENTRY(zicbom, PRIV_VERSION_1_12_0, ext_icbom),
-ISA_EXT_DATA_ENTRY(zicboz, PRIV_VERSION_1_12_0, ext_icboz),
+ISA_EXT_DATA_ENTRY_KVM(zicbom, PRIV_VERSION_1_12_0, ext_icbom),
+ISA_EXT_DATA_ENTRY_KVM(zicboz, PRIV_VERSION_1_12_0, ext_icboz),
  ISA_EXT_DATA_ENTRY(zicond, PRIV_VERSION_1_12_0, ext_zicond),
  ISA_EXT_DATA_ENTRY(zicsr, PRIV_VERSION_1_10_0, ext_icsr),
  ISA_EXT_DATA_ENTRY(zifencei, PRIV_VERSION_1_10_0, ext_ifencei),
-ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause),
+ISA_EXT_DATA_ENTRY_KVM(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause),
  ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs),
  ISA_EXT_DATA_ENTRY(zfh, PRIV_VERSION_1_11_0, ext_zfh),
  ISA_EXT_DATA_ENTRY(zfhmin, PRIV_VERSION_1_11_0, ext_zfhmin),
@@ -89,7 +96,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zcmp, PRIV_VERSION_1_12_0, ext_zcmp),
  ISA_EXT_DATA_ENTRY(zcmt, PRIV_VERSION_1_12_0, ext_zcmt),
  ISA_EXT_DATA_ENTRY(zba, PRIV_VERSION_1_12_0, ext_zba),
-ISA_EXT_DATA_ENTRY(zbb, PRIV_VERSION_1_12_0, ext_zbb),
+ISA_EXT_DATA_ENTRY_KVM(zbb, PRIV_VERSION_1_12_0, ext_zbb),
  ISA_EXT_DATA_ENTRY(zbc, PRIV_VERSION_1_12_0, ext_zbc),
  ISA_EXT_DATA_ENTRY(zbkb, PRIV_VERSION_1_12_0, ext_zbkb),
  ISA_EXT_DATA_ENTRY(zbkc, PRIV_VERSION_1_12_0, ext_zbkc),
@@ -114,13 +121,13 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
  ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
-ISA_EXT_DATA_ENTRY(ssaia, PRIV_VERSION_1_12_0, ext_ssaia),
+ISA_EXT_DATA_ENTRY_KVM(ssaia, PRIV_VERSION_1_12_0, ext_ssaia),
  ISA_EXT_DATA_ENTRY(sscofpmf, PRIV_VERSION_1_12_0, ext_sscofpmf),
-ISA_EXT_DATA_ENTRY(sstc, PRIV_VERSION_1_12_0, ext_sstc),
+ISA_EXT_DATA_ENTRY_KVM(sstc, PRIV_VERSION_1_12_0, ext_sstc),
  ISA_EXT_DATA_ENTRY(svadu, PRIV_VERSION_1_12_0, ext_svadu),
-ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
+ISA_EXT_DATA_ENTRY_KVM(svinval, PRIV_VERSION_1_12_0, ext_svinval),
  ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
-

Re: [PATCH V1 2/3] migration: fix suspended runstate

2023-06-20 Thread Peter Xu

On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended state is broken.  The incoming
> migration code automatically tries to wake the guest, which IMO is
> wrong -- the guest should end migration in the same state it started.
> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> bypasses vm_start().  The guest appears to be in the running state, but
> it is not.
> 
> To fix, leave the guest in the suspended state, but call
> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> later, when the client sends a system_wakeup command.
> 
> Signed-off-by: Steve Sistare 
> ---
>  migration/migration.c | 11 ---
>  softmmu/runstate.c|  1 +
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 17b4b47..851fe6d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>  vm_start();
>  } else {
>  runstate_set(global_state_get_runstate());
> +if (runstate_check(RUN_STATE_SUSPENDED)) {
> +/* Force vm_start to be called later. */
> +qemu_system_start_on_wakeup_request();
> +}

Is this really needed, along with patch 1?

I have very limited knowledge of suspension, so I'm prone to making
mistakes..

But from what I read, qemu_system_wakeup_request() (the existing one, not
the one after patch 1 is applied) will set up wakeup_reason and kick the
main thread using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups
will be done in the main thread later on after qemu_wakeup_requested()
returns true.

>  }
>  /*
>   * This must happen after any state changes since as soon as an external
> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>  qemu_mutex_lock_iothread();
>  trace_postcopy_start_set_run();
>  
> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>  global_state_store();
>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>  if (ret < 0) {
> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>  if (s->state == MIGRATION_STATUS_ACTIVE) {
>  qemu_mutex_lock_iothread();
>  s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>  
>  s->vm_old_state = runstate_get();
>  global_state_store();
> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>  
>  qemu_mutex_lock_iothread();
>  
> -/*
> - * If VM is currently in suspended state, then, to make a valid runstate
> - * transition in vm_stop_force_state() we need to wakeup it up.
> - */
> -qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);

Removal of these three places seems reasonable to me, or we won't persist
the SUSPEND state.

The comment above was the main reason I used to think it was needed
(again, based on zero knowledge around this..), but perhaps it was just
wrong?  I would assume vm_stop_force_state() will still just work with
suspended, am I right?

>  s->vm_old_state = runstate_get();
>  
>  global_state_store();
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index e127b21..771896c 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>  { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>  { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>  { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> +{ RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>  { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>  { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>  
> -- 
> 1.8.3.1
> 

-- 
Peter Xu
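
To make the transition question concrete, here is a toy model of the table in Python. It contains only the entries visible in the hunk above (including the one the patch adds), not the full runstate_transitions_def[] array, and the lower-case state names are just labels for this sketch.

```python
# Subset of the transition table, limited to the rows shown in the diff.
ALLOWED_TRANSITIONS = {
    ('running', 'suspended'),
    ('suspended', 'running'),
    ('suspended', 'finish-migrate'),
    ('suspended', 'paused'),       # added by this patch
    ('suspended', 'prelaunch'),
    ('suspended', 'colo'),
}


def runstate_valid_transition(old, new):
    """Return True if the table permits moving from old to new."""
    return (old, new) in ALLOWED_TRANSITIONS
```

Since suspended to finish-migrate is already a listed transition, this model is at least consistent with vm_stop_force_state(RUN_STATE_FINISH_MIGRATE) working from the suspended state without a prior wakeup.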

Re: [PATCH 12/12] hw/vmapple/vmapple: Add vmapple machine type

2023-06-20 Thread Bernhard Beschow




On 14 June 2023 22:57:34 UTC, Alexander Graf wrote:
>Apple defines a new "vmapple" machine type as part of its proprietary
>macOS Virtualization.Framework vmm. This machine type is similar to the
>virt one, but with subtle differences in base devices, a few special
>vmapple device additions and a vastly different boot chain.
>
>This patch reimplements this machine type in QEMU. To use it, you
>have to have a readily installed version of macOS for VMApple,
>run on macOS with -accel hvf, pass the Virtualization.Framework
>boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
>and pass aux and root volume as virtio drives. In addition, you also
>need to find the machine UUID and pass that as -M vmapple,uuid= parameter:
>
>$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
>-bios /System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin \
>-drive file=aux,if=pflash,format=raw \
>-drive file=root,if=pflash,format=raw \
>-drive file=aux,if=none,id=aux,format=raw \
>-device virtio-blk-pci,drive=aux,x-apple-type=2 \
>-drive file=root,if=none,id=root,format=raw \
>-device virtio-blk-pci,drive=root,x-apple-type=1
>
>With all these in place, you should be able to see macOS booting
>successfully.

This documentation seems valuable for the QEMU manual. But AFAICS there is no 
documentation like this added to the QEMU manual in this series. This means 
that it'll get "lost". How about adding it, possibly in this patch?

Note that I'm not able to test this series. I'm just seeing the 
valuable-information-in-the-commit-message-which-will-get-lost pattern.

>
>Signed-off-by: Alexander Graf 
>---
> hw/vmapple/Kconfig |  19 ++
> hw/vmapple/meson.build |   1 +
> hw/vmapple/vmapple.c   | 661 +
> 3 files changed, 681 insertions(+)
> create mode 100644 hw/vmapple/vmapple.c
>
>diff --git a/hw/vmapple/Kconfig b/hw/vmapple/Kconfig
>index ba37fc5b81..7a2375dc95 100644
>--- a/hw/vmapple/Kconfig
>+++ b/hw/vmapple/Kconfig
>@@ -9,3 +9,22 @@ config VMAPPLE_CFG
> 
> config VMAPPLE_PVG
> bool
>+
>+config VMAPPLE
>+bool
>+depends on ARM && HVF
>+default y if ARM && HVF
>+imply PCI_DEVICES
>+select ARM_GIC
>+select PLATFORM_BUS
>+select PCI_EXPRESS
>+select PCI_EXPRESS_GENERIC_BRIDGE
>+select PL011 # UART
>+select PL031 # RTC
>+select PL061 # GPIO
>+select GPIO_PWR
>+select PVPANIC_MMIO
>+select VMAPPLE_AES
>+select VMAPPLE_BDIF
>+select VMAPPLE_CFG
>+select VMAPPLE_PVG
>diff --git a/hw/vmapple/meson.build b/hw/vmapple/meson.build
>index 31fec87156..d732873d35 100644
>--- a/hw/vmapple/meson.build
>+++ b/hw/vmapple/meson.build
>@@ -2,3 +2,4 @@ softmmu_ss.add(when: 'CONFIG_VMAPPLE_AES',  if_true: files('aes.c'))
> softmmu_ss.add(when: 'CONFIG_VMAPPLE_BDIF', if_true: files('bdif.c'))
> softmmu_ss.add(when: 'CONFIG_VMAPPLE_CFG',  if_true: files('cfg.c'))
> softmmu_ss.add(when: 'CONFIG_VMAPPLE_PVG',  if_true: [files('apple-gfx.m'), pvg, metal])
>+specific_ss.add(when: 'CONFIG_VMAPPLE', if_true: files('vmapple.c'))
>diff --git a/hw/vmapple/vmapple.c b/hw/vmapple/vmapple.c
>new file mode 100644
>index 00..5d3fe54b96
>--- /dev/null
>+++ b/hw/vmapple/vmapple.c
>@@ -0,0 +1,661 @@
>+/*
>+ * VMApple machine emulation
>+ *
>+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Is an "All Rights Reserved" wording compatible with the GPL?

Best regards,
Bernhard

>+ *
>+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
>+ * See the COPYING file in the top-level directory.
>+ *
>+ * VMApple is the device model that the macOS built-in hypervisor called
>+ * "Virtualization.framework" exposes to Apple Silicon macOS guests. The
>+ * machine model in this file implements the same device model in QEMU, but
>+ * does not use any code from Virtualization.Framework.
>+ */
>+
>+#include "qemu/osdep.h"
>+#include "qemu/help-texts.h"
>+#include "qemu/datadir.h"
>+#include "qemu/units.h"
>+#include "qemu/option.h"
>+#include "monitor/qdev.h"
>+#include "qapi/error.h"
>+#include "hw/sysbus.h"
>+#include "hw/arm/boot.h"
>+#include "hw/arm/primecell.h"
>+#include "hw/boards.h"
>+#include "net/net.h"
>+#include "sysemu/sysemu.h"
>+#include "sysemu/runstate.h"
>+#include "sysemu/kvm.h"
>+#include "sysemu/hvf.h"
>+#include "hw/loader.h"
>+#include "qapi/error.h"
>+#include "qemu/bitops.h"
>+#include "qemu/error-report.h"
>+#include "qemu/module.h"
>+#include "hw/pci-host/gpex.h"
>+#include "hw/virtio/virtio-pci.h"
>+#include "hw/qdev-properties.h"
>+#include "hw/intc/arm_gic.h"
>+#include "hw/intc/arm_gicv3_common.h"
>+#include "hw/irq.h"
>+#include "qapi/visitor.h"
>+#include "qapi/qapi-visit-common.h"
>+#include "standard-headers/linux/input.h"
>+#include "target/arm/internals.h"
>+#include "target/arm/kvm_arm.h"
>+#include "hw/char/pl011.h"
>+#include "qemu/guest-random.h"
>+#include

Re: [PATCH] git-submodule.sh: allow running in validate mode without previous update

2023-06-20 Thread Paolo Bonzini

On Tue, Jun 20, 2023, 19:35 Nina Schoetterl-Glausch wrote:

> > +modules="$modules $m"
> > +grep $m $substat > /dev/null 2>&1 || $GIT submodule status $module >> $substat
> > +else
> > +echo "warn: ignoring non-existent submodule $m"
>
> What is the rationale for ignoring non-existent submodules, i.e. how do the
> arguments to the script go stale as you say in the patch description?
>

For example when a Makefile calls the script before the Makefile itself is
rebuilt.

> I'm asking because the Fedora spec file initializes a new git repo in order
> to apply patches so the script exits with 0.


You mean it succeeds even if roms/SLOF is empty?

> Nothing that cannot be worked around ofc.
>
> > +fi
> > +done
> > +else
> > +modules=$maybe_modules
> >  fi
> >
> > -if test -n "$maybe_modules" && test -z "$GIT"
> > -then
> > -echo "$0: unexpectedly called with submodules but git binary not found"
> > -exit 1
> > -fi
> > -
> > -modules=""
> > -for m in $maybe_modules
> > -do
> > -$GIT submodule status $m 1> /dev/null 2>&1
> > -if test $? = 0
> > -then
> > -modules="$modules $m"
> > -else
> > -echo "warn: ignoring non-existent submodule $m"
> > -fi
> > -done
> > -
> >  case "$command" in
> >  status|validate)
> > -test -f "$substat" || validate_error "$command"
> > -test -z "$maybe_modules" && exit 0
> >  for module in $modules; do
> > -check_updated $module || validate_error "$command"
> > +if is_git; then
> > +check_updated $module || validate_error "$command"
> > +elif ! test -d $module; then
>
> archive-source.sh creates an empty directory for e.g. roms/SLOF,
> so this check succeeds even if the submodule sources are unavailable.

> Something like
>
> elif ! test -d $module || test -z "$(ls -A "$module")"; then
>

Or (set "$module"/* && test -e "$1").

Paolo

> works.
>
> > +echo "$0: sources not available for $module and $no_git_error"
> > +validate_error "$command"
> > +fi
> >  done
> > -exit 0
> >  ;;
> > +
> >  update)
> > -test -e $substat || touch $substat
> > -test -z "$maybe_modules" && exit 0
> > +is_git || {
> > +echo "$0: unexpectedly called with submodules but $no_git_error"
> > +exit 1
> > +}
> >
> >  $GIT submodule update --init $modules 1>/dev/null
> >  test $? -ne 0 && update_error "failed to update modules"
>
>
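
For comparison, the "directory exists and is not empty" test discussed above looks roughly like this in Python. It is a sketch of the check only, not a proposed replacement for the shell script; sources_available and demo are made-up names.

```python
import tempfile
from pathlib import Path


def sources_available(module_dir):
    """True if module_dir is a directory with at least one entry."""
    path = Path(module_dir)
    return path.is_dir() and any(path.iterdir())


def demo():
    """Check an empty dir, a populated dir and a missing dir."""
    with tempfile.TemporaryDirectory() as tmp:
        empty = sources_available(tmp)
        Path(tmp, 'Makefile').touch()
        populated = sources_available(tmp)
        missing = sources_available(Path(tmp, 'missing'))
    return empty, populated, missing
```

Like `ls -A`, iterdir() also counts dotfiles, whereas the "$module"/* glob variant does not match them by default.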

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-20 Thread Maciej S. Szmigiero


On 19.06.2023 17:58, David Hildenbrand wrote:

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic I suspect (well, and a layer violation 
regarding the machine). The machine hotplug handler is supposed to call the 
pre_plug/plug/unplug hooks as response to pre_plug/plug/unplug notifications 
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called, that the device is 
not realized yet, but once it gets plugged, that it already is realized, and 
that the device will actually vanish (get unrealized) when unplugging the 
device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.


get_plugged_memory_size() just calls get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged) I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.


memory_device_build_list() is another example, used for 
memory_device_get_free_addr().


I don't see it calling get_plugged_size() method, I can see it only using
(indirectly) get_addr() method.

You'd be blocking memory address ranges with an unplugged-but-realized memory device.
Memory device code expects that realized memory devices are plugged and vice versa.


Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, so no other devices will claim parts of it and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.





As an example, see device_set_realized() on the pre_plug+realize+plug 
interaction.

IIRC, you're reusing the already-realized hv-balloon device here, correct?


Yes - in this version of the driver.

The previous version used separate virtual DIMM devices instead but you have
recommended against that approach.



Yes. My recommendation was to make the hv-balloon device a memory device and 
use a single memory region, which you did (and I think it's much better).

It's now all about when we (un)plug the memory device itself -- and how.



Why can't you call the pre_plug/plug/unplug functions from the machine 
pre_plug/plug/unplug hooks -- exactly once for the memory device when plugging 
the hv-balloon device?

Is it to support the !memdev case or why is this plugging/unplugging in 
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At the hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the VM 
(map the MR on demand).

That could be done using a memory region container like virtio-mem is planning 
[1] on using fairly easily.

[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


Thanks for the pointer to your series - I've looked at it and it seems
to me that while it allows multiple memory subregions, each backed by
a separate memslot it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE, am I right?


1b) For the same reason the memory region has to be unplugged at the VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned to 1 
GiB. And, in fact, you can simply always request a 1 GiB alignment for the 
device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?


The protocol allows up to 32 GiB alignment so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).


In theory, when using a memory region container (again [1]) into which you 
dynamically map the RAM region, you can do this alignment internally.

So it might be an option to use a memory region container and dynamically map 
into that one as you please (it just has to have a fixed size).
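
As a rough illustration of the container idea (a sketch only; the helper names below are hypothetical and not taken from QEMU or from the series in [1]), the offset at which the RAM region would be mapped inside a fixed-size container can be derived from the container's guest-physical base and the alignment the guest asks for:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static inline uint64_t sketch_align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Offset inside the container at which a guest-aligned RAM region can start. */
static inline uint64_t sketch_offset_in_container(uint64_t container_base,
                                                  uint64_t guest_align)
{
    return sketch_align_up(container_base, guest_align) - container_base;
}

/*
 * Worst-case container size when the container itself is only aligned to
 * container_align but the guest may ask for up to max_guest_align
 * (both assumed to be powers of two).
 */
static inline uint64_t sketch_container_size(uint64_t ram_size,
                                             uint64_t container_align,
                                             uint64_t max_guest_align)
{
    assert(container_align != 0);
    if (max_guest_align <= container_align) {
        return ram_size;
    }
    return ram_size + (max_guest_align - container_align);
}
```

For a container based at 5 GiB and a guest asking for 4 GiB alignment, the RAM region would start 3 GiB into the container. The price is the padding: with a 1 GiB-aligned container and a 32 GiB worst-case guest alignment, up to 31 GiB of address space is reserved on top of the RAM size.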


Still, demand-allocating just the right memory region (with the right
alignment) seems to me like a cleaner solution than allocating a huge
worst-case memory region upfront and then trying to carve the right
part of it.



By the way, the memory region *can't* be unplugged yet at VMBus device
reset time - Windows keeps on using it until the system is

[PULL 0/1] Seabios hppa v7 patches

2023-06-20 Thread Helge Deller

The following changes since commit 327ec8d6c2a2223b78d311153a471036e474c5c5:

  Merge tag 'pull-tcg-20230423' of https://gitlab.com/rth7680/qemu into staging 
(2023-04-23 11:20:37 +0100)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git tags/seabios-hppa-v7-pull-request

for you to fetch changes up to bb9c998ca9343d445c76b69fa15dea9db692f526:

  target/hppa: New SeaBIOS-hppa version 7 (2023-06-20 21:39:47 +0200)


hppa: New SeaBIOS-hppa version 7 ROM

New SeaBIOS-hppa version 7 ROM to fix Debian-12
CD-ROM boot issues.

Signed-off-by: Helge Deller 



Helge Deller (1):
  target/hppa: New SeaBIOS-hppa version 7

 pc-bios/hppa-firmware.img | Bin 719368 -> 719376 bytes
 roms/seabios-hppa |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

--
2.38.1

[PATCH 1/2] memory: introduce memory_region_init_ram_protected()

2023-06-20 Thread Laurent Vivier

Commit 56918a126a ("memory: Add RAM_PROTECTED flag to skip IOMMU mappings")
has introduced the RAM_PROTECTED flag to denote "protected" memory.

This flag is only used with qemu_ram_alloc_from_fd() for now.

To be able to register memory region with this flag, define
memory_region_init_ram_protected() and declare the flag as valid in
qemu_ram_alloc_internal() and qemu_ram_alloc().

Signed-off-by: Laurent Vivier 
---
 include/exec/memory.h | 33 +
 softmmu/memory.c  | 33 +++--
 softmmu/physmem.c |  4 ++--
 3 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 47c2e0221c35..d8760015c381 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1520,6 +1520,39 @@ void memory_region_init_iommu(void *_iommu_mr,
   const char *name,
   uint64_t size);
 
+/**
+ * memory_region_init_ram_protected - Initialize RAM memory region.  Accesses
+ *into the region will modify memory
+ *directly.
+ *
+ * The memory is created with the RAM_PROTECTED flag, for memory that
+ * looks and acts like RAM but inaccessible via normal mechanisms,
+ * including DMA.
+ *
+ * @mr: the #MemoryRegion to be initialized
+ * @owner: the object that tracks the region's reference count (must be
+ * TYPE_DEVICE or a subclass of TYPE_DEVICE, or NULL)
+ * @name: name of the memory region
+ * @size: size of the region in bytes
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * This function allocates RAM for a board model or device, and
+ * arranges for it to be migrated (by calling vmstate_register_ram()
+ * if @owner is a DeviceState, or vmstate_register_ram_global() if
+ * @owner is NULL).
+ *
+ * TODO: Currently we restrict @owner to being either NULL (for
+ * global RAM regions with no owner) or devices, so that we can
+ * give the RAM block a unique name for migration purposes.
+ * We should lift this restriction and allow arbitrary Objects.
+ * If you pass a non-NULL non-device @owner then we will assert.
+ */
+void memory_region_init_ram_protected(MemoryRegion *mr,
+  Object *owner,
+  const char *name,
+  uint64_t size,
+  Error **errp);
+
 /**
  * memory_region_init_ram - Initialize RAM memory region.  Accesses into the
  *  region will modify memory directly.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce7028..952c87277353 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3551,16 +3551,18 @@ void mtree_info(bool flatview, bool dispatch_tree, bool 
owner, bool disabled)
 }
 }
 
-void memory_region_init_ram(MemoryRegion *mr,
-Object *owner,
-const char *name,
-uint64_t size,
-Error **errp)
+static void memory_region_init_ram_flags(MemoryRegion *mr,
+ Object *owner,
+ const char *name,
+ uint64_t size,
+ uint32_t ram_flags,
+ Error **errp)
 {
 DeviceState *owner_dev;
 Error *err = NULL;
 
-memory_region_init_ram_nomigrate(mr, owner, name, size, &err);
+memory_region_init_ram_flags_nomigrate(mr, owner, name, size, ram_flags,
+   &err);
 if (err) {
 error_propagate(errp, err);
 return;
@@ -3575,6 +3577,25 @@ void memory_region_init_ram(MemoryRegion *mr,
 vmstate_register_ram(mr, owner_dev);
 }
 
+void memory_region_init_ram_protected(MemoryRegion *mr,
+  Object *owner,
+  const char *name,
+  uint64_t size,
+  Error **errp)
+{
+memory_region_init_ram_flags(mr, owner, name, size, RAM_PROTECTED,
+ errp);
+}
+
+void memory_region_init_ram(MemoryRegion *mr,
+Object *owner,
+const char *name,
+uint64_t size,
+Error **errp)
+{
+memory_region_init_ram_flags(mr, owner, name, size, 0, errp);
+}
+
 void memory_region_init_rom(MemoryRegion *mr,
 Object *owner,
 const char *name,
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 6bdd944fe880..bf66c81e7255 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1978,7 +1978,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
ram_addr_t max_size,
 Error *local_err =

[PATCH 0/2] vhost-vdpa: skip TPM CRB memory section

2023-06-20 Thread Laurent Vivier

An error is reported for vhost-vdpa case:
qemu-kvm: vhost_vdpa_listener_region_add received unaligned region

Marc-André has proposed a fix to this problem by skipping
the memory region owned by the TPM CRB but it seems more generic
to skip non-DMA-able memory.

We have a memory flag for that, RAM_PROTECTED.

This series expands the memory API to provide a way to initialize
a "protected" memory region and use it with the TPM CRB object.

For the previous discussions, see

https://lists.nongnu.org/archive/html/qemu-devel/2022-11/msg03670.html

and from Eric for VFIO:

https://lore.kernel.org/all/20220506132510.1847942-1-eric.au...@redhat.com/
https://lore.kernel.org/all/20220524091405.416256-1-eric.au...@redhat.com/

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2141965

Thanks,
Laurent

Laurent Vivier (2):
  memory: introduce memory_region_init_ram_protected()
  tpm_crb: mark memory as protected

 hw/tpm/tpm_crb.c  |  2 +-
 include/exec/memory.h | 33 +
 softmmu/memory.c  | 33 +++--
 softmmu/physmem.c |  4 ++--
 4 files changed, 63 insertions(+), 9 deletions(-)

-- 
2.41.0

[PATCH 2/2] tpm_crb: mark memory as protected

2023-06-20 Thread Laurent Vivier

This memory is not correctly aligned and cannot be registered
by vDPA and VFIO.

An error is reported for vhost-vdpa case:
qemu-kvm: vhost_vdpa_listener_region_add received unaligned region

To make it ignored by VFIO and vDPA devices, mark it as RAM_PROTECTED.

The RAM_PROTECTED flag has been introduced to skip memory
regions that look like RAM but are not accessible via normal
mechanisms, including DMA.

See 56918a126a ("memory: Add RAM_PROTECTED flag to skip IOMMU mappings")

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2141965

cc: peter.mayd...@linaro.org
cc: marcandre.lur...@redhat.com
cc: eric.au...@redhat.com
cc: m...@redhat.com
cc: jasow...@redhat.com
Signed-off-by: Laurent Vivier 
---
 hw/tpm/tpm_crb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index ea930da545af..0a93c488f2fa 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -296,7 +296,7 @@ static void tpm_crb_realize(DeviceState *dev, Error **errp)
 
 memory_region_init_io(&s->mmio, OBJECT(s), &tpm_crb_memory_ops, s,
 "tpm-crb-mmio", sizeof(s->regs));
-memory_region_init_ram(&s->cmdmem, OBJECT(s),
+memory_region_init_ram_protected(&s->cmdmem, OBJECT(s),
 "tpm-crb-cmd", CRB_CTRL_CMD_SIZE, errp);
 
 memory_region_add_subregion(get_system_memory(),
-- 
2.41.0

Re: [PATCH v7] Emulate dip switch language layout settings on SUN keyboard

2023-06-20 Thread Henrik Carlqvist

On Tue, 20 Jun 2023 10:22:40 +0100
Daniel P. Berrangé wrote:

Thanks for your feedback!

> Assuming you have docutils installed, QEMU will build the manual by
> default and print any issues on console during build. You can point
> your browser to $BUILD/docs/manual/system/index.html to see the result.

It seems as if I have docutils version 0.17.1 installed. However the
build/docs directory only contains a symlink to the config directory in
../../docs after make is completed.

> For future reference, if you want to put some questions/notes in the
> submission, it is best to keep them separate from the commit message
> text, as the questions/notes shouldn't end up in git history. To
> separate them, put questions  immediately after the '---' that separate
> the commit message from the diffstat

Thanks! Will do...

> You need to remove the space between :ref: and `keyboard`.
> 
> You'll also need to add it to a ToC (table of contents) otherwise
> the build system complains.
> 
> I'd suggest putting the new file at docs/system/devices/keyboards.rst
> and adding to the ToC in docs/system/device-emulation.rst

I will update the .rst files and placements, hopefully the coming weekend and
come back with an updated patch. However, until I am able to build something
from those .rst files, I can only follow your instructions to finally get them
right.

Best regards Henrik

[PATCH v1 21/23] pc/q35: setup q35 for xen

2023-06-20 Thread Joel Upham

Mirrored the init done for piix devices when xen is being used.
This is needed for xen memory to be initialized and used with q35.

Signed-off-by: Joel Upham 
---
 hw/i386/pc_q35.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 789a23ce6b..0b53a86dd2 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -145,6 +145,7 @@ static void pc_q35_init(MachineState *machine)
 MemoryRegion *system_io = get_system_io();
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
+MemoryRegion *ram_memory;
 GSIState *gsi_state;
 ISABus *isa_bus;
 int i;
@@ -196,8 +197,12 @@ static void pc_q35_init(MachineState *machine)
 }
 
 pc_machine_init_sgx_epc(pcms);
-x86_cpus_init(x86ms, pcmc->default_cpu_version);
 
+x86_cpus_init(x86ms, pcmc->default_cpu_version);
+if (xen_enabled()) {
+xen_hvm_init_pc(pcms, &ram_memory);
+machine->ram = ram_memory;
+}
 kvmclock_create(pcmc->kvmclock_create_always);
 
 /* pci enabled */
@@ -230,7 +235,15 @@ static void pc_q35_init(MachineState *machine)
 }
 
 /* allocate ram and load rom/bios */
-pc_memory_init(pcms, system_memory, rom_memory, pci_hole64_size);
+if (!xen_enabled()) 
+pc_memory_init(pcms, system_memory, rom_memory, pci_hole64_size);
+ else {
+pc_system_flash_cleanup_unused(pcms);
+if (machine->kernel_filename != NULL) {
+/* For xen HVM direct kernel boot, load linux here */
+xen_load_linux(pcms);
+}
+}
 
 object_property_add_child(OBJECT(machine), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
@@ -307,7 +320,7 @@ static void pc_q35_init(MachineState *machine)
 
 assert(pcms->vmport != ON_OFF_AUTO__MAX);
 if (pcms->vmport == ON_OFF_AUTO_AUTO) {
-pcms->vmport = ON_OFF_AUTO_ON;
+pcms->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
 }
 
 /* init basic PC hardware */
-- 
2.34.1

[PATCH v1 12/23] xen/pt: allow to hide PCIe Extended Capabilities

2023-06-20 Thread Joel Upham

We need to hide some unwanted PCI/PCIe capabilities for passed through
devices.
Normally we do this by marking the capability register group
as XEN_PT_GRP_TYPE_HARDWIRED, which excludes this capability from the
capability list and returns zeroes on attempts to read the capability body.
Skipping the capability in the linked list of capabilities can be done
by changing Next Capability register to skip one or many unwanted
capabilities.

One difference between PCI and PCIe Extended capabilities is that we don't
have the list head field anymore. PCIe Extended capabilities always start
at offset 0x100 if they're present. Unfortunately, there are typically
only a few PCIe extended capabilities present which means there is a chance
that some capability we want to hide will reside at offset 0x100 in PCIe
config space.

The simplest way to hide such capabilities from guest OS or drivers
is faking their capability ID value.

This patch adds the Capability ID register handler which checks
- if the capability to which this register belongs starts at offset 0x100
  in PCIe config space
- if this capability is marked as XEN_PT_GRP_TYPE_HARDWIRED

If it is the case, then a fake Capability ID value is returned.
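
The read-path decision described above can be sketched in isolation. The helper names here are hypothetical and the overlap test is written out by hand; only the 0x100 offset and the 4-byte header width come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EXT_CAP_START 0x100u /* first PCIe extended capability header */

/* True if [a, a + alen) and [b, b + blen) share at least one byte. */
static inline bool sketch_overlap(uint32_t a, uint32_t alen,
                                  uint32_t b, uint32_t blen)
{
    assert(alen > 0 && blen > 0);
    return a < b + blen && b < a + alen;
}

/*
 * For a hardwired (hidden) group, reads normally just return zero.
 * The exception is the 4-byte header at 0x100: it cannot be skipped via
 * a Next Capability pointer, so it has to be emulated with a fake ID.
 */
static inline bool sketch_must_emulate_header(uint32_t grp_base, bool hardwired,
                                              uint32_t addr, uint32_t len)
{
    return hardwired &&
           grp_base == SKETCH_EXT_CAP_START &&
           sketch_overlap(addr, len, SKETCH_EXT_CAP_START, 4);
}
```

A 2-byte read at 0x102 still touches the header dword and is emulated, while a read at 0x104 falls into the capability body and keeps returning zero.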

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt.c | 11 ++-
 hw/xen/xen_pt.h |  4 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index f757978800..2399fabb2b 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -164,7 +164,16 @@ static uint32_t xen_pt_pci_read_config(PCIDevice *d, 
uint32_t addr, int len)
 reg_grp_entry = xen_pt_find_reg_grp(s, addr);
 if (reg_grp_entry) {
 /* check 0-Hardwired register group */
-if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+/*
+ * For PCIe Extended Capabilities we need to emulate
+ * CapabilityID and NextCapability/Version registers for a
+ * hardwired reg group located at the offset 0x100 in PCIe
+ * config space. This allows us to hide the first extended
+ * capability as well.
+ */
+!(reg_grp_entry->base_offset == PCI_CONFIG_SPACE_SIZE &&
+ranges_overlap(addr, len, 0x100, 4))) {
 /* no need to emulate, just return 0 */
 val = 0;
 goto exit;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index eb062be3f4..9a191cbc8f 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -93,6 +93,10 @@ typedef int (*xen_pt_conf_byte_read)
 
 #define XEN_PCI_INTEL_OPREGION 0xfc
 
+#define XEN_PCIE_CAP_ID 0
+#define XEN_PCIE_CAP_LIST_NEXT 2
+#define XEN_PCIE_FAKE_CAP_ID_BASE 0xFE00
+
 #define XEN_PCI_IGD_DOMAIN 0
 #define XEN_PCI_IGD_BUS 0
 #define XEN_PCI_IGD_DEV 2
-- 
2.34.1

[PATCH v1 15/23] xen/pt: add AER PCIe Extended Capability descriptor and sizing

2023-06-20 Thread Joel Upham

The patch provides Advanced Error Reporting PCIe Extended Capability
description structure and corresponding capability sizing function.
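
The sizing rule implemented by the diff below can be summarized as a small pure function. This is a sketch with hypothetical names; the three sizes are the ones used in the patch, and the boolean inputs stand for the register bits the real code reads from the host device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    SKETCH_AER_SIZE_BASE       = 0x2C, /* endpoints and other functions */
    SKETCH_AER_SIZE_ROOT       = 0x38, /* root ports and RC event collectors */
    SKETCH_AER_SIZE_TLP_PREFIX = 0x48  /* TLP Prefix Log present */
};

static inline uint32_t sketch_aer_cap_size(bool is_root_port_or_rcec,
                                           bool end_end_tlp_prefix,
                                           bool tlp_prefix_log_present)
{
    uint32_t size;

    if (end_end_tlp_prefix && tlp_prefix_log_present) {
        size = SKETCH_AER_SIZE_TLP_PREFIX;
    } else if (is_root_port_or_rcec) {
        size = SKETCH_AER_SIZE_ROOT;
    } else {
        size = SKETCH_AER_SIZE_BASE;
    }

    /* Every variant is at least as large as the base layout. */
    assert(size >= SKETCH_AER_SIZE_BASE);
    return size;
}
```

The TLP Prefix Log check takes priority over the device type, which matches the order of the tests in the patch.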

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 72 +
 1 file changed, 72 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 69d8857c66..9fd0531bc4 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1861,6 +1861,70 @@ static int xen_pt_msix_size_init(XenPCIPassthroughState 
*s,
 }
 
 
+/* get Advanced Error Reporting Extended Capability register group size */
+#define PCI_ERR_CAP_TLP_PREFIX_LOG  (1U << 11)
+#define PCI_DEVCAP2_END_END_TLP_PREFIX  (1U << 21)
+static int xen_pt_ext_cap_aer_size_init(XenPCIPassthroughState *s,
+const XenPTRegGroupInfo *grp_reg,
+uint32_t base_offset,
+uint32_t *size)
+{
+uint8_t dev_type = get_pcie_device_type(s);
+uint32_t aer_caps = 0;
+uint32_t sz = 0;
+int pcie_cap_pos;
+uint32_t devcaps2 = 0;
+int ret = 0;
+
+pcie_cap_pos = xen_host_pci_find_next_cap(&s->real_device, 0,
+  PCI_CAP_ID_EXP);
+if (!pcie_cap_pos) {
+XEN_PT_ERR(&s->dev,
+   "Cannot find a required PCI Express Capability\n");
+return -1;
+}
+
+if (get_pcie_capability_version(s) > 1) {
+ret = xen_host_pci_get_long(&s->real_device,
+pcie_cap_pos + PCI_EXP_DEVCAP2,
+&devcaps2);
+if (ret) {
+XEN_PT_ERR(&s->dev, "Error while reading Device "
+   "Capabilities 2 Register \n");
+return -1;
+}
+}
+
+if (devcaps2 & PCI_DEVCAP2_END_END_TLP_PREFIX) {
+ret = xen_host_pci_get_long(&s->real_device,
+base_offset + PCI_ERR_CAP,
+&aer_caps);
+if (ret) {
+XEN_PT_ERR(&s->dev,
+   "Error while reading AER Extended Capability\n");
+return -1;
+}
+
+if (aer_caps & PCI_ERR_CAP_TLP_PREFIX_LOG) {
+sz = 0x48;
+}
+}
+
+if (!sz) {
+if (dev_type == PCI_EXP_TYPE_ROOT_PORT ||
+dev_type == PCI_EXP_TYPE_RC_EC) {
+sz = 0x38;
+} else {
+sz = 0x2C;
+}
+}
+
+*size = sz;
+
+log_pcie_extended_cap(s, "AER", base_offset, *size);
+return ret;
+}
+
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 /* Header Type0 reg group */
 {
@@ -2128,6 +2192,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 .size_init  = xen_pt_reg_grp_size_init,
 .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
 },
+/* Advanced Error Reporting Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ERR),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = 0xFF,
+.size_init  = xen_pt_ext_cap_aer_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
 {
 .grp_size = 0,
 },
-- 
2.34.1

[PATCH v1 01/23] pc/xen: Xen Q35 support: provide IRQ handling for PCI devices

2023-06-20 Thread Joel Upham

The primary difference in PCI device IRQ management between Xen HVM and
QEMU is that Xen PCI IRQs are "device-centric" while QEMU PCI IRQs are
"chipset-centric". Namely, Xen uses PCI device BDF and INTx as coordinates
to assert IRQ while QEMU finds out to which chipset PIRQ the IRQ is routed
through the hierarchy of PCI buses and manages IRQ assertion on chipset
side (as PIRQ inputs).

Two callback functions are used for this purpose: .map_irq and .set_irq
(named after corresponding structure fields). Corresponding Xen-specific
callback functions are piix3_set_irq() and pci_slot_get_pirq(). In Xen
case these functions do not operate on pirq pin numbers. Instead, they use
a specific value to pass BDF/INTx information between .map_irq and
.set_irq -- PCI device devfn and INTx pin number are combined into
pseudo-PIRQ in pci_slot_get_pirq, which piix3_set_irq later decodes back
into devfn and INTx number for passing to *set_pci_intx_level() call.
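
The encode/decode round trip can be shown on its own. This sketch uses hypothetical helper names and follows the arithmetic in the diff below, which packs the PCI slot number (not the full devfn) together with the INTx pin.

```c
#include <assert.h>
#include <stdint.h>

/* Same convention as PCI_SLOT(): devfn = (slot << 3) | function. */
static inline unsigned sketch_pci_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* .map_irq side: fold slot and INTx pin (0..3 = INTA..INTD) into one value. */
static inline int sketch_pseudo_pirq_encode(uint8_t devfn, int intx)
{
    assert(intx >= 0 && intx <= 3);
    return intx + (int)(sketch_pci_slot(devfn) << 2);
}

/* .set_irq side: recover the slot and the INTx pin from the pseudo-PIRQ. */
static inline unsigned sketch_pseudo_pirq_slot(int pirq)
{
    return (unsigned)pirq >> 2;
}

static inline unsigned sketch_pseudo_pirq_intx(int pirq)
{
    return (unsigned)pirq & 3;
}
```

For slot 4, function 0 (devfn 0x20) raising INTB (pin 1), the pseudo-PIRQ is 17, which decodes back to slot 4 and pin 1. The function number is not carried through, since only the slot is shifted in.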

For Xen on Q35 this scheme is still applicable, with the exception that
function names are non-descriptive now and need to be renamed to show
their common i440/Q35 nature. Proposed new names are:

xen_pci_slot_get_pirq --> xen_cmn_pci_slot_get_pirq
xen_piix3_set_irq --> xen_cmn_set_irq

Another IRQ-related difference between i440 and Q35 is the number of PIRQ
inputs and PIRQ routers (PCI IRQ links in terms of ACPI) available. i440
has 4 PCI interrupt links, while Q35 has 8 (PIRQA...PIRQH).
Currently Xen has support for only 4 PCI links, so we describe only 4 of
8 PCI links in ACPI tables. Also, hvmloader disables PIRQ routing for
PIRQE..PIRQH by writing 80h into corresponding PIRQ[n]_ROUT registers.

All this PCI interrupt routing stuff is largely an ancient legacy from PIC
era. It's hardly worth extending the number of PCI links supported as we
normally deal with APIC mode and/or MSI interrupts.

The only useful thing to do with PIRQE..PIRQH routing currently is to
check if guest actually attempts to use it for some reason (despite ACPI
PCI routing information provided). In this case, a warning is logged.

Things have changed a bit in modern QEMU, and more changes to the IRQ
mapping had to be done inside lpc_ich9 to write the IRQs and set up
the mappings.

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/i386/pc_piix.c |  3 +-
 hw/i386/xen/xen-hvm.c |  7 +++--
 hw/isa/lpc_ich9.c | 53 ---
 hw/isa/piix3.c|  2 +-
 include/hw/southbridge/ich9.h |  1 +
 include/hw/xen/xen.h  |  4 +--
 stubs/xen-hw-stub.c   |  4 +--
 7 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d5b0dcd1fe..8c1b20f3bc 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -62,6 +62,7 @@
 #endif
 #include "hw/xen/xen-x86.h"
 #include "hw/xen/xen.h"
+#include "sysemu/xen.h"
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "sysemu/numa.h"
@@ -233,7 +234,7 @@ static void pc_init1(MachineState *machine,
   x86ms->above_4g_mem_size,
   pci_memory, ram_memory);
 pci_bus_map_irqs(pci_bus,
- xen_enabled() ? xen_pci_slot_get_pirq
+ xen_enabled() ? xen_cmn_pci_slot_get_pirq
: pc_pci_slot_get_pirq);
 pcms->bus = pci_bus;
 
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 56641a550e..540ac46639 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -15,6 +15,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_host.h"
 #include "hw/i386/pc.h"
+#include "hw/southbridge/ich9.h"
 #include "hw/irq.h"
 #include "hw/hw.h"
 #include "hw/i386/apic-msidef.h"
@@ -136,14 +137,14 @@ typedef struct XenIOState {
 Notifier wakeup;
 } XenIOState;
 
-/* Xen specific function for piix pci */
+/* Xen-specific functions for pci dev IRQ handling */
 
-int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
+int xen_cmn_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
 {
 return irq_num + (PCI_SLOT(pci_dev->devfn) << 2);
 }
 
-void xen_piix3_set_irq(void *opaque, int irq_num, int level)
+void xen_cmn_set_irq(void *opaque, int irq_num, int level)
 {
 xen_set_pci_intx_level(xen_domid, 0, 0, irq_num >> 2,
irq_num & 3, level);
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 9c47a2f6c7..733a99d443 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -51,6 +51,9 @@
 #include "hw/core/cpu.h"
 #include "hw/nvram/fw_cfg.h"
 #include "qemu/cutils.h"
+#include "hw/xen/xen.h"
+#include "sysemu/xen.h"
+#include "hw/southbridge/piix.h"
 #include "hw/acpi/acpi_aml_interface.h"
 #include "trace.h"
 
@@ -535,11 +538,49 @@ static int ich9_lpc_post_load(void *opaque, int 
version_id)
 return 0;
 }
 
+static void ich9_lpc_config_write_xen(PCIDevice *d,
+  uint32_t addr,

[PATCH v1 1/1] Q35 Support

2023-06-20 Thread Joel Upham

---
 hw/acpi/ich9.c|   22 +-
 hw/acpi/pcihp.c   |6 +-
 hw/core/machine.c |   19 +
 hw/i386/pc_piix.c |3 +-
 hw/i386/pc_q35.c  |   39 +-
 hw/i386/xen/xen-hvm.c |7 +-
 hw/i386/xen/xen_platform.c|   19 +-
 hw/isa/lpc_ich9.c |   53 +-
 hw/isa/piix3.c|2 +-
 hw/pci-host/q35.c |   28 +-
 hw/pci/pci.c  |   17 +
 hw/xen/xen-host-pci-device.c  |  106 +++-
 hw/xen/xen-host-pci-device.h  |6 +-
 hw/xen/xen_pt.c   |   49 +-
 hw/xen/xen_pt.h   |   19 +-
 hw/xen/xen_pt_config_init.c   | 1103 ++---
 include/hw/acpi/ich9.h|1 +
 include/hw/acpi/pcihp.h   |2 +
 include/hw/boards.h   |1 +
 include/hw/i386/pc.h  |3 +
 include/hw/pci-host/q35.h |4 +-
 include/hw/pci/pci.h  |3 +
 include/hw/southbridge/ich9.h |1 +
 include/hw/xen/xen.h  |4 +-
 qemu-options.hx   |1 +
 softmmu/datadir.c |1 -
 softmmu/qdev-monitor.c|3 +-
 stubs/xen-hw-stub.c   |4 +-
 28 files changed, 1395 insertions(+), 131 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 25e2c7243e..234706a191 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -39,6 +39,8 @@
 #include "hw/southbridge/ich9.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
+#include "sysemu/xen.h"
 
 //#define DEBUG
 
@@ -67,6 +69,10 @@ static void ich9_gpe_writeb(void *opaque, hwaddr addr, 
uint64_t val,
 ICH9LPCPMRegs *pm = opaque;
 acpi_gpe_ioport_writeb(&pm->acpi_regs, addr, val);
 acpi_update_sci(&pm->acpi_regs, pm->irq);
+
+if (xen_enabled()) {
+acpi_pcihp_reset(&pm->acpi_pci_hotplug);
+}
 }
 
 static const MemoryRegionOps ich9_gpe_ops = {
@@ -137,7 +143,8 @@ static int ich9_pm_post_load(void *opaque, int version_id)
 {
 ICH9LPCPMRegs *pm = opaque;
 uint32_t pm_io_base = pm->pm_io_base;
-pm->pm_io_base = 0;
+if (!xen_enabled())
+pm->pm_io_base = 0;
 ich9_pm_iospace_update(pm, pm_io_base);
 return 0;
 }
@@ -268,7 +275,10 @@ static void pm_reset(void *opaque)
 acpi_pm1_evt_reset(&pm->acpi_regs);
 acpi_pm1_cnt_reset(&pm->acpi_regs);
 acpi_pm_tmr_reset(&pm->acpi_regs);
-acpi_gpe_reset(&pm->acpi_regs);
+/* Noticed guest freezing in xen when this was reset after S3. */
+if (!xen_enabled()) {
+acpi_gpe_reset(&pm->acpi_regs);
+}
 
 pm->smi_en = 0;
 if (!pm->smm_enabled) {
@@ -316,7 +326,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
qemu_irq sci_irq)
 acpi_pm_tco_init(&pm->tco_regs, &pm->io);
 }
 
-if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge) {
+if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge || xen_enabled()) {
 acpi_pcihp_init(OBJECT(lpc_pci),
 &pm->acpi_pci_hotplug,
 pci_get_bus(lpc_pci),
@@ -332,10 +342,14 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
qemu_irq sci_irq)
 pm->powerdown_notifier.notify = pm_powerdown_req;
 qemu_register_powerdown_notifier(&pm->powerdown_notifier);
 
+if (xen_enabled()) {
+acpi_set_pci_info(true);
+}
+
 legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
 OBJECT(lpc_pci), &pm->gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
-if (pm->acpi_memory_hotplug.is_enabled) {
+if (pm->acpi_memory_hotplug.is_enabled || xen_enabled()) {
 acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), 
OBJECT(lpc_pci),
  &pm->acpi_memory_hotplug,
  ACPI_MEMORY_HOTPLUG_BASE);
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index cdd6f775a1..5b065d670c 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -40,6 +40,7 @@
 #include "qapi/error.h"
 #include "qom/qom-qobject.h"
 #include "trace.h"
+#include "sysemu/xen.h"
 
 #define ACPI_PCIHP_SIZE 0x0018
 #define PCI_UP_BASE 0x
@@ -84,7 +85,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 bool is_bridge = IS_PCI_BRIDGE(br);
 
 /* hotplugged bridges can't be described in ACPI ignore them */
-if (qbus_is_hotpluggable(BUS(bus))) {
+/* Xen requires hotplugging to the root device, even on the Q35 chipset */
+if (qbus_is_hotpluggable(BUS(bus)) || xen_enabled()) {
 if (!is_bridge || (!br->hotplugged && info->has_bridge_hotplug)) {
 bus_bsel = g_malloc(sizeof *bus_bsel);
 
@@ -97,7 +99,7 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 return info;
 }
 
-static void acpi_set_pci_info(bool has_bridge_hotplug)
+void acpi_set_pci_info(bool has_bridge_hotplug)
 {
 static bool bsel_is_set;
 Object *host = acpi_get_i386_pci_host();
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1000406211..703138d2ec 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -455,6 +455,20 @@ static void machine_set_graphics(Object

[PATCH v1 23/23] s3 support: enabling s3 with q35

2023-06-20 Thread Joel Upham

Resetting pci devices after s3 causes guest freezes, as xen usually
likes to handle resetting devices.


Signed-off-by: Joel Upham 
---
 hw/acpi/ich9.c| 12 
 hw/pci-host/q35.c |  3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1c236be1c7..234706a191 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -143,7 +143,8 @@ static int ich9_pm_post_load(void *opaque, int version_id)
 {
 ICH9LPCPMRegs *pm = opaque;
 uint32_t pm_io_base = pm->pm_io_base;
-pm->pm_io_base = 0;
+if (!xen_enabled())
+pm->pm_io_base = 0;
 ich9_pm_iospace_update(pm, pm_io_base);
 return 0;
 }
@@ -274,7 +275,10 @@ static void pm_reset(void *opaque)
 acpi_pm1_evt_reset(&pm->acpi_regs);
 acpi_pm1_cnt_reset(&pm->acpi_regs);
 acpi_pm_tmr_reset(&pm->acpi_regs);
-acpi_gpe_reset(&pm->acpi_regs);
+/* Noticed guest freezing in xen when this was reset after S3. */
+if (!xen_enabled()) {
+acpi_gpe_reset(&pm->acpi_regs);
+}
 
 pm->smi_en = 0;
 if (!pm->smm_enabled) {
@@ -322,7 +326,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
qemu_irq sci_irq)
 acpi_pm_tco_init(&pm->tco_regs, &pm->io);
 }
 
-if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge) {
+if (pm->acpi_pci_hotplug.use_acpi_hotplug_bridge || xen_enabled()) {
 acpi_pcihp_init(OBJECT(lpc_pci),
 &pm->acpi_pci_hotplug,
 pci_get_bus(lpc_pci),
@@ -345,7 +349,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
qemu_irq sci_irq)
 legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
 OBJECT(lpc_pci), &pm->gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
-if (pm->acpi_memory_hotplug.is_enabled) {
+if (pm->acpi_memory_hotplug.is_enabled || xen_enabled()) {
 acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), 
OBJECT(lpc_pci),
  &pm->acpi_memory_hotplug,
  ACPI_MEMORY_HOTPLUG_BASE);
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 1fe4e5a5c9..5891839ce9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -580,7 +580,8 @@ static void mch_reset(DeviceState *qdev)
 d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
 d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
 
-mch_update(mch);
+if (!xen_enabled())
+mch_update(mch);
 }
 
 static void mch_realize(PCIDevice *d, Error **errp)
-- 
2.34.1

[PATCH v1 20/23] xen platform: unplug ahci object

2023-06-20 Thread Joel Upham

This will unplug the ahci device when the Xen driver calls for an unplug.
This has been tested to work in linux and Windows guests.
When q35 is detected, we will remove the ahci controller
with the hard disks.  In the libxl config, cdrom devices
are put on a separate ahci controller. This allows for 6 cdrom
devices to be added, and 6 qemu hard disks.


Signed-off-by: Joel Upham 
---
 hw/i386/xen/xen_platform.c | 19 ++-
 hw/pci/pci.c   | 17 +
 include/hw/pci/pci.h   |  3 +++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 57f1d742c1..0375337222 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -34,6 +34,7 @@
 #include "sysemu/block-backend.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "include/hw/i386/pc.h"
 #include "qom/object.h"
 
 #ifdef CONFIG_XEN
@@ -223,6 +224,12 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
 if (flags & UNPLUG_NVME_DISKS) {
 object_unparent(OBJECT(d));
 }
+break;
+
+case PCI_CLASS_STORAGE_SATA:
+   if (!aux) {
+object_unparent(OBJECT(d));
+}
 
 default:
 break;
@@ -231,7 +238,17 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
 
 static void pci_unplug_disks(PCIBus *bus, uint32_t flags)
 {
-pci_for_each_device(bus, 0, unplug_disks, &flags);
+PCIBus *q35 = find_q35();
+if (q35) {
+/* When q35 is detected, we will remove the ahci controller
+* with the hard disks.  In the libxl config, cdrom devices
+* are put on a separate ahci controller. This allows for 6 cdrom
+* devices to be added, and 6 qemu hard disks.
+*/
+pci_function_for_one_bus(bus, unplug_disks, &flags);
+} else {
+pci_for_each_device(bus, 0, unplug_disks, &flags);
+}
 }
 
 static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, uint32_t 
val)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1cc7c89036..8eac3d751a 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1815,6 +1815,23 @@ void pci_for_each_device_reverse(PCIBus *bus, int 
bus_num,
 }
 }
 
+void pci_function_for_one_bus(PCIBus *bus,
+  void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
+  void *opaque)
+{
+bus = pci_find_bus_nr(bus, 0);
+
+if (bus) {
+PCIDevice *d;
+
+d = bus->devices[PCI_DEVFN(4,0)];
+if (d) {
+fn(bus, d, opaque);
+return;
+}
+}
+}
+
 void pci_for_each_device_under_bus(PCIBus *bus,
pci_bus_dev_fn fn, void *opaque)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e6d0574a29..c53e21082a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -343,6 +343,9 @@ void pci_for_each_device_under_bus(PCIBus *bus,
 void pci_for_each_device_under_bus_reverse(PCIBus *bus,
pci_bus_dev_fn fn,
void *opaque);
+void pci_function_for_one_bus(PCIBus *bus,
+ void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
+ void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
   pci_bus_fn end, void *parent_state);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
-- 
2.34.1

[PATCH v1 09/23] xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology check

2023-06-20 Thread Joel Upham

Compared to the legacy i440 system, passing PCIe devices through to guest
OSes such as Windows 7 and above is harder on platforms with native PCIe
bus support (in our case Q35). The problem does not apply to older OSes
like Windows XP: PCIe passthrough works normally there because these OSes
have no support for PCIe-specific features and treat all PCIe devices as
legacy PCI ones.

The problem manifests itself as a "Code 10" error for a passed-through PCIe
device in Windows Device Manager (along with an exclamation mark on it). A
device with this error does not function, even though Windows booted
successfully while actually using the device, e.g. as a primary VGA card
with VBE features, LFB, etc. working properly during boot.
The PCI class of the device does not matter: the problem is common to
GPUs, NICs, USB controllers, etc. At the same time, all these devices can
be passed through successfully using i440 emulation on the same
Windows 7+ OSes.

The root cause is that the Windows kernel (the PnP manager in particular)
refuses to start the device while processing the StartDevice IRP, so
control flow never reaches the IRP handler in the device driver. The real
reason typically arises not when the PnP manager tries to start the
device but much earlier, during the Windows boot stage, while the pci.sys
driver enumerates devices on a PCI/PCIe bus. There is a set of checks for
every device discovered on the PCIe bus. Failing some of them marks the
discovered PCIe device as 'invalid' by setting a flag. Later on, the
StartDevice attempt fails because of this flag, finally resulting in the
Code 10 error.

The check in pci.sys that marks the PCIe device as 'invalid' in our case
is a validation of the upstream PCIe bus hierarchy to which the
passed-through device belongs. Basically, pci.sys checks whether the PCIe
device has parent devices, such as a PCIe Root Port or an upstream PCIe
switch. In our case the PCIe device has no parents and resides on bus 0
without, e.g., a corresponding Root Port.

Therefore, to resolve this problem in an architecturally correct way, we
need to introduce into Xen support for at least a trivial non-flat PCI bus
hierarchy. In the simplest case this is just one virtual Root Port, on
whose secondary bus all physical functions of the real passed-through
device reside, e.g. a GPU and its HDAudio function.

This solution is not hard to implement technically, but Xen currently has
several limitations that affect it (many of them related to each other):

- in many places the code is limited to bus 0 only. This applies to both
  the hypervisor and supplemental modules like hvmloader. The limitation
  is enforced at the API level: many functions and interfaces accept only
  a devfn argument, with bus 0 implied.

- a lot of code assumes the Type0 PCI config space layout only, while we
  need to handle Type1 PCI devices as well.

- there is currently no way to assign even the simplest linked hierarchy
  of passed-through PCI devices to a guest domain. In some cases we might
  need to pass through a real PCIe Switch/Root Port with its downstream
  child devices.

- similarly, Xen/hvmloader lacks the concept of IO/MMIO space nesting.
  Neither the code that sizes the MMIO hole nor the code that allocates
  BARs in the MMIO hole knows about nested MMIO ranges and their
  relations. A virtual Root Port is basically an emulated PCI-PCI bridge
  with parts of its MMIO range used for the real MMIO ranges of the
  passed-through device(s).
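
For reference, the Type0/Type1 distinction in the list above comes from the
Header Type register at config offset 0x0E. A minimal sketch, assuming a raw
snapshot of the first 64 bytes of config space; the helper names are
hypothetical and are not part of Xen or QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_TYPE_OFFSET      0x0E  /* Header Type register */
#define HDR_TYPE_LAYOUT_MASK 0x7F  /* bits 6:0: header layout */
#define HDR_TYPE_MULTI_FUNC  0x80  /* bit 7: multi-function device */
#define HDR_TYPE_BRIDGE      0x01  /* Type1 layout: PCI-PCI bridge */
#define SECONDARY_BUS_OFFSET 0x19  /* only meaningful in a Type1 header */

/* Hypothetical helper: true if the header uses the Type1 layout. */
static int cfg_is_type1(const uint8_t *cfg)
{
    assert(cfg != NULL);
    return (cfg[HDR_TYPE_OFFSET] & HDR_TYPE_LAYOUT_MASK) == HDR_TYPE_BRIDGE;
}

/* Hypothetical helper: bus number behind a bridge (e.g. a Root Port). */
static uint8_t cfg_secondary_bus(const uint8_t *cfg)
{
    assert(cfg_is_type1(cfg));
    return cfg[SECONDARY_BUS_OFFSET];
}
```

Code that only understands Type0 headers never looks at the secondary bus
number, which is one reason a device placed behind a virtual Root Port would
be invisible to it.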

So, adding support for multiple PCI buses to Xen will require some effort
and discussion of the actual design of the feature.  Nevertheless, this
task is crucial for the PCI/GPU passthrough features of Xen to work
properly.

To summarize, we need to implement the following things in the future:
1) Get rid of the PCI bus 0 limitation everywhere. This could have been
  the simplest of the subtasks, but in reality it will require changing
  interfaces as well - AFAIR even adding a PCI device via QMP only allows
  specifying a device slot, while we need some way to place the device on
  an arbitrary bus.

2) A fully or partially emulated PCI-PCI bridge which provides
  a secondary bus for PCIe device placement - it might be possible
  to reuse some existing emulation QEMU provides. This also includes
  support for Type1 devices.
  The task becomes more complicated if the need arises, for example,
  to control the PCIe link for a passed-through PCIe device. As PT
  device reset is mandatory in most cases, we may encounter a situation
  where we need to retrain the PCIe link to restore the PCIe link speed
  after the reset. In that case we will need to selectively translate
  accesses to certain registers of the emulated PCIe
  Switch/Root Port to the corresponding

[PATCH v1 11/23] xen/pt: handle PCIe Extended Capabilities Next register

2023-06-20 Thread Joel Upham

The patch adds a new xen_pt_ext_cap_ptr_reg_init function which is used
to initialize the emulated Next pointer of a PCIe extended capability.

The primary purpose of this function is to selectively hide some extended
capabilities from the capability linked list, skipping them by altering
the Next capability pointer value.
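
As a rough illustration of the pointer rewrite (not the patch's own code):
the emulated 16-bit register holds the capability version in bits 3:0 and
the next-capability offset in bits 15:4, so hiding a capability amounts to
replacing the offset while keeping the version. The helper names below are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of a PCIe extended capability header:
 * bits 3:0 = capability version, bits 15:4 = next capability offset. */
#define EXT_CAP_NEXT_SHIFT 4
#define EXT_CAP_VER_MASK   0xF

/* Hypothetical helper: extract the next-capability offset. */
static uint16_t ext_cap_next(uint16_t reg)
{
    return reg >> EXT_CAP_NEXT_SHIFT;
}

/* Hypothetical helper: keep the version field, point Next at new_next. */
static uint16_t ext_cap_set_next(uint16_t reg, uint16_t new_next)
{
    uint16_t version = reg & EXT_CAP_VER_MASK;

    assert(new_next <= 0xFFF);  /* the offset field is 12 bits wide */
    return (uint16_t)((new_next << EXT_CAP_NEXT_SHIFT) | version);
}
```

Skipping a hidden capability at 0x150 whose own Next pointer is 0x200 then
means rewriting the predecessor's register from 0x1501 to 0x2001.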

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 87 +++--
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 34ed9c25c5..ed36edbc4a 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -27,7 +27,10 @@
 
 static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
uint32_t real_offset, uint32_t *data);
-
+static int xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
+   XenPTRegInfo *reg,
+   uint32_t real_offset,
+   uint32_t *data);
 
 /* helper */
 
@@ -1928,48 +1931,68 @@ out:
 return 0;
 }
 
+#define PCIE_EXT_CAP_NEXT_SHIFT 4
+#define PCIE_EXT_CAP_VER_MASK   0xF
 
-/*
- * Main
- */
-
-static uint8_t find_cap_offset(XenPCIPassthroughState *s, uint8_t cap)
+static int xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
+   XenPTRegInfo *reg,
+   uint32_t real_offset,
+   uint32_t *data)
 {
-uint8_t id;
-unsigned max_cap = XEN_PCI_CAP_MAX;
-uint8_t pos = PCI_CAPABILITY_LIST;
-uint8_t status = 0;
+int i, rc;
+XenHostPCIDevice *d = &s->real_device;
+uint16_t reg_field;
+uint16_t cur_offset, version, cap_id;
+uint32_t header;
 
-if (xen_host_pci_get_byte(&s->real_device, PCI_STATUS, &status)) {
-return 0;
-}
-if ((status & PCI_STATUS_CAP_LIST) == 0) {
-return 0;
+if (real_offset < 0x0010) {
+XEN_PT_ERR(&s->dev, "Incorrect PCIe extended capability offset "
+   "encountered: 0x%04x\n", real_offset);
+return -EINVAL;
 }
 
-while (max_cap--) {
-if (xen_host_pci_get_byte(&s->real_device, pos, &pos)) {
-break;
-}
-if (pos < PCI_CONFIG_HEADER_SIZE) {
-break;
-}
+rc = xen_host_pci_get_word(d, real_offset, &reg_field);
+if (rc)
+return rc;
 
-pos &= ~3;
-if (xen_host_pci_get_byte(&s->real_device,
-  pos + PCI_CAP_LIST_ID, &id)) {
-break;
-}
+/* preserve version field */
+version= reg_field & PCIE_EXT_CAP_VER_MASK;
+cur_offset = reg_field >> PCIE_EXT_CAP_NEXT_SHIFT;
 
-if (id == 0xff) {
-break;
+while (cur_offset && cur_offset != 0xFFF) {
+rc = xen_host_pci_get_long(d, cur_offset, &header);
+if (rc) {
+XEN_PT_ERR(&s->dev, "Failed to read PCIe extended capability "
+   "@0x%x (rc:%d)\n", cur_offset, rc);
+return rc;
 }
-if (id == cap) {
-return pos;
+
+cap_id = PCI_EXT_CAP_ID(header);
+
+for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+uint32_t cur_grp_id = xen_pt_emu_reg_grps[i].grp_id;
+
+if (!IS_PCIE_EXT_CAP_ID(cur_grp_id))
+continue;
+
+if (xen_pt_hide_dev_cap(d, cur_grp_id))
+continue;
+
+if (GET_PCIE_EXT_CAP_ID(cur_grp_id) == cap_id) {
+if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU)
+goto out;
+
+/* skip TYPE_HARDWIRED capability, move the ptr to next one */
+break;
+}
 }
 
-pos += PCI_CAP_LIST_NEXT;
+/* next capability */
+cur_offset = PCI_EXT_CAP_NEXT(header);
 }
+
+out:
+*data = (cur_offset << PCIE_EXT_CAP_NEXT_SHIFT) | version;
 return 0;
 }
 
-- 
2.34.1

[PATCH v1 16/23] xen/pt: add descriptors and size calculation for RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities

2023-06-20 Thread Joel Upham

Add a few more PCIe Extended Capabilities entries to the
xen_pt_emu_reg_grps[] array along with their corresponding *_size_init()
functions.

All these capabilities have a non-fixed size, but their size calculation
is very simple, hence they are added in a single batch.

For every capability register group, only 2 registers are emulated
currently: Capability ID (16 bit) and Next Capability Offset/Version (16
bit). Both are needed to implement the selective capability hiding. All
other registers are passed through at the moment (unless they belong to
a capability marked as "hardwired", which is hidden).
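
To give a feel for how simple these calculations are, here is a standalone
sketch of the RCLD and ACS cases that mirrors the arithmetic in the diff
below. It works on register values passed in by the caller instead of
reading the device; the function names and the two ACS constants are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define ACS_CAP_EGRESS_CTRL 0x20  /* assumed: Egress Control supported bit */
#define ACS_EGRESS_CTL_V    0x08  /* assumed: Egress Control Vector offset */

/* Round a bit count up to whole bytes. */
static uint32_t bits_to_bytes(uint32_t bits)
{
    assert(bits <= 256);
    return (bits + 7) / 8;
}

/* RCLD: 0x10 bytes of fixed registers plus 0x10 bytes per link entry;
 * the entry count is bits 15:8 of the Element Self Description. */
static uint32_t rcld_size(uint32_t elem_self_descr)
{
    uint32_t num_entries = (elem_self_descr >> 8) & 0xFF;

    return 0x10 + num_entries * 0x10;
}

/* ACS: fixed part, plus the egress vector when Egress Control is present.
 * A vector size field of 0 encodes 256 bits. */
static uint32_t acs_size(uint16_t acs_caps)
{
    uint32_t bits = (acs_caps >> 8) & 0xFF;

    if (!(acs_caps & ACS_CAP_EGRESS_CTRL)) {
        return ACS_EGRESS_CTL_V;
    }
    if (bits == 0) {
        bits = 256;
    }
    return ACS_EGRESS_CTL_V + bits_to_bytes(bits);
}
```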

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 224 
 1 file changed, 224 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 9fd0531bc4..1fba0b9d6c 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1925,6 +1925,174 @@ static int 
xen_pt_ext_cap_aer_size_init(XenPCIPassthroughState *s,
 return ret;
 }
 
+/* get Root Complex Link Declaration Extended Capability register group size */
+#define RCLD_GET_NUM_ENTRIES(x) (((x) >> 8) & 0xFF)
+static int xen_pt_ext_cap_rcld_size_init(XenPCIPassthroughState *s,
+ const XenPTRegGroupInfo *grp_reg,
+ uint32_t base_offset,
+ uint32_t *size)
+{
+uint32_t elem_self_descr = 0;
+
+int ret = xen_host_pci_get_long(&s->real_device,
+base_offset + 4,
+&elem_self_descr);
+
+*size = 0x10 + RCLD_GET_NUM_ENTRIES(elem_self_descr) * 0x10;
+
+log_pcie_extended_cap(s, "Root Complex Link Declaration",
+  base_offset, *size);
+return ret;
+}
+
+/* get Access Control Services Extended Capability register group size */
+#define ACS_VECTOR_SIZE_BITS(x)    ((((x) >> 8) & 0xFF) ?: 256)
+static int xen_pt_ext_cap_acs_size_init(XenPCIPassthroughState *s,
+const XenPTRegGroupInfo *grp_reg,
+uint32_t base_offset,
+uint32_t *size)
+{
+uint16_t acs_caps = 0;
+
+int ret = xen_host_pci_get_word(&s->real_device,
+base_offset + PCI_ACS_CAP,
+&acs_caps);
+
+if (acs_caps & PCI_ACS_EC) {
+uint32_t vector_sz = ACS_VECTOR_SIZE_BITS(acs_caps);
+
+*size = PCI_ACS_EGRESS_CTL_V + ((vector_sz + 7) & ~7) / 8;
+} else {
+*size = PCI_ACS_EGRESS_CTL_V;
+}
+
+log_pcie_extended_cap(s, "ACS", base_offset, *size);
+return ret;
+}
+
+/* get Multicast Extended Capability register group size */
+static int xen_pt_ext_cap_multicast_size_init(XenPCIPassthroughState *s,
+  const XenPTRegGroupInfo *grp_reg,
+  uint32_t base_offset,
+  uint32_t *size)
+{
+uint8_t dev_type = get_pcie_device_type(s);
+
+switch (dev_type) {
+case PCI_EXP_TYPE_ENDPOINT:
+case PCI_EXP_TYPE_LEG_END:
+case PCI_EXP_TYPE_RC_END:
+case PCI_EXP_TYPE_RC_EC:
+default:
+*size = PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF;
+break;
+
+case PCI_EXP_TYPE_ROOT_PORT:
+case PCI_EXP_TYPE_UPSTREAM:
+case PCI_EXP_TYPE_DOWNSTREAM:
+*size = 0x30;
+break;
+}
+
+log_pcie_extended_cap(s, "Multicast", base_offset, *size);
+return 0;
+}
+
+/* get Dynamic Power Allocation Extended Capability register group size */
+static int xen_pt_ext_cap_dpa_size_init(XenPCIPassthroughState *s,
+const XenPTRegGroupInfo *grp_reg,
+uint32_t base_offset,
+uint32_t *size)
+{
+uint32_t dpa_caps = 0;
+uint32_t num_entries;
+
+int ret = xen_host_pci_get_long(&s->real_device,
+base_offset + PCI_DPA_CAP,
+&dpa_caps);
+
+num_entries = (dpa_caps & PCI_DPA_CAP_SUBSTATE_MASK) + 1;
+
+*size = PCI_DPA_BASE_SIZEOF + num_entries /*byte-size registers*/;
+
+log_pcie_extended_cap(s, "Dynamic Power Allocation", base_offset, *size);
+return ret;
+}
+
+/* get TPH Requester Extended Capability register group size */
+static int xen_pt_ext_cap_tph_size_init(XenPCIPassthroughState *s,
+const XenPTRegGroupInfo *grp_reg,
+uint32_t base_offset,
+uint32_t *size)
+{
+uint32_t tph_caps = 0;
+uint32_t num_entries;
+
+int ret = xen_host_pci_get_long(&s->real_device,
+base_offset + PCI_TPH_CAP,
+&tph_caps);
+
+

[PATCH v1 06/23] xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and PCIe Extended Capabilities enumeration

2023-06-20 Thread Joel Upham

This patch introduces 2 new functions,
- xen_host_pci_find_next_ext_cap (actually a reworked
  xen_host_pci_find_ext_cap_offset function which is unused)
- xen_host_pci_find_next_cap

These functions allow searching for PCI/PCIe capabilities in a uniform
way. Both functions can search either for a specific capability or for
the next one encountered (by specifying CAP_ID_ANY as the capability ID);
the latter is useful when we merely need to traverse the capability list
one by one. In both functions the 'pos' argument allows the search to
continue from the last position (0 means start from the beginning).

In order not to probe PCIe Extended Capabilities existence every time,
xen_host_pci_find_next_ext_cap makes use of the new 'has_pcie_ext_caps'
field in XenHostPCIDevice structure which is filled only once (in
xen_host_pci_device_get).
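
The traversal can be sketched on an in-memory snapshot of config space. This
is only an illustration of the 'pos' and "any capability" semantics described
above, not the code in this patch: the function names, the ANY_EXT_CAP
sentinel (standing in for CAP_ID_ANY, whose value is not shown here) and the
iteration budget are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SPACE_SIZE     0x100        /* extended config space starts here */
#define CFG_SPACE_EXT_SIZE 0x1000       /* total PCIe config space size */
#define ANY_EXT_CAP        0xFFFFFFFFu  /* stand-in for CAP_ID_ANY */
#define MAX_EXT_CAPS       ((CFG_SPACE_EXT_SIZE - CFG_SPACE_SIZE) / 8)

/* Read a little-endian dword from a config space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, int pos)
{
    assert(pos >= 0 && pos + 4 <= CFG_SPACE_EXT_SIZE);
    return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
           ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Store a little-endian dword (used here only to build example data). */
static void cfg_write32(uint8_t *cfg, int pos, uint32_t val)
{
    assert(pos >= 0 && pos + 4 <= CFG_SPACE_EXT_SIZE);
    for (int i = 0; i < 4; i++) {
        cfg[pos + i] = (uint8_t)(val >> (8 * i));
    }
}

/* Next pointer: header bits 31:20, dword aligned. */
static int ext_cap_next(uint32_t header)
{
    return (int)((header >> 20) & 0xFFC);
}

/* Return the offset of the next matching capability, or 0 if none. */
static int find_next_ext_cap(const uint8_t *cfg, int pos, uint32_t cap)
{
    int budget = MAX_EXT_CAPS;  /* guards against looping lists */

    /* pos == 0 starts at the first capability, else resume after 'pos' */
    pos = pos ? ext_cap_next(cfg_read32(cfg, pos)) : CFG_SPACE_SIZE;

    while (budget-- > 0 && pos >= CFG_SPACE_SIZE) {
        uint32_t header = cfg_read32(cfg, pos);

        if (header == 0 || header == 0xFFFFFFFFu) {
            break;  /* empty list or unreadable config space */
        }
        if (cap == ANY_EXT_CAP || (header & 0xFFFF) == cap) {
            return pos;
        }
        pos = ext_cap_next(header);
    }
    return 0;
}
```

Walking the whole list is then a loop that feeds each returned offset back
in as 'pos' until 0 comes back.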

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen-host-pci-device.c | 91 
 hw/xen/xen-host-pci-device.h |  5 +-
 2 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716..a7021a5d56 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -32,6 +32,7 @@
 
 #define IORESOURCE_PREFETCH 0x00001000  /* No side effects */
 #define IORESOURCE_MEM_64   0x00100000
+#define XEN_HOST_PCI_CAP_MAX    48
 
 static void xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
 const char *name, char *buf, ssize_t size)
@@ -198,6 +199,19 @@ static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
 return !stat(path, &buf);
 }
 
+static bool xen_host_pci_dev_has_pcie_ext_caps(XenHostPCIDevice *d)
+{
+uint32_t header;
+
+if (xen_host_pci_get_long(d, PCI_CONFIG_SPACE_SIZE, &header))
+return false;
+
+if (header == 0 || header == ~0U)
+return false;
+
+return true;
+}
+
 static void xen_host_pci_config_open(XenHostPCIDevice *d, Error **errp)
 {
 char path[PATH_MAX];
@@ -296,37 +310,93 @@ int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, 
uint8_t *buf, int len)
 return xen_host_pci_config_write(d, pos, buf, len);
 }
 
-int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
+int xen_host_pci_find_next_ext_cap(XenHostPCIDevice *d, int pos, uint32_t cap)
 {
 uint32_t header = 0;
 int max_cap = XEN_HOST_PCI_MAX_EXT_CAP;
-int pos = PCI_CONFIG_SPACE_SIZE;
+
+if (!d->has_pcie_ext_caps)
+return 0;
+
+if (!pos) {
+pos = PCI_CONFIG_SPACE_SIZE;
+} else {
+if (xen_host_pci_get_long(d, pos, &header))
+return 0;
+
+pos = PCI_EXT_CAP_NEXT(header);
+}
 
 do {
+if (!pos || pos < PCI_CONFIG_SPACE_SIZE) {
+break;
+}
+
 if (xen_host_pci_get_long(d, pos, &header)) {
 break;
 }
 /*
  * If we have no capabilities, this is indicated by cap ID,
  * cap version and next pointer all being 0.
+* Also check for all F's returned (which means PCIe ext conf space
+* is unreadable for some reason)
  */
-if (header == 0) {
+   if (header == 0 || header == ~0U) {
 break;
 }
 
-if (PCI_EXT_CAP_ID(header) == cap) {
+if (cap == CAP_ID_ANY) {
+return pos;
+} else if (PCI_EXT_CAP_ID(header) == cap) {
 return pos;
 }
 
 pos = PCI_EXT_CAP_NEXT(header);
-if (pos < PCI_CONFIG_SPACE_SIZE) {
+} while (--max_cap);
+
+return 0;
+}
+
+int xen_host_pci_find_next_cap(XenHostPCIDevice *d, int pos, uint32_t cap)
+{
+uint8_t id;
+unsigned max_cap = XEN_HOST_PCI_CAP_MAX;
+uint8_t status = 0;
+uint8_t curpos;
+
+if (xen_host_pci_get_byte(d, PCI_STATUS, &status))
+return 0;
+
+if ((status & PCI_STATUS_CAP_LIST) == 0)
+return 0;
+
+if (pos < PCI_CAPABILITY_LIST) {
+curpos = PCI_CAPABILITY_LIST;
+} else {
+curpos = (uint8_t) pos;
+}
+
+while (max_cap--) {
+if (xen_host_pci_get_byte(d, curpos, &curpos))
+ break;
+if (!curpos)
+ break;
+
+if (cap == CAP_ID_ANY)
+return curpos;
+
+if (xen_host_pci_get_byte(d, curpos + PCI_CAP_LIST_ID, &id))
 break;
-}
 
-max_cap--;
-} while (max_cap > 0);
+if (id == 0xff)
+break;
+else if (id == cap)
+return curpos;
+
+curpos += PCI_CAP_LIST_NEXT;
+}
 
-return -1;
+return 0;
 }
 
 void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
@@ -376,7 +446,8 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 }
 d->class_code = v;
 
-d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+d->has_pcie_ext_caps = xen_host_pci_dev_has_pcie_ext_caps(d);
 
 return;
 
diff --git

[PATCH v1 22/23] qdev-monitor/pt: bypass root device check

2023-06-20 Thread Joel Upham

On Xen we need to be able to have hotpluggable root devices,
even on Q35 at the moment. This check disables passthrough (PT) of
devices, so let's turn it off for now.

Signed-off-by: Joel Upham 
---
 softmmu/qdev-monitor.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index b8d2c4dadd..f57dfa1964 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -43,6 +43,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/clock.h"
 #include "hw/boards.h"
+#include "sysemu/xen.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -663,7 +664,8 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 return NULL;
 }
 
-if (phase_check(PHASE_MACHINE_READY) && bus && !qbus_is_hotpluggable(bus)) {
+if (phase_check(PHASE_MACHINE_READY) && bus && !qbus_is_hotpluggable(bus)
+&& !xen_enabled()) {
 error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);
 return NULL;
 }
-- 
2.34.1

[PATCH v1 05/23] q35: Fix incorrect values for PCIEXBAR masks

2023-06-20 Thread Joel Upham

There are two small issues in PCIEXBAR address mask handling:
- wrong bit positions for address mask bits (see PCIEXBAR description
  in Q35 datasheet)
- incorrect usage of 64ADR_MASK

Due to this, attempting to write a valid PCIEXBAR address may cause it to
shift to another address, causing memory layout corruption where emulated
MMIO regions may overlap real (passed through) MMIO ranges. Fix this
by providing correct values.

I included the xen_enabled() check as I did not want to impact current
use cases that are not xen related (if they are not seeing a problem).

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/pci-host/q35.c | 16 +---
 include/hw/pci-host/q35.h |  4 ++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index fe5fc0f47c..1fe4e5a5c9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -37,6 +37,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "qemu/module.h"
+#include "sysemu/xen.h"
 
 /****************************************************************************
  * Q35 host
@@ -324,12 +325,21 @@ static void mch_update_pciexbar(MCHPCIState *mch)
 break;
 case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_128M:
 length = 128 * 1024 * 1024;
-addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK |
-MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+   if (!xen_enabled()) {
+addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK |
+MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+   } else {
+addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK;
+}
 break;
 case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_64M:
 length = 64 * 1024 * 1024;
-addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+   if (!xen_enabled()) {
+addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+   } else {
+addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK |
+MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK;
+}
 break;
 case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_RVD:
 qemu_log_mask(LOG_GUEST_ERROR, "Q35: Reserved PCIEXBAR LENGTH\n");
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index e89329c51e..441cce6ccd 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -105,8 +105,8 @@ struct Q35PCIHost {
 #define MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT   0xb0000000
 #define MCH_HOST_BRIDGE_PCIEXBAR_MAX   (0x10000000) /* 256M */
 #define MCH_HOST_BRIDGE_PCIEXBAR_ADMSK Q35_MASK(64, 35, 28)
-#define MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK  ((uint64_t)(1 << 26))
-#define MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK   ((uint64_t)(1 << 25))
+#define MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK  ((uint64_t)(1 << 27))
+#define MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK   ((uint64_t)(1 << 26))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_MASK   ((uint64_t)(0x3 << 1))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_256M   ((uint64_t)(0x0 << 1))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_128M   ((uint64_t)(0x1 << 1))
-- 
2.34.1

[PATCH v1 07/23] xen/pt: avoid reading PCIe device type and cap version multiple times

2023-06-20 Thread Joel Upham

xen_pt_config_init.c reads Device/Port Type and Capability version fields
in many places. Two functions are used for this purpose:
get_capability_version and get_device_type. These functions read PCI
conf space every time they're called. Another drawback is that these
functions know nothing about where the PCI Express Capability is located,
so its offset must be provided explicitly in function arguments. Their
typical usage is like this:
    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
    uint8_t dev_type = get_device_type(s, real_offset - reg->offset);

To avoid this, the PCI Express Capability register is now read only
once and stored in the XenHostPCIDevice structure (pcie_flags field). The
capability offset parameter is no longer needed, which simplifies the
functions' usage. Also, get_device_type and get_capability_version were
renamed to the more descriptive get_pcie_device_type and
get_pcie_capability_version.
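
The caching idea can be sketched as follows: the 16-bit PCI Express
Capabilities register is stored once, and both fields are decoded from the
stored copy. The struct, the helper names and the 0xFFFF "not present"
sentinel are assumptions made for this example; the masks follow the usual
layout (version in bits 3:0, device/port type in bits 7:4).

```c
#include <assert.h>
#include <stdint.h>

#define EXP_FLAGS_VERS   0x000F  /* capability version, bits 3:0 */
#define EXP_FLAGS_TYPE   0x00F0  /* device/port type, bits 7:4 */
#define EXP_FLAGS_ABSENT 0xFFFF  /* assumed sentinel: no PCIe capability */

/* Hypothetical stand-in for the device structure holding the cached word. */
struct host_dev {
    uint16_t pcie_flags;
};

static uint8_t pcie_cap_version(const struct host_dev *d)
{
    assert(d->pcie_flags != EXP_FLAGS_ABSENT);
    return (uint8_t)(d->pcie_flags & EXP_FLAGS_VERS);
}

static uint8_t pcie_device_type(const struct host_dev *d)
{
    assert(d->pcie_flags != EXP_FLAGS_ABSENT);
    return (uint8_t)((d->pcie_flags & EXP_FLAGS_TYPE) >> 4);
}
```

Neither accessor needs a capability offset any more, and neither touches the
device after the initial read.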

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen-host-pci-device.c | 15 +++
 hw/xen/xen-host-pci-device.h |  1 +
 hw/xen/xen_pt_config_init.c  | 34 ++
 3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index a7021a5d56..63481a859e 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -405,6 +405,7 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 {
 ERRP_GUARD();
 unsigned int v;
+int pcie_cap_pos;
 
 d->config_fd = -1;
 d->domain = domain;
@@ -449,6 +450,20 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
 d->has_pcie_ext_caps = xen_host_pci_dev_has_pcie_ext_caps(d);
 
+/* read and store PCIe Capabilities field for later use */
+pcie_cap_pos = xen_host_pci_find_next_cap(d, 0, PCI_CAP_ID_EXP);
+
+if (pcie_cap_pos) {
+if (xen_host_pci_get_word(d, pcie_cap_pos + PCI_EXP_FLAGS,
+  &d->pcie_flags)) {
+error_setg(errp, "Unable to read from PCI Express capability "
+   "structure at 0x%x", pcie_cap_pos);
+goto error;
+}
+} else {
+d->pcie_flags = 0xFFFF;
+}
+
 return;
 
 error:
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 37c5614a24..2884c4b4b9 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
 uint16_t device_id;
 uint32_t class_code;
 int irq;
+uint16_t pcie_flags;
 
 XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
 XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 2b8680b112..47c8482f32 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -832,24 +832,18 @@ static XenPTRegInfo xen_pt_emu_reg_vendor[] = {
  * PCI Express Capability
  */
 
-static inline uint8_t get_capability_version(XenPCIPassthroughState *s,
- uint32_t offset)
+static inline uint8_t get_pcie_capability_version(XenPCIPassthroughState *s)
 {
-uint8_t flag;
-if (xen_host_pci_get_byte(&s->real_device, offset + PCI_EXP_FLAGS, &flag)) {
-return 0;
-}
-return flag & PCI_EXP_FLAGS_VERS;
+assert(s->real_device.pcie_flags != 0xFFFF);
+
+return (uint8_t) (s->real_device.pcie_flags & PCI_EXP_FLAGS_VERS);
 }
 
-static inline uint8_t get_device_type(XenPCIPassthroughState *s,
-  uint32_t offset)
+static inline uint8_t get_pcie_device_type(XenPCIPassthroughState *s)
 {
-uint8_t flag;
-if (xen_host_pci_get_byte(&s->real_device, offset + PCI_EXP_FLAGS, &flag)) {
-return 0;
-}
-return (flag & PCI_EXP_FLAGS_TYPE) >> 4;
+assert(s->real_device.pcie_flags != 0xFFFF);
+
+return (uint8_t) ((s->real_device.pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4);
 }
 
 /* initialize Link Control register */
@@ -857,8 +851,8 @@ static int xen_pt_linkctrl_reg_init(XenPCIPassthroughState 
*s,
 XenPTRegInfo *reg, uint32_t real_offset,
 uint32_t *data)
 {
-uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
-uint8_t dev_type = get_device_type(s, real_offset - reg->offset);
+uint8_t cap_ver  = get_pcie_capability_version(s);
+uint8_t dev_type = get_pcie_device_type(s);
 
 /* no need to initialize in case of Root Complex Integrated Endpoint
  * with cap_ver 1.x
@@ -875,7 +869,7 @@ static int xen_pt_devctrl2_reg_init(XenPCIPassthroughState 
*s,
 XenPTRegInfo *reg, uint32_t real_offset,
 uint32_t *data)
 {
-uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+uint8_t cap_ver =

[PATCH v1 17/23] xen/pt: add Resizable BAR PCIe Extended Capability descriptor and sizing

2023-06-20 Thread Joel Upham

Unlike other PCIe Extended Capabilities, we currently cannot allow attempts
to use the Resizable BAR Capability. Without specific handling of BAR
resizing we are likely to end up with a corrupted MMIO hole layout if the
guest OS attempts to use this feature. In fact, recent Windows versions
have started to understand and use the Resizable BAR Capability (see [1]).

For now, we need to hide the Resizable BAR Capability from the guest OS
until BAR resizing emulation support is implemented in Xen. This support
is pretty much a mandatory to-do item, as the effect of writing
to Resizable BAR control registers can be considered similar
to reprogramming normal BAR registers -- i.e. this needs to be handled
explicitly, resulting in remapping of the corresponding MMIO BAR range(s).
Until then, we mark the Resizable BAR Capability as
XEN_PT_GRP_TYPE_HARDWIRED.
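
Even a hidden capability needs a size so that its whole register range can be
covered. A standalone sketch of the sizing arithmetic used in the diff below
(4 bytes of header plus 8 bytes per resizable BAR), assuming the BAR count
sits in bits 7:5 of the first control register; the function name and the
rejection of the reserved counts 0 and 7 are additions for this example and
are not in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REBAR_CTRL_NBAR_MASK  0x000000E0  /* assumed: BAR count field */
#define REBAR_CTRL_NBAR_SHIFT 5

/* Returns 0 and stores the capability size, or -1 for a reserved count. */
static int rebar_cap_size(uint32_t rebar_ctrl, uint32_t *size)
{
    uint32_t nbars =
        (rebar_ctrl & REBAR_CTRL_NBAR_MASK) >> REBAR_CTRL_NBAR_SHIFT;

    assert(size != NULL);
    if (nbars < 1 || nbars > 6) {
        return -1;  /* a device has at most six BARs */
    }

    /* one capability/control register pair (8 bytes) per BAR */
    *size = nbars * 8 + 4;
    return 0;
}
```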

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 1fba0b9d6c..c5157ee3ee 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -2093,6 +2093,27 @@ static int 
xen_pt_ext_cap_pmux_size_init(XenPCIPassthroughState *s,
 return ret;
 }
 
+/* get Resizable BAR Extended Capability register group size */
+static int xen_pt_ext_cap_rebar_size_init(XenPCIPassthroughState *s,
+  const XenPTRegGroupInfo *grp_reg,
+  uint32_t base_offset,
+  uint32_t *size)
+{
+uint32_t rebar_ctl = 0;
+uint32_t num_entries;
+
+int ret = xen_host_pci_get_long(&s->real_device,
+base_offset + PCI_REBAR_CTRL,
+&rebar_ctl);
+num_entries =
+(rebar_ctl & PCI_REBAR_CTRL_NBAR_MASK) >> PCI_REBAR_CTRL_NBAR_SHIFT;
+
+*size = num_entries*8 + 4;
+
+log_pcie_extended_cap(s, "Resizable BAR", base_offset, *size);
+return ret;
+}
+
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 /* Header Type0 reg group */
 {
@@ -2424,6 +2445,13 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 .size_init  = xen_pt_ext_cap_dpc_size_init,
 .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
 },
+/* Resizable BAR Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_REBAR),
+.grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+.grp_size   = 0xFF,
+.size_init  = xen_pt_ext_cap_rebar_size_init,
+},
 {
 .grp_size = 0,
 },
-- 
2.34.1

[PATCH v1 18/23] xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and sizing

2023-06-20 Thread Joel Upham

Virtual Channel/MFVC capabilities are relatively useless for emulation
(passing through accesses to them should be enough in most cases), yet they
have the hardest format of all PCIe Extended Capabilities, mostly because
the VC capability format allows a sparse config space layout with gaps
between the parts which make up the VC capability.

We have the main capability body followed by a variable number of entries,
where each entry may additionally reference an arbitration table outside
the main capability body. There are no constraints on these arbitration
table offsets -- in theory, they may reside outside the VC capability range
anywhere in PCIe extended config space. Also, the size of each arbitration
table is not fixed - it depends on the current VC/Port Arbitration Select
field value.

To simplify things, this patch assumes that changing the VC/Port
Arbitration Select value (i.e. resizing arbitration tables) does not cause
arbitration table offsets to change. Normally the device must place
arbitration tables considering their maximum size, not the current one.
The maximum arbitration table size depends on the VC/Port Arbitration
Capability bitmask -- this is what is actually used to calculate the
arbitration table size.
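
A standalone sketch of that last step, mirroring the switch in
get_arb_table_len_max() below: the highest bit set in the 8-bit capability
mask selects the largest supported arbitration scheme. Reading the result as
a number of table entries, and the byte-size helper that multiplies it by an
entry width, are assumptions of this example; how the patch combines the two
is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Longest arbitration table implied by an 8-bit capability bitmask. */
static uint32_t arb_table_len_max(uint32_t arb_cap)
{
    int n_bit = 7;

    arb_cap &= 0xFF;
    if (!arb_cap) {
        return 0;
    }

    /* find the highest set bit; arb_cap != 0 guarantees termination */
    while (!(arb_cap & (1u << n_bit))) {
        n_bit--;
    }

    switch (n_bit) {
    case 0:
        return 0;    /* hardware-fixed arbitration: no table */
    case 1:
        return 32;
    case 2:
        return 64;
    case 3:
    case 4:
        return 128;
    default:
        return 8u << n_bit;
    }
}

/* Hypothetical helper: table size in bytes for a given entry width. */
static uint32_t arb_table_bytes(uint32_t entries, uint32_t entry_bits)
{
    assert(entry_bits == 1 || entry_bits == 2 ||
           entry_bits == 4 || entry_bits == 8);
    return entries * entry_bits / 8;
}
```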

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 191 
 1 file changed, 191 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index c5157ee3ee..4e14adf2b2 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -2114,6 +2114,173 @@ static int 
xen_pt_ext_cap_rebar_size_init(XenPCIPassthroughState *s,
 return ret;
 }
 
+/* get VC/VC9/MFVC Extended Capability register group size */
+static uint32_t get_arb_table_len_max(XenPCIPassthroughState *s,
+  uint32_t max_bit_supported,
+  uint32_t arb_cap)
+{
+int n_bit;
+uint32_t table_max_size = 0;
+
+if (!arb_cap) {
+return 0;
+}
+
+for (n_bit = 7; n_bit >= 0 && !(arb_cap & (1 << n_bit)); n_bit--);
+
+if (n_bit > max_bit_supported) {
+XEN_PT_ERR(>dev, "Warning: encountered unknown VC arbitration "
+   "capability supported: 0x%02x\n", (uint8_t) arb_cap);
+}
+
+switch (n_bit) {
+case 0: break;
+case 1: return 32;
+case 2: return 64;
+case 3: /*128 too*/
+case 4: return 128;
+default:
+table_max_size = 8 << n_bit;
+}
+
+return table_max_size;
+}
+
+#define GET_ARB_TABLE_OFFSET(x)   (((x) >> 24) * 0x10)
+#define GET_VC_ARB_CAPABILITY(x)  ((x) & 0xFF)
+#define ARB_TABLE_ENTRY_SIZE_BITS(x)  (1 << (((x) & PCI_VC_CAP1_ARB_SIZE)\
+  >> 10))
+static int xen_pt_ext_cap_vchan_size_init(XenPCIPassthroughState *s,
+  const XenPTRegGroupInfo *grp_reg,
+  uint32_t base_offset,
+  uint32_t *size)
+{
+uint32_t header;
+uint32_t vc_cap_max_size = PCIE_CONFIG_SPACE_SIZE - base_offset;
+uint32_t next_ptr;
+uint32_t arb_table_start_max = 0, arb_table_end_max = 0;
+uint32_t port_vc_cap1, port_vc_cap2, vc_rsrc_cap;
+uint32_t ext_vc_count = 0;
+uint32_t arb_table_entry_size;  /* in bits */
+const char *cap_name;
+int ret;
+int i;
+
+ret = xen_host_pci_get_long(>real_device, base_offset, );
+if (ret) {
+goto err_read;
+}
+
+next_ptr = PCI_EXT_CAP_NEXT(header);
+
+switch (PCI_EXT_CAP_ID(header)) {
+case PCI_EXT_CAP_ID_VC:
+case PCI_EXT_CAP_ID_VC9:
+cap_name = "Virtual Channel";
+break;
+case PCI_EXT_CAP_ID_MFVC:
+cap_name = "Multi-Function VC";
+break;
+default:
+XEN_PT_ERR(>dev, "Unknown VC Extended Capability ID "
+   "encountered: 0x%04x\n", PCI_EXT_CAP_ID(header));
+return -1;
+}
+
+if (next_ptr && next_ptr > base_offset) {
+vc_cap_max_size = next_ptr - base_offset;
+}
+
+ret = xen_host_pci_get_long(>real_device,
+base_offset + PCI_VC_PORT_CAP1,
+_vc_cap1);
+if (ret) {
+goto err_read;
+}
+
+ret = xen_host_pci_get_long(>real_device,
+base_offset + PCI_VC_PORT_CAP2,
+_vc_cap2);
+if (ret) {
+goto err_read;
+}
+
+ext_vc_count = port_vc_cap1 & PCI_VC_CAP1_EVCC;
+
+arb_table_start_max = GET_ARB_TABLE_OFFSET(port_vc_cap2);
+
+/* check arbitration table offset for validity */
+if (arb_table_start_max >= vc_cap_max_size) {
+XEN_PT_ERR(>dev, "Warning: VC arbitration table offset points "
+   "outside the expected range: %#04x\n",
+   (uint16_t) arb_table_start_max);
+/* skip this arbitration table */
+arb_table_start_max = 0;

[PATCH v1 04/23] q35/xen: Add Xen platform device support for Q35

2023-06-20 Thread Joel Upham

The current Xen/QEMU method to control the Xen Platform device on i440 is a
bit odd -- enabling/disabling the Xen Platform device actually modifies the
QEMU emulated machine type, namely xenfv <--> pc.

In order to avoid multiplying machine types, use a new way to control the Xen
Platform device for QEMU -- the "xen-platform-dev" machine property (bool).
To maintain backward compatibility with existing Xen/QEMU setups, this
is only applicable to the q35 machine currently. i440 emulation still uses the
old method (i.e. xenfv/pc machine selection) to control the Xen Platform
device; this may be changed later to the xen-platform-dev property as well.

This way we can use a single machine type (q35) and change just
xen-platform-dev value to on/off to control Xen platform device.

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/core/machine.c   | 19 +++
 hw/i386/pc_q35.c| 20 +++-
 include/hw/boards.h |  1 +
 qemu-options.hx |  1 +
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1000406211..703138d2ec 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -455,6 +455,20 @@ static void machine_set_graphics(Object *obj, bool value, 
Error **errp)
 ms->enable_graphics = value;
 }
 
+static bool machine_get_xen_platform_dev(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->xen_platform_dev;
+}
+
+static void machine_set_xen_platform_dev(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->xen_platform_dev = value;
+}
+
 static char *machine_get_firmware(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -1004,6 +1018,11 @@ static void machine_class_init(ObjectClass *oc, void 
*data)
 object_class_property_set_description(oc, "graphics",
 "Set on/off to enable/disable graphics emulation");
 
+object_class_property_add_bool(oc, "xen-platform-dev",
+machine_get_xen_platform_dev, machine_set_xen_platform_dev);
+object_class_property_set_description(oc, "xen-platform-dev",
+"Set on/off to enable/disable Xen Platform device");
+
 object_class_property_add_str(oc, "firmware",
 machine_get_firmware, machine_set_firmware);
 object_class_property_set_description(oc, "firmware",
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6155427e48..789a23ce6b 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -57,10 +57,24 @@
 #include "hw/hyperv/vmbus-bridge.h"
 #include "hw/mem/nvdimm.h"
 #include "hw/i386/acpi-build.h"
+#include "hw/xen/xen-x86.h"
+#include "sysemu/xen.h"
 
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS 6
 
+static void q35_xen_hvm_init(MachineState *machine)
+{
+PCMachineState *pcms = PC_MACHINE(machine);
+
+if (xen_enabled()) {
+/* check if Xen Platform device is enabled */
+if (machine->xen_platform_dev) {
+pci_create_simple(pcms->bus, -1, "xen-platform");
+}
+}
+}
+
 struct ehci_companions {
 const char *name;
 int func;
@@ -273,8 +287,12 @@ static void pc_q35_init(MachineState *machine)
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
 qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
 }
-isa_bus = ISA_BUS(qdev_get_child_bus(lpc_dev, "isa.0"));
 
+if (xen_enabled()) {
+q35_xen_hvm_init(machine);
+}
+
+isa_bus = ISA_BUS(qdev_get_child_bus(lpc_dev, "isa.0"));
 if (x86ms->pic == ON_OFF_AUTO_ON || x86ms->pic == ON_OFF_AUTO_AUTO) {
 pc_i8259_create(isa_bus, gsi_state->i8259_irq);
 }
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a385010909..0b021f0764 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -339,6 +339,7 @@ struct MachineState {
 bool mem_merge;
 bool usb;
 bool usb_disabled;
+bool xen_platform_dev;
 char *firmware;
 bool iommu;
 bool suppress_vmdesc;
diff --git a/qemu-options.hx b/qemu-options.hx
index b37eb9662b..ea018257da 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "vmport=on|off|auto controls emulation of vmport (default: 
auto)\n"
 "dump-guest-core=on|off include guest memory in a core 
dump (default=on)\n"
 "mem-merge=on|off controls memory merge support (default: 
on)\n"
+"xen-platform-dev=on|off controls Xen Platform device 
(default=off)\n"
 "aes-key-wrap=on|off controls support for AES key wrapping 
(default=on)\n"
 "dea-key-wrap=on|off controls support for DEA key wrapping 
(default=on)\n"
 "suppress-vmdesc=on|off disables self-describing migration 
(default=off)\n"
-- 
2.34.1

[PATCH v1 08/23] xen/pt: determine the legacy/PCIe mode for a passed through device

2023-06-20 Thread Joel Upham

Even if we have some real PCIe device being passed through to a guest,
there are situations when we cannot use its PCIe features, primarily
access to the extended (>256 bytes) config space.

Basically, we can allow reading PCIe extended config space only if both
the device and emulated system are PCIe-capable. So it's a combination
of checks:
- PCI Express capability presence
- pci_is_express(device)
- pci_bus_is_express(device bus)

The AND-product of these checks is stored to pcie_enabled_dev flag
in XenPCIPassthroughState for later use in functions like
xen_pt_pci_config_access_check.
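
As a minimal sketch of that AND-product, assuming the three checks have
already been evaluated (the struct and function names here are hypothetical
and stand in for the QEMU calls listed above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three facts the commit message lists. */
struct pt_dev_facts {
    bool host_has_pcie_cap;    /* PCI Express capability present on host device */
    bool emulated_dev_is_pcie; /* what pci_is_express(device) would report */
    bool emulated_bus_is_pcie; /* what pci_bus_is_express(device bus) would report */
};

/* PCIe mode is enabled only if every single check passes. */
static bool pcie_mode_enabled(const struct pt_dev_facts *f)
{
    assert(f != NULL);
    return f->host_has_pcie_cap &&
           f->emulated_dev_is_pcie &&
           f->emulated_bus_is_pcie;
}
```

A PCIe device placed on a legacy bus of an i440 guest fails the third check
and is therefore treated as a conventional PCI device.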

This way we get consistent behavior when the same PCIe device is passed
through to either an i440 domain or a Q35 one.

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt.c | 28 ++--
 hw/xen/xen_pt.h |  1 +
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index a540149639..65c5516ef4 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -701,6 +701,21 @@ static const MemoryListener xen_pt_io_listener = {
 .priority = 10,
 };
 
+static inline bool xen_pt_dev_is_pcie_mode(PCIDevice *d)
+{
+XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
+PCIBus *bus = pci_get_bus(d);
+
+if (bus != NULL) {
+if (pci_is_express(d) && pci_bus_is_express(bus) &&
+xen_host_pci_find_next_cap(>real_device, 0, PCI_CAP_ID_EXP)) {
+return true;
+}
+}
+
+return false;
+}
+
 /* destroy. */
 static void xen_pt_destroy(PCIDevice *d) {
 
@@ -787,8 +802,17 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
s->real_device.dev, s->real_device.func);
 }
 
-/* Initialize virtualized PCI configuration (Extended 256 Bytes) */
-memset(d->config, 0, PCI_CONFIG_SPACE_SIZE);
+s->pcie_enabled_dev = xen_pt_dev_is_pcie_mode(d);
+if (s->pcie_enabled_dev) {
+XEN_PT_LOG(d, "Host device %04x:%02x:%02x.%d passed thru "
+   "in PCIe mode\n", s->real_device.domain,
+s->real_device.bus, s->real_device.dev,
+s->real_device.func);
+}
+
+/* Initialize virtualized PCI configuration space (256/4K bytes) */
+memset(d->config, 0, pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE
+   : PCI_CONFIG_SPACE_SIZE);
 
 s->memory_listener = xen_pt_memory_listener;
 s->io_listener = xen_pt_io_listener;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index b20744f7c7..1c9cd6b615 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -234,6 +234,7 @@ struct XenPCIPassthroughState {
 
 PCIHostDeviceAddress hostaddr;
 bool is_virtfn;
+bool pcie_enabled_dev;
 bool permissive;
 bool permissive_warned;
 XenHostPCIDevice real_device;
-- 
2.34.1

[PATCH v1 13/23] xen/pt: add Vendor-specific PCIe Extended Capability descriptor and sizing

2023-06-20 Thread Joel Upham

The patch provides Vendor-specific PCIe Extended Capability description
structure and corresponding sizing function. In this particular case the
size of the Vendor capability is available in the VSEC Length field.
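
For illustration, the VSEC Length field can be extracted from the
Vendor-Specific Header register as sketched below. The helper names are
hypothetical; the field layout (ID in bits 15:0, revision in bits 19:16,
length in bits 31:20) follows the PCIe definition of that register.

```c
#include <assert.h>
#include <stdint.h>

/* Vendor-Specific Header: ID 15:0, revision 19:16, length 31:20. */
static uint32_t vsec_id(uint32_t hdr)     { return hdr & 0xFFFF; }
static uint32_t vsec_rev(uint32_t hdr)    { return (hdr >> 16) & 0xF; }
static uint32_t vsec_length(uint32_t hdr) { return (hdr >> 20) & 0xFFF; }

/*
 * Hypothetical helper: first config space offset past the capability.
 * The length counts the whole structure, including both header dwords.
 */
static uint32_t vsec_end_offset(uint32_t base_offset, uint32_t hdr)
{
    /* extended capabilities live in 0x100..0xFFF */
    assert(base_offset >= 0x100 && base_offset < 0x1000);
    return base_offset + vsec_length(hdr);
}
```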

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 71 -
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index ed36edbc4a..20b5561d25 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -124,6 +124,17 @@ static uint32_t get_throughable_mask(const 
XenPCIPassthroughState *s,
 return throughable_mask & valid_mask;
 }
 
+static void log_pcie_extended_cap(XenPCIPassthroughState *s,
+  const char *cap_name,
+  uint32_t base_offset, uint32_t size)
+{
+if (size) {
+XEN_PT_LOG(>dev, "Found PCIe Extended Capability: %s at 0x%04x, "
+"size 0x%x bytes\n", cap_name,
+(uint16_t) base_offset, size);
+}
+}
+
 /
  * general register functions
  */
@@ -1622,6 +1633,42 @@ static XenPTRegInfo xen_pt_emu_reg_igd_opregion[] = {
 },
 };
 
+/* Vendor-specific Ext Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_ext_cap_emu_reg_vendor[] = {
+{
+.offset = XEN_PCIE_CAP_ID,
+.size   = 2,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_ext_cap_capid_reg_init,
+.u.w.read   = xen_pt_word_reg_read,
+.u.w.write  = xen_pt_word_reg_write,
+},
+{
+.offset = XEN_PCIE_CAP_LIST_NEXT,
+.size   = 2,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_ext_cap_ptr_reg_init,
+.u.w.read   = xen_pt_word_reg_read,
+.u.w.write  = xen_pt_word_reg_write,
+},
+{
+.offset = PCI_VNDR_HEADER,
+.size   = 4,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_common_reg_init,
+.u.dw.read  = xen_pt_long_reg_read,
+.u.dw.write = xen_pt_long_reg_write,
+},
+{
+.size = 0,
+},
+};
 /
  * Capabilities
  */
@@ -1647,9 +1694,23 @@ static int 
xen_pt_vendor_size_init(XenPCIPassthroughState *s,
 return ret;
 }
 
+static int xen_pt_ext_cap_vendor_size_init(XenPCIPassthroughState *s,
+   const XenPTRegGroupInfo *grp_reg,
+   uint32_t base_offset,
+   uint32_t *size)
 {
-return xen_host_pci_get_byte(>real_device, base_offset + 0x02, size);
+uint32_t vsec_hdr = 0;
+int ret = xen_host_pci_get_long(>real_device,
+base_offset + PCI_VNDR_HEADER,
+_hdr);
+
+*size = PCI_VNDR_HEADER_LEN(vsec_hdr);
+
+log_pcie_extended_cap(s, "Vendor-specific", base_offset, *size);
+
+return ret;
 }
+
 /* get PCI Express Capability Structure register group size */
 static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
  const XenPTRegGroupInfo *grp_reg,
@@ -1876,6 +1937,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 .size_init   = xen_pt_reg_grp_size_init,
 .emu_regs= xen_pt_emu_reg_igd_opregion,
 },
+/* Vendor-specific Extended Capability reg group */
+{
+.grp_id  = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VNDR),
+.grp_type= XEN_PT_GRP_TYPE_EMU,
+.grp_size= 0xFF,
+.size_init   = xen_pt_ext_cap_vendor_size_init,
+.emu_regs= xen_pt_ext_cap_emu_reg_vendor,
+},
 {
 .grp_size = 0,
 },
-- 
2.34.1

[PATCH v1 03/23] q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35

2023-06-20 Thread Joel Upham

This patch allows using ACPI PCI hotplug functionality for Xen on Q35.
All added code depends on xen_enabled(), so there is no functional change
for non-Xen usage.

We need to call the acpi_set_pci_info function from ich9_pm_init as well,
so it was made globally visible again (as it was before).

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/acpi/ich9.c  | 10 ++
 hw/acpi/pcihp.c |  2 +-
 include/hw/acpi/pcihp.h |  2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 25e2c7243e..1c236be1c7 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -39,6 +39,8 @@
 #include "hw/southbridge/ich9.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/xen/xen.h"
+#include "sysemu/xen.h"
 
 //#define DEBUG
 
@@ -67,6 +69,10 @@ static void ich9_gpe_writeb(void *opaque, hwaddr addr, 
uint64_t val,
 ICH9LPCPMRegs *pm = opaque;
 acpi_gpe_ioport_writeb(>acpi_regs, addr, val);
 acpi_update_sci(>acpi_regs, pm->irq);
+
+if (xen_enabled()) {
+acpi_pcihp_reset(>acpi_pci_hotplug);
+}
 }
 
 static const MemoryRegionOps ich9_gpe_ops = {
@@ -332,6 +338,10 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
qemu_irq sci_irq)
 pm->powerdown_notifier.notify = pm_powerdown_req;
 qemu_register_powerdown_notifier(>powerdown_notifier);
 
+if (xen_enabled()) {
+acpi_set_pci_info(true);
+}
+
 legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
 OBJECT(lpc_pci), >gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f4e39d7a9c..5b065d670c 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -99,7 +99,7 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 return info;
 }
 
-static void acpi_set_pci_info(bool has_bridge_hotplug)
+void acpi_set_pci_info(bool has_bridge_hotplug)
 {
 static bool bsel_is_set;
 Object *host = acpi_get_i386_pci_host();
diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index ef59810c17..d35a517c9e 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -72,6 +72,8 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 /* Called on reset */
 void acpi_pcihp_reset(AcpiPciHpState *s);
 
+void acpi_set_pci_info(bool has_bridge_hotplug);
+
 void build_append_pcihp_slots(Aml *parent_scope, PCIBus *bus);
 
 extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
-- 
2.34.1

[PATCH v1 00/23] Q35 support for Xen

2023-06-20 Thread Joel Upham

These are the Qemu changes needed to support the q35 chipset for xen.
I based them on the patches from 2017 found on the mailing list here:
https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg01176.html

I have been using a version of these patches on Xen 4.16 with Qemu
version 4.1 for over 6 months.  The guest VMs are very stable, and PCIe
PT is working as was designed (all of the PCIe devices are on the root
PCIe device).  I have successfully passed through GPUs, NICs, etc. I was
asked by those in the community to attempt to once again upstream the
patches.  I have them working with Seabios and OVMF (patches are needed
to OVMF which I will be sending to the mailing list). The Qemu patches 
allow for the xenvbd to properly unplug the AHCI SATA device, and all 
xen pv windows drivers work as intended.

I relied on the work of the original author of the patches, Alexey
Gerasimenko, to get the majority of this to work.  I fixed the patches to be
in line with the upstream Qemu and Xen versions.  Any original issues may
still exist; however, I am sure in time they can be improved. If the code
doesn't exist upstream then it can't be actively looked at by the community.

I am not an expert on the Q35 chipset or PCIe technology.  This is my
first patch to this mailing list.


Joel Upham (23):
  pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug
  q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
  q35/xen: Add Xen platform device support for Q35
  q35: Fix incorrect values for PCIEXBAR masks
  xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and
PCIe Extended Capabilities enumeration
  xen/pt: avoid reading PCIe device type and cap version multiple times
  xen/pt: determine the legacy/PCIe mode for a passed through device
  xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
check
  xen/pt: add support for PCIe Extended Capabilities and larger config
space
  xen/pt: handle PCIe Extended Capabilities Next register
  xen/pt: allow to hide PCIe Extended Capabilities
  xen/pt: add Vendor-specific PCIe Extended Capability descriptor and
sizing
  xen/pt: add fixed-size PCIe Extended Capabilities descriptors
  xen/pt: add AER PCIe Extended Capability descriptor and sizing
  xen/pt: add descriptors and size calculation for
RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
  xen/pt: add Resizable BAR PCIe Extended Capability descriptor and
sizing
  xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and
sizing
  xen/pt: Fake capability id
  xen platform: unplug ahci object
  pc/q35: setup q35 for xen
  qdev-monitor/pt: bypass root device check
  s3 support: enabling s3 with q35

 hw/acpi/ich9.c|   22 +-
 hw/acpi/pcihp.c   |6 +-
 hw/core/machine.c |   19 +
 hw/i386/pc_piix.c |3 +-
 hw/i386/pc_q35.c  |   39 +-
 hw/i386/xen/xen-hvm.c |7 +-
 hw/i386/xen/xen_platform.c|   19 +-
 hw/isa/lpc_ich9.c |   53 +-
 hw/isa/piix3.c|2 +-
 hw/pci-host/q35.c |   28 +-
 hw/pci/pci.c  |   17 +
 hw/xen/xen-host-pci-device.c  |  106 +++-
 hw/xen/xen-host-pci-device.h  |6 +-
 hw/xen/xen_pt.c   |   49 +-
 hw/xen/xen_pt.h   |   18 +-
 hw/xen/xen_pt_config_init.c   | 1103 ++---
 include/hw/acpi/pcihp.h   |2 +
 include/hw/boards.h   |1 +
 include/hw/i386/pc.h  |3 +
 include/hw/pci-host/q35.h |4 +-
 include/hw/pci/pci.h  |3 +
 include/hw/southbridge/ich9.h |1 +
 include/hw/xen/xen.h  |4 +-
 qemu-options.hx   |1 +
 softmmu/qdev-monitor.c|4 +-
 stubs/xen-hw-stub.c   |4 +-
 26 files changed, 1394 insertions(+), 130 deletions(-)

-- 
2.34.1

[PATCH v1 10/23] xen/pt: add support for PCIe Extended Capabilities and larger config space

2023-06-20 Thread Joel Upham

This patch provides basic facilities for PCIe Extended Capabilities and
support for controlled (via s->pcie_enabled_dev flag) access to PCIe
config space (>256).
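
The controlled access can be pictured with the following sketch. The names
are hypothetical, and the check that an access does not run past the end of
the config space is an addition of this sketch, not something shown in the
hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LEGACY_CFG_SPACE_SIZE 0x100u   /* 256 bytes, conventional PCI */
#define PCIE_CFG_SPACE_SIZE   0x1000u  /* 4 KiB, PCIe extended config space */

/* Returns true if [addr, addr + len) fits the config space the guest may see. */
static bool cfg_access_in_range(bool pcie_enabled_dev,
                                uint32_t addr, uint32_t len)
{
    uint32_t limit = pcie_enabled_dev ? PCIE_CFG_SPACE_SIZE
                                      : LEGACY_CFG_SPACE_SIZE;

    /* config accesses are byte, word or dword sized */
    assert(len == 1 || len == 2 || len == 4);

    return addr < limit && len <= limit - addr;
}
```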

PCIe Extended Capabilities make use of a 16-bit capability ID. Also,
a capability size might exceed 8-bit width. So as the very first step
we need to increase the type size for grp_id, grp_size, etc. -- they were
limited to 8 bits.

The only troublesome issue with PCIe Extended Capability IDs is that their
value range is actually the same as for basic PCI capabilities.
E.g. capability ID 3 means VPD Capability for PCI and at the same time
Device Serial Number Capability for PCIe Extended caps. This adds a bit of
inconvenience.

In order to distinguish between two sets of same capability IDs, the patch
introduces a set of macros to mark a capability ID as PCIe Extended one
(or check if it is basic/extended + get a raw ID value):
- PCIE_EXT_CAP_ID(cap_id)
- IS_PCIE_EXT_CAP_ID(grp_id)
- GET_PCIE_EXT_CAP_ID(grp_id)

Here is how it's used:

    /* Intel IGD Opregion group */
    {
        .grp_id      = XEN_PCI_INTEL_OPREGION,  /* no change */
        .grp_type    = XEN_PT_GRP_TYPE_EMU,
        .grp_size    = 0x4,
        .size_init   = xen_pt_reg_grp_size_init,
        .emu_regs    = xen_pt_emu_reg_igd_opregion,
    },
    /* Vendor-specific Extended Capability reg group */
    {
        .grp_id      = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VNDR),
        .grp_type    = XEN_PT_GRP_TYPE_EMU,
        .grp_size    = 0xFF,
        .size_init   = xen_pt_ext_cap_vendor_size_init,
        .emu_regs    = xen_pt_ext_cap_emu_reg_vendor,
    },

By using the PCIE_EXT_CAP_ID() macro it is possible to reuse existing
header files with already defined PCIe Extended Capability ID values.

find_cap_offset() receives a capability ID and checks if it's an Extended one
by using the IS_PCIE_EXT_CAP_ID(cap) macro, passing the real capability
ID value to either xen_host_pci_find_next_ext_cap
or xen_host_pci_find_next_cap.
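
A standalone sketch of the tagging scheme follows. The macro and function
names are hypothetical (deliberately different from the patch's), and the
0xFFFF mask assumes the raw ID occupies the low 16 bits, with bit 16 used as
the "extended" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_FLAG           (1u << 16)
#define MARK_EXT_CAP_ID(id)    ((uint32_t)(id) | EXT_CAP_FLAG)
#define IS_EXT_CAP_ID(grp_id)  (((grp_id) & EXT_CAP_FLAG) != 0)
#define RAW_CAP_ID(grp_id)     ((grp_id) & 0xFFFFu)

enum cap_space { CAP_SPACE_PCI, CAP_SPACE_PCIE_EXT };

/*
 * Decide which capability list a group ID belongs to and hand back the
 * raw ID that would be passed to the matching enumeration function.
 */
static enum cap_space cap_space_for(uint32_t grp_id, uint32_t *raw_id)
{
    assert(raw_id != NULL);
    *raw_id = RAW_CAP_ID(grp_id);
    return IS_EXT_CAP_ID(grp_id) ? CAP_SPACE_PCIE_EXT : CAP_SPACE_PCI;
}
```

With this scheme, ID 3 tagged as extended and plain ID 3 compare unequal as
group IDs, yet both yield the raw value 3 for the lookup.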

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt.c | 10 -
 hw/xen/xen_pt.h | 13 --
 hw/xen/xen_pt_config_init.c | 90 ++---
 3 files changed, 83 insertions(+), 30 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 65c5516ef4..f757978800 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -96,8 +96,16 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...)
 
 static int xen_pt_pci_config_access_check(PCIDevice *d, uint32_t addr, int len)
 {
+XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
 /* check offset range */
-if (addr > 0xFF) {
+if (s->pcie_enabled_dev) {
+if (addr >= PCIE_CONFIG_SPACE_SIZE) {
+XEN_PT_ERR(d, "Failed to access register with offset "
+  "exceeding 0xFFF. (addr: 0x%02x, len: %d)\n",
+  addr, len);
+return -1;
+}
+} else if (addr >= PCI_CONFIG_SPACE_SIZE) {
 XEN_PT_ERR(d, "Failed to access register with offset exceeding 0xFF. "
"(addr: 0x%02x, len: %d)\n", addr, len);
 return -1;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 1c9cd6b615..eb062be3f4 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -33,6 +33,11 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...) 
G_GNUC_PRINTF(2, 3);
 /* Helper */
 #define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)
 
+/* Macros for PCIe Extended Capabilities */
+#define PCIE_EXT_CAP_ID(cap_id) ((cap_id) | (1U << 16))
+#define IS_PCIE_EXT_CAP_ID(grp_id)  ((grp_id) & (1U << 16))
+#define GET_PCIE_EXT_CAP_ID(grp_id) ((grp_id) & 0x)
+
 typedef const struct XenPTRegInfo XenPTRegInfo;
 typedef struct XenPTReg XenPTReg;
 
@@ -174,13 +179,13 @@ typedef const struct XenPTRegGroupInfo XenPTRegGroupInfo;
 /* emul reg group size initialize method */
 typedef int (*xen_pt_reg_size_init_fn)
 (XenPCIPassthroughState *, XenPTRegGroupInfo *,
- uint32_t base_offset, uint8_t *size);
+ uint32_t base_offset, uint32_t *size);
 
 /* emulated register group information */
 struct XenPTRegGroupInfo {
-uint8_t grp_id;
+uint32_t grp_id;
 XenPTRegisterGroupType grp_type;
-uint8_t grp_size;
+uint32_t grp_size;
 xen_pt_reg_size_init_fn size_init;
 XenPTRegInfo *emu_regs;
 };
@@ -190,7 +195,7 @@ typedef struct XenPTRegGroup {
 QLIST_ENTRY(XenPTRegGroup) entries;
 XenPTRegGroupInfo *reg_grp;
 uint32_t base_offset;
-uint8_t size;
+uint32_t size;
 QLIST_HEAD(, XenPTReg) reg_tbl_list;
 } XenPTRegGroup;
 
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 757a035aad..34ed9c25c5 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -32,28 +32,40 @@ static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, 
XenPTRegInfo *reg,
 /* helper */
 
 /* A return value of 1 means the capability should NOT be exposed to guest. */
-static int xen_pt_hide_dev_cap(const XenHostPCIDevice *d, uint8_t

[PATCH v1 19/23] xen/pt: Fake capability id

2023-06-20 Thread Joel Upham

Some PCIe capability IDs need to be faked for the Xen implementation to work.

This is the situation when we were asked to hide (aka
"hardwire to 0") some PCIe ext capability, but it was located
at offset 0x100 in PCIe config space. In this case we can't
simply exclude it from the linked list of capabilities
(as it is the first entry in the list), so we must fake its
Capability ID in PCIe Extended Capability header, leaving
the Next Ptr field intact while returning zeroes on attempts
to read capability body (writes are ignored).
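
The header rewrite can be sketched as below. The names and the base value
are hypothetical placeholders (the real base constant is defined elsewhere
in the series), and the restriction on the fake ID is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder for the fake ID base; not the patch's value. */
#define FAKE_CAP_ID_BASE 0xFE00u

/* PCIe Extended Capability header: ID 15:0, version 19:16, next ptr 31:20. */
static uint32_t ext_cap_next(uint32_t hdr)
{
    return (hdr >> 20) & 0xFFC;   /* low two bits of the pointer are reserved */
}

/*
 * Replace only the Capability ID of the header found at offset 0x100,
 * keeping version and Next Ptr so the rest of the list stays reachable.
 */
static uint32_t hide_first_ext_cap(uint32_t hdr, uint16_t fake_id)
{
    /* assumption: avoid IDs a guest could read as "no capability" */
    assert(fake_id != 0 && fake_id != 0xFFFF);
    return (hdr & 0xFFFF0000u) | fake_id;
}
```

The guest then sees an unknown capability at 0x100, skips it, and follows
the untouched Next Ptr to the remaining capabilities.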

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 72 -
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 4e14adf2b2..41b43b9445 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qemu/timer.h"
 #include "xen_pt.h"
+#include "xen-host-pci-device.h"
 #include "hw/xen/xen-legacy-backend.h"
 
 #define XEN_PT_MERGE_VALUE(value, data, val_mask) \
@@ -31,6 +32,10 @@ static int 
xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
XenPTRegInfo *reg,
uint32_t real_offset,
uint32_t *data);
+static int xen_pt_ext_cap_capid_reg_init(XenPCIPassthroughState *s,
+ XenPTRegInfo *reg,
+ uint32_t real_offset,
+ uint32_t *data);
 
 /* helper */
 
@@ -995,6 +1000,17 @@ static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
 .u.b.read   = xen_pt_byte_reg_read,
 .u.b.write  = xen_pt_byte_reg_write,
 },
+/* PCI Express Capabilities Register */
+{
+.offset = PCI_EXP_FLAGS,
+.size   = 2,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_pcie_capabilities_reg_init,
+.u.w.read   = xen_pt_word_reg_read,
+.u.w.write  = xen_pt_word_reg_write,
+},
 /* Device Capabilities reg */
 {
 .offset = PCI_EXP_DEVCAP,
@@ -1633,6 +1649,54 @@ static XenPTRegInfo xen_pt_emu_reg_igd_opregion[] = {
 },
 };
 
+/
+ * Emulated registers for
+ * PCIe Extended Capabilities
+ */
+
+static uint16_t fake_cap_id = XEN_PCIE_FAKE_CAP_ID_BASE;
+
+/* PCIe Extended Capability ID reg */
+static int xen_pt_ext_cap_capid_reg_init(XenPCIPassthroughState *s,
+ XenPTRegInfo *reg,
+ uint32_t real_offset,
+ uint32_t *data)
+{
+uint16_t reg_field;
+int rc;
+XenPTRegGroup *reg_grp_entry = NULL;
+
+/* use real device register's value as initial value */
+rc = xen_host_pci_get_word(>real_device, real_offset, _field);
+if (rc) {
+return rc;
+}
+
+reg_grp_entry = xen_pt_find_reg_grp(s, real_offset);
+
+if (reg_grp_entry) {
+if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+reg_grp_entry->base_offset == PCI_CONFIG_SPACE_SIZE) {
+/*
+ * This is the situation when we were asked to hide (aka
+ * "hardwire to 0") some PCIe ext capability, but it was located
+ * at offset 0x100 in PCIe config space. In this case we can't
+ * simply exclude it from the linked list of capabilities
+ * (as it is the first entry in the list), so we must fake its
+ * Capability ID in PCIe Extended Capability header, leaving
+ * the Next Ptr field intact while returning zeroes on attempts
+ * to read capability body (writes are ignored).
+ */
+reg_field = fake_cap_id;
+/* increment the value in order to have unique Capability IDs */
+fake_cap_id++;
+}
+}
+
+*data = reg_field;
+return 0;
+}
+
 /* Vendor-specific Ext Capability Structure reg static information table */
 static XenPTRegInfo xen_pt_ext_cap_emu_reg_vendor[] = {
 {
@@ -2938,7 +3002,13 @@ void xen_pt_config_init(XenPCIPassthroughState *s, Error 
**errp)
 }
 }
 
-if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU ||
+/*
+ * We need to always emulate the PCIe Extended Capability
+ * header for a hidden capability which starts at offset 0x100
+ */
+(xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+reg_grp_offset == 0x100)) {
 if (xen_pt_emu_reg_grps[i].emu_regs) {
 int j = 0;
 XenPTRegInfo *regs = xen_pt_emu_reg_grps[i].emu_regs;
-- 
2.34.1

[PATCH v1 14/23] xen/pt: add fixed-size PCIe Extended Capabilities descriptors

2023-06-20 Thread Joel Upham

This adds description structures for all fixed-size PCIe Extended
Capabilities.

For every capability register group, only 2 registers are emulated
currently: Capability ID (16 bit) and Next Capability Offset/Version (16
bit). Both are needed to implement selective capability hiding. All other
registers are passed through at the moment (unless they belong to
a "hardwired" capability which is hidden).

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/xen/xen_pt_config_init.c | 183 
 1 file changed, 183 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 20b5561d25..69d8857c66 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1669,6 +1669,37 @@ static XenPTRegInfo xen_pt_ext_cap_emu_reg_vendor[] = {
 .size = 0,
 },
 };
+
+/* Common reg static information table for all passthru-type
+ * PCIe Extended Capabilities. Only Extended Cap ID and
+ * Next pointer are handled (to support capability hiding).
+ */
+static XenPTRegInfo xen_pt_ext_cap_emu_reg_dummy[] = {
+{
+.offset = XEN_PCIE_CAP_ID,
+.size   = 2,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_ext_cap_capid_reg_init,
+.u.w.read   = xen_pt_word_reg_read,
+.u.w.write  = xen_pt_word_reg_write,
+},
+{
+.offset = XEN_PCIE_CAP_LIST_NEXT,
+.size   = 2,
+.init_val   = 0x,
+.ro_mask= 0x,
+.emu_mask   = 0x,
+.init   = xen_pt_ext_cap_ptr_reg_init,
+.u.w.read   = xen_pt_word_reg_read,
+.u.w.write  = xen_pt_word_reg_write,
+},
+{
+.size = 0,
+},
+};
+
 /
  * Capabilities
  */
@@ -1945,6 +1976,158 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
 .size_init   = xen_pt_ext_cap_vendor_size_init,
 .emu_regs= xen_pt_ext_cap_emu_reg_vendor,
 },
+/* Device Serial Number Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DSN),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = PCI_EXT_CAP_DSN_SIZEOF,   /*0x0C*/
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Power Budgeting Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PWR),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = PCI_EXT_CAP_PWR_SIZEOF,   /*0x10*/
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Root Complex Internal Link Control Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCILC),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = 0x0C,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Root Complex Event Collector Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCEC),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = 0x08,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Root Complex Register Block Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCRB),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = 0x14,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Configuration Access Correlation Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_CAC),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = 0x08,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Alternate Routing ID Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ARI),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = PCI_EXT_CAP_ARI_SIZEOF,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Address Translation Services Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ATS),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = PCI_EXT_CAP_ATS_SIZEOF,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+},
+/* Single Root I/O Virtualization Extended Capability reg group */
+{
+.grp_id = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_SRIOV),
+.grp_type   = XEN_PT_GRP_TYPE_EMU,
+.grp_size   = PCI_EXT_CAP_SRIOV_SIZEOF,
+.size_init  = xen_pt_reg_grp_size_init,
+.emu_regs   =

[PATCH v1 02/23] pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug

2023-06-20 Thread Joel Upham

On Q35 we still need to assign the BSEL property to bus(es) for PCI device
add/hotplug to work.
Extend the acpi_set_pci_info() function to support Q35 as well. This patch adds
a new (trivial) function find_q35() which returns the root PCIBus object on
Q35, in a way similar to what find_i440fx does.

Signed-off-by: Alexey Gerasimenko 
Signed-off-by: Joel Upham 
---
 hw/acpi/pcihp.c  | 4 +++-
 hw/pci-host/q35.c| 9 +
 include/hw/i386/pc.h | 3 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index cdd6f775a1..f4e39d7a9c 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -40,6 +40,7 @@
 #include "qapi/error.h"
 #include "qom/qom-qobject.h"
 #include "trace.h"
+#include "sysemu/xen.h"
 
 #define ACPI_PCIHP_SIZE 0x0018
 #define PCI_UP_BASE 0x0000
@@ -84,7 +85,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 bool is_bridge = IS_PCI_BRIDGE(br);
 
 /* hotplugged bridges can't be described in ACPI ignore them */
-if (qbus_is_hotpluggable(BUS(bus))) {
+/* Xen requires hotplugging to the root device, even on the Q35 chipset */
+if (qbus_is_hotpluggable(BUS(bus)) || xen_enabled()) {
 if (!is_bridge || (!br->hotplugged && info->has_bridge_hotplug)) {
 bus_bsel = g_malloc(sizeof *bus_bsel);
 
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index fd18920e7f..fe5fc0f47c 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -259,6 +259,15 @@ static void q35_host_initfn(Object *obj)
  qdev_prop_allow_set_link_before_realize, 0);
 }
 
+PCIBus *find_q35(void)
+{
+PCIHostState *s = OBJECT_CHECK(PCIHostState,
+   object_resolve_path("/machine/q35", NULL),
+   TYPE_PCI_HOST_BRIDGE);
+return s ? s->bus : NULL;
+}
+
+
 static const TypeInfo q35_host_info = {
 .name   = TYPE_Q35_HOST_DEVICE,
 .parent = TYPE_PCIE_HOST_BRIDGE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c661e9cc80..550f8fa221 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -196,6 +196,9 @@ void pc_madt_cpu_entry(int uid, const CPUArchIdList *apic_ids,
 /* sgx.c */
 void pc_machine_init_sgx_epc(PCMachineState *pcms);
 
+/* q35.c */
+PCIBus *find_q35(void);
+
 extern GlobalProperty pc_compat_8_0[];
 extern const size_t pc_compat_8_0_len;
 
-- 
2.34.1

[PATCH] STM32F100: add support for external memory via FSMC

2023-06-20 Thread Lucas Villa Real

Add support for FSMC on high-density STM32F100 devices and enable
mapping of additional memory via the `-m SIZE` command-line option.
FSMC Bank1 can address up to 4x64MB of PSRAM memory at 0x60000000.

RCC is needed to enable peripheral clock for FSMC; this commit
implements support for RCC through the MMIO interface.

Last, high-density devices support up to 32KB of static SRAM, so
adjust SRAM_SIZE accordingly.

Signed-off-by: Lucas C. Villa Real 
---
 docs/system/arm/stm32.rst|  12 ++-
 hw/arm/Kconfig   |   1 +
 hw/arm/stm32f100_soc.c   | 102 +++-
 hw/arm/stm32f1_generic.c |  12 +++
 hw/misc/Kconfig  |   3 +
 hw/misc/meson.build  |   1 +
 hw/misc/stm32f1xx_fsmc.c | 155 +++
 include/hw/arm/stm32f100_soc.h   |  24 -
 include/hw/misc/stm32f1xx_fsmc.h |  62 +
 9 files changed, 368 insertions(+), 4 deletions(-)
 create mode 100644 hw/misc/stm32f1xx_fsmc.c
 create mode 100644 include/hw/misc/stm32f1xx_fsmc.h

diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
index d0a3b1a7eb..40de58ed04 100644
--- a/docs/system/arm/stm32.rst
+++ b/docs/system/arm/stm32.rst
@@ -40,6 +40,8 @@ Supported devices
  * SPI controller
  * System configuration (SYSCFG)
  * Timer controller (TIMER)
+ * Reset and Clock Controller (RCC)
+ * Flexible static memory controller (FSMC)
 
 Missing devices
 ---
@@ -57,7 +59,6 @@ Missing devices
  * Power supply configuration (PWR)
  * Random Number Generator (RNG)
  * Real-Time Clock (RTC) controller
- * Reset and Clock Controller (RCC)
  * Secure Digital Input/Output (SDIO) interface
  * USB OTG
  * Watchdog controller (IWDG, WWDG)
@@ -78,4 +79,11 @@ to select the device density line.  The following values are supported:
 
 .. code-block:: bash
 
-  $ qemu-system-arm -M stm32f1-generic -global stm32f100-soc.density=medium ...
\ No newline at end of file
+  $ qemu-system-arm -M stm32f1-generic -global stm32f100-soc.density=medium ...
+
+High-density devices can also enable up to 256 MB of external memory using
+the `-m SIZE` option. The memory is mapped at address 0x60000000. Example:
+ 
+.. code-block:: bash
+
+  $ qemu-system-arm -M stm32f1-generic -m 64M ...
\ No newline at end of file
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 822441945c..dd48068108 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -433,6 +433,7 @@ config RASPI
 config STM32F100_SOC
 bool
 select ARM_V7M
+select STM32F1XX_FSMC
 select STM32F2XX_USART
 select STM32F2XX_SPI
 
diff --git a/hw/arm/stm32f100_soc.c b/hw/arm/stm32f100_soc.c
index c157ffd644..a2b863d309 100644
--- a/hw/arm/stm32f100_soc.c
+++ b/hw/arm/stm32f100_soc.c
@@ -26,6 +26,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
+#include "qemu/log.h"
 #include "hw/arm/boot.h"
 #include "exec/address-spaces.h"
 #include "hw/arm/stm32f100_soc.h"
@@ -40,9 +41,85 @@ static const uint32_t usart_addr[STM_NUM_USARTS] = { 0x40013800, 0x40004400,
 0x40004800 };
 static const uint32_t spi_addr[STM_NUM_SPIS] = { 0x40013000, 0x40003800,
 0x40003C00 };
+static const uint32_t fsmc_addr = 0xA0000000;
 
 static const int usart_irq[STM_NUM_USARTS] = {37, 38, 39};
 static const int spi_irq[STM_NUM_SPIS] = {35, 36, 51};
+static const int fsmc_irq = 48;
+
+static uint64_t stm32f100_rcc_read(void *h, hwaddr offset, unsigned size)
+{
+STM32F100State *s = (STM32F100State *) h;
+switch (offset) {
+case 0x00:
+return s->rcc.cr;
+case 0x04:
+return s->rcc.cfgr;
+case 0x08:
+return s->rcc.cir;
+case 0x0C:
+return s->rcc.apb2rstr;
+case 0x10:
+return s->rcc.apb1rstr;
+case 0x14:
+return s->rcc.ahbenr;
+case 0x18:
+return s->rcc.apb2enr;
+case 0x1C:
+return s->rcc.apb1enr;
+case 0x20:
+return s->rcc.bdcr;
+case 0x24:
+return s->rcc.csr;
+case 0x2C:
+return s->rcc.cfgr2;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset);
+}
+return 0;
+}
+
+static void stm32f100_rcc_write(void *h, hwaddr offset, uint64_t value64,
+unsigned size)
+{
+STM32F100State *s = (STM32F100State *) h;
+uint32_t value = value64 & 0xffffffff;
+
+switch (offset) {
+case 0x00:
+s->rcc.cr = value;
+case 0x04:
+s->rcc.cfgr = value;
+case 0x08:
+s->rcc.cir = value;
+case 0x0C:
+s->rcc.apb2rstr = value;
+case 0x10:
+s->rcc.apb1rstr = value;
+case 0x14:
+s->rcc.ahbenr = value;
+case 0x18:
+s->rcc.apb2enr = value;
+case 0x1C:
+s->rcc.apb1enr = value;
+case 0x20:
+s->rcc.bdcr = value;
+case 0x24:
+s->rcc.csr = value;
+case 0x2C:
+s->rcc.cfgr2 = value;
+default:
+
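
A note on the register dispatch in stm32f100_rcc_write() as it appears in the
hunk above: the visible case labels have no break, and in C that means control
falls through, so a write to offset 0x00 would also run every later
assignment. The following is a minimal standalone sketch of the same dispatch
with explicit breaks, assuming a cut-down three-register block. SketchRcc and
sketch_rcc_write() are hypothetical names for illustration only; this is not
the patch's code and not a QEMU API.

```c
#include <stdint.h>

/* Hypothetical three-register block; not the real STM32F100 RCC layout. */
typedef struct {
    uint32_t cr;
    uint32_t cfgr;
    uint32_t cir;
} SketchRcc;

/*
 * Store the low 32 bits of value64 into the register at 'offset'.
 * Returns 0 on success, -1 for an unknown offset.
 */
static int sketch_rcc_write(SketchRcc *rcc, uint64_t offset, uint64_t value64)
{
    uint32_t value = (uint32_t)(value64 & 0xffffffffu);

    switch (offset) {
    case 0x00:
        rcc->cr = value;
        break;          /* without this, cfgr and cir would be written too */
    case 0x04:
        rcc->cfgr = value;
        break;
    case 0x08:
        rcc->cir = value;
        break;
    default:
        return -1;      /* a device model would log the bad offset here */
    }
    return 0;
}
```

With the breaks in place, a store to one offset leaves the other registers
untouched, which is what a guest programming RCC would expect.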


Re: [PATCH 01/42] migration-test: Be consistent for ppc

2023-06-20 Thread Peter Xu

On Tue, Jun 20, 2023 at 09:27:17PM +0200, Laurent Vivier wrote:
> On 6/20/23 16:54, Peter Xu wrote:
> > On Fri, Jun 09, 2023 at 12:49:02AM +0200, Juan Quintela wrote:
> > > It makes no sense that we don't have the same configuration on both sides.
> > 
> > I hope Laurent can see this one out of 40s.
> 
> I had some luck...

:-D

> 
> > 
> > Makes sense to me, but does it mean that the devices are not matching
> > before on ppc?  Confused how did it work then..
> 
> I agree we need the -nodefaults on both sides.
> 
> It has been introduced by
> fc71e3e562b7 ("tests/migration: Speed up the test on ppc64") (Thomas)
> 
> I think it works because destination side doesn't check for what is missing.

Oh!  Makes sense.. just noticed this (fact). Then no fixes needed either.

> 
> Reviewed-by: Laurent Vivier 

Thanks, Laurent!

-- 
Peter Xu

Re: [PATCH V2] migration: file URI

2023-06-20 Thread Peter Xu

On Tue, Jun 20, 2023 at 02:36:58PM -0400, Steven Sistare wrote:
> On 6/15/2023 10:50 AM, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> >> On Wed, Jun 14, 2023 at 02:59:54PM -0300, Fabiano Rosas wrote:
> >>> In this message Daniel mentions virDomainSnapshotXXX which would benefit
> >>> from using the same "file" migration, but being done live:
> >>>
> >>> https://lore.kernel.org/r/zd7mrgq+4qsdb...@redhat.com
> >>>
> >>> And from your response here:
> >>>  https://lore.kernel.org/r/ZEA759BSs75ldW6Y@x1n
> >>>
> >>> I had understood that having a new SUSPEND cap to decide whether to do
> >>> it live or non-live would be enough to cover all use-cases.
> >>
> >> Oh, I probably lost some of the contexts there, sorry about that - so it's
> >> about not being able to live snapshot on !LINUX worlds properly, am I
> >> right?
> >>
> > 
> > Right, so that gives us for now a reasonable use-case for keeping live
> > migration behavior possible with "file:".
> > 
> >> In the ideal world where we can always synchronously tracking guest pages
> >> (like what we do with userfaultfd wr-protections on modern Linux), the
> >> !SUSPEND case should always be covered by CAP_BACKGROUND_SNAPSHOT already
> >> in a more performant way.  IOW, !SUSPEND seems to be not useful to Linux,
> >> because whenever we want to set !SUSPEND we should just use BG_SNAPSHOT.
> >>
> > 
> > I agree.
> > 
> >> But I think indeed the live snapshot support is not good enough. Even on
> >> Linux, it lacks different memory type supports, multi-process support, and
> >> also no-go on very old kernels.  So I assume the fallback makes sense, and
> >> then we can't always rely on that.
> >>
> >> Then I agree we can keep "file:" the same as others like proposed here, but
> >> I'd like to double check with all of us so we're on the same page..
> > 
> > +1
> > 
> >> And maybe we should mention some discussions into commit message or
> >> comments where proper in the code, so we can track what has happened
> >> easier.
> >>
> > 
> > I'll add some words where appropriate in my series as well. A v2 is
> > already overdue with all the refactorings that have happened in the
> > migration code.
> 
> Peter, should one of us proceed to submit the file URI as a stand-alone
> patch, since we both need it, and it has some value on its own?
> 
> My version adds a watch on the incoming channel so we do not block monitor
> commands.  It also adds tracepoints like the other URIs.
> 
> Fabiano's version adds a nice unit test.  
> 
> Maybe we should submit a small series with both.

I fully agree.  I didn't check the details, but if we know the shared bits,
it would be great to arrange that beforehand, and that is probably also the
best for all sides.  Thanks for raising this.

-- 
Peter Xu

Re: [PATCH 01/42] migration-test: Be consistent for ppc

2023-06-20 Thread Laurent Vivier


On 6/20/23 16:54, Peter Xu wrote:
> On Fri, Jun 09, 2023 at 12:49:02AM +0200, Juan Quintela wrote:
> > It makes no sense that we don't have the same configuration on both sides.
>
> I hope Laurent can see this one out of 40s.

I had some luck...

> Makes sense to me, but does it mean that the devices are not matching
> before on ppc?  Confused how did it work then..

I agree we need the -nodefaults on both sides.

It has been introduced by
fc71e3e562b7 ("tests/migration: Speed up the test on ppc64") (Thomas)

I think it works because destination side doesn't check for what is missing.

Reviewed-by: Laurent Vivier 

Thanks,
Laurent

> > Signed-off-by: Juan Quintela 
> > ---
> >   tests/qtest/migration-test.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index b0c355bbd9..c5e0c69c6b 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -646,7 +646,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> > "'nvramrc=hex .\" _\" begin %x %x "
> > "do i c@ 1 + i c! 1000 +loop .\" B\" 0 "
> > "until'", end_address, start_address);
> > -arch_target = g_strdup("");
> > +arch_target = g_strdup("-nodefaults");
> >   } else if (strcmp(arch, "aarch64") == 0) {
> >   init_bootfile(bootpath, aarch64_kernel, sizeof(aarch64_kernel));
> >   machine_opts = "virt,gic-version=max";
> > --
> > 2.40.1

Re: [PATCH V2] migration: file URI

2023-06-20 Thread Steven Sistare

On 6/15/2023 10:50 AM, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
>> On Wed, Jun 14, 2023 at 02:59:54PM -0300, Fabiano Rosas wrote:
>>> In this message Daniel mentions virDomainSnapshotXXX which would benefit
>>> from using the same "file" migration, but being done live:
>>>
>>> https://lore.kernel.org/r/zd7mrgq+4qsdb...@redhat.com
>>>
>>> And from your response here:
>>>  https://lore.kernel.org/r/ZEA759BSs75ldW6Y@x1n
>>>
>>> I had understood that having a new SUSPEND cap to decide whether to do
>>> it live or non-live would be enough to cover all use-cases.
>>
>> Oh, I probably lost some of the contexts there, sorry about that - so it's
>> about not being able to live snapshot on !LINUX worlds properly, am I
>> right?
>>
> 
> Right, so that gives us for now a reasonable use-case for keeping live
> migration behavior possible with "file:".
> 
>> In the ideal world where we can always synchronously tracking guest pages
>> (like what we do with userfaultfd wr-protections on modern Linux), the
>> !SUSPEND case should always be covered by CAP_BACKGROUND_SNAPSHOT already
>> in a more performant way.  IOW, !SUSPEND seems to be not useful to Linux,
>> because whenever we want to set !SUSPEND we should just use BG_SNAPSHOT.
>>
> 
> I agree.
> 
>> But I think indeed the live snapshot support is not good enough. Even on
>> Linux, it lacks different memory type supports, multi-process support, and
>> also no-go on very old kernels.  So I assume the fallback makes sense, and
>> then we can't always rely on that.
>>
>> Then I agree we can keep "file:" the same as others like proposed here, but
>> I'd like to double check with all of us so we're on the same page..
> 
> +1
> 
>> And maybe we should mention some discussions into commit message or
>> comments where proper in the code, so we can track what has happened
>> easier.
>>
> 
> I'll add some words where appropriate in my series as well. A v2 is
> already overdue with all the refactorings that have happened in the
> migration code.

Peter, should one of us proceed to submit the file URI as a stand-alone patch,
since we both need it, and it has some value on its own?

My version adds a watch on the incoming channel so we do not block monitor
commands.  It also adds tracepoints like the other URIs.

Fabiano's version adds a nice unit test.  

Maybe we should submit a small series with both.

- Steve

Re: [PATCH 05/12] hw/virtio: Add support for apple virtio-blk

2023-06-20 Thread Kevin Wolf

Am 20.06.2023 um 16:35 hat Stefan Hajnoczi geschrieben:
> On Wed, Jun 14, 2023 at 10:56:22PM +, Alexander Graf wrote:
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 39e7f23fab..76b85bb3cb 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -1120,6 +1120,20 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >  
> >  break;
> >  }
> > +case VIRTIO_BLK_T_APPLE1:
> > +{
> > +if (s->conf.x_apple_type) {
> > +/* Only valid on Apple Virtio */
> > +char buf[iov_size(in_iov, in_num)];
> 
> I'm concerned that a variable-sized stack buffer could be abused by a
> malicious guest. Even if it's harmless in the Apple use case, someone
> else might copy this approach and use it where it creates a security
> problem. Please either implement iov_memset() or allocate the temporary
> buffer using bdrv_blockalign() (and free it with qemu_vfree()).
> 
> > +memset(buf, 0, sizeof(buf));
> > +iov_from_buf(in_iov, in_num, 0, buf, sizeof(buf));
> > +virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);

Good point, even more so when iov_memset() should do the job with
simpler code.

Kevin
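
For illustration, zeroing a scatter/gather list in place needs no temporary
buffer at all, which is the point of the iov_memset() suggestion. The sketch
below is a standalone approximation, assuming plain POSIX struct iovec;
sketch_iov_memset() is a hypothetical name and its signature is an assumption,
not a copy of any QEMU helper.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Fill up to 'bytes' bytes of the scatter/gather list with 'fillc',
 * starting 'offset' bytes into the list. Returns the number of bytes
 * actually filled, which is smaller than 'bytes' if the list is too short.
 */
static size_t sketch_iov_memset(const struct iovec *iov, unsigned int iov_cnt,
                                size_t offset, int fillc, size_t bytes)
{
    size_t done = 0;

    for (unsigned int i = 0; i < iov_cnt && done < bytes; i++) {
        if (offset >= iov[i].iov_len) {
            /* Skip whole elements that lie before the start offset. */
            offset -= iov[i].iov_len;
            continue;
        }
        size_t len = iov[i].iov_len - offset;
        if (len > bytes - done) {
            len = bytes - done;
        }
        memset((char *)iov[i].iov_base + offset, fillc, len);
        done += len;
        offset = 0;     /* later elements are filled from their start */
    }
    return done;
}
```

Because the fill happens element by element, stack usage stays constant no
matter how large a buffer the guest supplies.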



Re: [PULL 49/52] exec/poison: Do not poison CONFIG_SOFTMMU

2023-06-20 Thread Peter Maydell

On Mon, 5 Jun 2023 at 21:23, Richard Henderson
 wrote:
>
> If CONFIG_USER_ONLY is ok generically, so is CONFIG_SOFTMMU,
> because they are exactly opposite.

This isn't quite right. CONFIG_USER_ONLY is theoretically
something we should poison, because it's unsafe in the general
case to use it in compiled-once source files. But in practice
we make quite a lot of use of it in "we know this specific
use of it is OK" situations, like ifdeffing out function
prototypes. So we'd like to poison it, but we can't poison
it without a huge amount of refactoring which isn't really
worth the effort.

So it's not a good model for "therefore it's OK not to poison
CONFIG_SOFTMMU" -- we should leave that poisoned if we can,
so we don't introduce either new buggy uses of CONFIG_SOFTMMU,
or new "we know this is safe" uses of it which will make
it difficult to put it back into the poison-list later...

thanks
-- PMM

[PATCH] hw/xen: Clarify (lack of) error handling in transaction_commit()

2023-06-20 Thread David Woodhouse

From: David Woodhouse 

Coverity was unhappy (CID 1508359) because we didn't check the return of
init_walk_op() in transaction_commit(), despite doing so at every other
call site.

Strictly speaking, this is a false positive since it can never fail. It
only fails for invalid user input (transaction ID or path), and both of
those are hard-coded to known sane values in this invocation.

But Coverity doesn't know that, and neither does the casual reader of the
code.

Returning an error here would be weird, since the transaction *is*
committed by this point; all the walk_op is doing is firing watches on
the newly-committed changed nodes. So make it a g_assert(!ret), since
it really should never happen.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xenstore_impl.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c
index 305fe75519..d9732b567e 100644
--- a/hw/i386/kvm/xenstore_impl.c
+++ b/hw/i386/kvm/xenstore_impl.c
@@ -1022,6 +1022,7 @@ static int transaction_commit(XenstoreImplState *s, XsTransaction *tx)
 {
 struct walk_op op;
 XsNode **n;
+int ret;
 
 if (s->root_tx != tx->base_tx) {
 return EAGAIN;
@@ -1032,7 +1033,16 @@ static int transaction_commit(XenstoreImplState *s, XsTransaction *tx)
 s->root_tx = tx->tx_id;
 s->nr_nodes = tx->nr_nodes;
 
-init_walk_op(s, &op, XBT_NULL, tx->dom_id, "/", &n);
+ret = init_walk_op(s, &op, XBT_NULL, tx->dom_id, "/", &n);
+/*
+ * There are two reasons why init_walk_op() may fail: an invalid tx_id,
+ * or an invalid path. We pass XBT_NULL and "/", and it cannot fail.
+ * If it does, the world is broken. And returning 'ret' would be weird
+ * because the transaction *was* committed, and all this tree walk is
+ * trying to do is fire the resulting watches on newly-committed nodes.
+ */
+g_assert(!ret);
+
 op.deleted_in_tx = false;
 op.mutating = true;
 
-- 
2.34.1





Re: [QEMU PATCH 1/1] virtgpu: do not destroy resources when guest suspend

2023-06-20 Thread Kim, Dongwon


Hello,

I just came across this discussion regarding s3/s4 support in the virtio-gpu
driver and QEMU.

We saw a similar problem a while ago (QEMU deletes all objects upon
suspension) and came up with an experimental solution, which is basically
making the virtio-gpu driver redo object creation for all existing resources
once the VM is resumed, so that QEMU recreates them.

This method has worked pretty well in our case. I submitted patches for
this to dri-devel a while ago.

[RFC PATCH 0/2] drm/virtio:virtio-gpu driver freeze-and-restore
implementation (lists.freedesktop.org)

This is a kernel-driver-only solution. Nothing has to be changed in QEMU.

Jiqian and other reviewers, can you check this old solution we suggested
as well?


On 6/20/2023 5:26 AM, Robert Beckett wrote:



On 20/06/2023 10:41, Gerd Hoffmann wrote:

   Hi,


The guest driver should be able to restore resources after resume.

Thank you for your suggestion!
As far as I know, resources are created on host side and guest has 
no backup, if resources are destroyed, guest can't restore them.
Or do you mean guest driver need to send commands to re-create 
resources after resume?

The latter.  The guest driver knows which resources it has created,
it can restore them after suspend.



Are you sure that this is viable?

How would you propose that a guest kernel could reproduce a resource, 
including pixel data upload during a resume?


The kernel would not have any of the pixel data to transfer to host. 
This is normally achieved by guest apps calling GL calls and mesa 
asking the kernel to create the textures with the given data (often 
read from a file).
If your suggestion is to get the userland application to do it, that 
would entirely break how suspend/resume is meant to happen. It should 
be transparent to userland applications for the most part.


Could you explain how you anticipate the guest being able to reproduce 
the resources please?






If so, I have some questions. Can guest re-create resources by using
object(virtio_vpu_object) or others? Can the new resources replace the
destroyed resources to continue the suspended display tasks after
resume?

Any display scanout information will be gone too, the guest driver needs
re-create this too (after re-creating the resources).

take care,
   Gerd

Re: [PULL 05/27] hw/xen: Watches on XenStore transactions

2023-06-20 Thread David Woodhouse

On Tue, 2023-06-20 at 13:19 +0100, Peter Maydell wrote:
> On Fri, 2 Jun 2023 at 18:06, Peter Maydell 
> wrote:
> > 
> > On Tue, 2 May 2023 at 18:08, Peter Maydell
> >  wrote:
> > > 
> > > On Tue, 7 Mar 2023 at 18:27, David Woodhouse
> > >  wrote:
> > > > 
> > > > From: David Woodhouse 
> 
> > > Hi; Coverity's "is there missing error handling?"
> > > heuristic fired for a change in this code (CID 1508359):
> > > 
> > > >  static int transaction_commit(XenstoreImplState *s,
> > > > XsTransaction *tx)
> > > >  {
> > > > +    struct walk_op op;
> > > > +    XsNode **n;
> > > > +
> > > >  if (s->root_tx != tx->base_tx) {
> > > >  return EAGAIN;
> > > >  }
> > > > @@ -720,10 +861,18 @@ static int
> > > > transaction_commit(XenstoreImplState *s, XsTransaction *tx)
> > > >  s->root_tx = tx->tx_id;
> > > >  s->nr_nodes = tx->nr_nodes;
> > > > 
> > > > +    init_walk_op(s, , XBT_NULL, tx->dom_id, "/", );
> > > 
> > > This is the only call to init_walk_op() which ignores its
> > > return value. Intentional, or missing error handling?
> > 
> > Hi -- I was going through the unclassified Coverity issues
> > again today, and this one's still on the list. Is this a
> > bug, or intentional?
> 
> Ping^3 -- is this a false positive, or something to be fixed?
> It would be nice to be able to classify the coverity issue
> appropriately.

Oops, sorry for the delay. 

It is arguably a false positive.

There are two cases where init_walk_op() can fail:

 • It's given a transaction ID which doesn't exist. But in this case
   it's actually given XBT_NULL because the transaction is *already*
   committed and all we're doing is setting up a tree walk to fire
   watches on the newly-committed changed nodes.
or,
 •  The given path is invalid. Which it isn't here because we pass a
hard-coded "/".

I was about to stick in the standard if(ret){return ret;} but the
semantics of that would be a bit bizarre.

As noted, by this point the transaction *was* committed already. So all
that gets aborted is the *watches* that were supposed to fire on
changed nodes. Returning an error in that case would be a bit weird.

So I'll go for a g_assert(!ret) with a comment about why. Patch
follows.

I shall also have another go at frowning at the soft-reset locking vs.
the timer and other code, and seeing if I win this time...


[PATCH] Revert "cputlb: Restrict SavedIOTLB to system emulation"

2023-06-20 Thread Peter Maydell

This reverts commit d7ee93e24359703debf4137f4cc632563aa4e8d1.

That commit tries to make a field in the CPUState struct not be
present when CONFIG_USER_ONLY is set.  Unfortunately, you can't
conditionally omit fields in structs like this based on ifdefs that
are set per-target.  If you try it, then code in files compiled
per-target (where CONFIG_USER_ONLY is or can be set) will disagree
about the struct layout with files that are compiled once-only (where
this kind of ifdef is never set).

This manifests specifically in 'make check-tcg' failing, because code
in cpus-common.c that sets up the CPUState::cpu_index field puts it
at a different offset from the code in plugins/core.c in
qemu_plugin_vcpu_init_hook() which reads the cpu_index field.  The
latter then hits an assert because from its point of view every
thread has a 0 cpu_index. There might be other weird behaviour too.

Mostly we catch this kind of bug because the CONFIG_whatever is
listed in include/exec/poison.h and so the reference to it in
build-once source files will then cause a compiler error.
Unfortunately CONFIG_USER_ONLY is an exception to that: we have some
places where we use it in "safe" ways in headers that will be seen by
once-only source files (e.g.  ifdeffing out function prototypes) and
it would be a lot of refactoring to be able to get to a position
where we could poison it.  This leaves us in a "you have to be
careful to walk around the bear trap" situation...

Fixes: d7ee93e243597 ("cputlb: Restrict SavedIOTLB to system emulation")
Signed-off-by: Peter Maydell 
---
 include/hw/core/cpu.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index ee8d6b40b3b..4871ad85f07 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -226,7 +226,7 @@ struct CPUWatchpoint {
 QTAILQ_ENTRY(CPUWatchpoint) entry;
 };
 
-#if defined(CONFIG_PLUGIN) && !defined(CONFIG_USER_ONLY)
+#ifdef CONFIG_PLUGIN
 /*
  * For plugins we sometime need to save the resolved iotlb data before
  * the memory regions get moved around  by io_writex.
@@ -410,11 +410,9 @@ struct CPUState {
 
 #ifdef CONFIG_PLUGIN
 GArray *plugin_mem_cbs;
-#if !defined(CONFIG_USER_ONLY)
 /* saved iotlb data from io_writex */
 SavedIOTLB saved_iotlb;
-#endif /* !CONFIG_USER_ONLY */
-#endif /* CONFIG_PLUGIN */
+#endif
 
 /* TODO Move common fields from CPUArchState here. */
 int cpu_index;
-- 
2.34.1
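
The layout disagreement described in the commit message can be reproduced in
miniature. The sketch below is a standalone illustration, assuming two
translation units that see the same struct with and without an ifdef'd field;
the struct and function names are hypothetical and the field types are
stand-ins, not the real CPUState.

```c
#include <stddef.h>

/* Layout as seen by a file compiled with the per-target ifdef unset. */
struct SketchStateFull {
    void *plugin_mem_cbs;
    long saved[4];      /* stands in for the conditionally compiled field */
    int cpu_index;
};

/* Layout as seen by a file compiled with the ifdef set: field omitted. */
struct SketchStateTrimmed {
    void *plugin_mem_cbs;
    int cpu_index;
};

/* Byte distance between the two views of cpu_index. */
static size_t sketch_cpu_index_skew(void)
{
    return offsetof(struct SketchStateFull, cpu_index)
         - offsetof(struct SketchStateTrimmed, cpu_index);
}
```

A non-zero skew means that code built against one view writes cpu_index at one
offset while code built against the other view reads it from a different one,
which is consistent with the reader seeing a cpu_index of 0 for every thread.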

[PATCH qemu v2] change the fdt_load_addr variable datatype to handle 64-bit DRAM address

2023-06-20 Thread ~rlakshmibai

From: Lakshmi Bai Raja Subramanian 


fdt_load_addr overflows when there is no DRAM in the lower 32-bit address
space. To support a pure 64-bit DRAM address, change the data type of
fdt_load_addr from uint32_t to uint64_t.

Signed-off-by: Lakshmi Bai Raja Subramanian 

---
 hw/riscv/virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 95708d890e..c348529ac0 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1244,7 +1244,7 @@ static void virt_machine_done(Notifier *notifier, void *data)
 target_ulong start_addr = memmap[VIRT_DRAM].base;
 target_ulong firmware_end_addr, kernel_start_addr;
 const char *firmware_name = riscv_default_firmware_name(&s->soc[0]);
-uint32_t fdt_load_addr;
+uint64_t fdt_load_addr;
 uint64_t kernel_entry = 0;
 BlockBackend *pflash_blk0;
 
-- 
2.38.5
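
To illustrate the overflow described in the patch above, here is a minimal standalone sketch. The helper names and the example address are hypothetical and are not taken from hw/riscv/virt.c; the sketch only shows what a 32-bit variable retains when it is handed an address above 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what a uint32_t variable keeps of a 64-bit address. */
static uint32_t sketch_keep_u32(uint64_t addr)
{
    return (uint32_t)addr;      /* the upper 32 bits are silently dropped */
}

/* Hypothetical helper: a uint64_t variable keeps the full address. */
static uint64_t sketch_keep_u64(uint64_t addr)
{
    return addr;
}

/* Self-check using an assumed load address above the 4 GiB boundary. */
static void sketch_check_truncation(void)
{
    const uint64_t addr = UINT64_C(0x100000000) + UINT64_C(0x2200000);

    assert(sketch_keep_u32(addr) == UINT32_C(0x2200000)); /* wrong address */
    assert(sketch_keep_u64(addr) == addr);                /* intact */
}
```

Calling sketch_check_truncation() from a test program exercises both cases; the assumed address is only an example of "DRAM entirely above 32 bits".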

Re: 'make check-tcg' fails with an assert in qemu_plugin_vcpu_init_hook

2023-06-20 Thread Peter Maydell

On Tue, 20 Jun 2023 at 17:56, Peter Maydell  wrote:
>
> $ make -C build/x86 check-tcg
> make: Entering directory '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86'
> [...]
>   TEST    munmap-pthread on arm
> **
> ERROR:../../plugins/core.c:221:qemu_plugin_vcpu_init_hook: assertion
> failed: (success)
> **
> ERROR:../../accel/tcg/cpu-exec.c:1024:cpu_exec_setjmp: assertion
> failed: (cpu == current_cpu)

git bisect blames commit d7ee93e2435970:

cputlb: Restrict SavedIOTLB to system emulation

I think that commit is not correct, because it means that
the size of 'struct CPUState' and also the offset of fields
like 'cpu_index' will be different for files which are
compile-per-target-for-usermode and files which are
compile-once-only. The assert happens here because the
code which sets up cpu_index is build-once, but the code
in qemu_plugin_vcpu_init_hook() which reads cpu_index is
build-per-target and now they don't agree about where in
the struct the field is...

Reverting the commit fixes the bug.

thanks
-- PMM
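
A minimal sketch of the layout problem described above, under stated assumptions: the struct and member names are hypothetical stand-ins, not the real CPUState. It shows that when one translation unit sees a conditionally compiled member and another does not, the two disagree about the offset of every later field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the conditionally compiled member. */
typedef struct SketchSaved {
    void *section;
    unsigned long long addr;
} SketchSaved;

/* Layout as seen by a file compiled with the member present. */
struct SketchStateWith {
    void *plugin_mem_cbs;
    SketchSaved saved;
    int cpu_index;
};

/* Layout as seen by a file compiled with the member #ifdef'd out. */
struct SketchStateWithout {
    void *plugin_mem_cbs;
    int cpu_index;
};

/* Returns how far apart the two views place cpu_index, in bytes. */
static size_t sketch_cpu_index_skew(void)
{
    return offsetof(struct SketchStateWith, cpu_index)
         - offsetof(struct SketchStateWithout, cpu_index);
}

/* The two views can never agree while the extra member has nonzero size. */
static void sketch_layout_selftest(void)
{
    assert(sketch_cpu_index_skew() >= sizeof(SketchSaved));
}
```

A writer using one view and a reader using the other would touch different bytes, which is consistent with the symptom reported here.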

Re: [PATCH] git-submodule.sh: allow running in validate mode without previous update

2023-06-20 Thread Nina Schoetterl-Glausch

On Sun, 2023-06-18 at 23:20 +0200, Paolo Bonzini wrote:
> The call to git-submodule.sh done in configure may happen without a
> previous checkout of the roms/SLOF submodule, or even without a
> previous run of the script.
> 
> So, handle creating a .git-submodule-status file even in validate
> mode.  If git is absent, ensure that all passed directories exists
> (because you should be in a fresh untar and will not have stale
> arguments to git-submodule.sh) but do no other checks.  If git
> is present, ensure that .git-submodule-status contains an entry
> for all submodules passed on the command line.
> 
> With this change, "ignore" mode is not needed anymore.
> 
> Reported-by: Nina Schoetterl-Glausch 
> Fixes: b11f9bd96f4 ("configure: move SLOF submodule handling to pc-bios/s390-ccw", 2023-06-06)
> Signed-off-by: Paolo Bonzini 
> ---
>  configure|  2 +-
>  scripts/git-submodule.sh | 72 ++--
>  2 files changed, 41 insertions(+), 33 deletions(-)
> 
> diff --git a/configure b/configure
> index 86363a7e508..2b41c49c0d1 100755
> --- a/configure
> +++ b/configure
> @@ -758,7 +758,7 @@ done
>  
>  if ! test -e "$source_path/.git"
>  then
> -git_submodules_action="ignore"
> +git_submodules_action="validate"
>  fi
>  
>  # test for any invalid configuration combinations
> diff --git a/scripts/git-submodule.sh b/scripts/git-submodule.sh
> index 11fad2137cd..c33d8fe4cac 100755
> --- a/scripts/git-submodule.sh
> +++ b/scripts/git-submodule.sh
> @@ -9,13 +9,22 @@ command=$1
>  shift
>  maybe_modules="$@"
>  
> -# if not running in a git checkout, do nothing
> -test "$command" = "ignore" && exit 0
> -
> +test -z "$maybe_modules" && exit 0
>  test -z "$GIT" && GIT=$(command -v git)
>  
>  cd "$(dirname "$0")/.."
>  
> +no_git_error=
> +if test -n "$maybe_modules" && ! test -e ".git"; then
> +no_git_error='no git checkout exists'
> +elif test -n "$maybe_modules" && test -z "$GIT"; then
> +no_git_error='git binary not found'
> +fi

No need to test -n "$maybe_modules" if you exit early above.

> +
> +is_git() {
> +test -z "$no_git_error"
> +}
> +
>  update_error() {
>  echo "$0: $*"
>  echo
> @@ -34,7 +43,7 @@ update_error() {
>  }
>  
>  validate_error() {
> -if test "$1" = "validate"; then
> +if is_git && test "$1" = "validate"; then
>  echo "GIT submodules checkout is out of date, and submodules"
>  echo "configured for validate only. Please run"
>  echo "  scripts/git-submodule.sh update $maybe_modules"
> @@ -51,42 +60,41 @@ check_updated() {
>  test "$CURSTATUS" = "$OLDSTATUS"
>  }
>  
> -if test -n "$maybe_modules" && ! test -e ".git"
> -then
> -echo "$0: unexpectedly called with submodules but no git checkout exists"
> -exit 1
> +if is_git; then
> +test -e $substat || touch $substat
> +modules=""
> +for m in $maybe_modules
> +do
> +$GIT submodule status $m 1> /dev/null 2>&1
> +if test $? = 0
> +then
> +modules="$modules $m"
> +grep $m $substat > /dev/null 2>&1 || $GIT submodule status $module >> $substat
> +else
> +echo "warn: ignoring non-existent submodule $m"

What is the rationale for ignoring non-existent submodules, i.e. how do the
arguments to the script go stale, as you say in the patch description?
I'm asking because the Fedora spec file initializes a new git repo in order
to apply patches, so the script exits with 0.
Nothing that cannot be worked around, of course.

> +fi
> +done
> +else
> +modules=$maybe_modules
>  fi
>  
> -if test -n "$maybe_modules" && test -z "$GIT"
> -then
> -echo "$0: unexpectedly called with submodules but git binary not found"
> -exit 1
> -fi
> -
> -modules=""
> -for m in $maybe_modules
> -do
> -$GIT submodule status $m 1> /dev/null 2>&1
> -if test $? = 0
> -then
> -modules="$modules $m"
> -else
> -echo "warn: ignoring non-existent submodule $m"
> -fi
> -done
> -
>  case "$command" in
>  status|validate)
> -test -f "$substat" || validate_error "$command"
> -test -z "$maybe_modules" && exit 0
>  for module in $modules; do
> -check_updated $module || validate_error "$command"
> +if is_git; then
> +check_updated $module || validate_error "$command"
> +elif ! test -d $module; then

archive-source.sh creates an empty directory for e.g. roms/SLOF,
so this check succeeds even if the submodule sources are unavailable.
Something like

elif ! test -d $module || test -z "$(ls -A "$module")"; then

works.

> +echo "$0: sources not available for $module and $no_git_error"
> +validate_error "$command"
> +fi
>  done
> -exit 0
>  ;;
> +
>  update)
> -test -e $substat || touch $substat
> -test -z "$maybe_modules" && exit 0
> +is_git || {
> +echo "$0: unexpectedly called with submodules but $no_git_error"
> +

Re: [PATCH v4 1/1] hw/arm/sbsa-ref: use XHCI to replace EHCI

2023-06-20 Thread Leif Lindholm


Hi Peter,

On 2023-06-19 13:47, Peter Maydell wrote:
> On Wed, 7 Jun 2023 at 03:34, Yuquan Wang  wrote:
>> The current sbsa-ref cannot use EHCI controller which is only
>> able to do 32-bit DMA, since sbsa-ref doesn't have RAM below 4GB.
>> Hence, this uses system bus XHCI to provide a usb controller with
>> 64-bit DMA capablity instead of EHCI.
>
> "capability"
>
>> Signed-off-by: Yuquan Wang 
>
> The change itself looks good. We could probably mention in
> the commit message that existing firmware/kernel images
> still work (with no USB support) with this change.
>
> Is this the sort of change that we should increase the
> machine-version-minor for ? The comment says "updated
> when features are added that don't break fw compatibility"
> and this sounds like one of those.
>
> Leif, do you think we should bump the minor version here?


I think that makes sense, yes.

/
Leif

[PATCH V3 1/4] util: strList_from_string

2023-06-20 Thread Steve Sistare

Generalize hmp_split_at_comma() to take any delimiter string, rename
as strList_from_string(), and move it to util/strList.c.

No functional change.

Signed-off-by: Steve Sistare 
---
 include/monitor/hmp.h  |  1 -
 include/qemu/strList.h | 24 
 monitor/hmp-cmds.c | 19 ---
 net/net-hmp-cmds.c |  3 ++-
 stats/stats-hmp-cmds.c |  3 ++-
 util/meson.build   |  1 +
 util/strList.c | 24 
 7 files changed, 53 insertions(+), 22 deletions(-)
 create mode 100644 include/qemu/strList.h
 create mode 100644 util/strList.c

diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 13f9a2d..2df661e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -19,7 +19,6 @@
 
 bool hmp_handle_error(Monitor *mon, Error *err);
 void hmp_help_cmd(Monitor *mon, const char *name);
-strList *hmp_split_at_comma(const char *str);
 
 void hmp_info_name(Monitor *mon, const QDict *qdict);
 void hmp_info_version(Monitor *mon, const QDict *qdict);
diff --git a/include/qemu/strList.h b/include/qemu/strList.h
new file mode 100644
index 000..1f4c11d
--- /dev/null
+++ b/include/qemu/strList.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2022, 2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_STR_LIST_H
+#define QEMU_STR_LIST_H
+
+#include "qapi/qapi-builtin-types.h"
+
+/*
+ * Break @in into a strList using the delimiter string @delim.
+ * The delimiter is not included in the result.
+ * Return NULL if @in is NULL or an empty string.
+ * A leading, trailing, or consecutive delimiter produces an
+ * empty string at that position in the output.
+ * All strings are g_strdup'd, and the result can be freed
+ * using qapi_free_strList.
+ */
+strList *strList_from_string(const char *in, const char *delim);
+
+#endif
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 6c559b4..1e833f9 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -39,25 +39,6 @@ bool hmp_handle_error(Monitor *mon, Error *err)
 return false;
 }
 
-/*
- * Split @str at comma.
- * A null @str defaults to "".
- */
-strList *hmp_split_at_comma(const char *str)
-{
-char **split = g_strsplit(str ?: "", ",", -1);
-strList *res = NULL;
-strList **tail = &res;
-int i;
-
-for (i = 0; split[i]; i++) {
-QAPI_LIST_APPEND(tail, split[i]);
-}
-
-g_free(split);
-return res;
-}
-
 void hmp_info_name(Monitor *mon, const QDict *qdict)
 {
 NameInfo *info;
diff --git a/net/net-hmp-cmds.c b/net/net-hmp-cmds.c
index 41d326b..e893801 100644
--- a/net/net-hmp-cmds.c
+++ b/net/net-hmp-cmds.c
@@ -26,6 +26,7 @@
 #include "qemu/config-file.h"
 #include "qemu/help_option.h"
 #include "qemu/option.h"
+#include "qemu/strList.h"
 
 void hmp_info_network(Monitor *mon, const QDict *qdict)
 {
@@ -72,7 +73,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
 migrate_announce_params());
 
 qapi_free_strList(params->interfaces);
-params->interfaces = hmp_split_at_comma(interfaces_str);
+params->interfaces = strList_from_string(interfaces_str, ",");
 params->has_interfaces = params->interfaces != NULL;
 params->id = g_strdup(id);
 qmp_announce_self(params, NULL);
diff --git a/stats/stats-hmp-cmds.c b/stats/stats-hmp-cmds.c
index 1f91bf8..428c0e6 100644
--- a/stats/stats-hmp-cmds.c
+++ b/stats/stats-hmp-cmds.c
@@ -10,6 +10,7 @@
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qemu/cutils.h"
+#include "qemu/strList.h"
 #include "hw/core/cpu.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/error.h"
@@ -176,7 +177,7 @@ static StatsFilter *stats_filter(StatsTarget target, const char *names,
 request->provider = provider_idx;
 if (names && !g_str_equal(names, "*")) {
 request->has_names = true;
-request->names = hmp_split_at_comma(names);
+request->names = strList_from_string(names, ",");
 }
 QAPI_LIST_PREPEND(request_list, request);
 }
diff --git a/util/meson.build b/util/meson.build
index 3a93071..960f233 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,4 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
+util_ss.add(files('strList.c'))
 util_ss.add(files('thread-context.c'), numa)
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
diff --git a/util/strList.c b/util/strList.c
new file mode 100644
index 000..217746e
--- /dev/null
+++ b/util/strList.c
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2023 Red Hat, Inc.
+ * Copyright (c) 2022, 2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include

[PATCH V3 4/4] util: strList unit tests

2023-06-20 Thread Steve Sistare

Signed-off-by: Steve Sistare 
Reviewed-by: Marc-André Lureau 
---
 tests/unit/meson.build|  1 +
 tests/unit/test-strList.c | 80 +++
 2 files changed, 81 insertions(+)
 create mode 100644 tests/unit/test-strList.c

diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 93977cc..972f2e6 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -34,6 +34,7 @@ tests = {
   'test-rcu-simpleq': [],
   'test-rcu-tailq': [],
   'test-rcu-slist': [],
+  'test-strList': [],
   'test-qdist': [],
   'test-qht': [],
   'test-qtree': [],
diff --git a/tests/unit/test-strList.c b/tests/unit/test-strList.c
new file mode 100644
index 000..56df52b
--- /dev/null
+++ b/tests/unit/test-strList.c
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2022, 2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/strList.h"
+
+static strList *make_list(int length)
+{
+strList *head = 0, *list, **prev = &head;
+
+while (length--) {
+list = *prev = g_new0(strList, 1);
+list->value = g_strdup("aaa");
+prev = &list->next;
+}
+return head;
+}
+
+static void test_length(void)
+{
+strList *list;
+int i;
+
+for (i = 0; i < 5; i++) {
+list = make_list(i);
+g_assert_cmpint(i, ==, QAPI_LIST_LENGTH(list));
+qapi_free_strList(list);
+}
+}
+
+struct {
+const char *string;
+const char *delim;
+const char *args[5];
+} list_data[] = {
+{ 0, ",", { 0 } },
+{ "", ",", { 0 } },
+{ "a", ",", { "a", 0 } },
+{ "a,b", ",", { "a", "b", 0 } },
+{ "a,b,c", ",", { "a", "b", "c", 0 } },
+{ "first last", " ", { "first", "last", 0 } },
+{ "a:", ":", { "a", "", 0 } },
+{ "a::b", ":", { "a", "", "b", 0 } },
+{ ":", ":", { "", "", 0 } },
+{ ":a", ":", { "", "a", 0 } },
+{ "::a", ":", { "", "", "a", 0 } },
+};
+
+static void test_strv(void)
+{
+int i, j;
+const char **expect;
+strList *list;
+GStrv args;
+
+for (i = 0; i < ARRAY_SIZE(list_data); i++) {
+expect = list_data[i].args;
+list = strList_from_string(list_data[i].string, list_data[i].delim);
+args = strv_from_strList(list);
+qapi_free_strList(list);
+for (j = 0; expect[j] && args[j]; j++) {
+g_assert_cmpstr(expect[j], ==, args[j]);
+}
+g_assert_null(expect[j]);
+g_assert_null(args[j]);
+g_strfreev(args);
+}
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+g_test_add_func("/test-string/length", test_length);
+g_test_add_func("/test-string/strv", test_strv);
+return g_test_run();
+}
-- 
1.8.3.1
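
The expectations in list_data above (an empty token for each leading, trailing, or consecutive delimiter, and no tokens at all for NULL or "") can be reproduced without GLib. The following is a minimal sketch under that assumption; the sketch_* names are hypothetical and this is not the strList_from_string implementation from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the tokens a delimiter split would yield; NULL or "" yields 0. */
static size_t sketch_token_count(const char *in, const char *delim)
{
    size_t dlen = delim ? strlen(delim) : 0;
    size_t n = 1;
    const char *p;

    if (in == NULL || *in == '\0' || dlen == 0) {
        return 0;
    }
    for (p = strstr(in, delim); p != NULL; p = strstr(p + dlen, delim)) {
        n++;
    }
    return n;
}

/* Copy token number idx into out; returns 0 on success, -1 otherwise. */
static int sketch_token_at(const char *in, const char *delim, size_t idx,
                           char *out, size_t outsz)
{
    size_t dlen = delim ? strlen(delim) : 0;
    const char *start = in;
    const char *end;
    size_t len;

    if (in == NULL || *in == '\0' || dlen == 0) {
        return -1;
    }
    while (idx--) {                     /* skip idx delimiters */
        start = strstr(start, delim);
        if (start == NULL) {
            return -1;
        }
        start += dlen;
    }
    end = strstr(start, delim);
    len = end ? (size_t)(end - start) : strlen(start);
    if (len >= outsz) {
        return -1;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Mirrors a few rows of the list_data table. */
static void sketch_split_selftest(void)
{
    char buf[16];

    assert(sketch_token_count(NULL, ",") == 0);
    assert(sketch_token_count("", ",") == 0);
    assert(sketch_token_count("a,b,c", ",") == 3);
    assert(sketch_token_count("a:", ":") == 2);
    assert(sketch_token_count("::a", ":") == 3);
    assert(sketch_token_at("a::b", ":", 1, buf, sizeof(buf)) == 0);
    assert(buf[0] == '\0');
    assert(sketch_token_at("a::b", ":", 2, buf, sizeof(buf)) == 0);
    assert(strcmp(buf, "b") == 0);
}
```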

[PATCH V3 3/4] util: strv_from_strList

2023-06-20 Thread Steve Sistare

Signed-off-by: Steve Sistare 
Reviewed-by: Marc-André Lureau 
---
 include/qemu/strList.h |  6 ++
 util/strList.c | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/include/qemu/strList.h b/include/qemu/strList.h
index 1f4c11d..629d76b 100644
--- a/include/qemu/strList.h
+++ b/include/qemu/strList.h
@@ -21,4 +21,10 @@
  */
 strList *strList_from_string(const char *in, const char *delim);
 
+/*
+ * Produce and return a NULL-terminated array of strings from @args.
+ * The result is g_malloc'd and all strings are g_strdup'd.
+ */
+GStrv strv_from_strList(const strList *args);
+
 #endif
diff --git a/util/strList.c b/util/strList.c
index 217746e..be40e02 100644
--- a/util/strList.c
+++ b/util/strList.c
@@ -22,3 +22,17 @@ strList *strList_from_string(const char *str, const char *delim)
 
 return res;
 }
+
+GStrv strv_from_strList(const strList *args)
+{
+const strList *arg;
+int i = 0;
+GStrv argv = g_new(char *, QAPI_LIST_LENGTH(args) + 1);
+
+for (arg = args; arg != NULL; arg = arg->next) {
+argv[i++] = g_strdup(arg->value);
+}
+argv[i] = NULL;
+
+return argv;
+}
-- 
1.8.3.1
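
For readers without the QAPI headers at hand, the conversion in this patch (walk a singly linked list and produce a NULL-terminated array of duplicated strings) can be sketched in plain C. SketchStrNode and the sketch_* functions are hypothetical stand-ins for strList and the GLib allocation helpers; unlike g_new, this version reports allocation failure by returning NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strList: 'next' first, then the value. */
typedef struct SketchStrNode {
    struct SketchStrNode *next;
    const char *value;
} SketchStrNode;

/* Free a NULL-terminated array produced by sketch_strv_from_list(). */
static void sketch_strv_free(char **argv)
{
    size_t i;

    if (argv == NULL) {
        return;
    }
    for (i = 0; argv[i] != NULL; i++) {
        free(argv[i]);
    }
    free(argv);
}

/* Build a NULL-terminated array of copies of the list's strings. */
static char **sketch_strv_from_list(const SketchStrNode *args)
{
    const SketchStrNode *arg;
    size_t n = 0, i = 0;
    char **argv;

    for (arg = args; arg != NULL; arg = arg->next) {
        n++;
    }
    argv = calloc(n + 1, sizeof(*argv));    /* zeroed, so always terminated */
    if (argv == NULL) {
        return NULL;
    }
    for (arg = args; arg != NULL; arg = arg->next) {
        size_t len = strlen(arg->value) + 1;

        argv[i] = malloc(len);
        if (argv[i] == NULL) {
            sketch_strv_free(argv);
            return NULL;
        }
        memcpy(argv[i], arg->value, len);
        i++;
    }
    return argv;
}

static void sketch_strv_selftest(void)
{
    SketchStrNode b = { NULL, "b" };
    SketchStrNode a = { &b, "a" };
    char **v = sketch_strv_from_list(&a);

    assert(v != NULL);
    assert(strcmp(v[0], "a") == 0);
    assert(strcmp(v[1], "b") == 0);
    assert(v[2] == NULL);
    sketch_strv_free(v);
}
```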

[PATCH V3 0/4] string list functions

2023-06-20 Thread Steve Sistare

Add some handy string list functions, for general use now, and for
eventual use in the cpr/live update patches.

Steve Sistare (4):
  util: strList_from_string
  qapi: QAPI_LIST_LENGTH
  util: strv_from_strList
  util: strList unit tests

 include/monitor/hmp.h |  1 -
 include/qapi/util.h   | 13 
 include/qemu/strList.h| 30 ++
 monitor/hmp-cmds.c| 19 ---
 net/net-hmp-cmds.c|  3 +-
 stats/stats-hmp-cmds.c|  3 +-
 tests/unit/meson.build|  1 +
 tests/unit/test-strList.c | 80 +++
 util/meson.build  |  1 +
 util/strList.c| 38 ++
 10 files changed, 167 insertions(+), 22 deletions(-)
 create mode 100644 include/qemu/strList.h
 create mode 100644 tests/unit/test-strList.c
 create mode 100644 util/strList.c

-- 
1.8.3.1

[PATCH V3 2/4] qapi: QAPI_LIST_LENGTH

2023-06-20 Thread Steve Sistare

Signed-off-by: Steve Sistare 
Reviewed-by: Marc-André Lureau 
---
 include/qapi/util.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/qapi/util.h b/include/qapi/util.h
index 81a2b13..e1b8b1d 100644
--- a/include/qapi/util.h
+++ b/include/qapi/util.h
@@ -56,4 +56,17 @@ int parse_qapi_name(const char *name, bool complete);
 (tail) = &(*(tail))->next; \
 } while (0)
 
+/*
+ * For any GenericList @list, return its length.
+ */
+#define QAPI_LIST_LENGTH(list) \
+({ \
+int len = 0; \
+typeof(list) elem; \
+for (elem = list; elem != NULL; elem = elem->next) { \
+len++; \
+} \
+len; \
+})
+
 #endif
-- 
1.8.3.1

Re: 'make check-tcg' fails with an assert in qemu_plugin_vcpu_init_hook

2023-06-20 Thread Richard Henderson


On 6/20/23 18:56, Peter Maydell wrote:

> ERROR:../../accel/tcg/cpu-exec.c:1024:cpu_exec_setjmp: assertion
> failed: (cpu == current_cpu)

...

> The assertion in cpu-exec.c is interesting too and may or
> may not be relevant.


FWIW, the last time I saw this the stack had been clobbered and the saved value of "cpu" 
was garbage.  There is very very little in cpu_exec_setjmp() by design.



r~

'make check-tcg' fails with an assert in qemu_plugin_vcpu_init_hook

2023-06-20 Thread Peter Maydell

$ make -C build/x86 check-tcg
make: Entering directory '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86'
[...]
  TEST    munmap-pthread on arm
**
ERROR:../../plugins/core.c:221:qemu_plugin_vcpu_init_hook: assertion
failed: (success)
**
ERROR:../../accel/tcg/cpu-exec.c:1024:cpu_exec_setjmp: assertion
failed: (cpu == current_cpu)

Here's the backtrace:

#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140737332028096) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737332028096) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737332028096,
signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x76fc1476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x76fa77f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x77497b57 in g_assertion_message (domain=,
file=, line=,
func=0x55800d50 <__func__.3> "qemu_plugin_vcpu_init_hook",
message=) at ../../../glib/gtestutils.c:3253
#6  0x774f170f in g_assertion_message_expr (domain=0x0,
file=0x55800ccf "../../plugins/core.c", line=221,
func=0x55800d50 <__func__.3> "qemu_plugin_vcpu_init_hook",
expr=) at ../../../glib/gtestutils.c:3279
#7  0x556e5747 in qemu_plugin_vcpu_init_hook
(cpu=0x559d0450) at ../../plugins/core.c:221
#8  0x556a9cc3 in cpu_exec_realizefn (cpu=0x559d0450,
errp=0x7fffc630) at ../../cpu.c:153
#9  0x555a44ef in arm_cpu_realizefn (dev=0x559d0450,
errp=0x7fffc780) at ../../target/arm/cpu.c:1673
#10 0x5572ef2e in device_set_realized (obj=0x559d0450,
value=true, errp=0x7fffc9b8) at ../../hw/core/qdev.c:510
#11 0x5573931b in property_set_bool (obj=0x559d0450,
v=0x559c0d40, name=0x5580ef41 "realized",
opaque=0x55924870,
errp=0x7fffc9b8) at ../../qom/object.c:2285
#12 0x55737212 in object_property_set (obj=0x559d0450,
name=0x5580ef41 "realized", v=0x559c0d40, errp=0x7fffc9b8)
at ../../qom/object.c:1420
#13 0x5573b861 in object_property_set_qobject
(obj=0x559d0450, name=0x5580ef41 "realized",
value=0x5592bc90, errp=0x7fffc9b8)
at ../../qom/qom-qobject.c:28
#14 0x55737591 in object_property_set_bool
(obj=0x559d0450, name=0x5580ef41 "realized", value=true,
errp=0x7fffc9b8)
at ../../qom/object.c:1489
#15 0x5572e6bc in qdev_realize (dev=0x559d0450, bus=0x0,
errp=0x7fffc9b8) at ../../hw/core/qdev.c:292
#16 0x5559a65c in cpu_create (typename=0x5591c5c0
"max-arm-cpu") at ../../hw/core/cpu-common.c:61
#17 0x556f1712 in cpu_copy (env=0x55953d80) at
../../linux-user/main.c:231
#18 0x55711c4e in do_fork (env=0x55953d80, flags=4001536,
newsp=1073734008, parent_tidptr=1073735528, newtls=1073736832,
child_tidptr=1073735528) at ../../linux-user/syscall.c:6672
#19 0x5571cdea in do_syscall1 (cpu_env=0x55953d80,
num=120, arg1=4001536, arg2=1073734008, arg3=1073735528,
arg4=1073736832,
arg5=1073735528, arg6=1082129932, arg7=0, arg8=0) at
../../linux-user/syscall.c:10869
#20 0x557243f2 in do_syscall (cpu_env=0x55953d80, num=120,
arg1=4001536, arg2=1073734008, arg3=1073735528, arg4=1073736832,
arg5=1073735528, arg6=1082129932, arg7=0, arg8=0) at
../../linux-user/syscall.c:13610
#21 0x555a1616 in cpu_loop (env=0x55953d80) at
../../linux-user/arm/cpu_loop.c:434
#22 0x556f2ee0 in main (argc=2, argv=0x7fffde68,
envp=0x7fffde80) at ../../linux-user/main.c:973

AFAICT this is happening because we try to insert an entry
into the plugin.cpu_ht hash table whose key is cpu->cpu_index.
But in this "new thread" codepath, the new thread's
cpu_index is 0, which is the same as the old thread's
cpu_index...

The assertion in cpu-exec.c is interesting too and may or
may not be relevant.

thanks
-- PMM
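
A minimal sketch of the failure mode suggested above, assuming (as the analysis implies) that the insertion reports failure when the key is already present. The SketchCpuSet type is a hypothetical stand-in for the plugin hash table, not the real data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CPUS 8

/* Hypothetical stand-in for a table keyed by cpu_index. */
typedef struct SketchCpuSet {
    int keys[SKETCH_MAX_CPUS];
    size_t count;
} SketchCpuSet;

/* Returns true only if cpu_index was absent and has now been added. */
static bool sketch_cpu_set_insert(SketchCpuSet *set, int cpu_index)
{
    size_t i;

    for (i = 0; i < set->count; i++) {
        if (set->keys[i] == cpu_index) {
            return false;               /* duplicate key */
        }
    }
    if (set->count == SKETCH_MAX_CPUS) {
        return false;                   /* table full */
    }
    set->keys[set->count++] = cpu_index;
    return true;
}

static void sketch_cpu_set_selftest(void)
{
    SketchCpuSet set = { { 0 }, 0 };

    assert(sketch_cpu_set_insert(&set, 0));   /* first thread, index 0 */
    assert(!sketch_cpu_set_insert(&set, 0));  /* new thread reusing index 0 */
    assert(sketch_cpu_set_insert(&set, 1));   /* a distinct index is fine */
}
```

An assertion on the return value of the second insert would fire in the same way as the assert(success) in the report.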

Re: [PATCH 1/4] target/ppc: Fix instruction loading endianness in alignment interrupt

2023-06-20 Thread Nicholas Piggin

On Wed Jun 21, 2023 at 12:26 AM AEST, BALATON Zoltan wrote:
> On Tue, 20 Jun 2023, Nicholas Piggin wrote:
> > powerpc ifetch endianness depends on MSR[LE] so it has to byteswap
> > after cpu_ldl_code(). This corrects DSISR bits in alignment
> > interrupts when running in little endian mode.
> >
> > Reviewed-by: Fabiano Rosas 
> > Signed-off-by: Nicholas Piggin 
> > ---
> > target/ppc/excp_helper.c | 22 +-
> > 1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > index 12d8a7257b..a2801f6e6b 100644
> > --- a/target/ppc/excp_helper.c
> > +++ b/target/ppc/excp_helper.c
> > @@ -133,6 +133,26 @@ static void dump_hcall(CPUPPCState *env)
> >   env->nip);
> > }
> >
> > +#ifdef CONFIG_TCG
> > +/* Return true iff byteswap is needed to load instruction */
> > +static inline bool insn_need_byteswap(CPUArchState *env)
> > +{
> > +/* SYSTEM builds TARGET_BIG_ENDIAN. Need to swap when MSR[LE] is set */
> > +return !!(env->msr & ((target_ulong)1 << MSR_LE));
> > +}
>
> Don't other places typically use FIELD_EX64 to test for msr bits now? If 

Yeah I should use that, good point. There's at least another case in
that file that doesn't use it but I probably added that too :/

> this really only tests for the LE bit and used only once do we need a new 
> function for that? I don't quite like trivial one line functions unless it 
> does something more complex Because if just makes code harder to read as I 
> have to look up what these do when I could just see it right away where it 
> used without these functions.

It's based on mem_helper.c, which is a familiar pattern/name so I
might keep it. Maybe not, I'll check. I'm on the fence.

> > +
> > +static uint32_t ppc_ldl_code(CPUArchState *env, hwaddr addr)
> > +{
> > +uint32_t insn = cpu_ldl_code(env, addr);
> > +
> > +if (insn_need_byteswap(env)) {
> > +insn = bswap32(insn);
> > +}
> > +
> > +return insn;
> > +}
> > +#endif
>
> Along the same lines I'm not sure this wrapper is needed unless this is a 
> recurring operation. Otherwise you could just add the if and the comment 
> below at the single place where this is needed. If this will be needed at 
> more places later then adding a function may make sense but otherwise I'd 
> avoid making code tangled with single line functions defined away from 
> where they are used as it's simpler to just have the if and swap at the 
> single place where it's needed than adding two new functions that I'd had 
> to look up and comprehend first to see what's happening. (It also would be 
> just 3 lines instead of 20 that way.)

It does get used in a couple more places later. Few-line
"abstraction" used once isn't necessarily wrong though.

Thanks,
Nick
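
For reference, the swap-on-MSR[LE] idea discussed in this thread can be sketched in standalone C. The functions below are hypothetical: sketch_load_be32 merely stands in for an instruction fetch on a big-endian target build, and msr_le is a plain flag rather than a real MSR field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable 32-bit byte swap. */
static uint32_t sketch_bswap32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* Read four bytes as a big-endian word, as a big-endian target would. */
static uint32_t sketch_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  | (uint32_t)p[3];
}

/* Fetch an instruction word, swapping it when the CPU runs little-endian. */
static uint32_t sketch_load_insn(const uint8_t *p, bool msr_le)
{
    uint32_t insn = sketch_load_be32(p);

    if (msr_le) {
        insn = sketch_bswap32(insn);
    }
    return insn;
}

static void sketch_insn_selftest(void)
{
    /* The word 0x12345678 as stored in memory by a little-endian guest. */
    const uint8_t mem[4] = { 0x78, 0x56, 0x34, 0x12 };

    assert(sketch_load_insn(mem, true) == 0x12345678u);
    assert(sketch_load_insn(mem, false) == 0x78563412u);
}
```

Without the swap, the opcode bits decoded from the word would be taken from the wrong bytes, which matches the incorrect DSISR bits the patch description mentions.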

Re: [PATCH] pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym

2023-06-20 Thread Philippe Mathieu-Daudé


On 20/6/23 18:20, Peter Maydell wrote:

> The xkb official name for the Arabic keyboard layout is 'ara'.
> However xkb has for at least the past 15 years also permitted it to
> be named via the legacy synonym 'ar'.  In xkeyboard-config 2.39 this
> synonym was removed, which breaks compilation of QEMU:
>
> FAILED: pc-bios/keymaps/ar
> /home/fred/qemu-git/src/qemu/build-full/qemu-keymap -f pc-bios/keymaps/ar -l ar
> xkbcommon: ERROR: Couldn't find file "symbols/ar" in include paths
> xkbcommon: ERROR: 1 include paths searched:
> xkbcommon: ERROR:   /usr/share/X11/xkb
> xkbcommon: ERROR: 3 include paths could not be added:
> xkbcommon: ERROR:   /home/fred/.config/xkb
> xkbcommon: ERROR:   /home/fred/.xkb
> xkbcommon: ERROR:   /etc/xkb
> xkbcommon: ERROR: Abandoning symbols file "(unnamed)"
> xkbcommon: ERROR: Failed to compile xkb_symbols
> xkbcommon: ERROR: Failed to compile keymap
>
> The upstream xkeyboard-config change removing the compat
> mapping is:
> https://gitlab.freedesktop.org/xkeyboard-config/xkeyboard-config/-/commit/470ad2cd8fea84d7210377161d86b31999bb5ea6
>
> Make QEMU always ask for the 'ara' xkb layout, which should work on
> both older and newer xkeyboard-config.  We leave the QEMU name for
> this keyboard layout as 'ar'; it is not the only one where our name
> for it deviates from the xkb standard name.
>
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Peter Maydell 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1709
> ---
>   pc-bios/keymaps/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH 1/3] exec/memory: Add symbolic value for memory listener priority for accel

2023-06-20 Thread Isaku Yamahata

Add MEMORY_LISTENER_PRIORITY_ACCEL as the symbolic value for the memory
listener priority, replacing the hard-coded value 10 for accel.

No functional change intended.

Signed-off-by: Isaku Yamahata 
---
 accel/hvf/hvf-accel-ops.c   | 2 +-
 accel/kvm/kvm-all.c | 2 +-
 hw/arm/xen_arm.c| 2 +-
 hw/i386/xen/xen-hvm.c   | 2 +-
 hw/xen/xen-hvm-common.c | 2 +-
 hw/xen/xen_pt.c | 4 ++--
 include/exec/memory.h   | 2 ++
 target/i386/hax/hax-mem.c   | 2 +-
 target/i386/nvmm/nvmm-all.c | 2 +-
 target/i386/whpx/whpx-all.c | 2 +-
 10 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 9c3da03c948f..c0c51841a615 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -304,7 +304,7 @@ static void hvf_region_del(MemoryListener *listener,
 
 static MemoryListener hvf_memory_listener = {
 .name = "hvf",
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 .region_add = hvf_region_add,
 .region_del = hvf_region_del,
 .log_start = hvf_log_start,
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7679f397aec0..36ed4ca246b5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1775,7 +1775,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
 kml->listener.commit = kvm_region_commit;
 kml->listener.log_start = kvm_log_start;
 kml->listener.log_stop = kvm_log_stop;
-kml->listener.priority = 10;
+kml->listener.priority = MEMORY_LISTENER_PRIORITY_ACCEL;
 kml->listener.name = name;
 
 if (s->kvm_dirty_ring_size) {
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index 19b1cb81ade9..044093fec75d 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -45,7 +45,7 @@ static MemoryListener xen_memory_listener = {
 .log_sync = NULL,
 .log_global_start = NULL,
 .log_global_stop = NULL,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 struct XenArmState {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 5dc5e805351c..3da5a2b23f7d 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -467,7 +467,7 @@ static MemoryListener xen_memory_listener = {
 .log_sync = xen_log_sync,
 .log_global_start = xen_log_global_start,
 .log_global_stop = xen_log_global_stop,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 static void regs_to_cpu(vmware_regs_t *vmport_regs, ioreq_t *req)
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 42339c96bdba..886c3ee944d3 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -155,7 +155,7 @@ MemoryListener xen_io_listener = {
 .name = "xen-io",
 .region_add = xen_io_add,
 .region_del = xen_io_del,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 DeviceListener xen_device_listener = {
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index a5401496399b..36e6f93c372f 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -691,14 +691,14 @@ static const MemoryListener xen_pt_memory_listener = {
 .name = "xen-pt-mem",
 .region_add = xen_pt_region_add,
 .region_del = xen_pt_region_del,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 static const MemoryListener xen_pt_io_listener = {
 .name = "xen-pt-io",
 .region_add = xen_pt_io_region_add,
 .region_del = xen_pt_io_region_del,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 /* destroy. */
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 47c2e0221c35..6d95d5917544 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -811,6 +811,8 @@ struct IOMMUMemoryRegion {
 #define IOMMU_NOTIFIER_FOREACH(n, mr) \
 QLIST_FOREACH((n), &(mr)->iommu_notify, node)
 
+#define MEMORY_LISTENER_PRIORITY_ACCEL  10
+
 /**
  * struct MemoryListener: callbacks structure for updates to the physical memory map
  *
diff --git a/target/i386/hax/hax-mem.c b/target/i386/hax/hax-mem.c
index 05dbe8cce3ae..bb5ffbc9ac4f 100644
--- a/target/i386/hax/hax-mem.c
+++ b/target/i386/hax/hax-mem.c
@@ -291,7 +291,7 @@ static MemoryListener hax_memory_listener = {
 .region_add = hax_region_add,
 .region_del = hax_region_del,
 .log_sync = hax_log_sync,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 static void hax_ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
index b75738ee9cdf..19d2f7ef09a6 100644
--- a/target/i386/nvmm/nvmm-all.c
+++ b/target/i386/nvmm/nvmm-all.c
@@ -1138,7 +1138,7 @@ static MemoryListener nvmm_memory_listener = {
 .region_add = nvmm_region_add,
 .region_del = nvmm_region_del,
 .log_sync = nvmm_log_sync,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
 };
 
 static void
diff --git a/target/i386/whpx/whpx-all.c

[PATCH 2/3] exec/memory: Add symbol for memory listener priority for dev backend

2023-06-20 Thread Isaku Yamahata

Add MEMORY_LISTENER_PRIORITY_DEV_BAKCNED as the symbolic value for the
memory listener priority, replacing the hard-coded value 10 for the device
backend.

No functional change intended.

Signed-off-by: Isaku Yamahata 
---
 accel/kvm/kvm-all.c   | 2 +-
 hw/remote/proxy-memory-listener.c | 2 +-
 hw/virtio/vhost.c | 2 +-
 include/exec/memory.h | 1 +
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 36ed4ca246b5..ae6ecf8326d1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1800,7 +1800,7 @@ static MemoryListener kvm_io_listener = {
 .name = "kvm-io",
 .eventfd_add = kvm_io_ioeventfd_add,
 .eventfd_del = kvm_io_ioeventfd_del,
-.priority = 10,
+.priority = MEMORY_LISTENER_PRIORITY_DEV_BAKCNED,
 };
 
 int kvm_set_irq(KVMState *s, int irq, int level)
diff --git a/hw/remote/proxy-memory-listener.c b/hw/remote/proxy-memory-listener.c
index 18d96a1d04dc..a7f53a0ba464 100644
--- a/hw/remote/proxy-memory-listener.c
+++ b/hw/remote/proxy-memory-listener.c
@@ -217,7 +217,7 @@ void proxy_memory_listener_configure(ProxyMemoryListener *proxy_listener,
 proxy_listener->listener.commit = proxy_memory_listener_commit;
 proxy_listener->listener.region_add = proxy_memory_listener_region_addnop;
 proxy_listener->listener.region_nop = proxy_memory_listener_region_addnop;
-proxy_listener->listener.priority = 10;
+proxy_listener->listener.priority = MEMORY_LISTENER_PRIORITY_DEV_BAKCNED;
 proxy_listener->listener.name = "proxy";
 
 memory_listener_register(&proxy_listener->listener,
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 23da579ce290..75f7418369cb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1445,7 +1445,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 .log_sync = vhost_log_sync,
 .log_global_start = vhost_log_global_start,
 .log_global_stop = vhost_log_global_stop,
-.priority = 10
+.priority = MEMORY_LISTENER_PRIORITY_DEV_BAKCNED
 };
 
 hdev->iommu_listener = (MemoryListener) {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6d95d5917544..5c9e04bf1208 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -812,6 +812,7 @@ struct IOMMUMemoryRegion {
 QLIST_FOREACH((n), &(mr)->iommu_notify, node)
 
 #define MEMORY_LISTENER_PRIORITY_ACCEL  10
+#define MEMORY_LISTENER_PRIORITY_DEV_BAKCNED 10
 
 /**
  * struct MemoryListener: callbacks structure for updates to the physical memory map
-- 
2.25.1

[PATCH 3/3] exec/memory: Add symbol for the min value of memory listener priority

2023-06-20 Thread Isaku Yamahata

Add MEMORY_LISTENER_PRIORITY_MIN as a symbolic value for the minimum memory
listener priority, replacing the implicit magic value 0.  Initialize the
priority explicitly in the listeners that previously relied on the default.

No functional change intended.
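
To illustrate why the explicit initialization is not a functional change: in
C, members omitted from a designated initializer are zero-initialized, so a
listener without .priority already had priority 0.  A minimal sketch, assuming
a simplified stand-in struct (ExampleListener, ex_implicit and ex_explicit are
hypothetical names, not QEMU code):

```c
#include <assert.h>

/* Value as introduced by this patch in include/exec/memory.h. */
#define MEMORY_LISTENER_PRIORITY_MIN 0

/* Hypothetical, simplified stand-in for struct MemoryListener. */
typedef struct ExampleListener {
    const char *name;
    unsigned priority;
} ExampleListener;

/* Before the patch: .priority is omitted and therefore zero-initialized. */
static const ExampleListener ex_implicit = {
    .name = "kvm-coalesced-pio",
};

/* After the patch: the same value, but spelled out with the symbol. */
static const ExampleListener ex_explicit = {
    .name = "kvm-coalesced-pio",
    .priority = MEMORY_LISTENER_PRIORITY_MIN,
};
```

Both objects end up with the same priority; the explicit form only makes the
intent visible to the reader.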

Signed-off-by: Isaku Yamahata 
---
 accel/kvm/kvm-all.c   | 1 +
 include/exec/memory.h | 1 +
 target/arm/kvm.c  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ae6ecf8326d1..026859a59cd7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1103,6 +1103,7 @@ static MemoryListener kvm_coalesced_pio_listener = {
 .name = "kvm-coalesced-pio",
 .coalesced_io_add = kvm_coalesce_pio_add,
 .coalesced_io_del = kvm_coalesce_pio_del,
+.priority = MEMORY_LISTENER_PRIORITY_MIN,
 };
 
 int kvm_check_extension(KVMState *s, unsigned int extension)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5c9e04bf1208..dc6daa8364e5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -811,6 +811,7 @@ struct IOMMUMemoryRegion {
 #define IOMMU_NOTIFIER_FOREACH(n, mr) \
 QLIST_FOREACH((n), &(mr)->iommu_notify, node)
 
+#define MEMORY_LISTENER_PRIORITY_MIN 0
 #define MEMORY_LISTENER_PRIORITY_ACCEL  10
 #define MEMORY_LISTENER_PRIORITY_DEV_BAKCNED 10
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 84da49332c4b..14fbf786897d 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -341,6 +341,7 @@ static MemoryListener devlistener = {
 .name = "kvm-arm",
 .region_add = kvm_arm_devlistener_add,
 .region_del = kvm_arm_devlistener_del,
+.priority = MEMORY_LISTENER_PRIORITY_MIN,
 };
 
 static void kvm_arm_set_device_addr(KVMDevice *kd)
-- 
2.25.1

[PATCH 0/3] Add symbols for memory listener priority

2023-06-20 Thread Isaku Yamahata

The hard-coded value 10 is used as the priority of several memory listeners
passed to memory_listener_register().  Add symbolic values for the priority
field of struct MemoryListener and replace the hard-coded values with them.

The background is KVM guest memory [1] or TDX support.  I'd like to add one
more memory listener whose priority is higher than that of the KVM memory
listener, and I don't want to hard-code 10 + 1.

[1] KVM gmem patches
https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo
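
As a rough sketch of the motivation above: with symbols in place, a new
priority can be expressed relative to the accelerator's instead of as a bare
number.  The code below is an assumption-laden illustration, not part of this
series: ExampleListener, EXAMPLE_PRIORITY_ABOVE_ACCEL, the listener names and
the sort helper are all hypothetical, and the ascending ordering reflects my
reading of the @priority documentation in memory.h (lower priorities are
invoked earlier for "add" callbacks).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Values as they appear in this series (include/exec/memory.h). */
#define MEMORY_LISTENER_PRIORITY_MIN   0
#define MEMORY_LISTENER_PRIORITY_ACCEL 10

/* Hypothetical symbol for a listener ranked just above the accelerator. */
#define EXAMPLE_PRIORITY_ABOVE_ACCEL (MEMORY_LISTENER_PRIORITY_ACCEL + 1)

/* Hypothetical, simplified stand-in for struct MemoryListener. */
typedef struct ExampleListener {
    const char *name;
    unsigned priority;
} ExampleListener;

/* qsort() comparator: ascending by priority. */
static int example_cmp_priority(const void *a, const void *b)
{
    const ExampleListener *la = a;
    const ExampleListener *lb = b;

    return (la->priority > lb->priority) - (la->priority < lb->priority);
}

/* Order listeners the way "add" callbacks are assumed to be invoked. */
static void example_sort_listeners(ExampleListener *l, size_t n)
{
    qsort(l, n, sizeof(*l), example_cmp_priority);
}
```

Sorting { "example-new", "kvm-memory", "kvm-arm" } with priorities
EXAMPLE_PRIORITY_ABOVE_ACCEL, MEMORY_LISTENER_PRIORITY_ACCEL and
MEMORY_LISTENER_PRIORITY_MIN respectively places "kvm-arm" first and
"example-new" last, without any bare 10 or 11 at the call sites.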

Isaku Yamahata (3):
  exec/memory: Add symbolic value for memory listener priority for accel
  exec/memory: Add symbol for memory listener priority for dev backend
  exec/memory: Add symbol for the min value of memory listener priority

 accel/hvf/hvf-accel-ops.c | 2 +-
 accel/kvm/kvm-all.c   | 5 +++--
 hw/arm/xen_arm.c  | 2 +-
 hw/i386/xen/xen-hvm.c | 2 +-
 hw/remote/proxy-memory-listener.c | 2 +-
 hw/virtio/vhost.c | 2 +-
 hw/xen/xen-hvm-common.c   | 2 +-
 hw/xen/xen_pt.c   | 4 ++--
 include/exec/memory.h | 4 
 target/arm/kvm.c  | 1 +
 target/i386/hax/hax-mem.c | 2 +-
 target/i386/nvmm/nvmm-all.c   | 2 +-
 target/i386/whpx/whpx-all.c   | 2 +-
 13 files changed, 19 insertions(+), 13 deletions(-)


base-commit: cab35c73be9d579db105ef73fa8a60728a890098
-- 
2.25.1

Re: [PATCH 2/2] configs: Enable MTTCG for sparc, sparc64

2023-06-20 Thread Philippe Mathieu-Daudé


On 20/6/23 18:40, Richard Henderson wrote:
> This will be of small comfort to sparc64, because both
> sun4u and sun4v board models force max_cpus = 1.
> But it does enable actual smp for sparc32 sun4m.

Yay \o/

Reviewed-by: Philippe Mathieu-Daudé

> Signed-off-by: Richard Henderson
> ---
>   configs/targets/sparc-softmmu.mak   | 1 +
>   configs/targets/sparc64-softmmu.mak | 1 +
>   2 files changed, 2 insertions(+)

Re: [PATCH 1/2] target/sparc: Set TCG_GUEST_DEFAULT_MO

2023-06-20 Thread Philippe Mathieu-Daudé


On 20/6/23 18:40, Richard Henderson wrote:
> Always use TSO, per the Oracle 2015 manual.
> This is slightly less restrictive than the TCG_MO_ALL default,
> and happens to match the i386 model, which will eliminate a few
> extra barriers on that host.
>
> Signed-off-by: Richard Henderson
> ---
>   target/sparc/cpu.h | 23 +++
>   1 file changed, 23 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé
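
For readers unfamiliar with the memory-order masks, here is a small sketch of
why a TSO guest needs no extra barriers on a TSO host.  It assumes TSO is
expressed as "everything except store-then-load ordering" and that required
barriers are the guest's bits not already guaranteed by the host.  The EX_*
names and ex_required_barriers() are local stand-ins written for this note,
not QEMU's identifiers.

```c
#include <assert.h>

/* Local stand-ins for the TCG memory-order bits (assumed layout). */
enum {
    EX_MO_LD_LD = 0x01, /* earlier load ordered before later load   */
    EX_MO_ST_LD = 0x02, /* earlier store ordered before later load  */
    EX_MO_LD_ST = 0x04, /* earlier load ordered before later store  */
    EX_MO_ST_ST = 0x08, /* earlier store ordered before later store */
    EX_MO_ALL   = 0x0F,
};

/* TSO: all orderings are kept except store followed by load. */
#define EX_GUEST_MO_TSO (EX_MO_ALL & ~EX_MO_ST_LD)

/* Orderings the guest expects that the host does not already provide. */
static unsigned ex_required_barriers(unsigned guest_mo, unsigned host_mo)
{
    return guest_mo & ~host_mo;
}
```

Under these assumptions, a TSO guest on a TSO host requires no additional
ordering, whereas a guest left at the EX_MO_ALL default would still need the
store-then-load barrier on the same host.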
