date:20230905

Re: [PATCH v5 3/4] vhost-user: add shared_object msg

2023-09-05 Thread Albert Esteve

On Wed, Sep 6, 2023 at 8:10 AM Philippe Mathieu-Daudé 
wrote:

> On 6/9/23 08:04, Philippe Mathieu-Daudé wrote:
> > On 2/8/23 11:08, Albert Esteve wrote:
> >> Add three new vhost-user protocol
> >> `VHOST_USER_BACKEND_SHARED_OBJECT_*` messages.
> >> These new messages are sent from vhost-user
> >> back-ends to interact with the virtio-dmabuf
> >> table in order to add or remove themselves as
> >> virtio exporters, or lookup for virtio dma-buf
> >> shared objects.
> >>
> >> The action taken in the front-end depends
> >> on the type stored in the virtio shared
> >> object hash table.
> >>
> >> When the table holds a pointer to a vhost
> >> backend for a given UUID, the front-end sends
> >> a VHOST_USER_GET_SHARED_OBJECT to the
> >> backend holding the shared object.
> >>
> >> In the libvhost-user library we need to add
> >> helper functions to allow sending messages to
> >> interact with the virtio shared objects
> >> hash table.
> >>
> >> The messages can only be sent after successfully
> >> negotiating a new VHOST_USER_PROTOCOL_F_SHARED_OBJECT
> >> vhost-user protocol feature bit.
> >>
> >> Signed-off-by: Albert Esteve 
> >> ---
> >>   docs/interop/vhost-user.rst   |  57 
> >>   hw/virtio/vhost-user.c| 166 ++
> >>   include/hw/virtio/vhost-backend.h |   3 +
> >>   subprojects/libvhost-user/libvhost-user.c | 118 +++
> >>   subprojects/libvhost-user/libvhost-user.h |  55 ++-
> >>   5 files changed, 398 insertions(+), 1 deletion(-)
> >
> >
> >> +static bool
> >> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader
> *hdr,
> >> +  VhostUserPayload *payload)
> >> +{
> >> +Error *local_err = NULL;
> >> +struct iovec iov[2];
> >> +
> >> +if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> >> +hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
> >> +}
> >> +hdr->flags |= VHOST_USER_REPLY_MASK;
> >> +
> >> +hdr->size = sizeof(payload->u64);
> >> +
> >> +iov[0].iov_base = hdr;
> >> +iov[0].iov_len = VHOST_USER_HDR_SIZE;
> >> +iov[1].iov_base = payload;
> >> +iov[1].iov_len = hdr->size;
> >> +
> >> +if (qio_channel_writev_all(ioc, iov, ARRAY_SIZE(iov), &local_err))
> {
> >> +error_report_err(local_err);
> >
> > This function could have a 'Error **errp' parameter to propagate
> > the error to the caller.
> >
> >> +return false;
> >> +}
> >> +return true;
> >> +}
> >> +
> >> +static bool
> >> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader
> *hdr,
> >> +  VhostUserPayload *payload)
> >> +{
> >> +hdr->size = sizeof(payload->u64);
> >> +return vhost_user_send_resp(ioc, hdr, payload);
> >> +}
> >
> > I'm confused by having two vhost_user_backend_send_dmabuf_fd() functions
> > with different body...
>
> This patch doesn't compile:
>
> ../../hw/virtio/vhost-user.c:1662:1: error: redefinition of
> ‘vhost_user_backend_send_dmabuf_fd’
>   1662 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc,
> VhostUserHeader *hdr,
>| ^
> ../../hw/virtio/vhost-user.c:1636:1: note: previous definition of
> ‘vhost_user_backend_send_dmabuf_fd’ with type ‘_Bool(QIOChannel *,
> VhostUserHeader *, VhostUserPayload *)’
>   1636 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc,
> VhostUserHeader *hdr,
>| ^
> ../../hw/virtio/vhost-user.c: In function
> ‘vhost_user_backend_send_dmabuf_fd’:
> ../../hw/virtio/vhost-user.c:1666:12: error: implicit declaration of
> function ‘vhost_user_send_resp’; did you mean ‘vhost_user_set_u64’?
> [-Werror=implicit-function-declaration]
>   1666 | return vhost_user_send_resp(ioc, hdr, payload);
>|^~~~
>|vhost_user_set_u64
> ../../hw/virtio/vhost-user.c:1666:12: error: nested extern declaration
> of ‘vhost_user_send_resp’ [-Werror=nested-externs]
> At top level:
> ../../hw/virtio/vhost-user.c:1636:1: error:
> ‘vhost_user_backend_send_dmabuf_fd’ defined but not used
> [-Werror=unused-function]
>   1636 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc,
> VhostUserHeader *hdr,
>| ^
> cc1: all warnings being treated as errors
>
>
Uh, nice catch.
This was not happening before, but I did not try the patches individually
for the last few reviews.
I will squash it with the next patch, as suggested.
Thanks for checking!
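
The two review points in this thread (propagating the write error to the caller instead of reporting it in place, and sharing a single reply helper) can be illustrated outside QEMU. What follows is a minimal standalone sketch under stated assumptions, not the code from the patch: `DemoHeader`, `DemoWriteFn`, `demo_send_resp` and the string-based error slot are hypothetical stand-ins for `VhostUserHeader`, `QIOChannel` and QEMU's `Error **errp`, and the flag bit positions are assumed for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_REPLY_MASK      (1u << 2)
#define DEMO_NEED_REPLY_MASK (1u << 3)

/* Hypothetical stand-in for the message header. */
typedef struct {
    uint32_t request;
    uint32_t flags;
    uint32_t size;   /* payload size in bytes */
} DemoHeader;

/* Hypothetical transport callback: returns 0 on success, -1 on failure. */
typedef int (*DemoWriteFn)(void *opaque, const void *buf, size_t len);

/*
 * Shared reply helper: marks the header as a reply, writes header and
 * payload, and hands any failure back to the caller through errp
 * instead of reporting it here.
 */
static bool demo_send_resp(DemoWriteFn write_fn, void *opaque,
                           DemoHeader *hdr, const void *payload,
                           uint32_t payload_size, const char **errp)
{
    assert(write_fn && hdr && payload);

    /* Clearing the bit unconditionally is equivalent to test-then-clear. */
    hdr->flags &= ~DEMO_NEED_REPLY_MASK;
    hdr->flags |= DEMO_REPLY_MASK;
    hdr->size = payload_size;

    if (write_fn(opaque, hdr, sizeof(*hdr)) < 0 ||
        write_fn(opaque, payload, hdr->size) < 0) {
        if (errp) {
            *errp = "failed to write reply";
        }
        return false;
    }
    return true;
}

/* Test transport that counts the bytes it was asked to write. */
static int demo_write_ok(void *opaque, const void *buf, size_t len)
{
    (void)buf;
    *(size_t *)opaque += len;
    return 0;
}

/* Test transport that always fails. */
static int demo_write_fail(void *opaque, const void *buf, size_t len)
{
    (void)opaque;
    (void)buf;
    (void)len;
    return -1;
}
```

With this shape, a caller such as a dma-buf reply wrapper only chooses the payload size and decides what to do with the error; the flag handling lives in one place.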

[PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.

2023-09-05 Thread Harsh Prateek Bora

This is a first step towards enabling support for nested PAPR hcalls that
get/set the various Guest State Buffer (GSB) elements via the
h_guest_[g|s]et_state hcalls. The lookup table makes it possible to identify
the correct get/set callback for each element supported by the
h_guest_[g|s]et_state hcalls, support for which is added in the next patch.

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_hcall.c  |   1 +
 hw/ppc/spapr_nested.c | 487 ++
 include/hw/ppc/ppc.h  |   2 +
 include/hw/ppc/spapr_nested.h | 102 +++
 4 files changed, 592 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9b1f225d4a..ca609cb5a4 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
 spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
 
 spapr_register_nested();
+init_nested();
 }
 
 type_init(hypercall_register_types)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index e7956685af..6fbb1bcb02 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -7,6 +7,7 @@
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
 #include "cpu-models.h"
+#include "mmu-book3s-v3.h"
 
 #ifdef CONFIG_TCG
 #define PRTS_MASK  0x1f
@@ -417,6 +418,486 @@ static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
 return false;
 }
 
+static void *get_vcpu_env_ptr(SpaprMachineStateNestedGuest *guest,
+  target_ulong vcpuid)
+{
+assert(vcpu_check(guest, vcpuid, false));
+return &guest->vcpu[vcpuid].env;
+}
+
+static void *get_vcpu_ptr(SpaprMachineStateNestedGuest *guest,
+   target_ulong vcpuid)
+{
+assert(vcpu_check(guest, vcpuid, false));
+return &guest->vcpu[vcpuid];
+}
+
+static void *get_guest_ptr(SpaprMachineStateNestedGuest *guest,
+   target_ulong vcpuid)
+{
+return guest;
+}
+
+/*
+ * set=1 means the L1 is trying to set some state
+ * set=0 means the L1 is trying to get some state
+ */
+static void copy_state_8to8(void *a, void *b, bool set)
+{
+/* set takes from the Big endian element_buf and sets internal buffer */
+
+if (set) {
+*(uint64_t *)a = be64_to_cpu(*(uint64_t *)b);
+} else {
+*(uint64_t *)b = cpu_to_be64(*(uint64_t *)a);
+}
+}
+
+static void copy_state_16to16(void *a, void *b, bool set)
+{
+uint64_t *src, *dst;
+
+if (set) {
+src = b;
+dst = a;
+
+dst[1] = be64_to_cpu(src[0]);
+dst[0] = be64_to_cpu(src[1]);
+} else {
+src = a;
+dst = b;
+
+dst[1] = cpu_to_be64(src[0]);
+dst[0] = cpu_to_be64(src[1]);
+}
+}
+
+static void copy_state_4to8(void *a, void *b, bool set)
+{
+if (set) {
+*(uint64_t *)a  = (uint64_t) be32_to_cpu(*(uint32_t *)b);
+} else {
+*(uint32_t *)b = cpu_to_be32((uint32_t) (*((uint64_t *)a)));
+}
+}
+
+static void copy_state_pagetbl(void *a, void *b, bool set)
+{
+uint64_t *pagetbl;
+uint64_t *buf; /* 3 double words */
+uint64_t rts;
+
+assert(set);
+
+pagetbl = a;
+buf = b;
+
+*pagetbl = be64_to_cpu(buf[0]);
+/* as per ISA section 6.7.6.1 */
+*pagetbl |= PATE0_HR; /* Host Radix bit is 1 */
+
+/* RTS */
+rts = be64_to_cpu(buf[1]);
+assert(rts == 52);
+rts = rts - 31; /* since radix tree size = 2^(RTS+31) */
+*pagetbl |=  ((rts & 0x7) << 5); /* RTS2 is bit 56:58 */
+*pagetbl |=  (((rts >> 3) & 0x3) << 61); /* RTS1 is bit 1:2 */
+
+/* RPDS {Size = 2^(RPDS+3) , RPDS >=5} */
+*pagetbl |= 63 - clz64(be64_to_cpu(buf[2])) - 3;
+}
+
+static void copy_state_proctbl(void *a, void *b, bool set)
+{
+uint64_t *proctbl;
+uint64_t *buf; /* 2 double words */
+
+assert(set);
+
+proctbl = a;
+buf = b;
+/* PRTB: Process Table Base */
+*proctbl = be64_to_cpu(buf[0]);
+/* PRTS: Process Table Size = 2^(12+PRTS) */
+if (be64_to_cpu(buf[1]) == (1ULL << 12)) {
+*proctbl |= 0;
+} else if (be64_to_cpu(buf[1]) == (1ULL << 24)) {
+*proctbl |= 12;
+} else {
+g_assert_not_reached();
+}
+}
+
+static void copy_state_runbuf(void *a, void *b, bool set)
+{
+uint64_t *buf; /* 2 double words */
+struct SpaprMachineStateNestedGuestVcpuRunBuf *runbuf;
+
+assert(set);
+
+runbuf = a;
+buf = b;
+
+runbuf->addr = be64_to_cpu(buf[0]);
+assert(runbuf->addr);
+
+/* per spec */
+assert(be64_to_cpu(buf[1]) <= 16384);
+
+/*
+ * This will also hit in the input buffer but should be fine for
+ * now. If not we can split this function.
+ */
+assert(be64_to_cpu(buf[1]) >= VCPU_OUT_BUF_MIN_SZ);
+
+runbuf->size = be64_to_cpu(buf[1]);
+}
+
+/* tell the L1 how big we want the output vcpu run buffer */
+static void out_buf_min_size(void *a, void *b
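
The bit packing done by copy_state_pagetbl() above can be checked by hand with a small standalone program. This is only a sketch of the same arithmetic on host-endian inputs: `demo_encode_pate0`, `demo_floor_log2` and `DEMO_PATE0_HR` are hypothetical names, the Host Radix bit is assumed to be the most significant bit of the doubleword, and the byte swapping and `assert(rts == 52)` check of the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Host Radix is IBM bit 0, i.e. the most significant bit. */
#define DEMO_PATE0_HR (1ULL << 63)

/* floor(log2(v)) for v > 0, equivalent to 63 - clz64(v). */
static unsigned demo_floor_log2(uint64_t v)
{
    unsigned n = 0;

    while (v >>= 1) {
        n++;
    }
    return n;
}

/*
 * Pack a radix page table entry doubleword 0 from:
 *   base      - page table base address
 *   rts_bits  - number of address bits translated (radix tree size 2^rts_bits)
 *   pgd_size  - size in bytes of the root page directory (2^(RPDS+3))
 */
static uint64_t demo_encode_pate0(uint64_t base, uint64_t rts_bits,
                                  uint64_t pgd_size)
{
    uint64_t rts, dw0;

    assert(rts_bits >= 31);
    assert(pgd_size >= 8);

    rts = rts_bits - 31;                        /* size = 2^(RTS+31) */
    dw0 = base | DEMO_PATE0_HR;
    dw0 |= (rts & 0x7) << 5;                    /* RTS2, IBM bits 56:58 */
    dw0 |= ((rts >> 3) & 0x3) << 61;            /* RTS1, IBM bits 1:2 */
    dw0 |= (uint64_t)demo_floor_log2(pgd_size) - 3; /* RPDS */
    return dw0;
}
```

For example, with a 52-bit tree, RTS is 52 - 31 = 21 (binary 10101), so RTS2 holds 101 and RTS1 holds 10; a 64 KiB root directory gives RPDS = 16 - 3 = 13.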

[PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE

2023-09-05 Thread Harsh Prateek Bora

L1 can request to get/set the state of any of the supported Guest State
Buffer (GSB) elements using the h_guest_[get|set]_state hcalls.
These hcalls need to perform the necessary validation checks for each
get/set request based on the flags passed and the operation supported.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c | 267 ++
 include/hw/ppc/spapr_nested.h |  22 +++
 2 files changed, 289 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 6fbb1bcb02..498e7286fa 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -897,6 +897,138 @@ void init_nested(void)
 }
 }
 
+static struct guest_state_element *guest_state_element_next(
+struct guest_state_element *element,
+int64_t *len,
+int64_t *num_elements)
+{
+uint16_t size;
+
+/* size is of element->value[] only. Not whole guest_state_element */
+size = be16_to_cpu(element->size);
+
+if (len) {
+*len -= size + offsetof(struct guest_state_element, value);
+}
+
+if (num_elements) {
+*num_elements -= 1;
+}
+
+return (struct guest_state_element *)(element->value + size);
+}
+
+static
+struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
+if (id == guest_state_element_types[i].id) {
+return &guest_state_element_types[i];
+}
+
+return NULL;
+}
+
+static void print_element(struct guest_state_element *element,
+  struct guest_state_request *gsr)
+{
+printf("id:0x%04x size:0x%04x %s ",
+   be16_to_cpu(element->id), be16_to_cpu(element->size),
+   gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
+printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));
+}
+
+static bool guest_state_request_check(struct guest_state_request *gsr)
+{
+int64_t num_elements, len = gsr->len;
+struct guest_state_buffer *gsb = gsr->gsb;
+struct guest_state_element *element;
+struct guest_state_element_type *type;
+uint16_t id, size;
+
+/* gsb->num_elements = 0 == 32 bits long */
+assert(len >= 4);
+
+num_elements = be32_to_cpu(gsb->num_elements);
+element = gsb->elements;
+len -= sizeof(gsb->num_elements);
+
+/* Walk the buffer to validate the length */
+while (num_elements) {
+
+id = be16_to_cpu(element->id);
+size = be16_to_cpu(element->size);
+
+if (false) {
+print_element(element, gsr);
+}
+/* buffer size too small */
+if (len < 0) {
+return false;
+}
+
+type = guest_state_element_type_find(id);
+if (!type) {
+printf("%s: Element ID %04x unknown\n", __func__, id);
+print_element(element, gsr);
+return false;
+}
+
+if (id == GSB_HV_VCPU_IGNORED_ID) {
+goto next_element;
+}
+
+if (size != type->size) {
+printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i Got:%i\n",
+   __func__, id, type->size, size);
+print_element(element, gsr);
+return false;
+}
+
+if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
+(gsr->flags & GUEST_STATE_REQUEST_SET)) {
+printf("%s: trying to set a read-only Element ID:%04x.\n",
+   __func__, id);
+return false;
+}
+
+if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
+/* guest wide element type */
+if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
+printf("%s: trying to set a guest wide Element ID:%04x.\n",
+   __func__, id);
+return false;
+}
+} else {
+/* thread wide element type */
+if (gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE) {
+printf("%s: trying to set a thread wide Element ID:%04x.\n",
+   __func__, id);
+return false;
+}
+}
+next_element:
+element = guest_state_element_next(element, &len, &num_elements);
+
+}
+return true;
+}
+
+static bool is_gsr_invalid(struct guest_state_request *gsr,
+   struct guest_state_element *element,
+   struct guest_state_element_type *type)
+{
+if ((gsr->flags & GUEST_STATE_REQUEST_SET) &&
+(*(uint64_t *)(element->value) & ~(type->mask))) {
+print_element(element, gsr);
+printf("L1 can't set reserved bits (allowed mask: 0x%08lx)\n",
+   type->mask);
+return true;
+}
+return false;
+}
 
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
  SpaprMachineState *spapr,
@@ -1108,6 +1240,139 @@ sta
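
The length validation that guest_state_request_check() performs while walking the buffer can be sketched on a plain byte array. The code below is an illustration under assumptions, not the patch's implementation: it assumes the layout is a big-endian 32-bit element count followed by elements made of a big-endian 16-bit id, a big-endian 16-bit size and `size` bytes of value, and `demo_gsb_walk` is a hypothetical name. It also checks the remaining length before reading each element header, which is a stricter ordering than the loop in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the id + size fields that precede each value. */
#define DEMO_ELEM_HDR 4u

/* Read a big-endian 16-bit value byte by byte (host-endian independent). */
static uint16_t demo_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value byte by byte. */
static uint32_t demo_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk the buffer and verify that every element announced by the
 * count fits inside len bytes. Returns true and stores the element
 * count on success, false if the buffer is too small.
 */
static bool demo_gsb_walk(const uint8_t *buf, size_t len, uint32_t *count_out)
{
    size_t off;
    uint32_t i, n;

    assert(buf && count_out);

    if (len < 4) {
        return false;               /* no room for the element count */
    }
    n = demo_be32(buf);
    off = 4;

    for (i = 0; i < n; i++) {
        uint16_t size;

        if (len - off < DEMO_ELEM_HDR) {
            return false;           /* truncated element header */
        }
        size = demo_be16(buf + off + 2);
        if (len - off - DEMO_ELEM_HDR < size) {
            return false;           /* truncated element value */
        }
        off += DEMO_ELEM_HDR + size;
    }

    *count_out = n;
    return true;
}
```

Keeping `off <= len` as the loop invariant means every subtraction stays in range, so an oversized element is rejected without reading past the end of the buffer.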

[PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api

2023-09-05 Thread Harsh Prateek Bora

This patch isolates the kvm-hv nested API code so that it is executed only
when cap-nested-hv is set. This helps keep API-specific logic
mutually exclusive.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr.c  | 7 ++-
 hw/ppc/spapr_caps.c | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e44686b04d..0aa9f21516 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1334,8 +1334,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
 /* Copy PATE1:GR into PATE0:HR */
 entry->dw0 = spapr->patb_entry & PATE0_HR;
 entry->dw1 = spapr->patb_entry;
+return true;
+}
+assert(spapr->nested.api);
 
-} else {
+if (spapr->nested.api == NESTED_API_KVM_HV) {
 uint64_t patb, pats;
 
 assert(lpid != 0);
@@ -3437,6 +3440,8 @@ static void spapr_instance_init(Object *obj)
 spapr_get_host_serial, spapr_set_host_serial);
 object_property_set_description(obj, "host-serial",
 "Host serial number to advertise in guest device tree");
+/* Nested */
+spapr->nested.api = 0;
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 5a0755d34f..a3a790b026 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -454,6 +454,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
 return;
 }
 
+spapr->nested.api = NESTED_API_KVM_HV;
 if (kvm_enabled()) {
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
   spapr->max_compat_pvr)) {
-- 
2.39.3
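
The gating idea in this patch (a capability sets an API selector once, and later code branches on that selector) can be sketched in isolation. The names below are hypothetical and the enum values are assumed; only the pattern of "zero means no nested API chosen" mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical selector; 0 means that no nested API was chosen. */
typedef enum {
    DEMO_NESTED_API_NONE = 0,
    DEMO_NESTED_API_KVM_HV = 1
} DemoNestedApi;

typedef struct {
    DemoNestedApi api;
} DemoNested;

/* Called when the capability is applied; leaves api untouched if disabled. */
static void demo_cap_nested_kvm_hv_apply(DemoNested *n, bool cap_enabled)
{
    assert(n != NULL);
    if (!cap_enabled) {
        return;
    }
    n->api = DEMO_NESTED_API_KVM_HV;
}

/* Code specific to the kvm-hv API runs only when this returns true. */
static bool demo_uses_kvm_hv_path(const DemoNested *n)
{
    assert(n != NULL);
    return n->api == DEMO_NESTED_API_KVM_HV;
}
```

Adding a second API later then only needs a new enum value and its own branch, without touching the kvm-hv path.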

Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space

2023-09-05 Thread Eric Auger

Hi Alex,

On 9/5/23 19:55, Alex Williamson wrote:
> On Mon,  4 Sep 2023 10:03:43 +0200
> Eric Auger  wrote:
>
>> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
>> we encounter the case where the guest tries to map IOVAs beyond 48b
>> whereas the physical VTD IOMMU only supports 48b. This ends up with
>> VFIO_MAP_DMA failures at qemu level because at kernel level,
>> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>>
>> This is due to the fact the virtio-iommu currently unconditionally
>> exposes an IOVA range of 64b through its config input range fields.
>>
>> This series removes this assumption by retrieving the usable IOVA
>> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
>> a VFIO device is attached. This info is communicated to the
>> virtio-iommu memory region, transformed into the inverse info, i.e.
>> the host reserved IOVA regions. The latter are then combined with the
>> reserved IOVA regions set through the virtio-iommu reserved-regions
>> property. That way, the guest virtio-iommu driver, unchanged, is
>> able to probe the whole set of reserved regions and prevent any IOVA
>> belonging to those ranges from being used, achieving the original goal.
> Hi Eric,
>
> I don't quite follow this relative to device hotplug.  Are we
> manipulating a per-device memory region which is created at device add
> time?  Is that memory region actually shared in some cases, for instance
> if we have a PCIe-to-PCI bridge aliasing devices on the conventional
> side?  Thanks,
I agree this deserves more attention and testing in the case of hotplug
and aliasing. Wrt PCIe-to-PCI bridges, virtio-iommu and smmu are known to
be broken with the latter due to lack of kernel support (issue with
group probing, but this might change in the future), so this is not a
currently supported feature, as opposed to virtual intel iommu. Here I
was mostly assuming one device per container and per IOMMU MR, but maybe
I have to detect & forbid more complex scenarios.

Thanks

Eric
> Alex
>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
>>
>> Eric Auger (13):
>>   memory: Let ReservedRegion use Range
>>   memory: Introduce memory_region_iommu_set_iova_ranges
>>   vfio: Collect container iova range info
>>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>>   range: Introduce range_inverse_array()
>>   virtio-iommu: Implement set_iova_ranges() callback
>>   range: Make range_compare() public
>>   util/reserved-region: Add new ReservedRegion helpers
>>   virtio-iommu: Consolidate host reserved regions and property set ones
>>   test: Add some tests for range and resv-mem helpers
>>   virtio-iommu: Resize memory region according to the max iova info
>>   vfio: Remove 64-bit IOVA address space assumption
>>
>>  include/exec/memory.h|  30 -
>>  include/hw/vfio/vfio-common.h|   2 +
>>  include/hw/virtio/virtio-iommu.h |   7 +-
>>  include/qemu/range.h |   9 ++
>>  include/qemu/reserved-region.h   |  32 +
>>  hw/core/qdev-properties-system.c |   9 +-
>>  hw/vfio/common.c |  70 ---
>>  hw/virtio/virtio-iommu-pci.c |   8 +-
>>  hw/virtio/virtio-iommu.c |  85 +++--
>>  softmmu/memory.c |  15 +++
>>  tests/unit/test-resv-mem.c   | 198 +++
>>  util/range.c |  41 ++-
>>  util/reserved-region.c   |  94 +++
>>  hw/virtio/trace-events   |   1 +
>>  tests/unit/meson.build   |   1 +
>>  util/meson.build |   1 +
>>  16 files changed, 562 insertions(+), 41 deletions(-)
>>  create mode 100644 include/qemu/reserved-region.h
>>  create mode 100644 tests/unit/test-resv-mem.c
>>  create mode 100644 util/reserved-region.c
>>
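
The inversion step described in the cover letter (usable IOVA ranges reported by the host become reserved regions for the guest) can be sketched as follows. This is a standalone illustration, independent of the actual `range_inverse_array()` signature, which is not shown in this thread; `DemoRange` and `demo_range_inverse` are hypothetical names, and the input is assumed to be sorted, non-overlapping and contained in `[low, high]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range [lob, upb]. */
typedef struct {
    uint64_t lob;
    uint64_t upb;
} DemoRange;

/*
 * Compute the gaps left by the usable ranges inside [low, high].
 * Writes at most out_cap gaps to out and returns the number of gaps
 * found, which may exceed out_cap.
 */
static size_t demo_range_inverse(const DemoRange *usable, size_t n,
                                 uint64_t low, uint64_t high,
                                 DemoRange *out, size_t out_cap)
{
    uint64_t cursor = low;   /* first address not yet classified */
    size_t count = 0, i;

    assert(low <= high);

    for (i = 0; i < n; i++) {
        assert(usable[i].lob >= cursor);
        assert(usable[i].upb >= usable[i].lob);
        assert(usable[i].upb <= high);

        if (usable[i].lob > cursor) {
            if (count < out_cap) {
                out[count].lob = cursor;
                out[count].upb = usable[i].lob - 1;
            }
            count++;
        }
        if (usable[i].upb == high) {
            return count;    /* nothing left above; also avoids overflow */
        }
        cursor = usable[i].upb + 1;
    }

    /* Tail gap between the last usable range and high. */
    if (count < out_cap) {
        out[count].lob = cursor;
        out[count].upb = high;
    }
    return count + 1;
}
```

For a host that reports a single usable window from 0x1000 up to the end of a 48-bit space, inverting against the full 64-bit space yields two reserved regions: the first page-sized hole below 0x1000 and everything above 48 bits.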

[PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API

2023-09-05 Thread Harsh Prateek Bora

Add initial documentation about the Nested PAPR API to describe the set
of APIs and their usage. It also covers the Guest State Buffer elements
and their format, which is used between L0/L1 to communicate L2 state.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 docs/devel/nested-papr.txt | 500 +
 1 file changed, 500 insertions(+)
 create mode 100644 docs/devel/nested-papr.txt

diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
new file mode 100644
index 00..c5c2ba7e50
--- /dev/null
+++ b/docs/devel/nested-papr.txt
@@ -0,0 +1,500 @@
+Nested PAPR API (aka KVM on PowerVM)
+
+
+This API aims at providing support to enable nested virtualization with
+KVM on PowerVM. Existing support for nested KVM on PowerNV was introduced
+with the cap-nested-hv option; with a slight design change, a new
+cap-nested-papr option is added to enable this on papr/pseries, e.g.:
+
+  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
+
+Work by:
+Michael Neuling 
+Vaibhav Jain 
+Jordan Niethe 
+Harsh Prateek Bora 
+Shivaprasad G Bhat 
+Kautuk Consul 
+
+Below taken from the kernel documentation:
+
+Introduction
+
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor. A significant design change
+wrt existing API is that now the entire L2 state is maintained within L0.
+
+Existing Nested-HV API
+==
+
+Linux/KVM has had support for Nesting as an L0 or L1 since 2018
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall h_enter_nested(). This
+call is made by the L1 to tell the L0 to start an L2 vCPU with the given
+state. The L0 then starts this L2 and runs until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+its partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+===
+
+The new PAPR API differs from the v1 API in that creating the L2 and
+its associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
+be called, the L1 must explicitly create the L2 using h_guest_create()
+and create any associated vCPUs with h_guest_create_vcpu(). Getting
+and setting vCPU state can also be performed using h_guest_{g|s}et
+hcall.
+
+The basic execution flow for an L1 to create an L2, run it, and
+delete it is:
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
+More details of the individual hcalls follow:
+
+HCALL Details
+=
+
+This documentation is provided to give an overall understanding of the
+API. It doesn't aim to provide full details required to implement
+an L1 or L0. Refer to the latest PAPR spec for more details.
+
+All these HCALLs are made by the L1 to the L0.
+
+H_GUEST_GET_CAPABILITIES()
+--
+
+This is called to get the capabilities of the L0 nested
+hypervisor. This includes capabilities suc
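
The explicit ordering that the v2 API requires (create the guest, create a vCPU, then run it, and delete the guest at the end) can be modelled with a toy state tracker. This is purely illustrative: the functions below are hypothetical, take none of the real hcall parameters or tokens, and the return codes are invented for the sketch rather than taken from PAPR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented return codes for the sketch; not PAPR hcall return values. */
typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_STATE = -1
} DemoRc;

/* Toy model of what an L0 tracks for a single L2 with a single vCPU. */
typedef struct {
    bool guest_created;
    bool vcpu_created;
} DemoL0;

static DemoRc demo_guest_create(DemoL0 *l0)
{
    assert(l0 != NULL);
    if (l0->guest_created) {
        return DEMO_ERR_STATE;
    }
    l0->guest_created = true;
    return DEMO_OK;
}

static DemoRc demo_guest_create_vcpu(DemoL0 *l0)
{
    assert(l0 != NULL);
    if (!l0->guest_created || l0->vcpu_created) {
        return DEMO_ERR_STATE;
    }
    l0->vcpu_created = true;
    return DEMO_OK;
}

/* Unlike the v1 API, running is refused unless creation was explicit. */
static DemoRc demo_guest_run_vcpu(const DemoL0 *l0)
{
    assert(l0 != NULL);
    return (l0->guest_created && l0->vcpu_created) ? DEMO_OK : DEMO_ERR_STATE;
}

static DemoRc demo_guest_delete(DemoL0 *l0)
{
    assert(l0 != NULL);
    if (!l0->guest_created) {
        return DEMO_ERR_STATE;
    }
    l0->guest_created = false;
    l0->vcpu_created = false;
    return DEMO_OK;
}
```

The contrast with the v1 API is in demo_guest_run_vcpu(): there is no implicit creation on first run, so the L0 can hold the L2 state itself.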

Re: [PATCH v3 0/6] hw/virtio: Build vhost-vdpa.o once for all targets

2023-09-05 Thread Philippe Mathieu-Daudé


On 30/8/23 15:35, Philippe Mathieu-Daudé wrote:

Hi Michael,

This series is now fully reviewed.

On 10/7/23 11:49, Philippe Mathieu-Daudé wrote:

Missing review: patch #4

Since v2:
- Added R-b tags
- Addressed Richard's review comment: page_mask = -page_size

Philippe Mathieu-Daudé (6):
   hw/virtio: Propagate page_mask to
 vhost_vdpa_listener_skipped_section()
   hw/virtio: Propagate page_mask to vhost_vdpa_section_end()
   hw/virtio/vhost-vdpa: Inline TARGET_PAGE_ALIGN() macro
   hw/virtio/vhost-vdpa: Use target-agnostic qemu_target_page_mask()
   hw/virtio: Build vhost-vdpa.o once
   hw/virtio/meson: Rename softmmu_virtio_ss[] -> system_virtio_ss[]


Michael, I have another series unifying virtio endianness blocked
by this one. I can merge it if you provide your Ack-by.

Thanks,

Phil.
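
The "page_mask = -page_size" change mentioned in the v2 notes relies on a two's-complement identity: for a power-of-two page size, negating the size as an unsigned value gives the same bits as ~(page_size - 1). A minimal sketch with hypothetical helper names, assuming page_size is a power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Mask that clears the offset-within-page bits; page_size must be 2^k. */
static uint64_t demo_page_mask(uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return -page_size;   /* unsigned negation is well defined in C */
}

/* Round addr up to the next page boundary (no change if already aligned). */
static uint64_t demo_page_align_up(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size - 1) & demo_page_mask(page_size);
}
```

Passing the mask as a runtime value instead of a compile-time macro is what lets the same object file serve targets with different page sizes.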

Re: [PATCH v13 0/9] rutabaga_gfx + gfxstream

2023-09-05 Thread Marc-André Lureau

Hi

On Wed, Sep 6, 2023 at 5:22 AM Gurchetan Singh
 wrote:
>
>
>
> On Wed, Aug 30, 2023 at 7:26 PM Huang Rui  wrote:
>>
>> On Tue, Aug 29, 2023 at 08:36:20AM +0800, Gurchetan Singh wrote:
>> > From: Gurchetan Singh 
>> >
>> > Changes since v12:
>> > - Added r-b tags from Antonio Caggiano and Akihiko Odaki
>> > - Removed review version from commit messages
>> > - I think we're good to merge since we've had multiple people test and 
>> > review this series??
>> >
>> > How to build both rutabaga and gfxstream guest/host libs:
>> >
>> > https://crosvm.dev/book/appendix/rutabaga_gfx.html
>> >
>> > Branch containing this patch series:
>> >
>> > https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13
>> >
>> > Antonio Caggiano (2):
>> >   virtio-gpu: CONTEXT_INIT feature
>> >   virtio-gpu: blob prep
>> >
>> > Dr. David Alan Gilbert (1):
>> >   virtio: Add shared memory capability
>> >
>> > Gerd Hoffmann (1):
>> >   virtio-gpu: hostmem
>>
>> Patch 1 -> 4 are
>>
>> Acked-and-Tested-by: Huang Rui 
>
>
> Thanks Ray, I've rebased 
> https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13 and added 
> the additional acks in the commit message.
>
> UI/gfx maintainers, since everything is reviewed and there hasn't been any 
> additional review comments, may we merge the gfxstream + rutabaga_gfx series? 
>  Thank you!
>

I can take it, or Michael (since Gerd is not focused on QEMU atm).

Michael, are you prepping a virtio PR?

thanks

-- 
Marc-André Lureau

Re: [PULL 00/35] ppc queue

2023-09-05 Thread Cédric Le Goater


Hello Stefan,

On 9/4/23 11:05, Cédric Le Goater wrote:

The following changes since commit 17780edd81d27fcfdb7a802efc870a99788bd2fc:

   Merge tag 'quick-fix-pull-request' of https://gitlab.com/bsdimp/qemu into 
staging (2023-08-31 10:06:29 -0400)

are available in the Git repository at:

   https://github.com/legoater/qemu/ tags/pull-ppc-20230904

for you to fetch changes up to 6ed470577a24fe471b09e4be089f34bb1eefc5a0:

   ppc/xive: Add support for the PC MMIOs (2023-09-04 09:34:36 +0200)


There are two little nits to take care of before merging :

* [PATCH v2 05/19] host-utils: Add muldiv64_round_up
  function rename : __muldiv64 -> muldiv64_internal

* [PATCH v2 08/19] target/ppc: Sign-extend large decrementer to 64-bits
  duplicated code which breaks -Wshadow=local

I will address them ASAP in a v2 and rebase.

Thanks,

C.

Re: [PATCH v5 0/4] Virtio shared dma-buf

2023-09-05 Thread Philippe Mathieu-Daudé


Hi Michael,

On 5/9/23 22:45, Michael S. Tsirkin wrote:

I was hoping for some acks from Gerd or anyone else with a clue
about graphics, but as that doesn't seem to happen I'll merge.
Thanks!


I made a few late comments. Patch #3 doesn't build (thus
breaking git bisection). I also have some concerns about locking.
I'd rather see a v6, do you mind dropping v5 from your queue?

Thanks,

Phil.


On Mon, Aug 21, 2023 at 02:37:56PM +0200, Albert Esteve wrote:

Hi all,

A little bump for this patch, sorry for the extra noise.

Regards,
Albert

Re: [PATCH v5 4/4] vhost-user: refactor send_resp code

2023-09-05 Thread Philippe Mathieu-Daudé


On 2/8/23 11:08, Albert Esteve wrote:

Refactor code to send response message so that
all common parts both for the common REPLY_ACK
case, and other data responses, can call it and
avoid code repetition.

Signed-off-by: Albert Esteve 
---
  hw/virtio/vhost-user.c | 36 +---
  1 file changed, 9 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 104a56a48d..28fa0ace42 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1632,29 +1632,23 @@ vhost_user_backend_handle_shared_object_remove(VhostUserShared *object)
  return virtio_remove_resource(&uuid);
  }
  
-static bool
-vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
-  VhostUserPayload *payload)
+static bool vhost_user_send_resp(QIOChannel *ioc, VhostUserHeader *hdr,
+ VhostUserPayload *payload)


Squash this patch with the previous one (in particular
because this fixes the build failure).

Re: [PATCH v5 3/4] vhost-user: add shared_object msg

2023-09-05 Thread Philippe Mathieu-Daudé


On 6/9/23 08:04, Philippe Mathieu-Daudé wrote:

On 2/8/23 11:08, Albert Esteve wrote:

Add three new vhost-user protocol
`VHOST_USER_BACKEND_SHARED_OBJECT_*` messages.
These new messages are sent from vhost-user
back-ends to interact with the virtio-dmabuf
table in order to add or remove themselves as
virtio exporters, or lookup for virtio dma-buf
shared objects.

The action taken in the front-end depends
on the type stored in the virtio shared
object hash table.

When the table holds a pointer to a vhost
backend for a given UUID, the front-end sends
a VHOST_USER_GET_SHARED_OBJECT to the
backend holding the shared object.

In the libvhost-user library we need to add
helper functions to allow sending messages to
interact with the virtio shared objects
hash table.

The messages can only be sent after successfully
negotiating a new VHOST_USER_PROTOCOL_F_SHARED_OBJECT
vhost-user protocol feature bit.

Signed-off-by: Albert Esteve 
---
  docs/interop/vhost-user.rst   |  57 
  hw/virtio/vhost-user.c    | 166 ++
  include/hw/virtio/vhost-backend.h |   3 +
  subprojects/libvhost-user/libvhost-user.c | 118 +++
  subprojects/libvhost-user/libvhost-user.h |  55 ++-
  5 files changed, 398 insertions(+), 1 deletion(-)




+static bool
+vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
+  VhostUserPayload *payload)
+{
+    Error *local_err = NULL;
+    struct iovec iov[2];
+
+    if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
+    hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
+    }
+    hdr->flags |= VHOST_USER_REPLY_MASK;
+
+    hdr->size = sizeof(payload->u64);
+
+    iov[0].iov_base = hdr;
+    iov[0].iov_len = VHOST_USER_HDR_SIZE;
+    iov[1].iov_base = payload;
+    iov[1].iov_len = hdr->size;
+
+    if (qio_channel_writev_all(ioc, iov, ARRAY_SIZE(iov), &local_err)) {
+    error_report_err(local_err);


This function could have a 'Error **errp' parameter to propagate
the error to the caller.


+    return false;
+    }
+    return true;
+}
+
+static bool
+vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
+  VhostUserPayload *payload)
+{
+    hdr->size = sizeof(payload->u64);
+    return vhost_user_send_resp(ioc, hdr, payload);
+}


I'm confused by having two vhost_user_backend_send_dmabuf_fd() functions
with different body...


This patch doesn't compile:

../../hw/virtio/vhost-user.c:1662:1: error: redefinition of ‘vhost_user_backend_send_dmabuf_fd’
 1662 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
  | ^
../../hw/virtio/vhost-user.c:1636:1: note: previous definition of ‘vhost_user_backend_send_dmabuf_fd’ with type ‘_Bool(QIOChannel *, VhostUserHeader *, VhostUserPayload *)’
 1636 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
  | ^
../../hw/virtio/vhost-user.c: In function ‘vhost_user_backend_send_dmabuf_fd’:
../../hw/virtio/vhost-user.c:1666:12: error: implicit declaration of function ‘vhost_user_send_resp’; did you mean ‘vhost_user_set_u64’? [-Werror=implicit-function-declaration]
 1666 | return vhost_user_send_resp(ioc, hdr, payload);
  |^~~~
  |vhost_user_set_u64
../../hw/virtio/vhost-user.c:1666:12: error: nested extern declaration of ‘vhost_user_send_resp’ [-Werror=nested-externs]
At top level:
../../hw/virtio/vhost-user.c:1636:1: error: ‘vhost_user_backend_send_dmabuf_fd’ defined but not used [-Werror=unused-function]
 1636 | vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
  | ^
cc1: all warnings being treated as errors

Re: [PATCH v5 3/4] vhost-user: add shared_object msg

2023-09-05 Thread Philippe Mathieu-Daudé


On 2/8/23 11:08, Albert Esteve wrote:

Add three new vhost-user protocol
`VHOST_USER_BACKEND_SHARED_OBJECT_* messages`.
These new messages are sent from vhost-user
back-ends to interact with the virtio-dmabuf
table in order to add or remove themselves as
virtio exporters, or lookup for virtio dma-buf
shared objects.

The action taken in the front-end depends
on the type stored in the virtio shared
object hash table.

When the table holds a pointer to a vhost
backend for a given UUID, the front-end sends
a VHOST_USER_GET_SHARED_OBJECT to the
backend holding the shared object.

In the libvhost-user library we need to add
helper functions to allow sending messages to
interact with the virtio shared objects
hash table.

The messages can only be sent after successfully
negotiating a new VHOST_USER_PROTOCOL_F_SHARED_OBJECT
vhost-user protocol feature bit.

Signed-off-by: Albert Esteve 
---
  docs/interop/vhost-user.rst   |  57 
  hw/virtio/vhost-user.c| 166 ++
  include/hw/virtio/vhost-backend.h |   3 +
  subprojects/libvhost-user/libvhost-user.c | 118 +++
  subprojects/libvhost-user/libvhost-user.h |  55 ++-
  5 files changed, 398 insertions(+), 1 deletion(-)




+static bool
+vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
+  VhostUserPayload *payload)
+{
+Error *local_err = NULL;
+struct iovec iov[2];
+
+if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
+hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
+}
+hdr->flags |= VHOST_USER_REPLY_MASK;
+
+hdr->size = sizeof(payload->u64);
+
+iov[0].iov_base = hdr;
+iov[0].iov_len = VHOST_USER_HDR_SIZE;
+iov[1].iov_base = payload;
+iov[1].iov_len = hdr->size;
+
+if (qio_channel_writev_all(ioc, iov, ARRAY_SIZE(iov), &local_err)) {
+error_report_err(local_err);


This function could have a 'Error **errp' parameter to propagate
the error to the caller.
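
A minimal sketch of that suggestion, using plain-C stand-ins rather than
the real QEMU Error API. Everything here (the Error struct, set_error(),
channel_write(), send_resp(), handle_lookup()) is a hypothetical name made
up for illustration; the point is only that the helper forwards errp and
the caller decides how to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; not the real QEMU type. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for an error setter: fill *errp only if the caller asked. */
static void set_error(Error **errp, const char *msg)
{
    if (errp == NULL) {
        return; /* caller is not interested in the details */
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp != NULL) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Hypothetical low-level writer: a zero-length write counts as failure. */
static int channel_write(const void *buf, size_t len, Error **errp)
{
    (void)buf;
    if (len == 0) {
        set_error(errp, "short write on channel");
        return -1;
    }
    return 0;
}

/* The helper does not report anything itself; it only forwards errp. */
static bool send_resp(const void *payload, size_t len, Error **errp)
{
    return channel_write(payload, len, errp) == 0;
}

/* The caller owns the error: here it formats it into 'out' and frees it. */
static bool handle_lookup(const void *payload, size_t len,
                          char *out, size_t outlen)
{
    Error *local_err = NULL;

    assert(out != NULL);
    if (!send_resp(payload, len, &local_err)) {
        snprintf(out, outlen, "lookup reply failed: %s",
                 local_err ? local_err->msg : "unknown");
        free(local_err);
        return false;
    }
    return true;
}
```

With this shape the error message is produced once, at the level that has
enough context to say what failed.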


+return false;
+}
+return true;
+}
+
+static bool
+vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
+  VhostUserPayload *payload)
+{
+hdr->size = sizeof(payload->u64);
+return vhost_user_send_resp(ioc, hdr, payload);
+}


I'm confused by having two vhost_user_backend_send_dmabuf_fd() functions
with different bodies...


+int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
+ int *dmabuf_fd)
+{
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->user->chr;
+int ret;
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_GET_SHARED_OBJECT,
+.hdr.flags = VHOST_USER_VERSION,
+};
+memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));
+
+ret = vhost_user_write(dev, &msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = vhost_user_read(dev, &msg);
+if (ret < 0) {
+return ret;
+}
+
+if (msg.hdr.request != VHOST_USER_GET_SHARED_OBJECT) {
+error_report("Received unexpected msg type. "
+ "Expected %d received %d",
+ VHOST_USER_GET_SHARED_OBJECT, msg.hdr.request);
+return -EPROTO;
+}
+
+*dmabuf_fd = qemu_chr_fe_get_msgfd(chr);
+if (*dmabuf_fd < 0) {
+error_report("Failed to get dmabuf fd");
+return -EIO;
+}
+
+return 0;
+}
+
+static int
+vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
+   QIOChannel *ioc,
+   VhostUserHeader *hdr,
+   VhostUserPayload *payload)


Also propagate a 'Error **errp'.


+{
+QemuUUID uuid;
+CharBackend *chr = u->user->chr;
+int dmabuf_fd = -1;
+int fd_num = 0;
+
+memcpy(uuid.data, payload->object.uuid, sizeof(payload->object.uuid));
+
+payload->u64 = 0;
+switch (virtio_object_type(&uuid)) {
+case TYPE_DMABUF:
+dmabuf_fd = virtio_lookup_dmabuf(&uuid);
+break;
+case TYPE_VHOST_DEV:
+{
+struct vhost_dev *dev = virtio_lookup_vhost_device(&uuid);
+if (dev == NULL) {
+payload->u64 = -EINVAL;
+break;
+}
+int ret = vhost_user_get_shared_object(dev, uuid.data, &dmabuf_fd);
+if (ret < 0) {
+payload->u64 = ret;
+}
+break;
+}
+case TYPE_INVALID:
+payload->u64 = -EINVAL;
+break;
+}
+
+if (dmabuf_fd != -1) {
+fd_num++;
+}
+
+if (qemu_chr_fe_set_msgfds(chr, &dmabuf_fd, fd_num) < 0) {
+error_report("Failed to set msg fds.");
+payload->u64 = -EINVAL;
+}
+
+if (!vhost_user_backend_send_dmabuf_fd(ioc, hdr, payload)) {
+error_report("Failed to write response msg.")

Re: [PATCH v5 2/4] virtio-dmabuf: introduce virtio-dmabuf

2023-09-05 Thread Philippe Mathieu-Daudé


Hi Albert,

On 2/8/23 11:08, Albert Esteve wrote:

This API manages objects (in this iteration,
dmabuf fds) that can be shared along different
virtio devices, associated to a UUID.

The API allows the different devices to add,
remove and/or retrieve the objects by simply
invoking the public functions that reside in the
virtio-dmabuf file.

For vhost backends, the API stores the pointer
to the backend holding the object.

Suggested-by: Gerd Hoffmann 
Signed-off-by: Albert Esteve 
---
  MAINTAINERS   |   7 ++
  hw/display/meson.build|   1 +
  hw/display/virtio-dmabuf.c| 136 +
  include/hw/virtio/virtio-dmabuf.h | 103 ++
  tests/unit/meson.build|   1 +
  tests/unit/test-virtio-dmabuf.c   | 137 ++
  6 files changed, 385 insertions(+)
  create mode 100644 hw/display/virtio-dmabuf.c
  create mode 100644 include/hw/virtio/virtio-dmabuf.h
  create mode 100644 tests/unit/test-virtio-dmabuf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 12e59b6b27..cd8487785a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2158,6 +2158,13 @@ T: git https://gitlab.com/cohuck/qemu.git s390-next
  T: git https://github.com/borntraeger/qemu.git s390-next
  L: qemu-s3...@nongnu.org
  
+virtio-dmabuf

+M: Albert Esteve 
+S: Supported
+F: hw/display/virtio-dmabuf.c
+F: include/hw/virtio/virtio-dmabuf.h
+F: tests/unit/test-virtio-dmabuf.c
+
  virtiofs
  M: Stefan Hajnoczi 
  S: Supported
diff --git a/hw/display/meson.build b/hw/display/meson.build
index 413ba4ab24..05619c6968 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -37,6 +37,7 @@ system_ss.add(when: 'CONFIG_MACFB', if_true: files('macfb.c'))
  system_ss.add(when: 'CONFIG_NEXTCUBE', if_true: files('next-fb.c'))
  
  system_ss.add(when: 'CONFIG_VGA', if_true: files('vga.c'))

+system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('virtio-dmabuf.c'))
  
  if (config_all_devices.has_key('CONFIG_VGA_CIRRUS') or

  config_all_devices.has_key('CONFIG_VGA_PCI') or
diff --git a/hw/display/virtio-dmabuf.c b/hw/display/virtio-dmabuf.c
new file mode 100644
index 00..e852c71ba9
--- /dev/null
+++ b/hw/display/virtio-dmabuf.c
@@ -0,0 +1,136 @@
+/*
+ * Virtio Shared dma-buf
+ *
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors:
+ * Albert Esteve 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/virtio/virtio-dmabuf.h"
+
+
+static GMutex lock;
+static GHashTable *resource_uuids;
+
+/*
+ * uuid_equal_func: wrapper for UUID is_equal function to
+ * satisfy g_hash_table_new expected parameters signatures.
+ */
+static int uuid_equal_func(const void *lhv, const void *rhv)
+{
+return qemu_uuid_is_equal(lhv, rhv);
+}
+
+static bool virtio_add_resource(QemuUUID *uuid, struct VirtioSharedObject 
*value)


Per QEMU coding style we use typedefs, so "VirtioSharedObject" here.


+{
+if (resource_uuids == NULL) {
+resource_uuids = g_hash_table_new_full(
+qemu_uuid_hash, uuid_equal_func, NULL, g_free);
+}
+if (g_hash_table_lookup(resource_uuids, uuid) != NULL) {
+return false;
+}
+
+return g_hash_table_insert(resource_uuids, uuid, value);


Hmm shouldn't this function take the lock to access resource_uuids?


+}
+
+static gpointer virtio_lookup_resource(const QemuUUID *uuid)
+{
+if (resource_uuids == NULL) {
+return NULL;
+}
+
+return g_hash_table_lookup(resource_uuids, uuid);


Ditto.

Here you can directly return the casted type (VirtioSharedObject *),
since a plain gpointer isn't really used / useful.


+}
+
+bool virtio_add_dmabuf(QemuUUID *uuid, int udmabuf_fd)
+{
+bool result;
+struct VirtioSharedObject *vso;
+if (udmabuf_fd < 0) {
+return false;
+}
+vso = g_new0(struct VirtioSharedObject, 1);


s/g_new0/g_new/


+g_mutex_lock(&lock);
+vso->type = TYPE_DMABUF;
+vso->value = GINT_TO_POINTER(udmabuf_fd);
+result = virtio_add_resource(uuid, vso);
+g_mutex_unlock(&lock);
+
+return result;
+}
+
+bool virtio_add_vhost_device(QemuUUID *uuid, struct vhost_dev *dev)
+{
+bool result;
+struct VirtioSharedObject *vso;
+if (dev == NULL) {
+return false;
+}
+vso = g_new0(struct VirtioSharedObject, 1);
+g_mutex_lock(&lock);
+vso->type = TYPE_VHOST_DEV;
+vso->value = dev;
+result = virtio_add_resource(uuid, vso);


Ah, you lock here... I'd rather do it in the callee.


+g_mutex_unlock(&lock);
+
+return result;
+}
+
+bool virtio_remove_resource(const QemuUUID *uuid)
+{
+bool result;
+g_mutex_lock(&lock);
+result = g_hash_table_remove(resource_uuids, uuid);
+g_mutex_unlock(&lock);


virtio_remove_resource() correctly locks. For API parity,
virtio_add_resource() should too.
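
To illustrate what "lock in the callee" could look like, here is a small
self-contained sketch. It deliberately uses a pthread mutex and a fixed
array instead of GMutex and GHashTable, and all names (table_add(),
table_lookup(), table_remove(), find_locked()) are hypothetical. Each
public function takes the lock itself, so callers cannot forget it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define TABLE_SLOTS 8
#define KEY_LEN 16

typedef struct {
    bool used;
    unsigned char key[KEY_LEN];
    int value;
} Slot;

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static Slot table[TABLE_SLOTS];

/* Internal helper: the caller must already hold table_lock. */
static Slot *find_locked(const unsigned char *key)
{
    for (int i = 0; i < TABLE_SLOTS; i++) {
        if (table[i].used && memcmp(table[i].key, key, KEY_LEN) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Adds key -> value; fails on a duplicate key or a full table. */
static bool table_add(const unsigned char *key, int value)
{
    bool ok = false;

    assert(key != NULL);
    pthread_mutex_lock(&table_lock);
    if (find_locked(key) == NULL) {
        for (int i = 0; i < TABLE_SLOTS; i++) {
            if (!table[i].used) {
                table[i].used = true;
                memcpy(table[i].key, key, KEY_LEN);
                table[i].value = value;
                ok = true;
                break;
            }
        }
    }
    pthread_mutex_unlock(&table_lock);
    return ok;
}

/* Copies the value out while the lock is held. */
static bool table_lookup(const unsigned char *key, int *value)
{
    bool found = false;

    pthread_mutex_lock(&table_lock);
    Slot *slot = find_locked(key);
    if (slot != NULL) {
        *value = slot->value;
        found = true;
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}

/* Removes the entry; returns false if the key was not present. */
static bool table_remove(const unsigned char *key)
{
    bool removed = false;

    pthread_mutex_lock(&table_lock);
    Slot *slot = find_locked(key);
    if (slot != NULL) {
        slot->used = false;
        removed = true;
    }
    pthread_mutex_unlock(&table_lock);
    return removed;
}
```

The duplicate check and the insert happen under one lock acquisition, so
two threads adding the same key cannot both succeed.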


+
+return result;
+}
+
+static struct VirtioSharedObject *get_shared_object(const QemuUUID *uui

[PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU

2023-09-05 Thread Harsh Prateek Bora

This patch implements support for the hcall H_GUEST_CREATE_VCPU, which is
used to instantiate a new VCPU for a previously created nested guest.
L1 provides the guest-id (returned by L0 during the call to
H_GUEST_CREATE) and an associated unique vcpu-id to refer to this
instance in future calls. It is assumed that vcpu-ids are allocated
sequentially and that the maximum number of vcpus is 2048.

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c | 110 ++
 include/hw/ppc/spapr.h|   1 +
 include/hw/ppc/spapr_nested.h |   1 +
 3 files changed, 112 insertions(+)
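
Note on the sequential vcpu-id assumption: the diff below enforces it with
an assert() after growing the array. The following standalone sketch shows
one possible alternative, rejecting a non-sequential id with an error code
before allocating. It is not what the patch does; the types, the RC_*
codes and guest_create_vcpu() are hypothetical, and only the 2048 limit
comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VCPU_MAX 2048 /* limit stated in the commit message */

/* Hypothetical return codes, standing in for hcall status values. */
enum { RC_SUCCESS = 0, RC_IN_USE, RC_BAD_ID, RC_NO_MEM };

typedef struct {
    int enabled;
} Vcpu;

typedef struct {
    Vcpu *vcpu;     /* array of 'vcpus' entries */
    uint64_t vcpus; /* number of vcpus created so far */
} Guest;

/* Creates vcpu 'vcpuid'; ids must be handed out as 0, 1, 2, ... */
static int guest_create_vcpu(Guest *g, uint64_t vcpuid)
{
    assert(g != NULL);
    if (vcpuid >= VCPU_MAX) {
        return RC_BAD_ID;
    }
    if (vcpuid < g->vcpus) {
        return RC_IN_USE;
    }
    if (vcpuid != g->vcpus) {
        return RC_BAD_ID; /* gap in the sequence: reject, do not abort */
    }

    Vcpu *grown = realloc(g->vcpu, (g->vcpus + 1) * sizeof(*grown));
    if (grown == NULL) {
        return RC_NO_MEM; /* the old array is still valid */
    }
    grown[g->vcpus] = (Vcpu){ .enabled = 1 };
    g->vcpu = grown;
    g->vcpus++;
    return RC_SUCCESS;
}
```

Checking the id before the allocation keeps the guest state unchanged on
every error path.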

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 09bbbfb341..e7956685af 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -376,6 +376,47 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
+static
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong lpid)
+{
+SpaprMachineStateNestedGuest *guest;
+
+guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
+return guest;
+}
+
+static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
+   target_ulong vcpuid,
+   bool inoutbuf)
+{
+struct SpaprMachineStateNestedGuestVcpu *vcpu;
+
+if (vcpuid >= NESTED_GUEST_VCPU_MAX) {
+return false;
+}
+
+if (!(vcpuid < guest->vcpus)) {
+return false;
+}
+
+vcpu = &guest->vcpu[vcpuid];
+if (!vcpu->enabled) {
+return false;
+}
+
+if (!inoutbuf) {
+return true;
+}
+
+/* Check to see if the in/out buffers are registered */
+if (vcpu->runbufin.addr && vcpu->runbufout.addr) {
+return true;
+}
+
+return false;
+}
+
 static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
  SpaprMachineState *spapr,
  target_ulong opcode,
@@ -448,6 +489,11 @@ static void
 destroy_guest_helper(gpointer value)
 {
 struct SpaprMachineStateNestedGuest *guest = value;
+int i = 0;
+for (i = 0; i < guest->vcpus; i++) {
+cpu_ppc_tb_free(&guest->vcpu[i].env);
+}
+g_free(guest->vcpu);
 g_free(guest);
 }
 
@@ -518,6 +564,69 @@ static target_ulong h_guest_create(PowerPCCPU *cpu,
 return H_SUCCESS;
 }
 
+static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
+SpaprMachineState *spapr,
+target_ulong opcode,
+target_ulong *args)
+{
+CPUPPCState *env = &cpu->env, *l2env;
+target_ulong flags = args[0];
+target_ulong lpid = args[1];
+target_ulong vcpuid = args[2];
+SpaprMachineStateNestedGuest *guest;
+
+if (flags) { /* don't handle any flags for now */
+return H_UNSUPPORTED_FLAG;
+}
+
+guest = spapr_get_nested_guest(spapr, lpid);
+if (!guest) {
+return H_P2;
+}
+
+if (vcpuid < guest->vcpus) {
+return H_IN_USE;
+}
+
+if (guest->vcpus >= NESTED_GUEST_VCPU_MAX) {
+return H_P3;
+}
+
+if (guest->vcpus) {
+struct SpaprMachineStateNestedGuestVcpu *vcpus;
+vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
+guest->vcpu,
+guest->vcpus + 1);
+if (!vcpus) {
+return H_NO_MEM;
+}
+memset(&vcpus[guest->vcpus], 0,
+   sizeof(struct SpaprMachineStateNestedGuestVcpu));
+guest->vcpu = vcpus;
+l2env = &vcpus[guest->vcpus].env;
+} else {
+guest->vcpu = g_try_new0(struct SpaprMachineStateNestedGuestVcpu, 1);
+if (guest->vcpu == NULL) {
+return H_NO_MEM;
+}
+l2env = &guest->vcpu->env;
+}
+/* need to memset to zero otherwise we leak L1 state to L2 */
+memset(l2env, 0, sizeof(CPUPPCState));
+/* Copy L1 PVR to L2 */
+l2env->spr[SPR_PVR] = env->spr[SPR_PVR];
+cpu_ppc_tb_init(l2env, SPAPR_TIMEBASE_FREQ);
+
+guest->vcpus++;
+assert(vcpuid < guest->vcpus); /* linear vcpuid allocation only */
+guest->vcpu[vcpuid].enabled = true;
+
+if (!vcpu_check(guest, vcpuid, false)) {
+return H_PARAMETER;
+}
+return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -531,6 +640,7 @@ void spapr_register_nested_phyp(void)
 spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, 
h_guest_get_capabilities);
 spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, 
h_guest_set_capabilities);
 spapr_register_hypercall(H_GUEST_CREATE  , h_guest_create);
+spapr_register_hypercall(H_GUEST_CREATE_VCPU , h_guest_creat

[PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API

2023-09-05 Thread Harsh Prateek Bora

This patch introduces a new command-line option, cap-nested-papr, to enable
support for the nested PAPR API by setting the nested.api version accordingly.
It requires the user to launch the L0 QEMU in TCG mode; the L1 Linux
can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
this is meant for nested guests on pseries (PowerVM), where L0 retains
the whole state of the nested guest. The two APIs are thus mutually exclusive.
Support for the related hcalls is added in the following patches.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr.c |  2 ++
 hw/ppc/spapr_caps.c| 48 ++
 include/hw/ppc/spapr.h |  5 -
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0aa9f21516..cbab7a825f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_cap_fwnmi,
 &vmstate_spapr_fwnmi,
 &vmstate_spapr_cap_rpt_invalidate,
+&vmstate_spapr_cap_nested_papr,
 NULL
 }
 };
@@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
 smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
 smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
 smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index a3a790b026..d3b9f107aa 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState 
*spapr,
 }
 }
 
+static void cap_nested_papr_apply(SpaprMachineState *spapr,
+uint8_t val, Error **errp)
+{
+ERRP_GUARD();
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+CPUPPCState *env = &cpu->env;
+
+if (!val) {
+/* capability disabled by default */
+return;
+}
+
+if (tcg_enabled()) {
+if (!(env->insns_flags2 & PPC2_ISA300)) {
+error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-papr=off\n");
+return;
+}
+spapr->nested.api = NESTED_API_PAPR;
+} else if (kvm_enabled()) {
+/*
+ * this gets executed in L1 qemu when L2 is launched,
+ * needs kvm-hv support in L1 kernel.
+ */
+if (!kvmppc_has_cap_nested_kvm_hv()) {
+error_setg(errp,
+   "KVM implementation does not support Nested-HV");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-hv=off\n");
+} else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
+error_setg(errp, "Error enabling cap-nested-hv with KVM");
+error_append_hint(errp,
+  "Try appending -machine cap-nested-hv=off\n");
+}
+}
+}
+
 static void cap_large_decr_apply(SpaprMachineState *spapr,
  uint8_t val, Error **errp)
 {
@@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 .type = "bool",
 .apply = cap_nested_kvm_hv_apply,
 },
+[SPAPR_CAP_NESTED_PAPR] = {
+.name = "nested-papr",
+.description = "Allow Nested PAPR (Phyp)",
+.index = SPAPR_CAP_NESTED_PAPR,
+.get = spapr_cap_get_bool,
+.set = spapr_cap_set_bool,
+.type = "bool",
+.apply = cap_nested_papr_apply,
+},
 [SPAPR_CAP_LARGE_DECREMENTER] = {
 .name = "large-decr",
 .description = "Allow Large Decrementer",
@@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
 SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
+SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
 SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c8b42af430..8a6e9ce929 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
 #define SPAPR_CAP_RPT_INVALIDATE0x0B
 /* Support for AIL modes */
 #define SPAPR_CAP_AIL_MODE_30x0C
+/* Nested PAPR */
+#define SPAPR_CAP_NESTED_PAPR   0x0D
 /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_AIL_MODE_3 + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_PAPR + 1)

[PATCH 00/15] Nested PAPR API (KVM on PowerVM)

2023-09-05 Thread Harsh Prateek Bora

There is an existing Nested-HV API to enable nested guests on powernv
machines. However, that is not supported on pseries/PowerVM LPARs.
This patch series implements required hcall interfaces to enable nested
guests with KVM on PowerVM.
Unlike Nested-HV, with this API, entire L2 state is retained by L0
during guest entry/exit and uses pre-defined Guest State Buffer (GSB)
format to communicate guest state between L1 and L2 via L0.

L0 here refers to the phyp/PowerVM, or launching a Qemu TCG L0 with the
newly introduced option cap-nested-papr=true (refer patch 5/15).
L1 refers to the LPAR host on PowerVM or Linux booted on Qemu TCG with
above mentioned option cap-nested-papr=true.
L2 refers to nested guest running on top of L1 using KVM.
No SW changes needed for Qemu running in L1 Linux as well as L2 Kernel.

There is a Linux kernel-side patch series to enable support for Nested
PAPR in L1; it can be found at the URL below:

Linux Kernel RFC PATCH v4:
- 
https://lore.kernel.org/linuxppc-dev/20230905034658.82835-1-jniet...@gmail.com/

For more details, refer to the documentation in either patch
series.

There are scripts available to assist in setting up an environment for
testing nested guests at https://github.com/mikey/kvm-powervm-test

Thanks to Michael Neuling, Shivaprasad Bhat, Kautuk Consul, Vaibhav Jain
and Jordan Niethe.

PS: This is a resend of patch series after rebasing to upstream master.

Harsh Prateek Bora (15):
  ppc: spapr: Introduce Nested PAPR API related macros
  ppc: spapr: Add new/extend structs to support Nested PAPR API
  ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr
  ppc: spapr: Start using nested.api for nested kvm-hv api
  ppc: spapr: Introduce cap-nested-papr for nested PAPR API
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU
  ppc: spapr: Initialize the GSB Elements lookup table.
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE
  ppc: spapr: Use correct source for parttbl info for nested PAPR API.
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU
  ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE
  ppc: spapr: Document Nested PAPR API

 docs/devel/nested-papr.txt  |  500 ++
 hw/ppc/spapr.c  |   28 +-
 hw/ppc/spapr_caps.c |   50 +
 hw/ppc/spapr_hcall.c|1 +
 hw/ppc/spapr_nested.c   | 1504 +--
 include/hw/ppc/ppc.h|2 +
 include/hw/ppc/spapr.h  |   35 +-
 include/hw/ppc/spapr_cpu_core.h |7 +-
 include/hw/ppc/spapr_nested.h   |  378 
 target/ppc/cpu.h|2 +
 10 files changed, 2433 insertions(+), 74 deletions(-)
 create mode 100644 docs/devel/nested-papr.txt

-- 
2.39.3

Re: [RFC 0/1] virtio-net: add support for SR-IOV emulation

2023-09-05 Thread Yui Washizu




Hi Jason,


On 2023/08/30 14:28, Yui Washizu wrote:


On 2023/07/24 15:58, Jason Wang wrote:
On Mon, Jul 24, 2023 at 10:32 AM Yui Washizu wrote:


On 2023/07/20 11:20, Jason Wang wrote:
On Wed, Jul 19, 2023 at 9:59 AM Yui Washizu wrote:

This patch series is the first step towards enabling
hardware offloading of the L2 packet switching feature on
virtio-net devices to the host machine.

We expect this hardware offloading to enable
the use of high-performance networks in virtual infrastructures,
such as container infrastructures on VMs.

To enable L2 packet switching by SR-IOV VFs, we are considering
the following:

- making the guest recognize virtio-net devices as SR-IOV PF devices
    (achieved with this patch series)
- allowing virtio-net devices to connect SR-IOV VFs to the backend networks,
    leaving the L2 packet switching feature to the management layer like libvirt

Could you please show the qemu command line you want to propose here?


I am considering how to specify the properties of VFs to connect SR-IOV
VFs to the backend networks.


For example:


qemu-system-x86_64 -device pcie-root-port,port=8,chassis=8,id=pci.8,bus=pcie.0,multifunction=on
 -netdev tap,id=hostnet0,vhost=on
 -netdev tap,id=vfnet1,vhost=on # backend network for SR-IOV VF 1
 -netdev tap,id=vfnet2,vhost=on # backend network for SR-IOV VF 2
 -device virtio-net-pci,netdev=hostnet0,sriov_max_vfs=2,sriov_netdev=vfnet1:vfnet2,...




In this example, we can specify multiple backend networks to the VFs
by adding "sriov_netdev" and separating them with ":".
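
As a rough illustration of how such a ":"-separated value could be split
into per-VF netdev ids, here is a standalone sketch. The property itself
is only a proposal in this thread, and parse_vf_netdevs(), MAX_VFS and
ID_LEN are hypothetical names; this is not code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VFS 8  /* hypothetical upper bound for this sketch */
#define ID_LEN 32  /* hypothetical maximum id length, including NUL */

/*
 * Splits a list such as "vfnet1:vfnet2" into ids[0], ids[1], ...
 * Returns the number of ids, or -1 if an id is empty or too long,
 * or if there are more ids than max_ids.
 */
static int parse_vf_netdevs(const char *list, char ids[][ID_LEN], int max_ids)
{
    int count = 0;
    const char *p = list;

    assert(list != NULL);
    for (;;) {
        const char *sep = strchr(p, ':');
        size_t len = sep ? (size_t)(sep - p) : strlen(p);

        if (len == 0 || len >= ID_LEN || count >= max_ids) {
            return -1;
        }
        memcpy(ids[count], p, len);
        ids[count][len] = '\0';
        count++;

        if (sep == NULL) {
            break; /* last element consumed */
        }
        p = sep + 1;
    }
    return count;
}
```

A caller would presumably compare the returned count against
sriov_max_vfs and reject a mismatch.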

This seems to be what I have in mind as well; more below.


Additionally, when passing properties like "rx_queue_size" to VFs, we
can utilize new properties,
such as "sriov_rx_queue_size_per_vfs," to ensure that the same value is
passed to all VFs.

Or we can introduce new device like:

-netdev tap,id=hn0 \
-device virtio-net-pci,netdev=hn0,id=vnet_pf \
-netdev tap,id=hn1 \
-device virtio-net-pci-vf,netdev=hn1,id=vf0,pf=vnet_pf,rx_queue_size=XYZ ... \


This allows us to reuse the codes for configuring vf parameters. But
note that rx_queue_size doesn't make too much sense to vhost-vDPA, as
qemu can perform nothing more than a simple sanity test.

Thanks



Thanks for proposing this new way.

I have considered how to implement this.

As the virtio-net-pci-vf device should show up on the guest only when the
guest OS creates a VF, the guest must not be able to see the VF device on
the PCI bus when qemu starts.

However, it's hard to realize this without overcomplicating the relevant
code, due to the current qemu implementation. This is because
qdev_device_add_from_qdict, the function called when devices are specified
with the "-device" option of the qemu startup command, always creates
devices by qdev_new and qdev_realize.

It might be possible to fix this so that qdev_new/qdev_realize aren't
triggered for virtio-net-pci-vf devices, but it seems that we would need to
special-case the device in very generic code like
qdev_device_add_from_qdict(), qdev_device_add(), device_init_func() or
their callers.

Given my current ideas, it seems like this patch could become complex.
Would you have any suggestions for achieving this in a simpler way?

I was wondering if you could give me some feedback.
Best regards.





I'm still considering how to specify it, so please share any
comments you have.



    - This makes hardware offloading of L2 packet switching possible.
  For example, when using vDPA devices, it allows the guest
  to utilize SR-IOV NIC embedded switch of hosts.

This would be interesting.

Thanks

This patch series aims to enable SR-IOV emulation on virtio-net devices.
With this series, the guest can identify the virtio-net device as an
SR-IOV PF device.
The newly added property 'sriov_max_vfs' allows us to enable the SR-IOV
feature on the virtio-net device.
Currently, we are unable to specify the properties of a VF created from
the guest.
The properties are set to their default values.
In the future, we plan to allow users to set the properties.

qemu-system-x86_64 --device virtio-net,sriov_max_vfs=
# when 'sriov_max_vfs' is present, the SR-IOV feature will be 
automatically enabled

#  means the max number of VF on guest

Example commands to create VFs in virtio-net device from the guest:

guest% readlink -f /sys/class/net/eth1/device
/sys/devices/pci:00/:00:02.0/:01:00.0/virtio1
guest% echo "2" > 
/sys/devices/pci:00/:00:02.0/:01:00.0/sriov_numvfs

guest% ip link show
   eth0: 
   eth1: 
   eth2:  #virtual VF created
   eth3:  #virtual VF created

Please note that communication between VF and PF/VF is not
possible with this patch series alone.


Yui Washizu (1):
    virtio-pci: add SR-IOV capability

   hw/pci/msix.c  |  8 +++--
   hw/pci/pci.c   |  4 +++
   hw

Re: [PATCH v5 1/4] uuid: add a hash function

2023-09-05 Thread Philippe Mathieu-Daudé


On 2/8/23 11:08, Albert Esteve wrote:

Add hash function to uuid module using the
djb2 hash algorithm.


^ This info ...


Add a couple simple unit tests for the hash
function, checking collisions for similar UUIDs.

Signed-off-by: Albert Esteve 
---
  include/qemu/uuid.h|  2 ++
  tests/unit/test-uuid.c | 27 +++
  util/uuid.c| 14 ++
  3 files changed, 43 insertions(+)

diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
index dc40ee1fc9..e24a1099e4 100644
--- a/include/qemu/uuid.h
+++ b/include/qemu/uuid.h
@@ -96,4 +96,6 @@ int qemu_uuid_parse(const char *str, QemuUUID *uuid);
  
  QemuUUID qemu_uuid_bswap(QemuUUID uuid);
  
+uint32_t qemu_uuid_hash(const void *uuid);

+
  #endif
diff --git a/tests/unit/test-uuid.c b/tests/unit/test-uuid.c
index c111de5fc1..aedc125ae9 100644
--- a/tests/unit/test-uuid.c
+++ b/tests/unit/test-uuid.c
@@ -171,6 +171,32 @@ static void test_uuid_unparse_strdup(void)
  }
  }
  
+static void test_uuid_hash(void)

+{
+QemuUUID uuid;
+int i;
+
+for (i = 0; i < 100; i++) {
+qemu_uuid_generate(&uuid);
+/* Obtain the UUID hash */
+uint32_t hash_a = qemu_uuid_hash(&uuid);
+int data_idx = g_random_int_range(0, 15);
+/* Change a single random byte of the UUID */
+if (uuid.data[data_idx] < 0xFF) {
+uuid.data[data_idx]++;
+} else {
+uuid.data[data_idx]--;
+}
+/* Obtain the UUID hash again */
+uint32_t hash_b = qemu_uuid_hash(&uuid);
+/*
+ * Both hashes shall be different (avoid collision)
+ * for any change in the UUID fields
+ */
+g_assert_cmpint(hash_a, !=, hash_b);
+}
+}
+
  int main(int argc, char **argv)
  {
  g_test_init(&argc, &argv, NULL);
@@ -179,6 +205,7 @@ int main(int argc, char **argv)
  g_test_add_func("/uuid/parse", test_uuid_parse);
  g_test_add_func("/uuid/unparse", test_uuid_unparse);
  g_test_add_func("/uuid/unparse_strdup", test_uuid_unparse_strdup);
+g_test_add_func("/uuid/hash", test_uuid_hash);
  
  return g_test_run();

  }
diff --git a/util/uuid.c b/util/uuid.c
index b1108dde78..64eaf2e208 100644
--- a/util/uuid.c
+++ b/util/uuid.c
@@ -116,3 +116,17 @@ QemuUUID qemu_uuid_bswap(QemuUUID uuid)
  bswap16s(&uuid.fields.time_high_and_version);
  return uuid;
  }


... would be more useful as a comment here.

/* djb2 hash algorithm */

Anyhow,

Reviewed-by: Philippe Mathieu-Daudé 


+uint32_t qemu_uuid_hash(const void *uuid)
+{
+QemuUUID *qid = (QemuUUID *) uuid;
+uint32_t h = 5381;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(qid->data); i++) {
+h = (h << 5) + h + qid->data[i];
+}
+
+return h;
+}
+
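
For reference, a standalone sketch of the same djb2 recurrence outside of
QEMU (uuid_hash_djb2() and hash_changes_on_byte_bump() are hypothetical
names, and a bare 16-byte array stands in for QemuUUID). Since the hash
is h = h * 33 + byte modulo 2^32, bumping one byte by 1 shifts the result
by a power of 33, which is odd and therefore never 0 modulo 2^32; as far
as I can tell the unit test's single-byte check can never fail. Also note
that g_random_int_range(0, 15) excludes its upper bound, so the test never
touches the last byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* djb2: start at 5381, then h = h * 33 + byte ((h << 5) + h is h * 33). */
static uint32_t uuid_hash_djb2(const uint8_t data[UUID_LEN])
{
    uint32_t h = 5381;
    size_t i;

    for (i = 0; i < UUID_LEN; i++) {
        h = (h << 5) + h + data[i];
    }
    return h;
}

/*
 * Mirrors the unit test's mutation: bump byte 'idx' up by one (or down
 * by one if it is already 0xFF) and report whether the hash changed.
 */
static int hash_changes_on_byte_bump(const uint8_t data[UUID_LEN], size_t idx)
{
    uint8_t copy[UUID_LEN];

    assert(idx < UUID_LEN);
    memcpy(copy, data, UUID_LEN);
    if (copy[idx] < 0xFF) {
        copy[idx]++;
    } else {
        copy[idx]--;
    }
    return uuid_hash_djb2(data) != uuid_hash_djb2(copy);
}
```

Collisions are of course still possible between UUIDs that differ in more
than one byte; the hash table's equality function handles those.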

Re: [RFC Patch 4/5] hw/display: Allwinner A10 LCDC emulation

2023-09-05 Thread Philippe Mathieu-Daudé


+Gerd & Marc-André for the ui/fb parts.

On 5/9/23 22:14, Strahinja Jankovic wrote:

This patch adds support for Allwinner A10 LCD controller.
Current emulation supports only RGB32 colorspace and interacts with
DEBE0 to obtain framebuffer address and screen size.

Signed-off-by: Strahinja Jankovic 
---
  hw/arm/allwinner-a10.c  |  10 +
  hw/display/allwinner-a10-lcdc.c | 275 
  hw/display/meson.build  |   1 +
  hw/display/trace-events |   5 +
  include/hw/arm/allwinner-a10.h  |   2 +
  include/hw/display/allwinner-a10-lcdc.h |  77 +++
  6 files changed, 370 insertions(+)
  create mode 100644 hw/display/allwinner-a10-lcdc.c
  create mode 100644 include/hw/display/allwinner-a10-lcdc.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index 624e95af46..f93bc5266d 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -41,6 +41,7 @@
  #define AW_A10_WDT_BASE 0x01c20c90
  #define AW_A10_RTC_BASE 0x01c20d00
  #define AW_A10_I2C0_BASE0x01c2ac00
+#define AW_A10_LCDC0_BASE   0x01c0c000
  #define AW_A10_HDMI_BASE0x01c16000
  #define AW_A10_GPU_BASE 0x01c4
  #define AW_A10_DE_BE0_BASE  0x01e6
@@ -101,6 +102,8 @@ static void aw_a10_init(Object *obj)
  
  object_initialize_child(obj, "hdmi", &s->hdmi, TYPE_AW_A10_HDMI);
  
+object_initialize_child(obj, "lcd0", &s->lcd0, TYPE_AW_A10_LCDC);

+
  object_initialize_child(obj, "de_be0", &s->de_be0, TYPE_AW_A10_DEBE);
  
  object_initialize_child(obj, "mali400", &s->gpu, TYPE_AW_GPU);

@@ -230,6 +233,13 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
  sysbus_realize(SYS_BUS_DEVICE(&s->de_be0), &error_fatal);
  sysbus_mmio_map(SYS_BUS_DEVICE(&s->de_be0), 0, AW_A10_DE_BE0_BASE);
  
+/* LCD Controller */

+object_property_set_link(OBJECT(&s->lcd0), "debe",
+ OBJECT(&s->de_be0), &error_fatal);


IIUC you have LCDC polling DEBE for size updates and then invalidating;
shouldn't it be the opposite, with LCDC linked to DEBE and DEBE calling
the LCDC invalidate handler on resize?
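
Roughly what I mean, as a standalone sketch with made-up types (Lcdc,
Debe, lcdc_invalidate() and debe_set_size() are hypothetical and do not
correspond to the device models in this series): the backend holds the
link and pushes size changes, so the LCD controller never has to poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Lcdc {
    bool invalidate;  /* set when the next refresh must redraw everything */
    uint32_t width;
    uint32_t height;
} Lcdc;

typedef struct Debe {
    uint32_t width;
    uint32_t height;
    Lcdc *lcdc;       /* link set up by the board/SoC code; may be NULL */
} Debe;

/* Handler on the LCDC side, called by whoever owns the geometry. */
static void lcdc_invalidate(Lcdc *lcdc, uint32_t width, uint32_t height)
{
    assert(lcdc != NULL);
    lcdc->width = width;
    lcdc->height = height;
    lcdc->invalidate = true;
}

/* Called when the guest programs a new size into the backend. */
static void debe_set_size(Debe *debe, uint32_t width, uint32_t height)
{
    if (debe->width == width && debe->height == height) {
        return; /* nothing changed, no need to redraw */
    }
    debe->width = width;
    debe->height = height;
    if (debe->lcdc != NULL) {
        lcdc_invalidate(debe->lcdc, width, height);
    }
}
```

In this arrangement the refresh path only checks its own invalidate flag.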


+sysbus_realize(SYS_BUS_DEVICE(&s->lcd0), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->lcd0), 0, AW_A10_LCDC0_BASE);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->lcd0), 0, qdev_get_gpio_in(dev, 44));
+
  /* MALI GPU */
  sysbus_realize(SYS_BUS_DEVICE(&s->gpu), &error_fatal);
  sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpu), 0, AW_A10_GPU_BASE);
diff --git a/hw/display/allwinner-a10-lcdc.c b/hw/display/allwinner-a10-lcdc.c
new file mode 100644
index 00..8367ac32be
--- /dev/null
+++ b/hw/display/allwinner-a10-lcdc.c
@@ -0,0 +1,275 @@
+/*
+ * Allwinner A10 LCD Control Module emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "qemu/module.h"
+#include "hw/display/allwinner-a10-lcdc.h"
+#include "hw/irq.h"
+#include "ui/pixel_ops.h"
+#include "trace.h"
+#include "sysemu/dma.h"
+#include "framebuffer.h"
+
+/* LCDC register offsets */
+enum {
+REG_TCON_GCTL   = 0x, /* TCON Global control register */
+REG_TCON_GINT0  = 0x0004, /* TCON Global interrupt register 0 */
+};
+
+/* TCON_GCTL register fields */
+#define REG_TCON_GCTL_EN            (1 << 31)
+
+/* TCON_GINT0 register fields */
+#define REG_TCON_GINT0_VB_INT_EN    (1 << 31)
+#define REG_TCON_GINT0_VB_INT_FLAG  (1 << 14)
+
+#define REG_INDEX(offset)    (offset / sizeof(uint32_t))
+
+static void allwinner_a10_lcdc_tick(void *opaque)
+{
+AwA10LcdcState *s = AW_A10_LCDC(opaque);
+
+if (s->regs[REG_INDEX(REG_TCON_GINT0)] & REG_TCON_GINT0_VB_INT_EN) {
+s->regs[REG_INDEX(REG_TCON_GINT0)] |= REG_TCON_GINT0_VB_INT_FLAG;
+qemu_irq_raise(s->irq);
+}
+}
+
+static uint64_t allwinner_a10_lcdc_read(void *opaque, hwaddr offset,
+   unsigned size)
+{
+AwA10LcdcState *s = AW_A10_LCDC(opaque);
+const uint32_t idx = REG_INDEX(offset);
+uint32_t val = s->regs[idx];
+
+switch (offset) {
+case 0x800 ... AW_A10_LCDC_IOSIZE:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n",
+

[PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr

2023-09-05 Thread Harsh Prateek Bora

Use nested guest state specific struct for storing related info.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr.c | 4 ++--
 hw/ppc/spapr_nested.c  | 4 ++--
 include/hw/ppc/spapr.h | 3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 07e91e3800..e44686b04d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1340,8 +1340,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
 
 assert(lpid != 0);
 
-patb = spapr->nested_ptcr & PTCR_PATB;
-pats = spapr->nested_ptcr & PTCR_PATS;
+patb = spapr->nested.ptcr & PTCR_PATB;
+pats = spapr->nested.ptcr & PTCR_PATS;
 
 /* Check if partition table is properly aligned */
 if (patb & MAKE_64BIT_MASK(0, pats + 12)) {
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 121aa96ddc..a669470f1a 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -25,7 +25,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
 return H_PARAMETER;
 }
 
-spapr->nested_ptcr = ptcr; /* Save new partition table */
+spapr->nested.ptcr = ptcr; /* Save new partition table */
 
 return H_SUCCESS;
 }
@@ -157,7 +157,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 struct kvmppc_pt_regs *regs;
 hwaddr len;
 
-if (spapr->nested_ptcr == 0) {
+if (spapr->nested.ptcr == 0) {
 return H_NOT_AVAILABLE;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3990fed1d9..c8b42af430 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,6 +12,7 @@
 #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
 #include "hw/ppc/xics.h"        /* For ICSState */
 #include "hw/ppc/spapr_tpm_proxy.h"
+#include "hw/ppc/spapr_nested.h" /* for SpaprMachineStateNested */
 
 struct SpaprVioBus;
 struct SpaprPhbState;
@@ -216,7 +217,7 @@ struct SpaprMachineState {
 uint32_t vsmt;   /* Virtual SMT mode (KVM's "core stride") */
 
 /* Nested HV support (TCG only) */
-uint64_t nested_ptcr;
+struct SpaprMachineStateNested nested;
 
 Notifier epow_notifier;
 QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;
-- 
2.39.3

[PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU

2023-09-05 Thread Harsh Prateek Bora

Once the L1 has created a nested guest and its associated VCPU, it can
request execution of the nested guest. The initial state is set either
with h_guest_set_state or through the input buffers passed along with
the call to h_guest_run_vcpu(). On guest exit, L0 uses the output
buffers to convey the exit cause to the L1. L0 switches context from L1
to L2 on guest entry and restores the L1 context on guest exit.

Unlike nested-hv, with nested-papr the entire state of the L2 (nested)
guest is retained in L0 after guest exit and restored on the next entry.
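
The entry/exit behaviour described above can be sketched roughly as
follows. This is only an illustration under simplifying assumptions: the
register file is reduced to a few fields, and ToyRegs, ToyCpu, ToyVcpu,
toy_enter_l2() and toy_exit_l2() are hypothetical names, not functions
from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Drastically reduced register file, for illustration only. */
typedef struct ToyRegs {
    uint64_t gpr[4];
    uint64_t pc;
} ToyRegs;

/* Physical CPU as seen by L0. */
typedef struct ToyCpu {
    ToyRegs live;       /* registers currently loaded */
    ToyRegs saved_l1;   /* L1 context parked while the L2 runs */
    bool in_nested;
} ToyCpu;

/* Per-vcpu L2 state that L0 keeps between runs (nested-papr model). */
typedef struct ToyVcpu {
    ToyRegs state;
} ToyVcpu;

/* Guest entry: park the L1 context, load the retained L2 context. */
static void toy_enter_l2(ToyCpu *cpu, const ToyVcpu *vcpu)
{
    assert(!cpu->in_nested);
    cpu->saved_l1 = cpu->live;
    cpu->live = vcpu->state;
    cpu->in_nested = true;
}

/* Guest exit: retain the L2 context in L0, restore the L1 context. */
static void toy_exit_l2(ToyCpu *cpu, ToyVcpu *vcpu)
{
    assert(cpu->in_nested);
    vcpu->state = cpu->live;
    cpu->live = cpu->saved_l1;
    cpu->in_nested = false;
}
```

Because the L2 state stays in the ToyVcpu, a second toy_enter_l2()
resumes exactly where the previous run stopped without the L1 passing
the state in again.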

Signed-off-by: Michael Neuling 
Signed-off-by: Kautuk Consul 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c   | 471 +++-
 include/hw/ppc/spapr_cpu_core.h |   7 +-
 include/hw/ppc/spapr_nested.h   |   6 +
 3 files changed, 408 insertions(+), 76 deletions(-)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 67e389a762..3605f27115 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -12,6 +12,17 @@
 #ifdef CONFIG_TCG
 #define PRTS_MASK  0x1f
 
+static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
+ SpaprMachineStateNestedGuestVcpu *vcpu);
+static void exit_process_output_buffer(PowerPCCPU *cpu,
+  SpaprMachineStateNestedGuest *guest,
+  target_ulong vcpuid,
+  target_ulong *r3);
+static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src);
+static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
+   target_ulong vcpuid,
+   bool inoutbuf);
+
 static target_ulong h_set_ptbl(PowerPCCPU *cpu,
SpaprMachineState *spapr,
target_ulong opcode,
@@ -187,21 +198,21 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 return H_PARAMETER;
 }
 
-spapr_cpu->nested_host_state = g_try_new(struct nested_ppc_state, 1);
-if (!spapr_cpu->nested_host_state) {
+spapr_cpu->nested_hv_host = g_try_new(struct nested_ppc_state, 1);
+if (!spapr_cpu->nested_hv_host) {
 return H_NO_MEM;
 }
 
 assert(env->spr[SPR_LPIDR] == 0);
 assert(env->spr[SPR_DPDES] == 0);
-nested_save_state(spapr_cpu->nested_host_state, cpu);
+nested_save_state(spapr_cpu->nested_hv_host, cpu);
 
 len = sizeof(*regs);
 regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, false,
 MEMTXATTRS_UNSPECIFIED);
 if (!regs || len != sizeof(*regs)) {
 address_space_unmap(CPU(cpu)->as, regs, len, 0, false);
-g_free(spapr_cpu->nested_host_state);
+g_free(spapr_cpu->nested_hv_host);
 return H_P2;
 }
 
@@ -276,105 +287,146 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
 
 void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 {
+SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
 SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
 struct nested_ppc_state l2_state;
-target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
-target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
-target_ulong hsrr0, hsrr1, hdar, asdr, hdsisr;
+target_ulong hv_ptr, regs_ptr;
+target_ulong hsrr0 = 0, hsrr1 = 0, hdar = 0, asdr = 0, hdsisr = 0;
 struct kvmppc_hv_guest_state *hvstate;
 struct kvmppc_pt_regs *regs;
 hwaddr len;
+target_ulong lpid = 0, vcpuid = 0;
+struct SpaprMachineStateNestedGuestVcpu *vcpu = NULL;
+struct SpaprMachineStateNestedGuest *guest = NULL;
 
 assert(spapr_cpu->in_nested);
-
-nested_save_state(&l2_state, cpu);
-hsrr0 = env->spr[SPR_HSRR0];
-hsrr1 = env->spr[SPR_HSRR1];
-hdar = env->spr[SPR_HDAR];
-hdsisr = env->spr[SPR_HDSISR];
-asdr = env->spr[SPR_ASDR];
+if (spapr->nested.api == NESTED_API_KVM_HV) {
+nested_save_state(&l2_state, cpu);
+hsrr0 = env->spr[SPR_HSRR0];
+hsrr1 = env->spr[SPR_HSRR1];
+hdar = env->spr[SPR_HDAR];
+hdsisr = env->spr[SPR_HDSISR];
+asdr = env->spr[SPR_ASDR];
+} else if (spapr->nested.api == NESTED_API_PAPR) {
+lpid = spapr_cpu->nested_papr_host->gpr[5];
+vcpuid = spapr_cpu->nested_papr_host->gpr[6];
+guest = spapr_get_nested_guest(spapr, lpid);
+assert(guest);
+vcpu_check(guest, vcpuid, false);
+vcpu = &guest->vcpu[vcpuid];
+
+exit_nested_restore_vcpu(cpu, excp, vcpu);
+/* do the output buffer for run_vcpu*/
+exit_process_output_buffer(cpu, guest, vcpuid, &r3_return);
+} else
+g_assert_not_reached();
 
 /*
  * Switch back to the host environment (including for any error).
  */
 assert(env->spr[

[PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE

2023-09-05 Thread Harsh Prateek Bora

This hcall is used by L1 to delete a guest entry in L0 or can also be
used to delete all guests if needed (usually in shutdown scenarios).
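
A rough sketch of the flag handling described above, using a fixed-size
array in place of the GHashTable. The mask value, the return codes and
all toy_* names are assumptions made for illustration and are not taken
from the PAPR specification or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_GUEST_MAX        16                     /* hypothetical limit */
#define TOY_DELETE_ALL_MASK  0x8000000000000000ULL  /* assumed: top bit */

/* Hypothetical stand-ins for the hcall return codes. */
enum {
    TOY_OK = 0,
    TOY_BAD_LPID = -1,
    TOY_UNSUPPORTED_FLAG = -2,
};

typedef struct ToyGuests {
    bool present[TOY_GUEST_MAX];    /* index is the lpid */
} ToyGuests;

static int toy_guest_delete(ToyGuests *g, uint64_t flags, uint64_t lpid)
{
    assert(g != NULL);

    /* Any bit other than "delete all" is reserved. */
    if (flags & ~TOY_DELETE_ALL_MASK) {
        return TOY_UNSUPPORTED_FLAG;
    }

    if (flags & TOY_DELETE_ALL_MASK) {
        /* Empty the container but keep it usable for later creates. */
        memset(g->present, 0, sizeof(g->present));
        return TOY_OK;
    }

    if (lpid >= TOY_GUEST_MAX || !g->present[lpid]) {
        return TOY_BAD_LPID;
    }
    g->present[lpid] = false;
    return TOY_OK;
}
```

One design point worth a look in review: the sketch empties the table
instead of freeing it. The hunk below calls g_hash_table_destroy() on
spapr->nested.guests; whether the pointer is reset somewhere else is not
visible here, so a later H_GUEST_CREATE may need the table to be
recreated or the pointer set back to NULL.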

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c | 32 
 include/hw/ppc/spapr_nested.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 3605f27115..5afdad4990 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -1692,6 +1692,37 @@ static void exit_process_output_buffer(PowerPCCPU *cpu,
 return;
 }
 
+static target_ulong h_guest_delete(PowerPCCPU *cpu,
+   SpaprMachineState *spapr,
+   target_ulong opcode,
+   target_ulong *args)
+{
+target_ulong flags = args[0];
+target_ulong lpid = args[1];
+struct SpaprMachineStateNestedGuest *guest;
+
+if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
+return H_FUNCTION;
+}
+
+/* handle flag deleteAllGuests, remaining bits reserved */
+if (flags & ~H_GUEST_DELETE_ALL_MASK) {
+return H_UNSUPPORTED_FLAG;
+} else if (flags & H_GUEST_DELETE_ALL_MASK) {
+g_hash_table_destroy(spapr->nested.guests);
+return H_SUCCESS;
+}
+
+guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
+if (!guest) {
+return H_P2;
+}
+
+g_hash_table_remove(spapr->nested.guests, GINT_TO_POINTER(lpid));
+
+return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -1709,6 +1740,7 @@ void spapr_register_nested_phyp(void)
 spapr_register_hypercall(H_GUEST_SET_STATE   , h_guest_set_state);
 spapr_register_hypercall(H_GUEST_GET_STATE   , h_guest_get_state);
 spapr_register_hypercall(H_GUEST_RUN_VCPU, h_guest_run_vcpu);
+spapr_register_hypercall(H_GUEST_DELETE  , h_guest_delete);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index ca5d28c06e..9eb43778ad 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -209,6 +209,7 @@
 #define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000 /* BE in GSB */
 #define GUEST_STATE_REQUEST_GUEST_WIDE   0x1
 #define GUEST_STATE_REQUEST_SET  0x2
+#define H_GUEST_DELETE_ALL_MASK  0x8000ULL
 
 #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
 .id = (i), \
-- 
2.39.3

[PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES

2023-09-05 Thread Harsh Prateek Bora

This patch implements nested PAPR hcall H_GUEST_SET_CAPABILITIES.
This is used by L1 to set capabilities of the nested guest being
created. The capabilities being set are a subset of the capabilities
returned by the previous H_GUEST_GET_CAPABILITIES hcall.
Currently, it only supports P9/P10 capability check through PVR.
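
As a rough illustration of the check described above, the sketch below
rejects an unsupported capability and requires the mode bit matching the
host CPU. The bit positions, the ToyCpu enum and toy_set_capabilities()
are assumptions for illustration; the bad_bit indices 0/1/2 mirror the
*_BMAP values added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, chosen only for the example. */
#define TOY_CAP_COPY_MEM  (1ULL << 63)
#define TOY_CAP_P9_MODE   (1ULL << 62)
#define TOY_CAP_P10_MODE  (1ULL << 61)

typedef enum { TOY_CPU_P9, TOY_CPU_P10 } ToyCpu;

/*
 * Returns 0 on success. On failure returns -1 and stores in *bad_bit
 * the index of the offending capability (0 = copy-mem, 1 = P9 mode,
 * 2 = P10 mode).
 */
static int toy_set_capabilities(ToyCpu host, uint64_t caps, unsigned *bad_bit)
{
    assert(bad_bit != NULL);

    if (caps & TOY_CAP_COPY_MEM) {      /* not supported by this L0 */
        *bad_bit = 0;
        return -1;
    }
    if (host == TOY_CPU_P9 && !(caps & TOY_CAP_P9_MODE)) {
        *bad_bit = 1;
        return -1;
    }
    if (host == TOY_CPU_P10 && !(caps & TOY_CAP_P10_MODE)) {
        *bad_bit = 2;
        return -1;
    }
    return 0;
}
```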

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr.c|  1 +
 hw/ppc/spapr_nested.c | 46 +++
 include/hw/ppc/spapr_nested.h |  3 +++
 3 files changed, 50 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index cbab7a825f..7c6f6ee25d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3443,6 +3443,7 @@ static void spapr_instance_init(Object *obj)
 "Host serial number to advertise in guest device tree");
 /* Nested */
 spapr->nested.api = 0;
+spapr->nested.capabilities_set = false;
 }
 
 static void spapr_machine_finalizefn(Object *obj)
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 37f3a49be2..9af65f257f 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -399,6 +399,51 @@ static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
 return H_SUCCESS;
 }
 
+static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+  target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+target_ulong capabilities = args[1];
+
+if (flags) { /* don't handle any flags capabilities for now */
+return H_PARAMETER;
+}
+
+
+/* isn't supported */
+if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
+env->gpr[4] = 0;
+return H_P2;
+}
+
+if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+(CPU_POWERPC_POWER9_BASE)) {
+/* We are a P9 */
+if (!(capabilities & H_GUEST_CAPABILITIES_P9_MODE)) {
+env->gpr[4] = 1;
+return H_P2;
+}
+}
+
+if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+(CPU_POWERPC_POWER10_BASE)) {
+/* We are a P10 */
+if (!(capabilities & H_GUEST_CAPABILITIES_P10_MODE)) {
+env->gpr[4] = 2;
+return H_P2;
+}
+}
+
+spapr->nested.capabilities_set = true;
+
+spapr->nested.pvr_base = env->spr[SPR_PVR];
+
+return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -410,6 +455,7 @@ void spapr_register_nested(void)
 void spapr_register_nested_phyp(void)
 {
 spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
+spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index ce198e9f70..a7996251cb 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -193,6 +193,9 @@
 #define H_GUEST_CAPABILITIES_COPY_MEM 0x8000
 #define H_GUEST_CAPABILITIES_P9_MODE  0x4000
 #define H_GUEST_CAPABILITIES_P10_MODE 0x2000
+#define H_GUEST_CAP_COPY_MEM_BMAP   0
+#define H_GUEST_CAP_P9_MODE_BMAP    1
+#define H_GUEST_CAP_P10_MODE_BMAP   2
 
 typedef struct SpaprMachineStateNestedGuest {
 unsigned long vcpus;
-- 
2.39.3

[PATCH RESEND 12/15] ppc: spapr: Use correct source for parttbl info for nested PAPR API.

2023-09-05 Thread Harsh Prateek Bora

For nested PAPR API, we use SpaprMachineStateNestedGuest struct to store
partition table info. Therefore, use the same in spapr_get_pate() as
well.

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr.c | 14 ++
 hw/ppc/spapr_nested.c  |  1 -
 include/hw/ppc/spapr.h |  3 +++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7c6f6ee25d..ee4b073d19 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1361,9 +1361,23 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
 patb += 16 * lpid;
 entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
 entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+return true;
 }
 
+#ifdef CONFIG_TCG
+/* Nested PAPR API */
+SpaprMachineStateNestedGuest *guest;
+assert(lpid != 0);
+guest = spapr_get_nested_guest(spapr, lpid);
+assert(guest != NULL);
+
+entry->dw0 = guest->parttbl[0];
+entry->dw1 = guest->parttbl[1];
+
 return true;
+#else
+return false;
+#endif
 }
 
 #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 498e7286fa..67e389a762 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -377,7 +377,6 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
-static
 SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
  target_ulong lpid)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c9f9682a46..cdc256f057 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -1052,5 +1052,8 @@ void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt);
 
 /* H_WATCHDOG */
 void spapr_watchdog_init(SpaprMachineState *spapr);
+/* Nested PAPR */
+SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState *spapr,
+ target_ulong lpid);
 
 #endif /* HW_SPAPR_H */
-- 
2.39.3

[PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API

2023-09-05 Thread Harsh Prateek Bora

This patch introduces new data structures to be used with Nested PAPR
API. It also extends kvmppc_hv_guest_state with an additional set of
registers supported by the nested PAPR API.

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr_nested.h | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index 5cb668dd53..f8db31075b 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -189,6 +189,39 @@
 /* End of list of Guest State Buffer Element IDs */
 #define GSB_LAST                GSB_VCPU_SPR_ASDR
 
+typedef struct SpaprMachineStateNestedGuest {
+unsigned long vcpus;
+struct SpaprMachineStateNestedGuestVcpu *vcpu;
+uint64_t parttbl[2];
+uint32_t pvr_logical;
+uint64_t tb_offset;
+} SpaprMachineStateNestedGuest;
+
+struct SpaprMachineStateNested {
+
+uint8_t api;
+#define NESTED_API_KVM_HV  1
+#define NESTED_API_PAPR    2
+uint64_t ptcr;
+uint32_t lpid_max;
+uint32_t pvr_base;
+bool capabilities_set;
+GHashTable *guests;
+};
+
+struct SpaprMachineStateNestedGuestVcpuRunBuf {
+uint64_t addr;
+uint64_t size;
+};
+
+typedef struct SpaprMachineStateNestedGuestVcpu {
+bool enabled;
+struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
+struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
+CPUPPCState env;
+int64_t tb_offset;
+int64_t dec_expiry_tb;
+} SpaprMachineStateNestedGuestVcpu;
 
 /*
  * Register state for entering a nested guest with H_ENTER_NESTED.
@@ -228,6 +261,21 @@ struct kvmppc_hv_guest_state {
 uint64_t dawr1;
 uint64_t dawrx1;
 /* Version 2 ends here */
+uint64_t dec;
+uint64_t fscr;
+uint64_t fpscr;
+uint64_t bescr;
+uint64_t ebbhr;
+uint64_t ebbrr;
+uint64_t tar;
+uint64_t dexcr;
+uint64_t hdexcr;
+uint64_t hashkeyr;
+uint64_t hashpkeyr;
+uint64_t ctrl;
+uint64_t vscr;
+uint64_t vrsave;
+ppc_vsr_t vsr[64];
 };
 
 /* Latest version of hv_guest_state structure */
-- 
2.39.3

[PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES

2023-09-05 Thread Harsh Prateek Bora

This patch implements the nested PAPR hcall H_GUEST_GET_CAPABILITIES and
also enables registration of nested PAPR hcalls whenever an L0 is
launched with cap-nested-papr=true. The common registration routine
will be used by later patches to register the related hcalls as they
are added. The L1 kernel uses this hcall to get the set of guest
capabilities supported by L0 (QEMU TCG).
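
A small sketch of how the capability reported to the L1 can be derived
from the processor version register, along the lines of the handler in
the hunk below. The PVR mask and family values as well as the capability
bit positions are assumptions picked for the example, and
toy_get_capabilities() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PVR layout: the family lives in the upper 16 bits. */
#define TOY_PVR_FAMILY_MASK  0xFFFF0000u
#define TOY_PVR_P9_BASE      0x004E0000u
#define TOY_PVR_P10_BASE     0x00800000u

/* Assumed capability bits, for illustration only. */
#define TOY_CAP_P9_MODE      (1ULL << 62)
#define TOY_CAP_P10_MODE     (1ULL << 61)

/*
 * Stores the capability mask for the given PVR in *caps_out (the role
 * played by r4 in the hcall). Returns false, with *caps_out set to 0,
 * when the processor family is not recognised.
 */
static bool toy_get_capabilities(uint32_t pvr, uint64_t *caps_out)
{
    assert(caps_out != NULL);

    switch (pvr & TOY_PVR_FAMILY_MASK) {
    case TOY_PVR_P9_BASE:
        *caps_out = TOY_CAP_P9_MODE;
        return true;
    case TOY_PVR_P10_BASE:
        *caps_out = TOY_CAP_P10_MODE;
        return true;
    default:
        *caps_out = 0;
        return false;
    }
}
```

Writing the output on every path, including the unrecognised-family
one, avoids handing the L1 whatever value happened to be in the return
register before the call.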

Signed-off-by: Michael Neuling 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_caps.c   |  1 +
 hw/ppc/spapr_nested.c | 35 +++
 include/hw/ppc/spapr_nested.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index d3b9f107aa..cbe53a79ec 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
 return;
 }
 spapr->nested.api = NESTED_API_PAPR;
+spapr_register_nested_phyp();
 } else if (kvm_enabled()) {
 /*
  * this gets executed in L1 qemu when L2 is launched,
diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index a669470f1a..37f3a49be2 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -6,6 +6,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_nested.h"
+#include "cpu-models.h"
 
 #ifdef CONFIG_TCG
 #define PRTS_MASK  0x1f
@@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 address_space_unmap(CPU(cpu)->as, regs, len, len, true);
 }
 
+static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
+ SpaprMachineState *spapr,
+ target_ulong opcode,
+ target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+
+if (flags) { /* don't handle any flags capabilities for now */
+return H_PARAMETER;
+}
+
+if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+(CPU_POWERPC_POWER9_BASE))
+env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
+
+if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
+(CPU_POWERPC_POWER10_BASE))
+env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
+
+return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -382,6 +406,12 @@ void spapr_register_nested(void)
 spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
 spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
 }
+
+void spapr_register_nested_phyp(void)
+{
+spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
+}
+
 #else
 void spapr_exit_nested(PowerPCCPU *cpu, int excp)
 {
@@ -392,4 +422,9 @@ void spapr_register_nested(void)
 {
 /* DO NOTHING */
 }
+
+void spapr_register_nested_phyp(void)
+{
+/* DO NOTHING */
+}
 #endif
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index f8db31075b..ce198e9f70 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -189,6 +189,11 @@
 /* End of list of Guest State Buffer Element IDs */
 #define GSB_LAST                GSB_VCPU_SPR_ASDR
 
+/* Bit masks to be used in nested PAPR API */
+#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000
+#define H_GUEST_CAPABILITIES_P9_MODE  0x4000
+#define H_GUEST_CAPABILITIES_P10_MODE 0x2000
+
 typedef struct SpaprMachineStateNestedGuest {
 unsigned long vcpus;
 struct SpaprMachineStateNestedGuestVcpu *vcpu;
@@ -331,6 +336,7 @@ struct nested_ppc_state {
 };
 
 void spapr_register_nested(void);
+void spapr_register_nested_phyp(void);
 void spapr_exit_nested(PowerPCCPU *cpu, int excp);
 
 #endif /* HW_SPAPR_NESTED_H */
-- 
2.39.3

[PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE

2023-09-05 Thread Harsh Prateek Bora

This hcall is used by L1 to tell L0 that a new nested guest needs to be
created, so that L0 can allocate the necessary resources. L0 uses a hash
table for nested-guest-specific resource management. Other hcalls use
the same data structure to operate on the guest's state during the
entire life cycle of the nested guest.
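
The lpid allocation that goes with this table can be sketched as a
lowest-free-slot search. The version below uses a plain array instead of
a GHashTable so that it stays self-contained; ToyLpidTable,
toy_lpid_alloc() and toy_lpid_free() are hypothetical names and
TOY_GUEST_MAX is deliberately tiny.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GUEST_MAX 8     /* hypothetical; the patch uses a larger limit */

typedef struct ToyLpidTable {
    bool used[TOY_GUEST_MAX];   /* index is the lpid; lpid 0 is never used */
    size_t nguests;
} ToyLpidTable;

/* Returns the lowest free lpid in 1..TOY_GUEST_MAX-1, or 0 if none is left. */
static uint32_t toy_lpid_alloc(ToyLpidTable *t)
{
    assert(t != NULL);
    for (uint32_t lpid = 1; lpid < TOY_GUEST_MAX; lpid++) {
        if (!t->used[lpid]) {
            t->used[lpid] = true;
            t->nguests++;
            return lpid;
        }
    }
    return 0;
}

/* Releases an lpid so that a later allocation can reuse it. */
static bool toy_lpid_free(ToyLpidTable *t, uint32_t lpid)
{
    assert(t != NULL);
    if (lpid == 0 || lpid >= TOY_GUEST_MAX || !t->used[lpid]) {
        return false;
    }
    t->used[lpid] = false;
    t->nguests--;
    return true;
}
```

Since the search starts at 1 and stops before TOY_GUEST_MAX, at most
TOY_GUEST_MAX - 1 guests can exist at once, which is the same bound the
loop in the hunk below implies for lpid_max.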

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Harsh Prateek Bora 
---
 hw/ppc/spapr_nested.c | 75 +++
 include/hw/ppc/spapr_nested.h |  3 ++
 2 files changed, 78 insertions(+)

diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
index 9af65f257f..09bbbfb341 100644
--- a/hw/ppc/spapr_nested.c
+++ b/hw/ppc/spapr_nested.c
@@ -444,6 +444,80 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
 return H_SUCCESS;
 }
 
+static void
+destroy_guest_helper(gpointer value)
+{
+struct SpaprMachineStateNestedGuest *guest = value;
+g_free(guest);
+}
+
+static target_ulong h_guest_create(PowerPCCPU *cpu,
+   SpaprMachineState *spapr,
+   target_ulong opcode,
+   target_ulong *args)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong flags = args[0];
+target_ulong continue_token = args[1];
+uint64_t lpid;
+int nguests = 0;
+struct SpaprMachineStateNestedGuest *guest;
+
+if (flags) { /* don't handle any flags for now */
+return H_UNSUPPORTED_FLAG;
+}
+
+if (continue_token != -1) {
+return H_P2;
+}
+
+if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
+return H_FUNCTION;
+}
+
+if (!spapr->nested.capabilities_set) {
+return H_STATE;
+}
+
+if (!spapr->nested.guests) {
+spapr->nested.lpid_max = NESTED_GUEST_MAX;
+spapr->nested.guests = g_hash_table_new_full(NULL,
+ NULL,
+ NULL,
+ destroy_guest_helper);
+}
+
+nguests = g_hash_table_size(spapr->nested.guests);
+
+if (nguests == spapr->nested.lpid_max) {
+return H_NO_MEM;
+}
+
+/* Lookup for available lpid */
+for (lpid = 1; lpid < spapr->nested.lpid_max; lpid++) {
+if (!(g_hash_table_lookup(spapr->nested.guests,
+  GINT_TO_POINTER(lpid)))) {
+break;
+}
+}
+if (lpid == spapr->nested.lpid_max) {
+return H_NO_MEM;
+}
+
+guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
+if (!guest) {
+return H_NO_MEM;
+}
+
+guest->pvr_logical = spapr->nested.pvr_base;
+
+g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(lpid), guest);
+printf("%s: lpid: %lu (MAX: %i)\n", __func__, lpid, spapr->nested.lpid_max);
+
+env->gpr[4] = lpid;
+return H_SUCCESS;
+}
+
 void spapr_register_nested(void)
 {
 spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
@@ -456,6 +530,7 @@ void spapr_register_nested_phyp(void)
 {
 spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
 spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
+spapr_register_hypercall(H_GUEST_CREATE  , h_guest_create);
 }
 
 #else
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index a7996251cb..7841027df8 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -197,6 +197,9 @@
 #define H_GUEST_CAP_P9_MODE_BMAP    1
 #define H_GUEST_CAP_P10_MODE_BMAP   2
 
+/* Nested PAPR API macros */
+#define NESTED_GUEST_MAX 4096
+
 typedef struct SpaprMachineStateNestedGuest {
 unsigned long vcpus;
 struct SpaprMachineStateNestedGuestVcpu *vcpu;
-- 
2.39.3

[PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros

2023-09-05 Thread Harsh Prateek Bora

Add new macros for the new hypercall opcodes, their return codes,
Guest State Buffer (GSB) element IDs and a few registers, which will be
used in the following patches to support the Nested PAPR API.

Signed-off-by: Michael Neuling 
Signed-off-by: Shivaprasad G Bhat 
Signed-off-by: Harsh Prateek Bora 
---
 include/hw/ppc/spapr.h|  23 -
 include/hw/ppc/spapr_nested.h | 186 ++
 target/ppc/cpu.h  |   2 +
 3 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 538b2dfb89..3990fed1d9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -367,6 +367,16 @@ struct SpaprMachineState {
 #define H_NOOP            -63
 #define H_UNSUPPORTED     -67
 #define H_OVERLAP         -68
+#define H_STATE                            -75
+#define H_INVALID_ELEMENT_ID               -79
+#define H_INVALID_ELEMENT_SIZE             -80
+#define H_INVALID_ELEMENT_VALUE            -81
+#define H_INPUT_BUFFER_NOT_DEFINED         -82
+#define H_INPUT_BUFFER_TOO_SMALL           -83
+#define H_OUTPUT_BUFFER_NOT_DEFINED        -84
+#define H_OUTPUT_BUFFER_TOO_SMALL          -85
+#define H_PARTITION_PAGE_TABLE_NOT_DEFINED -86
+#define H_GUEST_VCPU_STATE_NOT_HV_OWNED    -87
 #define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
@@ -586,8 +596,17 @@ struct SpaprMachineState {
 #define H_RPT_INVALIDATE        0x448
 #define H_SCM_FLUSH             0x44C
 #define H_WATCHDOG              0x45C
-
-#define MAX_HCALL_OPCODE        H_WATCHDOG
+#define H_GUEST_GET_CAPABILITIES 0x460
+#define H_GUEST_SET_CAPABILITIES 0x464
+#define H_GUEST_CREATE           0x470
+#define H_GUEST_CREATE_VCPU      0x474
+#define H_GUEST_GET_STATE        0x478
+#define H_GUEST_SET_STATE        0x47C
+#define H_GUEST_RUN_VCPU         0x480
+#define H_GUEST_COPY_MEMORY      0x484
+#define H_GUEST_DELETE           0x488
+
+#define MAX_HCALL_OPCODE         H_GUEST_DELETE
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
index d383486476..5cb668dd53 100644
--- a/include/hw/ppc/spapr_nested.h
+++ b/include/hw/ppc/spapr_nested.h
@@ -4,6 +4,192 @@
 #include "qemu/osdep.h"
 #include "target/ppc/cpu.h"
 
+/* Guest State Buffer Element IDs */
+#define GSB_HV_VCPU_IGNORED_ID  0x0000 /* An element whose value is ignored */
+#define GSB_HV_VCPU_STATE_SIZE  0x0001 /* HV internal format VCPU state size */
+#define GSB_VCPU_OUT_BUF_MIN_SZ 0x0002 /* Min size of the Run VCPU o/p buffer */
+#define GSB_VCPU_LPVR   0x0003 /* Logical PVR */
+#define GSB_TB_OFFSET   0x0004 /* Timebase Offset */
+#define GSB_PART_SCOPED_PAGETBL 0x0005 /* Partition Scoped Page Table */
+#define GSB_PROCESS_TBL 0x0006 /* Process Table */
+/* RESERVED 0x0007 - 0x0BFF */
+#define GSB_VCPU_IN_BUFFER  0x0C00 /* Run VCPU Input Buffer */
+#define GSB_VCPU_OUT_BUFFER 0x0C01 /* Run VCPU Out Buffer */
+#define GSB_VCPU_VPA            0x0C02 /* HRA to Guest VCPU VPA */
+/* RESERVED 0x0C03 - 0x0FFF */
+#define GSB_VCPU_GPR0   0x1000
+#define GSB_VCPU_GPR1   0x1001
+#define GSB_VCPU_GPR2   0x1002
+#define GSB_VCPU_GPR3   0x1003
+#define GSB_VCPU_GPR4   0x1004
+#define GSB_VCPU_GPR5   0x1005
+#define GSB_VCPU_GPR6   0x1006
+#define GSB_VCPU_GPR7   0x1007
+#define GSB_VCPU_GPR8   0x1008
+#define GSB_VCPU_GPR9   0x1009
+#define GSB_VCPU_GPR10  0x100A
+#define GSB_VCPU_GPR11  0x100B
+#define GSB_VCPU_GPR12  0x100C
+#define GSB_VCPU_GPR13  0x100D
+#define GSB_VCPU_GPR14  0x100E
+#define GSB_VCPU_GPR15  0x100F
+#define GSB_VCPU_GPR16  0x1010
+#define GSB_VCPU_GPR17  0x1011
+#define GSB_VCPU_GPR18  0x1012
+#define GSB_VCPU_GPR19  0x1013
+#define GSB_VCPU_GPR20  0x1014
+#define GSB_VCPU_GPR21  0x1015
+#define GSB_VCPU_GPR22  0x1016
+#define GSB_VCPU_GPR23  0x1017
+#define GSB_VCPU_GPR24  0x1018
+#define GSB_VCPU_GPR25  0x1019
+#define GSB_VCPU_GPR26  0x101A
+#define GSB_VCPU_GPR27  0x101B
+#define GSB_VCPU_GPR28  0x101C
+#define GSB_VCPU_GPR29  0x101D
+#define GSB_VCPU_GPR30  0x101E
+#define GSB_VCPU_GPR31  0x101F
+#define GSB_VCPU_HDEC_EXPIRY_TB 0x1020
+#define GSB_VCPU_SPR_NIA        0x1021
+#define GSB_VCPU_SPR_MSR        0x1022
+#define GSB_VCPU_SPR_LR         0x1023
+#define GSB_VCPU_SPR_XER        0x1024
+#define GSB_VCPU_SPR_CTR        0x1025
+#define GSB_VCPU_SPR_CFAR       0x1026
+#define GSB_VCPU_SPR_SRR0       0x1027
+#define GSB_VCPU_SPR_SRR1       0x1028
+#define GSB_VCPU_SPR_DAR        0x1029
+#define GSB_VCPU_DEC_EXPIRE_TB  0x102A
+#define GSB_VCPU_SPR_VTB        0x102B
+#define GSB_VCPU_SPR_LPCR   0x102C
+#define GSB_VCPU_SPR_HFSCR  0x102D
+#define GSB_V

Re: [RFC Patch 1/5] hw/display: Allwinner A10 HDMI controller emulation

2023-09-05 Thread Philippe Mathieu-Daudé


Hi Strahinja,

On 5/9/23 22:14, Strahinja Jankovic wrote:

This patch adds basic Allwinner A10 HDMI controller support.
Emulated HDMI component will always show that a display is connected and
provide default EDID info.

Signed-off-by: Strahinja Jankovic 
---
  hw/arm/allwinner-a10.c  |   7 +
  hw/display/allwinner-a10-hdmi.c | 214 
  hw/display/meson.build  |   2 +
  hw/display/trace-events |   4 +
  include/hw/arm/allwinner-a10.h  |   2 +
  include/hw/display/allwinner-a10-hdmi.h |  69 
  6 files changed, 298 insertions(+)
  create mode 100644 hw/display/allwinner-a10-hdmi.c
  create mode 100644 include/hw/display/allwinner-a10-hdmi.h




diff --git a/hw/display/allwinner-a10-hdmi.c b/hw/display/allwinner-a10-hdmi.c
new file mode 100644
index 00..0f046e3cc7
--- /dev/null
+++ b/hw/display/allwinner-a10-hdmi.c




+#define REG_INDEX(offset)    (offset / sizeof(uint32_t))
+
+static uint64_t allwinner_a10_hdmi_read(void *opaque, hwaddr offset,
+   unsigned size)
+{
+AwA10HdmiState *s = AW_A10_HDMI(opaque);
+const uint32_t idx = REG_INDEX(offset);
+uint32_t val = s->regs[idx];
+
+switch (offset) {
+case REG_HPD:
+val = FIELD_HPD_HOTPLUG_DET_HIGH;
+break;




+}
+
+static void allwinner_a10_hdmi_write(void *opaque, hwaddr offset,
+   uint64_t val, unsigned size)
+{
+AwA10HdmiState *s = AW_A10_HDMI(opaque);
+const uint32_t idx = REG_INDEX(offset);
+
+switch (offset) {
+case REG_DDC_CTRL:
+if (val & FIELD_DDC_CTRL_SW_RST) {
+val &= ~FIELD_DDC_CTRL_SW_RST;
+}




+s->regs[idx] = (uint32_t) val;
+}
+
+static const MemoryRegionOps allwinner_a10_hdmi_ops = {
+.read = allwinner_a10_hdmi_read,
+.write = allwinner_a10_hdmi_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 4,
+},
+.impl.min_access_size = 1,


Per REG_INDEX() you have .impl.min/max = 4.

Otherwise your patch LGTM :)
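
To illustrate the REG_INDEX() remark above: the handlers index a 32-bit
register array by offset / 4 and ignore the access size, so a byte
access in the middle of a register would see the whole word. The sketch
is standalone C, not QEMU code; ToyRegs and toy_read() are hypothetical,
and the macro is a parenthesised variant of the one in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parenthesised variant of the patch's REG_INDEX() macro. */
#define REG_INDEX(offset)   ((offset) / sizeof(uint32_t))

#define TOY_NUM_REGS 4      /* hypothetical register file size */

typedef struct ToyRegs {
    uint32_t regs[TOY_NUM_REGS];
} ToyRegs;

/*
 * Read handler that, like the one in the patch, ignores the access
 * size: every offset inside a word maps to the same array element.
 */
static uint32_t toy_read(const ToyRegs *s, uint64_t offset)
{
    assert(s != NULL);
    assert(REG_INDEX(offset) < TOY_NUM_REGS);
    return s->regs[REG_INDEX(offset)];
}
```

Offsets 0x4 to 0x7 all land on regs[1], so the handler itself only
implements 4-byte accesses. Declaring .impl.min_access_size and
.impl.max_access_size as 4 states that, and as far as I understand
leaves narrower guest accesses to the memory core to adapt.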


+};

Re: [PATCH v3 4/4] migration/qapi: Drop @MigrationParameter enum

2023-09-05 Thread Philippe Mathieu-Daudé


On 5/9/23 18:23, Peter Xu wrote:

Drop the enum in qapi because it is never used in QMP APIs.  Instead, make
it an internal definition for QEMU so that we can decouple it from QAPI
and deduplicate the QAPI documentation.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
---
  qapi/migration.json| 179 -
  migration/options.h|  47 +
  migration/migration-hmp-cmds.c |   3 +-
  migration/options.c|  51 ++
  4 files changed, 100 insertions(+), 180 deletions(-)




diff --git a/migration/options.h b/migration/options.h
index 124a5d450f..4591545c62 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -66,6 +66,53 @@ bool migrate_cap_set(int cap, bool value, Error **errp);
  
  /* parameters */
  
+typedef enum {

+MIGRATION_PARAMETER_ANNOUNCE_INITIAL,
+MIGRATION_PARAMETER_ANNOUNCE_MAX,
+MIGRATION_PARAMETER_ANNOUNCE_ROUNDS,
+MIGRATION_PARAMETER_ANNOUNCE_STEP,
+MIGRATION_PARAMETER_COMPRESS_LEVEL,
+MIGRATION_PARAMETER_COMPRESS_THREADS,
+MIGRATION_PARAMETER_DECOMPRESS_THREADS,
+MIGRATION_PARAMETER_COMPRESS_WAIT_THREAD,
+MIGRATION_PARAMETER_THROTTLE_TRIGGER_THRESHOLD,
+MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL,
+MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT,
+MIGRATION_PARAMETER_CPU_THROTTLE_TAILSLOW,
+MIGRATION_PARAMETER_TLS_CREDS,
+MIGRATION_PARAMETER_TLS_HOSTNAME,
+MIGRATION_PARAMETER_TLS_AUTHZ,
+MIGRATION_PARAMETER_MAX_BANDWIDTH,
+MIGRATION_PARAMETER_DOWNTIME_LIMIT,
+MIGRATION_PARAMETER_X_CHECKPOINT_DELAY,
+MIGRATION_PARAMETER_BLOCK_INCREMENTAL,
+MIGRATION_PARAMETER_MULTIFD_CHANNELS,
+MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE,
+MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH,
+MIGRATION_PARAMETER_MAX_CPU_THROTTLE,
+MIGRATION_PARAMETER_MULTIFD_COMPRESSION,
+MIGRATION_PARAMETER_MULTIFD_ZLIB_LEVEL,
+MIGRATION_PARAMETER_MULTIFD_ZSTD_LEVEL,
+MIGRATION_PARAMETER_BLOCK_BITMAP_MAPPING,
+MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD,
+MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT,
+MIGRATION_PARAMETER__MAX,


MIGRATION_PARAMETER__MAX is not part of the enum, so:

   #define MIGRATION_PARAMETER__MAX \
   (MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT + 1)


+} MigrationParameter;
+
+extern const char *MigrationParameter_string[MIGRATION_PARAMETER__MAX];
+#define  MigrationParameter_str(p)  MigrationParameter_string[p]


Hmm this is only used once by HMP. Following our style I suggest here:

 const char *const MigrationParameter_string(enum MigrationParameter param);


And in options.c:

 static const char *const MigrationParameter_str[MIGRATION_PARAMETER__MAX] = {
     ...
 };

 const char *const MigrationParameter_string(enum MigrationParameter param)
 {
 return MigrationParameter_str[param];
 }
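
A minimal sketch of the shape suggested above, using a hypothetical
three-entry enum (the Demo* / demo_* names are illustrative only, not the
actual QEMU definitions). It assumes the sentinel is a macro derived from the
last member, the string table stays private to one file, and callers go
through an accessor:

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for the MigrationParameter enum. */
typedef enum {
    DEMO_PARAM_MAX_BANDWIDTH,
    DEMO_PARAM_DOWNTIME_LIMIT,
    DEMO_PARAM_VCPU_DIRTY_LIMIT,
} DemoParam;

/* Sentinel kept outside the enum, derived from the last real member. */
#define DEMO_PARAM__MAX (DEMO_PARAM_VCPU_DIRTY_LIMIT + 1)

/* The table is static, so nothing outside this file can index it. */
static const char *const demo_param_names[DEMO_PARAM__MAX] = {
    [DEMO_PARAM_MAX_BANDWIDTH]    = "max-bandwidth",
    [DEMO_PARAM_DOWNTIME_LIMIT]   = "downtime-limit",
    [DEMO_PARAM_VCPU_DIRTY_LIMIT] = "vcpu-dirty-limit",
};

/* Public accessor: the only way to turn a parameter into its name. */
const char *demo_param_string(DemoParam param)
{
    assert((int)param >= 0 && (int)param < DEMO_PARAM__MAX);
    return demo_param_names[param];
}
```

The bounds assertion is an addition for the sketch; the accessor proposed in
the review simply indexes the table.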


+
+/**
+ * @MigrationParameter_from_str(): Parse string into a MigrationParameter
+ *
+ * @param: input string
+ * @errp: error message if failed to parse the string
+ *
+ * Returns MigrationParameter enum (>=0) if succeed, or negative otherwise
+ * which will always setup @errp.
+ */
+int MigrationParameter_from_str(const char *param, Error **errp);
+
  const BitmapMigrationNodeAliasList *migrate_block_bitmap_mapping(void);
  bool migrate_has_block_bitmap_mapping(void);


With the changes:
Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 1/7] migration/rdma: Fix save_page method to fail on polling error

2023-09-05 Thread Zhijian Li (Fujitsu)



On 31/08/2023 21:25, Markus Armbruster wrote:
> qemu_rdma_save_page() reports polling error with error_report(), then
> succeeds anyway.  This is because the variable holding the polling
> status *shadows* the variable the function returns.  The latter
> remains zero.
> 
> Broken since day one, and duplicated more recently.
> 
> Fixes: 2da776db4846 (rdma: core logic)
> Fixes: b390afd8c50b (migration/rdma: Fix out of order wrid)

Thanks for the fixes


> Signed-off-by: Markus Armbruster 


Reviewed-by: Li Zhijian 




> ---
>   migration/rdma.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index ca430d319d..b2e869aced 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3281,7 +3281,8 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>*/
>   while (1) {
>   uint64_t wr_id, wr_id_in;
> -int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
> +ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
> +
>   if (ret < 0) {
>   error_report("rdma migration: polling error! %d", ret);
>   goto err;
> @@ -3296,7 +3297,8 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>   
>   while (1) {
>   uint64_t wr_id, wr_id_in;
> -int ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
> +ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
> +
>   if (ret < 0) {
>   error_report("rdma migration: polling error! %d", ret);
>   goto err;
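
The shadowing problem described in the commit message can be reproduced in
isolation. The sketch below is not the rdma.c code: demo_poll() is a
hypothetical stand-in that always fails, and the two demo_save_*() functions
only mirror the before/after shape of the loop in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for a poll call that always reports an error. */
static int demo_poll(void)
{
    return -1;
}

/* Buggy shape: the inner "ret" shadows the outer one, so the error is lost. */
int demo_save_shadowed(void)
{
    int ret = 0;

    while (1) {
        int ret = demo_poll();   /* new variable, hides the outer ret */
        if (ret < 0) {
            goto err;
        }
        break;
    }
err:
    return ret;                  /* outer ret, still 0 */
}

/* Fixed shape: assign to the function-level variable instead. */
int demo_save_fixed(void)
{
    int ret = 0;

    while (1) {
        ret = demo_poll();
        if (ret < 0) {
            goto err;
        }
        break;
    }
err:
    return ret;
}

/* Shows the difference; call from a test or from main(). */
void demo_check(void)
{
    assert(demo_save_shadowed() == 0);   /* error swallowed */
    assert(demo_save_fixed() == -1);     /* error propagated */
}
```

Building with -Wshadow should flag the inner declaration in the first
function.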

Re: [QEMU PATCH v4 09/13] virtio-gpu: Handle resource blob commands

2023-09-05 Thread Akihiko Odaki


On 2023/09/06 12:09, Huang Rui wrote:

On Tue, Sep 05, 2023 at 05:20:43PM +0800, Akihiko Odaki wrote:

On 2023/09/05 18:08, Huang Rui wrote:

On Thu, Aug 31, 2023 at 06:24:32PM +0800, Akihiko Odaki wrote:

On 2023/08/31 18:32, Huang Rui wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

v1->v2:
   - Remove unused #include "hw/virtio/virtio-iommu.h"

   - Add a local function, called virgl_resource_destroy(), that is used
 to release a vgpu resource on error paths and in resource_unref.

   - Remove virtio_gpu_virgl_resource_unmap from 
virtio_gpu_cleanup_mapping(),
 since this function won't be called on blob resources and also because
 blob resources are unmapped via virgl_cmd_resource_unmap_blob().

   - In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
 and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the resource
 has been fully initialized.

   - Memory region has a different life-cycle from virtio gpu resources
 i.e. cannot be released synchronously along with the vgpu resource.
 So, here the field "region" was changed to a pointer that will be
 released automatically once the memory region is unparented and all
 of its references have been released.
 Also, since the pointer can be used to indicate whether the blob
 is mapped, the explicit field "mapped" was removed.

   - In virgl_cmd_resource_map_blob(), add check on the value of
 res->region, to prevent being called twice on the same resource.

   - Remove direct references to parent_obj.

   - Separate declarations from code.

hw/display/virtio-gpu-virgl.c  | 213 +
hw/display/virtio-gpu.c|   4 +-
include/hw/virtio/virtio-gpu.h |   5 +
meson.build|   4 +
4 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 312953ec16..17b634d4ee 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,7 @@
#include "trace.h"
#include "hw/virtio/virtio.h"
#include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"

#include "ui/egl-helpers.h"

@@ -78,9 +79,24 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,

virgl_renderer_resource_create(&args, NULL, 0);
}

+static void virgl_resource_destroy(VirtIOGPU *g,

+   struct virtio_gpu_simple_resource *res)
+{
+if (!res)
+return;
+
+QTAILQ_REMOVE(&g->reslist, res, next);
+
+virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
+g_free(res->addrs);
+
+g_free(res);
+}
+
static void virgl_cmd_resource_unref(VirtIOGPU *g,
 struct virtio_gpu_ctrl_command *cmd)
{
+struct virtio_gpu_simple_resource *res;
struct virtio_gpu_resource_unref unref;
struct iovec *res_iovs = NULL;
int num_iovs = 0;
@@ -88,13 +104,22 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
VIRTIO_GPU_FILL_CMD(unref);
trace_virtio_gpu_cmd_res_unref(unref.resource_id);

+res = virtio_gpu_find_resource(g, unref.resource_id);

+
virgl_renderer_resource_detach_iov(unref.resource_id,
   &res_iovs,
   &num_iovs);
if (res_iovs != NULL && num_iovs != 0) {
virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
+if (res) {
+res->iov = NULL;
+res->iov_cnt = 0;
+}
}
+
virgl_renderer_resource_unref(unref.resource_id);
+
+virgl_resource_destroy(g, res);
}

static void virgl_cmd_context_create(VirtIOGPU *g,

@@ -426,6 +451,183 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
g_free(resp);
}

+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+
+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_create_blob cblob;
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+int ret;
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap(&cblob);
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_I

Re: [QEMU PATCH v4 09/13] virtio-gpu: Handle resource blob commands

2023-09-05 Thread Huang Rui

On Tue, Sep 05, 2023 at 05:20:43PM +0800, Akihiko Odaki wrote:
> On 2023/09/05 18:08, Huang Rui wrote:
> > On Thu, Aug 31, 2023 at 06:24:32PM +0800, Akihiko Odaki wrote:
> >> On 2023/08/31 18:32, Huang Rui wrote:
> >>> From: Antonio Caggiano 
> >>>
> >>> Support BLOB resources creation, mapping and unmapping by calling the
> >>> new stable virglrenderer 0.10 interface. Only enabled when available and
> >>> via the blob config. E.g. -device virtio-vga-gl,blob=true
> >>>
> >>> Signed-off-by: Antonio Caggiano 
> >>> Signed-off-by: Dmitry Osipenko 
> >>> Signed-off-by: Xenia Ragiadakou 
> >>> Signed-off-by: Huang Rui 
> >>> ---
> >>>
> >>> v1->v2:
> >>>   - Remove unused #include "hw/virtio/virtio-iommu.h"
> >>>
> >>>   - Add a local function, called virgl_resource_destroy(), that is 
> >>> used
> >>> to release a vgpu resource on error paths and in resource_unref.
> >>>
> >>>   - Remove virtio_gpu_virgl_resource_unmap from 
> >>> virtio_gpu_cleanup_mapping(),
> >>> since this function won't be called on blob resources and also 
> >>> because
> >>> blob resources are unmapped via virgl_cmd_resource_unmap_blob().
> >>>
> >>>   - In virgl_cmd_resource_create_blob(), do proper cleanup in error 
> >>> paths
> >>> and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the 
> >>> resource
> >>> has been fully initialized.
> >>>
> >>>   - Memory region has a different life-cycle from virtio gpu resources
> >>> i.e. cannot be released synchronously along with the vgpu 
> >>> resource.
> >>> So, here the field "region" was changed to a pointer that will be
> >>> released automatically once the memory region is unparented and 
> >>> all
> >>> of its references have been released.
> >>> Also, since the pointer can be used to indicate whether the blob
> >>> is mapped, the explicit field "mapped" was removed.
> >>>
> >>>   - In virgl_cmd_resource_map_blob(), add check on the value of
> >>> res->region, to prevent being called twice on the same resource.
> >>>
> >>>   - Remove direct references to parent_obj.
> >>>
> >>>   - Separate declarations from code.
> >>>
> >>>hw/display/virtio-gpu-virgl.c  | 213 +
> >>>hw/display/virtio-gpu.c|   4 +-
> >>>include/hw/virtio/virtio-gpu.h |   5 +
> >>>meson.build|   4 +
> >>>4 files changed, 225 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> >>> index 312953ec16..17b634d4ee 100644
> >>> --- a/hw/display/virtio-gpu-virgl.c
> >>> +++ b/hw/display/virtio-gpu-virgl.c
> >>> @@ -17,6 +17,7 @@
> >>>#include "trace.h"
> >>>#include "hw/virtio/virtio.h"
> >>>#include "hw/virtio/virtio-gpu.h"
> >>> +#include "hw/virtio/virtio-gpu-bswap.h"
> >>>
> >>>#include "ui/egl-helpers.h"
> >>>
> >>> @@ -78,9 +79,24 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
> >>>virgl_renderer_resource_create(&args, NULL, 0);
> >>>}
> >>>
> >>> +static void virgl_resource_destroy(VirtIOGPU *g,
> >>> +   struct virtio_gpu_simple_resource 
> >>> *res)
> >>> +{
> >>> +if (!res)
> >>> +return;
> >>> +
> >>> +QTAILQ_REMOVE(&g->reslist, res, next);
> >>> +
> >>> +virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
> >>> +g_free(res->addrs);
> >>> +
> >>> +g_free(res);
> >>> +}
> >>> +
> >>>static void virgl_cmd_resource_unref(VirtIOGPU *g,
> >>> struct virtio_gpu_ctrl_command 
> >>> *cmd)
> >>>{
> >>> +struct virtio_gpu_simple_resource *res;
> >>>struct virtio_gpu_resource_unref unref;
> >>>struct iovec *res_iovs = NULL;
> >>>int num_iovs = 0;
> >>> @@ -88,13 +104,22 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
> >>>VIRTIO_GPU_FILL_CMD(unref);
> >>>trace_virtio_gpu_cmd_res_unref(unref.resource_id);
> >>>
> >>> +res = virtio_gpu_find_resource(g, unref.resource_id);
> >>> +
> >>>virgl_renderer_resource_detach_iov(unref.resource_id,
> >>>   &res_iovs,
> >>>   &num_iovs);
> >>>if (res_iovs != NULL && num_iovs != 0) {
> >>>virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
> >>> +if (res) {
> >>> +res->iov = NULL;
> >>> +res->iov_cnt = 0;
> >>> +}
> >>>}
> >>> +
> >>>virgl_renderer_resource_unref(unref.resource_id);
> >>> +
> >>> +virgl_resource_destroy(g, res);
> >>>}
> >>>
> >>>static void virgl_cmd_context_create(VirtIOGPU *g,
> >>> @@ -426,6 +451,183 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
> >>>g_free(resp);
> >>>}
> >>>
> >>> +#ifdef HAVE_VIRGL_RESOURCE_BLOB
> >>> +
> >>> +static void virgl_cmd_resource_create_blob(Vi

Re: [PATCH] hw/loongarch: Add virtio-mmio bus support

2023-09-05 Thread bibo mao




On 2023/9/6 10:50, Tianrui Zhao wrote:
> Add virtio-mmio bus support for LoongArch, so that devices
> could be added in the virtio-mmio bus.
> 
> Signed-off-by: Tianrui Zhao 
> ---
>  hw/loongarch/Kconfig | 1 +
>  hw/loongarch/virt.c  | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
> index 1e7c5b43c5..01ab8ce8e7 100644
> --- a/hw/loongarch/Kconfig
> +++ b/hw/loongarch/Kconfig
> @@ -22,3 +22,4 @@ config LOONGARCH_VIRT
>  select DIMM
>  select PFLASH_CFI01
>  select ACPI_HMAT
> +select VIRTIO_MMIO
> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
> index 2629128aed..06f4bc3a5e 100644
> --- a/hw/loongarch/virt.c
> +++ b/hw/loongarch/virt.c
> @@ -560,6 +560,9 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
> LoongArchMachineState *
>   VIRT_RTC_IRQ - VIRT_GSI_BASE));
>  fdt_add_rtc_node(lams);
>  
> +/* virtio-mmio device */
> +sysbus_create_simple("virtio-mmio", 0x1e20, 
> qdev_get_gpio_in(pch_pic, 7));
It would be better to use macros rather than hardcoded values like 0x1e20/7.

Also, multiple virtio-mmio devices should be added, as other arches do.
And an fdt/acpi table entry should be added for the device so that users can
use it.

Regards
Bibo Mao
> +
>  pm_mem = g_new(MemoryRegion, 1);
>  memory_region_init_io(pm_mem, NULL, &loongarch_virt_pm_ops,
>NULL, "loongarch_virt_pm", PM_SIZE);
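
One way to read the review comment above is sketched below: name the base
address, stride, first IRQ and device count once, then derive each
transport's slot from its index. Every DEMO_* value and name here is a
hypothetical placeholder, not the LoongArch virt memory map; in board code
each computed slot would be passed to sysbus_create_simple() in a loop.

```c
#include <assert.h>
#include <stdint.h>

/* All values below are hypothetical placeholders for illustration. */
#define DEMO_VIRTIO_MMIO_BASE  0x10000000ULL  /* base of the first transport */
#define DEMO_VIRTIO_MMIO_SIZE  0x200ULL       /* bytes reserved per transport */
#define DEMO_VIRTIO_MMIO_IRQ   16             /* first interrupt line */
#define DEMO_VIRTIO_MMIO_NUM   4              /* number of transports */

typedef struct {
    uint64_t base;
    int irq;
} DemoMmioSlot;

/* Compute where transport "index" lives instead of hardcoding each one. */
DemoMmioSlot demo_virtio_mmio_slot(int index)
{
    DemoMmioSlot slot;

    assert(index >= 0 && index < DEMO_VIRTIO_MMIO_NUM);
    slot.base = DEMO_VIRTIO_MMIO_BASE + (uint64_t)index * DEMO_VIRTIO_MMIO_SIZE;
    slot.irq = DEMO_VIRTIO_MMIO_IRQ + index;
    return slot;
}
```

The same constants could then feed the FDT/ACPI description, so the guest
sees the layout the board actually created.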

[PATCH] hw/loongarch: Add virtio-mmio bus support

2023-09-05 Thread Tianrui Zhao

Add virtio-mmio bus support for LoongArch, so that devices
could be added in the virtio-mmio bus.

Signed-off-by: Tianrui Zhao 
---
 hw/loongarch/Kconfig | 1 +
 hw/loongarch/virt.c  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 1e7c5b43c5..01ab8ce8e7 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -22,3 +22,4 @@ config LOONGARCH_VIRT
 select DIMM
 select PFLASH_CFI01
 select ACPI_HMAT
+select VIRTIO_MMIO
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 2629128aed..06f4bc3a5e 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -560,6 +560,9 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState *
  VIRT_RTC_IRQ - VIRT_GSI_BASE));
 fdt_add_rtc_node(lams);
 
+/* virtio-mmio device */
+sysbus_create_simple("virtio-mmio", 0x1e20, qdev_get_gpio_in(pch_pic, 7));
+
 pm_mem = g_new(MemoryRegion, 1);
 memory_region_init_io(pm_mem, NULL, &loongarch_virt_pm_ops,
   NULL, "loongarch_virt_pm", PM_SIZE);
-- 
2.39.1

Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth

2023-09-05 Thread Wang, Lei

On 9/6/2023 0:46, Peter Xu wrote:
> On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
>>>> When the user wants to have migration only use 5Gbps out of that 10Gbps,
>>>> one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
>>>> 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
>>>
>>> Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
>>> 10Gbps network, in the completion stage will it send the remaining data in 
>>> 5Gbps
>>> using downtime_limit time or in 10Gbps (saturate the network) using the
>>> downtime_limit / 2 time? Seems this parameter won't rate limit the final 
>>> stage:)
>>
>> Effectively the mgmt app is telling QEMU to assume that this
>> much bandwidth is available for use during switchover. If QEMU
>> determines that, given this available bandwidth, the remaining
>> data can be sent over the link within the downtime limit, it
>> will perform the switchover. When sending this switchover data,
>> it will actually transmit the data at full line rate IIUC.
> 
> I'm right at reposting this patch, but then I found that the
> max-available-bandwidth is indeed confusing (as Lei's question shows).
> 
> We do have all the bandwidth throttling values in the pattern of
> max-*-bandwidth and this one will start to be the outlier that it won't
> really throttle the network.
> 
> If the old name "available-bandwidth" is too general, I'm now considering
> "avail-switchover-bandwidth" just to leave max- out of the name to
> differentiate, if some day we want to add a real throttle for switchover we
> can still have a sane name.
> 
> Any objections before I repost?

I'm also OK with it. "avail" implies a lower bound on the bandwidth during
switchover: we promise that at least that amount of bandwidth can be used, so
it covers both the throttling and non-throttling cases. "switchover" means
this parameter only applies to the switchover phase rather than the bulk
stage.

> 
> Thanks,
>
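
As a rough sketch of the decision described in the quoted explanation: the
value is only an input to the "can we finish within the downtime limit?"
estimate and does not throttle the actual transfer. The helper name and
parameters below are hypothetical and this is not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when "remaining_bytes" is expected to
 * fit within "downtime_limit_ms", assuming "avail_bytes_per_sec" of
 * bandwidth is available during switchover.
 */
bool demo_can_switchover(uint64_t remaining_bytes,
                         uint64_t avail_bytes_per_sec,
                         uint64_t downtime_limit_ms)
{
    assert(avail_bytes_per_sec > 0);

    /* Expected transfer time in milliseconds (integer division truncates;
     * the multiplication can overflow for extremely large byte counts). */
    uint64_t expected_ms = remaining_bytes * 1000 / avail_bytes_per_sec;

    return expected_ms <= downtime_limit_ms;
}
```

With 5Gbps (625,000,000 bytes per second) and a 300 ms downtime limit, the
estimate allows switchover once at most 187,500,000 bytes remain, regardless
of how fast the link then actually sends them.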

Re: [PATCH v13 0/9] rutabaga_gfx + gfxstream

2023-09-05 Thread Gurchetan Singh

On Wed, Aug 30, 2023 at 7:26 PM Huang Rui  wrote:

> On Tue, Aug 29, 2023 at 08:36:20AM +0800, Gurchetan Singh wrote:
> > From: Gurchetan Singh 
> >
> > Changes since v12:
> > - Added r-b tags from Antonio Caggiano and Akihiko Odaki
> > - Removed review version from commit messages
> > - I think we're good to merge since we've had multiple people test and
> review this series??
> >
> > How to build both rutabaga and gfxstream guest/host libs:
> >
> > https://crosvm.dev/book/appendix/rutabaga_gfx.html
> >
> > Branch containing this patch series:
> >
> > https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13
> >
> > Antonio Caggiano (2):
> >   virtio-gpu: CONTEXT_INIT feature
> >   virtio-gpu: blob prep
> >
> > Dr. David Alan Gilbert (1):
> >   virtio: Add shared memory capability
> >
> > Gerd Hoffmann (1):
> >   virtio-gpu: hostmem
>
> Patch 1 -> 4 are
>
> Acked-and-Tested-by: Huang Rui 
>

Thanks Ray, I've rebased
https://gitlab.com/gurchetansingh/qemu/-/commits/qemu-gfxstream-v13 and
added the additional acks in the commit message.

UI/gfx maintainers, since everything is reviewed and there hasn't been any
additional review comments, may we merge the gfxstream + rutabaga_gfx
series?  Thank you!




>
> >
> > Gurchetan Singh (5):
> >   gfxstream + rutabaga prep: added need defintions, fields, and options
> >   gfxstream + rutabaga: add initial support for gfxstream
> >   gfxstream + rutabaga: meson support
> >   gfxstream + rutabaga: enable rutabaga
> >   docs/system: add basic virtio-gpu documentation
> >
> >  docs/system/device-emulation.rst |1 +
> >  docs/system/devices/virtio-gpu.rst   |  112 +++
> >  hw/display/meson.build   |   22 +
> >  hw/display/virtio-gpu-base.c |6 +-
> >  hw/display/virtio-gpu-pci-rutabaga.c |   47 ++
> >  hw/display/virtio-gpu-pci.c  |   14 +
> >  hw/display/virtio-gpu-rutabaga.c | 1119 ++
> >  hw/display/virtio-gpu.c  |   16 +-
> >  hw/display/virtio-vga-rutabaga.c |   50 ++
> >  hw/display/virtio-vga.c  |   33 +-
> >  hw/virtio/virtio-pci.c   |   18 +
> >  include/hw/virtio/virtio-gpu-bswap.h |   15 +
> >  include/hw/virtio/virtio-gpu.h   |   41 +
> >  include/hw/virtio/virtio-pci.h   |4 +
> >  meson.build  |7 +
> >  meson_options.txt|2 +
> >  scripts/meson-buildoptions.sh|3 +
> >  softmmu/qdev-monitor.c   |3 +
> >  softmmu/vl.c |1 +
> >  19 files changed, 1495 insertions(+), 19 deletions(-)
> >  create mode 100644 docs/system/devices/virtio-gpu.rst
> >  create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
> >  create mode 100644 hw/display/virtio-gpu-rutabaga.c
> >  create mode 100644 hw/display/virtio-vga-rutabaga.c
> >
> > --
> > 2.42.0.rc2.253.gd59a3bf2b4-goog
> >
>

Re: [PATCH v3] iothread: Set the GSource "name" field

2023-09-05 Thread Stefan Hajnoczi

On Tue, 5 Sept 2023 at 16:57, Stefan Hajnoczi  wrote:
>
> On Tue, Sep 05, 2023 at 03:03:59PM -0300, Fabiano Rosas wrote:
> > Having a name in the source helps with debugging core dumps when one
> > might not have access to TLS data to cross-reference AioContexts with
> > their addresses.
> >
> > Signed-off-by: Fabiano Rosas 
> > ---
> > v3:
> > used const
> > v2:
> > used g_autofree where appropriate
> > ---
> >  iothread.c | 14 --
> >  1 file changed, 8 insertions(+), 6 deletions(-)
>
> Thanks, applied to my monitor-drain_call_rcu tree:
> https://gitlab.com/stefanha/qemu/commits/monitor-drain_call_rcu

That should have been https://gitlab.com/stefanha/qemu/-/commits/block. :)

Stefan

Re: [PATCH v3] iothread: Set the GSource "name" field

2023-09-05 Thread Stefan Hajnoczi

On Tue, Sep 05, 2023 at 03:03:59PM -0300, Fabiano Rosas wrote:
> Having a name in the source helps with debugging core dumps when one
> might not have access to TLS data to cross-reference AioContexts with
> their addresses.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> v3:
> used const
> v2:
> used g_autofree where appropriate
> ---
>  iothread.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)

Thanks, applied to my monitor-drain_call_rcu tree:
https://gitlab.com/stefanha/qemu/commits/monitor-drain_call_rcu

Stefan



Re: [Qemu-devel] [PATCH] ahci: enable pci bus master MemoryRegion before loading ahci engines

2023-09-05 Thread Michael S. Tsirkin

On Tue, Sep 10, 2019 at 10:08:20AM -0400, John Snow wrote:
> 
> 
> On 9/10/19 9:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote:
> >>
> >>
> >> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote:
>  If Windows 10 guests have enabled 'turn off hard disk after idle'
>  option in power settings, and the guest has a SATA disk plugged in,
>  the SATA disk will be turned off after a specified idle time.
>  If the guest is live migrated or saved/loaded with its SATA disk
>  turned off, the following error will occur:
> 
>  qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS 
>  receive buffer address
>  qemu-system-x86_64: Failed to load ich9_ahci:ahci
>  qemu-system-x86_64: error while loading state for instance 0x0 of device 
>  ':00:1a.0/ich9_ahci'
>  qemu-system-x86_64: load of migration failed: Operation not permitted
> 
>  Observation from trace logs shows that a while after Windows 10 turns off
>  a SATA disk (IDE disks don't have the following behavior),
>  it will disable the PCI_COMMAND_MASTER flag of the pci device containing
>  the ahci device. When the disk is turning back on,
>  the PCI_COMMAND_MASTER flag will be restored first.
>  But if the guest is migrated or saved/loaded while the disk is off,
>  the post_load callback of ahci device, ahci_state_post_load(), will fail
>  at ahci_cond_start_engines() if the MemoryRegion
>  pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing
>  to the PCIDevice struct containing the ahci device.
> 
>  This patch enables pci_dev->bus_master_enable_region before calling
>  ahci_cond_start_engines() in ahci_state_post_load(), and restore the
>  MemoryRegion to its original state afterwards.
> 
>  Signed-off-by: andychiu 
> >>>
> >>> Poking at PCI device internals like this seems fragile.  And force
> >>> enabling bus master can lead to unpleasantness like corrupting guest
> >>> memory, unhandled interrupts, etc.  E.g. it's quite reasonable,
> >>> spec-wise, for the guest to move things in memory around while bus
> >>> mastering is off.
> >>>
> >>> Can you teach ahci that region being disabled
> >>> during migration is ok, and recover from it?
> >>
> >> That's what I'm wondering.
> >>
> >> I could try to just disable the FIS RX engine if the mapping fails, but
> >> that will require a change to guest visible state.
> >>
> >> My hunch, though, is that when windows re-enables the device it will
> >> need to re-program the address registers anyway, so it might cope well
> >> with the FIS RX bit getting switched off.
> >>
> >> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this
> >> address in the first place. Is it legal that the PCI device has pci bus
> >> master disabled but we've held on to a mapping?
> > 
> > If you are poking at guest memory when bus master is off, then most likely 
> > yes.
> > 
> >> Should there be some
> >> callback where AHCI knows to invalidate mappings at that point...?)
> > 
> > ATM the callback is the config write, you check
> > proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER
> > and if disabled invalidate the mapping.
> > 
> > virtio at least has code that pokes at
> > proxy->pci_dev.config[PCI_COMMAND] too, I'm quite
> > open to a function along the lines of
> > pci_is_bus_master_enabled()
> > that will do this.
> > 
> 
> Well, that's not a callback. I don't think it's right to check the
> PCI_COMMAND register *every* time AHCI does anything at all to see if
> its mappings are still valid.
> 
> AHCI makes a mapping *once* when FIS RX is turned on, and it unmaps it
> when it's turned off. It assumes it remains valid that whole time. When
> we migrate, it checks to see if it was running, and performs the
> mappings again to re-boot the state machine.
> 
> What I'm asking is; what are the implications of a guest disabling
> PCI_COMMAND_MASTER? (I don't know PCI as well as you do.)

The implication is that the device must not initiate any reads or writes:
neither memory nor IO accesses, nor sending MSI. INT#x is unaffected.
Writes into device memory are unaffected. Whether reads from
device memory are affected kind of depends, but maybe not.

Whether device caches anything internally has nothing to do
with PCI_COMMAND_MASTER and PCI spec says nothing about it.
Windows uses PCI_COMMAND_MASTER to emulate surprise removal
so there's that.
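
The check discussed earlier in the thread (testing PCI_COMMAND_MASTER in the
device's config space) could look roughly like the sketch below. The helper
name follows the pci_is_bus_master_enabled() idea floated above, but this
standalone version with DEMO_* constants is hypothetical; it only assumes the
standard layout, with the 16-bit command register at offset 0x04 and the bus
master enable bit at 0x4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND         0x04  /* offset of the 16-bit command register */
#define DEMO_PCI_COMMAND_MASTER  0x4   /* bus master enable bit */

/* Returns true when the device is currently allowed to initiate DMA. */
bool demo_pci_is_bus_master_enabled(const uint8_t *config)
{
    assert(config);

    /* Config space is little-endian: low byte first. */
    uint16_t cmd = (uint16_t)(config[DEMO_PCI_COMMAND] |
                              (config[DEMO_PCI_COMMAND + 1] << 8));

    return (cmd & DEMO_PCI_COMMAND_MASTER) != 0;
}
```

A device model could consult such a helper before (re)establishing DMA
mappings, for example in a post_load hook, instead of force-enabling the bus
master region.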


> What should that mean for the AHCI state machine?
> 
> Does this *necessarily* invalidate the mappings?
> (In which case -- it's an error that AHCI held on to them after Windows
> disabled the card, even if AHCI isn't being engaged by the guest
> anymore. Essentially, we were turned off but didn't clean up a dangling
> pointer, but we need the event that tells us to clean the dangling mapping.)

It does not have to but it mus

Re: [PATCH v5 0/4] Virtio shared dma-buf

2023-09-05 Thread Michael S. Tsirkin

I was hoping for some acks from Gerd or anyone else with a clue
about graphics, but as that doesn't seem to happen I'll merge.
Thanks!

On Mon, Aug 21, 2023 at 02:37:56PM +0200, Albert Esteve wrote:
> Hi all,
> 
> A little bump for this patch, sorry for the extra noise.
> 
> Regards,
> Albert
> 
> 
> On Wed, Aug 2, 2023 at 11:08 AM Albert Esteve  wrote:
> 
> v1 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00598.html
> v2 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04530.html
> v3 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg06126.html
> v4 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg05174.html
> v4 -> v5:
> - Allow shared table to hold pointers for vhost devices, in a struct that
> defines the types that the table can store
> - New message VHOST_USER_GET_SHARED_OBJECT to retrieve objects stored in
> vhost backends
> - Minor additions to support the previous items (e.g. new test usecases).
> 
> This patch covers the required steps to add support for virtio cross-device
> resource sharing[1], support for which is already available in the kernel.
> 
> The main usecase will be sharing dma buffers from virtio-gpu devices (as the
> exporter, see VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID in [2]) to virtio-video
> (under discussion) devices (as the buffer-user or importer). Therefore, even
> though virtio specs talk about resources or objects[3], this patch adds the
> infrastructure with dma-bufs in mind.
> Note that virtio specs let the devices themselves define what a virtio
> object is.
> 
> These are the main parts that are covered in the patch:
> 
> - Add hash function to uuid module
> - Shared resources table, to hold all resources that can be shared in the
>   host and their assigned UUID, or pointers to the backend holding the
>   resource
> - Internal shared table API for virtio devices to add, lookup and remove
>   resources
> - Unit test to verify the API
> - New messages to the vhost-user protocol to allow backend to interact with
>   the shared table API through the control socket
> - New vhost-user feature bit to enable shared objects feature
> 
> Applies cleanly to 38a6de80b917b2a822cff0e38d83563ab401c890
> 
> [1] - https://lwn.net/Articles/828988/
> [2] - https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3730006
> [3] - https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-10500011
> 
> Albert Esteve (4):
>   uuid: add a hash function
>   virtio-dmabuf: introduce virtio-dmabuf
>   vhost-user: add shared_object msg
>   vhost-user: refactor send_resp code
> 
>  MAINTAINERS                               |   7 +
>  docs/interop/vhost-user.rst               |  57 +++
>  hw/display/meson.build                    |   1 +
>  hw/display/virtio-dmabuf.c                | 136 +
>  hw/virtio/vhost-user.c                    | 174 --
>  include/hw/virtio/vhost-backend.h         |   3 +
>  include/hw/virtio/virtio-dmabuf.h         | 103 +
>  include/qemu/uuid.h                       |   2 +
>  subprojects/libvhost-user/libvhost-user.c | 118 +++
>  subprojects/libvhost-user/libvhost-user.h |  55 ++-
>  tests/unit/meson.build                    |   1 +
>  tests/unit/test-uuid.c                    |  27 
>  tests/unit/test-virtio-dmabuf.c           | 137 +
>  util/uuid.c                               |  14 ++
>  14 files changed, 821 insertions(+), 14 deletions(-)
>  create mode 100644 hw/display/virtio-dmabuf.c
>  create mode 100644 include/hw/virtio/virtio-dmabuf.h
>  create mode 100644 tests/unit/test-virtio-dmabuf.c
> 
> --
> 2.40.0
> 
>

Re: PCI Hotplug ACPI device names only 3 characters long

2023-09-05 Thread Michael S. Tsirkin

On Tue, Sep 05, 2023 at 07:45:12PM +0200, Marcello Sylverster Bauer wrote:
> Hi Michael,
> 
> On 9/5/23 18:44, Michael S. Tsirkin wrote:
> > On Tue, Sep 05, 2023 at 05:05:33PM +0200, Marcello Sylverster Bauer wrote:
> > > Greetings,
> > > 
> > > I'm currently working on a project to support Intel IPU6 in QEMU via VFIO 
> > > so
> > > that the guest system can access the camera. This requires extending the
> > > ACPI device definition so that the guest knows how to access the camera.
> > > 
> > > However, I cannot extend the PCI devices because their names are not 4
> > > characters long and therefore do not follow the ACPI specification.
> > > 
> > > When I use '-acpitable' to include my own SSDT for the IPU6 PCI device, it
> > > does not allow me to declare the device as an External Object because it
> > > automatically adds padding underscores.
> > > 
> > > e.g.
> > > Before:
> > > ```
> > > External(_SB.PCI0.S18.SA0, DeviceObj)
> > > ```
> > > After:
> > > ```
> > > External(_SB.PCI0.S18_.SA0_, DeviceObj)
> > > ```
> > > 
> > > Adding the underscore padding is hard coded in iASL and also in QEMU when
> > > parsing an ASL file. (see: build_append_nameseg())
> > > 
> > > So here are my questions:
> > > 1. Is there a solution to extend the ACPI PCI device using '-acpitable'
> > > without having to patch iASL or QEMU?
> > > 2. Are there any plans to change the names to comply with the ACPI spec?
> > > (e.g. use "S%.03X" format string instead)
> > > 
> > > Thanks
> > > Marcello
> > 
> > 
> > 1.  All names in ACPI are always exactly 4 characters long. _ is a legal 
> > character
> >  but names beginning with _ are reserved.
> 
> Exactly, which is why I want to address this issue here. Currently, Qemu
> generates ACPI device names with only 3 characters. (See
> build_append_pci_bus_devices() in hw/i386/acpi-build.c).
> For example, the device I want to append entries to has the path
> "_SB.PCI0.S18.SA0", but I can't because of the two auto-generated devices
> with only 3 characters in their names.

They are 4 characters, otherwise OSPMs wouldn't work.
In your example the path is _SB.PCI0.S18_.SA0_ - your disassembler probably
just helpfully hides it for readability.

> > There's no rule in ACPI
> >  spec that says they need to follow S%.03X or any other specific format.
> > >  I'm pretty sure we do follow the ACPI specification in this but feel
> > >  free to prove me wrong.
> 
> You have misunderstood me. Currently, Qemu uses the following format to
> create PCI ACPI devices:
> 
> ```
> aml_name("S%.02X", devfn)
> ```
> 
> My question is whether we should change it to something that results in a 4
> character name like "S%.03X" or "S%.02X_".

I think you misunderstand the code. Look at build_append_nameseg and you will
see that the name is always ACPI_NAMESEG_LEN characters which equals 4.
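
To illustrate the padding, here is a minimal Python sketch. It is not QEMU
code: pad_nameseg() and pad_path() are hypothetical helpers that only mimic
the behaviour described above, assuming each name segment is right-padded
with underscores up to ACPI_NAMESEG_LEN characters.

```python
ACPI_NAMESEG_LEN = 4  # length of an ACPI name segment, as stated above


def pad_nameseg(seg: str) -> str:
    """Right-pad one name segment with '_' up to ACPI_NAMESEG_LEN characters."""
    if not 1 <= len(seg) <= ACPI_NAMESEG_LEN:
        raise ValueError(f"invalid name segment: {seg!r}")
    return seg.ljust(ACPI_NAMESEG_LEN, "_")


def pad_path(path: str) -> str:
    """Pad every dot-separated segment of a path (hypothetical helper)."""
    return ".".join(pad_nameseg(seg) for seg in path.split("."))


# The "S%.02X" format gives a 3-character string for devfn 0x18;
# padding turns it into the 4-character name the guest sees.
name = "S%.02X" % 0x18
print(name, "->", pad_nameseg(name))
print(pad_path("_SB.PCI0.S18.SA0"))
```

Under that assumption, the 3-character string from the format string and the
4-character name in the generated table refer to the same object.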

> I have tested it and it works fine as long as any hardcoded path references
> are adjusted. But I'm not 100% sure if this could cause any regressions.
> 
> > 2.  You can probably add something to existing ACPI devices using Scope().
> 
> I'm pretty sure the external object is required when loading a separate
> SSDT, but I'll try by just using scopes.
> 
> >  I would not advise relying on this - current names are not a stable
> >  interface that we guarantee across QEMU versions.
> > >  If adding this functionality is desirable, I think we'll need some new
> > >  interface to set a stable ACPI name. Maybe using aliases.
> 
> Currently I'm just working on a PoC to get IPU6 working in QEMU, so
> instability is fine.
> 
> Thanks,
> Marcello
> 
> > 
> >

[RFC Patch 4/5] hw/display: Allwinner A10 LCDC emulation

2023-09-05 Thread Strahinja Jankovic

This patch adds support for Allwinner A10 LCD controller.
Current emulation supports only RGB32 colorspace and interacts with
DEBE0 to obtain framebuffer address and screen size.

Signed-off-by: Strahinja Jankovic 
---
 hw/arm/allwinner-a10.c  |  10 +
 hw/display/allwinner-a10-lcdc.c | 275 
 hw/display/meson.build  |   1 +
 hw/display/trace-events |   5 +
 include/hw/arm/allwinner-a10.h  |   2 +
 include/hw/display/allwinner-a10-lcdc.h |  77 +++
 6 files changed, 370 insertions(+)
 create mode 100644 hw/display/allwinner-a10-lcdc.c
 create mode 100644 include/hw/display/allwinner-a10-lcdc.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index 624e95af46..f93bc5266d 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -41,6 +41,7 @@
 #define AW_A10_WDT_BASE 0x01c20c90
 #define AW_A10_RTC_BASE 0x01c20d00
 #define AW_A10_I2C0_BASE0x01c2ac00
+#define AW_A10_LCDC0_BASE   0x01c0c000
 #define AW_A10_HDMI_BASE0x01c16000
 #define AW_A10_GPU_BASE 0x01c4
 #define AW_A10_DE_BE0_BASE  0x01e6
@@ -101,6 +102,8 @@ static void aw_a10_init(Object *obj)
 
 object_initialize_child(obj, "hdmi", &s->hdmi, TYPE_AW_A10_HDMI);
 
+object_initialize_child(obj, "lcd0", &s->lcd0, TYPE_AW_A10_LCDC);
+
 object_initialize_child(obj, "de_be0", &s->de_be0, TYPE_AW_A10_DEBE);
 
 object_initialize_child(obj, "mali400", &s->gpu, TYPE_AW_GPU);
@@ -230,6 +233,13 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
 sysbus_realize(SYS_BUS_DEVICE(&s->de_be0), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->de_be0), 0, AW_A10_DE_BE0_BASE);
 
+/* LCD Controller */
+object_property_set_link(OBJECT(&s->lcd0), "debe",
+ OBJECT(&s->de_be0), &error_fatal);
+sysbus_realize(SYS_BUS_DEVICE(&s->lcd0), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->lcd0), 0, AW_A10_LCDC0_BASE);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->lcd0), 0, qdev_get_gpio_in(dev, 44));
+
 /* MALI GPU */
 sysbus_realize(SYS_BUS_DEVICE(&s->gpu), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpu), 0, AW_A10_GPU_BASE);
diff --git a/hw/display/allwinner-a10-lcdc.c b/hw/display/allwinner-a10-lcdc.c
new file mode 100644
index 00..8367ac32be
--- /dev/null
+++ b/hw/display/allwinner-a10-lcdc.c
@@ -0,0 +1,275 @@
+/*
+ * Allwinner A10 LCD Control Module emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "qemu/module.h"
+#include "hw/display/allwinner-a10-lcdc.h"
+#include "hw/irq.h"
+#include "ui/pixel_ops.h"
+#include "trace.h"
+#include "sysemu/dma.h"
+#include "framebuffer.h"
+
+/* LCDC register offsets */
+enum {
+REG_TCON_GCTL   = 0x, /* TCON Global control register */
+REG_TCON_GINT0  = 0x0004, /* TCON Global interrupt register 0 */
+};
+
+/* TCON_GCTL register fields */
+#define REG_TCON_GCTL_EN(1 << 31)
+
+/* TCON_GINT0 register fields */
+#define REG_TCON_GINT0_VB_INT_EN(1 << 31)
+#define REG_TCON_GINT0_VB_INT_FLAG  (1 << 14)
+
+#define REG_INDEX(offset)(offset / sizeof(uint32_t))
+
+static void allwinner_a10_lcdc_tick(void *opaque)
+{
+AwA10LcdcState *s = AW_A10_LCDC(opaque);
+
+if (s->regs[REG_INDEX(REG_TCON_GINT0)] & REG_TCON_GINT0_VB_INT_EN) {
+s->regs[REG_INDEX(REG_TCON_GINT0)] |= REG_TCON_GINT0_VB_INT_FLAG;
+qemu_irq_raise(s->irq);
+}
+}
+
+static uint64_t allwinner_a10_lcdc_read(void *opaque, hwaddr offset,
+   unsigned size)
+{
+AwA10LcdcState *s = AW_A10_LCDC(opaque);
+const uint32_t idx = REG_INDEX(offset);
+uint32_t val = s->regs[idx];
+
+switch (offset) {
+case 0x800 ... AW_A10_LCDC_IOSIZE:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n",
+  __func__, (uint32_t)offset);
+return 0;
+default:
+break;
+}
+
+trace_allwinner_a10_lcdc_read(offset, val);
+
+return val;
+}
+
+static void allwinner_a10_lcdc_write(void *opaque, hwaddr offset,
+   uint

[RFC Patch 5/5] hw/input: Add Allwinner-A10 PS2 emulation

2023-09-05 Thread Strahinja Jankovic

Add emulation for PS2-0 and PS2-1 for keyboard/mouse.

Signed-off-by: Strahinja Jankovic 
---
 hw/arm/allwinner-a10.c   |  18 ++
 hw/input/allwinner-a10-ps2.c | 345 +++
 hw/input/meson.build |   2 +
 include/hw/arm/allwinner-a10.h   |   3 +
 include/hw/input/allwinner-a10-ps2.h |  96 
 5 files changed, 464 insertions(+)
 create mode 100644 hw/input/allwinner-a10-ps2.c
 create mode 100644 include/hw/input/allwinner-a10-ps2.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index f93bc5266d..3d25dbb4e3 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -40,6 +40,8 @@
 #define AW_A10_SATA_BASE0x01c18000
 #define AW_A10_WDT_BASE 0x01c20c90
 #define AW_A10_RTC_BASE 0x01c20d00
+#define AW_A10_PS2_0_BASE   0x01c2a000
+#define AW_A10_PS2_1_BASE   0x01c2a400
 #define AW_A10_I2C0_BASE0x01c2ac00
 #define AW_A10_LCDC0_BASE   0x01c0c000
 #define AW_A10_HDMI_BASE0x01c16000
@@ -107,6 +109,12 @@ static void aw_a10_init(Object *obj)
 object_initialize_child(obj, "de_be0", &s->de_be0, TYPE_AW_A10_DEBE);
 
 object_initialize_child(obj, "mali400", &s->gpu, TYPE_AW_GPU);
+
+object_initialize_child(obj, "keyboard", &s->kbd,
+TYPE_AW_A10_PS2_KBD_DEVICE);
+
+object_initialize_child(obj, "mouse", &s->mouse,
+TYPE_AW_A10_PS2_MOUSE_DEVICE);
 }
 
 static void aw_a10_realize(DeviceState *dev, Error **errp)
@@ -243,6 +251,16 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
 /* MALI GPU */
 sysbus_realize(SYS_BUS_DEVICE(&s->gpu), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpu), 0, AW_A10_GPU_BASE);
+
+/* PS2-0 - keyboard */
+sysbus_realize(SYS_BUS_DEVICE(&s->kbd), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->kbd), 0, AW_A10_PS2_0_BASE);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->kbd), 0, qdev_get_gpio_in(dev, 62));
+
+/* PS2-1 - mouse */
+sysbus_realize(SYS_BUS_DEVICE(&s->mouse), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->mouse), 0, AW_A10_PS2_1_BASE);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->mouse), 0, qdev_get_gpio_in(dev, 63));
 }
 
 static void aw_a10_class_init(ObjectClass *oc, void *data)
diff --git a/hw/input/allwinner-a10-ps2.c b/hw/input/allwinner-a10-ps2.c
new file mode 100644
index 00..c4b09c0ea3
--- /dev/null
+++ b/hw/input/allwinner-a10-ps2.c
@@ -0,0 +1,345 @@
+/*
+ * Allwinner A10 PS2 Module emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/input/allwinner-a10-ps2.h"
+#include "hw/input/ps2.h"
+#include "hw/irq.h"
+
+/* PS2 register offsets */
+enum {
+REG_GCTL= 0x, /* Global Control Reg */
+REG_DATA= 0x0004, /* Data Reg */
+REG_LCTL= 0x0008, /* Line Control Reg */
+REG_LSTS= 0x000C, /* Line Status Reg */
+REG_FCTL= 0x0010, /* FIFO Control Reg */
+REG_FSTS= 0x0014, /* FIFO Status Reg */
+REG_CLKDR   = 0x0018, /* Clock Divider Reg */
+};
+
+#define REG_INDEX(offset)(offset / sizeof(uint32_t))
+
+/* PS2 register reset values */
+enum {
+REG_GCTL_RST= 0x0002,
+REG_DATA_RST= 0x,
+REG_LCTL_RST= 0x,
+REG_LSTS_RST= 0x0003,
+REG_FCTL_RST= 0x,
+REG_FSTS_RST= 0x0100,
+REG_CLKDR_RST   = 0x2F4F,
+};
+
+/* REG_GCTL Fields */
+#define FIELD_REG_GCTL_SOFT_RST (1 << 2)
+#define FIELD_REG_GCTL_INT_EN   (1 << 3)
+#define FIELD_REG_GCTL_INT_FLAG (1 << 4)
+
+/* REG_FCTL Fields */
+#define FIELD_REG_FCTL_RXRDY_IEN(1 << 0)
+#define FIELD_REG_FCTL_TXRDY_IEN(1 << 8)
+
+/* REG_FSTS Fields */
+#define FIELD_REG_FSTS_RX_RDY   (1 << 0)
+#define FIELD_REG_FSTS_TX_RDY   (1 << 8)
+#define FIELD_REG_FSTS_RX_LEVEL1(1 << 16)
+
+static int allwinner_a10_ps2_fctl_is_irq(AwA10PS2State *s)
+{
+return (s->regs[REG_INDEX(REG_FCTL)] & FIELD_REG_FCTL_TXRDY_IEN) ||
+(s->pending &&
+ (s->regs[REG_INDEX(REG_FCTL)] & FIELD_REG_FCTL_RXRDY_IEN));
+}
+
+static void allwinner_a10_ps2_update_irq(AwA10PS2State *s)
+{
+int level = (s->regs[

[RFC Patch 2/5] hw/display: Allwinner basic MALI GPU emulation

2023-09-05 Thread Strahinja Jankovic

This patch adds minimal MALI GPU emulation needed so emulated system
thinks GPU is working.

Signed-off-by: Strahinja Jankovic 
---
 hw/arm/allwinner-a10.c |   7 +
 hw/display/allwinner-gpu.c | 212 +
 hw/display/meson.build |   3 +-
 hw/display/trace-events|   4 +
 include/hw/arm/allwinner-a10.h |   2 +
 include/hw/display/allwinner-gpu.h |  64 +
 6 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 hw/display/allwinner-gpu.c
 create mode 100644 include/hw/display/allwinner-gpu.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index 2351d1a69b..75cd879d24 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -42,6 +42,7 @@
 #define AW_A10_RTC_BASE 0x01c20d00
 #define AW_A10_I2C0_BASE0x01c2ac00
 #define AW_A10_HDMI_BASE0x01c16000
+#define AW_A10_GPU_BASE 0x01c4
 
 void allwinner_a10_bootrom_setup(AwA10State *s, BlockBackend *blk)
 {
@@ -98,6 +99,8 @@ static void aw_a10_init(Object *obj)
 object_initialize_child(obj, "wdt", &s->wdt, TYPE_AW_WDT_SUN4I);
 
 object_initialize_child(obj, "hdmi", &s->hdmi, TYPE_AW_A10_HDMI);
+
+object_initialize_child(obj, "mali400", &s->gpu, TYPE_AW_GPU);
 }
 
 static void aw_a10_realize(DeviceState *dev, Error **errp)
@@ -217,6 +220,10 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
 /* HDMI */
 sysbus_realize(SYS_BUS_DEVICE(&s->hdmi), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->hdmi), 0, AW_A10_HDMI_BASE);
+
+/* MALI GPU */
+sysbus_realize(SYS_BUS_DEVICE(&s->gpu), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpu), 0, AW_A10_GPU_BASE);
 }
 
 static void aw_a10_class_init(ObjectClass *oc, void *data)
diff --git a/hw/display/allwinner-gpu.c b/hw/display/allwinner-gpu.c
new file mode 100644
index 00..735976d206
--- /dev/null
+++ b/hw/display/allwinner-gpu.c
@@ -0,0 +1,212 @@
+/*
+ * Allwinner GPU Module emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/display/allwinner-gpu.h"
+#include "trace.h"
+
+/* GPU register offsets - only the important ones. */
+enum {
+REG_MALI_GP_CMD = 0x0020,
+REG_MALI_GP_INT_RAWSTAT = 0x0024,
+REG_MALI_GP_VERSION = 0x006C,
+REG_MALI_GP_MMU_DTE = 0x3000,
+REG_MALI_GP_MMU_STATUS  = 0x3004,
+REG_MALI_GP_MMU_COMMAND = 0x3008,
+REG_MALI_PP0_MMU_DTE= 0x4000,
+REG_MALI_PP0_MMU_STATUS = 0x4004,
+REG_MALI_PP0_MMU_COMMAND= 0x4008,
+REG_MALI_PP0_VERSION= 0x9000,
+REG_MALI_PP0_CTRL   = 0x900C,
+REG_MALI_PP0_INT_RAWSTAT= 0x9020,
+};
+
+#define REG_INDEX(offset)(offset / sizeof(uint32_t))
+
+#define MALI_GP_VERSION_READ_VAL(0x0B07u << 16)
+#define MALI_PP0_VERSION_READ_VAL   (0xCD07u << 16)
+#define MALI_MMU_DTE_MASK   (0x0FFF)
+
+/* MALI_GP_CMD register fields */
+#define MALI_GP_CMD_SOFT_RESET(1 << 10)
+
+/* MALI_GP_INT_RAWSTAT register fields */
+#define MALI_GP_INT_RAWSTAT_RESET_COMPLETED (1 << 19)
+
+/* MALI_MMU_COMMAND values */
+enum {
+MALI_MMU_COMMAND_ENABLE_PAGING = 0,
+MALI_MMU_COMMAND_HARD_RESET= 6,
+};
+
+/* MALI_MMU_STATUS register fields */
+#define MALI_MMU_STATUS_PAGING_ENABLED  (1 << 0)
+
+/* MALI_PP_CTRL register fields */
+#define MALI_PP_CTRL_SOFT_RESET (1 << 7)
+
+/* MALI_PP_INT_RAWSTAT register fields */
+#define MALI_PP_INT_RAWSTAT_RESET_COMPLETED (1 << 12)
+
+static uint64_t allwinner_gpu_read(void *opaque, hwaddr offset,
+   unsigned size)
+{
+const AwGpuState *s = AW_GPU(opaque);
+const uint32_t idx = REG_INDEX(offset);
+uint32_t val = s->regs[idx];
+
+switch (offset) {
+case REG_MALI_GP_VERSION:
+val = MALI_GP_VERSION_READ_VAL;
+break;
+case REG_MALI_GP_MMU_DTE:
+case REG_MALI_PP0_MMU_DTE:
+val &= ~MALI_MMU_DTE_MASK;
+break;
+case REG_MALI_PP0_VERSION:
+val = MALI_PP0_VERSION_READ_VAL;
+break;
+case 0xF0B8 ... AW_GPU_IOSIZE:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n",
+

[RFC Patch 0/5] Allwinner A10 input/output peripherals

2023-09-05 Thread Strahinja Jankovic

This patch set adds minimal support for PS2 input and HDMI output for
Allwinner A10 and Cubieboard.

For the display part, minimal emulation of HDMI, MALI GPU, Display
Engine Backend and LCD controller is added.

For the PS2 both keyboard and mouse support is added and attached to the
two PS2 controllers present in Allwinner A10.

Functionality has been tested with custom Yocto image.

This is sent as RFC for now, since there might be some use cases which
have not been tested.


Strahinja Jankovic (5):
  hw/display: Allwinner A10 HDMI controller emulation
  hw/display: Allwinner basic MALI GPU emulation
  hw/display: Allwinner A10 Display Engine Backend emulation
  hw/display: Allwinner A10 LCDC emulation
  hw/input: Add Allwinner-A10 PS2 emulation

 hw/arm/allwinner-a10.c  |  51 
 hw/display/allwinner-a10-debe.c | 229 
 hw/display/allwinner-a10-hdmi.c | 214 +++
 hw/display/allwinner-a10-lcdc.c | 275 +++
 hw/display/allwinner-gpu.c  | 212 +++
 hw/display/meson.build  |   5 +
 hw/display/trace-events |  17 ++
 hw/input/allwinner-a10-ps2.c| 345 
 hw/input/meson.build|   2 +
 include/hw/arm/allwinner-a10.h  |  11 +
 include/hw/display/allwinner-a10-debe.h |  71 +
 include/hw/display/allwinner-a10-hdmi.h |  69 +
 include/hw/display/allwinner-a10-lcdc.h |  77 ++
 include/hw/display/allwinner-gpu.h  |  64 +
 include/hw/input/allwinner-a10-ps2.h|  96 +++
 15 files changed, 1738 insertions(+)
 create mode 100644 hw/display/allwinner-a10-debe.c
 create mode 100644 hw/display/allwinner-a10-hdmi.c
 create mode 100644 hw/display/allwinner-a10-lcdc.c
 create mode 100644 hw/display/allwinner-gpu.c
 create mode 100644 hw/input/allwinner-a10-ps2.c
 create mode 100644 include/hw/display/allwinner-a10-debe.h
 create mode 100644 include/hw/display/allwinner-a10-hdmi.h
 create mode 100644 include/hw/display/allwinner-a10-lcdc.h
 create mode 100644 include/hw/display/allwinner-gpu.h
 create mode 100644 include/hw/input/allwinner-a10-ps2.h

-- 
2.34.1

[RFC Patch 3/5] hw/display: Allwinner A10 Display Engine Backend emulation

2023-09-05 Thread Strahinja Jankovic

This patch adds Display Engine Backend 0 (DEBE0) support.
This peripheral will hold runtime configuration for the display size and
framebuffer offset which will be used by other components.

Signed-off-by: Strahinja Jankovic 
---
 hw/arm/allwinner-a10.c  |   9 +
 hw/display/allwinner-a10-debe.c | 229 
 hw/display/meson.build  |   3 +-
 hw/display/trace-events |   4 +
 include/hw/arm/allwinner-a10.h  |   2 +
 include/hw/display/allwinner-a10-debe.h |  71 
 6 files changed, 317 insertions(+), 1 deletion(-)
 create mode 100644 hw/display/allwinner-a10-debe.c
 create mode 100644 include/hw/display/allwinner-a10-debe.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index 75cd879d24..624e95af46 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -43,6 +43,7 @@
 #define AW_A10_I2C0_BASE0x01c2ac00
 #define AW_A10_HDMI_BASE0x01c16000
 #define AW_A10_GPU_BASE 0x01c4
+#define AW_A10_DE_BE0_BASE  0x01e6
 
 void allwinner_a10_bootrom_setup(AwA10State *s, BlockBackend *blk)
 {
@@ -100,6 +101,8 @@ static void aw_a10_init(Object *obj)
 
 object_initialize_child(obj, "hdmi", &s->hdmi, TYPE_AW_A10_HDMI);
 
+object_initialize_child(obj, "de_be0", &s->de_be0, TYPE_AW_A10_DEBE);
+
 object_initialize_child(obj, "mali400", &s->gpu, TYPE_AW_GPU);
 }
 
@@ -221,6 +224,12 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
 sysbus_realize(SYS_BUS_DEVICE(&s->hdmi), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->hdmi), 0, AW_A10_HDMI_BASE);
 
+/* Display Engine Backend */
+object_property_set_uint(OBJECT(&s->de_be0), "ram-base",
+ AW_A10_SDRAM_BASE, &error_fatal);
+sysbus_realize(SYS_BUS_DEVICE(&s->de_be0), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->de_be0), 0, AW_A10_DE_BE0_BASE);
+
 /* MALI GPU */
 sysbus_realize(SYS_BUS_DEVICE(&s->gpu), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpu), 0, AW_A10_GPU_BASE);
diff --git a/hw/display/allwinner-a10-debe.c b/hw/display/allwinner-a10-debe.c
new file mode 100644
index 00..3760728eab
--- /dev/null
+++ b/hw/display/allwinner-a10-debe.c
@@ -0,0 +1,229 @@
+/*
+ * Allwinner A10 Display Engine Backend emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/qdev-properties.h"
+#include "hw/display/allwinner-a10-debe.h"
+#include "trace.h"
+
+/* DEBE register offsets - only important ones */
+enum {
+REG_DEBE_MODCTL = 0x0800, /* DE mode control */
+REG_DEBE_DISSIZE= 0x0808, /* DE display size */
+REG_DEBE_LAY0FB_L32ADD  = 0x0850, /* DE Layer 0 lower 32-bit address */
+REG_DEBE_REGBUFFCTL = 0x0870, /* DE buffer control register */
+REG_DEBE_ATTCTL_REG1_L0 = 0x08A0, /* DE Layer 0 attribute ctrl reg 1 */
+};
+
+/* DEBE_DISSIZE fields */
+#define FIELD_DEBE_DISSIZE_DIS_HEIGHT   (16)
+#define FIELD_DEBE_DISSIZE_DIS_WIDTH(0)
+#define DEBE_DISSIZE_DIS_MASK   (0xu)
+
+/* DEBE_REGBUFFCTL fields */
+#define FIELD_DEBE_REGBUFFCTL_REGLOADCTL(1)
+#define FIELD_DEBE_REGBUFFCTL_REGAUTOLOAD_DIS   (2)
+
+/* DEBE_ATTCTL_REG1_L0 fields */
+#define FIELD_DEBE_ATTCTL_REG1_L0_LAY_FBFMT (8)
+#define DEBE_ATTCTL_REG1_L0_LAY_FBFMT_MASK  (0xFu)
+enum {
+ATTCTL_REG1_LAY_FBFMT_MONO_1BPP = 0,
+ATTCTL_REG1_LAY_FBFMT_MONO_2BPP,
+ATTCTL_REG1_LAY_FBFMT_MONO_4BPP,
+ATTCTL_REG1_LAY_FBFMT_MONO_8BPP,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_655,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_565,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_556,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_1555,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_5551,
+ATTCTL_REG1_LAY_FBFMT_COLOR_32BPP_P888,
+ATTCTL_REG1_LAY_FBFMT_COLOR_32BPP_,
+ATTCTL_REG1_LAY_FBFMT_COLOR_24BPP_888,
+ATTCTL_REG1_LAY_FBFMT_COLOR_16BPP_,
+};
+
+static uint8_t debe_lay_fbfmt_bpp[] = {
+1,
+2,
+4,
+8,
+16,
+16,
+16,
+16,
+16,
+32,
+32,
+24,
+16
+};
+
+#define REG_INDEX(offset)(offset / sizeof(uint32_t))
+
+static uint64_t allwinner_a

[RFC Patch 1/5] hw/display: Allwinner A10 HDMI controller emulation

2023-09-05 Thread Strahinja Jankovic

This patch adds basic Allwinner A10 HDMI controller support.
Emulated HDMI component will always show that a display is connected and
provide default EDID info.

Signed-off-by: Strahinja Jankovic 
---
 hw/arm/allwinner-a10.c  |   7 +
 hw/display/allwinner-a10-hdmi.c | 214 
 hw/display/meson.build  |   2 +
 hw/display/trace-events |   4 +
 include/hw/arm/allwinner-a10.h  |   2 +
 include/hw/display/allwinner-a10-hdmi.h |  69 
 6 files changed, 298 insertions(+)
 create mode 100644 hw/display/allwinner-a10-hdmi.c
 create mode 100644 include/hw/display/allwinner-a10-hdmi.h

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index b0ea3f7f66..2351d1a69b 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -41,6 +41,7 @@
 #define AW_A10_WDT_BASE 0x01c20c90
 #define AW_A10_RTC_BASE 0x01c20d00
 #define AW_A10_I2C0_BASE0x01c2ac00
+#define AW_A10_HDMI_BASE0x01c16000
 
 void allwinner_a10_bootrom_setup(AwA10State *s, BlockBackend *blk)
 {
@@ -95,6 +96,8 @@ static void aw_a10_init(Object *obj)
 object_initialize_child(obj, "rtc", &s->rtc, TYPE_AW_RTC_SUN4I);
 
 object_initialize_child(obj, "wdt", &s->wdt, TYPE_AW_WDT_SUN4I);
+
+object_initialize_child(obj, "hdmi", &s->hdmi, TYPE_AW_A10_HDMI);
 }
 
 static void aw_a10_realize(DeviceState *dev, Error **errp)
@@ -210,6 +213,10 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
 /* WDT */
 sysbus_realize(SYS_BUS_DEVICE(&s->wdt), &error_fatal);
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->wdt), 0, AW_A10_WDT_BASE, 1);
+
+/* HDMI */
+sysbus_realize(SYS_BUS_DEVICE(&s->hdmi), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->hdmi), 0, AW_A10_HDMI_BASE);
 }
 
 static void aw_a10_class_init(ObjectClass *oc, void *data)
diff --git a/hw/display/allwinner-a10-hdmi.c b/hw/display/allwinner-a10-hdmi.c
new file mode 100644
index 00..0f046e3cc7
--- /dev/null
+++ b/hw/display/allwinner-a10-hdmi.c
@@ -0,0 +1,214 @@
+/*
+ * Allwinner A10 HDMI Module emulation
+ *
+ * Copyright (C) 2023 Strahinja Jankovic 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "qemu/module.h"
+#include "hw/display/allwinner-a10-hdmi.h"
+#include "trace.h"
+
+/* HDMI register offsets */
+enum {
+REG_HPD = 0x000C, /* HDMI Hotplug detect */
+REG_DDC_CTRL= 0x0500, /* DDC Control */
+REG_DDC_SLAVE_ADDRESS   = 0x0504, /* DDC Slave address */
+REG_DDC_INT_STATUS  = 0x050C, /* DDC Interrupt status */
+REG_DDC_FIFO_CTRL   = 0x0510, /* DDC FIFO Control */
+REG_DDC_FIFO_ACCESS = 0x0518, /* DDC FIFO access */
+REG_DDC_COMMAND = 0x0520, /* DDC Command */
+};
+
+/* HPD register fields */
+#define FIELD_HPD_HOTPLUG_DET_HIGH  (1 << 0)
+
+/* DDC_CTRL register fields */
+#define FIELD_DDC_CTRL_SW_RST   (1 << 0)
+#define FIELD_DDC_CTRL_ACCESS_CMD_START (1 << 30)
+
+/* FIFO_CTRL register fields */
+#define FIELD_FIFO_CTRL_ADDRESS_CLEAR   (1 << 31)
+
+/* DDC_SLAVE_ADDRESS register fields */
+#define FIELD_DDC_SLAVE_ADDRESS_SEGMENT_SHIFT   (24)
+#define FIELD_DDC_SLAVE_ADDRESS_OFFSET_SHIFT(8)
+
+/* DDC_INT_STATUS register fields */
+#define FIELD_DDC_INT_STATUS_TRANSFER_COMPLETE  (1 << 0)
+
+/* DDC access command */
+enum {
+DDC_COMMAND_E_DDC_READ = 6,
+};
+
+
+
+#define REG_INDEX(offset)(offset / sizeof(uint32_t))
+
+static uint64_t allwinner_a10_hdmi_read(void *opaque, hwaddr offset,
+   unsigned size)
+{
+AwA10HdmiState *s = AW_A10_HDMI(opaque);
+const uint32_t idx = REG_INDEX(offset);
+uint32_t val = s->regs[idx];
+
+switch (offset) {
+case REG_HPD:
+val = FIELD_HPD_HOTPLUG_DET_HIGH;
+break;
+case REG_DDC_FIFO_ACCESS:
+val = s->edid_blob[s->edid_reg % sizeof(s->edid_blob)];
+s->edid_reg++;
+break;
+case 0x544 ... AW_A10_HDMI_IOSIZE:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n",
+  __func__, (uint32_t)offset);
+return 0;
+default:
+break;
+}
+
+trace_allwinner_a10_hdmi_read(

Re: [PATCH 65/67] ppc/kconfig: make SAM460EX depend on PPC & PIXMAN

2023-09-05 Thread Marc-André Lureau

Hi

On Wed, Aug 30, 2023 at 4:35 PM BALATON Zoltan  wrote:
>
> On Wed, 30 Aug 2023, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> >
> > SM501 is going to depend on PIXMAN next.
>
> Why is this patch needed when SM501 is the one that depends on PIXMAN and
> should pull in the dependency? Also what's the change in default.mak?

(see Paolo's answer)

> ati-vga also uses pixman and currently has no fall back.

Indeed, and it is disabled not by Kconfig but by meson:
system_ss.add(when: [pixman, 'CONFIG_ATI_VGA'], if_true:
files('ati.c', 'ati_2d.c', 'ati_dbg.c'))


> The sm501 already
> has fallback when pixman fails so could work without pixman too, see
> x-pixman property in sm501.c.

Correct, I have changed it to conditionally compile x-pixman related code.

thanks

[PATCH v1 4/7] qapi: fix example of cancel-vcpu-dirty-limit command

2023-09-05 Thread Victor Toso

Example output has extra end curly bracket. Remove it.

Signed-off-by: Victor Toso 
---
 qapi/migration.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..9385b9f87c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -2010,7 +2010,7 @@
 #
 # Example:
 #
-# -> {"execute": "cancel-vcpu-dirty-limit"},
+# -> {"execute": "cancel-vcpu-dirty-limit",
 # "arguments": { "cpu-index": 1 } }
 # <- { "return": {} }
 ##
-- 
2.41.0

[PATCH v1 7/7] qapi: fix example of NETDEV_STREAM_CONNECTED event

2023-09-05 Thread Victor Toso

Example output was using single quotes. Fix it.

Signed-off-by: Victor Toso 
---
 qapi/net.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index 313c8a606e..81988e499a 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -930,9 +930,9 @@
 #
 # Example:
 #
-# <- { 'event': 'NETDEV_STREAM_DISCONNECTED',
-#  'data': {'netdev-id': 'netdev0'},
-#  'timestamp': {'seconds': 1663330937, 'microseconds': 526695} }
+# <- { "event": "NETDEV_STREAM_DISCONNECTED",
+#  "data": {"netdev-id": "netdev0"},
+#  "timestamp": {"seconds": 1663330937, "microseconds": 526695} }
 ##
 { 'event': 'NETDEV_STREAM_DISCONNECTED',
   'data': { 'netdev-id': 'str' } }
-- 
2.41.0

[PATCH v1 1/7] qapi: scripts: add a generator for qapi's examples

2023-09-05 Thread Victor Toso

This generator has two goals:
 1. Mechanical validation of QAPI examples
 2. Generate the examples in a JSON format to be consumed for extra
validation.

The generator iterates over every Example section, parsing both server
and client messages. The generator prints any inconsistency found, for
example:

 |  Error: Extra data: line 1 column 39 (char 38)
 |  Location: cancel-vcpu-dirty-limit at qapi/migration.json:2017
 |  Data: {"execute": "cancel-vcpu-dirty-limit"},
 |  "arguments": { "cpu-index": 1 } }
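
As a rough illustration of that kind of check (a sketch only, not the
generator itself; check_example() is a hypothetical helper), the example
above can be fed to Python's json module, which reports the inconsistency:

```python
import json

# The example text from the error report above, and the same text with the
# stray closing bracket removed.
broken = '{"execute": "cancel-vcpu-dirty-limit"},\n"arguments": { "cpu-index": 1 } }'
fixed = '{"execute": "cancel-vcpu-dirty-limit",\n"arguments": { "cpu-index": 1 } }'


def check_example(text: str):
    """Return None if text is valid JSON, else the decoder's error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


print(check_example(broken))  # an "Extra data" error
print(check_example(fixed))   # None
```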

The generator will output another JSON file with all the examples in the
QAPI module that they came from. This can be used to validate the
introspection between QAPI/QMP to language bindings, for example:

 | { "examples": [
 |   {
 |     "id": "ksuxwzfayw",
 |     "client": [
 |       {
 |         "sequence-order": 1,
 |         "message-type": "command",
 |         "message":
 |           { "arguments":
 |               { "device": "scratch", "size": 1073741824 },
 |             "execute": "block_resize"
 |           }
 |       } ],
 |     "server": [
 |       {
 |         "sequence-order": 2,
 |         "message-type": "return",
 |         "message": { "return": {} }
 |       } ]
 |   }
 | ] }

Note that the order matters, as read by the Example section and
translated into "sequence-order". A language binding project can then
consume these files to marshal and unmarshal the messages, checking that
the results are as expected.
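
A consumer of such a file could look roughly like the sketch below. This is
an assumption about how a binding project might use the output, not part of
this patch: the sample data is hand-written in the shape described above,
and ordered_messages() and round_trips() are hypothetical helpers, with a
plain JSON encode/decode standing in for the binding's own marshalling.

```python
import json

# Hand-written sample in the shape described above (not real generator output).
sample = json.loads("""
{ "examples": [
  { "id": "ksuxwzfayw",
    "client": [
      { "sequence-order": 1,
        "message-type": "command",
        "message": { "execute": "block_resize",
                     "arguments": { "device": "scratch", "size": 1073741824 } } } ],
    "server": [
      { "sequence-order": 2,
        "message-type": "return",
        "message": { "return": {} } } ] } ] }
""")


def ordered_messages(example: dict) -> list:
    """Merge client and server messages back into their original order."""
    merged = example.get("client", []) + example.get("server", [])
    return sorted(merged, key=lambda m: m["sequence-order"])


def round_trips(message: dict) -> bool:
    """Stand-in for a binding's marshal/unmarshal: encode, decode, compare."""
    return json.loads(json.dumps(message)) == message


for example in sample["examples"]:
    for entry in ordered_messages(example):
        print(entry["sequence-order"], entry["message-type"],
              round_trips(entry["message"]))
```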

RFC discussion:
https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg04641.html

Signed-off-by: Victor Toso 
---
 scripts/qapi/dumpexamples.py | 194 +++
 scripts/qapi/main.py |   2 +
 2 files changed, 196 insertions(+)
 create mode 100644 scripts/qapi/dumpexamples.py

diff --git a/scripts/qapi/dumpexamples.py b/scripts/qapi/dumpexamples.py
new file mode 100644
index 00..c14ed11774
--- /dev/null
+++ b/scripts/qapi/dumpexamples.py
@@ -0,0 +1,194 @@
+"""
+Dump examples for Developers
+"""
+# Copyright (c) 2022 Red Hat Inc.
+#
+# Authors:
+#  Victor Toso 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+# Just for type hint on self
+from __future__ import annotations
+
+import os
+import json
+import random
+import string
+
+from typing import Dict, List, Optional
+
+from .schema import (
+QAPISchema,
+QAPISchemaType,
+QAPISchemaVisitor,
+QAPISchemaEnumMember,
+QAPISchemaFeature,
+QAPISchemaIfCond,
+QAPISchemaObjectType,
+QAPISchemaObjectTypeMember,
+QAPISchemaVariants,
+)
+from .source import QAPISourceInfo
+
+
+def gen_examples(schema: QAPISchema,
+ output_dir: str,
+ prefix: str) -> None:
+vis = QAPISchemaGenExamplesVisitor(prefix)
+schema.visit(vis)
+vis.write(output_dir)
+
+
+def get_id(random, size: int) -> str:
+    letters = string.ascii_lowercase
+    return ''.join(random.choice(letters) for i in range(size))
+
+
+def next_object(text, start, end, context) -> Dict:
+    # Start of json object
+    start = text.find("{", start)
+    end = text.rfind("}", start, end+1)
+
+    # try catch, pretty print issues
+    try:
+        ret = json.loads(text[start:end+1])
+    except Exception as e:
+        print("Error: {}\nLocation: {}\nData: {}\n".format(
+              str(e), context, text[start:end+1]))
+        return {}
+    else:
+        return ret
+
+
+def parse_text_to_dicts(text: str, context: str) -> List[Dict]:
+    examples, clients, servers = [], [], []
+
+    count = 1
+    c, s = text.find("->"), text.find("<-")
+    while c != -1 or s != -1:
+        if c == -1 or (s != -1 and s < c):
+            start, target = s, servers
+        else:
+            start, target = c, clients
+
+        # Find the client and server, if any
+        if c != -1:
+            c = text.find("->", start + 1)
+        if s != -1:
+            s = text.find("<-", start + 1)
+
+        # Find the limit of current's object.
+        # We first look for the next message, either client or server. If none
+        # is available, we set the end of the text as limit.
+        if c == -1 and s != -1:
+            end = s
+        elif c != -1 and s == -1:
+            end = c
+        elif c != -1 and s != -1:
+            end = (c < s) and c or s
+        else:
+            end = len(text) - 1
+
+        message = next_object(text, start, end, context)
+        if len(message) > 0:
+            message_type = "return"
+            if "execute" in message:
+                message_type = "command"
+            elif "event" in message:
+                message_type = "event"
+
+            target.append({
+                "sequence-order": count,
+                "message-type": message_type,
+                "message": message
+            })
+            count += 1
+
+    examples.append({"client": clients, "server": servers})
+    return examples
+
+
+def parse_examples_of(self: QAPISchemaGenExamplesVisitor,
+

[PATCH v1 2/7] qapi: fix example of get-win32-socket command

2023-09-05 Thread Victor Toso

Example output lacks double quotes. Fix it.

Fixes: 4cda177c60 "qmp: add 'get-win32-socket'"
Signed-off-by: Victor Toso 
---
 qapi/misc.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index cda2effa81..be302cadeb 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -290,7 +290,7 @@
 #
 # Example:
 #
-# -> { "execute": "get-win32-socket", "arguments": { "info": "abcd123..", fdname": "skclient" } }
+# -> { "execute": "get-win32-socket", "arguments": { "info": "abcd123..", "fdname": "skclient" } }
 # <- { "return": {} }
 ##
 { 'command': 'get-win32-socket', 'data': {'info': 'str', 'fdname': 'str'}, 'if': 'CONFIG_WIN32' }
-- 
2.41.0

[PATCH v1 0/7] Validate and test qapi examples

2023-09-05 Thread Victor Toso

Hi,

This is a follow-up to the RFC sent at the end of August 2022:
https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg04525.html

The generator code was rebased, without conflicts. The commit log was
improved as per Markus's suggestion [0], although I'm sure it can be
improved further.

To clarify, consuming the Examples as data for testing the qapi-go
work has been very helpful. I'm positive it can be of use for other
bindings in the future, besides keeping the examples functional.
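
For instance, the kind of defect fixed in this series shows up as soon as an
example is fed to a strict JSON parser. A minimal sketch, where the two
strings are adapted from the calc-dirty-rate example and is_valid_json() is
a hypothetical helper, not something the generator provides:

```python
import json


def is_valid_json(text):
    """Return True if text parses as strict JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Single-quoted property name, as in the example before the fix.
before = ('{"execute": "calc-dirty-rate", '
          '"arguments": {"calc-time": 1, \'sample-pages\': 512} }')
# Double-quoted property name, as in the example after the fix.
after = ('{"execute": "calc-dirty-rate", '
         '"arguments": {"calc-time": 1, "sample-pages": 512} }')

if __name__ == "__main__":
    print("before:", is_valid_json(before))
    print("after: ", is_valid_json(after))
```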

Cheers,

[0] https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg04682.html

Victor Toso (7):
  qapi: scripts: add a generator for qapi's examples
  qapi: fix example of get-win32-socket command
  qapi: fix example of dumpdtb command
  qapi: fix example of cancel-vcpu-dirty-limit command
  qapi: fix example of set-vcpu-dirty-limit command
  qapi: fix example of calc-dirty-rate command
  qapi: fix example of NETDEV_STREAM_CONNECTED event

 qapi/machine.json|   2 +-
 qapi/migration.json  |   6 +-
 qapi/misc.json   |   2 +-
 qapi/net.json|   6 +-
 scripts/qapi/dumpexamples.py | 194 +++
 scripts/qapi/main.py |   2 +
 6 files changed, 204 insertions(+), 8 deletions(-)
 create mode 100644 scripts/qapi/dumpexamples.py

-- 
2.41.0

[PATCH v1 6/7] qapi: fix example of calc-dirty-rate command

2023-09-05 Thread Victor Toso

Example output has property name with single quotes. Fix it.

Signed-off-by: Victor Toso 
---
 qapi/migration.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 2658cdbcbe..45dac41f67 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1922,7 +1922,7 @@
 # Example:
 #
 # -> {"execute": "calc-dirty-rate", "arguments": {"calc-time": 1,
-# 'sample-pages': 512} }
+# "sample-pages": 512} }
 # <- { "return": {} }
 ##
 { 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
-- 
2.41.0

[PATCH v1 5/7] qapi: fix example of set-vcpu-dirty-limit command

2023-09-05 Thread Victor Toso

Example output has extra end curly bracket. Remove it.

Signed-off-by: Victor Toso 
---
 qapi/migration.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 9385b9f87c..2658cdbcbe 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1986,7 +1986,7 @@
 #
 # Example:
 #
-# -> {"execute": "set-vcpu-dirty-limit"}
+# -> {"execute": "set-vcpu-dirty-limit",
 # "arguments": { "dirty-rate": 200,
 #"cpu-index": 1 } }
 # <- { "return": {} }
-- 
2.41.0

[PATCH v1 3/7] qapi: fix example of dumpdtb command

2023-09-05 Thread Victor Toso

Example output has an extra closing curly bracket. Replace it with a comma.

Signed-off-by: Victor Toso 
---
 qapi/machine.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index a08b6576ca..9eb76193e0 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1684,7 +1684,7 @@
 #
 # Example:
 #
-# -> { "execute": "dumpdtb" }
+# -> { "execute": "dumpdtb",
 #  "arguments": { "filename": "fdt.dtb" } }
 # <- { "return": {} }
 ##
-- 
2.41.0

[PATCH v3 1/1] migration: Allow user to specify available switchover bandwidth

2023-09-05 Thread Peter Xu

Migration bandwidth is a very important value for live migration, because
it is one of the major factors in deciding when to switch over to the
destination in a precopy process.

This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we are sending the data.  In an ideal world,
where we always feed unlimited data to the migration channel and are limited
only by the bandwidth that is available, this would be the most accurate
bandwidth.

However in reality it may be very different, e.g., over a 10Gbps network we
can see query-migrate showing migration bandwidth of only a few tens of
MB/s just because there are plenty of other things the migration thread
might be doing.  For example, the migration thread can be busy scanning
zero pages, or it can be fetching dirty bitmap from other external dirty
sources (like vhost or KVM).  It means we may not be pushing as much data
as possible to the migration channel, so the bandwidth estimated from "how
much data we sent in the channel" can sometimes be dramatically inaccurate.

That affects the switchover decision: with such a low estimated bandwidth,
QEMU assumes it cannot switch over within the downtime limit, when in
reality it can.

With that wrong estimation of bandwidth, the migration may not even converge
at all with the downtime specified, and it keeps iterating forever.

The issue is that QEMU itself may not be able to avoid those uncertainties
when measuring the real "available migration bandwidth".  At least not in
any way I can think of so far.

One way to fix this: when the user is fully aware of the available
bandwidth, we can allow the user to help by providing an accurate value.

For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified.  It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.

A new parameter "avail-switchover-bandwidth" is introduced just for this.
So when the user specifies this parameter, instead of trusting the
estimated value from QEMU itself (based on the QEMUFile send speed), QEMU
trusts the user and uses this value to decide when to switch over,
assuming that such bandwidth will be available then.

Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.

This can resolve issues such as a migration that never converges because
the detected "migration bandwidth" is ridiculously low for whatever reason.
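
To make the effect concrete, here is a simplified model of the switchover
decision. It is not the actual QEMU code: it only assumes that QEMU switches
over once the remaining data can be sent within the downtime limit at the
assumed bandwidth. The function name and all numbers are illustrative.

```python
def can_switchover(pending_bytes, bandwidth_bytes_per_sec, downtime_limit_ms):
    """Simplified model: can the remaining data be sent within the limit?"""
    if bandwidth_bytes_per_sec <= 0:
        return False
    expected_downtime_ms = pending_bytes * 1000 / bandwidth_bytes_per_sec
    return expected_downtime_ms <= downtime_limit_ms


if __name__ == "__main__":
    pending = 200 * 1024 * 1024      # 200 MiB of dirty data left (made up)
    estimated = 30 * 1024 * 1024     # 30 MiB/s, a pessimistic estimate
    dedicated = 10 * 1000**3 // 8    # dedicated 10Gbps link, in bytes/s
    limit_ms = 300                   # downtime limit (made up)

    # With the low estimate the model never decides to switch over ...
    print("estimated:", can_switchover(pending, estimated, limit_ms))
    # ... while the user-provided bandwidth allows it.
    print("dedicated:", can_switchover(pending, dedicated, limit_ms))
```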

Reported-by: Zhiyi Guo 
Signed-off-by: Peter Xu 
---
 qapi/migration.json| 11 +++
 migration/migration.h  |  2 +-
 migration/options.h|  2 ++
 migration/migration-hmp-cmds.c | 14 ++
 migration/migration.c  | 24 +---
 migration/options.c| 25 +
 migration/trace-events |  2 +-
 7 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index eeb1878c4f..49c36ec9c0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -766,6 +766,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 # in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+# migration can use during switchover phase.  NOTE!  This does not
+# limit the bandwidth during switchover, but only for calculations when
+# making decisions to switchover.  By default, this value is zero,
+# which means QEMU will estimate the bandwidth automatically.  This can
+# be set when the estimated value is not accurate, while the user is
+# able to guarantee such bandwidth is available when switching over.
+# When specified correctly, this can make the switchover decision much
+# more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 # maximum downtime in milliseconds (Since 2.8)
 #
@@ -856,6 +866,7 @@
 '*tls-hostname': 'StrOrNull',
 '*tls-authz': 'StrOrNull',
 '*max-bandwidth': 'size',
+'*avail-switchover-bandwidth': 'size',
 '*downtime-limit': 'uint64',
 '*x-checkpoint-delay': { 'type': 'uint32',
  'features': [ 'unstable' ] },
diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..ce910c1db2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -283,7 +283,7 @@ struct MigrationState {
 /*
  * The final stage happ

[PATCH v3 0/1] migration: Add avail-switchover-bandwidth parameter

2023-09-05 Thread Peter Xu

This single-patch series is based on:

[PATCH v3 0/4] qapi/migration: Dedup migration parameter objects and fix tls-authz crash
Based-on: <20230905162335.235619-1-pet...@redhat.com>

I still added a cover letter to make sure the "Based-on" will be parsed all
right for e.g. patchew.

v3:
- Rebased to above patchset, dropped the 1st patch
- Renamed the parameter from "max-switchover-bandwidth" to
  "avail-switchover-bandwidth"
- Fixed calculation [Joao]

For more information on the new parameter and why we need it, please read
commit message in the patch.

Please have a look, thanks.

Peter Xu (1):
  migration: Allow user to specify available switchover bandwidth

 qapi/migration.json| 11 +++
 migration/migration.h  |  2 +-
 migration/options.h|  2 ++
 migration/migration-hmp-cmds.c | 14 ++
 migration/migration.c  | 24 +---
 migration/options.c| 25 +
 migration/trace-events |  2 +-
 7 files changed, 75 insertions(+), 5 deletions(-)

-- 
2.41.0

Re: [PATCH v3] iothread: Set the GSource "name" field

2023-09-05 Thread Philippe Mathieu-Daudé


On 5/9/23 20:03, Fabiano Rosas wrote:

Having a name in the source helps with debugging core dumps when one
might not have access to TLS data to cross-reference AioContexts with
their addresses.

Signed-off-by: Fabiano Rosas 
---
v3:
used const
v2:
used g_autofree where appropriate
---
  iothread.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH v3] iothread: Set the GSource "name" field

2023-09-05 Thread Fabiano Rosas

Having a name in the source helps with debugging core dumps when one
might not have access to TLS data to cross-reference AioContexts with
their addresses.

Signed-off-by: Fabiano Rosas 
---
v3:
used const
v2:
used g_autofree where appropriate
---
 iothread.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/iothread.c b/iothread.c
index b41c305bd9..b753286414 100644
--- a/iothread.c
+++ b/iothread.c
@@ -138,12 +138,14 @@ static void iothread_instance_finalize(Object *obj)
 qemu_sem_destroy(&iothread->init_done_sem);
 }
 
-static void iothread_init_gcontext(IOThread *iothread)
+static void iothread_init_gcontext(IOThread *iothread, const char *thread_name)
 {
 GSource *source;
+g_autofree char *name = g_strdup_printf("%s aio-context", thread_name);
 
 iothread->worker_context = g_main_context_new();
 source = aio_get_g_source(iothread_get_aio_context(iothread));
+g_source_set_name(source, name);
 g_source_attach(source, iothread->worker_context);
 g_source_unref(source);
 iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
@@ -180,7 +182,7 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 {
 Error *local_error = NULL;
 IOThread *iothread = IOTHREAD(base);
-char *thread_name;
+g_autofree char *thread_name = NULL;
 
 iothread->stopping = false;
 iothread->running = true;
@@ -189,11 +191,14 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 return;
 }
 
+thread_name = g_strdup_printf("IO %s",
+object_get_canonical_path_component(OBJECT(base)));
+
 /*
  * Init one GMainContext for the iothread unconditionally, even if
  * it's not used
  */
-iothread_init_gcontext(iothread);
+iothread_init_gcontext(iothread, thread_name);
 
 iothread_set_aio_context_params(base, &local_error);
 if (local_error) {
@@ -206,11 +211,8 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 /* This assumes we are called from a thread with useful CPU affinity for us
  * to inherit.
  */
-thread_name = g_strdup_printf("IO %s",
-object_get_canonical_path_component(OBJECT(base)));
 qemu_thread_create(&iothread->thread, thread_name, iothread_run,
iothread, QEMU_THREAD_JOINABLE);
-g_free(thread_name);
 
 /* Wait for initialization to complete */
 while (iothread->thread_id == -1) {
-- 
2.35.3

Re: [PATCH 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space

2023-09-05 Thread Alex Williamson

On Mon,  4 Sep 2023 10:03:43 +0200
Eric Auger  wrote:

> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
> 
> This is due to the fact the virtio-iommu currently unconditionally
> exposes an IOVA range of 64b through its config input range fields.
> 
> This series removes this assumption by retrieving the usable IOVA
> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> a VFIO device is attached. This info is communicated to the
> virtio-iommu memory region, transformed into the inverse info, i.e.
> the host reserved IOVA regions. Those are then combined with the
> reserved IOVA regions set through the virtio-iommu reserved-regions
> property. That way, the guest virtio-iommu driver, unchanged, is
> able to probe the whole set of reserved regions and prevent any IOVA
> belonging to those ranges from being used, achieving the original goal.

Hi Eric,

I don't quite follow this relative to device hotplug.  Are we
manipulating a per-device memory region which is created at device add
time?  Is that memory region actually shared in some cases, for instance
if we have a PCIe-to-PCI bridge aliasing devices on the conventional
side?  Thanks,

Alex

> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v1
> 
> Eric Auger (13):
>   memory: Let ReservedRegion use Range
>   memory: Introduce memory_region_iommu_set_iova_ranges
>   vfio: Collect container iova range info
>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>   range: Introduce range_inverse_array()
>   virtio-iommu: Implement set_iova_ranges() callback
>   range: Make range_compare() public
>   util/reserved-region: Add new ReservedRegion helpers
>   virtio-iommu: Consolidate host reserved regions and property set ones
>   test: Add some tests for range and resv-mem helpers
>   virtio-iommu: Resize memory region according to the max iova info
>   vfio: Remove 64-bit IOVA address space assumption
> 
>  include/exec/memory.h|  30 -
>  include/hw/vfio/vfio-common.h|   2 +
>  include/hw/virtio/virtio-iommu.h |   7 +-
>  include/qemu/range.h |   9 ++
>  include/qemu/reserved-region.h   |  32 +
>  hw/core/qdev-properties-system.c |   9 +-
>  hw/vfio/common.c |  70 ---
>  hw/virtio/virtio-iommu-pci.c |   8 +-
>  hw/virtio/virtio-iommu.c |  85 +++--
>  softmmu/memory.c |  15 +++
>  tests/unit/test-resv-mem.c   | 198 +++
>  util/range.c |  41 ++-
>  util/reserved-region.c   |  94 +++
>  hw/virtio/trace-events   |   1 +
>  tests/unit/meson.build   |   1 +
>  util/meson.build |   1 +
>  16 files changed, 562 insertions(+), 41 deletions(-)
>  create mode 100644 include/qemu/reserved-region.h
>  create mode 100644 tests/unit/test-resv-mem.c
>  create mode 100644 util/reserved-region.c
>

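The inversion step described in the quoted cover letter (usable IOVA ranges
turned into host reserved regions) can be sketched as follows. This is only
an illustration of the idea, not the series' range_inverse_array(); the
function name invert_ranges and the sample ranges are assumptions.

```python
def invert_ranges(usable, low=0, high=2**64 - 1):
    """Return the gaps in [low, high] not covered by inclusive usable ranges."""
    reserved = []
    cursor = low
    for start, end in sorted(usable):
        if start > cursor:
            # Everything between the previous range and this one is reserved.
            reserved.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= high:
        # Tail of the address space beyond the last usable range.
        reserved.append((cursor, high))
    return reserved


if __name__ == "__main__":
    # Hypothetical host: 48-bit IOVA space with one hole (values made up).
    usable = [(0x0, 0xFEDFFFFF), (0xFEF00000, 2**48 - 1)]
    for start, end in invert_ranges(usable):
        print(hex(start), hex(end))
```

With a 48-bit host the last reserved region covers everything from 2^48 up
to the end of the 64-bit space, which is what keeps the guest from
allocating IOVAs the physical IOMMU cannot map.
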
Re: [PATCH] iothread: Set the GSource "name" field

2023-09-05 Thread Stefan Hajnoczi

On Tue, 5 Sept 2023 at 12:51, Philippe Mathieu-Daudé  wrote:
>
> On 5/9/23 17:45, Peter Xu wrote:
> > On Mon, Sep 04, 2023 at 11:48:11AM -0300, Fabiano Rosas wrote:
> >> @@ -189,11 +193,14 @@ static void iothread_init(EventLoopBase *base, Error 
> >> **errp)
> >>   return;
> >>   }
> >>
> >> +thread_name = g_strdup_printf("IO %s",
> >> +
> >> object_get_canonical_path_component(OBJECT(base)));
> >> +
> >>   /*
> >>* Init one GMainContext for the iothread unconditionally, even if
> >>* it's not used
> >>*/
> >> -iothread_init_gcontext(iothread);
> >> +iothread_init_gcontext(iothread, thread_name);
> >>
> >>   iothread_set_aio_context_params(base, &local_error);
> >>   if (local_error) {
> >
> > I think thread_name might be leaked if error here.  Thanks,
>
> Oops, good catch. Better switch to g_autofree.

Yes, please.

And also make the iothread_init_gcontext(..., char *thread_name)
argument const char * to indicate that the name is owned by the caller
and not modified by iothread_init_gcontext().

Thanks,
Stefan

[PATCH v2] iothread: Set the GSource "name" field

2023-09-05 Thread Fabiano Rosas

Having a name in the source helps with debugging core dumps when one
might not have access to TLS data to cross-reference AioContexts with
their addresses.

Signed-off-by: Fabiano Rosas 
---
v2:
used g_autofree where appropriate
---
 iothread.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/iothread.c b/iothread.c
index b41c305bd9..78ac1153ac 100644
--- a/iothread.c
+++ b/iothread.c
@@ -138,12 +138,14 @@ static void iothread_instance_finalize(Object *obj)
 qemu_sem_destroy(&iothread->init_done_sem);
 }
 
-static void iothread_init_gcontext(IOThread *iothread)
+static void iothread_init_gcontext(IOThread *iothread, char *thread_name)
 {
 GSource *source;
+g_autofree char *name = g_strdup_printf("%s aio-context", thread_name);
 
 iothread->worker_context = g_main_context_new();
 source = aio_get_g_source(iothread_get_aio_context(iothread));
+g_source_set_name(source, name);
 g_source_attach(source, iothread->worker_context);
 g_source_unref(source);
 iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
@@ -180,7 +182,7 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 {
 Error *local_error = NULL;
 IOThread *iothread = IOTHREAD(base);
-char *thread_name;
+g_autofree char *thread_name = NULL;
 
 iothread->stopping = false;
 iothread->running = true;
@@ -189,11 +191,14 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 return;
 }
 
+thread_name = g_strdup_printf("IO %s",
+object_get_canonical_path_component(OBJECT(base)));
+
 /*
  * Init one GMainContext for the iothread unconditionally, even if
  * it's not used
  */
-iothread_init_gcontext(iothread);
+iothread_init_gcontext(iothread, thread_name);
 
 iothread_set_aio_context_params(base, &local_error);
 if (local_error) {
@@ -206,11 +211,8 @@ static void iothread_init(EventLoopBase *base, Error **errp)
 /* This assumes we are called from a thread with useful CPU affinity for us
  * to inherit.
  */
-thread_name = g_strdup_printf("IO %s",
-object_get_canonical_path_component(OBJECT(base)));
 qemu_thread_create(&iothread->thread, thread_name, iothread_run,
iothread, QEMU_THREAD_JOINABLE);
-g_free(thread_name);
 
 /* Wait for initialization to complete */
 while (iothread->thread_id == -1) {
-- 
2.35.3

Re: PCI Hotplug ACPI device names only 3 characters long

2023-09-05 Thread Marcello Sylverster Bauer


Hi Michael,

On 9/5/23 18:44, Michael S. Tsirkin wrote:

On Tue, Sep 05, 2023 at 05:05:33PM +0200, Marcello Sylverster Bauer wrote:

Greetings,

I'm currently working on a project to support Intel IPU6 in QEMU via VFIO so
that the guest system can access the camera. This requires extending the
ACPI device definition so that the guest knows how to access the camera.

However, I cannot extend the PCI devices because their names are not 4
characters long and therefore do not follow the ACPI specification.

When I use '-acpitable' to include my own SSDT for the IPU6 PCI device, it
does not allow me to declare the device as an External Object because it
automatically adds padding underscores.

e.g.
Before:
```
External(_SB.PCI0.S18.SA0, DeviceObj)
```
After:
```
External(_SB.PCI0.S18_.SA0_, DeviceObj)
```

Adding the underscore padding is hard coded in iASL and also in QEMU when
parsing an ASL file. (see: build_append_nameseg())

So here are my questions:
1. Is there a solution to extend the ACPI PCI device using '-acpitable'
without having to patch iASL or QEMU?
2. Are there any plans to change the names to comply with the ACPI spec?
(e.g. use "S%.03X" format string instead)

Thanks
Marcello



1.  All names in ACPI are always exactly 4 characters long. _ is a legal
 character but names beginning with _ are reserved.


Exactly, which is why I want to address this issue here. Currently, QEMU 
generates ACPI device names with only 3 characters. (See 
build_append_pci_bus_devices() in hw/i386/acpi-build.c).
For example, the device I want to append entries to has the path 
"_SB.PCI0.S18.SA0", but I can't because of the two auto-generated 
devices with only 3 characters in their names.



There's no rule in ACPI
 spec that says they need to follow S%.03X or any other specific format.
 I'm pretty sure we do follow the ACPI specification in this but feel free
 to prove me wrong.


You have misunderstood me. Currently, QEMU uses the following format to 
create PCI ACPI devices:


```
aml_name("S%.02X", devfn)
```

My question is whether we should change it to something that results in 
a 4 character name like "S%.03X" or "S%.02X_".
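
To illustrate the difference between the format strings, here is a small
sketch of the name generation and of the underscore padding. It only mimics
what the thread describes; nameseg() and pci_slot_name() are hypothetical
helpers, not QEMU or iASL functions.

```python
def nameseg(name):
    """Pad an ACPI name segment to 4 characters with trailing underscores."""
    if not 1 <= len(name) <= 4:
        raise ValueError("ACPI name segments are 1 to 4 characters long")
    return name.ljust(4, "_")


def pci_slot_name(devfn, fmt="S%.02X"):
    """Format a PCI slot name the way the format strings above would."""
    return fmt % devfn


if __name__ == "__main__":
    devfn = 0x18
    current = pci_slot_name(devfn)              # 3 characters, gets padded
    proposed = pci_slot_name(devfn, "S%.03X")   # already 4 characters
    print(current, "->", nameseg(current))
    print(proposed, "->", nameseg(proposed))
    # Every segment of a path is padded independently.
    print(".".join(nameseg(s) for s in "_SB.PCI0.S18.SA0".split(".")))
```

With the current format the 3-character name only becomes 4 characters
through padding, which is where the trailing underscores in the External()
declaration come from.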


I have tested it and it works fine as long as any hardcoded path 
references are adjusted. But I'm not 100% sure if this could cause any 
regressions.



2.  You can probably add something to existing ACPI devices using Scope().


I'm pretty sure the external object is required when loading a separate 
SSDT, but I'll try by just using scopes.



 I would not advise relying on this - current names are not a stable
 interface that we guarantee across QEMU versions.
 If adding this functionality is desirable, I think we'll need some new
 interface to set a stable ACPI name. Maybe using aliases.


Currently I'm just working on a PoC to get IPU6 working in QEMU, so 
instability is fine.


Thanks,
Marcello

Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth

2023-09-05 Thread Daniel P . Berrangé

On Tue, Sep 05, 2023 at 12:46:03PM -0400, Peter Xu wrote:
> On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
> > > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > > one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth
> > > > to 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
> > > 
> > > Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps
> > > over a 10Gbps network, in the completion stage will it send the
> > > remaining data in 5Gbps using downtime_limit time or in 10Gbps (saturate
> > > the network) using the downtime_limit / 2 time? Seems this parameter
> > > won't rate limit the final stage:)
> > 
> > Effectively the mgmt app is telling QEMU to assume that this
> > much bandwidth is available for use during switchover. If QEMU
> > determines that, given this available bandwidth, the remaining
> > data can be sent over the link within the downtime limit, it
> > will perform the switchover. When sending this switchover data,
> > it will actually transmit the data at full line rate IIUC.
> 
> I'm right at reposting this patch, but then I found that the
> max-available-bandwidth is indeed confusing (as Lei's question shows).
> 
> We do have all the bandwidth throttling values in the pattern of
> max-*-bandwidth and this one will start to be the outlier that it won't
> really throttle the network.
> 
> If the old name "available-bandwidth" is too general, I'm now considering
> "avail-switchover-bandwidth" just to leave max- out of the name to
> differentiate, if some day we want to add a real throttle for switchover we
> can still have a sane name.
> 
> Any objections before I repost?

I think the 'avail-' prefix is good given the confusion Lei pointed out.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v5 8/8] migration: Add a wrapper to cleanup migration files

2023-09-05 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Sep 01, 2023 at 03:29:51PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Thu, Aug 31, 2023 at 03:39:16PM -0300, Fabiano Rosas wrote:
>> >> @@ -1166,16 +1183,9 @@ static void migrate_fd_cleanup(MigrationState *s)
>> >>  qemu_mutex_lock_iothread();
>> >>  
>> >>  multifd_save_cleanup();
>> >> -qemu_mutex_lock(&s->qemu_file_lock);
>> >> -tmp = s->to_dst_file;
>> >> -s->to_dst_file = NULL;
>> >> -qemu_mutex_unlock(&s->qemu_file_lock);
>> >> -/*
>> >> - * Close the file handle without the lock to make sure the
>> >> - * critical section won't block for long.
>> >> - */
>> >> -migration_ioc_unregister_yank_from_file(tmp);
>> >> -qemu_fclose(tmp);
>> >> +
>> >> +migration_ioc_unregister_yank_from_file(s->to_dst_file);
>> >
>> > I think you suggested that we should always take the file lock when
>> > operating on them, so this is slightly going backwards to not hold any lock
>> > when doing it. But doing so in migrate_fd_cleanup() is probably fine (as it
>> > serializes with bql on all the rest qmp commands, neither should migration
>> > thread exist at this point).  Your call; it's still much cleaner.
>> 
>> I think I was mistaken. We need the lock on the thread that clears the
>> pointer so that we can safely dereference it on another thread under the
>> lock.
>> 
>> Here we're accessing it from the same thread that later does the
>> clearing. So that's a slightly different problem.
>
> But this is not the only place to clear it, so you still need to justify
> why the other call sites (e.g., postcopy_pause()) won't happen in parallel
> with this call site.
>
> The good thing about your proposal (of always taking that lock) is we avoid
> those justifications, as you said before. :)
>

Yes, I should probably try harder to keep it under the lock.

The issue is that without using the QIOChannel reference count or
keeping a flag there's no way to pair the register/unregister of the
yank. Because 1) we'll never be sure whether the yank was previously
registered when calling the unregister and 2) we don't store the ioc, so
we need to access it from the QEMUFile, but then several QEMUFiles can
have the same ioc.

The easiest way to keep it under the lock would be to add a flag:

migration_file_release(QEMUFile **file, bool unregister_yank);

... and only set it when we're sure the yank has been registered. It is
still a bit hand-wavy though.

Re: [PATCH] hw/intc/arm_gicv3: Simplify gicv3_class_name() logic

2023-09-05 Thread Richard Henderson


On 9/5/23 07:56, Philippe Mathieu-Daudé wrote:

Simplify gicv3_class_name() logic. No functional change intended.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/intc/arm_gicv3_common.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v3 07/20] virtio: add vhost-user-base and a generic vhost-user-device

2023-09-05 Thread Alex Bennée



Matias Ezequiel Vara Larsen  writes:

> On Mon, Jul 10, 2023 at 04:35:09PM +0100, Alex Bennée wrote:
>> In theory we shouldn't need to repeat so much boilerplate to support
>> vhost-user backends. This provides a generic vhost-user-base QOM
>> object and a derived vhost-user-device for which the user needs to
>> provide the few bits of information that aren't currently provided by
>> the vhost-user protocol. This should provide a baseline implementation
>> from which the other vhost-user stub can specialise.
>> 
>> Signed-off-by: Alex Bennée 
>> 
>> ---
>> v2
>>   - split into vub and vud

>> +
>> +/*
>> + * Disable guest notifiers, by default all notifications will be via the
>> + * asynchronous vhost-user socket.
>> + */
>> +vdev->use_guest_notifier_mask = false;
>> +
>> +/* Allocate queues */
>> +vub->vqs = g_ptr_array_sized_new(vub->num_vqs);
>> +for (int i = 0; i < vub->num_vqs; i++) {
>> +g_ptr_array_add(vub->vqs,
>> +virtio_add_queue(vdev, 4, vub_handle_output));
>> +}
>> +
>
> Hello Alex, apologies if someone already asked this. If I understand
> correctly, the second parameter of virtio_add_queue() is the len of the
> queue. Why have you chosen "4" as its value? Shall qemu query the len of
> the queue from the vhost-user device instead?

Hmm yeah that is inherited from the virtio-rng backend which has a
pretty short queue. I don't think it is intrinsic to the device
implementation (although I guess that depends if a device will have
multiple requests in flight).

I propose making it some useful power of 2 (like 64) and adding a config
knob to increase it if needed.
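
A check for such a knob could look roughly like this. It is only a sketch:
the helper name is made up, and the default upper bound of 1024 is an
assumption meant to mirror QEMU's VIRTQUEUE_MAX_SIZE.

```python
def is_valid_queue_size(size, max_size=1024):
    """Accept only non-zero powers of two up to max_size (assumed bound)."""
    return 0 < size <= max_size and (size & (size - 1)) == 0


if __name__ == "__main__":
    for candidate in (4, 48, 64, 2048):
        print(candidate, is_valid_queue_size(candidate))
```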

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 1/2] hw/cxl: Add utility functions decoder interleave ways and target count.

2023-09-05 Thread Philippe Mathieu-Daudé


On 5/9/23 17:06, Jonathan Cameron wrote:

On Tue, 5 Sep 2023 15:56:39 +0100
Jonathan Cameron via  wrote:


On Mon, 4 Sep 2023 20:26:59 +0200
Philippe Mathieu-Daudé  wrote:


On 4/9/23 18:47, Jonathan Cameron wrote:

As an encoded version of these key configuration parameters is
a register, provide functions to extract it again so as to avoid
the need for duplicating the storage.

Signed-off-by: Jonathan Cameron 
---
   include/hw/cxl/cxl_component.h | 14 ++
   hw/cxl/cxl-component-utils.c   | 17 +
   2 files changed, 31 insertions(+)

diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 42c7e581a7..f0ad9cf7de 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -238,7 +238,21 @@ static inline int cxl_decoder_count_enc(int count)
   return 0;
   }
   
+static inline int cxl_decoder_count_dec(int enc_cnt)

+{
+switch (enc_cnt) {
+case 0: return 1;
+case 1: return 2;
+case 2: return 4;
+case 3: return 6;
+case 4: return 8;
+case 5: return 10;
+}
+return 0;
+}


Why inline?
   


Bad habit.

Nope. I'm being slow.  This is in a header so if I don't
mark it inline I get a bunch of defined but not used warnings.

Obviously I could move the implementation of this and the matching
encoding routines out of the header. I haven't done so for now.


Inlined functions in hw/ are hardly justifiable. They make the headers
and debugging sessions harder to read in my experience. Compilers are
becoming cleverer and cleverer, and we have LTO, so I'd rather favour
code maintainability. My 2 cents :)


Alternatively:

unsigned cxl_decoder_count_dec(unsigned enc_cnt)
{
    /* Encoding 0 means one decoder; encodings 1..5 map to 2 * enc_cnt. */
    return enc_cnt == 0 ? 1 : (enc_cnt <= 5 ? 2 * enc_cnt : 0);
}


It gets a little more fiddly than the code I'm proposing implies.
For Switches and Host Bridges larger values are defined
(we just don't emulate them yet and may never do so) and those
don't have a sensible mapping.

I guess there is no harm in adding the full decode however
which will make it more obvious why it was a switch statement.
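
For illustration, a minimal sketch comparing the table-style decode with a
closed form, covering only the encodings 0..5 quoted in the patch. The
function names are hypothetical, and the larger Switch / Host Bridge
encodings mentioned above are deliberately left out:

```c
#include <assert.h>

/* Table-style decode for the encodings shown in the patch (0..5). */
static int example_decoder_count_dec(int enc_cnt)
{
    switch (enc_cnt) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    case 3: return 6;
    case 4: return 8;
    case 5: return 10;
    }
    return 0; /* encoding not handled in this sketch */
}

/*
 * Closed form for the same range. Encoding 0 needs a special case,
 * because 2 * 0 would give 0 rather than 1.
 */
static int example_decoder_count_dec_closed(int enc_cnt)
{
    if (enc_cnt == 0) {
        return 1;
    }
    return (enc_cnt >= 1 && enc_cnt <= 5) ? 2 * enc_cnt : 0;
}
```

The two agree on 0..5; the switch form is the one that extends naturally once
values without a regular mapping are added.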


Right, no problem.

Preferably having this tiny function not inlined:

Reviewed-by: Philippe Mathieu-Daudé

Re: mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie

On Tue, 2023-09-05 at 18:46 +0200, Philippe Mathieu-Daudé wrote:
> On 5/9/23 17:53, Richard Purdie wrote:
> > On Tue, 2023-09-05 at 17:12 +0200, Philippe Mathieu-Daudé wrote:
> > > Hi Richard,
> > > 
> > > On 5/9/23 16:50, Richard Purdie wrote:
> > > > On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> > > > > Richard Purdie  writes:
> > > > > 
> > > > > > With qemu 8.1.0 we see boot hangs for x86-64 targets.
> > > > > > 
> > > > > > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b
> > > > > > (softmmu: Use async_run_on_cpu in tcg_commit) but if I add that
> > > > > > commit, mips and mips64 break, hanging at boot unable to find a
> > > > > > rootfs.
> > > 
> > > Are you testing mipsel / mips64el?
> > 
> > No, it was mips/mips64, i.e. big endian.
> 
> Sorry my question was not clear. I meant: Do you also
> test mipsel / mips64el guests, and if so, do they work?
> (IOW, is this bug only big-endian guest specific?)

Sorry, I misunderstood. We don't test mipsel/mips64el so I don't know
if that is working or not unfortunately.

Cheers,

Richard

Re: [PATCH] iothread: Set the GSource "name" field

2023-09-05 Thread Philippe Mathieu-Daudé


On 5/9/23 17:45, Peter Xu wrote:

On Mon, Sep 04, 2023 at 11:48:11AM -0300, Fabiano Rosas wrote:

@@ -189,11 +193,14 @@ static void iothread_init(EventLoopBase *base, Error **errp)
  return;
  }
  
+thread_name = g_strdup_printf("IO %s",
+object_get_canonical_path_component(OBJECT(base)));
+
  /*
   * Init one GMainContext for the iothread unconditionally, even if
   * it's not used
   */
-iothread_init_gcontext(iothread);
+iothread_init_gcontext(iothread, thread_name);
  
  iothread_set_aio_context_params(base, &local_error);

  if (local_error) {


I think thread_name might be leaked if there is an error here.  Thanks,


Oops, good catch. Better switch to g_autofree.
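
A minimal sketch of why an auto-freeing variable closes this kind of leak.
To keep it buildable without GLib it uses the GCC/Clang cleanup attribute
directly as a stand-in for g_autofree; the macro, the counter and the
function below are hypothetical, not the iothread code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int example_free_calls; /* counts cleanups, for demonstration only */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void example_autofree_cleanup(void *p)
{
    void **pp = (void **)p;
    free(*pp);
    example_free_calls++;
}

/* Stand-in for g_autofree (GCC/Clang extension). */
#define EXAMPLE_AUTOFREE __attribute__((cleanup(example_autofree_cleanup)))

/* Builds "IO <id>" and optionally fails early, as an init path might. */
static int example_init(const char *id, int fail)
{
    EXAMPLE_AUTOFREE char *thread_name = NULL;
    size_t len = strlen(id) + sizeof("IO ");

    thread_name = malloc(len);
    if (!thread_name) {
        return -1;
    }
    snprintf(thread_name, len, "IO %s", id);

    if (fail) {
        return -1; /* early return: thread_name is still freed */
    }
    return 0;
}
```

Every exit from the function, including the error return, runs the cleanup,
so no explicit free is needed on the failure path.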

Re: mips system emulation failure with virtio

2023-09-05 Thread Philippe Mathieu-Daudé


On 5/9/23 17:53, Richard Purdie wrote:

On Tue, 2023-09-05 at 17:12 +0200, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 5/9/23 16:50, Richard Purdie wrote:

On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:

Richard Purdie  writes:


With qemu 8.1.0 we see boot hangs for x86-64 targets.

These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
mips64 break, hanging at boot unable to find a rootfs.


Are you testing mipsel / mips64el?


No, it was mips/mips64, i.e. big endian.


Sorry my question was not clear. I meant: Do you also
test mipsel / mips64el guests, and if so, do they work?
(IOW, is this bug only big-endian guest specific?)

Thanks,

Phil.

Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth

2023-09-05 Thread Peter Xu

On Fri, Sep 01, 2023 at 09:37:32AM +0100, Daniel P. Berrangé wrote:
> > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > one can set max-bandwidth to 5Gbps, along with max-switchover-bandwidth to
> > > 5Gbps so it'll never use over 5Gbps too (so the user can have the rest
> > 
> > Hi Peter. I'm curious if we specify max-switchover-bandwidth to 5Gbps over a
> > 10Gbps network, in the completion stage will it send the remaining data in
> > 5Gbps using downtime_limit time or in 10Gbps (saturate the network) using the
> > downtime_limit / 2 time? Seems this parameter won't rate limit the final
> > stage:)
> 
> Effectively the mgmt app is telling QEMU to assume that this
> much bandwidth is available for use during switchover. If QEMU
> determines that, given this available bandwidth, the remaining
> data can be sent over the link within the downtime limit, it
> will perform the switchover. When sending this switchover data,
> it will actually transmit the data at full line rate IIUC.
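
The decision described above can be sketched as a simple budget check. This
is only an illustration under stated assumptions: the helper name is made up,
bandwidth is taken in bytes per second, and no actual rate limiting is
modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if remaining_bytes can be sent within
 * downtime_limit_ms, assuming avail_bw bytes/second are available.
 * The multiplication is assumed not to overflow for realistic inputs.
 */
static bool example_can_switchover(uint64_t remaining_bytes,
                                   uint64_t avail_bw,
                                   uint64_t downtime_limit_ms)
{
    uint64_t budget_bytes = avail_bw * downtime_limit_ms / 1000;

    return remaining_bytes <= budget_bytes;
}
```

For example, 5Gbps is 625000000 bytes/s, so a 300ms downtime limit gives a
budget of 187500000 bytes. The value only feeds the decision; it does not cap
the speed of the transfer that follows.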

I was about to repost this patch, but then I found that the name
max-available-bandwidth is indeed confusing (as Lei's question shows).

We do have all the bandwidth throttling values in the pattern of
max-*-bandwidth, and this one would be the outlier in that it won't
really throttle the network.

If the old name "available-bandwidth" is too general, I'm now considering
"avail-switchover-bandwidth", just to leave max- out of the name to
differentiate it. If some day we want to add a real throttle for switchover,
we can still have a sane name.

Any objections before I repost?

Thanks,

-- 
Peter Xu

Re: PCI Hotplug ACPI device names only 3 characters long

2023-09-05 Thread Michael S. Tsirkin

On Tue, Sep 05, 2023 at 05:05:33PM +0200, Marcello Sylverster Bauer wrote:
> Greetings,
> 
> I'm currently working on a project to support Intel IPU6 in QEMU via VFIO so
> that the guest system can access the camera. This requires extending the
> ACPI device definition so that the guest knows how to access the camera.
> 
> However, I cannot extend the PCI devices because their names are not 4
> characters long and therefore do not follow the ACPI specification.
> 
> When I use '-acpitable' to include my own SSDT for the IPU6 PCI device, it
> does not allow me to declare the device as an External Object because it
> automatically adds padding underscores.
> 
> e.g.
> Before:
> ```
> External(_SB.PCI0.S18.SA0, DeviceObj)
> ```
> After:
> ```
> External(_SB.PCI0.S18_.SA0_, DeviceObj)
> ```
> 
> Adding the underscore padding is hard coded in iASL and also in QEMU when
> parsing an ASL file. (see: build_append_nameseg())
> 
> So here are my questions:
> 1. Is there a solution to extend the ACPI PCI device using '-acpitable'
> without having to patch iASL or QEMU?
> 2. Are there any plans to change the names to comply with the ACPI spec?
> (e.g. use "S%.03X" format string instead)
> 
> Thanks
> Marcello

1.  All names in ACPI are always exactly 4 characters long. _ is a legal
character but names beginning with _ are reserved. There's no rule in the
ACPI spec that says they need to follow S%.03X or any other specific format.
I'm pretty sure we do follow the ACPI specification in this but feel free to
prove me wrong.
2.  You can probably add something to existing ACPI devices using Scope().
I would not advise relying on this - current names are not a stable
interface that we guarantee across QEMU versions.
If adding this functionality is desirable, I think we'll need some new
interface to set a stable ACPI name. Maybe using aliases.
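
To illustrate the 4-character rule, here is a small sketch of the underscore
padding that turns "S18" into "S18_". It is a standalone approximation; the
function name is hypothetical and this is not the code of
build_append_nameseg():

```c
#include <assert.h>
#include <string.h>

#define EXAMPLE_NAMESEG_LEN 4

/*
 * Pads a 1..4 character name segment with trailing '_' to exactly four
 * characters. Returns 0 on success, -1 if the segment is empty or too long.
 */
static int example_pad_nameseg(const char *seg,
                               char out[EXAMPLE_NAMESEG_LEN + 1])
{
    size_t len = strlen(seg);

    if (len == 0 || len > EXAMPLE_NAMESEG_LEN) {
        return -1;
    }
    memcpy(out, seg, len);
    memset(out + len, '_', EXAMPLE_NAMESEG_LEN - len);
    out[EXAMPLE_NAMESEG_LEN] = '\0';
    return 0;
}
```

Under this padding "S18" and "S18_" denote the same segment, which is why the
External() declaration quoted above ends up with the underscores.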

-- 
MST

Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()

2023-09-05 Thread Kevin Wolf

Am 22.08.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> > bdrv_unref() is called by a lot of places that need to hold the graph
> > lock (it naturally happens in the context of operations that change the
> > graph). However, bdrv_unref() takes the graph writer lock internally, so
> > it can't actually be called while already holding a graph lock without
> > causing a deadlock.
> > 
> > bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> > node before closing it, and draining requires that the graph is
> > unlocked.
> > 
> > The solution is to defer deleting the node until we don't hold the lock
> > any more and draining is possible again.
> > 
> > Note that keeping images open for longer than necessary can create
> > problems, too: You can't open an image again before it is really closed
> > (if image locking didn't prevent it, it would cause corruption).
> > Reopening an image immediately happens at least during bdrv_open() and
> > bdrv_co_create().
> > 
> > In order to solve this problem, make sure to run the deferred unref in
> > bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> > again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> > 
> > The output of iotest 051 is updated because the additional polling
> > changes the order of HMP output, resulting in a new "(qemu)" prompt in
> > the test output that was previously on a separate line and filtered out.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  include/block/block-global-state.h |  1 +
> >  block.c|  9 +
> >  block/graph-lock.c | 23 ---
> >  tests/qemu-iotests/051.pc.out  |  6 +++---
> >  4 files changed, 29 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> > index f347199bff..e570799f85 100644
> > --- a/include/block/block-global-state.h
> > +++ b/include/block/block-global-state.h
> > @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
> >  void bdrv_ref(BlockDriverState *bs);
> >  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
> >  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> > +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
> >  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> >  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> >   BlockDriverState *child_bs,
> > diff --git a/block.c b/block.c
> > index 6376452768..9c4f24f4b9 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
> >  }
> >  }
> >  
> > +void bdrv_schedule_unref(BlockDriverState *bs)
> 
> Please add a doc comment explaining when and why this should be used.

Ok.
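
As a rough model of the "when and why", the deferral pattern can be sketched
with a toy lock flag and a pending list. All names here are hypothetical and
the real code uses bottom halves polled in bdrv_graph_wrunlock(), not an
array:

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_PENDING 8

typedef struct ExampleNode {
    int refcnt;
    int closed;
} ExampleNode;

static int example_graph_locked;
static ExampleNode *example_pending[EXAMPLE_MAX_PENDING];
static size_t example_n_pending;

/* Dropping the last reference closes the node, which needs the lock free. */
static void example_unref(ExampleNode *n)
{
    assert(!example_graph_locked);
    if (--n->refcnt == 0) {
        n->closed = 1;
    }
}

/* To be used while the writer lock is held: queue the unref for later. */
static void example_schedule_unref(ExampleNode *n)
{
    assert(example_graph_locked);
    if (!n) {
        return;
    }
    assert(example_n_pending < EXAMPLE_MAX_PENDING);
    example_pending[example_n_pending++] = n;
}

static void example_wrlock(void)
{
    example_graph_locked = 1;
}

/* Unlock first, then run the deferred unrefs at the earliest safe point. */
static void example_wrunlock(void)
{
    example_graph_locked = 0;
    for (size_t i = 0; i < example_n_pending; i++) {
        example_unref(example_pending[i]);
    }
    example_n_pending = 0;
}
```

The node stays open only until the unlock, which matches the goal of not
keeping images open longer than necessary.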

> > +{
> > +if (!bs) {
> > +return;
> > +}
> > +aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > +(QEMUBHFunc *) bdrv_unref, bs);
> > +}
> > +
> >  struct BdrvOpBlocker {
> >  Error *reason;
> >  QLIST_ENTRY(BdrvOpBlocker) list;
> > diff --git a/block/graph-lock.c b/block/graph-lock.c
> > index 5e66f01ae8..395d387651 100644
> > --- a/block/graph-lock.c
> > +++ b/block/graph-lock.c
> > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> >  void bdrv_graph_wrunlock(void)
> >  {
> >  GLOBAL_STATE_CODE();
> > -QEMU_LOCK_GUARD(&aio_context_list_lock);
> >  assert(qatomic_read(&has_writer));
> >  
> > +WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > +/*
> > + * No need for memory barriers, this works in pair with
> > + * the slow path of rdlock() and both take the lock.
> > + */
> > +qatomic_store_release(&has_writer, 0);
> > +
> > +/* Wake up all coroutine that are waiting to read the graph */
> 
> s/coroutine/coroutines/

I only changed the indentation, but I guess I can just fix it while I
touch it.

> > +qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +}
> > +
> >  /*
> > - * No need for memory barriers, this works in pair with
> > - * the slow path of rdlock() and both take the lock.
> > + * Run any BHs that were scheduled during the wrlock section and that
> > + * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> 
> Referring directly to bdrv_schedule_unref() would help make it clearer
> what you mean.
> 
> > + * only after restarting coroutines so that nested event loops in BHs don't
> > + * deadlock if their condition relies on the coroutine making progress.
> >   */
> > -qatomic_store_release(&has_writer, 0);
> > -
> > -/* Wake up all coroutine that are waiting to read the graph */
> > -qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +aio_bh_poll(qemu_get_aio_context());
> 
> Ke

Re: [PATCH] docs/devel/loads-stores: Fix git grep regexes

2023-09-05 Thread Peter Maydell

On Tue, 5 Sept 2023 at 15:31, Eric Blake  wrote:
>
> On Mon, Sep 04, 2023 at 05:17:03PM +0100, Peter Maydell wrote:
> > The loads-and-stores documentation includes git grep regexes to find
> > occurrences of the various functions.  Some of these regexes have
> > errors, typically failing to escape the '?', '(' and ')' when they
> > should be metacharacters (since these are POSIX basic REs). We also
> > weren't consistent about whether to have a ':' on the end of the
> > line introducing the list of regexes in each section.
> >
> > Fix the errors.
> >
> > The following shell rune will complain about any REs in the
> > file which don't have any matches in the codebase:
> >  for re in $(sed -ne 's/ - ``\(\\<.*\)``/\1/p' docs/devel/loads-stores.rst); do git grep -q "$re" || echo "no matches for re $re"; done
> >
> > Signed-off-by: Peter Maydell 
> > ---
> >  docs/devel/loads-stores.rst | 40 ++---
> >  1 file changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
> > index dab6dfa0acc..ec627aa9c06 100644
> > --- a/docs/devel/loads-stores.rst
> > +++ b/docs/devel/loads-stores.rst
> > @@ -63,12 +63,12 @@ which stores ``val`` to ``ptr`` as an ``{endian}`` 
> > order value
> >  of size ``sz`` bytes.
> >
> >
> > -Regexes for git grep
> > +Regexes for git grep:
> >   - ``\``
>
> This claims that ldul_be_p() is a valid function name

No, it's not claiming that. It's just claiming that this
regex will catch all the function names defined in this
section, not that it will avoid matching on some non-existent
function names.

The documentation section above tells you what is actually
valid, and that says that "sign" is empty for 32 or 64 bit
accesses.

> (which I would
> expect to take a pointer to a 32-bit integer and produce an unsigned
> result suitable for assigning into a 64-bit value).  But it does not
> exist, and the fact that ldl_be_p() returns 'int' means I had to add a
> cast to avoid unintended sign-extension:
>
> https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg05234.html
>
> cast added in relation to v5 patch at
> https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04923.html
>
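
The sign-extension pitfall mentioned above can be shown with a small
standalone sketch. The loader below is a hypothetical stand-in that returns
int, as the quoted text says ldl_be_p() does; it is not the QEMU
implementation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit big-endian load returning int. Converting a value
 * above INT_MAX to int is implementation-defined in C; GCC and Clang wrap
 * it modulo 2^32, which this sketch assumes.
 */
static int example_load_be32(const void *ptr)
{
    const uint8_t *p = ptr;
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    return (int)v;
}

/* Assigning the int result to a 64-bit value sign-extends it. */
static uint64_t example_widen_naive(const void *ptr)
{
    return example_load_be32(ptr);
}

/* Casting to uint32_t first gives zero extension instead. */
static uint64_t example_widen_cast(const void *ptr)
{
    return (uint32_t)example_load_be32(ptr);
}
```

With the bytes 80 00 00 01 the naive version yields 0xffffffff80000001, while
the cast version yields 0x80000001.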
> >   - ``\``
> >   - ``\``
> > - - ``\``
> > - - ``\``
> > + - ``\``
> > + - ``\``
>
> So as long as we are touching the docs, is it worth considering the
> larger task of auditing whether it is appropriate to have all of the
> ld*_ functions return unsigned values, and/or implement ldu/lds
> variants that guarantee zero or sign extension for widening 32-bit
> values when assigning to 64-bit destinations?

No, I think it clearly is not. All I want to do here is
fix the busted regular expressions.

If you would like to try to tidy up some of the semantics
of the load/store APIs you're welcome to have a go at
that. The major obstacle is the obvious one that there
are an absolute ton of existing uses for all of these
API families.

thanks
-- PMM

[PATCH v3 0/4] qapi/migration: Dedup migration parameter objects and fix tls-authz crash

2023-09-05 Thread Peter Xu

v3:
- Collected R-bs
- Patch 2: some reindents, use ARRAY_SIZE (Thomas)

v2:
- Collected R-bs
- Patch 3: convert to use StrOrNull rather than str for the tls_fields
  (it contains a lot of changes, I'll skip listing details, but please
   refer to the commit message)

Patch 1 fixes the tls-authz crashing when someone specifies "null"
parameter for tls-authz.

Patch 2 adds a test case that sets each of the three tls-* parameters to
"null", to make sure passing 'null' never crashes.

Patch 3-4 are the proposed patches to deduplicate the three migration
parameter objects in qapi/migration.json.  Note that in this version (patch
3) we used 'StrOrNull' rather than 'str' for tls-* parameters to make them
deduplicate-able.

Please review, thanks.

Peter Xu (4):
  migration/qmp: Fix crash on setting tls-authz with null
  tests/migration-test: Add a test for null parameter setups
  migration/qapi: Replace @MigrateSetParameters with
@MigrationParameters
  migration/qapi: Drop @MigrationParameter enum

 qapi/migration.json| 370 +
 include/hw/qdev-properties.h   |   3 +
 migration/options.h|  50 +
 hw/core/qdev-properties.c  |  40 
 migration/migration-hmp-cmds.c |  23 +-
 migration/options.c| 266 ++--
 migration/tls.c|   3 +-
 tests/qtest/migration-test.c   |  22 ++
 8 files changed, 247 insertions(+), 530 deletions(-)

-- 
2.41.0

[PATCH v3 1/4] migration/qmp: Fix crash on setting tls-authz with null

2023-09-05 Thread Peter Xu

QEMU will crash if anyone tries to set tls-authz (which is of type
StrOrNull) to the 'null' value.  Fix it the easy way by converting it to
a qstring, just like the other two tls parameters.

Cc: qemu-sta...@nongnu.org # v4.0+
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Peter Xu 
---
 migration/options.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..6bbfd4853d 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1408,20 +1408,25 @@ void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
 {
 MigrationParameters tmp;
 
-/* TODO Rewrite "" to null instead */
+/* TODO Rewrite "" to null instead for all three tls_* parameters */
 if (params->tls_creds
 && params->tls_creds->type == QTYPE_QNULL) {
 qobject_unref(params->tls_creds->u.n);
 params->tls_creds->type = QTYPE_QSTRING;
 params->tls_creds->u.s = strdup("");
 }
-/* TODO Rewrite "" to null instead */
 if (params->tls_hostname
 && params->tls_hostname->type == QTYPE_QNULL) {
 qobject_unref(params->tls_hostname->u.n);
 params->tls_hostname->type = QTYPE_QSTRING;
 params->tls_hostname->u.s = strdup("");
 }
+if (params->tls_authz
+&& params->tls_authz->type == QTYPE_QNULL) {
+qobject_unref(params->tls_authz->u.n);
+params->tls_authz->type = QTYPE_QSTRING;
+params->tls_authz->u.s = strdup("");
+}
 
 migrate_params_test_apply(params, &tmp);
 
-- 
2.41.0

[PATCH v3 3/4] migration/qapi: Replace @MigrateSetParameters with @MigrationParameters

2023-09-05 Thread Peter Xu

Quoting from Markus in his replies:

  migrate-set-parameters sets migration parameters, and
  query-migrate-parameters gets them.  Unsurprisingly, the former's
  argument type MigrateSetParameters is quite close to the latter's
  return type MigrationParameters.  The differences are subtle:

  1. Since migrate-set-parameters supports setting selected parameters,
 its arguments must all be optional (so you can omit the ones you
 don't want to change).  query-migrate-parameters results are also
 all optional, but almost all of them are in fact always present.

  2. For parameters @tls_creds, @tls_hostname, @tls_authz,
 migrate-set-parameters interprets special value "" as "reset to
 default".  Works, because "" is semantically invalid.  Not a
 general solution, because a semantically invalid value need not
 exist.  Markus added a general solution in commit 01fa559826
 ("migration: Use JSON null instead of "" to reset parameter to
 default").  This involved changing the type from 'str' to
 'StrOrNull'.

  3. When parameter @block-bitmap-mapping has not been set,
 query-migrate-parameters does not return it (absent optional
 member).  Clean (but undocumented).  When parameters @tls_creds,
 @tls_hostname, @tls_authz have not been set, it returns the
 semantically invalid value "".  Not so clean (and just as
 undocumented).

Here we deduplicate the two objects: keep @MigrationParameters as the name
of the object to use in both places, drop @MigrateSetParameters, and in the
meantime switch the types of the @tls* fields from "str" to "StrOrNull".

I found that the TLS code wasn't really relying on tls_* fields being
non-NULL at all.  Actually it's the other way round: if we set tls_authz to
an empty string (NOTE: currently, migrate_init() missed initializing
tls_authz; also touched it up in this patch), we can already fail one of
the migration tests (tls/x509/default-host), as qauthz_is_allowed_by_id()
will assume tls_authz is set even if tls_authz is an empty string.

It means we're actually relying on tls_* fields being NULL when the value
is the empty string.

Let's just make it a rule to return NULL for the empty string on these fields
internally.  For that, when converting a StrOrNull into a char* (via a
helper introduced in this patch) we'll also map the empty string to
NULL, so that it always works.  Applying that logic to both tls_creds and
tls_hostname doesn't show any issue either.
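
The rule above can be sketched as follows. The types and the helper name are
simplified stand-ins invented for illustration; they are not the
QAPI-generated StrOrNull nor the helper added by the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a "string or JSON null" value. */
typedef enum { EXAMPLE_KIND_NULL, EXAMPLE_KIND_STRING } ExampleKind;

typedef struct {
    ExampleKind kind;
    const char *s; /* only meaningful for EXAMPLE_KIND_STRING */
} ExampleStrOrNull;

/*
 * Returns the string, or NULL when the value is absent, JSON null,
 * or the empty string, so callers only ever need a NULL check.
 */
static const char *example_str_or_null_get(const ExampleStrOrNull *v)
{
    if (!v || v->kind == EXAMPLE_KIND_NULL || !v->s || v->s[0] == '\0') {
        return NULL;
    }
    return v->s;
}
```

With a single accessor of this kind, the TLS code would not need separate
checks for "" and NULL.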

With the above, we can finally and safely change both
migration_tls_client_create() and migrate_tls() to not check for the empty
string; it's not needed anymore.

Also, we can drop the hackish conversions in qmp_migrate_set_parameters()
where we want to make sure it's a QSTRING; it's not needed now.

This greatly deduplicates the code not only in qapi/migration.json, but
also in the generic migration code.

Markus helped greatly with this patch.  Besides a better commit
message (which I just "stole" from his reply), he debugged and resolved a
double free, and also provided the StrOrNull property implementation to be
used in the MigrationState object when switching tls_* fields to StrOrNull.

Co-developed-by: Markus Armbruster 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
---
 qapi/migration.json| 191 +---
 include/hw/qdev-properties.h   |   3 +
 migration/options.h|   3 +
 hw/core/qdev-properties.c  |  40 ++
 migration/migration-hmp-cmds.c |  20 +--
 migration/options.c| 220 ++---
 migration/tls.c|   3 +-
 7 files changed, 125 insertions(+), 355 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..45d69787ae 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -851,189 +851,6 @@
{ 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
'vcpu-dirty-limit'] }
 
-##
-# @MigrateSetParameters:
-#
-# @announce-initial: Initial delay (in milliseconds) before sending
-# the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-# the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-# migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-# subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: compression level
-#
-# @compress-threads: compression thread count
-#
-# @compress-wait-thread: Controls behavior when all compression
-# threads are currently busy.  If true (default), wait for a free
-# compression thread to become available; otherwise, send the page
-# uncompressed.  (Since 3.1)
-#
-# @decompress-threads: decompression thread count
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-# bytes_xfer_period to trigger throttling.  It is expressed as
-# percentage.  The default value is 50. (Sin

[PATCH v3 4/4] migration/qapi: Drop @MigrationParameter enum

2023-09-05 Thread Peter Xu

Drop the enum in qapi because it is never used in QMP APIs.  Instead, make
it an internal definition for QEMU so that we can decouple it from QAPI,
and also deduplicate the QAPI documentation.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
---
 qapi/migration.json| 179 -
 migration/options.h|  47 +
 migration/migration-hmp-cmds.c |   3 +-
 migration/options.c|  51 ++
 4 files changed, 100 insertions(+), 180 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 45d69787ae..eeb1878c4f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -672,185 +672,6 @@
   'bitmaps': [ 'BitmapMigrationBitmapAlias' ]
   } }
 
-##
-# @MigrationParameter:
-#
-# Migration parameters enumeration
-#
-# @announce-initial: Initial delay (in milliseconds) before sending
-# the first announce (Since 4.0)
-#
-# @announce-max: Maximum delay (in milliseconds) between packets in
-# the announcement (Since 4.0)
-#
-# @announce-rounds: Number of self-announce packets sent after
-# migration (Since 4.0)
-#
-# @announce-step: Increase in delay (in milliseconds) between
-# subsequent packets in the announcement (Since 4.0)
-#
-# @compress-level: Set the compression level to be used in live
-# migration, the compression level is an integer between 0 and 9,
-# where 0 means no compression, 1 means the best compression
-# speed, and 9 means best compression ratio which will consume
-# more CPU.
-#
-# @compress-threads: Set compression thread count to be used in live
-# migration, the compression thread count is an integer between 1
-# and 255.
-#
-# @compress-wait-thread: Controls behavior when all compression
-# threads are currently busy.  If true (default), wait for a free
-# compression thread to become available; otherwise, send the page
-# uncompressed.  (Since 3.1)
-#
-# @decompress-threads: Set decompression thread count to be used in
-# live migration, the decompression thread count is an integer
-# between 1 and 255. Usually, decompression is at least 4 times as
-# fast as compression, so set the decompress-threads to the number
-# about 1/4 of compress-threads is adequate.
-#
-# @throttle-trigger-threshold: The ratio of bytes_dirty_period and
-# bytes_xfer_period to trigger throttling.  It is expressed as
-# percentage.  The default value is 50. (Since 5.0)
-#
-# @cpu-throttle-initial: Initial percentage of time guest cpus are
-# throttled when migration auto-converge is activated.  The
-# default value is 20. (Since 2.7)
-#
-# @cpu-throttle-increment: throttle percentage increase each time
-# auto-converge detects that migration is not making progress.
-# The default value is 10. (Since 2.7)
-#
-# @cpu-throttle-tailslow: Make CPU throttling slower at tail stage At
-# the tail stage of throttling, the Guest is very sensitive to CPU
-# percentage while the @cpu-throttle -increment is excessive
-# usually at tail stage.  If this parameter is true, we will
-# compute the ideal CPU percentage used by the Guest, which may
-# exactly make the dirty rate match the dirty rate threshold.
-# Then we will choose a smaller throttle increment between the one
-# specified by @cpu-throttle-increment and the one generated by
-# ideal CPU percentage.  Therefore, it is compatible to
-# traditional throttling, meanwhile the throttle increment won't
-# be excessive at tail stage.  The default value is false.  (Since
-# 5.1)
-#
-# @tls-creds: ID of the 'tls-creds' object that provides credentials
-# for establishing a TLS connection over the migration data
-# channel.  On the outgoing side of the migration, the credentials
-# must be for a 'client' endpoint, while for the incoming side the
-# credentials must be for a 'server' endpoint.  Setting this will
-# enable TLS for all migrations.  The default is unset, resulting
-# in unsecured migration at the QEMU level.  (Since 2.7)
-#
-# @tls-hostname: hostname of the target host for the migration.  This
-# is required when using x509 based TLS credentials and the
-# migration URI does not already include a hostname.  For example
-# if using fd: or exec: based migration, the hostname must be
-# provided so that the server's x509 certificate identity can be
-# validated.  (Since 2.7)
-#
-# @tls-authz: ID of the 'authz' object subclass that provides access
-# control checking of the TLS x509 certificate distinguished name.
-# This object is only resolved at time of use, so can be deleted
-# and recreated on the fly while the migration server is active.
-# If missing, it will default to denying access (Since 4.0)
-#
-# @max-bandwidth: to set maximum speed for migration.  maximum speed
-# in bytes per second.  (Since 2.8)
-#
-# @downtime-limit: set maximum tolerated downtime for

[PATCH v3 2/4] tests/migration-test: Add a test for null parameter setups

2023-09-05 Thread Peter Xu

Add a test for StrOrNull parameters (tls-*).

Reviewed-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 62d3f37021..ff86838ec3 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1471,6 +1471,27 @@ static void test_postcopy_preempt_all(void)
 
 #endif
 
+/*
+ * We have a few parameters that allow null as input, test them to make
+ * sure they won't crash (where some used to).
+ */
+static void test_null_parameters(void)
+{
+static const char *allow_null_params[] = {
+"tls-authz", "tls-hostname", "tls-creds"
+};
+QTestState *vm = qtest_init("");
+int i;
+
+for (i = 0; i < ARRAY_SIZE(allow_null_params); i++) {
+qtest_qmp_assert_success(vm, "{ 'execute': 'migrate-set-parameters',"
+ "'arguments': { %s: null } }",
+ allow_null_params[i]);
+}
+
+qtest_quit(vm);
+}
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2827,6 +2848,7 @@ int main(int argc, char **argv)
 }
 }
 
+qtest_add_func("/migration/null_parameters", test_null_parameters);
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
-- 
2.41.0

Re: [PATCH] hw/pci-bridge/cxl-upstream: Add serial number extended capability support

2023-09-05 Thread Jonathan Cameron via

On Tue, 5 Sep 2023 10:48:54 +0200
Philippe Mathieu-Daudé  wrote:

> Hi Jonathan,
> 
> On 4/9/23 19:57, Jonathan Cameron wrote:
> > Will be needed so there is a defined serial number for
> > information queries via the Switch CCI.
> > 
> > Signed-off-by: Jonathan Cameron 
> > ---
> > No ordering dependencies wrt to other CXL patch sets.
> > 
> > Whilst we 'need' it for the Switch CCI set it is valid without
> > it and aligns with existing EP serial number support. Seems sensible
> > to upstream this first and reduce my out of tree backlog a little!
> > 
> >   hw/pci-bridge/cxl_upstream.c | 15 +--
> >   1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
> > index 2b9cf0cc97..15c4d84a56 100644
> > --- a/hw/pci-bridge/cxl_upstream.c
> > +++ b/hw/pci-bridge/cxl_upstream.c
> > @@ -14,6 +14,11 @@
> >   #include "hw/pci/msi.h"
> >   #include "hw/pci/pcie.h"
> >   #include "hw/pci/pcie_port.h"
> > +/*
> > + * Null value of all Fs suggested by IEEE RA guidelines for use of
> > + * EU, OUI and CID
> > + */
> > +#define UI64_NULL (~0ULL)  
> 
> Already defined in hw/mem/cxl_type3.c, can we move it to some common
> CXL header? Or include/qemu/units.h?
> 
> >   #define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 2
> >   
> > @@ -30,6 +35,7 @@ typedef struct CXLUpstreamPort {
> >   /*< public >*/
> >   CXLComponentState cxl_cstate;
> >   DOECap doe_cdat;
> > +uint64_t sn;
> >   } CXLUpstreamPort;
> >   
> >   CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
> > @@ -326,8 +332,12 @@ static void cxl_usp_realize(PCIDevice *d, Error **errp)
> >   if (rc) {
> >   goto err_cap;
> >   }
> > -
> > -cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET;
> > +if (usp->sn != UI64_NULL) {
> > +pcie_dev_ser_num_init(d, CXL_UPSTREAM_PORT_DVSEC_OFFSET, usp->sn);
> > +cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET + 0x0c;  
> 
> Could it be clearer to have:
> 
> diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
> @@ -23,2 +23,2 @@
> -#define CXL_UPSTREAM_PORT_DVSEC_OFFSET \
> -(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
> +#define CXL_UPSTREAM_PORT_DVSEC_OFFSET(offset) \
> +(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF + offset)
> 
> ?

The naming is going to be very confusing if we do, as it becomes
an offset of an offset.

Given we've never yet cared that much about keeping these devices
looking stable to a guest, I can just leave a gap if this cap is
not defined and use fixed offsets instead, thus avoiding this
complexity.
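
A minimal sketch of the fixed-offset idea, for illustration only. The
EXAMPLE_* names and the 0x200 base are assumptions and not the values used
in cxl_upstream.c; only the 0x0c step is taken from the patch above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout values, for illustration only. */
#define EXAMPLE_SN_OFFSET    0x200u  /* slot reserved for the serial number cap */
#define EXAMPLE_SN_STEP      0x0cu   /* step the patch adds after that cap */
#define EXAMPLE_DVSEC_OFFSET (EXAMPLE_SN_OFFSET + EXAMPLE_SN_STEP)

/*
 * With fixed offsets the DVSEC location does not depend on whether the
 * serial number capability is present. If it is absent, its slot is
 * simply left as a gap.
 */
static uint32_t example_dvsec_offset(bool have_sn)
{
    (void)have_sn; /* the layout is the same either way */
    return EXAMPLE_DVSEC_OFFSET;
}
```

The cost is a few unused bytes of config space when no serial number is
given; the benefit is that no macro needs an extra offset argument.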

> 
> > +} else {
> > +cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET;
> > +}
> >   cxl_cstate->pdev = d;
> >   build_dvsecs(cxl_cstate);
> >   cxl_component_register_block_init(OBJECT(d), cxl_cstate, 
> > TYPE_CXL_USP);
> > @@ -366,6 +376,7 @@ static void cxl_usp_exitfn(PCIDevice *d)
> >   }
> >   
> >   static Property cxl_upstream_props[] = {
> > +DEFINE_PROP_UINT64("sn", CXLUpstreamPort, sn, UI64_NULL),
> >   DEFINE_PROP_STRING("cdat", CXLUpstreamPort, cxl_cstate.cdat.filename),
> >   DEFINE_PROP_END_OF_LIST()
> >   };  
> 
>

Re: [PATCH] hw/pci-bridge/cxl-upstream: Add serial number extended capability support

2023-09-05 Thread Jonathan Cameron via

On Tue, 5 Sep 2023 05:02:47 -0400
"Michael S. Tsirkin"  wrote:

> On Tue, Sep 05, 2023 at 10:48:54AM +0200, Philippe Mathieu-Daudé wrote:
> > Hi Jonathan,
> > 
> > On 4/9/23 19:57, Jonathan Cameron wrote:  
> > > Will be needed so there is a defined serial number for
> > > information queries via the Switch CCI.
> > > 
> > > Signed-off-by: Jonathan Cameron 
> > > ---
> > > No ordering dependencies wrt to other CXL patch sets.
> > > 
> > > Whilst we 'need' it for the Switch CCI set it is valid without
> > > it and aligns with existing EP serial number support. Seems sensible
> > > to upstream this first and reduce my out of tree backlog a little!
> > > 
> > >   hw/pci-bridge/cxl_upstream.c | 15 +--
> > >   1 file changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
> > > index 2b9cf0cc97..15c4d84a56 100644
> > > --- a/hw/pci-bridge/cxl_upstream.c
> > > +++ b/hw/pci-bridge/cxl_upstream.c
> > > @@ -14,6 +14,11 @@
> > >   #include "hw/pci/msi.h"
> > >   #include "hw/pci/pcie.h"
> > >   #include "hw/pci/pcie_port.h"
> > > +/*
> > > + * Null value of all Fs suggested by IEEE RA guidelines for use of
> > > + * EU, OUI and CID
> > > + */
> > > +#define UI64_NULL (~0ULL)  
> > 
> > Already defined in hw/mem/cxl_type3.c, can we move it to some common
> > CXL header? Or include/qemu/units.h?  
> 
> not the last one I think - this is a cxl specific hack to detect that
> user has changed the property.

The chosen default is also the one that the relevant specification says
means 'NULL' for an EUI-64 code, so it is at least a valid hack...
https://standards.ieee.org/wp-content/uploads/import/documents/tutorials/eui.pdf
The section "Unassigned and NULL EUI values" specifically recommends
these NULL values.

However, it's obscure enough that we probably don't want it in a generic
header.
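
As a rough illustration of how the all-ones default doubles as a "not set"
marker, here is a small sketch. EXAMPLE_UI64_NULL mirrors UI64_NULL from
the patch; example_sn_is_set() is a hypothetical helper, not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* All Fs: the NULL value for an EUI-64, used here as the property default. */
#define EXAMPLE_UI64_NULL (~0ULL)

/* True when the user supplied a serial number other than the NULL default. */
static bool example_sn_is_set(uint64_t sn)
{
    return sn != EXAMPLE_UI64_NULL;
}
```

A user who explicitly passes the all-ones value is indistinguishable from
one who passes nothing, which seems acceptable given that the value means
NULL anyway.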

> 
> 
> I think we really should have a variant of DEFINE_PROP_XXX that sets a
> flag allowing us to detect whether a property has been set manually.
> This would be a generalization of DEFINE_PROP_ON_OFF_AUTO.

Agreed that would be generally useful but here there is a reasonable
default value so I don't think we need this.

> 
> 
> > >   #define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 2
> > > @@ -30,6 +35,7 @@ typedef struct CXLUpstreamPort {
> > >   /*< public >*/
> > >   CXLComponentState cxl_cstate;
> > >   DOECap doe_cdat;
> > > +uint64_t sn;
> > >   } CXLUpstreamPort;
> > >   CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
> > > @@ -326,8 +332,12 @@ static void cxl_usp_realize(PCIDevice *d, Error 
> > > **errp)
> > >   if (rc) {
> > >   goto err_cap;
> > >   }
> > > -
> > > -cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET;
> > > +if (usp->sn != UI64_NULL) {
> > > +pcie_dev_ser_num_init(d, CXL_UPSTREAM_PORT_DVSEC_OFFSET, 
> > > usp->sn);
> > > +cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET + 
> > > 0x0c;  
> > 
> > Could it be clearer to have:
> > 
> > diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
> > @@ -23,2 +23,2 @@
> > -#define CXL_UPSTREAM_PORT_DVSEC_OFFSET \
> > -(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
> > +#define CXL_UPSTREAM_PORT_DVSEC_OFFSET(offset) \
> > +(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF + offset)
> > 
> > ?
> >   
> > > +} else {
> > > +cxl_cstate->dvsec_offset = CXL_UPSTREAM_PORT_DVSEC_OFFSET;
> > > +}
> > >   cxl_cstate->pdev = d;
> > >   build_dvsecs(cxl_cstate);
> > >   cxl_component_register_block_init(OBJECT(d), cxl_cstate, 
> > > TYPE_CXL_USP);
> > > @@ -366,6 +376,7 @@ static void cxl_usp_exitfn(PCIDevice *d)
> > >   }
> > >   static Property cxl_upstream_props[] = {
> > > +DEFINE_PROP_UINT64("sn", CXLUpstreamPort, sn, UI64_NULL),
> > >   DEFINE_PROP_STRING("cdat", CXLUpstreamPort, 
> > > cxl_cstate.cdat.filename),
> > >   DEFINE_PROP_END_OF_LIST()
> > >   };  
>

Re: [PATCH v2 3/5] migration: Add .save_prepare() handler to struct SaveVMHandlers

2023-09-05 Thread Cédric Le Goater


On 9/1/23 17:49, Peter Xu wrote:

On Thu, Aug 31, 2023 at 03:57:00PM +0300, Avihai Horon wrote:

Add a new .save_prepare() handler to struct SaveVMHandlers. This handler
is called early, even before migration starts, and can be used by
devices to perform early checks.

Suggested-by: Peter Xu 
Signed-off-by: Avihai Horon 


Shouldn't be hard to unify the two call sites for qmp migrate and save
snapshot, but we can leave that for later:


yes. It could be called from migrate_init() with minor changes.

We could probably move :

memset(&mig_stats, 0, sizeof(mig_stats));
memset(&compression_counters, 0, sizeof(compression_counters));
migration_reset_vfio_bytes_transferred();

under migrate_init() also. Anyhow,

Reviewed-by: Cédric Le Goater 

Thanks,

C.

Re: mips system emulation failure with virtio

2023-09-05 Thread Richard Purdie

On Tue, 2023-09-05 at 17:12 +0200, Philippe Mathieu-Daudé wrote:
> Hi Richard,
> 
> On 5/9/23 16:50, Richard Purdie wrote:
> > On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:
> > > Richard Purdie  writes:
> > > 
> > > > With qemu 8.1.0 we see boot hangs for x86-64 targets.
> > > > 
> > > > These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
> > > > Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
> > > > mips64 break, hanging at boot unable to find a rootfs.
> 
> Are you testing mipsel / mips64el?

No, it was mips/mips64, i.e. big endian.

Cheers,

Richard

Re: [RFC PATCH v2 22/22] softmmu/physmem: Clean up local variable shadowing

2023-09-05 Thread Peter Xu

On Mon, Sep 04, 2023 at 05:31:30PM +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 04, 2023 at 06:12:34PM +0200, Philippe Mathieu-Daudé wrote:
> > Fix:
> > 
> >   softmmu/physmem.c: In function 
> > ‘cpu_physical_memory_snapshot_and_clear_dirty’:
> >   softmmu/physmem.c:916:27: warning: declaration of ‘offset’ shadows a 
> > parameter [-Wshadow=compatible-local]
> > 916 | unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
> > |   ^~
> >   softmmu/physmem.c:892:31: note: shadowed declaration is here
> > 892 | (MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned 
> > client)
> > |~~~^~
> > 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> > RFC: Please double-check how 'offset' is used few lines later.
> 
> I don't see an issue - those lines are in an outer scope, so won't
> be accessing the 'offset' you've changed, they'll be the parameter
> instead. If you want to sanity check though, presumably the asm
> disassembly for this method should be the same before/after this
> change

(and if it didn't do so then it's a bug..)
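
For anyone wanting to convince themselves of the scoping rule, a tiny
standalone sketch (example_shadow() is made up for this mail and has
nothing to do with physmem.c):

```c
/*
 * The inner 'offset' shadows the parameter only inside the braces.
 * Once the block ends, 'offset' refers to the parameter again.
 */
static unsigned long example_shadow(unsigned long offset)
{
    unsigned long inner_seen;

    {
        unsigned long offset = 7; /* shadows the parameter; -Wshadow warns */
        inner_seen = offset;      /* reads the inner variable */
    }

    /* Outer scope: this 'offset' is the function parameter. */
    return offset * 100 + inner_seen;
}
```

Calling example_shadow(3) combines the parameter (3) with the inner value
(7), so renaming the inner variable cannot change what the outer code reads.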

> 
> > ---
> >  softmmu/physmem.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH v2 21/22] softmmu/memory: Clean up local variable shadowing

2023-09-05 Thread Peter Xu

On Mon, Sep 04, 2023 at 06:12:33PM +0200, Philippe Mathieu-Daudé wrote:
> Fix:
> 
>   softmmu/memory.c: In function ‘mtree_print_mr’:
>   softmmu/memory.c:3236:27: warning: declaration of ‘ml’ shadows a previous 
> local [-Wshadow=compatible-local]
>3236 | MemoryRegionList *ml;
> |   ^~
>   softmmu/memory.c:3213:32: note: shadowed declaration is here
>3213 | MemoryRegionList *new_ml, *ml, *next_ml;
> |^~
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH] iothread: Set the GSource "name" field

2023-09-05 Thread Peter Xu

On Mon, Sep 04, 2023 at 11:48:11AM -0300, Fabiano Rosas wrote:
> @@ -189,11 +193,14 @@ static void iothread_init(EventLoopBase *base, Error 
> **errp)
>  return;
>  }
>  
> +thread_name = g_strdup_printf("IO %s",
> +object_get_canonical_path_component(OBJECT(base)));
> +
>  /*
>   * Init one GMainContext for the iothread unconditionally, even if
>   * it's not used
>   */
> -iothread_init_gcontext(iothread);
> +iothread_init_gcontext(iothread, thread_name);
>  
>  iothread_set_aio_context_params(base, &local_error);
>  if (local_error) {

I think thread_name might be leaked if an error occurs here.  Thanks,
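
A minimal sketch of the leak and one way to avoid it, in plain C rather
than GLib so it stands alone. All example_* names are hypothetical, and it
assumes the name is not needed once the function returns. In QEMU itself,
declaring the variable with GLib's g_autofree would be one option.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int example_frees; /* counts frees so the behaviour can be checked */

/* Hypothetical stand-in for a setup step that can fail. */
static int example_set_params(int should_fail)
{
    return should_fail ? -1 : 0;
}

static void example_free(char *p)
{
    free(p);
    example_frees++;
}

/* Returns 0 on success, -1 on error; the name is freed on both paths. */
static int example_init(const char *id, int should_fail)
{
    size_t len = strlen("IO ") + strlen(id) + 1;
    char *thread_name = malloc(len);
    int ret = -1;

    if (!thread_name) {
        return -1;
    }
    snprintf(thread_name, len, "IO %s", id);

    if (example_set_params(should_fail) < 0) {
        goto out; /* a plain 'return' here would leak thread_name */
    }
    ret = 0;
out:
    example_free(thread_name);
    return ret;
}
```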

-- 
Peter Xu

Re: [PULL 0/7] s390x and qtest patches

2023-09-05 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.



Re: [PATCH 2/2] hw/cxl: Support 4 HDM decoders at all levels of topology

2023-09-05 Thread Jonathan Cameron via

On Mon, 4 Sep 2023 20:36:02 +0200
Philippe Mathieu-Daudé  wrote:

> Hi Jonathan,
> 
> Few style comments inlined.
> 
> On 4/9/23 18:47, Jonathan Cameron wrote:
> > Support these decoders in CXL host bridges (pxb-cxl), CXL Switch USP
> > and CXL Type 3 end points.
> > 
> > Signed-off-by: Jonathan Cameron 
> > ---
Hi Philippe,

Thanks for the particularly quick reviews! 

...

> > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> > index e96398e8af..79b9369756 100644
> > --- a/hw/cxl/cxl-component-utils.c
> > +++ b/hw/cxl/cxl-component-utils.c
> > @@ -42,6 +42,9 @@ static void dumb_hdm_handler(CXLComponentState 
> > *cxl_cstate, hwaddr offset,
> >   
> >   switch (offset) {
> >   case A_CXL_HDM_DECODER0_CTRL:
> > +case A_CXL_HDM_DECODER1_CTRL:
> > +case A_CXL_HDM_DECODER2_CTRL:
> > +case A_CXL_HDM_DECODER3_CTRL:
> >   should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
> >   should_uncommit = !should_commit;
> >   break;
> > @@ -81,7 +84,7 @@ static void cxl_cache_mem_write_reg(void *opaque, hwaddr 
> > offset, uint64_t value,
> >   }
> >   
> >   if (offset >= A_CXL_HDM_DECODER_CAPABILITY &&
> > -offset <= A_CXL_HDM_DECODER0_TARGET_LIST_HI) {
> > +offset <= A_CXL_HDM_DECODER3_TARGET_LIST_HI) {
> >   dumb_hdm_handler(cxl_cstate, offset, value);
> >   } else {
> >   cregs->cache_mem_registers[offset / 
> > sizeof(*cregs->cache_mem_registers)] = value;
> > @@ -161,7 +164,7 @@ static void ras_init_common(uint32_t *reg_state, 
> > uint32_t *write_msk)
> >   static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk,
> >   enum reg_type type)
> >   {
> > -int decoder_count = 1;
> > +int decoder_count = 4;  
> 
>unsigned decoder_count = HDM_DECODER_COUNT;
> 
> >   int i;
> >   
> >   ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, DECODER_COUNT,
> > @@ -174,19 +177,22 @@ static void hdm_init_common(uint32_t *reg_state, 
> > uint32_t *write_msk,
> >HDM_DECODER_ENABLE, 0);
> >   write_msk[R_CXL_HDM_DECODER_GLOBAL_CONTROL] = 0x3;
> >   for (i = 0; i < decoder_count; i++) {  
> 
> Alternatively:
> 
>  for (i = 0; i < decoder_count; i++, write_msk += 8) {
>  write_msk[R_CXL_HDM_DECODER0_BASE_LO] = 0xf000;

That's a bit nasty and fragile given we are offsetting the base register then
indexing into it (so applying a later offset).

> 
> > -write_msk[R_CXL_HDM_DECODER0_BASE_LO + i * 0x20] = 0xf000;
> > -write_msk[R_CXL_HDM_DECODER0_BASE_HI + i * 0x20] = 0x;
> > -write_msk[R_CXL_HDM_DECODER0_SIZE_LO + i * 0x20] = 0xf000;
> > -write_msk[R_CXL_HDM_DECODER0_SIZE_HI + i * 0x20] = 0x;
> > -write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff;
> > +write_msk[R_CXL_HDM_DECODER0_BASE_LO + i * 0x20 / 4] = 0xf000; 
> >  
> 
> (this 0x20 / 4 bugs me a bit).

Instead, I've gone with a local variable which leaves me room for deriving
this based on the step between the registers for decoders 0 and 1.

hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;

I haven't added a define for this because it would probably have to be
long enough that it will cause line length problems :(
So it is replicated in a few different places which isn't ideal
but definitely better than the 0x20 / 4
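
Roughly what I have in mind, as a standalone sketch. The EX_* register
indices and the BASE_LO mask are made up for illustration (the real ones
come from the CXL register definitions); only the hdm_inc idea and the
0x13ff CTRL mask are from the patch.

```c
#include <stdint.h>

/* Hypothetical register indices in 32-bit words, for illustration only. */
enum {
    EX_R_DECODER0_BASE_LO = 4,
    EX_R_DECODER0_CTRL    = 8,
    EX_R_DECODER1_BASE_LO = 12, /* decoder 1 sits 0x20 bytes (8 words) later */
    EX_NUM_REGS           = 64,
};

#define EX_BASE_LO_MASK 0xf0000000u /* example mask value */
#define EX_CTRL_MASK    0x13ffu

static void example_init_masks(uint32_t *write_msk, int decoder_count)
{
    /* Step between decoders, derived from the register layout itself. */
    int hdm_inc = EX_R_DECODER1_BASE_LO - EX_R_DECODER0_BASE_LO;
    int i;

    for (i = 0; i < decoder_count; i++) {
        write_msk[EX_R_DECODER0_BASE_LO + i * hdm_inc] = EX_BASE_LO_MASK;
        write_msk[EX_R_DECODER0_CTRL + i * hdm_inc] = EX_CTRL_MASK;
    }
}
```

Because the step is computed from two register indices, it stays correct
if the register block is ever laid out differently, unlike a literal
0x20 / 4.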

> 
> > +write_msk[R_CXL_HDM_DECODER0_BASE_HI + i * 0x20 / 4]  = 0x;
> > +write_msk[R_CXL_HDM_DECODER0_SIZE_LO + i * 0x20 / 4] = 0xf000;
> > +write_msk[R_CXL_HDM_DECODER0_SIZE_HI + i * 0x20 / 4] = 0x;
> > +write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20 / 4] = 0x13ff;
> >   if (type == CXL2_DEVICE ||
> >   type == CXL2_TYPE3_DEVICE ||
> >   type == CXL2_LOGICAL_DEVICE) {
> > -write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 
> > 0xf000;
> > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20 / 4] =
> > +0xf000;
> >   } else {
> > -write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 
> > 0x;
> > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20 / 4] =
> > +0x;
> >   }
> > -write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_HI + i * 0x20] = 
> > 0x;
> > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_HI + i * 0x20 / 4] =
> > +0x;
> >   }
> >   }  
>

Re: [PULL v2 00/40] Misc patches for 2023-08-31

2023-09-05 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.



Re: [PATCH v5 8/8] migration: Add a wrapper to cleanup migration files

2023-09-05 Thread Peter Xu

On Fri, Sep 01, 2023 at 03:29:51PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Thu, Aug 31, 2023 at 03:39:16PM -0300, Fabiano Rosas wrote:
> >> @@ -1166,16 +1183,9 @@ static void migrate_fd_cleanup(MigrationState *s)
> >>  qemu_mutex_lock_iothread();
> >>  
> >>  multifd_save_cleanup();
> >> -qemu_mutex_lock(&s->qemu_file_lock);
> >> -tmp = s->to_dst_file;
> >> -s->to_dst_file = NULL;
> >> -qemu_mutex_unlock(&s->qemu_file_lock);
> >> -/*
> >> - * Close the file handle without the lock to make sure the
> >> - * critical section won't block for long.
> >> - */
> >> -migration_ioc_unregister_yank_from_file(tmp);
> >> -qemu_fclose(tmp);
> >> +
> >> +migration_ioc_unregister_yank_from_file(s->to_dst_file);
> >
> > I think you suggested that we should always take the file lock when
> > operating on them, so this is slightly going backwards to not hold any lock
> > when doing it. But doing so in migrate_fd_cleanup() is probably fine (as it
> > serializes with bql on all the rest qmp commands, neither should migration
> > thread exist at this point).  Your call; it's still much cleaner.
> 
> I think I was mistaken. We need the lock on the thread that clears the
> pointer so that we can safely dereference it on another thread under the
> lock.
> 
> Here we're accessing it from the same thread that later does the
> clearing. So that's a slightly different problem.

But this is not the only place to clear it, so you still need to justify
why the other call sites (e.g., postcopy_pause()) won't happen in parallel
with this call site.

The good thing about your proposal (of always taking that lock) is we avoid
those justifications, as you said before. :)
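
For reference, the "always take the lock" pattern is the one the removed
lines implemented. A standalone sketch with pthreads follows; ExampleState
and example_take_file() are hypothetical names, not the migration code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state: a file pointer guarded by a mutex. */
typedef struct {
    pthread_mutex_t file_lock;
    void *to_dst_file;
} ExampleState;

/*
 * Detach the pointer under the lock and hand it back, so the caller can
 * close it after the lock has been dropped. The critical section stays
 * short and no other thread can see a pointer that is about to be closed.
 */
static void *example_take_file(ExampleState *s)
{
    void *tmp;

    pthread_mutex_lock(&s->file_lock);
    tmp = s->to_dst_file;
    s->to_dst_file = NULL;
    pthread_mutex_unlock(&s->file_lock);

    return tmp;
}
```

With this shape, every call site that clears the pointer is safe against
every other one without any per-site argument about who runs when.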

Thanks,

-- 
Peter Xu

Re: [PATCH for-8.2 v2 2/2] migration: Allow user to specify migration switchover bandwidth

2023-09-05 Thread Peter Xu

On Fri, Sep 01, 2023 at 07:39:07PM +0100, Joao Martins wrote:
> 
> 
> On 01/09/2023 18:59, Joao Martins wrote:
> > On 03/08/2023 16:53, Peter Xu wrote:
> >> @@ -2694,7 +2694,17 @@ static void 
> >> migration_update_counters(MigrationState *s,
> >>  transferred = current_bytes - s->iteration_initial_bytes;
> >>  time_spent = current_time - s->iteration_start_time;
> >>  bandwidth = (double)transferred / time_spent;
> >> -s->threshold_size = bandwidth * migrate_downtime_limit();
> >> +if (migrate_max_switchover_bandwidth()) {
> >> +/*
> >> + * If the user specified an available bandwidth, let's trust the
> >> + * user so that can be more accurate than what we estimated.
> >> + */
> >> +avail_bw = migrate_max_switchover_bandwidth();
> >> +} else {
> >> +/* If the user doesn't specify bandwidth, we use the estimated */
> >> +avail_bw = bandwidth;
> >> +}
> >> +s->threshold_size = avail_bw * migrate_downtime_limit();
> >>  
> > 
> > [ sorry for giving review comments in piecemeal :/ ]

This is never a problem.

> > 
> > There might be something odd with the calculation. It would be right if
> > > downtime_limit was in seconds. But we are multiplying a value that is in
> > > bytes/sec with a time unit that is in milliseconds. When avail_bw is set to
> > switchover_bandwidth, it sounds to me this should be a:
> > 
> > /* bytes/msec; @max-switchover-bandwidth is per-seconds */
> > avail_bw = migrate_max_switchover_bandwidth() / 1000.0;
> > 
> > Otherwise it looks like that we end up overestimating how much we can still 
> > send
> > during switchover? If this is correct and I am not missing some assumption, 
> 
> (...)
> 
> > then
> > same is applicable to the threshold_size calculation in general without
> > switchover-bandwidth but likely in a different way:
> > 
> > /* bytes/msec; but @bandwidth is calculated in 100msec quantas */
> > avail_bw = bandwidth / 100.0;
> > 
> 
> Nevermind this part. I was wrong in the @bandwidth adjustment as it is already
> calculated in bytes/ms. It's max_switchover_bandwidth that needs an adjustment
> it seems.
> 
> > There's a very good chance I'm missing details, so apologies beforehand on
> > wasting your time if I didn't pick up on it through the code.

My fault, thanks for catching this.  So it seems even if the test will
switchover with this patch, it might be too aggressive if we calculate with
a number 1000x larger than the real bandwidth provided..

I'll rename this to expected_bw_per_ms to be clear when I repost, too.
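
To make the unit mismatch concrete, a small worked sketch.
example_threshold_size() is a hypothetical helper, assuming the parameter
is given in bytes per second and the downtime limit in milliseconds.

```c
#include <stdint.h>

/*
 * Convert a bandwidth in bytes per second to bytes per millisecond, then
 * multiply by the downtime limit in milliseconds to get the number of
 * bytes that can still be sent during switchover.
 */
static uint64_t example_threshold_size(uint64_t switchover_bw_per_sec,
                                       uint64_t downtime_limit_ms)
{
    double expected_bw_per_ms = switchover_bw_per_sec / 1000.0;

    return (uint64_t)(expected_bw_per_ms * downtime_limit_ms);
}
```

With 1000000000 bytes/sec and a 300 ms limit this gives 300000000 bytes.
Without the division by 1000 the result would be 1000 times larger.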

Thanks,

-- 
Peter Xu

Re: [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()

2023-09-05 Thread Kevin Wolf

Am 22.08.2023 um 21:35 hat Stefan Hajnoczi geschrieben:
> On Thu, Aug 17, 2023 at 02:50:16PM +0200, Kevin Wolf wrote:
> > The function reads the parents list, so it needs to hold the graph lock.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index 7df8780d6e..a82389f742 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5934,9 +5934,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
> > BlockDriverState *base,
> >  backing_file_str = base->filename;
> >  }
> >  
> > +bdrv_graph_rdlock_main_loop();
> >  QLIST_FOREACH(c, &top->parents, next_parent) {
> >  updated_children = g_slist_prepend(updated_children, c);
> >  }
> > +bdrv_graph_rdunlock_main_loop();
> 
> This is GLOBAL_STATE_CODE, so why take the read lock? I thought nothing
> can modify the graph here. If it could, then stashing the parents in
> the updated_children probably wouldn't be safe anyway.

The only thing bdrv_graph_rdlock_main_loop() does is asserting that the
conditions for doing nothing are met (GLOBAL_STATE_CODE + non-coroutine
context) and providing the right TSA attributes to make the compiler
happy.

Kevin



Re: [PATCH v22 03/20] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-09-05 Thread Nina Schoetterl-Glausch

On Tue, 2023-09-05 at 15:26 +0200, Thomas Huth wrote:
> On 01/09/2023 17.57, Nina Schoetterl-Glausch wrote:
> > From: Pierre Morel 
> > 
> > On interception of STSI(15.1.x) the System Information Block
> > (SYSIB) is built from the list of pre-ordered topology entries.
> > 
> > Signed-off-by: Pierre Morel 
> > Co-developed-by: Nina Schoetterl-Glausch 
> > Signed-off-by: Nina Schoetterl-Glausch 
> > ---
> >   MAINTAINERS  |   1 +
> >   qapi/machine-target.json |  14 ++
> >   include/hw/s390x/cpu-topology.h  |  25 +++
> >   include/hw/s390x/sclp.h  |   1 +
> >   target/s390x/cpu.h   |  76 
> >   hw/s390x/cpu-topology.c  |   2 +
> >   target/s390x/kvm/kvm.c   |   5 +-
> >   target/s390x/kvm/stsi-topology.c | 296 +++
> >   target/s390x/kvm/meson.build |   3 +-
> >   9 files changed, 421 insertions(+), 2 deletions(-)
> >   create mode 100644 target/s390x/kvm/stsi-topology.c

[...]

> > diff --git a/include/hw/s390x/cpu-topology.h 
> > b/include/hw/s390x/cpu-topology.h
> > index 97b0af2795..fc15acf297 100644
> > --- a/include/hw/s390x/cpu-topology.h
> > +++ b/include/hw/s390x/cpu-topology.h
> > @@ -15,10 +15,35 @@
> >   #include "hw/boards.h"
> >   #include "qapi/qapi-types-machine-target.h"
> >   
> > +#define S390_TOPOLOGY_CPU_IFL   0x03
> > +
> > +typedef union s390_topology_id {
> > +uint64_t id;
> > +struct {
> > +uint8_t _reserved0;
> > +uint8_t drawer;
> > +uint8_t book;
> > +uint8_t socket;
> > +uint8_t type;
> > +uint8_t inv_polarization;
> 
> What sense does it make to store the polarization in an inverted way? ... I 
> don't get that ... could you please at least add a comment somewhere for the 
> rationale?
> 

It inverts the ordering with regard to polarization, as required by
the PoP. The dedication is inverted for the same reason: dedicated
CPUs show up before non-dedicated ones, so the id must have a lower
value.
I will add a comment.

> > +uint8_t not_dedicated;
> > +uint8_t origin;
> > +};
> > +} s390_topology_id;

[...]

> > + * fill_tle_cpu:
> > + * @p: The address of the CPU TLE to fill
> > + * @entry: a pointer to the S390TopologyEntry defining this
> > + * CPU container.
> > + *
> > + * Returns the next free TLE entry.
> > + */
> > +static char *fill_tle_cpu(char *p, S390TopologyEntry *entry)
> > +{
> > +SysIBCPUListEntry *tle = (SysIBCPUListEntry *)p;
> > +s390_topology_id topology_id = entry->id;
> > +
> > +tle->nl = 0;
> > +tle->flags = 3 - topology_id.inv_polarization;
> 
> Can you avoid the magic number 3 here?

Hmm, any number larger than 2 will do.
I could also use an int8_t and just negate, but relying on the
reinterpretation of two's complement is also magical.
I guess S390_CPU_ENTITLEMENT_HIGH makes the most sense.
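
A sketch of the inversion using a named constant instead of the literal 3.
The EX_* values are assumptions for illustration, chosen to increase from
low to high; they are not copied from the QAPI enum.

```c
#include <stdint.h>

/* Hypothetical entitlement values, assumed to increase from low to high. */
enum {
    EX_ENTITLEMENT_LOW    = 1,
    EX_ENTITLEMENT_MEDIUM = 2,
    EX_ENTITLEMENT_HIGH   = 3,
};

/*
 * Subtracting from the highest value reverses the order, so an ascending
 * sort on the inverted field lists high entitlement first. Applying the
 * function twice gives back the original value.
 */
static uint8_t example_invert(uint8_t entitlement)
{
    return (uint8_t)(EX_ENTITLEMENT_HIGH - entitlement);
}
```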

[...]

> > +/**
> > + * s390_topology_fill_list_sorted:
> > + *
> > + * Loop over all CPU and insert it at the right place
> > + * inside the TLE entry list.
> > + * Fill the S390Topology list with entries according to the order
> > + * specified by the PoP.
> > + */
> > +static void s390_topology_fill_list_sorted(S390TopologyList *topology_list)
> > +{
> > +CPUState *cs;
> > +S390TopologyEntry sentinel;
> > +
> > +QTAILQ_INIT(topology_list);
> > +
> > +sentinel.id.id = cpu_to_be64(UINT64_MAX);
> > +QTAILQ_INSERT_HEAD(topology_list, &sentinel, next);
> > +
> > +CPU_FOREACH(cs) {
> > +s390_topology_id id = s390_topology_from_cpu(S390_CPU(cs));
> > +S390TopologyEntry *entry, *tmp;
> > +
> > +QTAILQ_FOREACH(tmp, topology_list, next) {
> > +if (id.id == tmp->id.id) {
> > +entry = tmp;
> > +break;

I think I'll add a comment here.

/*
 * Earlier bytes have higher order -> big endian.
 * E.g. an entry with higher drawer number should be later in the list,
 * no matter the later fields (book, socket, etc)
 */
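
To illustrate what the comment describes, a shortened sketch with only
four one-byte fields. example_id_key() is hypothetical; the real id has
more fields and is read with be64_to_cpu().

```c
#include <stdint.h>

/*
 * Build a comparison key with the first field in the most significant
 * byte. This is the value a big-endian read of the bytes
 * { drawer, book, socket, type } would give.
 */
static uint64_t example_id_key(uint8_t drawer, uint8_t book,
                               uint8_t socket, uint8_t type)
{
    return ((uint64_t)drawer << 24) | ((uint64_t)book << 16) |
           ((uint64_t)socket << 8) | (uint64_t)type;
}
```

An entry in drawer 1 compares greater than any entry in drawer 0, whatever
the later fields hold, which is the ordering the list insertion relies on.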


> > +} else if (be64_to_cpu(id.id) < be64_to_cpu(tmp->id.id)) {
> > +entry = g_malloc0(sizeof(*entry));
> 
> Maybe nicer to use g_new0 here instead?

I don't think it makes much of a difference.

> 
> > +entry->id.id = id.id;
> 
> Should this get a cpu_to_be64() ?

No, there is no interpretation of the value here, just a copy.
> 
> > +QTAILQ_INSERT_BEFORE(tmp, entry, next);
> > +break;
> > +}
> > +}
> > +s390_topology_add_cpu_to_entry(entry, S390_CPU(cs));
> > +}
> > +
> > +QTAILQ_REMOVE(topology_list, &sentinel, next);
> > +}
> 
>   Thomas
> 
>

Re: PCI Hotplug ACPI device names only 3 characters long

2023-09-05 Thread Marcello Sylverster Bauer


Hi Philippe,

On 9/5/23 17:09, Philippe Mathieu-Daudé wrote:

Hi Marcello,

On 5/9/23 17:05, Marcello Sylverster Bauer wrote:

Greetings,

I'm currently working on a project to support Intel IPU6 in QEMU via 
VFIO so that the guest system can access the camera. This requires 
extending the ACPI device definition so that the guest knows how to 
access the camera.


However, I cannot extend the PCI devices because their names are not 4 
characters long and therefore do not follow the ACPI specification.


When I use '-acpitable' to include my own SSDT for the IPU6 PCI 
device, it does not allow me to declare the device as an External 
Object because it automatically adds padding underscores.


e.g.
Before:
```
External(_SB.PCI0.S18.SA0, DeviceObj)
```
After:
```
External(_SB.PCI0.S18_.SA0_, DeviceObj)
```


What do you mean by "before" / "after"?


Before is what is written in my SSDT ASL source file that is provided to 
QEMU via the "-acpitable" flag. After is what is actually written to the 
SSDT inside the VM.


If you compile and decompile the source file with iasl, you will get the 
same result.
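
The padding itself is simple to reproduce. Below is a standalone sketch of
the behaviour, not the code from iASL or from build_append_nameseg();
example_pad_nameseg() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pad an ACPI name segment of 1 to 4 characters to exactly 4 by appending
 * '_' characters. 'out' must have room for 5 bytes (4 plus terminator).
 * Returns 0 on success, -1 if the input length is out of range.
 */
static int example_pad_nameseg(const char *seg, char out[5])
{
    size_t len = strlen(seg);

    if (len == 0 || len > 4) {
        return -1;
    }
    memset(out, '_', 4);   /* start with "____" */
    memcpy(out, seg, len); /* overwrite the leading characters */
    out[4] = '\0';
    return 0;
}
```

Passing "S18" yields "S18_", which matches the name that ends up in the
SSDT inside the VM.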




Adding the underscore padding is hard coded in iASL and also in QEMU 
when parsing an ASL file. (see: build_append_nameseg())


So here are my questions:
1. Is there a solution to extend the ACPI PCI device using 
'-acpitable' without having to patch iASL or QEMU?
2. Are there any plans to change the names to comply with the ACPI 
spec? (e.g. use "S%.03X" format string instead)


Thanks
Marcello

Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts

2023-09-05 Thread John Allen via

On Thu, Aug 31, 2023 at 11:40:08PM +0200, William Roche wrote:
> Hello John,
> 
> I could test your fixes and I can confirm that the BUS_MCEERR_AR is now
> working on AMD:
> 
> Before the fix, the VM panics with:
> 
> qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 and
> GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
> [   83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank
> 1: a000
> [   83.562585] mce: [Hardware Error]: RIP !INEXACT! 10:
> {pv_native_safe_halt+0xf/0x20}
> [   83.562592] mce: [Hardware Error]: TSC 3d39402bdc
> [   83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449
> SOCKET 0 APIC 0 microcode 800126e
> [   83.562596] mce: [Hardware Error]: Machine check: Uncorrected error
> without MCA Recovery
> [   83.562597] Kernel panic - not syncing: Fatal local machine check
> [   83.563401] Kernel Offset: disabled
> 
> With the fix, the same error injection doesn't kill the VM, but generates
> the following console messages:
> 
> qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 and
> GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
> [  250.851996] Disabling lock debugging due to kernel taint
> [  250.852928] mce: Uncorrected hardware memory error in user-access at
> 118cb9000
> [  250.853261] Memory failure: 0x118cb9: Sending SIGBUS to
> mce_process_rea:1227 due to hardware memory corruption
> [  250.854933] mce: [Hardware Error]: Machine check events logged
> [  250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU page:
> Recovered
> [  250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 Bank
> 9: bc00
> [  250.860552] mce: [Hardware Error]: RIP 33:<7f56b9ecbee5>
> [  250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
> [  250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937
> SOCKET 0 APIC 2 microcode 800126e
> 
> 
> But a problem still exists with BUS_MCEERR_AO that kills the VM with:
> 
> qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr
> 0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
> [  157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 Bank
> 9: bc00
> [  157.392912] mce: [Hardware Error]: RIP 10:
> {pv_native_safe_halt+0xf/0x20}
> [  157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
> [  157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765
> SOCKET 0 APIC 0 microcode 800126e
> [  157.392924] mce: [Hardware Error]: Machine check: Uncorrected
> unrecoverable error in kernel context
> [  157.392925] Kernel panic - not syncing: Fatal local machine check
> [  157.402582] Kernel Offset: disabled
> 
> As AMD guests can't currently deal with BUS_MCEERR_AO MCE injection,
> according to me the fix is not complete, the 'AO' case must be handled. The
> simplest way is probably to filter it at the qemu level, to only inject the
> 'AR' case -- and it also gives the possibility to let qemu provide a message
> about an ignored 'AO' error.
> 
> I would suggest to add a 3rd patch implementing this AMD specific filter:
> 
> 
> commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
> Author: William Roche 
> Date:   Thu Aug 31 18:54:57 2023 +
> 
>     i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
> 
>     AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
>     as it panics the VM kernel. We filter this event and provide a
>     warning message.
> 
>     Signed-off-by: William Roche 
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 9ca7187628..bd60d5697b 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr,
> int code)
>  mcg_status |= MCG_STATUS_RIPV;
>  }
>  } else {
> +    if (code == BUS_MCEERR_AO) {
> +    /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
> +    return;
> +    }
>  mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
>  }
> 
> @@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void
> *addr)
>  if (ram_addr != RAM_ADDR_INVALID &&
>  kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr))
> {
>  kvm_hwpoison_page_add(ram_addr);
> -    kvm_mce_inject(cpu, paddr, code);
> +    if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
> +    kvm_mce_inject(cpu, paddr, code);
> 
>  /*
>   * Use different logging severity based on error type.
> @@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void
> *addr)
>  addr, paddr, "BUS_MCEERR_AR");
>  } else {
>   warn_report("Guest MCE Memory Error at QEMU addr %p and "
> - "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
> - ad

Re: mips system emulation failure with virtio

2023-09-05 Thread Philippe Mathieu-Daudé


Hi Richard,

On 5/9/23 16:50, Richard Purdie wrote:

On Tue, 2023-09-05 at 14:59 +0100, Alex Bennée wrote:

Richard Purdie  writes:


With qemu 8.1.0 we see boot hangs for x86-64 targets.

These are fixed by 0d58c660689f6da1e3feff8a997014003d928b3b (softmmu:
Use async_run_on_cpu in tcg_commit) but if I add that commit, mips and
mips64 break, hanging at boot unable to find a rootfs.


Are you testing mipsel / mips64el?


We use virtio for network and disk and both of those change in the
bootlog from messages like:

[1.726118] virtio-pci :00:13.0: enabling device ( -> 0003)
[1.728864] virtio-pci :00:14.0: enabling device ( -> 0003)
[1.729948] virtio-pci :00:15.0: enabling device ( -> 0003)
...
[2.162148] virtio_blk virtio2: 1/0/0 default/read/poll queues
[2.168311] virtio_blk virtio2: [vda] 1184242 512-byte logical

to:

[1.777051] virtio-pci :00:13.0: enabling device ( -> 0003)
[1.779822] virtio-pci :00:14.0: enabling device ( -> 0003)
[1.780926] virtio-pci :00:15.0: enabling device ( -> 0003)
...
[1.894852] virtio_rng: probe of virtio1 failed with error -28
...
[2.063553] virtio_blk virtio2: 1/0/0 default/read/poll queues
[2.064260] virtio_blk: probe of virtio2 failed with error -28
[2.069080] virtio_net: probe of virtio0 failed with error -28


i.e. the virtio drivers no longer work.


Interesting, as you say this seems to be VirtIO specific as the baseline
tests (using IDE) work fine:

   ➜  ./tests/venv/bin/avocado run ./tests/avocado/tuxrun_baselines.py:test_mips64
   JOB ID : 71f3e3b7080164b78ef1c8c1bb6bc880932d8c9b
   JOB LOG: /home/alex/avocado/job-results/job-2023-09-05T15.01-71f3e3b/job.log
    (1/2) ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64: PASS (12.19 s)
    (2/2) ./tests/avocado/tuxrun_baselines.py:TuxRunBaselineTest.test_mips64el: PASS (11.78 s)
   RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 24.79 s


I tested with current qemu master
(17780edd81d27fcfdb7a802efc870a99788bd2fc) and mips is still broken
there.
