Re: [PATCH 00/10] riscv: create_fdt() related cleanups

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 18:09, Daniel Henrique Barboza wrote:

Hi,

This is a follow-up of:

"[PATCH v5 00/11] riscv: OpenSBI boot test and cleanups"

Patches were based on top of riscv-to-apply.next [1] + the series above.

The recent FDT changes made in hw/riscv (all machines are now using the
FDT via MachineState::fdt) allowed for most of the cleanups made here.

Patches 9 and 10 were based on a suggestion made by Phil a few weeks ago.
I decided to go for it.

[1] https://github.com/alistair23/qemu/tree/riscv-to-apply.next

Daniel Henrique Barboza (10):
   hw/riscv/spike.c: simplify create_fdt()
   hw/riscv/virt.c: simplify create_fdt()
   hw/riscv/sifive_u.c: simplify create_fdt()
   hw/riscv/virt.c: remove 'is_32_bit' param from
 create_fdt_socket_cpus()
   hw/riscv: use MachineState::fdt in riscv_socket_fdt_write_id()
   hw/riscv: use ms->fdt in riscv_socket_fdt_write_distance_matrix()
   hw/riscv: simplify riscv_load_fdt()
   hw/riscv/virt.c: calculate socket count once in create_fdt_imsic()
   hw/riscv/virt.c: rename MachineState 'mc' pointers to 'ms'
   hw/riscv/spike.c: rename MachineState 'mc' pointers to' ms'


Patch 7 likely needs rework (problem predating your series).

Meanwhile for patches 1-6 & 8-10:
Reviewed-by: Philippe Mathieu-Daudé 

Thanks for this cleanup!



Re: [PATCH 07/10] hw/riscv: simplify riscv_load_fdt()

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 18:09, Daniel Henrique Barboza wrote:

All callers of riscv_load_fdt() are using machine->ram_size as
'mem_size' and the fdt is always retrievable via machine->fdt.

Signed-off-by: Daniel Henrique Barboza 
---
  hw/riscv/boot.c| 4 +++-
  hw/riscv/microchip_pfsoc.c | 4 ++--
  hw/riscv/sifive_u.c| 3 +--
  hw/riscv/spike.c   | 3 +--
  hw/riscv/virt.c| 3 +--
  include/hw/riscv/boot.h| 2 +-
  6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index e868fb6ade..21dea7eac2 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -265,10 +265,12 @@ out:
  return kernel_entry;
  }
  
-uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)

+uint64_t riscv_load_fdt(MachineState *ms, hwaddr dram_base)
  {
  uint64_t temp, fdt_addr;
+uint64_t mem_size = ms->ram_size;
  hwaddr dram_end = dram_base + mem_size;


What about sparse memory ...?


  if (fdtsize <= 0) {
diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index c45023a2b1..6bb08f66bd 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -633,8 +633,8 @@ static void microchip_icicle_kit_machine_init(MachineState 
*machine)
   true, NULL);
  
  /* Compute the fdt load address in dram */

-fdt_load_addr = riscv_load_fdt(memmap[MICROCHIP_PFSOC_DRAM_LO].base,
-   machine->ram_size, machine->fdt);
+fdt_load_addr = riscv_load_fdt(machine,
+   memmap[MICROCHIP_PFSOC_DRAM_LO].base);


... i.e:

hw/riscv/microchip_pfsoc.c:[MICROCHIP_PFSOC_DRAM_LO] = { 
0x8000,   0x4000 },
hw/riscv/microchip_pfsoc.c:[MICROCHIP_PFSOC_DRAM_HI] =   { 
0x10,  0x0 },


IIUC FDT is always loaded into a contiguous region, so maybe simply
'dram_base' is a bad name for '[fdt_]load_address'?

'mem_size' is used to calculate 'dram_end'... This function seems buggy.

We should pass either start_addr/end_addr or base/size, but the range
passed must be contiguous.



Re: [PATCH] hw/mips/boston.c: rename MachineState 'mc' pointer to 'ms'

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 18:21, Daniel Henrique Barboza wrote:

Follow the QEMU convention of naming MachineState pointers as 'ms' by
renaming the instance in create_fdt() where we're calling it 'mc'.

Cc: Paul Burton 
Cc: Aleksandar Rikalo 
Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel Henrique Barboza 
---
  hw/mips/boston.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 05/18] target/arm: Move CPU QOM type definitions to "hw/arm/cpu.h"

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 21:02, Richard Henderson wrote:

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

+++ b/target/arm/cpu.h
@@ -26,6 +26,7 @@
  #include "cpu-qom.h"
  #include "exec/cpu-defs.h"
  #include "qapi/qapi-types-common.h"
+#include "hw/arm/cpu.h"


I'm not a fan of this.

If you want a smaller version of cpu-qom.h here in target/arm/, for use 
by hw/, that's one thing.  But target/ should not be reaching back into 
hw/, IMO.


I concur, but currently we have:

$ git grep '#include "hw' target | wc -l
 220

$ git grep -h '#include "hw' target | sort | uniq -c
   1 #include "hw/acpi/acpi.h"
   1 #include "hw/acpi/ghes.h"
   1 #include "hw/arm/boot.h"
   1 #include "hw/arm/virt.h"
  19 #include "hw/boards.h"
   2 #include "hw/clock.h"
   3 #include "hw/core/accel-cpu.h"
  24 #include "hw/core/cpu.h"
  20 #include "hw/core/sysemu-cpu-ops.h"
  24 #include "hw/core/tcg-cpu-ops.h"
   1 #include "hw/hppa/hppa_hardware.h"
   3 #include "hw/hw.h"
   1 #include "hw/hyperv/hyperv-proto.h"
   2 #include "hw/hyperv/hyperv.h"
   2 #include "hw/i386/apic-msidef.h"
   2 #include "hw/i386/apic.h"
   8 #include "hw/i386/apic_internal.h"
   1 #include "hw/i386/e820_memory_layout.h"
   1 #include "hw/i386/intel_iommu.h"
   1 #include "hw/i386/ioapic.h"
   2 #include "hw/i386/pc.h"
   1 #include "hw/i386/sgx-epc.h"
   1 #include "hw/i386/topology.h"
   1 #include "hw/i386/x86-iommu.h"
   2 #include "hw/i386/x86.h"
   1 #include "hw/intc/riscv_aclint.h"
   8 #include "hw/irq.h"
   1 #include "hw/isa/isa.h"
   5 #include "hw/loader.h"
   1 #include "hw/loongarch/virt.h"
   2 #include "hw/mips/cpudevs.h"
   2 #include "hw/pci/msi.h"
   1 #include "hw/pci/msix.h"
   3 #include "hw/pci/pci.h"
   1 #include "hw/ppc/openpic_kvm.h"
   5 #include "hw/ppc/ppc.h"
   2 #include "hw/ppc/spapr.h"
   1 #include "hw/ppc/spapr_cpu_core.h"
   2 #include "hw/qdev-clock.h"
  12 #include "hw/qdev-properties.h"
  11 #include "hw/registerfields.h"
   2 #include "hw/s390x/ebcdic.h"
   5 #include "hw/s390x/ioinst.h"
   2 #include "hw/s390x/ipl.h"
   8 #include "hw/s390x/pv.h"
   2 #include "hw/s390x/s390-pci-bus.h"
   2 #include "hw/s390x/s390-pci-inst.h"
   2 #include "hw/s390x/s390-virtio-ccw.h"
   2 #include "hw/s390x/s390-virtio-hcall.h"
   3 #include "hw/s390x/s390_flic.h"
   1 #include "hw/s390x/sclp.h"
   2 #include "hw/s390x/storage-keys.h"
   1 #include "hw/s390x/tod.h"
   1 #include "hw/sh4/sh_intc.h"
   2 #include "hw/sysbus.h"
   1 #include "hw/watchdog/wdt_diag288.h"
   1 #include "hw/xtensa/xtensa-isa.h"

Assuming we want to have a self-contained libtarget$arch, how can we
deal with HW tied to the arch such CPU timers or NVIC?



Re: [PATCH 5/5] migration: Established connection for listener sockets on the dest interface

2023-01-11 Thread Markus Armbruster
Het Gala  writes:

> From: Author Het Gala 
>
> Modified 'migrate-incoming' QAPI supports MigrateChannel parameters to prevent
> multi-level encodings of uri on the destination side.
>
> socket_start_incoming_migration() has been depricated as it's only purpose was
> uri parsing.
> Renamed socket_outgoing_migration_internal() as 
> socket_start_incoming_migration().
> qemu_uri_parsing() is used to populate the new struct from 'uri' string
> needed for migration connection. The function will no longer be used once the
> 'uri' parameter is depricated, as the parameters will already be mapped into
> new struct.
>
> Suggested-by: Daniel P. Berrange 
> Suggested-by: Manish Mishra 
> Suggested-by: Aravind Retnakaran 
> Signed-off-by: Het Gala 
> ---
>  migration/migration.c | 48 ---
>  migration/socket.c| 16 ++-
>  migration/socket.h|  2 +-
>  3 files changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 838940fd55..c70fd0ab4f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -520,27 +520,43 @@ static void qemu_uri_parsing(const char *uri,
>  }
>  }
>  
> -static void qemu_start_incoming_migration(const char *uri, Error **errp)
> +static void qemu_start_incoming_migration(const char *uri,
> +  MigrateChannel *channel,
> +  Error **errp)
>  {
> -const char *p = NULL;
> +MigrateAddress *addrs = g_new0(MigrateAddress, 1);
> +SocketAddress *saddr = g_new0(SocketAddress, 1);
> +
> +/*
> + * motive here is just to have checks and convert uri into
> + * socketaddress struct
> + */
> +if (uri && channel) {
> +error_setg(errp, "uri and channels options should be used "
> +  "mutually exclusive");
> +} else if (uri) {
> +qemu_uri_parsing(uri, &channel, errp);
> +}
>  
>  migrate_protocol_allow_multi_channels(false); /* reset it anyway */
>  qapi_event_send_migration(MIGRATION_STATUS_SETUP);
> -if (strstart(uri, "tcp:", &p) ||
> -strstart(uri, "unix:", NULL) ||
> -strstart(uri, "vsock:", NULL)) {
> -migrate_protocol_allow_multi_channels(true);
> -socket_start_incoming_migration(p ? p : uri, errp);
> +if (addrs->transport == MIGRATE_TRANSPORT_SOCKET) {
> +if (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
> +saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
> +saddr->type == SOCKET_ADDRESS_TYPE_VSOCK) {
> +migrate_protocol_allow_multi_channels(true);
> +socket_start_incoming_migration(saddr, errp);
> +} else if (saddr->type == SOCKET_ADDRESS_TYPE_FD) {
> +fd_start_incoming_migration(saddr->u.fd.str, errp);
> +}
>  #ifdef CONFIG_RDMA
> -} else if (strstart(uri, "rdma:", &p)) {
> -rdma_start_incoming_migration(p, errp);
> +} else if (addrs->transport == MIGRATE_TRANSPORT_RDMA) {
> +rdma_start_incomng_migration(addrs->u.rdma.rdma_str, errp);


Fails to compile:

../migration/migration.c: In function ‘qemu_start_incoming_migration’:
../migration/migration.c:554:9: error: implicit declaration of function 
‘rdma_start_incomng_migration’; did you mean ‘rdma_start_incoming_migration’? 
[-Werror=implicit-function-declaration]
  554 | rdma_start_incomng_migration(addrs->u.rdma.rdma_str, errp);
  | ^~~~
  | rdma_start_incoming_migration
../migration/migration.c:554:9: error: nested extern declaration of 
‘rdma_start_incomng_migration’ [-Werror=nested-externs]

Please fix that, and also test RDMA.

>  #endif
> -} else if (strstart(uri, "exec:", &p)) {
> -exec_start_incoming_migration(p, errp);
> -} else if (strstart(uri, "fd:", &p)) {
> -fd_start_incoming_migration(p, errp);
> +} else if (addrs->transport == MIGRATE_TRANSPORT_EXEC) {
> +exec_start_incoming_migration(addrs->u.exec.exec_str, errp);
>  } else {
> -error_setg(errp, "unknown migration protocol: %s", uri);
> +error_setg(errp, "unknown migration protocol: %i", addrs->transport);
>  }
>  }
>  

[...]




[PATCH 1/1] hw/loongarch/virt: add system_powerdown hmp command support

2023-01-11 Thread Song Gao
For loongarch virt machine, add powerdown notification callback
and send ACPI_POWER_DOWN_STATUS event by acpi ged. Also add
acpi dsdt table for ACPI_POWER_BUTTON_DEVICE device in this
patch.

Signed-off-by: Song Gao 
---
 hw/loongarch/acpi-build.c   |  1 +
 hw/loongarch/virt.c | 14 ++
 include/hw/loongarch/virt.h |  1 +
 3 files changed, 16 insertions(+)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index c2b237736d..b7601cb284 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -261,6 +261,7 @@ build_la_ged_aml(Aml *dsdt, MachineState *machine)
  AML_SYSTEM_MEMORY,
  VIRT_GED_MEM_ADDR);
 }
+acpi_dsdt_add_power_button(dsdt);
 }
 
 static void build_pci_device_aml(Aml *scope, LoongArchMachineState *lams)
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 66be925068..a4998599d3 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -316,6 +316,16 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 loongarch_acpi_setup(lams);
 }
 
+static void virt_powerdown_req(Notifier *notifier, void *opaque)
+{
+LoongArchMachineState *s = container_of(notifier,
+   LoongArchMachineState, powerdown_notifier);
+
+if (s->acpi_ged) {
+acpi_send_event(s->acpi_ged, ACPI_POWER_DOWN_STATUS);
+}
+}
+
 struct memmap_entry {
 uint64_t address;
 uint64_t length;
@@ -859,6 +869,10 @@ static void loongarch_init(MachineState *machine)
VIRT_PLATFORM_BUS_IRQ);
 lams->machine_done.notify = virt_machine_done;
 qemu_add_machine_init_done_notifier(&lams->machine_done);
+ /* connect powerdown request */
+lams->powerdown_notifier.notify = virt_powerdown_req;
+qemu_register_powerdown_notifier(&lams->powerdown_notifier);
+
 fdt_add_pcie_node(lams);
 /*
  * Since lowmem region starts from 0 and Linux kernel legacy start address
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index f5f818894e..7ae8a91229 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -45,6 +45,7 @@ struct LoongArchMachineState {
 /* State for other subsystems/APIs: */
 FWCfgState  *fw_cfg;
 Notifier machine_done;
+Notifier powerdown_notifier;
 OnOffAutoacpi;
 char *oem_id;
 char *oem_table_id;
-- 
2.31.1




RE: [PULL v4 76/83] vhost-user: Support vhost_dev_start

2023-01-11 Thread Yajun Wu
Hi,

VHOST_USER_PROTOCOL_F_STATUS is enabled by default (dpdk):

lib/vhost/vhost_user.h

17 #define VHOST_USER_PROTOCOL_FEATURES((1ULL << VHOST_USER_PROTOCOL_F_MQ) 
| \
 18  (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 19  (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
 20  (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
 21  (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
 22  (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
 23  (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
 24  (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
 25  (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
 26  (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
 27  (1ULL << VHOST_USER_PROTOCOL_F_STATUS))

Remove VHOST_USER_PROTOCOL_F_STATUS can disable VHOST_USER_SET/GET_STATUS 
message.
Should W.A. this issue.

Thanks,
Yajun

-Original Message-
From: Laurent Vivier  
Sent: Wednesday, January 11, 2023 5:50 PM
To: Maxime Coquelin 
Cc: qemu-devel@nongnu.org; Peter Maydell ; Yajun Wu 
; Parav Pandit ; Michael S. Tsirkin 

Subject: Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

External email: Use caution opening links or attachments


On 1/9/23 11:55, Michael S. Tsirkin wrote:
> On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:
>> Hi,
>>
>> it seems this patch breaks vhost-user with DPDK.
>>
>> See 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbug
>> zilla.redhat.com%2Fshow_bug.cgi%3Fid%3D2155173&data=05%7C01%7Cyajunw%
>> 40nvidia.com%7Cf4c581251ab548d64ae708daf3b94867%7C43083d15727340c1b7d
>> b39efd9ccc17a%7C0%7C0%7C638090274351645141%7CUnknown%7CTWFpbGZsb3d8ey
>> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C30
>> 00%7C%7C%7C&sdata=m582YO4Sd2jJ0S%2F%2FSv9zx6NSuXQIrRwkqBPgYedO%2Fr8%3
>> D&reserved=0
>>
>> it seems QEMU doesn't receive the expected commands sequence:
>>
>> Received unexpected msg type. Expected 22 received 40 Fail to update 
>> device iotlb Received unexpected msg type. Expected 40 received 22 
>> Received unexpected msg type. Expected 22 received 11 Fail to update 
>> device iotlb Received unexpected msg type. Expected 11 received 22 
>> vhost VQ 1 ring restore failed: -71: Protocol error (71) Received 
>> unexpected msg type. Expected 22 received 11 Fail to update device 
>> iotlb Received unexpected msg type. Expected 11 received 22 vhost VQ 
>> 0 ring restore failed: -71: Protocol error (71) unable to start vhost 
>> net: 71: falling back on userspace virtio
>>
>> It receives VHOST_USER_GET_STATUS (40) when it expects 
>> VHOST_USER_IOTLB_MSG (22) and VHOST_USER_IOTLB_MSG when it expects 
>> VHOST_USER_GET_STATUS.
>> and VHOST_USER_GET_VRING_BASE (11) when it expect VHOST_USER_GET_STATUS and 
>> so on.
>>
>> Any idea?
>>
>> Thanks,
>> Laurent
>
>
> So I am guessing it's coming from:
>
>  if (msg.hdr.request != request) {
>  error_report("Received unexpected msg type. Expected %d received %d",
>   request, msg.hdr.request);
>  return -EPROTO;
>  }
>
> in process_message_reply and/or in vhost_user_get_u64.
>
>
>> On 11/7/22 23:53, Michael S. Tsirkin wrote:
>>> From: Yajun Wu 
>>>
>>> The motivation of adding vhost-user vhost_dev_start support is to 
>>> improve backend configuration speed and reduce live migration VM 
>>> downtime.
>>>
>>> Today VQ configuration is issued one by one. For virtio net with 
>>> multi-queue support, backend needs to update RSS (Receive side
>>> scaling) on every rx queue enable. Updating RSS is time-consuming 
>>> (typical time like 7ms).
>>>
>>> Implement already defined vhost status and message in the vhost 
>>> specification [1].
>>> (a) VHOST_USER_PROTOCOL_F_STATUS
>>> (b) VHOST_USER_SET_STATUS
>>> (c) VHOST_USER_GET_STATUS
>>>
>>> Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK 
>>> for device start and reset(0) for device stop.
>>>
>>> On reception of the DRIVER_OK message, backend can apply the needed 
>>> setting only once (instead of incremental) and also utilize 
>>> parallelism on enabling queues.
>>>
>>> This improves QEMU's live migration downtime with vhost user backend 
>>> implementation by great margin, specially for the large number of 
>>> VQs of 64 from 800 msec to 250 msec.
>>>
>>> [1] 
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fqe
>>> mu-project.gitlab.io%2Fqemu%2Finterop%2Fvhost-user.html&data=05%7C01
>>> %7Cyajunw%40nvidia.com%7Cf4c581251ab548d64ae708daf3b94867%7C43083d15
>>> 727340c1b7db39efd9ccc17a%7C0%7C0%7C638090274351645141%7CUnknown%7CTW
>>> FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVC
>>> I6Mn0%3D%7C3000%7C%7C%7C&sdata=eEmHPgZlmImC5LTDZ2jTJauNW7cRFDhsme8%2
>>> Fjk7ywIE%3D&reserved=0
>>>
>>> Signed-off-by: Yajun Wu 
>>> Acked-by: Parav Pandit 
>>> Message-Id: <20221017064452.12265

Re: [PATCH 1/2] hw/ide/core.c (cmd_read_native_max): Avoid limited device parameters

2023-01-11 Thread Lev Kujawski


John Snow writes:

> On Mon, Oct 10, 2022 at 4:52 AM Lev Kujawski  wrote:
>>
>> Always use the native CHS device parameters for the ATA commands READ
>> NATIVE MAX ADDRESS and READ NATIVE MAX ADDRESS EXT, not those limited
>> by the ATA command INITIALIZE_DEVICE_PARAMETERS (introduced in patch
>> 176e4961, hw/ide/core.c: Implement ATA INITIALIZE_DEVICE_PARAMETERS
>> command, 2022-07-07.)
>>
>> As stated by the ATA/ATAPI specification, "[t]he native maximum is the
>> highest address accepted by the device in the factory default
>> condition."  Therefore this patch substitutes the native values in
>> drive_heads and drive_sectors before calling ide_set_sector().
>>
>> One consequence of the prior behavior was that setting zero sectors
>> per track could lead to an FPE within ide_set_sector().  Thanks to
>> Alexander Bulekov for reporting this issue.
>>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1243
>> Signed-off-by: Lev Kujawski 
>
> Does this need attention?
>
> --js
>

Hi John,

This patch needs to be merged to mitigate issue 1243, which is still
present within QEMU master as of aa96ab7c9d.

Thanks, Lev



Re: [PATCH] hw/loongarch/virt: rename PCH_PIC_IRQ_OFFSET with VIRT_GSI_BASE

2023-01-11 Thread gaosong



在 2022/12/28 上午11:07, Bibo Mao 写道:

In theory gsi base can start from 0 on loongarch virt machine,
however gsi base is hard-coded in linux kernel loongarch system,
else system fails to boot.

This patch renames macro PCH_PIC_IRQ_OFFSET with VIRT_GSI_BASE,
keeps value unchanged. GSI base is common concept in acpi spec
and easy to understand.

Signed-off-by: Bibo Mao 
---
  hw/loongarch/acpi-build.c  |  2 +-
  hw/loongarch/virt.c|  8 
  include/hw/pci-host/ls7a.h | 17 +
  3 files changed, 14 insertions(+), 13 deletions(-)


Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index c2b237736d..33e04e4b76 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -272,7 +272,7 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
  .pio.size= VIRT_PCI_IO_SIZE,
  .ecam.base   = VIRT_PCI_CFG_BASE,
  .ecam.size   = VIRT_PCI_CFG_SIZE,
-.irq = PCH_PIC_IRQ_OFFSET + VIRT_DEVICE_IRQS,
+.irq = VIRT_GSI_BASE + VIRT_DEVICE_IRQS,
  .bus = lams->pci_bus,
  };
  
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c

index c8a495ea30..3754e2151f 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -432,7 +432,7 @@ static DeviceState *create_acpi_ged(DeviceState *pch_pic, 
LoongArchMachineState
  sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, VIRT_GED_REG_ADDR);
  
  sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,

-   qdev_get_gpio_in(pch_pic, VIRT_SCI_IRQ - 
PCH_PIC_IRQ_OFFSET));
+   qdev_get_gpio_in(pch_pic, VIRT_SCI_IRQ - 
VIRT_GSI_BASE));
  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
  return dev;
  }
@@ -452,7 +452,7 @@ static DeviceState *create_platform_bus(DeviceState 
*pch_pic)
  
  sysbus = SYS_BUS_DEVICE(dev);

  for (i = 0; i < VIRT_PLATFORM_BUS_NUM_IRQS; i++) {
-irq = VIRT_PLATFORM_BUS_IRQ - PCH_PIC_IRQ_OFFSET + i;
+irq = VIRT_PLATFORM_BUS_IRQ - VIRT_GSI_BASE + i;
  sysbus_connect_irq(sysbus, i, qdev_get_gpio_in(pch_pic, irq));
  }
  
@@ -509,7 +509,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState *
  
  serial_mm_init(get_system_memory(), VIRT_UART_BASE, 0,

 qdev_get_gpio_in(pch_pic,
-VIRT_UART_IRQ - PCH_PIC_IRQ_OFFSET),
+VIRT_UART_IRQ - VIRT_GSI_BASE),
 115200, serial_hd(0), DEVICE_LITTLE_ENDIAN);
  fdt_add_uart_node(lams);
  
@@ -531,7 +531,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState *

  create_unimplemented_device("pci-dma-cfg", 0x1001041c, 0x4);
  sysbus_create_simple("ls7a_rtc", VIRT_RTC_REG_BASE,
   qdev_get_gpio_in(pch_pic,
- VIRT_RTC_IRQ - PCH_PIC_IRQ_OFFSET));
+ VIRT_RTC_IRQ - VIRT_GSI_BASE));
  fdt_add_rtc_node(lams);
  
  pm_mem = g_new(MemoryRegion, 1);

diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index df7fa55a30..194aac905e 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -28,24 +28,25 @@
  #define VIRT_PCH_MSI_ADDR_LOW0x2FF0UL
  
  /*

- * According to the kernel pch irq start from 64 offset
- * 0 ~ 16 irqs used for non-pci device while 16 ~ 64 irqs
- * used for pci device.
+ * GSI_BASE is hard-coded with 64 in linux kernel, else kernel fails to boot
+ * 0  - 15  GSI for ISA devices even if there is no ISA devices
+ * 16 - 63  GSI for CPU devices such as timers/perf monitor etc
+ * 64 - GSI for external devices
   */
-#define PCH_PIC_IRQ_OFFSET   64
+#define VIRT_GSI_BASE64
  #define VIRT_DEVICE_IRQS 16
  #define VIRT_PCI_IRQS48
-#define VIRT_UART_IRQ(PCH_PIC_IRQ_OFFSET + 2)
+#define VIRT_UART_IRQ(VIRT_GSI_BASE + 2)
  #define VIRT_UART_BASE   0x1fe001e0
  #define VIRT_UART_SIZE   0X100
-#define VIRT_RTC_IRQ (PCH_PIC_IRQ_OFFSET + 3)
+#define VIRT_RTC_IRQ (VIRT_GSI_BASE + 3)
  #define VIRT_MISC_REG_BASE   (VIRT_PCH_REG_BASE + 0x0008)
  #define VIRT_RTC_REG_BASE(VIRT_MISC_REG_BASE + 0x00050100)
  #define VIRT_RTC_LEN 0x100
-#define VIRT_SCI_IRQ (PCH_PIC_IRQ_OFFSET + 4)
+#define VIRT_SCI_IRQ (VIRT_GSI_BASE + 4)
  
  #define VIRT_PLATFORM_BUS_BASEADDRESS   0x1600

  #define VIRT_PLATFORM_BUS_SIZE  0x200
  #define VIRT_PLATFORM_BUS_NUM_IRQS  2
-#define VIRT_PLATFORM_BUS_IRQ   69
+#define VIRT_PLATFORM_BUS_IRQ   (VIRT_GSI_BASE + 5)
  #endif





[PATCH] target/arm: Introduce aarch64_set_svcr

2023-01-11 Thread Richard Henderson
Unify the two helper_set_pstate_{sm,za} in this function.
Do not call helper_* functions from svcr_write.
Cleans up linux-user usage by consolodating logic.

Cc: Fabiano Rosas 
Signed-off-by: Richard Henderson 
---

Fabiano, I expect this to replace much of your

  [RFC PATCH v2 07/19] target/arm: Move helper_set_pstate_* into cpregs.c

r~
---
 target/arm/cpu.h  |  2 +-
 target/arm/helper-sme.h   |  3 +--
 linux-user/aarch64/cpu_loop.c | 11 ++
 linux-user/aarch64/signal.c   | 13 ++-
 target/arm/helper.c   | 41 ---
 target/arm/sme_helper.c   | 37 ++-
 target/arm/translate-a64.c| 19 ++--
 7 files changed, 53 insertions(+), 73 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bf2bce046d..a471add499 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1123,7 +1123,7 @@ int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t 
*buf, int reg);
 void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
int new_el, bool el0_a64);
-void arm_reset_sve_state(CPUARMState *env);
+void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask);
 
 /*
  * SVE registers are encoded in KVM's memory in an endianness-invariant format.
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index d2d544a696..27eef49a11 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -17,8 +17,7 @@
  * License along with this library; if not, see .
  */
 
-DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
-DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
+DEF_HELPER_FLAGS_3(set_svcr, TCG_CALL_NO_RWG, void, env, i32, i32)
 
 DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
 
diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
index 9875d609a9..2e2f7cf218 100644
--- a/linux-user/aarch64/cpu_loop.c
+++ b/linux-user/aarch64/cpu_loop.c
@@ -89,15 +89,8 @@ void cpu_loop(CPUARMState *env)
 
 switch (trapnr) {
 case EXCP_SWI:
-/*
- * On syscall, PSTATE.ZA is preserved, along with the ZA matrix.
- * PSTATE.SM is cleared, per SMSTOP, which does ResetSVEState.
- */
-if (FIELD_EX64(env->svcr, SVCR, SM)) {
-env->svcr = FIELD_DP64(env->svcr, SVCR, SM, 0);
-arm_rebuild_hflags(env);
-arm_reset_sve_state(env);
-}
+/* On syscall, PSTATE.ZA is preserved, PSTATE.SM is cleared. */
+aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
 ret = do_syscall(env,
  env->xregs[8],
  env->xregs[0],
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 6a2c6e06d2..b265cfd470 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -665,17 +665,8 @@ static void target_setup_frame(int usig, struct 
target_sigaction *ka,
 env->btype = 2;
 }
 
-/*
- * Invoke the signal handler with both SM and ZA disabled.
- * When clearing SM, ResetSVEState, per SMSTOP.
- */
-if (FIELD_EX64(env->svcr, SVCR, SM)) {
-arm_reset_sve_state(env);
-}
-if (env->svcr) {
-env->svcr = 0;
-arm_rebuild_hflags(env);
-}
+/* Invoke the signal handler with both SM and ZA disabled. */
+aarch64_set_svcr(env, 0, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
 
 if (info) {
 tswap_siginfo(&frame->info, info);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index cee3804354..1d74b95971 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6722,12 +6722,47 @@ static CPAccessResult access_esm(CPUARMState *env, 
const ARMCPRegInfo *ri,
 return CP_ACCESS_OK;
 }
 
+/* ResetSVEState */
+static void arm_reset_sve_state(CPUARMState *env)
+{
+memset(env->vfp.zregs, 0, sizeof(env->vfp.zregs));
+/* Recall that FFR is stored as pregs[16]. */
+memset(env->vfp.pregs, 0, sizeof(env->vfp.pregs));
+vfp_set_fpcr(env, 0x089f);
+}
+
+void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
+{
+uint64_t change = (env->svcr ^ new) & mask;
+
+if (change == 0) {
+return;
+}
+env->svcr ^= change;
+
+if (change & R_SVCR_SM_MASK) {
+arm_reset_sve_state(env);
+}
+
+/*
+ * ResetSMEState.
+ *
+ * SetPSTATE_ZA zeros on enable and disable.  We can zero this only
+ * on enable: while disabled, the storage is inaccessible and the
+ * value does not matter.  We're not saving the storage in vmstate
+ * when disabled either.
+ */
+if (change & new & R_SVCR_ZA_MASK) {
+memset(env->zarray, 0, sizeof(env->zarray));
+}
+
+arm_rebuild_hflags(env);
+}
+
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 

Re: [PATCH v3 1/3] linux-user: Clean up when exiting due to a signal

2023-01-11 Thread Richard Henderson

On 1/10/23 17:47, Ilya Leoshkevich wrote:

When exiting due to an exit() syscall, qemu-user calls
preexit_cleanup(), but this is currently not the case when exiting due
to a signal. This leads to various buffers not being flushed (e.g.,
for gprof, for gcov, and for the upcoming perf support).

Add the missing call.

Signed-off-by: Ilya Leoshkevich
---
  linux-user/signal.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 24/26] translator: always pair plugin_gen_insn_{start, end} calls

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

From: Emilio Cota

Related: #1381

Signed-off-by: Emilio Cota
Reviewed-by: Philippe Mathieu-Daudé
Message-Id:<20230108164731.61469-3-c...@braap.org>
Signed-off-by: Alex Bennée
---
  accel/tcg/translator.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v2 12/19] target/arm: Move hflags code into the tcg directory

2023-01-11 Thread Richard Henderson

On 1/9/23 14:42, Fabiano Rosas wrote:

+inline void assert_hflags_rebuild_correctly(CPUARMState *env)
+{
+#ifdef CONFIG_DEBUG_TCG
+CPUARMTBFlags c = env->hflags;
+CPUARMTBFlags r = rebuild_hflags_internal(env);
+
+if (unlikely(c.flags != r.flags || c.flags2 != r.flags2)) {
+fprintf(stderr, "TCG hflags mismatch "
+"(current:(0x%08x,0x" TARGET_FMT_lx ")"
+" rebuilt:(0x%08x,0x" TARGET_FMT_lx ")\n",
+c.flags, c.flags2, r.flags, r.flags2);
+abort();
+}
+#endif
+}


Inline marker non-functional.

Anyway, this should not be separated from cpu_get_tb_cpu_state, where it folds to no-op 
when --disable-debug-tcg.  Indeed, I once had it textually inlined in 
cpu_get_tb_cpu_state, before Philippe thought it looked nicer as a separate helper function.


Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH v2 13/19] tests: do not run test-hmp on all machines for ARM KVM-only

2023-01-11 Thread Richard Henderson

On 1/11/23 04:36, Fabiano Rosas wrote:

Nowadays for KVM is the 'virt' machine the only one we use?


Also sbsa-ref.


r~



Re: [PATCH 02/18] hw/arm/boot: Include missing 'exec/cpu-all.h' header

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

default_reset_secondary() uses address_space_stl_notdirty(),
itself declared in "exec/cpu-all.h". Include this header in
order to avoid when refactoring headers:

   ../hw/arm/boot.c:281:5: error: implicit declaration of function 
'address_space_stl_notdirty' is invalid in C99 
[-Werror,-Wimplicit-function-declaration]
 address_space_stl_notdirty(as, info->smp_bootreg_addr,
 ^

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/boot.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 04/18] hw/arm: Use full "target/arm/cpu.h" path to include target's "cpu.h"

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

First we want to introduce a new "cpu.h" header in the "hw/arm/"
namespace;


Do we?  Is that really the best name (not having seen its contents).


second we would like to get rid of '-I target/$ARCH/'
in the CPPFLAGS.
Use the full path to "cpu.h": "target/arm/cpu.h".


That seems a worthy goal for hw/ though.

Reviewed-by: Richard Henderson 


r~



Re: [PATCH 05/18] target/arm: Move CPU QOM type definitions to "hw/arm/cpu.h"

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

+++ b/target/arm/cpu.h
@@ -26,6 +26,7 @@
  #include "cpu-qom.h"
  #include "exec/cpu-defs.h"
  #include "qapi/qapi-types-common.h"
+#include "hw/arm/cpu.h"


I'm not a fan of this.

If you want a smaller version of cpu-qom.h here in target/arm/, for use by hw/, that's one 
thing.  But target/ should not be reaching back into hw/, IMO.



r~



Re: [PATCH 16/26] semihosting: add semihosting section to the docs

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

The main reason to do this is to document our O_BINARY implementation
decision somewhere. However I've also moved some of the implementation
details out of qemu-options and added links between the two. As a
bonus I've highlighted the scary warnings about host access with the
appropriate RST tags.

Signed-off-by: Alex Bennée
---
  docs/about/features.rst| 10 ++---
  docs/specs/index.rst   |  1 +
  docs/specs/semihosting.rst | 79 ++
  qemu-options.hx| 27 +
  4 files changed, 95 insertions(+), 22 deletions(-)
  create mode 100644 docs/specs/semihosting.rst


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 1/4] hw: Remove hardcoded tabs (code style)

2023-01-11 Thread Richard Henderson

On 1/11/23 00:39, Philippe Mathieu-Daudé wrote:

We are going to modify this code, fix its style first to avoid
the following checkpatch.pl violations:

   ERROR: code indent should never use tabs
   ERROR: space prohibited between function name and open parenthesis '('

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/dma/etraxfs_dma.c | 196 +--
  hw/misc/mst_fpga.c   | 162 ++-
  2 files changed, 175 insertions(+), 183 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 01/18] hw/arm: Move various units to softmmu_ss[]

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

arm_ss[] units are built twice: once for 32-bit word size and
once for 64-bit. The following units don't require any word
size knowledge and can be moved to softmmu_ss[] (where they
are built once):

  - smmu-common.c
  - exynos4_boards.c
  - bcm2835_peripherals.c
  - tosa.c

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/meson.build | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 03/18] target/arm/cpregs: Include missing 'target/arm/cpu.h' header

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

+#include "target/arm/cpu.h"


Just "cpu.h" in this file.

Otherwise,
Reviewed-by: Richard Henderson 

r~



Re: [PATCH 21/26] util/qht: use striped locks under TSAN

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

From: Emilio Cota

Fixes this tsan crash, easy to reproduce with any large enough program:

$ tests/unit/test-qht
1..2
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < 
(((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]" (0x40, 
0x40) (tid=1821568)
 #0 __tsan::CheckUnwind() 
../../../../src/libsanitizer/tsan/tsan_rtl.cpp:353 (libtsan.so.2+0x90034)
 #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long 
long, unsigned long long) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86 
(libtsan.so.2+0xca555)
 #2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > >::addLock(unsigned long, unsigned long, 
unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:67 
(libtsan.so.2+0xb3616)
 #3 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > >::addLock(unsigned long, unsigned long, 
unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:59 
(libtsan.so.2+0xb3616)
 #4 __sanitizer::DeadlockDetector<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > 
>::onLockAfter(__sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > >*, unsigned long, unsigned int) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:216 (libtsan.so.2+0xb3616)
 #5 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, 
__sanitizer::DDMutex*, bool, bool) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector1.cpp:169
 (libtsan.so.2+0xb3616)
 #6 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned 
long, unsigned int, int) 
../../../../src/libsanitizer/tsan/tsan_rtl_mutex.cpp:200 (libtsan.so.2+0xa3382)
 #7 __tsan_mutex_post_lock 
../../../../src/libsanitizer/tsan/tsan_interface_ann.cpp:384 
(libtsan.so.2+0x76bc3)
 #8 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:259 
(test-qht+0x44a97)
 #9 qht_map_lock_buckets ../util/qht.c:253 (test-qht+0x44a97)
 #10 do_qht_iter ../util/qht.c:809 (test-qht+0x45f33)
 #11 qht_iter ../util/qht.c:821 (test-qht+0x45f33)
 #12 iter_check ../tests/unit/test-qht.c:121 (test-qht+0xe473)
 #13 qht_do_test ../tests/unit/test-qht.c:202 (test-qht+0xe473)
 #14 qht_test ../tests/unit/test-qht.c:240 (test-qht+0xe7c1)
 #15 test_default ../tests/unit/test-qht.c:246 (test-qht+0xe828)
 #16   (libglib-2.0.so.0+0x7daed)
 #17   (libglib-2.0.so.0+0x7d80a)
 #18   (libglib-2.0.so.0+0x7d80a)
 #19 g_test_run_suite  (libglib-2.0.so.0+0x7dfe9)
 #20 g_test_run  (libglib-2.0.so.0+0x7e055)
 #21 main ../tests/unit/test-qht.c:259 (test-qht+0xd2c6)
 #22 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 
(libc.so.6+0x29d8f)
 #23 __libc_start_main_impl ../csu/libc-start.c:392 (libc.so.6+0x29e3f)
 #24 _start  (test-qht+0xdb44)

Signed-off-by: Emilio Cota
Message-Id:<20230109224954.161672-5-c...@braap.org>
Signed-off-by: Alex Bennée
---
  util/qht.c | 101 +
  1 file changed, 87 insertions(+), 14 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v3 2/3] accel/tcg: Add debuginfo support

2023-01-11 Thread Richard Henderson

On 1/10/23 17:47, Ilya Leoshkevich wrote:

ginfo.h
@@ -0,0 +1,77 @@
+/*
+ * Debug information support.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ACCEL_TCG_DEBUGINFO_H
+#define ACCEL_TCG_DEBUGINFO_H
+
+/*
+ * Debuginfo describing a certain address.
+ */
+struct debuginfo_query {
+unsigned long long address;  /* Input: address. */
+int flags;   /* Input: debuginfo subset. */
+const char *symbol;  /* Symbol that the address is part of. */
+unsigned long long offset;   /* Offset from the symbol. */
+const char *file;/* Source file associated with the address. */
+int line;/* Line number in the source file. */
+};


s/unsigned long long/uint64_t/g

Otherwise it looks reasonable.

r~



Re: [PATCH v2 3/4] bulk: Replace TARGET_FMT_plx -> HWADDR_PRIx

2023-01-11 Thread Richard Henderson

On 1/11/23 00:39, Philippe Mathieu-Daudé wrote:

The 'hwaddr' type is defined in "exec/hwaddr.h" as:

 hwaddr is the type of a physical address
(its size can be different from 'target_ulong').

All definitions use the 'HWADDR_' prefix, except TARGET_FMT_plx:

  $ fgrep define include/exec/hwaddr.h
  #define HWADDR_H
  #define HWADDR_BITS 64
  #define HWADDR_MAX UINT64_MAX
  #define TARGET_FMT_plx "%016" PRIx64
  ^^
  #define HWADDR_PRId PRId64
  #define HWADDR_PRIi PRIi64
  #define HWADDR_PRIo PRIo64
  #define HWADDR_PRIu PRIu64
  #define HWADDR_PRIx PRIx64
  #define HWADDR_PRIX PRIX64

Since hwaddr's size can be*different*  from target_ulong, it is
very confusing to read one of its format using the 'TARGET_FMT_'
prefix, normally used for the target_long / target_ulong types:

$ fgrep TARGET_FMT_ include/exec/cpu-defs.h
  #define TARGET_FMT_lx "%08x"
  #define TARGET_FMT_ld "%d"
  #define TARGET_FMT_lu "%u"
  #define TARGET_FMT_lx "%016" PRIx64
  #define TARGET_FMT_ld "%" PRId64
  #define TARGET_FMT_lu "%" PRIu64

Apparently this format was missed during commit a8170e5e97
("Rename target_phys_addr_t to hwaddr"), so complete it by
doing a bulk-replacement to '"%016" HWADDR_PRIx' using:

  $ sed -i -E \
-e 's/" ?TARGET_FMT_plx ?"/%016" HWADDR_PRIx "/g' \
-e 's/" ?TARGET_FMT_plx/%016" HWADDR_PRIx/g' \
-e 's/TARGET_FMT_plx ?"/"%016" HWADDR_PRIx "/g' \
$(git grep -l TARGET_FMT_plx)

and removing the definition from "exec/hwaddr.h".

Suggested-by: BALATON Zoltan
Signed-off-by: Philippe Mathieu-Daudé
---


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 25/26] tcg: exclude lookup_tb_ptr from helper instrumentation

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

From: Emilio Cota 

It is internal to TCG and therefore we know it does not
access guest memory.

Related: #1381

Signed-off-by: Emilio Cota 
Message-Id: <20230108164731.61469-4-c...@braap.org>
Signed-off-by: Alex Bennée 
---
  tcg/tcg.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index da91779890..ee67eefc0c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1652,8 +1652,10 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, 
TCGTemp **args)
  op = tcg_op_alloc(INDEX_op_call, total_args);
  
  #ifdef CONFIG_PLUGIN

-/* detect non-plugin helpers */
-if (tcg_ctx->plugin_insn && unlikely(strncmp(info->name, "plugin_", 7))) {
+/* flag helpers that are not internal to TCG */
+if (tcg_ctx->plugin_insn &&
+strncmp(info->name, "plugin_", 7) &&
+strcmp(info->name, "lookup_tb_ptr")) {
  tcg_ctx->plugin_insn->calls_helpers = true;
  }
  #endif


I think this should be detected with

  !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)

i.e., side-effects, which in this case is the possibility of a fault.


r~



Re: [PATCH v2 2/4] bulk: Coding style fixes

2023-01-11 Thread Richard Henderson

On 1/11/23 00:39, Philippe Mathieu-Daudé wrote:

Fix the following checkpatch.pl violation on lines using the
TARGET_FMT_plx definition to avoid:

   WARNING: line over 80 characters

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/block/pflash_cfi01.c   |  5 +++--
  hw/char/digic-uart.c  |  8 
  hw/core/sysbus.c  |  3 ++-
  hw/dma/pl330.c| 16 +---
  hw/i386/multiboot.c   |  3 ++-
  hw/i386/xen/xen-hvm.c |  3 ++-
  hw/i386/xen/xen-mapcache.c| 13 -
  hw/intc/exynos4210_combiner.c | 20 ++--
  hw/misc/auxbus.c  |  3 ++-
  hw/net/allwinner_emac.c   |  8 
  hw/timer/digic-timer.c|  8 
  hw/timer/etraxfs_timer.c  |  3 +--
  softmmu/memory.c  |  3 ++-
  target/ppc/mmu-hash32.c   | 10 ++
  target/ppc/mmu_common.c   |  8 +---
  target/sparc/mmu_helper.c |  5 +++--
  16 files changed, 67 insertions(+), 52 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 03/26] gitlab: just use plain --cc=clang for custom runner build

2023-01-11 Thread Richard Henderson

On 1/10/23 09:38, Alex Bennée wrote:

I think this was because older Ubuntu's didn't alias clang to whatever
the latest version was. They do now so lets use that and not break.

Signed-off-by: Alex Bennée
---
  .gitlab-ci.d/custom-runners/ubuntu-22.04-aarch64.yml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 11/18] hw/arm/digic: Remove unnecessary target_long use

2023-01-11 Thread Richard Henderson

On 1/10/23 08:43, Philippe Mathieu-Daudé wrote:

load_image_targphys(), declared in "hw/loader.h", returns a ssize_t.

Remove the 'target_long' type which size changes per target.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/digic_boards.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 20/26] thread: de-const qemu_spin_destroy

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

From: Emilio Cota

Signed-off-by: Emilio Cota
Reviewed-by: Alex Bennée
Message-Id:<20230109224954.161672-4-c...@braap.org>
Signed-off-by: Alex Bennée
---
  include/qemu/thread.h | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 18/26] cpu: free cpu->tb_jmp_cache with RCU

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

From: Emilio Cota

Fixes the appended use-after-free. The root cause is that
during tb invalidation we use CPU_FOREACH, and therefore
to safely free a vCPU we must wait for an RCU grace period
to elapse.

$ x86_64-linux-user/qemu-x86_64 tests/tcg/x86_64-linux-user/munmap-pthread
=
==1800604==ERROR: AddressSanitizer: heap-use-after-free on address 
0x62d0005f7418 at pc 0x5593da6704eb bp 0x7f4961a7ac70 sp 0x7f4961a7ac60
READ of size 8 at 0x62d0005f7418 thread T2
 #0 0x5593da6704ea in tb_jmp_cache_inval_tb ../accel/tcg/tb-maint.c:244
 #1 0x5593da6704ea in do_tb_phys_invalidate ../accel/tcg/tb-maint.c:290
 #2 0x5593da670631 in tb_phys_invalidate__locked ../accel/tcg/tb-maint.c:306
 #3 0x5593da670631 in tb_invalidate_phys_page_range__locked 
../accel/tcg/tb-maint.c:542
 #4 0x5593da67106d in tb_invalidate_phys_range ../accel/tcg/tb-maint.c:614
 #5 0x5593da6a64d4 in target_munmap ../linux-user/mmap.c:766
 #6 0x5593da6dba05 in do_syscall1 ../linux-user/syscall.c:10105
 #7 0x5593da6f564c in do_syscall ../linux-user/syscall.c:13329
 #8 0x5593da49e80c in cpu_loop ../linux-user/x86_64/../i386/cpu_loop.c:233
 #9 0x5593da6be28c in clone_func ../linux-user/syscall.c:6633
 #10 0x7f496231cb42 in start_thread nptl/pthread_create.c:442
 #11 0x7f49623ae9ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff)

0x62d0005f7418 is located 28696 bytes inside of 32768-byte region 
[0x62d0005f0400,0x62d0005f8400)
freed by thread T148 here:
 #0 0x7f49627b6460 in __interceptor_free 
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
 #1 0x5593da5ac057 in cpu_exec_unrealizefn ../cpu.c:180
 #2 0x5593da81f851  (/home/cota/src/qemu/build/qemu-x86_64+0x484851)

Signed-off-by: Emilio Cota
Message-Id:<20230109224954.161672-2-c...@braap.org>
Signed-off-by: Alex Bennée
---
  accel/tcg/tb-jmp-cache.h |  1 +
  accel/tcg/cpu-exec.c |  3 +--
  cpu.c| 11 ++-
  3 files changed, 12 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 17/26] tests/tcg: add memory-sve test for aarch64

2023-01-11 Thread Richard Henderson

On 1/10/23 09:39, Alex Bennée wrote:

This will be helpful in debugging problems with tracking SVE memory
accesses via the TCG plugins system.

Signed-off-by: Alex Bennée
Cc: Robert Henry
Cc: Aaron Lindsay
---
  tests/tcg/aarch64/Makefile.softmmu-target | 7 +++
  tests/tcg/aarch64/system/boot.S   | 3 ++-
  2 files changed, 9 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v5 10/11] hw/riscv/boot.c: consolidate all kernel init in riscv_load_kernel()

2023-01-11 Thread Alistair Francis
On Mon, Jan 2, 2023 at 9:55 PM Daniel Henrique Barboza
 wrote:
>
> The microchip_icicle_kit, sifive_u, spike and virt boards are now doing
> the same steps when '-kernel' is used:
>
> - execute load_kernel()
> - load init_rd()
> - write kernel_cmdline
>
> Let's fold everything inside riscv_load_kernel() to avoid code
> repetition. To not change the behavior of boards that aren't calling
> riscv_load_init(), add an 'load_initrd' flag to riscv_load_kernel() and
> allow these boards to opt out from initrd loading.
>
> Cc: Palmer Dabbelt 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/riscv/boot.c| 22 +++---
>  hw/riscv/microchip_pfsoc.c | 12 ++--
>  hw/riscv/opentitan.c   |  2 +-
>  hw/riscv/sifive_e.c|  3 ++-
>  hw/riscv/sifive_u.c| 12 ++--
>  hw/riscv/spike.c   | 11 +--
>  hw/riscv/virt.c| 12 ++--
>  include/hw/riscv/boot.h|  1 +
>  8 files changed, 30 insertions(+), 45 deletions(-)
>
> diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
> index 2594276223..4888d5c1e0 100644
> --- a/hw/riscv/boot.c
> +++ b/hw/riscv/boot.c
> @@ -175,10 +175,12 @@ target_ulong riscv_load_firmware(const char 
> *firmware_filename,
>
>  target_ulong riscv_load_kernel(MachineState *machine,
> target_ulong kernel_start_addr,
> +   bool load_initrd,
> symbol_fn_t sym_cb)
>  {
>  const char *kernel_filename = machine->kernel_filename;
>  uint64_t kernel_load_base, kernel_entry;
> +void *fdt = machine->fdt;
>
>  g_assert(kernel_filename != NULL);
>
> @@ -192,21 +194,35 @@ target_ulong riscv_load_kernel(MachineState *machine,
>  if (load_elf_ram_sym(kernel_filename, NULL, NULL, NULL,
>   NULL, &kernel_load_base, NULL, NULL, 0,
>   EM_RISCV, 1, 0, NULL, true, sym_cb) > 0) {
> -return kernel_load_base;
> +kernel_entry = kernel_load_base;

This breaks 32-bit Xvisor loading. It seems that for the 32-bit guest
we get a value of 0x8000.

Previously the top bits would be lost as we return a target_ulong from
this function, but with this change we pass the value
0x8000 to riscv_load_initrd() which causes failures.

This diff fixes the failure for me

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 4888d5c1e0..f08ed44b97 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -194,7 +194,7 @@ target_ulong riscv_load_kernel(MachineState *machine,
if (load_elf_ram_sym(kernel_filename, NULL, NULL, NULL,
 NULL, &kernel_load_base, NULL, NULL, 0,
 EM_RISCV, 1, 0, NULL, true, sym_cb) > 0) {
-kernel_entry = kernel_load_base;
+kernel_entry = (target_ulong) kernel_load_base;
goto out;
}


but I don't think that's the right fix. We should instead look at the
CPU XLEN and drop the high bits if required.

I'm going to drop this patch, do you mind looking into a proper fix?

Alistair



Re: [PATCH v6 2/4] hw/core/qdev-properties-system: Allow the 'slew' policy only on x86

2023-01-11 Thread Bernhard Beschow



Am 10. Januar 2023 09:53:49 UTC schrieb Thomas Huth :
>The 'slew' tick policy is currently enforced to be only available on
>x86 via some "#ifdef TARGET_I386" statements in mc146818rtc.c. We
>want to get rid of those #ifdefs, so we need a different way of
>checking whether the policy is allowed or not. Using the setter
>function in hw/core/qdev-properties-system.c seems to be a good
>place, so let's add a check here.
>
>Suggested-by: Mark Cave-Ayland 
>Signed-off-by: Thomas Huth 
>---
> hw/core/qdev-properties-system.c | 28 +++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
>diff --git a/hw/core/qdev-properties-system.c 
>b/hw/core/qdev-properties-system.c
>index 54a09fa9ac..d42493f630 100644
>--- a/hw/core/qdev-properties-system.c
>+++ b/hw/core/qdev-properties-system.c
>@@ -33,6 +33,7 @@
> #include "net/net.h"
> #include "hw/pci/pci.h"
> #include "hw/pci/pcie.h"
>+#include "hw/i386/x86.h"
> #include "util/block-helpers.h"
> 
> static bool check_prop_still_unset(Object *obj, const char *name,
>@@ -558,13 +559,38 @@ void qdev_set_nic_properties(DeviceState *dev, NICInfo 
>*nd)
> 
> /* --- lost tick policy --- */
> 
>+static void qdev_propinfo_set_losttickpolicy(Object *obj, Visitor *v,
>+ const char *name, void *opaque,
>+ Error **errp)
>+{
>+Property *prop = opaque;
>+int *ptr = object_field_prop_ptr(obj, prop);
>+int value;
>+
>+if (!visit_type_enum(v, name, &value, prop->info->enum_table, errp)) {
>+return;
>+}
>+
>+if (value == LOST_TICK_POLICY_SLEW) {
>+MachineState *ms = MACHINE(qdev_get_machine());
>+
>+if (!object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
>+error_setg(errp,
>+   "the 'slew' policy is only available for x86 
>machines");
>+return;
>+}
>+}
>+
>+*ptr = value;
>+}
>+
> QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
> 
> const PropertyInfo qdev_prop_losttickpolicy = {
> .name  = "LostTickPolicy",
> .enum_table  = &LostTickPolicy_lookup,
> .get   = qdev_propinfo_get_enum,
>-.set   = qdev_propinfo_set_enum,
>+.set   = qdev_propinfo_set_losttickpolicy,
> .set_default_value = qdev_propinfo_set_default_value_enum,
> };
> 

Reviewed-by: Bernhard Beschow 



Re: [PATCH v6 1/4] hw/intc: Extract the IRQ counting functions into a separate file

2023-01-11 Thread Bernhard Beschow



Am 10. Januar 2023 09:53:48 UTC schrieb Thomas Huth :
>These IRQ counting functions will soon be required in binaries that
>do not include the APIC code, too, so let's extract them into a
>separate file that can be linked independently of the APIC code.
>
>While we're at it, change the apic_* prefix into kvm_* since the
>functions are used from the i8259 PIC (i.e. not the APIC), too.
>
>Reviewed-by: Bernhard Beschow 
>Signed-off-by: Thomas Huth 
>---
> include/hw/i386/apic.h  |  2 --
> include/hw/i386/apic_internal.h |  1 -
> include/hw/intc/kvm_irqcount.h  | 10 +++
> hw/i386/kvm/i8259.c |  4 +--
> hw/i386/kvm/ioapic.c|  4 +--
> hw/intc/apic.c  |  3 +-
> hw/intc/apic_common.c   | 30 ++--
> hw/intc/kvm_irqcount.c  | 49 +
> hw/rtc/mc146818rtc.c|  6 ++--
> hw/intc/meson.build |  6 
> hw/intc/trace-events|  9 +++---
> 11 files changed, 81 insertions(+), 43 deletions(-)
> create mode 100644 include/hw/intc/kvm_irqcount.h
> create mode 100644 hw/intc/kvm_irqcount.c
>
>diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
>index da1d2fe155..bdc15a7a73 100644
>--- a/include/hw/i386/apic.h
>+++ b/include/hw/i386/apic.h
>@@ -9,8 +9,6 @@ int apic_accept_pic_intr(DeviceState *s);
> void apic_deliver_pic_intr(DeviceState *s, int level);
> void apic_deliver_nmi(DeviceState *d);
> int apic_get_interrupt(DeviceState *s);
>-void apic_reset_irq_delivered(void);
>-int apic_get_irq_delivered(void);
> void cpu_set_apic_base(DeviceState *s, uint64_t val);
> uint64_t cpu_get_apic_base(DeviceState *s);
> void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
>diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
>index 968b6648b3..5f2ba24bfc 100644
>--- a/include/hw/i386/apic_internal.h
>+++ b/include/hw/i386/apic_internal.h
>@@ -199,7 +199,6 @@ typedef struct VAPICState {
> 
> extern bool apic_report_tpr_access;
> 
>-void apic_report_irq_delivered(int delivered);
> bool apic_next_timer(APICCommonState *s, int64_t current_time);
> void apic_enable_tpr_access_reporting(DeviceState *d, bool enable);
> void apic_enable_vapic(DeviceState *d, hwaddr paddr);
>diff --git a/include/hw/intc/kvm_irqcount.h b/include/hw/intc/kvm_irqcount.h
>new file mode 100644
>index 00..0ed5999e49
>--- /dev/null
>+++ b/include/hw/intc/kvm_irqcount.h
>@@ -0,0 +1,10 @@
>+/* SPDX-License-Identifier: LGPL-2.1-or-later */
>+
>+#ifndef KVM_IRQCOUNT_H
>+#define KVM_IRQCOUNT_H
>+
>+void kvm_report_irq_delivered(int delivered);
>+void kvm_reset_irq_delivered(void);
>+int kvm_get_irq_delivered(void);
>+
>+#endif
>diff --git a/hw/i386/kvm/i8259.c b/hw/i386/kvm/i8259.c
>index d61bae4dc3..3ca0e1ff03 100644
>--- a/hw/i386/kvm/i8259.c
>+++ b/hw/i386/kvm/i8259.c
>@@ -14,7 +14,7 @@
> #include "hw/isa/i8259_internal.h"
> #include "hw/intc/i8259.h"
> #include "qemu/module.h"
>-#include "hw/i386/apic_internal.h"
>+#include "hw/intc/kvm_irqcount.h"
> #include "hw/irq.h"
> #include "sysemu/kvm.h"
> #include "qom/object.h"
>@@ -117,7 +117,7 @@ static void kvm_pic_set_irq(void *opaque, int irq, int 
>level)
> 
> pic_stat_update_irq(irq, level);
> delivered = kvm_set_irq(kvm_state, irq, level);
>-apic_report_irq_delivered(delivered);
>+kvm_report_irq_delivered(delivered);
> }
> 
> static void kvm_pic_realize(DeviceState *dev, Error **errp)
>diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
>index ee7c8ef68b..272e26b4a2 100644
>--- a/hw/i386/kvm/ioapic.c
>+++ b/hw/i386/kvm/ioapic.c
>@@ -15,7 +15,7 @@
> #include "hw/i386/x86.h"
> #include "hw/qdev-properties.h"
> #include "hw/i386/ioapic_internal.h"
>-#include "hw/i386/apic_internal.h"
>+#include "hw/intc/kvm_irqcount.h"
> #include "sysemu/kvm.h"
> 
> /* PC Utility function */
>@@ -116,7 +116,7 @@ static void kvm_ioapic_set_irq(void *opaque, int irq, int 
>level)
> 
> ioapic_stat_update_irq(common, irq, level);
> delivered = kvm_set_irq(kvm_state, s->kvm_gsi_base + irq, level);
>-apic_report_irq_delivered(delivered);
>+kvm_report_irq_delivered(delivered);
> }
> 
> static void kvm_ioapic_realize(DeviceState *dev, Error **errp)
>diff --git a/hw/intc/apic.c b/hw/intc/apic.c
>index 3df11c34d6..2d3e55f4e2 100644
>--- a/hw/intc/apic.c
>+++ b/hw/intc/apic.c
>@@ -22,6 +22,7 @@
> #include "hw/i386/apic.h"
> #include "hw/i386/ioapic.h"
> #include "hw/intc/i8259.h"
>+#include "hw/intc/kvm_irqcount.h"
> #include "hw/pci/msi.h"
> #include "qemu/host-utils.h"
> #include "sysemu/kvm.h"
>@@ -399,7 +400,7 @@ void apic_poll_irq(DeviceState *dev)
> 
> static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
> {
>-apic_report_irq_delivered(!apic_get_bit(s->irr, vector_num));
>+kvm_report_irq_delivered(!apic_get_bit(s->irr, vector_num));
> 
> apic_set_bit(s->irr, vector_num);
> if (trigger_mode)
>diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
>index 2a20982066..4a34f03047 

Re: [PATCH v6 25/33] hw/isa/piix4: Use TYPE_ISA_PIC device

2023-01-11 Thread Bernhard Beschow



Am 11. Januar 2023 17:08:28 UTC schrieb "Philippe Mathieu-Daudé" 
:
>On 9/1/23 18:23, Bernhard Beschow wrote:
>> Aligns the code with PIIX3 such that PIIXState can be used in PIIX4,
>> too.
>> 
>> Signed-off-by: Bernhard Beschow 
>> Reviewed-by: Michael S. Tsirkin 
>> Message-Id: <20221022150508.26830-33-shen...@gmail.com>
>> ---
>>   hw/isa/piix4.c  | 28 ++--
>>   hw/mips/malta.c | 11 +--
>>   hw/mips/Kconfig |  1 +
>>   3 files changed, 20 insertions(+), 20 deletions(-)
>
>> @@ -1459,7 +1461,12 @@ void mips_malta_init(MachineState *machine)
>>   pci_ide_create_devs(PCI_DEVICE(dev));
>> /* Interrupt controller */
>> -qdev_connect_gpio_out_named(DEVICE(piix4), "intr", 0, i8259_irq);
>> +dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pic"));
>> +i8259 = i8259_init(isa_bus, i8259_irq);
>> +for (i = 0; i < ISA_NUM_IRQS; i++) {
>> +qdev_connect_gpio_out(dev, i, i8259[i]);
>> +}
>> +g_free(i8259);
>> /* generate SPD EEPROM data */
>>   dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pm"));
>> diff --git a/hw/mips/Kconfig b/hw/mips/Kconfig
>> index 4e7042f03d..d156de812c 100644
>> --- a/hw/mips/Kconfig
>> +++ b/hw/mips/Kconfig
>> @@ -1,5 +1,6 @@
>>   config MALTA
>>   bool
>> +select I8259
>>   select ISA_SUPERIO
>>   select PIIX4
>
>The PIIX4 owns / exposes the I8259, so we don't need to select it here.
>The Malta board only initializes it. Why did you have to add this?

The Malta board instantiates a real I8259 while PIIX4 instantiates the 
TYPE_ISA_PIC proxy. Both are implemented pragmatically in the same source file, 
so both Malta and PIIX4 select it for different reasons.

In previous iterations using the TYPE_PROXY_PIC this might have been clearer: 
PIIX4 would select PROXY_PIC (and not I8259) while Malta would select I8259.

OK?

Best regards,
Bernhard



Re: [PATCH] linux-user: add more netlink protocol constants

2023-01-11 Thread Letu Ren
Ping! Hi, it's been more than a week since I submitted this patch. I
think it's a trivial patch. Maybe qemu trivial team could take a look.

Thanks,
Letu



[PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

2023-01-11 Thread Stefan Berger
To prevent getting stuck on waitpid() in case the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and if the target
process has not changed state until then send a SIGKILL to it.

Signed-off-by: Stefan Berger 
---
 tests/qtest/libqtest.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
 {
 #ifndef _WIN32
 pid_t pid;
+uint64_t end;
+
+/* poll for 10s until sending SIGKILL */
+end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
+
+do {
+pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
+if (pid != 0) {
+break;
+}
+g_usleep(100 * 1000);
+} while (g_get_monotonic_time() < end);
+
+if (pid == 0) {
+kill(s->qemu_pid, SIGKILL);
+TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+}
 
-TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
 assert(pid == s->qemu_pid);
 #else
 DWORD ret;
-- 
2.39.0




Re: [PATCH 11/15] block-backend: make queued_requests thread-safe

2023-01-11 Thread Stefan Hajnoczi
On Mon, Dec 12, 2022 at 01:59:16PM +0100, Paolo Bonzini wrote:
> Protect quiesce_counter and queued_requests with a QemuMutex, so that
> they are protected from concurrent access in the main thread (for example
> blk_root_drained_end() reached from bdrv_drain_all()) and in the iothread
> (where any I/O operation will call blk_inc_in_flight()).
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/block-backend.c | 44 +++
>  1 file changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 627d491d4155..fdf82f1f1599 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -23,6 +23,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-block.h"
>  #include "qemu/id.h"
> +#include "qemu/thread.h"
>  #include "qemu/main-loop.h"
>  #include "qemu/option.h"
>  #include "trace.h"
> @@ -85,6 +86,8 @@ struct BlockBackend {
>  NotifierList remove_bs_notifiers, insert_bs_notifiers;
>  QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
>  
> +/* Protected by quiesce_lock.  */
> +QemuMutex quiesce_lock;
>  int quiesce_counter;
>  CoQueue queued_requests;
>  
> @@ -372,6 +375,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
> uint64_t shared_perm)
>  
>  block_acct_init(&blk->stats);
>  
> +qemu_mutex_init(&blk->quiesce_lock);
>  qemu_co_queue_init(&blk->queued_requests);
>  notifier_list_init(&blk->remove_bs_notifiers);
>  notifier_list_init(&blk->insert_bs_notifiers);
> @@ -490,6 +494,7 @@ static void blk_delete(BlockBackend *blk)
>  assert(QLIST_EMPTY(&blk->insert_bs_notifiers.notifiers));
>  assert(QLIST_EMPTY(&blk->aio_notifiers));
>  QTAILQ_REMOVE(&block_backends, blk, link);
> +qemu_mutex_destroy(&blk->quiesce_lock);
>  drive_info_del(blk->legacy_dinfo);
>  block_acct_cleanup(&blk->stats);
>  g_free(blk);
> @@ -1451,11 +1456,25 @@ void blk_inc_in_flight(BlockBackend *blk)
>  {
>  IO_CODE();
>  qatomic_inc(&blk->in_flight);
> -if (!blk->disable_request_queuing) {
> -/* TODO: this is not thread-safe! */
> +
> +/*
> + * Avoid a continuous stream of requests from AIO callbacks, which
> + * call a user-provided callback while blk->in_flight is elevated,
> + * if the BlockBackend is being quiesced.
> + *
> + * This initial test does not have to be perfect: a race might
> + * cause an extra I/O to be queued, but sooner or later a nonzero
> + * quiesce_counter will be observed.
> + */
> +if (!blk->disable_request_queuing && 
> qatomic_read(&blk->quiesce_counter)) {
> +/*
> + * ... on the other hand, it is important that the final check and
> +  * the wait are done under the lock, so that no wakeups are missed.
> + */
> +QEMU_LOCK_GUARD(&blk->quiesce_lock);
>  while (blk->quiesce_counter) {
>  qatomic_dec(&blk->in_flight);
> -qemu_co_queue_wait(&blk->queued_requests, NULL);
> +qemu_co_queue_wait(&blk->queued_requests, &blk->quiesce_lock);
>  qatomic_inc(&blk->in_flight);
>  }
>  }
> @@ -2542,7 +2561,8 @@ static void blk_root_drained_begin(BdrvChild *child)
>  BlockBackend *blk = child->opaque;
>  ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
>  
> -if (++blk->quiesce_counter == 1) {
> +qatomic_set(&blk->quiesce_counter, blk->quiesce_counter + 1);

Can drained begin race with drained end? If yes, then this atomic set
looks racy because drained end might be updating the counter at around
the same time. We should probably hold quiesce_lock?

> +if (blk->quiesce_counter == 1) {
>  if (blk->dev_ops && blk->dev_ops->drained_begin) {
>  blk->dev_ops->drained_begin(blk->dev_opaque);
>  }
> @@ -2560,6 +2580,7 @@ static bool blk_root_drained_poll(BdrvChild *child)
>  {
>  BlockBackend *blk = child->opaque;
>  bool busy = false;
> +
>  assert(blk->quiesce_counter);
>  
>  if (blk->dev_ops && blk->dev_ops->drained_poll) {
> @@ -2576,14 +2597,21 @@ static void blk_root_drained_end(BdrvChild *child)
>  assert(blk->public.throttle_group_member.io_limits_disabled);
>  qatomic_dec(&blk->public.throttle_group_member.io_limits_disabled);
>  
> -if (--blk->quiesce_counter == 0) {
> +qemu_mutex_lock(&blk->quiesce_lock);
> +qatomic_set(&blk->quiesce_counter, blk->quiesce_counter - 1);
> +if (blk->quiesce_counter == 0) {
> +/* After this point, new I/O will not sleep on queued_requests.  */
> +qemu_mutex_unlock(&blk->quiesce_lock);
> +
>  if (blk->dev_ops && blk->dev_ops->drained_end) {
>  blk->dev_ops->drained_end(blk->dev_opaque);
>  }
> -while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
> -/* Resume all queued requests */
> -}
> +
> +/* Now resume all queued requests */
> +qemu_mutex_lock

Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread David Woodhouse
On Wed, 2023-01-11 at 13:00 -0700, Alex Williamson wrote:
> 
> Careful about making too many assumptions around PCI, it's clearly the
> most used but vfio is bus agnostic and we do have vfio-platform support
> as well as some weird s390 devices.  There's nothing PCI specific about
> a level triggered interrupt, so preferably all this sits a layer below
> the PCI interfaces.  Thanks,

True. PCI is just the most fun to wire up the ack though, because of
the aggregation and INTX line swizzling and all that nonsense.


smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread Alex Williamson
On Wed, 11 Jan 2023 19:50:15 +
David Woodhouse  wrote:

> On Wed, 2023-01-11 at 12:43 -0700, Alex Williamson wrote:
> > On Wed, 11 Jan 2023 19:08:44 +
> > David Woodhouse  wrote:
> >   
> > > On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:  
> > > > 
> > > > Nice.  IIRC, we ended up with the hack solution we have today in vfio
> > > > because there was too much resistance to callbacks that were only
> > > > necessary for vfio in the past.  Once we had KVM resampling support,
> > > > it simply wasn't worth the effort for a higher latency solution to
> > > > fight that battle, so we implemented what could best be described as
> > > > a universal workaround embedded in vfio.
> > > > 
> > > > Clearly a callback is preferable, and yes, that's how we operate with
> > > > KVM resampling and unmasking INTx, so in theory plumbing this to our
> > > > existing eoi callback and removing the region toggling ought to do
> > > > the right thing.  Thanks,    
> > > 
> > > Well, I'm happy for the Xen support be a second use case which means
> > > it's no longer "only necessary for VFIO", and keep prodding at it if
> > > that's going to be useful...  
> > 
> > Welcome aboard.  I take it from your cover letter than non-x86
> > architectures would be on your todo list.  Ideally the ack callback
> > would simply be a requirement for any implementation of a new interrupt
> > controller, but that's where we get into striking a balance of device
> > assignment imposing requirements on arbitrary architectures that may or
> > may not care, or even support, device assignment.  
> 
> Right. We'd probably want to do it for those interrupt controllers
> which could be behind PCI host controllers that might have VFIO devices
> attached.
>
> > This is the... dare I say, elegance of the region access hack.  It's
> > obviously not pretty or performant, but it universally provides an
> > approximation of the behavior of an emulated device, ie. some form of
> > guest access to the device is required to de-assert the interrupt.
> > 
> > We probably need some way to detect the interrupt controller support
> > for the callback mechanism so we can generate an appropriate user
> > warning to encourage development of that support and fall back to our
> > current hack for some degree of functionality.  Thanks,  
> 
> Yeah, I pondered that. It's not that hard to do; have the interrupt
> controller indicate that a given qemu_irq supports EOI notifications,
> and then when the VFIO or other event source calls
> qemu_set_irq_ack_callback() it can get a failure which will cause it to
> use the fallback mode.
> 
> Happy to look at wiring this up through the pci_set_irq() mechanisms if
> it's not going to be rejected out of hand.


Careful about making too many assumptions around PCI, it's clearly the
most used but vfio is bus agnostic and we do have vfio-platform support
as well as some weird s390 devices.  There's nothing PCI specific about
a level triggered interrupt, so preferably all this sits a layer below
the PCI interfaces.  Thanks,

Alex




Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread David Woodhouse
On Wed, 2023-01-11 at 12:43 -0700, Alex Williamson wrote:
> On Wed, 11 Jan 2023 19:08:44 +
> David Woodhouse  wrote:
> 
> > On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:
> > > 
> > > Nice.  IIRC, we ended up with the hack solution we have today in vfio
> > > because there was too much resistance to callbacks that were only
> > > necessary for vfio in the past.  Once we had KVM resampling support,
> > > it simply wasn't worth the effort for a higher latency solution to
> > > fight that battle, so we implemented what could best be described as
> > > a universal workaround embedded in vfio.
> > > 
> > > Clearly a callback is preferable, and yes, that's how we operate with
> > > KVM resampling and unmasking INTx, so in theory plumbing this to our
> > > existing eoi callback and removing the region toggling ought to do
> > > the right thing.  Thanks,  
> > 
> > Well, I'm happy for the Xen support be a second use case which means
> > it's no longer "only necessary for VFIO", and keep prodding at it if
> > that's going to be useful...
> 
> Welcome aboard.  I take it from your cover letter than non-x86
> architectures would be on your todo list.  Ideally the ack callback
> would simply be a requirement for any implementation of a new interrupt
> controller, but that's where we get into striking a balance of device
> assignment imposing requirements on arbitrary architectures that may or
> may not care, or even support, device assignment.

Right. We'd probably want to do it for those interrupt controllers
which could be behind PCI host controllers that might have VFIO devices
attached.

> This is the... dare I say, elegance of the region access hack.  It's
> obviously not pretty or performant, but it universally provides an
> approximation of the behavior of an emulated device, ie. some form of
> guest access to the device is required to de-assert the interrupt.
> 
> We probably need some way to detect the interrupt controller support
> for the callback mechanism so we can generate an appropriate user
> warning to encourage development of that support and fall back to our
> current hack for some degree of functionality.  Thanks,

Yeah, I pondered that. It's not that hard to do; have the interrupt
controller indicate that a given qemu_irq supports EOI notifications,
and then when the VFIO or other event source calls
qemu_set_irq_ack_callback() it can get a failure which will cause it to
use the fallback mode.

Happy to look at wiring this up through the pci_set_irq() mechanisms if
it's not going to be rejected out of hand.


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2 0/3] python/qemu/machine: fix potential hang in QMP accept

2023-01-11 Thread John Snow
On Tue, Jan 10, 2023 at 12:45 PM John Snow  wrote:
>
> On Tue, Jan 10, 2023 at 2:05 AM Marc-André Lureau
>  wrote:
> >
> > Hi John
> >
> > On Tue, Jan 10, 2023 at 1:06 AM John Snow  wrote:
> > >
> > > On Mon, Jul 25, 2022 at 7:23 AM Marc-André Lureau
> > >  wrote:
> > > >
> > > > Hi
> > > >
> > > > On Fri, Jul 1, 2022 at 2:51 AM John Snow  wrote:
> > > >>
> > > >> On Thu, Jun 30, 2022 at 8:34 AM  wrote:
> > > >> >
> > > >> > From: Marc-André Lureau 
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > As reported earlier by Richard Henderson ("virgl avocado hang" 
> > > >> > thread), avocado
> > > >> > tests may hang when QEMU exits before the QMP connection is 
> > > >> > established.
> > > >> >
> > > >> > v2:
> > > >> >  - use a socketpair() for QMP (instead of async concurrent code from 
> > > >> > v1) as
> > > >> >suggested by Daniel Berrange.
> > > >> >  - should not regress (hopefully)
> > > >> >
> > > >> > Marc-André Lureau (3):
> > > >> >   python/qmp/protocol: add open_with_socket()
> > > >> >   python/qmp/legacy: make QEMUMonitorProtocol accept a socket
> > > >> >   python/qemu/machine: use socketpair() for QMP by default
> > > >> >
> > > >> >  python/qemu/machine/machine.py | 24 
> > > >> >  python/qemu/qmp/legacy.py  | 18 +++---
> > > >> >  python/qemu/qmp/protocol.py| 25 -
> > > >> >  3 files changed, 51 insertions(+), 16 deletions(-)
> > > >> >
> > > >> > --
> > > >> > 2.37.0.rc0
> > > >> >
> > > >>
> > > >> For anything that touches python/qemu/qmp/*, may I please ask that you
> > > >> submit them to https://gitlab.com/qemu-project/python-qemu-qmp ?
> > > >>
> > > >
> > > > Ok
> > > >
> > > >>
> > > >> (I'll review them in the meantime on-list just in the interest of
> > > >> moving things along.)
> > > >
> > > >
> > > > I was waiting for a review before updating the patches / moving to 
> > > > python-qemu-qmp.
> > > >
> > >
> > > Fair enough - can I kindly ask you to resend, though? My apologies and
> > > Happy New Year!
> >
> > I am confused, what's the relation between QEMU python/qemu/qmp and
> > https://gitlab.com/qemu-project/python-qemu-qmp ?
> >
> > When / how is the code sync ?
> >
>
> python-qemu-qmp supersedes the code that is in qemu.git.
> qemu.git/python/qemu/qmp is scheduled to be deleted, but there are
> changes that need to go in to configure etc to support the deletion
> first, and I've been backlogged/waylaid on making those changes.

... by which I mean, I generally do review and merge on the standalone
repo first, then backport to qemu.git. Or, that's what I'd prefer to
do, since the tooling and testing is more advanced on the standalone
repo.




Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread Alex Williamson
On Wed, 11 Jan 2023 19:08:44 +
David Woodhouse  wrote:

> On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:
> > 
> > Nice.  IIRC, we ended up with the hack solution we have today in vfio
> > because there was too much resistance to callbacks that were only
> > necessary for vfio in the past.  Once we had KVM resampling support,
> > it simply wasn't worth the effort for a higher latency solution to
> > fight that battle, so we implemented what could best be described as
> > a universal workaround embedded in vfio.
> > 
> > Clearly a callback is preferable, and yes, that's how we operate with
> > KVM resampling and unmasking INTx, so in theory plumbing this to our
> > existing eoi callback and removing the region toggling ought to do
> > the right thing.  Thanks,  
> 
> Well, I'm happy for the Xen support be a second use case which means
> it's no longer "only necessary for VFIO", and keep prodding at it if
> that's going to be useful...

Welcome aboard.  I take it from your cover letter than non-x86
architectures would be on your todo list.  Ideally the ack callback
would simply be a requirement for any implementation of a new interrupt
controller, but that's where we get into striking a balance of device
assignment imposing requirements on arbitrary architectures that may or
may not care, or even support, device assignment.

This is the... dare I say, elegance of the region access hack.  It's
obviously not pretty or performant, but it universally provides an
approximation of the behavior of an emulated device, ie. some form of
guest access to the device is required to de-assert the interrupt.

We probably need some way to detect the interrupt controller support
for the callback mechanism so we can generate an appropriate user
warning to encourage development of that support and fall back to our
current hack for some degree of functionality.  Thanks,

Alex




Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread David Woodhouse
On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:
> 
> Nice.  IIRC, we ended up with the hack solution we have today in vfio
> because there was too much resistance to callbacks that were only
> necessary for vfio in the past.  Once we had KVM resampling support,
> it simply wasn't worth the effort for a higher latency solution to
> fight that battle, so we implemented what could best be described as
> a universal workaround embedded in vfio.
> 
> Clearly a callback is preferable, and yes, that's how we operate with
> KVM resampling and unmasking INTx, so in theory plumbing this to our
> existing eoi callback and removing the region toggling ought to do
> the right thing.  Thanks,

Well, I'm happy for the Xen support be a second use case which means
it's no longer "only necessary for VFIO", and keep prodding at it if
that's going to be useful...


smime.p7s
Description: S/MIME cryptographic signature


Re: [RESEND PATCH 2/2] migration: report multiFd related thread pid to libvirt

2023-01-11 Thread Daniel P . Berrangé
On Wed, Jan 11, 2023 at 07:00:53PM +, Dr. David Alan Gilbert wrote:
> * Jiang Jiacheng via (qemu-devel@nongnu.org) wrote:
> > From: Zheng Chuan 
> > 
> > Report multiFd related thread pid to libvirt in order to
> > pin multiFd thread to different cpu.
> 
> With multifd you may well want to pin different multifd threads
> to different cores; so you need to include the 'id' and 'name' fields of
> the multifd thread in the event.

Are the 'id' / 'name' fields considered stable API for QEMU ?

IIRC, the mgmt app merely requests the number of multifd threads
and doesn't assign any identifying names/ids to them, unlike
iothreads where the mgmt app gives an explicit 'id'.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RESEND PATCH 2/2] migration: report multiFd related thread pid to libvirt

2023-01-11 Thread Dr. David Alan Gilbert
* Jiang Jiacheng via (qemu-devel@nongnu.org) wrote:
> From: Zheng Chuan 
> 
> Report multiFd related thread pid to libvirt in order to
> pin multiFd thread to different cpu.

With multifd you may well want to pin different multifd threads
to different cores; so you need to include the 'id' and 'name' fields of
the multifd thread in the event.

(Copying in Jiri and Dan )

Dave

> ---
>  migration/multifd.c |  4 
>  qapi/migration.json | 12 
>  2 files changed, 16 insertions(+)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 000ca4d4ec..f3f7e8ae31 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -17,6 +17,7 @@
>  #include "exec/ramblock.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-migration.h"
>  #include "ram.h"
>  #include "migration.h"
>  #include "socket.h"
> @@ -650,6 +651,9 @@ static void *multifd_send_thread(void *opaque)
>  int ret = 0;
>  bool use_zero_copy_send = migrate_use_zero_copy_send();
>  
> +/* report multifd thread pid to libvirt */
> +qapi_event_send_migration_multifd_pid(qemu_get_thread_id());
> +
>  trace_multifd_send_thread_start(p->id);
>  rcu_register_thread();
>  
> diff --git a/qapi/migration.json b/qapi/migration.json
> index aafc940617..33fc319329 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1286,6 +1286,18 @@
>  { 'event': 'MIGRATION_PASS',
>'data': { 'pass': 'int' } }
>  
> +##
> +# @MIGRATION_MULTIFD_PID:
> +#
> +# Emitted when multifd thread appear
> +#
> +# @pid: pid of multifd thread
> +#
> +# Since: 7.2
> +##
> +{ 'event': 'MIGRATION_MULTIFD_PID',
> +  'data': { 'pid': 'int' } }
> +
>  ##
>  # @MIGRATION_PID:
>  #
> -- 
> 2.33.0
> 
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread Alex Williamson
On Wed, 11 Jan 2023 16:58:37 +
David Woodhouse  wrote:

> On Wed, 2023-01-11 at 11:25 -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 11, 2023 at 02:41:58PM +, David Woodhouse wrote:  
> > > This allows drivers to register a callback on a qemu_irq, which is
> > > invoked when a level-triggered IRQ is acked on the irqchip.
> > > 
> > > This allows us to simulate level-triggered interrupts more efficiently,
> > > by resampling the state of the interrupt only when it actually matters.  
> > 
> > I think we tried this with vfio and had to give up on this.
> > I don't remember the details though. Alex probably does?  
> 
> Hm, not sure why there would be any insurmountable problems.
> 
> I've seen this working at scale in a different VMM with split-irqchip
> and PCI INTX + Xen event channel support.
> 
> And it's what the kernel does internally, and exposes through its dual-
> eventfd APIs in both KVM IRQ routing and VFIO interrupts.
> 
> That said, I don't care *that* much. I can live with the way I've done
> it for the Xen support, by polling the status on a vCPU0 vmexit.
> Looking at the VFIO code made me throw up in my mouth a little bit, but
> I just won't do that again... :)

Nice.  IIRC, we ended up with the hack solution we have today in vfio
because there was too much resistance to callbacks that were only
necessary for vfio in the past.  Once we had KVM resampling support, it
simply wasn't worth the effort for a higher latency solution to fight
that battle, so we implemented what could best be described as a
universal workaround embedded in vfio.

Clearly a callback is preferable, and yes, that's how we operate with
KVM resampling and unmasking INTx, so in theory plumbing this to our
existing eoi callback and removing the region toggling ought to do the
right thing.  Thanks,

Alex




Re: [PATCH v14 04/11] s390x/sclp: reporting the maximum nested topology entries

2023-01-11 Thread Nina Schoetterl-Glausch
On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
> The maximum nested topology entries is used by the guest to know
> how many nested topology are available on the machine.
> 
> Currently, SCLP READ SCP INFO reports MNEST = 0, which is the
> equivalent of reporting the default value of 2.
> Let's use the default SCLP value of 2 and increase this value in the
> future patches implementing higher levels.
> 
> Signed-off-by: Pierre Morel 

Reviewed-by: Nina Schoetterl-Glausch 
if you address the issues Thomas found with the commit description
and the nits below.

> ---
>  include/hw/s390x/sclp.h | 5 +++--
>  hw/s390x/sclp.c | 4 
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/s390x/sclp.h b/include/hw/s390x/sclp.h
> index 712fd68123..4ce852473c 100644
> --- a/include/hw/s390x/sclp.h
> +++ b/include/hw/s390x/sclp.h
> @@ -112,12 +112,13 @@ typedef struct CPUEntry {
>  } QEMU_PACKED CPUEntry;
>  
>  #define SCLP_READ_SCP_INFO_FIXED_CPU_OFFSET 128
> -#define SCLP_READ_SCP_INFO_MNEST2
> +#define SCLP_READ_SCP_INFO_MNEST4
>  typedef struct ReadInfo {
>  SCCBHeader h;
>  uint16_t rnmax;
>  uint8_t rnsize;
> -uint8_t  _reserved1[16 - 11];   /* 11-15 */
> +uint8_t  _reserved1[15 - 11];   /* 11-14 */
> +uint8_t  stsi_parm; /* 15-16 */

The numbering here is the same as the one in the arch doc, instead
of the maybe more usual one where the right number is exclusive.
So 15-16 looks like a two byte field, so just do 15 or just drop it.

>  uint16_t entries_cpu;   /* 16-17 */
>  uint16_t offset_cpu;/* 18-19 */
>  uint8_t  _reserved2[24 - 20];   /* 20-23 */
> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
> index eff74479f4..07e3cb4cac 100644
> --- a/hw/s390x/sclp.c
> +++ b/hw/s390x/sclp.c
> @@ -20,6 +20,7 @@
>  #include "hw/s390x/event-facility.h"
>  #include "hw/s390x/s390-pci-bus.h"
>  #include "hw/s390x/ipl.h"
> +#include "hw/s390x/cpu-topology.h"
>  
>  static inline SCLPDevice *get_sclp_device(void)
>  {
> @@ -125,6 +126,9 @@ static void read_SCP_info(SCLPDevice *sclp, SCCB *sccb)
>  
>  /* CPU information */
>  prepare_cpu_entries(machine, entries_start, &cpu_count);
> +if (s390_has_topology()) {
> +read_info->stsi_parm = SCLP_READ_SCP_INFO_MNEST;
> +}

Maybe move that up a bit, not sure if it belongs under the CPU information 
section.
I'd leave prepare_cpu_entries and read_info->entries_cpu = adjacent in any case.

>  read_info->entries_cpu = cpu_to_be16(cpu_count);
>  read_info->offset_cpu = cpu_to_be16(offset_cpu);
>  read_info->highest_cpu = cpu_to_be16(machine->smp.max_cpus - 1);




Re: [PATCH 5/8] hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge

2023-01-11 Thread Ira Weiny
On Wed, Jan 11, 2023 at 02:24:37PM +, Jonathan Cameron wrote:
> Noticed as this prevents iASL disasembling the DSDT table.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Ira Weiny 

> ---
>  hw/i386/acpi-build.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 127c4e2d50..a584b62ae2 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1482,7 +1482,6 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>  aml_append(pkg, aml_eisaid("PNP0A03"));
>  aml_append(dev, aml_name_decl("_CID", pkg));
>  aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> -aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
>  build_cxl_osc_method(dev);
>  } else if (pci_bus_is_express(bus)) {
>  aml_append(dev, aml_name_decl("_HID", 
> aml_eisaid("PNP0A08")));
> -- 
> 2.37.2
> 



Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)

2023-01-11 Thread David Hildenbrand


Not against it if you want to keep exploring, but if so please avoid using
the priority field, also I'd hope the early vmsd will be saved within
qemu_savevm_state_setup() so maybe it can be another alternative to
save_setup().

Maybe it's still easier we keep going with the save_setup() and the shim
functions above, if it works for you.


I'll happy to go the path you suggested if we can make it work. I'd be happy
to have something "reasonable" for the virtio-mem device in the
analyze-migration.py output. But I could live with just nothing useful for
the device itself -- as long as at least the other devices still show up.


So, let's see whether we can go back to the load_state() approach..

static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
   .save_setup = virtio_mem_save_setup_early,
   .save_state = virtio_mem_save_state_early,
   .load_state = virtio_mem_load_state_early,
};

vfio handled it using a single header flag for either save_setup() or
save_state(), then load_state() identifies it:

 data = qemu_get_be64(f);
 ...
 switch (data) {
 case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
 ...
 case VFIO_MIG_FLAG_DEV_SETUP_STATE:
 ...

Maybe play the same trick here?  The flag needs to be hard coded but at
least not the vmsd.  Based on previous experience, I could have missed
something else, though. :)


I could also remember "internally" if load_state() was already called 
throughout the migartion I think.


But, IIUC, it will still make analyze-migration.py produce wrong 
results, because of the vmdesc mismatch.




Or if you like go the other way I'm fine too.


IMHO my approach will be cleaner on the device side but will require 
migration code changes. I'll try getting that as clean as possible for 
now and resend. If there are more ideas on how to get the other approach 
running, I'll be happy to try.


Thanks!

--
Thanks,

David / dhildenb




Re: [PATCH 3/8] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL

2023-01-11 Thread Ira Weiny
On Wed, Jan 11, 2023 at 02:24:35PM +, Jonathan Cameron wrote:
> From: Gregory Price 
> 
> Current code sets to STORAGE_EXPRESS and then overrides it.
> 
> Signed-off-by: Gregory Price 
> Reviewed-by: Davidlohr Bueso 

Reviewed-by: Ira Weiny 

> Signed-off-by: Jonathan Cameron 
> ---
>  hw/mem/cxl_type3.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 252822bd82..217a5e639b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -408,7 +408,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  }
>  
>  pci_config_set_prog_interface(pci_conf, 0x10);
> -pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);
>  
>  pcie_endpoint_cap_init(pci_dev, 0x80);
>  if (ct3d->sn != UI64_NULL) {
> @@ -627,7 +626,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
>  
>  pc->realize = ct3_realize;
>  pc->exit = ct3_exit;
> -pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
> +pc->class_id = PCI_CLASS_MEMORY_CXL;
>  pc->vendor_id = PCI_VENDOR_ID_INTEL;
>  pc->device_id = 0xd93; /* LVF for now */
>  pc->revision = 1;
> -- 
> 2.37.2
> 



Re: [PATCH 2/8] hw/pci-bridge/cxl_downstream: Fix type naming mismatch

2023-01-11 Thread Ira Weiny
On Wed, Jan 11, 2023 at 02:24:34PM +, Jonathan Cameron wrote:
> Fix capitalization difference between struct name and typedef.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Ira Weiny 

> ---
>  hw/pci-bridge/cxl_downstream.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
> index 3d4e6b59cd..54f507318f 100644
> --- a/hw/pci-bridge/cxl_downstream.c
> +++ b/hw/pci-bridge/cxl_downstream.c
> @@ -15,7 +15,7 @@
>  #include "hw/pci/pcie_port.h"
>  #include "qapi/error.h"
>  
> -typedef struct CXLDownStreamPort {
> +typedef struct CXLDownstreamPort {
>  /*< private >*/
>  PCIESlot parent_obj;
>  
> -- 
> 2.37.2
> 



Re: [PATCH 1/8] hw/mem/cxl_type3: Improve error handling in realize()

2023-01-11 Thread Ira Weiny
On Wed, Jan 11, 2023 at 02:24:33PM +, Jonathan Cameron wrote:
> msix_init_exclusive_bar() can fail, so if it does cleanup the address space.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Ira Weiny 

> ---
>  hw/mem/cxl_type3.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index dae4fd89ca..252822bd82 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -401,7 +401,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  MemoryRegion *mr = ®s->component_registers;
>  uint8_t *pci_conf = pci_dev->config;
>  unsigned short msix_num = 1;
> -int i;
> +int i, rc;
>  
>  if (!cxl_setup_memory(ct3d, errp)) {
>  return;
> @@ -438,7 +438,10 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>   &ct3d->cxl_dstate.device_registers);
>  
>  /* MSI(-X) Initailization */
> -msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
> +rc = msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
> +if (rc) {
> +goto err_address_space_free;
> +}
>  for (i = 0; i < msix_num; i++) {
>  msix_vector_use(pci_dev, i);
>  }
> @@ -450,6 +453,11 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
>  cxl_cstate->cdat.private = ct3d;
>  cxl_doe_cdat_init(cxl_cstate, errp);
> +return;
> +
> +err_address_space_free:
> +address_space_destroy(&ct3d->hostmem_as);
> +return;
>  }
>  
>  static void ct3_exit(PCIDevice *pci_dev)
> -- 
> 2.37.2
> 



Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)

2023-01-11 Thread Peter Xu
On Wed, Jan 11, 2023 at 05:58:36PM +0100, David Hildenbrand wrote:
> On 11.01.23 17:35, Peter Xu wrote:
> > On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:
> > > On 10.01.23 21:03, Peter Xu wrote:
> > > > On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
> > > > > The following seems to work,
> > > > 
> > > > That looks much better at least from the diffstat pov (comparing to the
> > > > existing patch 1+5 and the framework changes), thanks.
> > > > 
> > > > > but makes analyze-migration.py angry:
> > > > > 
> > > > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > > > Traceback (most recent call last):
> > > > > File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", 
> > > > > line 605, in 
> > > > >   dump.read(dump_memory = args.memory)
> > > > > File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", 
> > > > > line 539, in read
> > > > >   classdesc = self.section_classes[section_key]
> > > > >   ^
> > > > > KeyError: (':00:03.0/virtio-mem-early', 0)
> > > > > 
> > > > > 
> > > > > We need the vmdesc to create info for the device.
> > > > 
> > > > Migration may ignore the save entry if save_state() not provided in the
> > > > "devices" section:
> > > > 
> > > >   if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> > > >   continue;
> > > >   }
> > > > 
> > > > Could you try providing a shim save_state() for the new virtio-mem save
> > > > entry?
> > > > 
> > > > /*
> > > >* Shim function to make sure the save entry will be dumped into 
> > > > "devices"
> > > >* section, to make analyze-migration.py happy.
> > > >*/
> > > > static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
> > > > {
> > > > }
> > > > 
> > > > Then:
> > > > 
> > > > static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
> > > >   .save_setup = virtio_mem_save_setup_early,
> > > >   .save_state = virtio_mem_save_state_early,
> > > >   .load_state = virtio_mem_load_state_early,
> > > > };
> > > > 
> > > > I'm not 100% sure it'll work yet, but maybe worth trying.
> > > 
> > > It doesn't. virtio_mem_load_state_early() will get called twice (once with
> > > state saved during save_setup() and once with effectively nothing during
> > > save_state(), which breaks the whole migration).
> > 
> > Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
> > with load_setup(), which I just noticed..  Then the load_state() needs to
> > be another shim like save_state().
> 
> Let me see if that improves the situation. E.g., ram.c writes state in
> save_setup() but doesn't load any state in load_setup() -- it's all done in
> load_state().
> 
> ... no, still not working. It will read garbage during load_setup() now.
> 
> qemu-system-x86_64: Property 'memaddr' changed from 0x10002037261 to
> 0x14000
> qemu-system-x86_64: Failed to load virtio-mem-device-early:tmp
> qemu-system-x86_64: Load state of device :00:03.0/virtio-mem-early
> failed
> qemu-system-x86_64: load of migration failed: Invalid argument
> 
> 
> I don't think that load_setup() is supposed to consume anything useful from
> the migration stream. I suspects it should actually not even consume a QEMU
> file right now, because they way it's used is just for initializing stuff.
> 
> qemu_loadvm_state_main() does the actual loading part, parsing sections etc.
> qemu_loadvm_state_setup() doesn't do any of that.
> 
> AFAIKS, at least qemu_loadvm_state_setup() would have to parse sections and
> the save_setup() users would have to be converted into using load_setup() as
> well. Not sure if more is missing.

Ouch, yeah, load_setup() is unfortunate. :(

I think load_setup() should be named load_init() without having the
qemufile at all.  But let's keep that aside for now, and see what other
option we have..

> 
> Even with that, I doubt that it would make analyze-migration.py work,
> because what we store is something different than what we record in the
> vmdesc.
> 
> > 
> > > 
> > > vmdesc handling is also wrong, because analyze-migration.py gets confused
> > > because it receives data stored during save_setup() but vmdesc created
> > > during save_state() was told that there would be "nothing" to interpret 
> > > ...
> > > 
> > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > {
> > >  "ram (2)": {
> > >  "section sizes": {
> > >  ":00:03.0/mem0": "0x000f",
> > >  "pc.ram": "0x0001",
> > >  "/rom@etc/acpi/tables": "0x0002",
> > >  "pc.bios": "0x0004",
> > >  ":00:02.0/e1000.rom": "0x0004",
> > >  "pc.rom": "0x0002",
> > >  "/rom@etc/table-loader": "0x1000",
> > >  "/rom@etc/acpi/rsdp": "0x1000"
> > >  }
> > >  },
> > >  ":00

[PATCH 06/10] hw/riscv: use ms->fdt in riscv_socket_fdt_write_distance_matrix()

2023-01-11 Thread Daniel Henrique Barboza
There's no need to use a MachineState pointer and a fdt pointer now that
all RISC-V machines are using the FDT from the MachineState.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/numa.c | 8 
 hw/riscv/spike.c| 2 +-
 hw/riscv/virt.c | 2 +-
 include/hw/riscv/numa.h | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
index f4343f5cde..4720102561 100644
--- a/hw/riscv/numa.c
+++ b/hw/riscv/numa.c
@@ -164,7 +164,7 @@ void riscv_socket_fdt_write_id(const MachineState *ms, 
const char *node_name,
 }
 }
 
-void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt)
+void riscv_socket_fdt_write_distance_matrix(const MachineState *ms)
 {
 int i, j, idx;
 uint32_t *dist_matrix, dist_matrix_size;
@@ -184,10 +184,10 @@ void riscv_socket_fdt_write_distance_matrix(const 
MachineState *ms, void *fdt)
 }
 }
 
-qemu_fdt_add_subnode(fdt, "/distance-map");
-qemu_fdt_setprop_string(fdt, "/distance-map", "compatible",
+qemu_fdt_add_subnode(ms->fdt, "/distance-map");
+qemu_fdt_setprop_string(ms->fdt, "/distance-map", "compatible",
 "numa-distance-map-v1");
-qemu_fdt_setprop(fdt, "/distance-map", "distance-matrix",
+qemu_fdt_setprop(ms->fdt, "/distance-map", "distance-matrix",
  dist_matrix, dist_matrix_size);
 g_free(dist_matrix);
 }
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 05d34651cb..91bf194ec1 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -174,7 +174,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 g_free(clust_name);
 }
 
-riscv_socket_fdt_write_distance_matrix(mc, fdt);
+riscv_socket_fdt_write_distance_matrix(mc);
 
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", "/htif");
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 1d3bd25cb5..e374b58f89 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -805,7 +805,7 @@ static void create_fdt_sockets(RISCVVirtState *s, const 
MemMapEntry *memmap,
 }
 }
 
-riscv_socket_fdt_write_distance_matrix(mc, mc->fdt);
+riscv_socket_fdt_write_distance_matrix(mc);
 }
 
 static void create_fdt_virtio(RISCVVirtState *s, const MemMapEntry *memmap,
diff --git a/include/hw/riscv/numa.h b/include/hw/riscv/numa.h
index 634df6673f..8f5280211d 100644
--- a/include/hw/riscv/numa.h
+++ b/include/hw/riscv/numa.h
@@ -100,9 +100,9 @@ void riscv_socket_fdt_write_id(const MachineState *ms, 
const char *node_name,
  * @ms: pointer to machine state
  * @socket_id: socket index
  *
- * Write NUMA distance matrix in FDT for given machine
+ * Write NUMA distance matrix in MachineState->fdt
  */
-void riscv_socket_fdt_write_distance_matrix(const MachineState *ms, void *fdt);
+void riscv_socket_fdt_write_distance_matrix(const MachineState *ms);
 
 CpuInstanceProperties
 riscv_numa_cpu_index_to_props(MachineState *ms, unsigned cpu_index);
-- 
2.39.0




[PATCH] hw/mips/boston.c: rename MachineState 'mc' pointer to 'ms'

2023-01-11 Thread Daniel Henrique Barboza
Follow the QEMU convention of naming MachineState pointers as 'ms' by
renaming the instance in create_fdt() where we're calling it 'mc'.

Cc: Paul Burton 
Cc: Aleksandar Rikalo 
Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/mips/boston.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index edda87e23c..6ef6b020a2 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -515,7 +515,7 @@ static const void *create_fdt(BostonState *s,
 {
 void *fdt;
 int cpu;
-MachineState *mc = s->mach;
+MachineState *ms = s->mach;
 uint32_t platreg_ph, gic_ph, clk_ph;
 char *name, *gic_name, *platreg_name, *stdout_name;
 static const char * const syscon_compat[2] = {
@@ -542,7 +542,7 @@ static const void *create_fdt(BostonState *s,
 qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
 qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
 
-for (cpu = 0; cpu < mc->smp.cpus; cpu++) {
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 name = g_strdup_printf("/cpus/cpu@%d", cpu);
 qemu_fdt_add_subnode(fdt, name);
 qemu_fdt_setprop_string(fdt, name, "compatible", "img,mips");
-- 
2.39.0




[PATCH 07/10] hw/riscv: simplify riscv_load_fdt()

2023-01-11 Thread Daniel Henrique Barboza
All callers of riscv_load_fdt() are using machine->ram_size as
'mem_size' and the fdt is always retrievable via machine->fdt.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/boot.c| 4 +++-
 hw/riscv/microchip_pfsoc.c | 4 ++--
 hw/riscv/sifive_u.c| 3 +--
 hw/riscv/spike.c   | 3 +--
 hw/riscv/virt.c| 3 +--
 include/hw/riscv/boot.h| 2 +-
 6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index e868fb6ade..21dea7eac2 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -265,10 +265,12 @@ out:
 return kernel_entry;
 }
 
-uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)
+uint64_t riscv_load_fdt(MachineState *ms, hwaddr dram_base)
 {
 uint64_t temp, fdt_addr;
+uint64_t mem_size = ms->ram_size;
 hwaddr dram_end = dram_base + mem_size;
+void *fdt = ms->fdt;
 int ret, fdtsize = fdt_totalsize(fdt);
 
 if (fdtsize <= 0) {
diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index c45023a2b1..6bb08f66bd 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -633,8 +633,8 @@ static void microchip_icicle_kit_machine_init(MachineState 
*machine)
  true, NULL);
 
 /* Compute the fdt load address in dram */
-fdt_load_addr = riscv_load_fdt(memmap[MICROCHIP_PFSOC_DRAM_LO].base,
-   machine->ram_size, machine->fdt);
+fdt_load_addr = riscv_load_fdt(machine,
+   memmap[MICROCHIP_PFSOC_DRAM_LO].base);
 /* Load the reset vector */
 riscv_setup_rom_reset_vec(machine, &s->soc.u_cpus, firmware_load_addr,
   memmap[MICROCHIP_PFSOC_ENVM_DATA].base,
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index ccad386920..fc2a8a7af4 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -609,8 +609,7 @@ static void sifive_u_machine_init(MachineState *machine)
 }
 
 /* Compute the fdt load address in dram */
-fdt_load_addr = riscv_load_fdt(memmap[SIFIVE_U_DEV_DRAM].base,
-   machine->ram_size, machine->fdt);
+fdt_load_addr = riscv_load_fdt(machine, memmap[SIFIVE_U_DEV_DRAM].base);
 if (!riscv_is_32bit(&s->soc.u_cpus)) {
 start_addr_hi32 = (uint64_t)start_addr >> 32;
 }
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 91bf194ec1..82093dd2cb 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -316,8 +316,7 @@ static void spike_board_init(MachineState *machine)
 }
 
 /* Compute the fdt load address in dram */
-fdt_load_addr = riscv_load_fdt(memmap[SPIKE_DRAM].base,
-   machine->ram_size, machine->fdt);
+fdt_load_addr = riscv_load_fdt(machine, memmap[SPIKE_DRAM].base);
 
 /* load the reset vector */
 riscv_setup_rom_reset_vec(machine, &s->soc[0], memmap[SPIKE_DRAM].base,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index e374b58f89..0a0252368e 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1300,8 +1300,7 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
 }
 
 /* Compute the fdt load address in dram */
-fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base,
-   machine->ram_size, machine->fdt);
+fdt_load_addr = riscv_load_fdt(machine, memmap[VIRT_DRAM].base);
 /* load the reset vector */
 riscv_setup_rom_reset_vec(machine, &s->soc[0], start_addr,
   virt_memmap[VIRT_MROM].base,
diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
index cbd131bad7..3581bbe447 100644
--- a/include/hw/riscv/boot.h
+++ b/include/hw/riscv/boot.h
@@ -47,7 +47,7 @@ target_ulong riscv_load_kernel(MachineState *machine,
target_ulong firmware_end_addr,
bool load_initrd,
symbol_fn_t sym_cb);
-uint64_t riscv_load_fdt(hwaddr dram_start, uint64_t dram_size, void *fdt);
+uint64_t riscv_load_fdt(MachineState *ms, hwaddr dram_start);
 void riscv_setup_rom_reset_vec(MachineState *machine, RISCVHartArrayState 
*harts,
hwaddr saddr,
hwaddr rom_base, hwaddr rom_size,
-- 
2.39.0




[PATCH 03/10] hw/riscv/sifive_u.c: simplify create_fdt()

2023-01-11 Thread Daniel Henrique Barboza
'cmdline' isn't being used. Remove it.

A MachineState pointer is being retrieved via a MACHINE() macro calling
qdev_get_machine(). Use MACHINE(s) instead to avoid calling qdev().

 'mem_size' is being set as machine->ram_size by the caller. Retrieve it
via ms->ram_size.

Cc: Palmer Dabbelt 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/sifive_u.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 9a75d4aa62..ccad386920 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -94,9 +94,10 @@ static const MemMapEntry sifive_u_memmap[] = {
 #define GEM_REVISION0x10070109
 
 static void create_fdt(SiFiveUState *s, const MemMapEntry *memmap,
-   uint64_t mem_size, const char *cmdline, bool is_32_bit)
+   bool is_32_bit)
 {
-MachineState *ms = MACHINE(qdev_get_machine());
+MachineState *ms = MACHINE(s);
+uint64_t mem_size = ms->ram_size;
 void *fdt;
 int cpu, fdt_size;
 uint32_t *cells;
@@ -560,8 +561,7 @@ static void sifive_u_machine_init(MachineState *machine)
   qemu_allocate_irq(sifive_u_machine_reset, NULL, 0));
 
 /* create device tree */
-create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline,
-   riscv_is_32bit(&s->soc.u_cpus));
+create_fdt(s, memmap, riscv_is_32bit(&s->soc.u_cpus));
 
 if (s->start_in_flash) {
 /*
-- 
2.39.0




[PATCH 05/10] hw/riscv: use MachineState::fdt in riscv_socket_fdt_write_id()

2023-01-11 Thread Daniel Henrique Barboza
There's no need to use a MachineState pointer and a fdt pointer now that
all RISC-V machines are using the FDT from the MachineState.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/numa.c |  6 +++---
 hw/riscv/spike.c|  6 +++---
 hw/riscv/virt.c | 18 +-
 include/hw/riscv/numa.h |  6 +++---
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
index 7fe92d402f..f4343f5cde 100644
--- a/hw/riscv/numa.c
+++ b/hw/riscv/numa.c
@@ -156,11 +156,11 @@ uint64_t riscv_socket_mem_size(const MachineState *ms, 
int socket_id)
 ms->numa_state->nodes[socket_id].node_mem : 0;
 }
 
-void riscv_socket_fdt_write_id(const MachineState *ms, void *fdt,
-   const char *node_name, int socket_id)
+void riscv_socket_fdt_write_id(const MachineState *ms, const char *node_name,
+   int socket_id)
 {
 if (numa_enabled(ms)) {
-qemu_fdt_setprop_cell(fdt, node_name, "numa-node-id", socket_id);
+qemu_fdt_setprop_cell(ms->fdt, node_name, "numa-node-id", socket_id);
 }
 }
 
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 4a66016d69..05d34651cb 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -121,7 +121,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
 s->soc[socket].hartid_base + cpu);
 qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
-riscv_socket_fdt_write_id(mc, fdt, cpu_name, socket);
+riscv_socket_fdt_write_id(mc, cpu_name, socket);
 qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
 
 intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
@@ -154,7 +154,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 qemu_fdt_setprop_cells(fdt, mem_name, "reg",
 addr >> 32, addr, size >> 32, size);
 qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
-riscv_socket_fdt_write_id(mc, fdt, mem_name, socket);
+riscv_socket_fdt_write_id(mc, mem_name, socket);
 g_free(mem_name);
 
 clint_addr = memmap[SPIKE_CLINT].base +
@@ -167,7 +167,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 0x0, clint_addr, 0x0, memmap[SPIKE_CLINT].size);
 qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
 clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
-riscv_socket_fdt_write_id(mc, fdt, clint_name, socket);
+riscv_socket_fdt_write_id(mc, clint_name, socket);
 
 g_free(clint_name);
 g_free(clint_cells);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 99a0a43a73..1d3bd25cb5 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -253,7 +253,7 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 qemu_fdt_setprop_cell(mc->fdt, cpu_name, "reg",
 s->soc[socket].hartid_base + cpu);
 qemu_fdt_setprop_string(mc->fdt, cpu_name, "device_type", "cpu");
-riscv_socket_fdt_write_id(mc, mc->fdt, cpu_name, socket);
+riscv_socket_fdt_write_id(mc, cpu_name, socket);
 qemu_fdt_setprop_cell(mc->fdt, cpu_name, "phandle", cpu_phandle);
 
 intc_phandles[cpu] = (*phandle)++;
@@ -291,7 +291,7 @@ static void create_fdt_socket_memory(RISCVVirtState *s,
 qemu_fdt_setprop_cells(mc->fdt, mem_name, "reg",
 addr >> 32, addr, size >> 32, size);
 qemu_fdt_setprop_string(mc->fdt, mem_name, "device_type", "memory");
-riscv_socket_fdt_write_id(mc, mc->fdt, mem_name, socket);
+riscv_socket_fdt_write_id(mc, mem_name, socket);
 g_free(mem_name);
 }
 
@@ -327,7 +327,7 @@ static void create_fdt_socket_clint(RISCVVirtState *s,
 0x0, clint_addr, 0x0, memmap[VIRT_CLINT].size);
 qemu_fdt_setprop(mc->fdt, clint_name, "interrupts-extended",
 clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
-riscv_socket_fdt_write_id(mc, mc->fdt, clint_name, socket);
+riscv_socket_fdt_write_id(mc, clint_name, socket);
 g_free(clint_name);
 
 g_free(clint_cells);
@@ -372,7 +372,7 @@ static void create_fdt_socket_aclint(RISCVVirtState *s,
 aclint_mswi_cells, aclint_cells_size);
 qemu_fdt_setprop(mc->fdt, name, "interrupt-controller", NULL, 0);
 qemu_fdt_setprop_cell(mc->fdt, name, "#interrupt-cells", 0);
-riscv_socket_fdt_write_id(mc, mc->fdt, name, socket);
+riscv_socket_fdt_write_id(mc, name, socket);
 g_free(name);
 }
 
@@ -396,7 +396,7 @@ static void create_fdt_socket_aclint(RISCVVirtState *s,
 0x0, RISCV_ACLINT_DEFAULT_MTIME);
 qemu_fdt_setprop(mc->fdt, name, "interrupts-extended",
 aclint_mtimer_cells, aclint_cells_size);
-riscv_socket_fdt_write_id(mc, mc->fdt, name, socket);
+riscv_socket_fdt_write_id(mc, name, socket

Re: [PATCH v14 03/11] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-01-11 Thread Nina Schoetterl-Glausch
On Tue, 2023-01-10 at 15:29 +0100, Thomas Huth wrote:
> On 05/01/2023 15.53, Pierre Morel wrote:
> > On interception of STSI(15.1.x) the System Information Block
> > (SYSIB) is built from the list of pre-ordered topology entries.
> > 
> > Signed-off-by: Pierre Morel 
> > ---
> 
[...]

> > +void insert_stsi_15_1_x(S390CPU *cpu, int sel2, __u64 addr, uint8_t ar)
> > +{
> > +union {
> > +char place_holder[S390_TOPOLOGY_SYSIB_SIZE];
> > +SysIB_151x sysib;
> > +} buffer QEMU_ALIGNED(8) = {};
> > +int len;
> > +
> > +if (!s390_has_topology() || sel2 < 2 || sel2 > 
> > SCLP_READ_SCP_INFO_MNEST) {
> > +setcc(cpu, 3);
> > +return;
> > +}
> > +
> > +len = setup_stsi(cpu, &buffer.sysib, sel2);
> > +
> > +if (len > 4096) {
> 
> Maybe use TARGET_PAGE_SIZE instead of 4096 ?

sizeof(SysIB) would be preferable IMO.
> 
> > +setcc(cpu, 3);
> > +return;
> > +}



[PATCH 04/10] hw/riscv/virt.c: remove 'is_32_bit' param from create_fdt_socket_cpus()

2023-01-11 Thread Daniel Henrique Barboza
create_fdt_socket_cpus() writes a different 'mmu-type' value if we're
running in 32 or 64 bits. However, the flag is being calculated during
virt_machine_init(), and is passed around in create_fdt(), then
create_fdt_socket(), and then finally create_fdt_socket_cpus(). None of
the intermediate functions are using the flag, which is a bit
misleading.

Remove 'is_32_bit' flag from create_fdt_socket_cpus() and calculate it
using the already available RISCVVirtState pointer. This will also
change the signature of create_fdt_socket() and create_fdt(), making it
clear that these functions don't do anything special when we're running
in 32 bit mode.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/virt.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 89c99ec1af..99a0a43a73 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -223,12 +223,13 @@ static void create_pcie_irq_map(RISCVVirtState *s, void 
*fdt, char *nodename,
 
 static void create_fdt_socket_cpus(RISCVVirtState *s, int socket,
char *clust_name, uint32_t *phandle,
-   bool is_32_bit, uint32_t *intc_phandles)
+   uint32_t *intc_phandles)
 {
 int cpu;
 uint32_t cpu_phandle;
 MachineState *mc = MACHINE(s);
 char *name, *cpu_name, *core_name, *intc_name;
+bool is_32_bit = riscv_is_32bit(&s->soc[0]);
 
 for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
 cpu_phandle = (*phandle)++;
@@ -721,7 +722,7 @@ static void create_fdt_pmu(RISCVVirtState *s)
 }
 
 static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry *memmap,
-   bool is_32_bit, uint32_t *phandle,
+   uint32_t *phandle,
uint32_t *irq_mmio_phandle,
uint32_t *irq_pcie_phandle,
uint32_t *irq_virtio_phandle,
@@ -750,7 +751,7 @@ static void create_fdt_sockets(RISCVVirtState *s, const 
MemMapEntry *memmap,
 qemu_fdt_add_subnode(mc->fdt, clust_name);
 
 create_fdt_socket_cpus(s, socket, clust_name, phandle,
-is_32_bit, &intc_phandles[phandle_pos]);
+   &intc_phandles[phandle_pos]);
 
 create_fdt_socket_memory(s, memmap, socket);
 
@@ -998,8 +999,7 @@ static void create_fdt_fw_cfg(RISCVVirtState *s, const 
MemMapEntry *memmap)
 g_free(nodename);
 }
 
-static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap,
-   bool is_32_bit)
+static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap)
 {
 MachineState *mc = MACHINE(s);
 uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
@@ -1031,9 +1031,9 @@ static void create_fdt(RISCVVirtState *s, const 
MemMapEntry *memmap,
 qemu_fdt_setprop_cell(mc->fdt, "/soc", "#size-cells", 0x2);
 qemu_fdt_setprop_cell(mc->fdt, "/soc", "#address-cells", 0x2);
 
-create_fdt_sockets(s, memmap, is_32_bit, &phandle,
-&irq_mmio_phandle, &irq_pcie_phandle, &irq_virtio_phandle,
-&msi_pcie_phandle);
+create_fdt_sockets(s, memmap, &phandle, &irq_mmio_phandle,
+   &irq_pcie_phandle, &irq_virtio_phandle,
+   &msi_pcie_phandle);
 
 create_fdt_virtio(s, memmap, irq_virtio_phandle);
 
@@ -1499,7 +1499,7 @@ static void virt_machine_init(MachineState *machine)
 virt_flash_map(s, system_memory);
 
 /* create device tree */
-create_fdt(s, memmap, riscv_is_32bit(&s->soc[0]));
+create_fdt(s, memmap);
 
 s->machine_done.notify = virt_machine_done;
 qemu_add_machine_init_done_notifier(&s->machine_done);
-- 
2.39.0




[PATCH 02/10] hw/riscv/virt.c: simplify create_fdt()

2023-01-11 Thread Daniel Henrique Barboza
'mem_size' and 'cmdline' aren't being used. Remove them.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/virt.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index a931ed05ab..89c99ec1af 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -999,7 +999,7 @@ static void create_fdt_fw_cfg(RISCVVirtState *s, const 
MemMapEntry *memmap)
 }
 
 static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap,
-   uint64_t mem_size, const char *cmdline, bool is_32_bit)
+   bool is_32_bit)
 {
 MachineState *mc = MACHINE(s);
 uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
@@ -1499,8 +1499,7 @@ static void virt_machine_init(MachineState *machine)
 virt_flash_map(s, system_memory);
 
 /* create device tree */
-create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline,
-   riscv_is_32bit(&s->soc[0]));
+create_fdt(s, memmap, riscv_is_32bit(&s->soc[0]));
 
 s->machine_done.notify = virt_machine_done;
 qemu_add_machine_init_done_notifier(&s->machine_done);
-- 
2.39.0




[PATCH 09/10] hw/riscv/virt.c: rename MachineState 'mc' pointers to 'ms'

2023-01-11 Thread Daniel Henrique Barboza
We have a convention in other QEMU boards/archs to name MachineState
pointers as either 'machine' or 'ms'. MachineClass pointers are usually
called 'mc'.

The 'virt' RISC-V machine has a lot of instances where MachineState
pointers are named 'mc'. There is nothing wrong with that, but we gain
more compatibility with the rest of the QEMU code base, and easier
reviews, if we follow QEMU conventions.

Rename all 'mc' MachineState pointers to 'ms'. This is a very tedious
and mechanical patch that was produced by doing the following:

- find/replace all 'MachineState *mc' to 'MachineState *ms';
- find/replace all 'mc->fdt' to 'ms->fdt';
- find/replace all 'mc->smp.cpus' to 'ms->smp.cpus';
- replace any remaining occurrences of 'mc' that the compiler complained
about.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/virt.c | 434 
 1 file changed, 217 insertions(+), 217 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index f9bdf2a70b..3b73666d2a 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -227,7 +227,7 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 {
 int cpu;
 uint32_t cpu_phandle;
-MachineState *mc = MACHINE(s);
+MachineState *ms = MACHINE(s);
 char *name, *cpu_name, *core_name, *intc_name;
 bool is_32_bit = riscv_is_32bit(&s->soc[0]);
 
@@ -236,40 +236,40 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 
 cpu_name = g_strdup_printf("/cpus/cpu@%d",
 s->soc[socket].hartid_base + cpu);
-qemu_fdt_add_subnode(mc->fdt, cpu_name);
+qemu_fdt_add_subnode(ms->fdt, cpu_name);
 if (riscv_feature(&s->soc[socket].harts[cpu].env,
   RISCV_FEATURE_MMU)) {
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "mmu-type",
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type",
 (is_32_bit) ? "riscv,sv32" : "riscv,sv48");
 } else {
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "mmu-type",
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type",
 "riscv,none");
 }
 name = riscv_isa_string(&s->soc[socket].harts[cpu]);
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "riscv,isa", name);
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "riscv,isa", name);
 g_free(name);
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "compatible", "riscv");
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "status", "okay");
-qemu_fdt_setprop_cell(mc->fdt, cpu_name, "reg",
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "compatible", "riscv");
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "status", "okay");
+qemu_fdt_setprop_cell(ms->fdt, cpu_name, "reg",
 s->soc[socket].hartid_base + cpu);
-qemu_fdt_setprop_string(mc->fdt, cpu_name, "device_type", "cpu");
-riscv_socket_fdt_write_id(mc, cpu_name, socket);
-qemu_fdt_setprop_cell(mc->fdt, cpu_name, "phandle", cpu_phandle);
+qemu_fdt_setprop_string(ms->fdt, cpu_name, "device_type", "cpu");
+riscv_socket_fdt_write_id(ms, cpu_name, socket);
+qemu_fdt_setprop_cell(ms->fdt, cpu_name, "phandle", cpu_phandle);
 
 intc_phandles[cpu] = (*phandle)++;
 
 intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
-qemu_fdt_add_subnode(mc->fdt, intc_name);
-qemu_fdt_setprop_cell(mc->fdt, intc_name, "phandle",
+qemu_fdt_add_subnode(ms->fdt, intc_name);
+qemu_fdt_setprop_cell(ms->fdt, intc_name, "phandle",
 intc_phandles[cpu]);
-qemu_fdt_setprop_string(mc->fdt, intc_name, "compatible",
+qemu_fdt_setprop_string(ms->fdt, intc_name, "compatible",
 "riscv,cpu-intc");
-qemu_fdt_setprop(mc->fdt, intc_name, "interrupt-controller", NULL, 0);
-qemu_fdt_setprop_cell(mc->fdt, intc_name, "#interrupt-cells", 1);
+qemu_fdt_setprop(ms->fdt, intc_name, "interrupt-controller", NULL, 0);
+qemu_fdt_setprop_cell(ms->fdt, intc_name, "#interrupt-cells", 1);
 
 core_name = g_strdup_printf("%s/core%d", clust_name, cpu);
-qemu_fdt_add_subnode(mc->fdt, core_name);
-qemu_fdt_setprop_cell(mc->fdt, core_name, "cpu", cpu_phandle);
+qemu_fdt_add_subnode(ms->fdt, core_name);
+qemu_fdt_setprop_cell(ms->fdt, core_name, "cpu", cpu_phandle);
 
 g_free(core_name);
 g_free(intc_name);
@@ -282,16 +282,16 @@ static void create_fdt_socket_memory(RISCVVirtState *s,
 {
 char *mem_name;
 uint64_t addr, size;
-MachineState *mc = MACHINE(s);
+MachineState *ms = MACHINE(s);
 
-addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(mc, socket);
-size = riscv_socket_mem_size(mc, socket);
+addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(ms, socket);
+size = riscv_s

[PATCH 10/10] hw/riscv/spike.c: rename MachineState 'mc' pointers to' ms'

2023-01-11 Thread Daniel Henrique Barboza
Follow the QEMU convention of naming MachineState pointers as 'ms' by
renaming the instances where we're calling it 'mc'.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/spike.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 82093dd2cb..d753fb1f31 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -56,7 +56,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 uint64_t addr, size;
 unsigned long clint_addr;
 int cpu, socket;
-MachineState *mc = MACHINE(s);
+MachineState *ms = MACHINE(s);
 uint32_t *clint_cells;
 uint32_t cpu_phandle, intc_phandle, phandle = 1;
 char *name, *mem_name, *clint_name, *clust_name;
@@ -65,7 +65,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 "sifive,clint0", "riscv,clint0"
 };
 
-fdt = mc->fdt = create_device_tree(&fdt_size);
+fdt = ms->fdt = create_device_tree(&fdt_size);
 if (!fdt) {
 error_report("create_device_tree() failed");
 exit(1);
@@ -96,7 +96,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
 qemu_fdt_add_subnode(fdt, "/cpus/cpu-map");
 
-for (socket = (riscv_socket_count(mc) - 1); socket >= 0; socket--) {
+for (socket = (riscv_socket_count(ms) - 1); socket >= 0; socket--) {
 clust_name = g_strdup_printf("/cpus/cpu-map/cluster%d", socket);
 qemu_fdt_add_subnode(fdt, clust_name);
 
@@ -121,7 +121,7 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 qemu_fdt_setprop_cell(fdt, cpu_name, "reg",
 s->soc[socket].hartid_base + cpu);
 qemu_fdt_setprop_string(fdt, cpu_name, "device_type", "cpu");
-riscv_socket_fdt_write_id(mc, cpu_name, socket);
+riscv_socket_fdt_write_id(ms, cpu_name, socket);
 qemu_fdt_setprop_cell(fdt, cpu_name, "phandle", cpu_phandle);
 
 intc_name = g_strdup_printf("%s/interrupt-controller", cpu_name);
@@ -147,14 +147,14 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 g_free(cpu_name);
 }
 
-addr = memmap[SPIKE_DRAM].base + riscv_socket_mem_offset(mc, socket);
-size = riscv_socket_mem_size(mc, socket);
+addr = memmap[SPIKE_DRAM].base + riscv_socket_mem_offset(ms, socket);
+size = riscv_socket_mem_size(ms, socket);
 mem_name = g_strdup_printf("/memory@%lx", (long)addr);
 qemu_fdt_add_subnode(fdt, mem_name);
 qemu_fdt_setprop_cells(fdt, mem_name, "reg",
 addr >> 32, addr, size >> 32, size);
 qemu_fdt_setprop_string(fdt, mem_name, "device_type", "memory");
-riscv_socket_fdt_write_id(mc, mem_name, socket);
+riscv_socket_fdt_write_id(ms, mem_name, socket);
 g_free(mem_name);
 
 clint_addr = memmap[SPIKE_CLINT].base +
@@ -167,14 +167,14 @@ static void create_fdt(SpikeState *s, const MemMapEntry 
*memmap,
 0x0, clint_addr, 0x0, memmap[SPIKE_CLINT].size);
 qemu_fdt_setprop(fdt, clint_name, "interrupts-extended",
 clint_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 4);
-riscv_socket_fdt_write_id(mc, clint_name, socket);
+riscv_socket_fdt_write_id(ms, clint_name, socket);
 
 g_free(clint_name);
 g_free(clint_cells);
 g_free(clust_name);
 }
 
-riscv_socket_fdt_write_distance_matrix(mc);
+riscv_socket_fdt_write_distance_matrix(ms);
 
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", "/htif");
-- 
2.39.0




[PATCH 08/10] hw/riscv/virt.c: calculate socket count once in create_fdt_imsic()

2023-01-11 Thread Daniel Henrique Barboza
riscv_socket_count() returns either ms->numa_state->num_nodes or 1
depending on NUMA support. In any case the value can be retrieved only
once and used in the rest of the function.

This will also alleviate the rename we're going to do next by reducing
the instances of MachineState 'mc' inside hw/riscv/virt.c.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/virt.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 0a0252368e..f9bdf2a70b 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -505,13 +505,14 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
MemMapEntry *memmap,
 int cpu, socket;
 char *imsic_name;
 MachineState *mc = MACHINE(s);
+int socket_count = riscv_socket_count(mc);
 uint32_t imsic_max_hart_per_socket, imsic_guest_bits;
 uint32_t *imsic_cells, *imsic_regs, imsic_addr, imsic_size;
 
 *msi_m_phandle = (*phandle)++;
 *msi_s_phandle = (*phandle)++;
 imsic_cells = g_new0(uint32_t, mc->smp.cpus * 2);
-imsic_regs = g_new0(uint32_t, riscv_socket_count(mc) * 4);
+imsic_regs = g_new0(uint32_t, socket_count * 4);
 
 /* M-level IMSIC node */
 for (cpu = 0; cpu < mc->smp.cpus; cpu++) {
@@ -519,7 +520,7 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
MemMapEntry *memmap,
 imsic_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_M_EXT);
 }
 imsic_max_hart_per_socket = 0;
-for (socket = 0; socket < riscv_socket_count(mc); socket++) {
+for (socket = 0; socket < socket_count; socket++) {
 imsic_addr = memmap[VIRT_IMSIC_M].base +
  socket * VIRT_IMSIC_GROUP_MAX_SIZE;
 imsic_size = IMSIC_HART_SIZE(0) * s->soc[socket].num_harts;
@@ -545,14 +546,14 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
MemMapEntry *memmap,
 qemu_fdt_setprop(mc->fdt, imsic_name, "interrupts-extended",
 imsic_cells, mc->smp.cpus * sizeof(uint32_t) * 2);
 qemu_fdt_setprop(mc->fdt, imsic_name, "reg", imsic_regs,
-riscv_socket_count(mc) * sizeof(uint32_t) * 4);
+socket_count * sizeof(uint32_t) * 4);
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,num-ids",
 VIRT_IRQCHIP_NUM_MSIS);
-if (riscv_socket_count(mc) > 1) {
+if (socket_count > 1) {
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,hart-index-bits",
 imsic_num_bits(imsic_max_hart_per_socket));
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,group-index-bits",
-imsic_num_bits(riscv_socket_count(mc)));
+imsic_num_bits(socket_count));
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,group-index-shift",
 IMSIC_MMIO_GROUP_MIN_SHIFT);
 }
@@ -567,7 +568,7 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
MemMapEntry *memmap,
 }
 imsic_guest_bits = imsic_num_bits(s->aia_guests + 1);
 imsic_max_hart_per_socket = 0;
-for (socket = 0; socket < riscv_socket_count(mc); socket++) {
+for (socket = 0; socket < socket_count; socket++) {
 imsic_addr = memmap[VIRT_IMSIC_S].base +
  socket * VIRT_IMSIC_GROUP_MAX_SIZE;
 imsic_size = IMSIC_HART_SIZE(imsic_guest_bits) *
@@ -594,18 +595,18 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
MemMapEntry *memmap,
 qemu_fdt_setprop(mc->fdt, imsic_name, "interrupts-extended",
 imsic_cells, mc->smp.cpus * sizeof(uint32_t) * 2);
 qemu_fdt_setprop(mc->fdt, imsic_name, "reg", imsic_regs,
-riscv_socket_count(mc) * sizeof(uint32_t) * 4);
+socket_count * sizeof(uint32_t) * 4);
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,num-ids",
 VIRT_IRQCHIP_NUM_MSIS);
 if (imsic_guest_bits) {
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,guest-index-bits",
 imsic_guest_bits);
 }
-if (riscv_socket_count(mc) > 1) {
+if (socket_count > 1) {
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,hart-index-bits",
 imsic_num_bits(imsic_max_hart_per_socket));
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,group-index-bits",
-imsic_num_bits(riscv_socket_count(mc)));
+imsic_num_bits(socket_count));
 qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,group-index-shift",
 IMSIC_MMIO_GROUP_MIN_SHIFT);
 }
@@ -733,6 +734,7 @@ static void create_fdt_sockets(RISCVVirtState *s, const 
MemMapEntry *memmap,
 MachineState *mc = MACHINE(s);
 uint32_t msi_m_phandle = 0, msi_s_phandle = 0;
 uint32_t *intc_phandles, xplic_phandles[MAX_NODES];
+int socket_count = riscv_socket_count(mc);
 
 qemu_fdt_add_subnode(mc->fdt, "/cpus");
 qemu_fdt_setprop_cell(mc->fdt, "/cpus", "timebase-frequency",
@@ -744,7 +746,7 @@ static void create_fdt_sockets(RISCVVirtState *s, const 
MemMapEntry *memmap,
 intc_phandles = g_new0(uint32_t, mc->smp.cpus);
 
 phandle_pos = mc->smp.cpus;
-for (socke

[PATCH 01/10] hw/riscv/spike.c: simplify create_fdt()

2023-01-11 Thread Daniel Henrique Barboza
'mem_size' and 'cmdline' are unused.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/spike.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index c517885e6e..4a66016d69 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -49,7 +49,6 @@ static const MemMapEntry spike_memmap[] = {
 };
 
 static void create_fdt(SpikeState *s, const MemMapEntry *memmap,
-   uint64_t mem_size, const char *cmdline,
bool is_32_bit, bool htif_custom_base)
 {
 void *fdt;
@@ -299,8 +298,7 @@ static void spike_board_init(MachineState *machine)
 }
 
 /* Create device tree */
-create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline,
-   riscv_is_32bit(&s->soc[0]), htif_custom_base);
+create_fdt(s, memmap, riscv_is_32bit(&s->soc[0]), htif_custom_base);
 
 /* Load kernel */
 if (machine->kernel_filename) {
-- 
2.39.0




Re: [PATCH v6 25/33] hw/isa/piix4: Use TYPE_ISA_PIC device

2023-01-11 Thread Philippe Mathieu-Daudé

On 9/1/23 18:23, Bernhard Beschow wrote:

Aligns the code with PIIX3 such that PIIXState can be used in PIIX4,
too.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Michael S. Tsirkin 
Message-Id: <20221022150508.26830-33-shen...@gmail.com>
---
  hw/isa/piix4.c  | 28 ++--
  hw/mips/malta.c | 11 +--
  hw/mips/Kconfig |  1 +
  3 files changed, 20 insertions(+), 20 deletions(-)



@@ -1459,7 +1461,12 @@ void mips_malta_init(MachineState *machine)
  pci_ide_create_devs(PCI_DEVICE(dev));
  
  /* Interrupt controller */

-qdev_connect_gpio_out_named(DEVICE(piix4), "intr", 0, i8259_irq);
+dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pic"));
+i8259 = i8259_init(isa_bus, i8259_irq);
+for (i = 0; i < ISA_NUM_IRQS; i++) {
+qdev_connect_gpio_out(dev, i, i8259[i]);
+}
+g_free(i8259);
  
  /* generate SPD EEPROM data */

  dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pm"));
diff --git a/hw/mips/Kconfig b/hw/mips/Kconfig
index 4e7042f03d..d156de812c 100644
--- a/hw/mips/Kconfig
+++ b/hw/mips/Kconfig
@@ -1,5 +1,6 @@
  config MALTA
  bool
+select I8259
  select ISA_SUPERIO
  select PIIX4


The PIIX4 owns / exposes the I8259, so we don't need to select it here.
The Malta board only initializes it. Why did you have to add this?



[PATCH 00/10] riscv: create_fdt() related cleanups

2023-01-11 Thread Daniel Henrique Barboza
Hi,

This is a follow-up of:

"[PATCH v5 00/11] riscv: OpenSBI boot test and cleanups"

Patches were based on top of riscv-to-apply.next [1] + the series above.

The recent FDT changes made in hw/riscv (all machines are now using the
FDT via MachineState::fdt) allowed for most of the cleanups made here.

Patches 9 and 10 were based on a suggestion made by Phil a few weeks ago.
I decided to go for it.

[1] https://github.com/alistair23/qemu/tree/riscv-to-apply.next

Daniel Henrique Barboza (10):
  hw/riscv/spike.c: simplify create_fdt()
  hw/riscv/virt.c: simplify create_fdt()
  hw/riscv/sifive_u.c: simplify create_fdt()
  hw/riscv/virt.c: remove 'is_32_bit' param from
create_fdt_socket_cpus()
  hw/riscv: use MachineState::fdt in riscv_socket_fdt_write_id()
  hw/riscv: use ms->fdt in riscv_socket_fdt_write_distance_matrix()
  hw/riscv: simplify riscv_load_fdt()
  hw/riscv/virt.c: calculate socket count once in create_fdt_imsic()
  hw/riscv/virt.c: rename MachineState 'mc' pointers to 'ms'
  hw/riscv/spike.c: rename MachineState 'mc' pointers to' ms'

 hw/riscv/boot.c|   4 +-
 hw/riscv/microchip_pfsoc.c |   4 +-
 hw/riscv/numa.c|  14 +-
 hw/riscv/sifive_u.c|  11 +-
 hw/riscv/spike.c   |  25 +-
 hw/riscv/virt.c| 484 +++--
 include/hw/riscv/boot.h|   2 +-
 include/hw/riscv/numa.h|  10 +-
 8 files changed, 277 insertions(+), 277 deletions(-)

-- 
2.39.0




Re: [PATCH 6/8] qemu/bswap: Add const_le64()

2023-01-11 Thread Jonathan Cameron via
On Wed, 11 Jan 2023 17:40:46 +0100
Philippe Mathieu-Daudé  wrote:

> On 11/1/23 15:24, Jonathan Cameron via wrote:
> > From: Ira Weiny 
> > 
> > Gcc requires constant versions of cpu_to_le* calls.
> > 
> > Add a 64 bit version.
> > 
> > Reviewed-by: Peter Maydell 
> > Signed-off-by: Ira Weiny 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >   include/qemu/bswap.h | 10 ++
> >   1 file changed, 10 insertions(+)
> > 
> > diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> > index 346d05f2aa..e1eca22f25 100644
> > --- a/include/qemu/bswap.h
> > +++ b/include/qemu/bswap.h
> > @@ -187,6 +187,15 @@ CPU_CONVERT(le, 64, uint64_t)
> >* used to initialize static variables.
> >*/
> >   #if HOST_BIG_ENDIAN
> > +# define const_le64(_x)  \
> > +_x) & 0x00ffU) << 56) |  \
> > + (((_x) & 0xff00U) << 40) |  \
> > + (((_x) & 0x00ffU) << 24) |  \
> > + (((_x) & 0xff00U) <<  8) |  \
> > + (((_x) & 0x00ffU) >>  8) |  \
> > + (((_x) & 0xff00U) >> 24) |  \
> > + (((_x) & 0x00ffU) >> 40) |  \
> > + (((_x) & 0xff00U) >> 56))  
> 
> So looking back at 
> https://lore.kernel.org/qemu-devel/20200917163106.49351-4-phi...@redhat.com/
> this patch missed to update the function description:
> 
>   /*
> - * Same as cpu_to_{be,le}{16,32} described below, except that gcc will
> + * Same as cpu_to_{be,le}{16,32,64} described below, except that gcc will
>* figure the result is a compile-time constant if you pass in a constant.
>* So this can be used to initialize static variables.
>*/

Good point.  The context is a little difference as your series had combined
several comments into one, but I've put in a similar update.

Obviously if your series lands first we can drop this one, but I'll carry
it forwards for now.

Thanks,

Jonathan

> 
> 




Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)

2023-01-11 Thread David Hildenbrand

On 11.01.23 17:35, Peter Xu wrote:

On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:

On 10.01.23 21:03, Peter Xu wrote:

On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:

The following seems to work,


That looks much better at least from the diffstat pov (comparing to the
existing patch 1+5 and the framework changes), thanks.


but makes analyze-migration.py angry:

$ ./scripts/analyze-migration.py -f STATEFILE
Traceback (most recent call last):
File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in 

  dump.read(dump_memory = args.memory)
File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in 
read
  classdesc = self.section_classes[section_key]
  ^
KeyError: (':00:03.0/virtio-mem-early', 0)


We need the vmdesc to create info for the device.


Migration may ignore the save entry if save_state() not provided in the
"devices" section:

  if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
  continue;
  }

Could you try providing a shim save_state() for the new virtio-mem save
entry?

/*
   * Shim function to make sure the save entry will be dumped into "devices"
   * section, to make analyze-migration.py happy.
   */
static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
{
}

Then:

static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
  .save_setup = virtio_mem_save_setup_early,
  .save_state = virtio_mem_save_state_early,
  .load_state = virtio_mem_load_state_early,
};

I'm not 100% sure it'll work yet, but maybe worth trying.


It doesn't. virtio_mem_load_state_early() will get called twice (once with
state saved during save_setup() and once with effectively nothing during
save_state(), which breaks the whole migration).


Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
with load_setup(), which I just noticed..  Then the load_state() needs to
be another shim like save_state().


Let me see if that improves the situation. E.g., ram.c writes state in 
save_setup() but doesn't load any state in load_setup() -- it's all done 
in load_state().


... no, still not working. It will read garbage during load_setup() now.

qemu-system-x86_64: Property 'memaddr' changed from 0x10002037261 to 
0x14000

qemu-system-x86_64: Failed to load virtio-mem-device-early:tmp
qemu-system-x86_64: Load state of device :00:03.0/virtio-mem-early 
failed

qemu-system-x86_64: load of migration failed: Invalid argument


I don't think that load_setup() is supposed to consume anything useful 
from the migration stream. I suspects it should actually not even 
consume a QEMU file right now, because they way it's used is just for 
initializing stuff.


qemu_loadvm_state_main() does the actual loading part, parsing sections 
etc. qemu_loadvm_state_setup() doesn't do any of that.


AFAIKS, at least qemu_loadvm_state_setup() would have to parse sections 
and the save_setup() users would have to be converted into using 
load_setup() as well. Not sure if more is missing.


Even with that, I doubt that it would make analyze-migration.py work, 
because what we store is something different than what we record in the 
vmdesc.






vmdesc handling is also wrong, because analyze-migration.py gets confused
because it receives data stored during save_setup() but vmdesc created
during save_state() was told that there would be "nothing" to interpret ...

$ ./scripts/analyze-migration.py -f STATEFILE
{
 "ram (2)": {
 "section sizes": {
 ":00:03.0/mem0": "0x000f",
 "pc.ram": "0x0001",
 "/rom@etc/acpi/tables": "0x0002",
 "pc.bios": "0x0004",
 ":00:02.0/e1000.rom": "0x0004",
 "pc.rom": "0x0002",
 "/rom@etc/table-loader": "0x1000",
 "/rom@etc/acpi/rsdp": "0x1000"
 }
 },
 ":00:03.0/virtio-mem-early (51)": {
 "data": ""
 }
}


Yeah this is expected, because the whole data stream within the setup phase
is a black box and not described by anything.  "ram" can display correctly
only because it's hard coded in the python script, I think.  The trick
above can only make the script not break but not teaching the script to
also check for the early vmsd.


Note that the issue here is that the scripts stops the output after the 
virtio-mem device. So I'm not complaining about the "data": "", but 
about the vmstate according to the vmdesc not having any other devices :)




But that's one step forward and IMHO it's fine for special devices. For
example, even with your initial patch, I think the static analyzer (aka,
qemu -dump-vmstate) will also ignore your new vmsd anyway because it's not
directly bound to the DeviceState* but registered separately.  So no ideal
solution s

Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread David Woodhouse
On Wed, 2023-01-11 at 11:25 -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 11, 2023 at 02:41:58PM +, David Woodhouse wrote:
> > This allows drivers to register a callback on a qemu_irq, which is
> > invoked when a level-triggered IRQ is acked on the irqchip.
> > 
> > This allows us to simulate level-triggered interrupts more efficiently,
> > by resampling the state of the interrupt only when it actually matters.
> 
> I think we tried this with vfio and had to give up on this.
> I don't remember the details though. Alex probably does?

Hm, not sure why there would be any insurmountable problems.

I've seen this working at scale in a different VMM with split-irqchip
and PCI INTX + Xen event channel support.

And it's what the kernel does internally, and exposes through its dual-
eventfd APIs in both KVM IRQ routing and VFIO interrupts.

That said, I don't care *that* much. I can live with the way I've done
it for the Xen support, by polling the status on a vCPU0 vmexit.
Looking at the VFIO code made me throw up in my mouth a little bit, but
I just won't do that again... :)


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 6/8] qemu/bswap: Add const_le64()

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Ira Weiny 

Gcc requires constant versions of cpu_to_le* calls.

Add a 64 bit version.

Reviewed-by: Peter Maydell 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
  include/qemu/bswap.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 346d05f2aa..e1eca22f25 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -187,6 +187,15 @@ CPU_CONVERT(le, 64, uint64_t)
   * used to initialize static variables.
   */
  #if HOST_BIG_ENDIAN
+# define const_le64(_x)  \
+_x) & 0x00ffU) << 56) |  \
+ (((_x) & 0xff00U) << 40) |  \
+ (((_x) & 0x00ffU) << 24) |  \
+ (((_x) & 0xff00U) <<  8) |  \
+ (((_x) & 0x00ffU) >>  8) |  \
+ (((_x) & 0xff00U) >> 24) |  \
+ (((_x) & 0x00ffU) >> 40) |  \
+ (((_x) & 0xff00U) >> 56))


So looking back at 
https://lore.kernel.org/qemu-devel/20200917163106.49351-4-phi...@redhat.com/

this patch missed to update the function description:

 /*
- * Same as cpu_to_{be,le}{16,32} described below, except that gcc will
+ * Same as cpu_to_{be,le}{16,32,64} described below, except that gcc will
  * figure the result is a compile-time constant if you pass in a constant.
  * So this can be used to initialize static variables.
  */





Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)

2023-01-11 Thread Peter Xu
On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:
> On 10.01.23 21:03, Peter Xu wrote:
> > On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
> > > The following seems to work,
> > 
> > That looks much better at least from the diffstat pov (comparing to the
> > existing patch 1+5 and the framework changes), thanks.
> > 
> > > but makes analyze-migration.py angry:
> > > 
> > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > Traceback (most recent call last):
> > >File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 
> > > 605, in 
> > >  dump.read(dump_memory = args.memory)
> > >File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 
> > > 539, in read
> > >  classdesc = self.section_classes[section_key]
> > >  ^
> > > KeyError: (':00:03.0/virtio-mem-early', 0)
> > > 
> > > 
> > > We need the vmdesc to create info for the device.
> > 
> > Migration may ignore the save entry if save_state() not provided in the
> > "devices" section:
> > 
> >  if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> >  continue;
> >  }
> > 
> > Could you try providing a shim save_state() for the new virtio-mem save
> > entry?
> > 
> > /*
> >   * Shim function to make sure the save entry will be dumped into "devices"
> >   * section, to make analyze-migration.py happy.
> >   */
> > static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
> > {
> > }
> > 
> > Then:
> > 
> > static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
> >  .save_setup = virtio_mem_save_setup_early,
> >  .save_state = virtio_mem_save_state_early,
> >  .load_state = virtio_mem_load_state_early,
> > };
> > 
> > I'm not 100% sure it'll work yet, but maybe worth trying.
> 
> It doesn't. virtio_mem_load_state_early() will get called twice (once with
> state saved during save_setup() and once with effectively nothing during
> save_state(), which breaks the whole migration).

Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
with load_setup(), which I just noticed..  Then the load_state() needs to
be another shim like save_state().

> 
> vmdesc handling is also wrong, because analyze-migration.py gets confused
> because it receives data stored during save_setup() but vmdesc created
> during save_state() was told that there would be "nothing" to interpret ...
> 
> $ ./scripts/analyze-migration.py -f STATEFILE
> {
> "ram (2)": {
> "section sizes": {
> ":00:03.0/mem0": "0x000f",
> "pc.ram": "0x0001",
> "/rom@etc/acpi/tables": "0x0002",
> "pc.bios": "0x0004",
> ":00:02.0/e1000.rom": "0x0004",
> "pc.rom": "0x0002",
> "/rom@etc/table-loader": "0x1000",
> "/rom@etc/acpi/rsdp": "0x1000"
> }
> },
> ":00:03.0/virtio-mem-early (51)": {
> "data": ""
> }
> }

Yeah this is expected, because the whole data stream within the setup phase
is a black box and not described by anything.  "ram" can display correctly
only because it's hard coded in the python script, I think.  The trick
above can only make the script not break but not teaching the script to
also check for the early vmsd.

But that's one step forward and IMHO it's fine for special devices. For
example, even with your initial patch, I think the static analyzer (aka,
qemu -dump-vmstate) will also ignore your new vmsd anyway because it's not
directly bound to the DeviceState* but registered separately.  So no ideal
solution so far, afaict, without more work done on this aspect.

> 
> 
> Not sure if the whole thing becomes nicer when manually looking up the
> vmdesc ... because filling it will also requires manually storing the
> se->idstr and the se->instance_id, whereby both are migration-internal and
> not available inside save_setup().
> 
> 
> Hm, I really prefer something like the simplified version that let's
> migration core deal with vmstate to be migrated during save_setup() phase.
> We could avoid the vmstate->immutable flag and simply use a separate
> function for registering the vmstate, like:
> 
> vmstate_register_immutable()
> vmstate_register_early()
> ...

I agree, this looks useful.  It's just that the devices that need this will
be rare, and unfortunately some of them already implemented the stream in
other ways so maybe non-trivial to convert them.

Not against it if you want to keep exploring, but if so please avoid using
the priority field, also I'd hope the early vmsd will be saved within
qemu_savevm_state_setup() so maybe it can be another alternative to
save_setup().

Maybe it's still easier we keep going with the save_setup() and the shim
functions above, if it works for you.

-- 
Peter Xu




[PATCH v3 4/6] qemu/bswap: Use compiler __builtin_bswap() on Haiku

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Since commit efc6c070aca ("configure: Add a test for the minimum
compiler version") the minimum compiler version required for GCC
is 4.8, which supports __builtin_bswap().
Remove the Haiku specific ifdef'ry.

This reverts commit 652a46ebba970017c7a23767dcc983265cdb8eb7
("bswap.h: Include  on Haiku for bswap operations").

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: David Carlier 
Cc: Carlo Arenas 
---
 include/qemu/bswap.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index fd5a98125a..8cd5a2b02e 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -6,8 +6,6 @@
 # include 
 #elif defined(__FreeBSD__)
 # include 
-#elif defined(__HAIKU__)
-# include 
 # else
 #define BSWAP_FROM_FALLBACKS
 #endif /* ! CONFIG_MACHINE_BSWAP_H */
-- 
2.38.1




Re: [PATCH 6/8] qemu/bswap: Add const_le64()

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 17:07, Philippe Mathieu-Daudé wrote:

On 11/1/23 16:49, Philippe Mathieu-Daudé wrote:

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Ira Weiny 

Gcc requires constant versions of cpu_to_le* calls.

Add a 64 bit version.

Reviewed-by: Peter Maydell 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
  include/qemu/bswap.h | 10 ++
  1 file changed, 10 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 


Actually I thought this was already merged but apparently
this never got in:
https://lore.kernel.org/qemu-devel/20200928131934.739451-1-phi...@redhat.com/


Oops unrelated, I meant:
https://lore.kernel.org/qemu-devel/20200917163106.49351-1-phi...@redhat.com/

But thank for the reminder, good opportunity to respin the
other one ;)



[PATCH v3 5/6] qemu/bswap: Use compiler __builtin_bswap() on FreeBSD

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Since commit efc6c070aca ("configure: Add a test for the minimum
compiler version") the minimum compiler version required for GCC
is 4.8, which supports __builtin_bswap().
Remove the FreeBSD specific ifdef'ry.

This reverts commit de03c3164accc21311c39327601fcdd95da301f3
("bswap: Fix build on FreeBSD 10.0").

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Ed Maste 
---
 include/qemu/bswap.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 8cd5a2b02e..32d5cdec27 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -4,8 +4,6 @@
 #ifdef CONFIG_MACHINE_BSWAP_H
 # include 
 # include 
-#elif defined(__FreeBSD__)
-# include 
 # else
 #define BSWAP_FROM_FALLBACKS
 #endif /* ! CONFIG_MACHINE_BSWAP_H */
-- 
2.38.1




[PATCH v3 6/6] qemu/bswap: Use compiler __builtin_bswap() on NetBSD

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Since commit efc6c070aca ("configure: Add a test for the minimum
compiler version") the minimum compiler version required for GCC
is 4.8, which supports __builtin_bswap().
Remove the NetBSD specific ifdef'ry.

This reverts commit 1360677cfe3ca8f945fa1de77823df21a77e4500
("makes NetBSD use the native bswap functions").

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/bswap.h | 11 ---
 meson.build  |  4 
 2 files changed, 15 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 32d5cdec27..3cbe52246b 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -1,27 +1,16 @@
 #ifndef BSWAP_H
 #define BSWAP_H
 
-#ifdef CONFIG_MACHINE_BSWAP_H
-# include 
-# include 
-# else
-#define BSWAP_FROM_FALLBACKS
-#endif /* ! CONFIG_MACHINE_BSWAP_H */
-
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#ifdef BSWAP_FROM_FALLBACKS
 #undef  bswap16
 #define bswap16(_x) __builtin_bswap16(_x)
 #undef  bswap32
 #define bswap32(_x) __builtin_bswap32(_x)
 #undef  bswap64
 #define bswap64(_x) __builtin_bswap64(_x)
-#endif
-
-#undef BSWAP_FROM_FALLBACKS
 
 static inline void bswap16s(uint16_t *s)
 {
diff --git a/meson.build b/meson.build
index 697059d2c8..f2663cfc32 100644
--- a/meson.build
+++ b/meson.build
@@ -2023,10 +2023,6 @@ config_host_data.set('CONFIG_INOTIFY',
  cc.has_header_symbol('sys/inotify.h', 'inotify_init'))
 config_host_data.set('CONFIG_INOTIFY1',
  cc.has_header_symbol('sys/inotify.h', 'inotify_init1'))
-config_host_data.set('CONFIG_MACHINE_BSWAP_H',
- cc.has_header_symbol('machine/bswap.h', 'bswap32',
-  prefix: '''#include 
- #include 
'''))
 config_host_data.set('CONFIG_PRCTL_PR_SET_TIMERSLACK',
  cc.has_header_symbol('sys/prctl.h', 'PR_SET_TIMERSLACK'))
 config_host_data.set('CONFIG_RTNETLINK',
-- 
2.38.1




[PATCH v3 0/6] qemu/bswap: Use compiler __builtin_bswap()

2023-01-11 Thread Philippe Mathieu-Daudé
Implement Richard's suggestion to use __builtin_bswap().

Convert to __builtin_bswap() one patch per OS to simplify
maintainers review.

Since v2:
- Rebased adapting ./configure changes to meson

Since v1:
- Remove the Haiku/BSD ifdef'ry (Peter)
- Include the Haiku VM image from Alexander

Philippe Mathieu-Daudé (6):
  qemu/bswap: Replace bswapXX() by compiler __builtin_bswap()
  qemu/bswap: Replace bswapXXs() by compiler __builtin_bswap()
  qemu/bswap: Remove  dependency
  qemu/bswap: Use compiler __builtin_bswap() on Haiku
  qemu/bswap: Use compiler __builtin_bswap() on FreeBSD
  qemu/bswap: Use compiler __builtin_bswap() on NetBSD

 include/qemu/bswap.h | 83 
 meson.build  |  6 
 2 files changed, 15 insertions(+), 74 deletions(-)

-- 
2.38.1




[PATCH v3 3/6] qemu/bswap: Remove dependency

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Since commit efc6c070aca ("configure: Add a test for the minimum
compiler version") the minimum compiler version required for GCC
is 4.8, which supports __builtin_bswap().
Drop the  dependency.

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/bswap.h | 21 -
 meson.build  |  2 --
 2 files changed, 23 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index d2dafdc54c..fd5a98125a 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -8,9 +8,6 @@
 # include 
 #elif defined(__HAIKU__)
 # include 
-#elif defined(CONFIG_BYTESWAP_H)
-# include 
-#define BSWAP_FROM_BYTESWAP
 # else
 #define BSWAP_FROM_FALLBACKS
 #endif /* ! CONFIG_MACHINE_BSWAP_H */
@@ -19,23 +16,6 @@
 extern "C" {
 #endif
 
-#ifdef BSWAP_FROM_BYTESWAP
-static inline uint16_t bswap16(uint16_t x)
-{
-return bswap_16(x);
-}
-
-static inline uint32_t bswap32(uint32_t x)
-{
-return bswap_32(x);
-}
-
-static inline uint64_t bswap64(uint64_t x)
-{
-return bswap_64(x);
-}
-#endif
-
 #ifdef BSWAP_FROM_FALLBACKS
 #undef  bswap16
 #define bswap16(_x) __builtin_bswap16(_x)
@@ -45,7 +25,6 @@ static inline uint64_t bswap64(uint64_t x)
 #define bswap64(_x) __builtin_bswap64(_x)
 #endif
 
-#undef BSWAP_FROM_BYTESWAP
 #undef BSWAP_FROM_FALLBACKS
 
 static inline void bswap16s(uint16_t *s)
diff --git a/meson.build b/meson.build
index 175517eafd..697059d2c8 100644
--- a/meson.build
+++ b/meson.build
@@ -2006,8 +2006,6 @@ if rdma.found()
 endif
 
 # has_header_symbol
-config_host_data.set('CONFIG_BYTESWAP_H',
- cc.has_header_symbol('byteswap.h', 'bswap_32'))
 config_host_data.set('CONFIG_EPOLL_CREATE1',
  cc.has_header_symbol('sys/epoll.h', 'epoll_create1'))
 config_host_data.set('CONFIG_FALLOCATE_PUNCH_HOLE',
-- 
2.38.1




[PATCH v3 2/6] qemu/bswap: Replace bswapXXs() by compiler __builtin_bswap()

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/bswap.h | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index ca2b4c3f15..d2dafdc54c 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -50,29 +50,31 @@ static inline uint64_t bswap64(uint64_t x)
 
 static inline void bswap16s(uint16_t *s)
 {
-*s = bswap16(*s);
+*s = __builtin_bswap16(*s);
 }
 
 static inline void bswap32s(uint32_t *s)
 {
-*s = bswap32(*s);
+*s = __builtin_bswap32(*s);
 }
 
 static inline void bswap64s(uint64_t *s)
 {
-*s = bswap64(*s);
+*s = __builtin_bswap64(*s);
 }
 
 #if HOST_BIG_ENDIAN
 #define be_bswap(v, size) (v)
-#define le_bswap(v, size) glue(bswap, size)(v)
+#define le_bswap(v, size) glue(__builtin_bswap, size)(v)
 #define be_bswaps(v, size)
-#define le_bswaps(p, size) do { *p = glue(bswap, size)(*p); } while(0)
+#define le_bswaps(p, size) \
+do { *p = glue(__builtin_bswap, size)(*p); } while (0)
 #else
 #define le_bswap(v, size) (v)
-#define be_bswap(v, size) glue(bswap, size)(v)
+#define be_bswap(v, size) glue(__builtin_bswap, size)(v)
 #define le_bswaps(v, size)
-#define be_bswaps(p, size) do { *p = glue(bswap, size)(*p); } while(0)
+#define be_bswaps(p, size) \
+do { *p = glue(__builtin_bswap, size)(*p); } while (0)
 #endif
 
 /**
-- 
2.38.1




[PATCH v3 1/6] qemu/bswap: Replace bswapXX() by compiler __builtin_bswap()

2023-01-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

Use the compiler built-in function to byte swap values,
as the compiler is clever and will fold constants.

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/bswap.h | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 346d05f2aa..ca2b4c3f15 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -37,31 +37,12 @@ static inline uint64_t bswap64(uint64_t x)
 #endif
 
 #ifdef BSWAP_FROM_FALLBACKS
-static inline uint16_t bswap16(uint16_t x)
-{
-return (((x & 0x00ff) << 8) |
-((x & 0xff00) >> 8));
-}
-
-static inline uint32_t bswap32(uint32_t x)
-{
-return (((x & 0x00ffU) << 24) |
-((x & 0xff00U) <<  8) |
-((x & 0x00ffU) >>  8) |
-((x & 0xff00U) >> 24));
-}
-
-static inline uint64_t bswap64(uint64_t x)
-{
-return (((x & 0x00ffULL) << 56) |
-((x & 0xff00ULL) << 40) |
-((x & 0x00ffULL) << 24) |
-((x & 0xff00ULL) <<  8) |
-((x & 0x00ffULL) >>  8) |
-((x & 0xff00ULL) >> 24) |
-((x & 0x00ffULL) >> 40) |
-((x & 0xff00ULL) >> 56));
-}
+#undef  bswap16
+#define bswap16(_x) __builtin_bswap16(_x)
+#undef  bswap32
+#define bswap32(_x) __builtin_bswap32(_x)
+#undef  bswap64
+#define bswap64(_x) __builtin_bswap64(_x)
 #endif
 
 #undef BSWAP_FROM_BYTESWAP
-- 
2.38.1




Re: [PATCH] virtio-rng-pci: fix migration compat for vectors

2023-01-11 Thread David Daney
Seems good to me.

Acked-by: David Daney 

On Mon, Jan 9, 2023 at 4:58 AM Dr. David Alan Gilbert (git) <
dgilb...@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" 
>
> Fixup the migration compatibility for existing machine types
> so that they do not enable msi-x.
>
> Symptom:
>
> (qemu) qemu: get_pci_config_device: Bad config data: i=0x34 read: 84
> device: 98 cmask: ff wmask: 0 w1cmask:0
> qemu: Failed to load PCIDevice:config
> qemu: Failed to load virtio-rng:virtio
> qemu: error while loading state for instance 0x0 of device
> ':00:03.0/virtio-rng'
> qemu: load of migration failed: Invalid argument
>
> Note: This fix will break migration from 7.2->7.2-fixed with this patch
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=2155749
> Fixes: 9ea02e8f1 ("virtio-rng-pci: Allow setting nvectors, so we can use
> MSI-X")
>
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  hw/core/machine.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index f589b92909..45459d1cef 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -45,6 +45,7 @@ const size_t hw_compat_7_2_len =
> G_N_ELEMENTS(hw_compat_7_2);
>
>  GlobalProperty hw_compat_7_1[] = {
>  { "virtio-device", "queue_reset", "false" },
> +{ "virtio-rng-pci", "vectors", "0" },
>  };
>  const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
>
> --
> 2.39.0
>
>


Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-11 Thread Michael S. Tsirkin
On Wed, Jan 11, 2023 at 02:41:58PM +, David Woodhouse wrote:
> This allows drivers to register a callback on a qemu_irq, which is
> invoked when a level-triggered IRQ is acked on the irqchip.
> 
> This allows us to simulate level-triggered interrupts more efficiently,
> by resampling the state of the interrupt only when it actually matters.

I think we tried this with vfio and had to give up on this.
I don't remember the details though. Alex probably does?




Re: [PATCH v3 1/3] linux-user: Clean up when exiting due to a signal

2023-01-11 Thread Alex Bennée


Ilya Leoshkevich  writes:

> When exiting due to an exit() syscall, qemu-user calls
> preexit_cleanup(), but this is currently not the case when exiting due
> to a signal. This leads to various buffers not being flushed (e.g.,
> for gprof, for gcov, and for the upcoming perf support).
>
> Add the missing call.
>
> Signed-off-by: Ilya Leoshkevich 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: Questions about how block devices use snapshots

2023-01-11 Thread Zhiyong Ye

Hi Kevin,

Can I ask again how base.img + diff.qcow2 can be re-merged into one 
image via qemu-img or hmp command when modified.img is discarded?


Regards

Zhiyong

On 1/11/23 10:32 PM, Kevin Wolf wrote:

Am 11.01.2023 um 08:55 hat Zhiyong Ye geschrieben:

Hi Kevin,

Thank you for your reply and detailed answers.

In my scenario is the iSCSI SAN environment. The network storage device is
connected to the physical machine via iSCSI, and LVM is used as the middle
layer between the storage device and the VM for storage management and
metadata synchronization. Every VM uses both raw and qcow2 formats, with the
system disk being qcow2 and the data disk being raw. Therefore block devices
need to support snapshot capability in both raw and qcow2 store methods. In
addition, snapshot images should also be stored in iSCSI storage, which is a
block device.

Both internal and external snapshots can implement snapshots of block
devices, but they both have their drawbacks when multiple snapshots are
required.

Internal snapshots can only be used in qcow2 format and do not require
additional creation of new block devices. As you said, the block device has
much more space than the virtual disk. There is no telling when disk space
will be full when creating multiple snapshots.

External snapshots require the creation of additional block devices to store
the overlay images, but it is not clear how much space needs to be created.
If the space is the same as the virtual disk, when there are multiple
snapshots it will be a serious waste of disk space, because each time a new
snapshot is created the previous one will become read-only. However, if the
disk space created is too small, the snapshot data may not be stored when
the disk space is full.

The problem with both is the uncertainty of the space size of the block
device at the time of creation. Of course, we can rely on lvm's resize
function to dynamically grow the space of the block device. But I think this
is more of a workaround.


Yes, this is why I said it's challenging. oVirt uses resizing of LVs to
achieve this, and it's still very complex. You need to decide yourself
if you think implementing the management software for this is worth it.


It is mentioned in the Qemu docs page under "QEMU disk image utility" that
the qemu-img rebase can be used to perform a “diff” operation on two disk
images.

Say that base.img has been cloned as modified.img by copying it, and that
the modified.img guest has run so there are now some changes compared to
base.img. To construct a thin image called diff.qcow2 that contains just the
differences, do:

qemu-img create -f qcow2 -b modified.img diff.qcow2
qemu-img rebase -b base.img diff.qcow2

At this point, modified.img can be discarded, since base.img + diff.qcow2
contains the same information.

Can this “diff” operation be used on snapshots of block devices? The first
snapshot is a copy of the original disk (to save space we can copy only the
data that has already been used), while the subsequent snapshots are based
on the diff of the previous snapshot, so that the space required for the
created block device is known at the time of the snapshot.


Yes, you can use raw block devices for both base.img and modified.img.
But of course, the result is still a qcow2 file that you need to store
somewhere.

Kevin





Re: [PATCH v3 0/5] tsan fixes

2023-01-11 Thread Alex Bennée


Emilio Cota  writes:

> Changes since v2:
>
> - Add R-b's
> - patch 4/5: Fix incompatible pointer type errors
> - patch 4/5: Remove leftover helper
>
> Thanks,
>   Emilio

Queued to plugins/next, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v4 2/2] qtests/arm: add some mte tests

2023-01-11 Thread Cornelia Huck
Acked-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 tests/qtest/arm-cpu-features.c | 76 ++
 1 file changed, 76 insertions(+)

diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index 5a145273860c..e264d2178a8b 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -22,6 +22,7 @@
 
 #define MACHINE "-machine virt,gic-version=max -accel tcg "
 #define MACHINE_KVM "-machine virt,gic-version=max -accel kvm -accel tcg "
+#define MACHINE_MTE "-machine virt,gic-version=max,mte=on -accel tcg "
 #define QUERY_HEAD  "{ 'execute': 'query-cpu-model-expansion', " \
 "  'arguments': { 'type': 'full', "
 #define QUERY_TAIL  "}}"
@@ -155,6 +156,18 @@ static bool resp_get_feature(QDict *resp, const char 
*feature)
 g_assert(qdict_get_bool(_props, feature) == (expected_value)); \
 })
 
+#define resp_assert_feature_str(resp, feature, expected_value) \
+({ \
+QDict *_props; \
+   \
+g_assert(_resp);   \
+g_assert(resp_has_props(_resp));   \
+_props = resp_get_props(_resp);\
+g_assert(qdict_get(_props, feature));  \
+g_assert_cmpstr(qdict_get_try_str(_props, feature), ==,\
+expected_value);   \
+})
+
 #define assert_feature(qts, cpu_type, feature, expected_value) \
 ({ \
 QDict *_resp;  \
@@ -165,6 +178,16 @@ static bool resp_get_feature(QDict *resp, const char 
*feature)
 qobject_unref(_resp);  \
 })
 
+#define assert_feature_str(qts, cpu_type, feature, expected_value) \
+({ \
+QDict *_resp;  \
+   \
+_resp = do_query_no_props(qts, cpu_type);  \
+g_assert(_resp);   \
+resp_assert_feature_str(_resp, feature, expected_value);   \
+qobject_unref(_resp);  \
+})
+
 #define assert_set_feature(qts, cpu_type, feature, value)  \
 ({ \
 const char *_fmt = (value) ? "{ %s: true }" : "{ %s: false }"; \
@@ -176,6 +199,16 @@ static bool resp_get_feature(QDict *resp, const char 
*feature)
 qobject_unref(_resp);  \
 })
 
+#define assert_set_feature_str(qts, cpu_type, feature, value, _fmt)\
+({ \
+QDict *_resp;  \
+   \
+_resp = do_query(qts, cpu_type, _fmt, feature);\
+g_assert(_resp);   \
+resp_assert_feature_str(_resp, feature, value);\
+qobject_unref(_resp);  \
+})
+
 #define assert_has_feature_enabled(qts, cpu_type, feature) \
 assert_feature(qts, cpu_type, feature, true)
 
@@ -412,6 +445,24 @@ static void sve_tests_sve_off_kvm(const void *data)
 qtest_quit(qts);
 }
 
+static void mte_tests_tag_memory_on(const void *data)
+{
+QTestState *qts;
+
+qts = qtest_init(MACHINE_MTE "-cpu max");
+
+/*
+ * With tag memory, "mte" should default to on, and explicitly specifying
+ * either on or off should be fine.
+ */
+assert_has_feature(qts, "max", "mte");
+
+assert_set_feature_str(qts, "max", "mte", "off", "{ 'mte': 'off' }");
+assert_set_feature_str(qts, "max", "mte", "on", "{ 'mte': 'on' }");
+
+qtest_quit(qts);
+}
+
 static void pauth_tests_default(QTestState *qts, const char *cpu_type)
 {
 assert_has_feature_enabled(qts, cpu_type, "pauth");
@@ -424,6 +475,21 @@ static void pauth_tests_default(QTestState *qts, const 
char *cpu_type)
  "{ 'pauth': false, 'pauth-impdef': true }");
 }
 
+static void mte_tests_default(QTestState *qts, const char *cpu_type)
+{
+assert_has_feature(qts, cpu_type, "mte");
+
+/*
+ * Without tag memory, mte will be off under tcg.
+ * Explicitly enabling it yields an error.
+ */
+assert_has_feature(qts, cpu_type, "mte");
+
+assert_set_feature_str(qts, "max", "mte", "off", "{ 'mte': 'off' }");
+  

[PATCH v4 1/2] arm/kvm: add support for MTE

2023-01-11 Thread Cornelia Huck
Introduce a new cpu feature flag to control MTE support. To preserve
backwards compatibility for tcg, MTE will continue to be enabled as
long as tag memory has been provided.

If MTE has been enabled, we need to disable migration, as we do not
yet have a way to migrate the tags as well. Therefore, MTE will stay
off with KVM unless requested explicitly.

Signed-off-by: Cornelia Huck 
---
 docs/system/arm/cpu-features.rst |  21 +
 hw/arm/virt.c|   2 +-
 target/arm/cpu.c |  18 ++---
 target/arm/cpu.h |   1 +
 target/arm/cpu64.c   | 133 +++
 target/arm/internals.h   |   1 +
 target/arm/kvm64.c   |   5 ++
 target/arm/kvm_arm.h |  12 +++
 target/arm/monitor.c |   1 +
 9 files changed, 181 insertions(+), 13 deletions(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index 00c444042ff5..e278650c837e 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -443,3 +443,24 @@ As with ``sve-default-vector-length``, if the default 
length is larger
 than the maximum vector length enabled, the actual vector length will
 be reduced.  If this property is set to ``-1`` then the default vector
 length is set to the maximum possible length.
+
+MTE CPU Property
+
+
+The ``mte`` property controls the Memory Tagging Extension. For TCG, it 
requires
+presence of tag memory (which can be turned on for the ``virt`` machine via
+``mte=on``). For KVM, it requires the ``KVM_CAP_ARM_MTE`` capability; until
+proper migration support is implemented, enabling MTE will install a migration
+blocker.
+
+If not specified explicitly via ``on`` or ``off``, MTE will be available
+according to the following rules:
+
+* When TCG is used, MTE will be available iff tag memory is available; i.e. it
+  preserves the behaviour prior to introduction of the feature.
+
+* When KVM is used, MTE will default to off, so that migration will not
+  unintentionally be blocked.
+
+* Other accelerators currently don't support MTE.
+
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea2413a0bad7..42359e256ad0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2136,7 +2136,7 @@ static void machvirt_init(MachineState *machine)
 
 if (vms->mte && (kvm_enabled() || hvf_enabled())) {
 error_report("mach-virt: %s does not support providing "
- "MTE to the guest CPU",
+ "emulated MTE to the guest CPU",
  kvm_enabled() ? "KVM" : "HVF");
 exit(1);
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 5f63316dbf22..decab743d0d5 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1529,6 +1529,11 @@ void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
+arm_cpu_mte_finalize(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
 }
 #endif
 
@@ -1605,7 +1610,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 }
 if (cpu->tag_memory) {
 error_setg(errp,
-   "Cannot enable %s when guest CPUs has MTE enabled",
+   "Cannot enable %s when guest CPUs has tag memory 
enabled",
current_accel_name());
 return;
 }
@@ -1984,17 +1989,6 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
ID_PFR1, VIRTUALIZATION, 0);
 }
 
-#ifndef CONFIG_USER_ONLY
-if (cpu->tag_memory == NULL && cpu_isar_feature(aa64_mte, cpu)) {
-/*
- * Disable the MTE feature bits if we do not have tag-memory
- * provided by the machine.
- */
-cpu->isar.id_aa64pfr1 =
-FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
-}
-#endif
-
 if (tcg_enabled()) {
 /*
  * Don't report the Statistical Profiling Extension in the ID
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bf2bce046d56..f1a9015a7ed7 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1038,6 +1038,7 @@ struct ArchCPU {
 bool prop_pauth;
 bool prop_pauth_impdef;
 bool prop_lpa2;
+OnOffAuto prop_mte;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
 uint32_t dcz_blocksize;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0e021960fb5b..3cf42ee05ca3 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -29,6 +29,13 @@
 #include "qapi/visitor.h"
 #include "hw/qdev-properties.h"
 #include "internals.h"
+#include "migration/blocker.h"
+#include "qapi/qapi-visit-common.h"
+#include "hw/arm/virt.h"
+
+#ifdef CONFIG_KVM
+static Error *mte_migration_blocker;
+#endif
 
 static void aarch64_a35_initfn(Object *obj)
 {
@@ -1096,6 +1103,130 @@ static void aarch64_neo

[PATCH v4 0/2] arm: enable MTE for QEMU + kvm

2023-01-11 Thread Cornelia Huck
Here's another repost of my kvm/mte series; no substantial changes.

Changes v3->v4:
- rebase to current master
- tweak message when specifying "mte=on" for the virt machine for non-tcg
- added Thomas' ack for the qtests patch

Cornelia Huck (2):
  arm/kvm: add support for MTE
  qtests/arm: add some mte tests

 docs/system/arm/cpu-features.rst |  21 +
 hw/arm/virt.c|   2 +-
 target/arm/cpu.c |  18 ++---
 target/arm/cpu.h |   1 +
 target/arm/cpu64.c   | 133 +++
 target/arm/internals.h   |   1 +
 target/arm/kvm64.c   |   5 ++
 target/arm/kvm_arm.h |  12 +++
 target/arm/monitor.c |   1 +
 tests/qtest/arm-cpu-features.c   |  76 ++
 10 files changed, 257 insertions(+), 13 deletions(-)

-- 
2.39.0




Re: [PATCH 6/8] qemu/bswap: Add const_le64()

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 16:49, Philippe Mathieu-Daudé wrote:

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Ira Weiny 

Gcc requires constant versions of cpu_to_le* calls.

Add a 64 bit version.

Reviewed-by: Peter Maydell 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
  include/qemu/bswap.h | 10 ++
  1 file changed, 10 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 


Actually I thought this was already merged but apparently
this never got in:
https://lore.kernel.org/qemu-devel/20200928131934.739451-1-phi...@redhat.com/



Re: [PATCH 1/2] target/riscv/cpu: set cpu->cfg in register_cpu_props()

2023-01-11 Thread Daniel Henrique Barboza




On 1/11/23 02:39, Richard Henderson wrote:

On 1/10/23 12:14, Daniel Henrique Barboza wrote:

+/*
+ * Register CPU props based on env.misa_ext. If a non-zero
+ * value was set, register only the required cpu->cfg.ext_*
+ * properties and leave. env.misa_ext = 0 means that we want
+ * all the default properties to be registered.
+ */
  static void register_cpu_props(DeviceState *dev)


Suggest invoking this as .instance_post_init hook on TYPE_RISCV_CPU.
Then you don't need to manually call it on every cpu class.


That would be nice but we have code such as:

@@ -317,7 +310,6 @@ static void rv32_sifive_e_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);

 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
 register_cpu_props(DEVICE(obj));
 set_priv_version(env, PRIV_VERSION_1_10_0);
 cpu->cfg.mmu = false; <===


That are setting cpu->cfg attrs after register_cpu_props(), i.e. "I want the
defaults and these specific settings on top of it".

I can think of a few ways to add a a post_init hook to reduce this code
repetition but I'll need to play around with it a bit first.

Thanks,

Daniel




r~





Re: [PATCH 8/8] hw/cxl/mailbox: Use new UUID network order define for cel_uuid

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Ira Weiny 

The cel_uuid was programatically generated previously because there was
no static initializer for network order UUIDs.

Use the new network order initializer for cel_uuid.  Adjust
cxl_initialize_mailbox() because it can't fail now.

Update specification reference.

Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
  hw/cxl/cxl-device-utils.c   |  2 +-
  hw/cxl/cxl-mailbox-utils.c  | 13 ++---
  include/hw/cxl/cxl_device.h |  2 +-
  3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 83ce7a8270..4c5e88aaf5 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -267,5 +267,5 @@ void cxl_device_register_init_common(CXLDeviceState 
*cxl_dstate)
  cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
  memdev_reg_init_common(cxl_dstate);
  
-assert(cxl_initialize_mailbox(cxl_dstate) == 0);

+cxl_initialize_mailbox(cxl_dstate);
  }
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 942de73bbc..7ed344a754 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -192,7 +192,11 @@ static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
  return CXL_MBOX_SUCCESS;
  }
  
-static QemuUUID cel_uuid;

+/* CXL 3.0 8.2.9.5.2.1 Command Effects Log (CEL) */
+static QemuUUID cel_uuid = {


static const.

Otherwise:

Reviewed-by: Philippe Mathieu-Daudé 


+.data = UUID(0x0da9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79,
+ 0x96, 0xb1, 0x62, 0x3b, 0x3f, 0x17)
+};
  
  /* 8.2.9.4.1 */

  static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
@@ -457,11 +461,8 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
   DOORBELL, 0);
  }




Re: [PATCH 6/8] qemu/bswap: Add const_le64()

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Ira Weiny 

Gcc requires constant versions of cpu_to_le* calls.

Add a 64 bit version.

Reviewed-by: Peter Maydell 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
  include/qemu/bswap.h | 10 ++
  1 file changed, 10 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 4/8] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition

2023-01-11 Thread Philippe Mathieu-Daudé

On 11/1/23 15:24, Jonathan Cameron via wrote:

From: Gregory Price 

Remove usage of magic numbers when accessing capacity fields and replace
with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.

Signed-off-by: Gregory Price 
Reviewed-by: Davidlohr Bueso 
Signed-off-by: Jonathan Cameron 
---
  hw/cxl/cxl-mailbox-utils.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index bc1bb18844..942de73bbc 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -14,6 +14,8 @@
  #include "qemu/log.h"
  #include "qemu/uuid.h"
  
+#define CXL_CAPACITY_MULTIPLIER   0x1000 /* SZ_256M */


Do you mind changing to the self-documenting (256 * MiB) ?



Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-11 Thread Chuck Zmudzinski
On 1/10/23 3:16 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 10, 2023 at 02:08:34AM -0500, Chuck Zmudzinski wrote:
>> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>> as noted in docs/igd-assign.txt in the Qemu source code.
>> 
>> Currently, when the xl toolstack is used to configure a Xen HVM guest with
>> Intel IGD passthrough to the guest with the Qemu upstream device model,
>> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>> a different slot. This problem often prevents the guest from booting.
>> 
>> The only available workaround is not good: Configure Xen HVM guests to use
>> the old and no longer maintained Qemu traditional device model available
>> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>> 
>> To implement this feature in the Qemu upstream device model for Xen HVM
>> guests, introduce the following new functions, types, and macros:
>> 
>> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>> * typedef XenPTQdevRealize function pointer
>> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>> * xen_igd_reserve_slot and xen_igd_clear_slot functions
>> 
>> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>> the xl toolstack with the gfx_passthru option enabled, which sets the
>> igd-passthru=on option to Qemu for the Xen HVM machine type.
>> 
>> The new xen_igd_reserve_slot function also needs to be implemented in
>> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>> in which case it does nothing.
>> 
>> The new xen_igd_clear_slot function overrides qdev->realize of the parent
>> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>> 
>> Move the call to xen_host_pci_device_get, and the associated error
>> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>> initialize the device class and vendor values which enables the checks for
>> the Intel IGD to succeed. The verification that the host device is an
>> Intel IGD to be passed through is done by checking the domain, bus, slot,
>> and function values as well as by checking that gfx_passthru is enabled,
>> the device class is VGA, and the device vendor in Intel.
>> 
>> Signed-off-by: Chuck Zmudzinski 
>> ---
>> Notes that might be helpful to reviewers of patched code in hw/xen:
>> 
>> The new functions and types are based on recommendations from Qemu docs:
>> https://qemu.readthedocs.io/en/latest/devel/qom.html
>> 
>> Notes that might be helpful to reviewers of patched code in hw/i386:
>> 
>> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
>> not affect builds that do not have CONFIG_XEN defined.
>> 
>> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
>> existing function that is only true when Qemu is built with
>> xen-pci-passthrough enabled and the administrator has configured the Xen
>> HVM guest with Qemu's igd-passthru=on option.
>> 
>> v2: Remove From:  tag at top of commit message
>> 
>> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
>> 
>> if (is_igd_vga_passthrough(&s->real_device) &&
>> (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
>> 
>> is changed to
>> 
>> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>> && (s->hostaddr.function == 0)) {
>> 
>> I hoped that I could use the test in v2, since it matches the
>> other tests for the Intel IGD in Qemu and Xen, but those tests
>> do not work because the necessary data structures are not set with
>> their values yet. So instead use the test that the administrator
>> has enabled gfx_passthru and the device address on the host is
>> 02.0. This test does detect the Intel IGD correctly.
>> 
>> v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
>> email address to match the address used by the same author in commits
>> be9c61da and c0e86b76
>> 
>> Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
>> 
>> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
>> for the Intel IGD that uses the same criteria as in other places.
>> This involved moving the call to xen_host_pci_device_get from
>> xen_pt_realize to xen_igd_clear_slot and updating the checks for the
>> Intel IGD in xen_igd_clear_slot:
>> 
>> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>> && (s->hostaddr.function == 0)) {
>> 
>> is changed to
>> 
>> if (is_igd_vga_passthrough(&s->real_device) &&
>> s->r

Re: [PATCH v4 1/1] python/machine: Fix AF_UNIX path too long on macOS

2023-01-11 Thread Daniel P . Berrangé
On Wed, Jan 11, 2023 at 10:06:49AM -0500, John Snow wrote:
> On Wed, Jan 11, 2023 at 4:07 AM Daniel P. Berrangé  
> wrote:
> >
> > On Tue, Jan 10, 2023 at 06:18:29PM -0500, John Snow wrote:
> > > On Tue, Jan 10, 2023 at 3:34 AM Peter Delevoryas  wrote:
> > > >
> > > > On macOS, private $TMPDIR's are the default. These $TMPDIR's are
> > > > generated from a user's unix UID and UUID [1], which can create a
> > > > relatively long path:
> > > >
> > > > /var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/
> > > >
> > > > QEMU's avocado tests create a temporary directory prefixed by
> > > > "avo_qemu_sock_", and create QMP sockets within _that_ as well.
> > > > The QMP socket is unnecessarily long, because a temporary directory
> > > > is created for every QEMUMachine object.
> > > >
> > > > /avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock
> > > >
> > > > The path limit for unix sockets on macOS is 104: [2]
> > > >
> > > > /*
> > > >  * [XSI] Definitions for UNIX IPC domain.
> > > >  */
> > > > struct  sockaddr_un {
> > > > unsigned char   sun_len;/* sockaddr len including null 
> > > > */
> > > > sa_family_t sun_family; /* [XSI] AF_UNIX */
> > > > charsun_path[104];  /* [XSI] path name (gag) */
> > > > };
> > > >
> > > > This results in avocado tests failing on macOS because the QMP unix
> > > > socket can't be created, because the path is too long:
> > > >
> > > > ERROR| Failed to establish connection: OSError: AF_UNIX path too 
> > > > long
> > > >
> > > > This change resolves by reducing the size of the socket directory prefix
> > > > and the suffix on the QMP and console socket names.
> > > >
> > > > The result is paths like this:
> > > >
> > > > pdel@pdel-mbp:/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T
> > > > $ tree qemu*
> > > > qemu_df4evjeq
> > > > qemu_jbxel3gy
> > > > qemu_ml9s_gg7
> > > > qemu_oc7h7f3u
> > > > qemu_oqb1yf97
> > > > ├── 10a004050.con
> > > > └── 10a004050.qmp
> > > >
> > > > [1] 
> > > > https://apple.stackexchange.com/questions/353832/why-is-mac-osx-temp-directory-in-weird-path
> > > > [2] 
> > > > /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/sys/un.h
> > > >
> > > > Signed-off-by: Peter Delevoryas 
> > >
> > > I'm tentatively staging this with a benefit-of-the-doubt [1] -- my
> > > tests are still running -- but I do have a question:
> > >
> > > > ---
> > > >  python/qemu/machine/machine.py | 6 +++---
> > > >  tests/avocado/avocado_qemu/__init__.py | 2 +-
> > > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/python/qemu/machine/machine.py 
> > > > b/python/qemu/machine/machine.py
> > > > index 748a0d807c9d..d70977378305 100644
> > > > --- a/python/qemu/machine/machine.py
> > > > +++ b/python/qemu/machine/machine.py
> > > > @@ -157,7 +157,7 @@ def __init__(self,
> > > >  self._wrapper = wrapper
> > > >  self._qmp_timer = qmp_timer
> > > >
> > > > -self._name = name or f"qemu-{os.getpid()}-{id(self):02x}"
> > > > +self._name = name or f"{id(self):x}"
> > >
> > > Why is it safe to not differentiate based on the process ID?
> > >
> > > ... I suppose the thinking is: by default, in machine.py, this is a
> > > temp dir created by tempfile.mkdtemp which will be unique per-process.
> > > I suppose there's no protection against a caller supplying the same
> > > tempdir (or sockdir) to multiple instances, but I suppose in those
> > > cases we get to argue that "Well, don't do that, then."
> >
> > Every process will have a separate tempdir, and if there are
> > multiple instances of this class, 'id(self)' will provide
> > uniqueness within the process.
> 
> Right. The only small thing is if a caller passes the same directory
> to multiple instances across multiple processes, you could
> *theoretically* get a collision, and we don't guard against it. It's
> not a super likely occurrence so I'm fine with ignoring it, but I
> think it's technically possible.

If they want to be insane, then they can also pass a 'name' explicitly
to override the default socket path choice.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v3 5/5] plugins: make qemu_plugin_user_exit's locking order consistent with fork_start's

2023-01-11 Thread Emilio Cota
To fix potential deadlocks as reported by tsan.

Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Emilio Cota 
---
 plugins/core.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/plugins/core.c b/plugins/core.c
index ccb770a485..728bacef95 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -500,10 +500,17 @@ void qemu_plugin_user_exit(void)
 enum qemu_plugin_event ev;
 CPUState *cpu;
 
-QEMU_LOCK_GUARD(&plugin.lock);
-
+/*
+ * Locking order: we must acquire locks in an order that is consistent
+ * with the one in fork_start(). That is:
+ * - start_exclusive(), which acquires qemu_cpu_list_lock,
+ *   must be called before acquiring plugin.lock.
+ * - tb_flush(), which acquires mmap_lock(), must be called
+ *   while plugin.lock is not held.
+ */
 start_exclusive();
 
+qemu_rec_mutex_lock(&plugin.lock);
 /* un-register all callbacks except the final AT_EXIT one */
 for (ev = 0; ev < QEMU_PLUGIN_EV_MAX; ev++) {
 if (ev != QEMU_PLUGIN_EV_ATEXIT) {
@@ -513,13 +520,12 @@ void qemu_plugin_user_exit(void)
 }
 }
 }
-
-tb_flush(current_cpu);
-
 CPU_FOREACH(cpu) {
 qemu_plugin_disable_mem_helpers(cpu);
 }
+qemu_rec_mutex_unlock(&plugin.lock);
 
+tb_flush(current_cpu);
 end_exclusive();
 
 /* now it's safe to handle the exit case */
-- 
2.34.1




[PATCH v3 0/5] tsan fixes

2023-01-11 Thread Emilio Cota
Changes since v2:

- Add R-b's
- patch 4/5: Fix incompatible pointer type errors
- patch 4/5: Remove leftover helper

Thanks,
Emilio





[PATCH v3 4/5] util/qht: use striped locks under TSAN

2023-01-11 Thread Emilio Cota
Fixes this tsan crash, easy to reproduce with any large enough program:

$ tests/unit/test-qht
1..2
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 
"((n_all_locks_)) < 
(((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]" 
(0x40, 0x40) (tid=1821568)
#0 __tsan::CheckUnwind() ../../../../src/libsanitizer/tsan/tsan_rtl.cpp:353 
(libtsan.so.2+0x90034)
#1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long 
long, unsigned long long) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86 
(libtsan.so.2+0xca555)
#2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > >::addLock(unsigned long, unsigned 
long, unsigned int) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:67 
(libtsan.so.2+0xb3616)
#3 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > >::addLock(unsigned long, unsigned 
long, unsigned int) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:59 
(libtsan.so.2+0xb3616)
#4 __sanitizer::DeadlockDetector<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector > 
>::onLockAfter(__sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul,
 __sanitizer::BasicBitVector > >*, unsigned long, unsigned int) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:216 
(libtsan.so.2+0xb3616)
#5 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, 
__sanitizer::DDMutex*, bool, bool) 
../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector1.cpp:169
 (libtsan.so.2+0xb3616)
#6 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned 
long, unsigned int, int) 
../../../../src/libsanitizer/tsan/tsan_rtl_mutex.cpp:200 (libtsan.so.2+0xa3382)
#7 __tsan_mutex_post_lock 
../../../../src/libsanitizer/tsan/tsan_interface_ann.cpp:384 
(libtsan.so.2+0x76bc3)
#8 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:259 
(test-qht+0x44a97)
#9 qht_map_lock_buckets ../util/qht.c:253 (test-qht+0x44a97)
#10 do_qht_iter ../util/qht.c:809 (test-qht+0x45f33)
#11 qht_iter ../util/qht.c:821 (test-qht+0x45f33)
#12 iter_check ../tests/unit/test-qht.c:121 (test-qht+0xe473)
#13 qht_do_test ../tests/unit/test-qht.c:202 (test-qht+0xe473)
#14 qht_test ../tests/unit/test-qht.c:240 (test-qht+0xe7c1)
#15 test_default ../tests/unit/test-qht.c:246 (test-qht+0xe828)
#16   (libglib-2.0.so.0+0x7daed)
#17   (libglib-2.0.so.0+0x7d80a)
#18   (libglib-2.0.so.0+0x7d80a)
#19 g_test_run_suite  (libglib-2.0.so.0+0x7dfe9)
#20 g_test_run  (libglib-2.0.so.0+0x7e055)
#21 main ../tests/unit/test-qht.c:259 (test-qht+0xd2c6)
#22 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 
(libc.so.6+0x29d8f)
#23 __libc_start_main_impl ../csu/libc-start.c:392 (libc.so.6+0x29e3f)
#24 _start  (test-qht+0xdb44)

Signed-off-by: Emilio Cota 
---
 util/qht.c | 95 ++
 1 file changed, 81 insertions(+), 14 deletions(-)

diff --git a/util/qht.c b/util/qht.c
index 15866299e6..92c6b78759 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -151,6 +151,22 @@ struct qht_bucket {
 
 QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > QHT_BUCKET_ALIGN);
 
+/*
+ * Under TSAN, we use striped locks instead of one lock per bucket chain.
+ * This avoids crashing under TSAN, since TSAN aborts the program if more than
+ * 64 locks are held (this is a hardcoded limit in TSAN).
+ * When resizing a QHT we grab all the buckets' locks, which can easily
+ * go over TSAN's limit. By using striped locks, we avoid this problem.
+ *
+ * Note: this number must be a power of two for easy index computation.
+ */
+#define QHT_TSAN_BUCKET_LOCKS_BITS 4
+#define QHT_TSAN_BUCKET_LOCKS (1 << QHT_TSAN_BUCKET_LOCKS_BITS)
+
+struct qht_tsan_lock {
+QemuSpin lock;
+} QEMU_ALIGNED(QHT_BUCKET_ALIGN);
+
 /**
  * struct qht_map - structure to track an array of buckets
  * @rcu: used by RCU. Keep it as the top field in the struct to help valgrind
@@ -160,6 +176,7 @@ QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > 
QHT_BUCKET_ALIGN);
  * @n_added_buckets: number of added (i.e. "non-head") buckets
  * @n_added_buckets_threshold: threshold to trigger an upward resize once the
  * number of added buckets surpasses it.
+ * @tsan_bucket_locks: Array of striped locks to be used only under TSAN.
  *
  * Buckets are tracked in what we call a "map", i.e. this structure.
  */
@@ -169,6 +186,9 @@ struct qht_map {
 size_t n_buckets;
 size_t n_added_buckets;
 size_t n_added_buckets_threshold;
+#ifdef CONFIG_TSAN
+struct qht_tsan_lock tsan_bucket_locks[QHT_TSAN_BUCKET_LOCKS];
+#endif
 };
 
 /* trigger a resize when n_added_buckets > n_buckets / div */
@@ -229,10 +249,56 @@ static inline size_t qht_elems_to_buckets(size_t n_elems)
 return pow2ceil

[PATCH v3 2/5] util/qht: add missing atomic_set(hashes[i])

2023-01-11 Thread Emilio Cota
We forgot to add this one in "a890643958 util/qht: atomically set b->hashes".

Detected with tsan.

Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio Cota 
---
 util/qht.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/qht.c b/util/qht.c
index 065fc501f4..15866299e6 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -688,7 +688,7 @@ static inline void qht_bucket_remove_entry(struct 
qht_bucket *orig, int pos)
 int i;
 
 if (qht_entry_is_last(orig, pos)) {
-orig->hashes[pos] = 0;
+qatomic_set(&orig->hashes[pos], 0);
 qatomic_set(&orig->pointers[pos], NULL);
 return;
 }
-- 
2.34.1




  1   2   3   >