date:20240704

Re: [PATCH v1 0/8] PRI support for VT-d

2024-07-04 Thread Yi Liu


On 2024/7/5 13:13, CLEMENT MATHIEU--DRIF wrote:


On 05/07/2024 05:03, Yi Liu wrote:


On 2024/5/30 20:24, CLEMENT MATHIEU--DRIF wrote:

This series belongs to a list of series that add SVM support for VT-d.

Here we focus on the implementation of PRI support in the IOMMU and
on a PCI-level
API for PRI to be used by virtual devices.

This work is based on the VT-d specification version 4.1 (March 2023).
Here is a link to a GitHub repository where you can find the
following elements :
  - Qemu with all the patches for SVM
  - ATS
  - PRI
  - Device IOTLB invalidations
  - Requests with already translated addresses
  - A demo device
  - A simple driver for the demo device
  - A userspace program (for testing and demonstration purposes)


I didn't see any PRQ drain logic in this series. Please consider
adding it in the next version. It's needed when repurposing a PASID.


Hi,

Are you talking about wait descriptors with SW = 0, IF = 0, FN = 1
(section 7.10 of VT-d)?

I'll move that to the PRI series.


Yes, but not only that patch. When guest software submits the
descriptors per section 7.10 of the VT-d spec, QEMU needs to emulate the
PRQ drain behavior.




https://github.com/BullSequana/Qemu-in-guest-SVM-demo


Clément Mathieu--Drif (8):
pcie: add a helper to declare the PRI capability for a pcie device
pcie: helper functions to check to check if PRI is enabled
pcie: add a way to get the outstanding page request allocation (pri)
  from the config space.
pci: declare structures and IOMMU operation for PRI
pci: add a PCI-level API for PRI
intel_iommu: declare PRI constants and structures
intel_iommu: declare registers for PRI
intel_iommu: add PRI operations support

   hw/i386/intel_iommu.c  | 302 +
   hw/i386/intel_iommu_internal.h |  54 +-
   hw/pci/pci.c   |  37 
   hw/pci/pcie.c  |  42 +
   include/exec/memory.h  |  65 +++
   include/hw/pci/pci.h   |  45 +
   include/hw/pci/pci_bus.h   |   1 +
   include/hw/pci/pcie.h  |   7 +-
   include/hw/pci/pcie_regs.h |   4 +
   system/memory.c|  49 ++
   10 files changed, 604 insertions(+), 2 deletions(-)



--
Regards,
Yi Liu



Re: [PATCH 5/8] aspeed: Set eMMC 'boot-config' property to reflect HW strapping

2024-07-04 Thread Cédric Le Goater


On 7/5/24 5:41 AM, Andrew Jeffery wrote:

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:

From: Cédric Le Goater 

When the boot-from-eMMC HW strapping bit is set, use the 'boot-config'
property to set the boot config register to boot from the first boot
area partition of the eMMC device.

Signed-off-by: Cédric Le Goater 
---
  hw/arm/aspeed.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 756deb91efd1..135f4eb72215 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -327,7 +327,8 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype,
  }
  }
  
-static void sdhci_attach_drive(SDHCIState *sdhci, DriveInfo *dinfo, bool emmc)
+static void sdhci_attach_drive(SDHCIState *sdhci, DriveInfo *dinfo, bool emmc,
+   bool boot_emmc)
  {
  DeviceState *card;
  
@@ -335,6 +336,9 @@ static void sdhci_attach_drive(SDHCIState *sdhci, DriveInfo *dinfo, bool emmc)
  return;
  }
  card = qdev_new(emmc ? TYPE_EMMC : TYPE_SD_CARD);
+if (emmc) {
+qdev_prop_set_uint8(card, "boot-config", boot_emmc ? 0x48 : 0x0);


0x48 feels a little bit magic. I poked around a bit and there are some
boot-config macros, but not the ones you need and they're all in an
"internal" header anyway. I guess this is fine for now?


You are right and we should be using these :

hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG 179 /* R/W */
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_ACC_MASK (0x7)
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_ACC_DEFAULT (0x0)
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_ACC_BOOT0   (0x1)
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_EN_MASK (0x7 << 3)
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_EN_BOOT0 (0x1 << 3)
hw/sd/sdmmc-internal.h:#define EXT_CSD_PART_CONFIG_EN_USER (0x7 << 3)

So I wonder where the 0x48 is coming from. Will change.
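
For reference, 0x48 can be decomposed with the macros quoted above. This is only a sketch: it assumes that bit 6 of the partition config register is the BOOT_ACK flag (as in the JEDEC eMMC register layout), and EXT_CSD_PART_CONFIG_BOOT_ACK and emmc_boot_config() are hypothetical names that the quoted header does not define.

```c
#include <stdbool.h>
#include <stdint.h>

/* Quoted from hw/sd/sdmmc-internal.h above. */
#define EXT_CSD_PART_CONFIG_ACC_DEFAULT (0x0)
#define EXT_CSD_PART_CONFIG_EN_BOOT0    (0x1 << 3)

/* Hypothetical macro, not in the quoted header: BOOT_ACK is assumed to be bit 6. */
#define EXT_CSD_PART_CONFIG_BOOT_ACK    (0x1 << 6)

/* Hypothetical helper: compose the 'boot-config' value for the first boot area. */
static uint8_t emmc_boot_config(bool boot_emmc)
{
    if (!boot_emmc) {
        return 0x0;
    }
    return EXT_CSD_PART_CONFIG_BOOT_ACK |
           EXT_CSD_PART_CONFIG_EN_BOOT0 |
           EXT_CSD_PART_CONFIG_ACC_DEFAULT;
}
```

Under that assumption, emmc_boot_config(true) gives the same 0x48 that the patch hard-codes, so the constant could be spelled with named bits once a BOOT_ACK macro exists.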


Reviewed-by: Andrew Jeffery 




Thanks,

C.

Re: [PATCH 3/8] aspeed/scu: Add boot-from-eMMC HW strapping bit for AST2600 SoC

2024-07-04 Thread Cédric Le Goater


On 7/5/24 5:36 AM, Andrew Jeffery wrote:

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:

From: Cédric Le Goater 

Bit SCU500[2] of the AST2600 controls the boot device of the SoC.

Future changes will configure this bit to boot from eMMC disk images
specially built for this purpose.

Signed-off-by: Joel Stanley 
Signed-off-by: Cédric Le Goater 
---
  include/hw/misc/aspeed_scu.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/hw/misc/aspeed_scu.h b/include/hw/misc/aspeed_scu.h
index 58db28db45aa..c9f98c20ffd9 100644
--- a/include/hw/misc/aspeed_scu.h
+++ b/include/hw/misc/aspeed_scu.h
@@ -349,6 +349,10 @@ uint32_t aspeed_scu_get_apb_freq(AspeedSCUState *s);
  #define SCU_AST2600_H_PLL_BYPASS_EN(0x1 << 24)
  #define SCU_AST2600_H_PLL_OFF  (0x1 << 23)
  
+/* STRAP1 SCU500 */
+#define AST2600_HW_STRAP_BOOT_SRC_EMMC (0x1 << 2)
+#define AST2600_HW_STRAP_BOOT_SRC_SPI (0x0 << 2)


Maybe these should have a `SCU_` prefix for consistency?


Yep. I agree.  

Anyway:

Reviewed-by: Andrew Jeffery 




Thanks,

C.
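
A small sketch of how the strapping bit might be tested once the macros carry the SCU_ prefix agreed on above. The renamed macros and the helper ast2600_strap_boots_from_emmc() are illustrative assumptions, not names taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* STRAP1 SCU500, renamed with the SCU_ prefix suggested in the review. */
#define SCU_AST2600_HW_STRAP_BOOT_SRC_EMMC (0x1 << 2)
#define SCU_AST2600_HW_STRAP_BOOT_SRC_SPI  (0x0 << 2)

/* Hypothetical helper: true if SCU500[2] selects eMMC as the boot device. */
static bool ast2600_strap_boots_from_emmc(uint32_t hw_strap1)
{
    return (hw_strap1 & SCU_AST2600_HW_STRAP_BOOT_SRC_EMMC) != 0;
}
```

Because the SPI value is 0, only the eMMC macro is useful as a mask; the SPI macro documents the other setting of the same bit.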

[PATCH] meson: Update meson-buildoptions.sh

2024-07-04 Thread Zhao Liu

Update meson-buildoptions.sh to stay in sync with meson_options.txt.

Signed-off-by: Zhao Liu 
---
 scripts/meson-buildoptions.sh | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index cfadb5ea86af..c97079a38c9e 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -83,7 +83,7 @@ meson_options_help() {
   printf "%s\n" '   (can be empty) [qemu]'
   printf "%s\n" '  --with-trace-file=VALUE  Trace file prefix for simple backend [trace]'
   printf "%s\n" '  --x86-version=CHOICE tweak required x86_64 architecture version beyond'
-  printf "%s\n" '   compiler default [1] (choices: 0/1/2/3)'
+  printf "%s\n" '   compiler default [1] (choices: 0/1/2/3/4)'
   printf "%s\n" ''
   printf "%s\n" 'Optional features, enabled with --enable-FEATURE and'
   printf "%s\n" 'disabled with --disable-FEATURE, default is enabled if available'
@@ -166,6 +166,7 @@ meson_options_help() {
   printf "%s\n" '  qcow1   qcow1 image format support'
   printf "%s\n" '  qed qed image format support'
   printf "%s\n" '  qga-vss build QGA VSS support (broken with MinGW)'
+  printf "%s\n" '  qpl Query Processing Library support'
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
@@ -187,6 +188,7 @@ meson_options_help() {
   printf "%s\n" '  tools   build support utilities that come with QEMU'
   printf "%s\n" '  tpm TPM support'
   printf "%s\n" '  u2f U2F emulation support'
+  printf "%s\n" '  uadkUADK Library support'
   printf "%s\n" '  usb-redir   libusbredir support'
   printf "%s\n" '  vde vde network backend support'
   printf "%s\n" '  vdi vdi image format support'
@@ -221,8 +223,6 @@ meson_options_help() {
   printf "%s\n" '  Xen PCI passthrough support'
   printf "%s\n" '  xkbcommon   xkbcommon support'
   printf "%s\n" '  zstdzstd compression support'
-  printf "%s\n" '  qpl Query Processing Library support'
-  printf "%s\n" '  uadkUADK Library support'
 }
 _meson_option_parse() {
   case $1 in
@@ -440,6 +440,8 @@ _meson_option_parse() {
 --disable-qga-vss) printf "%s" -Dqga_vss=disabled ;;
 --enable-qom-cast-debug) printf "%s" -Dqom_cast_debug=true ;;
 --disable-qom-cast-debug) printf "%s" -Dqom_cast_debug=false ;;
+--enable-qpl) printf "%s" -Dqpl=enabled ;;
+--disable-qpl) printf "%s" -Dqpl=disabled ;;
 --enable-rbd) printf "%s" -Drbd=enabled ;;
 --disable-rbd) printf "%s" -Drbd=disabled ;;
 --enable-rdma) printf "%s" -Drdma=enabled ;;
@@ -501,6 +503,8 @@ _meson_option_parse() {
 --disable-tsan) printf "%s" -Dtsan=false ;;
 --enable-u2f) printf "%s" -Du2f=enabled ;;
 --disable-u2f) printf "%s" -Du2f=disabled ;;
+--enable-uadk) printf "%s" -Duadk=enabled ;;
+--disable-uadk) printf "%s" -Duadk=disabled ;;
 --enable-usb-redir) printf "%s" -Dusb_redir=enabled ;;
 --disable-usb-redir) printf "%s" -Dusb_redir=disabled ;;
 --enable-vde) printf "%s" -Dvde=enabled ;;
@@ -560,10 +564,6 @@ _meson_option_parse() {
 --disable-xkbcommon) printf "%s" -Dxkbcommon=disabled ;;
 --enable-zstd) printf "%s" -Dzstd=enabled ;;
 --disable-zstd) printf "%s" -Dzstd=disabled ;;
---enable-qpl) printf "%s" -Dqpl=enabled ;;
---disable-qpl) printf "%s" -Dqpl=disabled ;;
---enable-uadk) printf "%s" -Duadk=enabled ;;
---disable-uadk) printf "%s" -Duadk=disabled ;;
 *) return 1 ;;
   esac
 }
-- 
2.34.1

Re: [PATCH v1 0/8] PRI support for VT-d

2024-07-04 Thread CLEMENT MATHIEU--DRIF


On 05/07/2024 05:03, Yi Liu wrote:
>
> On 2024/5/30 20:24, CLEMENT MATHIEU--DRIF wrote:
>> This series belongs to a list of series that add SVM support for VT-d.
>>
>> Here we focus on the implementation of PRI support in the IOMMU and
>> on a PCI-level
>> API for PRI to be used by virtual devices.
>>
>> This work is based on the VT-d specification version 4.1 (March 2023).
>> Here is a link to a GitHub repository where you can find the
>> following elements :
>>  - Qemu with all the patches for SVM
>>  - ATS
>>  - PRI
>>  - Device IOTLB invalidations
>>  - Requests with already translated addresses
>>  - A demo device
>>  - A simple driver for the demo device
>>  - A userspace program (for testing and demonstration purposes)
>
> I didn't see any PRQ drain logic in this series. Please consider
> adding it in the next version. It's needed when repurposing a PASID.

Hi,

Are you talking about wait descriptors with SW = 0, IF = 0, FN = 1
(section 7.10 of VT-d)?

I'll move that to the PRI series.

>
>> https://github.com/BullSequana/Qemu-in-guest-SVM-demo
>>
>>
>> Clément Mathieu--Drif (8):
>>pcie: add a helper to declare the PRI capability for a pcie device
>>pcie: helper functions to check to check if PRI is enabled
>>pcie: add a way to get the outstanding page request allocation (pri)
>>  from the config space.
>>pci: declare structures and IOMMU operation for PRI
>>pci: add a PCI-level API for PRI
>>intel_iommu: declare PRI constants and structures
>>intel_iommu: declare registers for PRI
>>intel_iommu: add PRI operations support
>>
>>   hw/i386/intel_iommu.c  | 302 +
>>   hw/i386/intel_iommu_internal.h |  54 +-
>>   hw/pci/pci.c   |  37 
>>   hw/pci/pcie.c  |  42 +
>>   include/exec/memory.h  |  65 +++
>>   include/hw/pci/pci.h   |  45 +
>>   include/hw/pci/pci_bus.h   |   1 +
>>   include/hw/pci/pcie.h  |   7 +-
>>   include/hw/pci/pcie_regs.h |   4 +
>>   system/memory.c|  49 ++
>>   10 files changed, 604 insertions(+), 2 deletions(-)
>>
>
> --
> Regards,
> Yi Liu

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson

On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > On Sat, 29 Jun 2024 at 04:17, David Gibson 
> > > >  wrote:
> > > > >
> > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki 
> > > > > >  wrote:
> > > > > > >
> > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > >
> > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > ---
> > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > --- a/hw/ppc/vof.c
> > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void 
> > > > > > > *fdt, GArray *claimed, uint64_t base)
> > > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + 
> > > > > > > sc));
> > > > > > >  if (sc == 2) {
> > > > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > > > sizeof(uint32_t) * ac));
> > > > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > >  } else {
> > > > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > > > sizeof(uint32_t) * ac));
> > > > > > >  }
> > > > > >
> > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > provide one.
> > > > >
> > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add 
> > > > > that?
> > > >
> > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > part, which we already have QEMU utility functions for (and which
> > > > this patch uses).
> > > >
> > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > that is an array of (address, size) tuples where address and size
> > > > can independently be either 32 or 64 bits, and it wants the
> > > > size value of tuple 0. So the missing functionality is something at
> > > > a higher level than fdt32_ld() which would let you say "give me
> > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > is an awkward kind of API to write in C.)
> > > >
> > > > Slightly less general, but for this case we could perhaps have
> > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > >
> > > >   uint64_t value_array[2];
> > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > >ac, sc);
> > > >   /*
> > > >* fills in value_array[0] with address, value_array[1] with size,
> > > >* probably barfs if the varargs-list of cell-sizes doesn't
> > > >* cover the whole property, similar to the current assert on
> > > >* proplen.
> > > >*/
> > > >   mem0_end = value_array[1];
> > > 
> > > Since 4/8 byte cells are most common and size is probably
> > > normally known, what about something simpler to start with?
> >
> > Hrm, I don't think this helps much.  As Peter points out the actual
> > load isn't really the issue, it's locating the right spot for it.
> 
> I don't really see why that's a problem, it's just a pointer
> addition - base + fdt_address_cells * 4. The problem was in

This is harder if #address-cells and #size-cells are different, or if
you're parsing ranges and #address-cells is different between parent
and child node.

> the memory access (yes it's fixed with the patch but you could
> add a general libfdt way to do it).

Huh.. well I'm getting different impressions of what the problem
actually is from what I initially read versus Peter Maydell's
comments, so I don't really know what to think.

If it's just the load then fdt32_ld() etc. already exist.  Or is it
really such a hot path that unconditionally handling unaligned
accesses isn't tenable?
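
The "give me tuple N field X" lookup discussed in this thread could look roughly like the sketch below. It is a standalone illustration, not libfdt or QEMU API: reg_get_tuple() and read_be_cells() are hypothetical names, and the property is assumed to be a flat array of (address, size) tuples whose widths are given by 'ac' and 'sc' cells.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read 'ncells' big-endian 32-bit cells (1 or 2) one byte at a time, so
 * the source pointer needs no particular alignment.
 */
static uint64_t read_be_cells(const uint8_t *p, unsigned ncells)
{
    uint64_t v = 0;

    for (unsigned i = 0; i < ncells * 4; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Hypothetical helper: fetch address and size of tuple 'n' from a
 * "reg"-style property. Returns 0 on success, -1 if a field is wider
 * than 64 bits or the tuple lies outside the property.
 */
static int reg_get_tuple(const void *prop, int proplen, unsigned n,
                         unsigned ac, unsigned sc,
                         uint64_t *addr, uint64_t *size)
{
    const uint8_t *p = prop;
    size_t tuple_len;

    if (ac < 1 || ac > 2 || sc < 1 || sc > 2 || proplen < 0) {
        return -1;
    }
    tuple_len = (size_t)(ac + sc) * 4;
    if (n >= (size_t)proplen / tuple_len) {
        return -1;
    }
    p += n * tuple_len;
    *addr = read_be_cells(p, ac);
    *size = read_be_cells(p + ac * 4, sc);
    return 0;
}
```

The sketch does not cover ranges parsing, where #address-cells differs between parent and child node; that case would need a separate cell count per field.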

> Some fancy function like above could be used, but is it really
> worth implementing such a thing for this?
> 
> Thanks,
> Nick
> 

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[PATCH v3 2/3] intel_iommu: fix type of the mask field in VTDIOTLBPageInvInfo

2024-07-04 Thread CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif 

VTDIOTLBPageInvInfo.mask might not fit in an uint8_t.
Moreover, this field is used in binary operations with 64-bit addresses.

Signed-off-by: Clément Mathieu--Drif 
---
 hw/i386/intel_iommu_internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index cbc4030031..5fcbe2744f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -436,7 +436,7 @@ struct VTDIOTLBPageInvInfo {
 uint16_t domain_id;
 uint32_t pasid;
 uint64_t addr;
-uint8_t mask;
+uint64_t mask;
 };
 typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
 
-- 
2.45.2
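
To illustrate the truncation this patch addresses, here is a minimal standalone sketch. It assumes the mask is built as the complement of (1 << am) - 1 and then ANDed with a 64-bit frame number; that usage pattern and the function names are assumptions for illustration, not code quoted from the patch.

```c
#include <stdint.h>

/* Narrow mask: only the low 8 bits of the complement survive. */
static uint64_t gfn_masked_u8(uint64_t gfn, unsigned am)
{
    uint8_t mask = (uint8_t)~((1ULL << am) - 1);

    /* The zero-extended mask also clears every gfn bit from bit 8 up. */
    return gfn & mask;
}

/* Wide mask: only the low 'am' bits of gfn are cleared. */
static uint64_t gfn_masked_u64(uint64_t gfn, unsigned am)
{
    uint64_t mask = ~((1ULL << am) - 1);

    return gfn & mask;
}
```

With gfn = 0x12345 and am = 2, the narrow version yields 0x44 while the wide version yields 0x12344, which is why the field type matters once it takes part in 64-bit operations.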

[PATCH v3 0/3] VT-d minor fixes

2024-07-04 Thread CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif 

Various fixes for VT-d

This series contains fixes that will be necessary
when adding in-guest (fully emulated) SVM support.

v3
FRCD construction macro :
- Longer sha1 for the 'Fixes' tag
- Add '.' at the end of the sentence

Make types match :
- Split into 2 patches (one for the fix and one for type matching)

Remove patch for wait descriptor handling (will be in the PRI series)

v2
Make commit author consistent



Clément Mathieu--Drif (3):
  intel_iommu: fix FRCD construction macro.
  intel_iommu: fix type of the mask field in VTDIOTLBPageInvInfo
  intel_iommu: make types match

 hw/i386/intel_iommu.c  | 2 +-
 hw/i386/intel_iommu_internal.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.45.2

[PATCH v3 1/3] intel_iommu: fix FRCD construction macro.

2024-07-04 Thread CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif 

The constant must be unsigned, otherwise the two's complement
overrides the other fields when a PASID is present.

Fixes: 1b2b12376c8a ("intel-iommu: PASID support")

Signed-off-by: Clément Mathieu--Drif 
Reviewed-by: Yi Liu 
---
 hw/i386/intel_iommu_internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..cbc4030031 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -267,7 +267,7 @@
 /* For the low 64-bit of 128-bit */
 #define VTD_FRCD_FI(val)((val) & ~0xfffULL)
 #define VTD_FRCD_PV(val)(((val) & 0xULL) << 40)
-#define VTD_FRCD_PP(val)(((val) & 0x1) << 31)
+#define VTD_FRCD_PP(val)(((val) & 0x1ULL) << 31)
 #define VTD_FRCD_IR_IDX(val)(((val) & 0xULL) << 48)
 
 /* DMA Remapping Fault Conditions */
-- 
2.45.2
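
The effect described in the commit message can be reproduced in isolation. The sketch below is illustrative only: frcd_hi_old() and frcd_hi_new() are hypothetical names, and the old macro's result is modeled with INT32_MIN because shifting 1 into the sign bit of an int is formally undefined, even though common two's complement targets produce that value.

```c
#include <stdint.h>

/* Fixed form from the patch: the constant is a 64-bit unsigned value. */
#define VTD_FRCD_PP(val) (((val) & 0x1ULL) << 31)

/* Models the old macro's result for val = 1 without invoking undefined behavior. */
static uint64_t frcd_hi_old(uint64_t other_fields)
{
    int32_t pp = INT32_MIN;

    /* Converting the negative value to uint64_t sets bits 63..32 to 1. */
    return other_fields | (uint64_t)pp;
}

/* Same composition with the fixed macro: only bit 31 is set. */
static uint64_t frcd_hi_new(uint64_t other_fields)
{
    return other_fields | VTD_FRCD_PP(1);
}
```

In the old form, whatever was placed in the upper 32 bits is overwritten with ones; in the fixed form those fields are left alone.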

[PATCH v3 3/3] intel_iommu: make types match

2024-07-04 Thread CLEMENT MATHIEU--DRIF

From: Clément Mathieu--Drif 

The 'level' field in vtd_iotlb_key is an unsigned integer.
We don't need to store level as an int in vtd_lookup_iotlb.

Signed-off-by: Clément Mathieu--Drif 
---
 hw/i386/intel_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 37c21a0aec..be0cb39b5c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -358,7 +358,7 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, 
uint16_t source_id,
 {
 struct vtd_iotlb_key key;
 VTDIOTLBEntry *entry;
-int level;
+unsigned level;
 
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
 key.gfn = vtd_get_iotlb_gfn(addr, level);
-- 
2.45.2

Re: [PATCH v7] virtio-net: Fix network stall at the host side waiting for kick

2024-07-04 Thread Michael S. Tsirkin

On Fri, Jul 05, 2024 at 02:21:35AM +, Yang Dongshan wrote:
> > virtqueue_get_avail_bytes would always return the opaque.
> Then the condition will change from:
> 
> static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
> {
>   if (virtio_queue_empty(q->rx_vq) ||
>   (n->mergeable_rx_bufs &&
>!virtqueue_avail_bytes(q->rx_vq, bufsize, 0)))
> ...
> }
> 
> to:
> 
> static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
> {
>     if (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
>         shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_total, NULL,
>                                                bufsize, 0);
>         if (bufsize <= in_total) {
>             // bufsize is okay
>         } else {
>             // do notification and recheck available buffers.
>             if (virtio_queue_set_notification_and_check(q->rx_vq,
>                                                         shadow_idx)) {
>                 ...
>             } else {
>                 ...
>             }
>         }
>     }
>     ...
> }
> 
> When the queue is empty, we would always call virtqueue_get_avail_bytes()
> just to get the opaque value, which seems bad for performance.
> Why not get the shadow idx within virtio_queue_set_notification_and_check()
> directly and rename the function to
> virtio_queue_set_notification_and_check_shadow()?

Keep shadow an internal implementation detail otherwise
it's unmaintainable.

We already call virtqueue_get_avail_bytes.
The situation with the stall is super unlikely, it is not
worth worrying about the performance in that case.
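
The opaque-token flow suggested in this thread can be modeled with a toy queue. Everything below is a hypothetical stand-in (ToyQueue and the toy_* functions are not QEMU's VirtQueue API); it only shows how the shadow index can stay internal while callers pass an opaque value around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a virtqueue; not QEMU's VirtQueue. */
typedef struct ToyQueue {
    uint16_t avail_idx;        /* index published by the guest */
    uint16_t shadow_avail_idx; /* device-side cached copy */
    bool notification;         /* guest kicks enabled? */
} ToyQueue;

/*
 * Sync the shadow index and hand it back as an opaque token.
 * A negative return would signal an error; this toy never fails.
 */
static int toy_get_avail_bytes(ToyQueue *q)
{
    q->shadow_avail_idx = q->avail_idx;
    return q->shadow_avail_idx;
}

/* True if the guest added buffers after 'opaque' was returned. */
static bool toy_poll(const ToyQueue *q, int opaque)
{
    return q->avail_idx != (uint16_t)opaque;
}

/* Enable notifications, then recheck against the token to close the race. */
static bool toy_set_notification_and_check(ToyQueue *q, int opaque)
{
    q->notification = true;
    return toy_poll(q, opaque);
}
```

The caller never interprets the token; it only forwards it, so the meaning of the value can change without touching the net device code.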




> On 2024/7/4, 19:24, "Michael S. Tsirkin" wrote:
> 
> 
> On Thu, Jul 04, 2024 at 10:20:15AM +, Yang Dongshan wrote:
> > Hi, Michael
> > 
> > > My suggestion:
> > > 
> > > 
> > > change virtqueue_get_avail_bytes to return the shadow
> > > in an opaque unsigned value.
> > > 
> > > 
> > > add virtqueue_poll that gets this opaque and tells us whether any new
> > > buffers became available in the queue since that value
> > > was returned.
> > 
> > 
> > > accordingly, virtio_queue_set_notification_and_check
> > > will accept this opaque value and check avail buffers
> > > against it.
> > 
> > According to your suggestion, it's able to handle the case where the
> > queue is not empty, when the queue is empty, should I add an API to
> > get the shadow idx as virtio_queue_set_notification_and_check()
> > needs the opaque arg.
> 
> 
> virtqueue_get_avail_bytes would always return the opaque.
> 
> 
> > What value should return from virtqueue_get_avail_bytes() in case of
> > error branch in the function? 
> 
> 
> One way would be to make opaque int, return a negative value on error,
> positive on success.
> 
> 
> 
> 
> > On 2024/7/2, 19:27, "Michael S. Tsirkin" wrote:
> > 
> > 
> > On Tue, Jul 02, 2024 at 07:45:31AM +0800, Yang Dongshan wrote:
> > > > what does "changed" mean here? changed compared to what?
> > > For a split queue, if the shadow_avail_idx synced from avail ring idx
> > > by vring_avail_idx(vq) last time doesn't equal the current value of avail 
> > > ring
> > > idx.
> > > 
> > > vq->shadow_avail_idx != vring_avail_idx(vq);
> > > 
> > > For packed queue, the logic is similar, if vq->shadow_avail_idx
> > > 
> > > becomes available, it means the guest has added buf at the slot.
> > > 
> > > vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
> > > 
> > > vq->shadow_avail_idx, true);
> > > 
> > > if (is_desc_avail(desc.flags, vq-> shadow_avail_wrap_counter))
> > > 
> > > return true;
> > > 
> > > 
> > 
> > 
> > This answer does not make sense from API POV.
> > 
> > 
> > My suggestion:
> > 
> > 
> > change virtqueue_get_avail_bytes to return the shadow
> > in an opaque unsigned value.
> > 
> > 
> > add virtqueue_poll that gets this opaque and tells us whether any new
> > buffers became available in the queue since that value
> > was returned.
> > 
> > 
> > accordingly, virtio_queue_set_notification_and_check
> > will accept this opaque value and check avail buffers
> > against it.
> > 
> > 
> > 
> > 
> > 
> > 
> > > On Tue, Jul 2, 2024 at 2:46 AM Michael S. Tsirkin wrote:
> > > 
> > > On Tue, Jul 02, 2024 at 01:18:15AM +0800, Yang Dongshan wrote:
> > > > > Please document what this does.
> > > > okay, i will.
> > > >
> > > > > So this will return false if ring has any available buffers?
> > > > > Equivalent to:
> > > > > 
> > > > > bool virtio_queue_set_notification_and_check(VirtQueue *vq, int 
> > > > > enable)
> > > > > {
> > > > > virtio_queue_packed_set_notification(vq, enable);
> > > > > return virtio_queue_empty(vq);
> > > > > }
> > > >
> > > > No, only when the shadow_avail_idx is changed shall the function return
> > > true,
> > > 
> > > 
> > > what does "changed" mean here? changed compared to what?
> > > 
> > > > compared with the value see

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread Nicholas Piggin

On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > On Sat, 29 Jun 2024 at 04:17, David Gibson  
> > > wrote:
> > > >
> > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki 
> > > > >  wrote:
> > > > > >
> > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > >
> > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > ---
> > > > > >  hw/ppc/vof.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > --- a/hw/ppc/vof.c
> > > > > > +++ b/hw/ppc/vof.c
> > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > > > GArray *claimed, uint64_t base)
> > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > >  if (sc == 2) {
> > > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > > sizeof(uint32_t) * ac));
> > > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > >  } else {
> > > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > > sizeof(uint32_t) * ac));
> > > > > >  }
> > > > >
> > > > > I did wonder if there was a better way to do what this is doing,
> > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > provide one.
> > > >
> > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > not an automatic aligned-or-unaligned helper.   Maybe we should add 
> > > > that?
> > >
> > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > part, which we already have QEMU utility functions for (and which
> > > this patch uses).
> > >
> > > This particular bit of code is dealing with an fdt property ("memory")
> > > that is an array of (address, size) tuples where address and size
> > > can independently be either 32 or 64 bits, and it wants the
> > > size value of tuple 0. So the missing functionality is something at
> > > a higher level than fdt32_ld() which would let you say "give me
> > > tuple N field X" with some way to specify the tuple layout. (Which
> > > is an awkward kind of API to write in C.)
> > >
> > > Slightly less general, but for this case we could perhaps have
> > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > >
> > >   uint64_t value_array[2];
> > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > >ac, sc);
> > >   /*
> > >* fills in value_array[0] with address, value_array[1] with size,
> > >* probably barfs if the varargs-list of cell-sizes doesn't
> > >* cover the whole property, similar to the current assert on
> > >* proplen.
> > >*/
> > >   mem0_end = value_array[1];
> > 
> > Since 4/8 byte cells are most common and size is probably
> > normally known, what about something simpler to start with?
>
> Hrm, I don't think this helps much.  As Peter points out the actual
> load isn't really the issue, it's locating the right spot for it.

I don't really see why that's a problem, it's just a pointer
addition - base + fdt_address_cells * 4. The problem was in
the memory access (yes it's fixed with the patch but you could
add a general libfdt way to do it).
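
For the memory access itself, the issue is that casting a 4-byte-aligned pointer to uint64_t * and dereferencing it is not portable. A minimal standalone sketch of an alignment-safe load follows; load_be64_unaligned() is a hypothetical name that mirrors the idea behind ldq_be_p() and is not the QEMU implementation.

```c
#include <stdint.h>
#include <string.h>

/* Load a big-endian 64-bit value from a pointer with any alignment. */
static uint64_t load_be64_unaligned(const void *p)
{
    uint8_t b[8];
    uint64_t v = 0;

    /* memcpy has no alignment requirement on its source. */
    memcpy(b, p, sizeof(b));
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}
```

Assembling the value byte by byte also removes the need for a separate endianness conversion on the host.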

Some fancy function like above could be used, but is it really
worth implementing such a thing for this?

Thanks,
Nick

Re: [PATCH 0/8] aspeed: Add boot from eMMC support (AST2600)

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> Hello,
> 
> This series enables boot from eMMC on the rainier-bmc machine, which
> is the default behavior and also on the AST2600 EVB using a machine
> option to change the default.
> 
> It depends solely on the availability of an eMMC device model which is
> currently being discussed upstream. Numerous patches have been merged
> already and remaining ones [1] should be in 9.1 development cycle.
> Since the changes of this series are small and localized, it would be
> an interesting extension of the AST2600 SoC model for QEMU 9.1.
> 
> First 6 patches adjust the machine setup and HW strapping to boot from
> eMMC, the last 2 are for the AST2600 EVB and are optional.
> 
> Thanks,
> 
> C.
> 
> [1] https://github.com/legoater/qemu/commits/aspeed-9.1

I built the series using the tree you linked above. It works well!

In that context:

Tested-by: Andrew Jeffery

Re: [PATCH 8/8] aspeed: Introduce a 'boot-emmc' machine option

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> The default behavior of some Aspeed machines is to boot from the eMMC
> device, like the rainier-bmc. Others like ast2600-evb could also boot
> from eMMC if the HW strapping boot-from-eMMC bit was set. Add a
> property to set or unset this bit. This is useful to test boot images.
> 
> For now, only activate this property on the ast2600-evb and rainier-bmc
> machines for which eMMC images are available or can be built.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH 7/8] aspeed: Introduce a 'hw_strap1' machine attribute

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> To change default behavior of a machine and boot from eMMC, future
> changes will add a machine option to let the user configure the
> boot-from-eMMC HW strapping bit. Add a new machine attribute first.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH 6/8] aspeed: Add boot-from-eMMC HW strapping bit to rainier-bmc machine

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> This value is taken from a running Rainier machine.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH 5/8] aspeed: Set eMMC 'boot-config' property to reflect HW strapping

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> When the boot-from-eMMC HW strapping bit is set, use the 'boot-config'
> property to set the boot config register to boot from the first boot
> area partition of the eMMC device.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/arm/aspeed.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 756deb91efd1..135f4eb72215 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -327,7 +327,8 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const 
> char *flashtype,
>  }
>  }
>  
> -static void sdhci_attach_drive(SDHCIState *sdhci, DriveInfo *dinfo, bool 
> emmc)
> +static void sdhci_attach_drive(SDHCIState *sdhci, DriveInfo *dinfo, bool 
> emmc,
> +   bool boot_emmc)
>  {
>  DeviceState *card;
>  
> @@ -335,6 +336,9 @@ static void sdhci_attach_drive(SDHCIState *sdhci, 
> DriveInfo *dinfo, bool emmc)
>  return;
>  }
>  card = qdev_new(emmc ? TYPE_EMMC : TYPE_SD_CARD);
> +if (emmc) {
> +qdev_prop_set_uint8(card, "boot-config", boot_emmc ? 0x48 : 0x0);

0x48 feels a little bit magic. I poked around a bit and there are some
boot-config macros, but not the ones you need and they're all in an
"internal" header anyway. I guess this is fine for now?

Reviewed-by: Andrew Jeffery

Re: [PATCH 4/8] aspeed: Introduce a AspeedSoCClass 'boot_from_emmc' handler

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> Report support on the AST2600 SoC if the boot-from-eMMC HW strapping
> bit is set at the board level. AST2700 also has support but it is not
> yet ready in QEMU and other SoCs do not have support, so return false
> always for these.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH 3/8] aspeed/scu: Add boot-from-eMMC HW strapping bit for AST2600 SoC

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> Bit SCU500[2] of the AST2600 controls the boot device of the SoC.
> 
> Future changes will configure this bit to boot from eMMC disk images
> specially built for this purpose.
> 
> Signed-off-by: Joel Stanley 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/misc/aspeed_scu.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/hw/misc/aspeed_scu.h b/include/hw/misc/aspeed_scu.h
> index 58db28db45aa..c9f98c20ffd9 100644
> --- a/include/hw/misc/aspeed_scu.h
> +++ b/include/hw/misc/aspeed_scu.h
> @@ -349,6 +349,10 @@ uint32_t aspeed_scu_get_apb_freq(AspeedSCUState *s);
>  #define SCU_AST2600_H_PLL_BYPASS_EN(0x1 << 24)
>  #define SCU_AST2600_H_PLL_OFF  (0x1 << 23)
>  
> +/* STRAP1 SCU500 */
> +#define AST2600_HW_STRAP_BOOT_SRC_EMMC(0x1 << 2)
> +#define AST2600_HW_STRAP_BOOT_SRC_SPI (0x0 << 2)

Maybe these should have a `SCU_` prefix for consistency?

Anyway:

Reviewed-by: Andrew Jeffery

[PING][PATCH] hw/intc/ioapic: Delete a wrong IRQ redirection on I/O APIC

2024-07-04 Thread 伊藤太清

This is a ping to the patch below.
https://lore.kernel.org/qemu-devel/ty0pr0101mb42850337f8917d1f514107fba4...@ty0pr0101mb4285.apcprd01.prod.exchangelabs.com/


From: TaiseiIto
Sent: June 25, 2024 21:03
To: qemu-devel@nongnu.org
CC: pbonz...@redhat.com ; m...@redhat.com ; TaiseiIto
Subject: [PATCH] hw/intc/ioapic: Delete a wrong IRQ redirection on I/O APIC

Before this commit, interruptions from i8254 which should be sent to IRQ0
were sent to IRQ2. After this commit, these are correctly sent to IRQ0. When
I had an HPET timer generate interruptions once per second to test an HPET
driver in my operating system on QEMU, I observed more frequent
interruptions than I configured on the HPET timer. I investigated the cause
and found that not only interruptions from HPET but also interruptions from
i8254 were sent to IRQ2 because of a redirection from IRQ0 to IRQ2. This
redirection is added in hw/apic.c at commit
16b29ae1807b024bd5052301550f5d47dae958a2 but this redirection caused wrong
interruptions. So I deleted the redirection. Finally, I confirmed there is
no problem on 'make check' results and that interruptions from i8254 and
interruptions from HPET are correctly sent to IRQ0 and IRQ2 respectively.

Signed-off-by: TaiseiIto 
---
 hw/intc/ioapic.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 716ffc8bbb..6b630b45ca 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -154,15 +154,8 @@ static void ioapic_set_irq(void *opaque, int vector, int 
level)
 {
 IOAPICCommonState *s = opaque;

-/* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
- * to GSI 2.  GSI maps to ioapic 1-1.  This is not
- * the cleanest way of doing it but it should work. */
-
 trace_ioapic_set_irq(vector, level);
 ioapic_stat_update_irq(s, vector, level);
-if (vector == 0) {
-vector = 2;
-}
 if (vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
 uint64_t entry = s->ioredtbl[vector];
--
2.34.1

Re: [PATCH 2/8] aspeed: Load eMMC first boot area as a boot rom

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> The first boot area partition (64K) of the eMMC device should contain
> an initial boot loader (u-boot SPL). Load it as a ROM only if an eMMC
> device is available to boot from but no flash device is.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH 1/8] aspeed: Change type of eMMC device

2024-07-04 Thread Andrew Jeffery

On Thu, 2024-07-04 at 07:36 +0200, Cédric Le Goater wrote:
> From: Cédric Le Goater 
> 
> The QEMU device representing the eMMC device of the machine is currently
> created with type SD_CARD. Change the type to EMMC now that it is
> available.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Andrew Jeffery

Re: [PATCH v2 1/3] intel_iommu: fix FRCD construction macro.

2024-07-04 Thread Yi Liu


On 2024/7/4 23:12, CLEMENT MATHIEU--DRIF wrote:

From: Clément Mathieu--Drif 

The constant must be unsigned, otherwise the two's complement
overrides the other fields when a PASID is present


I'm not a native speaker, but it would be better to have a "." at the end
of the sentence. :)



Fixes: 1b2b12376c ("intel-iommu: PASID support")


you need more digits per the result of "grep Fixes 
docs/devel/submitting-a-patch.rst".


docs/devel/submitting-a-patch.rst:add an additional line with "Fixes: 




Signed-off-by: Clément Mathieu--Drif 
Reviewed-by: Yi Liu 
---
  hw/i386/intel_iommu_internal.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..cbc4030031 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -267,7 +267,7 @@
  /* For the low 64-bit of 128-bit */
  #define VTD_FRCD_FI(val)((val) & ~0xfffULL)
  #define VTD_FRCD_PV(val)(((val) & 0xULL) << 40)
-#define VTD_FRCD_PP(val)(((val) & 0x1) << 31)
+#define VTD_FRCD_PP(val)(((val) & 0x1ULL) << 31)
  #define VTD_FRCD_IR_IDX(val)(((val) & 0xULL) << 48)
  
  /* DMA Remapping Fault Conditions */
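
To illustrate the commit message: a minimal sketch, assuming val has type int in the unpatched macro, so that the shifted result is a signed 32-bit quantity with only bit 31 set. Widening such a value to 64 bits sign-extends it and sets every bit above bit 31, which is where the PASID-related fields live. The function names are hypothetical; the sketch starts from INT32_MIN instead of evaluating the signed shift itself.

```c
#include <assert.h>
#include <stdint.h>

/* What a signed 32-bit value with only bit 31 set becomes when widened. */
static uint64_t toy_widen_signed_bit31(void)
{
    int32_t pp = INT32_MIN;   /* bit pattern 0x80000000 */
    return (uint64_t)pp;      /* sign-extended to 0xffffffff80000000 */
}

/* Same shape as the fixed macro: the constant is unsigned 64-bit before the shift. */
static uint64_t toy_frcd_pp(uint64_t val)
{
    return (val & 0x1ULL) << 31;
}
```

With the unsigned constant only bit 31 is set, so OR-ing the result into the fault record leaves the neighbouring fields intact.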


--
Regards,
Yi Liu

Re: [PATCH v1 0/8] PRI support for VT-d

2024-07-04 Thread Yi Liu


On 2024/5/30 20:24, CLEMENT MATHIEU--DRIF wrote:

This series belongs to a list of series that add SVM support for VT-d.

Here we focus on the implementation of PRI support in the IOMMU and on a 
PCI-level
API for PRI to be used by virtual devices.

This work is based on the VT-d specification version 4.1 (March 2023).
Here is a link to a GitHub repository where you can find the following elements 
:
 - Qemu with all the patches for SVM
 - ATS
 - PRI
 - Device IOTLB invalidations
 - Requests with already translated addresses
 - A demo device
 - A simple driver for the demo device
 - A userspace program (for testing and demonstration purposes)


I didn't see any PRQ drain logic in this series. Please consider
adding it in the next version. It's needed when repurposing a PASID.


https://github.com/BullSequana/Qemu-in-guest-SVM-demo

Clément Mathieu--Drif (8):
   pcie: add a helper to declare the PRI capability for a pcie device
   pcie: helper functions to check to check if PRI is enabled
   pcie: add a way to get the outstanding page request allocation (pri)
 from the config space.
   pci: declare structures and IOMMU operation for PRI
   pci: add a PCI-level API for PRI
   intel_iommu: declare PRI constants and structures
   intel_iommu: declare registers for PRI
   intel_iommu: add PRI operations support

  hw/i386/intel_iommu.c  | 302 +
  hw/i386/intel_iommu_internal.h |  54 +-
  hw/pci/pci.c   |  37 
  hw/pci/pcie.c  |  42 +
  include/exec/memory.h  |  65 +++
  include/hw/pci/pci.h   |  45 +
  include/hw/pci/pci_bus.h   |   1 +
  include/hw/pci/pcie.h  |   7 +-
  include/hw/pci/pcie_regs.h |   4 +
  system/memory.c|  49 ++
  10 files changed, 604 insertions(+), 2 deletions(-)



--
Regards,
Yi Liu

Re: [PATCH v2 3/3] intel_iommu: Bypass barrier wait descriptor

2024-07-04 Thread Yi Liu


On 2024/7/4 23:12, CLEMENT MATHIEU--DRIF wrote:

From: Clément Mathieu--Drif 

wait_desc with SW=0,IF=0,FN=1 must not be considered as an
invalid descriptor as it is used to implement section 7.10 of
the VT-d spec


On second thought, it would be better to move this patch to the
PRI series [1], for the following reason:

This wait descriptor is used to drain the PRQ. However, the guest has no
need to drain the PRQ until the PRI series advertises the PRI capability
to it, so QEMU won't receive such a wait descriptor before that series.

[1] 
https://lore.kernel.org/qemu-devel/713ece39-bc1e-4189-a1d9-f81f9cdbd...@eviden.com/
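
For readers following along, the descriptor handling under discussion can be sketched roughly as below. This is only an illustration: the flag names and bit positions are hypothetical and are not copied from intel_iommu_internal.h; only the order of the checks follows the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_WAIT_IF (1ULL << 4)  /* interrupt flag */
#define TOY_WAIT_SW (1ULL << 5)  /* status write */
#define TOY_WAIT_FN (1ULL << 6)  /* fence */

typedef enum {
    TOY_WAIT_STATUS_WRITE,
    TOY_WAIT_INTERRUPT,
    TOY_WAIT_FENCE_ONLY,  /* SW = 0, IF = 0, FN = 1: valid, nothing to do */
    TOY_WAIT_INVALID,
} ToyWaitKind;

/* Classify a wait descriptor by its low quadword, checking SW, then IF, then FN. */
static ToyWaitKind toy_classify_wait_desc(uint64_t lo)
{
    if (lo & TOY_WAIT_SW) {
        return TOY_WAIT_STATUS_WRITE;
    } else if (lo & TOY_WAIT_IF) {
        return TOY_WAIT_INTERRUPT;
    } else if (lo & TOY_WAIT_FN) {
        return TOY_WAIT_FENCE_ONLY;
    }
    return TOY_WAIT_INVALID;
}
```

The fence-only case needs no action in an emulator that processes descriptors strictly in order, which is the reasoning given in the patch comment.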



Signed-off-by: Clément Mathieu--Drif 
---
  hw/i386/intel_iommu.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index be0cb39b5c..12ea3a9aa0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2561,6 +2561,12 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, 
VTDInvDesc *inv_desc)
  } else if (inv_desc->lo & VTD_INV_DESC_WAIT_IF) {
  /* Interrupt flag */
  vtd_generate_completion_event(s);
+} else if (inv_desc->lo & VTD_INV_DESC_WAIT_FN) {
+/*
+ * SW = 0, IF = 0, FN = 1
+ * This kind of descriptor is defined in section 7.10 of VT-d
+ * Nothing to do as we process the events sequentially
+ */
  } else {
  error_report_once("%s: invalid wait desc: hi=%"PRIx64", lo=%"PRIx64
" (unknown type)", __func__, inv_desc->hi,


--
Regards,
Yi Liu

Re: [PATCH v2 2/3] intel_iommu: make types match

2024-07-04 Thread Yi Liu


On 2024/7/5 06:13, Michael S. Tsirkin wrote:

On Thu, Jul 04, 2024 at 03:12:48PM +, CLEMENT MATHIEU--DRIF wrote:

From: Clément Mathieu--Drif 

The 'level' field in vtd_iotlb_key is an unsigned integer.
We don't need to store level as an int in vtd_lookup_iotlb.

VTDIOTLBPageInvInfo.mask is used in binary operations with addresses.


this last sentence is a bit opaque. is there a bug ? E.g.
can mask ever get so big it does not fit in u8?


Yes, this looks like a bug. The mask is initialized and used by the code
below, and am is a u8. So it may make more sense to split this patch: one
patch to make the types match, and another to fix the bug.

info.mask = ~((1 << am) - 1);

uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
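
A small sketch of why the narrow type matters, assuming a 4K page shift of 12. The helper names are hypothetical, and 1ULL is used in the shift (the quoted line uses a plain 1) to keep the sketch well defined for any am below 64.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT_4K 12

/* Mask stored in a uint8_t, as before the patch: all bits above bit 7 are lost. */
static uint64_t toy_gfn_narrow(uint64_t addr, uint8_t am)
{
    uint8_t mask = (uint8_t)~((1ULL << am) - 1);
    return (addr >> TOY_PAGE_SHIFT_4K) & mask;
}

/* Mask stored in a uint64_t, as after the patch. */
static uint64_t toy_gfn_wide(uint64_t addr, uint8_t am)
{
    uint64_t mask = ~((1ULL << am) - 1);
    return (addr >> TOY_PAGE_SHIFT_4K) & mask;
}
```

With addr = 0x12345000 and am = 9, the narrow version yields gfn 0 while the wide version yields 0x12200; even with am = 0 the narrow mask keeps only the low 8 bits of the gfn.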


Signed-off-by: Clément Mathieu--Drif 
---
  hw/i386/intel_iommu.c  | 2 +-
  hw/i386/intel_iommu_internal.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 37c21a0aec..be0cb39b5c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -358,7 +358,7 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, 
uint16_t source_id,
  {
  struct vtd_iotlb_key key;
  VTDIOTLBEntry *entry;
-int level;
+unsigned level;
  
  for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {

  key.gfn = vtd_get_iotlb_gfn(addr, level);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index cbc4030031..5fcbe2744f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -436,7 +436,7 @@ struct VTDIOTLBPageInvInfo {
  uint16_t domain_id;
  uint32_t pasid;
  uint64_t addr;
-uint8_t mask;
+uint64_t mask;
  };
  typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
  
--

2.45.2




--
Regards,
Yi Liu

Re: [PATCH] hw/cxl/cxl-host: Fix guest crash when getting cxl-fmw property

2024-07-04 Thread Zhijian Li (Fujitsu)



On 04/07/2024 17:34, Zhao Liu wrote:
> From: Zhao Liu 
> 
> Guest crashes (Segmentation fault) when getting cxl-fmw property via
> qmp:
> 

IMO, it's not accurate to say "Guest crashes", which generally means a guest
kernel panic etc.
I'd prefer the subject like:
hw/cxl/cxl-host: Fix segmentation fault when getting cxl-fmw property


Otherwise,

Reviewed-by: Li Zhijian 


> (QEMU) qom-get path=machine property=cxl-fmw
> 
> This issue is caused by accessing wrong callback (opaque) type in
> machine_get_cfmw().
> 
> cxl_machine_init() sets the callback as `CXLState *` type but
> machine_get_cfmw() treats the callback as
> `CXLFixedMemoryWindowOptionsList **`.
> 
> Fix this error by casting opaque to `CXLState *` type in
> machine_get_cfmw().
> 
> Fixes: 03b39fcf64bc ("hw/cxl: Make the CXL fixed memory window setup a 
> machine parameter.")
> Signed-off-by: Zhao Liu 
> ---
>   hw/cxl/cxl-host.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index c5f5fcfd64d0..e9f2543c43c6 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -315,7 +315,8 @@ static void machine_set_cxl(Object *obj, Visitor *v, 
> const char *name,
>   static void machine_get_cfmw(Object *obj, Visitor *v, const char *name,
>void *opaque, Error **errp)
>   {
> -CXLFixedMemoryWindowOptionsList **list = opaque;
> +CXLState *state = opaque;
> +CXLFixedMemoryWindowOptionsList **list = &state->cfmw_list;
>   
>   visit_type_CXLFixedMemoryWindowOptionsList(v, name, list, errp);
>   }
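
The bug class fixed above can be shown with a toy example. This is a sketch with hypothetical types, not the QEMU structures: the getter is registered with a pointer to the whole state, so it has to cast opaque back to that type and then take the address of the list member, instead of treating opaque itself as a pointer to the list pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyList {
    int value;
    struct ToyList *next;
} ToyList;

typedef struct ToyState {
    unsigned int next_idx;  /* other members come first ... */
    ToyList *list;          /* ... so &state is not &state->list */
} ToyState;

/* Getter registered with a ToyState * as its opaque pointer. */
static ToyList *toy_get_list(void *opaque)
{
    ToyState *state = opaque;       /* cast back to the registered type */
    ToyList **list = &state->list;  /* then take the member's address */
    return *list;
}
```

Casting opaque directly to ToyList ** would read the first bytes of ToyState as if they were a list pointer, which is the kind of access that leads to a segmentation fault.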

[PATCH v2 1/2] target/loongarch: Set CSR_PRCFG1 and CSR_PRCFG2 values

2024-07-04 Thread Song Gao

We set the value of register CSR_PRCFG3, but left out CSR_PRCFG1
and CSR_PRCFG2. Set CSR_PRCFG1 and CSR_PRCFG2 according to the
default values of the physical machine.

Signed-off-by: Song Gao 
---
v2:
 - Add a new patch to fix the wrong CSR_CRMD value;
 - Set PRCFG1-PRCFG3 values in loongarch_la464_initfn.

 target/loongarch/cpu.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 270f711f11..55d468af3c 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -457,6 +457,18 @@ static void loongarch_la464_initfn(Object *obj)
 env->cpucfg[20] = data;
 
 env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
+
+env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, SAVE_NUM, 8);
+env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, TIMER_BITS, 
0x2f);
+env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, VSMAX, 7);
+
+env->CSR_PRCFG2 = 0x3000;
+
+env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, TLB_TYPE, 2);
+env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, MTLB_ENTRY, 63);
+env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, STLB_WAYS, 7);
+env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, STLB_SETS, 8);
+
 loongarch_cpu_post_init(obj);
 }
 
@@ -538,11 +550,6 @@ static void loongarch_cpu_reset_hold(Object *obj, 
ResetType type)
 env->CSR_MERRCTL = FIELD_DP64(env->CSR_MERRCTL, CSR_MERRCTL, ISMERR, 0);
 env->CSR_TID = cs->cpu_index;
 
-env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, TLB_TYPE, 2);
-env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, MTLB_ENTRY, 63);
-env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, STLB_WAYS, 7);
-env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, STLB_SETS, 8);
-
 for (n = 0; n < 4; n++) {
 env->CSR_DMW[n] = FIELD_DP64(env->CSR_DMW[n], CSR_DMW, PLV0, 0);
 env->CSR_DMW[n] = FIELD_DP64(env->CSR_DMW[n], CSR_DMW, PLV1, 0);
-- 
2.33.0
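
For readers unfamiliar with the field-deposit pattern used in this patch, here is a minimal sketch of how three field writes compose into one register value. The helper is a hypothetical stand-in for FIELD_DP64, and the field positions (SAVE_NUM at bits 3:0, TIMER_BITS at bits 11:4, VSMAX at bits 14:12) are assumptions for illustration; the authoritative layout is in the LoongArch manual.

```c
#include <assert.h>
#include <stdint.h>

/* Deposit fieldval into bits [start, start + length) of value; length is 1..64. */
static uint64_t toy_deposit64(uint64_t value, int start, int length,
                              uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Field positions below are assumed for illustration. */
static uint64_t toy_prcfg1_default(void)
{
    uint64_t v = 0;
    v = toy_deposit64(v, 0, 4, 8);     /* SAVE_NUM */
    v = toy_deposit64(v, 4, 8, 0x2f);  /* TIMER_BITS */
    v = toy_deposit64(v, 12, 3, 7);    /* VSMAX */
    return v;
}
```

Under these assumed positions the three deposits combine to 0x72f8.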

[PATCH v2 2/2] target/loongarch: Fix cpu_reset set wrong CSR_CRMD

2024-07-04 Thread Song Gao

After cpu_reset, DATF in CSR_CRMD is 0, DATM is 0.
See the manual[1] 6.4.

  [1]: 
https://github.com/loongson/LoongArch-Documentation/releases/download/2023.04.20/LoongArch-Vol1-v1.10-EN.pdf

Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 55d468af3c..763cde41c3 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -523,13 +523,13 @@ static void loongarch_cpu_reset_hold(Object *obj, 
ResetType type)
 env->fcsr0 = 0x0;
 
 int n;
-/* Set csr registers value after reset */
+/* Set csr registers value after reset, see the manual 6.4. */
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, PLV, 0);
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, IE, 0);
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DA, 1);
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, PG, 0);
-env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DATF, 1);
-env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DATM, 1);
+env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DATF, 0);
+env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DATM, 0);
 
 env->CSR_EUEN = FIELD_DP64(env->CSR_EUEN, CSR_EUEN, FPE, 0);
 env->CSR_EUEN = FIELD_DP64(env->CSR_EUEN, CSR_EUEN, SXE, 0);
-- 
2.33.0

Re: [PATCH 1/3] hw/cxl: Get rid of unused cfmw_list

2024-07-04 Thread Zhijian Li (Fujitsu)



On 05/07/2024 10:15, Zhao Liu wrote:
>> There is a new user for cfmw_list now
>> https://lore.kernel.org/qemu-devel/20240704093404.1848132-1-zhao1@linux.intel.com/
>>
>> So I think we should drop this patch.

> Hi Zhijian,
> 
> I'm not a "real" user and that bug was originally found by code reading.
> 
> So that fix won't block your drop. 🙂


"hw/cxl: Get rid of unused cfmw_list" is no longer needed.



> 
> And I think the fix is worth landing before cfmw_list gets dropped,
> so that it can be backported to stable QEMU.

Your fix[0] requires CXLState.cfmw_list, and I think CXLState.cfmw_list was
designed for the *get* purpose but was implemented incorrectly at that time.

[0] 
https://lore.kernel.org/qemu-devel/20240704093404.1848132-1-zhao1@linux.intel.com/


>   
>> On 02/07/2024 22:34, Jonathan Cameron wrote:
>>> From: Li Zhijian
>>>
>>> There is no user for this member. All '-M cxl-fmw.N' options have
>>> been parsed and saved to CXLState.fixed_windows.
>>>
>>> Signed-off-by: Li Zhijian
>>> Signed-off-by: Jonathan Cameron
>>> ---
>>>include/hw/cxl/cxl.h | 1 -
>>>hw/cxl/cxl-host.c| 1 -
>>>2 files changed, 2 deletions(-)
>>>
>>> diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
>>> index 75e47b6864..e3ecbef038 100644
>>> --- a/include/hw/cxl/cxl.h
>>> +++ b/include/hw/cxl/cxl.h
>>> @@ -43,7 +43,6 @@ typedef struct CXLState {
>>>MemoryRegion host_mr;
>>>unsigned int next_mr_idx;
>>>GList *fixed_windows;
>>> -CXLFixedMemoryWindowOptionsList *cfmw_list;
>>>} CXLState;
>>>

Re: [PATCH v7] virtio-net: Fix network stall at the host side waiting for kick

2024-07-04 Thread Yang Dongshan

> virtqueue_get_avail_bytes would always return the opaque.
Then the condition will change from:

static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
{
    if (virtio_queue_empty(q->rx_vq) ||
        (n->mergeable_rx_bufs &&
         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0)))
    ...
}

to:

static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
{
    if (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
        shadow_idx = virtqueue_get_avail_bytes(q->rx_vq, &in_total, NULL,
                                               bufsize, 0);
        if (bufsize <= in_total) {
            /* bufsize is okay */
        } else {
            /* do notification and recheck available buffers */
            if (virtio_queue_set_notification_and_check(q->rx_vq,
                                                        shadow_idx)) {
                ...
            } else {
                ...
            }
        }
        ...
    }
}

When the queue is empty, always calling virtqueue_get_avail_bytes() just to
get the opaque does not seem good for performance.
Why not get the shadow idx within virtio_queue_set_notification_and_check()
directly and change the function name to
virtio_queue_set_notification_and_check_shadow()?

On 2024/7/4, 19:24, "Michael S. Tsirkin" <m...@redhat.com> wrote:


On Thu, Jul 04, 2024 at 10:20:15AM +, Yang Dongshan wrote:
> Hi, Michael
> 
> > My suggestion:
> > 
> > 
> > change virtqueue_get_avail_bytes to return the shadow
> > in an opaque unsigned value.
> > 
> > 
> > add virtqueue_poll that gets this opaque and tells us whether any new
> > buffers became available in the queue since that value
> > was returned.
> 
> 
> > accordingly, virtio_queue_set_notification_and_check
> > will accept this opaque value and check avail buffers
> > against it.
> 
> According to your suggestion, it's able to handle the case where the
> queue is not empty, when the queue is empty, should I add an API to
> get the shadow idx as virtio_queue_set_notification_and_check()
> needs the opaque arg.


virtqueue_get_avail_bytes would always return the opaque.


> What value should return from virtqueue_get_avail_bytes() in case of
> error branch in the function? 


One way would be to make opaque int, return a negative value on error,
positive on success.
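
The shape of the API suggested here can be sketched as follows. This is a toy model under stated assumptions, not the virtio code: the queue struct and all function names are hypothetical, the opaque is simply the shadow index returned as a non-negative int, and a negative return signals an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyQueue {
    uint16_t avail_idx;         /* advanced by the guest */
    uint16_t shadow_avail_idx;  /* host's last-seen copy */
    bool notification;
    bool broken;
} ToyQueue;

/* Sync the shadow index and return it as the opaque; negative on error. */
static int toy_get_avail(ToyQueue *q)
{
    if (q->broken) {
        return -1;
    }
    q->shadow_avail_idx = q->avail_idx;
    return q->shadow_avail_idx;
}

/* True if the guest added buffers since the opaque was returned. */
static bool toy_poll(const ToyQueue *q, int opaque)
{
    return q->avail_idx != (uint16_t)opaque;
}

/* Enable notification first, then re-check, so no buffer is missed in between. */
static bool toy_set_notification_and_check(ToyQueue *q, int opaque)
{
    q->notification = true;
    return toy_poll(q, opaque);
}
```

The caller keeps the opaque from the size check and passes it back when enabling notifications, so "changed" has a well-defined reference point.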




> On 2024/7/2, 19:27, "Michael S. Tsirkin" wrote:
> 
> 
> On Tue, Jul 02, 2024 at 07:45:31AM +0800, Yang Dongshan wrote:
> > > what does "changed" mean here? changed compared to what?
> > For a split queue, it means the shadow_avail_idx synced from the avail ring
> > idx by vring_avail_idx(vq) last time doesn't equal the current value of the
> > avail ring idx.
> > 
> > vq->shadow_avail_idx != vring_avail_idx(vq);
> > 
> > For a packed queue, the logic is similar: if the descriptor at
> > vq->shadow_avail_idx becomes available, it means the guest has added a
> > buffer at that slot.
> > 
> > vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
> > 
> > vq->shadow_avail_idx, true);
> > 
> > if (is_desc_avail(desc.flags, vq-> shadow_avail_wrap_counter))
> > 
> > return true;
> > 
> > 
> 
> 
> This answer does not make sense from API POV.
> 
> 
> My suggestion:
> 
> 
> change virtqueue_get_avail_bytes to return the shadow
> in an opaque unsigned value.
> 
> 
> add virtqueue_poll that gets this opaque and tells us whether any new
> buffers became available in the queue since that value
> was returned.
> 
> 
> accordingly, virtio_queue_set_notification_and_check
> will accept this opaque value and check avail buffers
> against it.
> 
> 
> 
> 
> 
> 
> > On Tue, Jul 2, 2024 at 2:46 AM Michael S. Tsirkin wrote:
> > 
> > On Tue, Jul 02, 2024 at 01:18:15AM +0800, Yang Dongshan wrote:
> > > > Please document what this does.
> > > okay, i will.
> > >
> > > > So this will return false if ring has any available buffers?
> > > > Equivalent to:
> > > > 
> > > > bool virtio_queue_set_notification_and_check(VirtQueue *vq, int enable)
> > > > {
> > > > virtio_queue_packed_set_notification(vq, enable);
> > > > return virtio_queue_empty(vq);
> > > > }
> > >
> > > > No, only when the shadow_avail_idx is changed shall the function return true,
> > 
> > 
> > what does "changed" mean here? changed compared to what?
> > 
> > > > compared with the value seen by the host last time, else return false even
> > > > if there are some buffers available in the queue, as the total size of the
> > > > available buffers in the queue can't satisfy the request.
> > >
> > > > It may be better to pass only one arg to the function like this:
> > > bool virtio_queue_set_notification_and_check(VirtQueue *vq)
> > > {
> > > virtio_queue_packed_set_notification(vq, true);
> > > 
> > > return shadow_avail_idx_changed()? true: false;
> > > }
> > >
> > > Thanks Michael a lot!
> > >
> > > On Mon, Jul 1, 2024 at 11:05 PM Michael S. Tsirkin  > >

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson

On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > On Sat, 29 Jun 2024 at 04:17, David Gibson  
> > wrote:
> > >
> > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki  
> > > > wrote:
> > > > >
> > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > >
> > > > > Signed-off-by: Akihiko Odaki 
> > > > > ---
> > > > >  hw/ppc/vof.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > --- a/hw/ppc/vof.c
> > > > > +++ b/hw/ppc/vof.c
> > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > > GArray *claimed, uint64_t base)
> > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > >  if (sc == 2) {
> > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > sizeof(uint32_t) * ac));
> > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > >  } else {
> > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > sizeof(uint32_t) * ac));
> > > > >  }
> > > >
> > > > I did wonder if there was a better way to do what this is doing,
> > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > provide one.
> > >
> > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> >
> > fdt32_ld() and friends only do the "load from this bit of memory"
> > part, which we already have QEMU utility functions for (and which
> > are this patch uses).
> >
> > This particular bit of code is dealing with an fdt property ("memory")
> > that is an array of (address, size) tuples where address and size
> > can independently be either 32 or 64 bits, and it wants the
> > size value of tuple 0. So the missing functionality is something at
> > a higher level than fdt32_ld() which would let you say "give me
> > tuple N field X" with some way to specify the tuple layout. (Which
> > is an awkward kind of API to write in C.)
> >
> > Slightly less general, but for this case we could perhaps have
> > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> >
> >   uint64_t value_array[2];
> >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> >ac, sc);
> >   /*
> >* fills in value_array[0] with address, value_array[1] with size,
> >* probably barfs if the varargs-list of cell-sizes doesn't
> >* cover the whole property, similar to the current assert on
> >* proplen.
> >*/
> >   mem0_end = value_array[0];
> 
> Since 4/8 byte cells are most common and size is probably
> normally known, what about something simpler to start with?

Hrm, I don't think this helps much.  As Peter points out the actual
load isn't really the issue, it's locating the right spot for it.

> 
> Thanks,
> Nick
> 
> ---
> diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h
> index 0677fea..c4b6355 100644
> --- a/libfdt/libfdt.h
> +++ b/libfdt/libfdt.h
> @@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p)
>   | bp[3];
>  }
>  
> +/*
> + * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned
> + * so can use a single load.
> + */
> +static inline uint32_t fdt32_ld_prop(const fdt32_t *p)
> +{
> + return fdt32_to_cpu(*p);
> +}
> +
>  static inline void fdt32_st(void *property, uint32_t value)
>  {
>   uint8_t *bp = (uint8_t *)property;
> @@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p)
>   | bp[7];
>  }
>  
> +/*
> + * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned
> + * so can use two loads.
> + */
> +static inline uint64_t fdt64_ld_prop(const fdt64_t *p)
> +{
> + const fdt64_t *_p = p;
> +
> + return ((uint64_t)fdt32_to_cpu(_p[0]) << 32)
> + | fdt32_to_cpu(_p[1]);
> +}
> +
>  static inline void fdt64_st(void *property, uint64_t value)
>  {
>   uint8_t *bp = (uint8_t *)property;
> 

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson



Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson

On Thu, Jul 04, 2024 at 01:15:57PM +0100, Peter Maydell wrote:
> On Sat, 29 Jun 2024 at 04:17, David Gibson  
> wrote:
> >
> > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki  
> > > wrote:
> > > >
> > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > >
> > > > Signed-off-by: Akihiko Odaki 
> > > > ---
> > > >  hw/ppc/vof.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > --- a/hw/ppc/vof.c
> > > > +++ b/hw/ppc/vof.c
> > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > GArray *claimed, uint64_t base)
> > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > >  if (sc == 2) {
> > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > >  } else {
> > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > >  }
> > >
> > > I did wonder if there was a better way to do what this is doing,
> > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > provide one.
> >
> > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> 
> fdt32_ld() and friends only do the "load from this bit of memory"
> part, which we already have QEMU utility functions for (and which
> this patch uses).
> 
> This particular bit of code is dealing with an fdt property ("memory")
> that is an array of (address, size) tuples where address and size
> can independently be either 32 or 64 bits, and it wants the
> size value of tuple 0. So the missing functionality is something at
> a higher level than fdt32_ld() which would let you say "give me
> tuple N field X" with some way to specify the tuple layout. (Which
> is an awkward kind of API to write in C.)

Ah, right.  Yeah.. that's a pretty awkward API in C.

> Slightly less general, but for this case we could perhaps have
> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> 
>   uint64_t value_array[2];
>   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>ac, sc);
>   /*
>* fills in value_array[0] with address, value_array[1] with size,
>* probably barfs if the varargs-list of cell-sizes doesn't
>* cover the whole property, similar to the current assert on
>* proplen.
>*/
>   mem0_end = value_array[0];

Seems reasonable to me.  The only other thought I had was something
like Python's struct.unpack() [0].  But your suggestion is probably
more natural in C.

[0] https://docs.python.org/3/library/struct.html#struct.unpack
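
To make the "give me tuple N field X" idea concrete, here is a rough sketch of such a helper. It is an assumption-laden illustration, not a proposal for libfdt or QEMU: the function names are hypothetical, it handles only a two-field (address, size) layout with 1 or 2 cells each, and it reads big-endian cells byte by byte so that 4-byte alignment of the property is sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 1 or 2 big-endian 32-bit cells byte by byte; no alignment required. */
static uint64_t toy_read_cells(const uint8_t *p, int cells)
{
    uint64_t v = 0;
    for (int i = 0; i < cells * 4; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Fetch tuple n of a "reg"-style property with ac address and sc size cells. */
static int toy_get_reg_tuple(const uint8_t *prop, size_t proplen, int ac, int sc,
                             int n, uint64_t *addr, uint64_t *size)
{
    size_t tuple_len = (size_t)(ac + sc) * 4;

    if (ac < 1 || ac > 2 || sc < 1 || sc > 2 || n < 0 ||
        (size_t)(n + 1) * tuple_len > proplen) {
        return -1;  /* layout not supported or tuple out of range */
    }
    const uint8_t *t = prop + (size_t)n * tuple_len;
    *addr = toy_read_cells(t, ac);
    *size = toy_read_cells(t + (size_t)ac * 4, sc);
    return 0;
}
```

The locating step is just the arithmetic on ac and sc; the load itself is the easy part, as noted earlier in the thread.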

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson



Re: [PATCH 1/3] hw/cxl: Get rid of unused cfmw_list

2024-07-04 Thread Zhao Liu

On Fri, Jul 05, 2024 at 01:04:51AM +, Zhijian Li (Fujitsu) wrote:
> Date: Fri, 5 Jul 2024 01:04:51 +
> From: "Zhijian Li (Fujitsu)" 
> Subject: Re: [PATCH 1/3] hw/cxl: Get rid of unused cfmw_list
> 
> Jonathan,
> 
> 
> There is a new user for cfmw_list now
> https://lore.kernel.org/qemu-devel/20240704093404.1848132-1-zhao1@linux.intel.com/
> 
> So I think we should drop this patch.

Hi Zhijian,

I'm not a "real" user and that bug was originally found by code reading.

So that fix won't block your drop. :-)

And I think the fix is worth landing before cfmw_list gets dropped,
so that it can be backported to stable QEMU.
 
> On 02/07/2024 22:34, Jonathan Cameron wrote:
> > From: Li Zhijian 
> > 
> > There is no user for this member. All '-M cxl-fmw.N' options have
> > been parsed and saved to CXLState.fixed_windows.
> > 
> > Signed-off-by: Li Zhijian 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >   include/hw/cxl/cxl.h | 1 -
> >   hw/cxl/cxl-host.c| 1 -
> >   2 files changed, 2 deletions(-)
> > 
> > diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
> > index 75e47b6864..e3ecbef038 100644
> > --- a/include/hw/cxl/cxl.h
> > +++ b/include/hw/cxl/cxl.h
> > @@ -43,7 +43,6 @@ typedef struct CXLState {
> >   MemoryRegion host_mr;
> >   unsigned int next_mr_idx;
> >   GList *fixed_windows;
> > -CXLFixedMemoryWindowOptionsList *cfmw_list;
> >   } CXLState;
> >   
> >   struct CXLHost {
> > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> > index c5f5fcfd64..926d3d3da7 100644
> > --- a/hw/cxl/cxl-host.c
> > +++ b/hw/cxl/cxl-host.c
> > @@ -335,7 +335,6 @@ static void machine_set_cfmw(Object *obj, Visitor *v, 
> > const char *name,
> >   for (it = cfmw_list; it; it = it->next) {
> >   cxl_fixed_memory_window_config(state, it->value, errp);
> >   }
> > -state->cfmw_list = cfmw_list;
> >   }
> >   
> >   void cxl_machine_init(Object *obj, CXLState *state)

Re: [PATCH] target/loongarch: Set CSR_PRCFG1 and CSR_PRCFG2 values

2024-07-04 Thread gaosong


On 2024/7/4 at 8:47 PM, maobibo wrote:



On 2024/7/4 at 7:12 PM, Song Gao wrote:

We set the value of register CSR_PRCFG3, but left out CSR_PRCFG1
and CSR_PRCFG2. Set CSR_PRCFG1 and CSR_PRCFG2 according to the
default values of the physical machine.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 270f711f11..ad40750701 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -538,6 +538,12 @@ static void loongarch_cpu_reset_hold(Object *obj, ResetType type)
     env->CSR_MERRCTL = FIELD_DP64(env->CSR_MERRCTL, CSR_MERRCTL, ISMERR, 0);

     env->CSR_TID = cs->cpu_index;
+    env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, SAVE_NUM, 8);
+    env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, TIMER_BITS, 0x2f);
+    env->CSR_PRCFG1 = FIELD_DP64(env->CSR_PRCFG1, CSR_PRCFG1, VSMAX, 7);
+
+    env->CSR_PRCFG2 = 0x3000;
+
     env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, TLB_TYPE, 2);
     env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, MTLB_ENTRY, 63);
     env->CSR_PRCFG3 = FIELD_DP64(env->CSR_PRCFG3, CSR_PRCFG3, STLB_WAYS, 7);



Functionally, it looks good to me. There are some nits:
PRCFG1-PRCFG3 are constant, so I think they had better be set in the 
cpu instance_init() or instance_finalize().

You are right, we should put them in instance_init().


Also, it is strange: most of the cpu state, apart from cpucfg/prcfg etc., 
should be cleared to 0 on reset, as other architectures do with the 
end_reset_fields marker.


Maybe it would be better to double-check the cpu state change 
callbacks such as init/reset etc. :)


The CSRs that need to be reset are from chapter 6, section 4 of the 
manual, and they are not always reset to 0.
I think maybe that's why end_reset_fields was not used at the time. We 
should add some comments to loongarch_cpu_reset_hold().


Thanks.
Song Gao

Regards
Bibo Mao
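
For readers unfamiliar with the end_reset_fields convention mentioned above,
here is a minimal standalone sketch of the idea: constant configuration is
written once at init time, and reset zeroes only the fields placed before a
marker member. DemoCPUState, its fields, and the demo_* functions are
hypothetical and do not reflect the real LoongArch CPU state layout; the
PRCFG2 value is simply the one from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU state, NOT the real CPULoongArchState. */
typedef struct DemoCPUState {
    /* Fields that must be cleared on every reset. */
    uint64_t gpr[4];
    uint64_t csr_tid;

    /*
     * Marker: everything before this member is zeroed by demo_cpu_reset().
     * QEMU targets typically use an empty struct member here, which is a
     * GNU extension; a plain member keeps this sketch standard C.
     */
    uint64_t end_reset_fields;

    /* Constant configuration, written once and preserved across reset. */
    uint64_t csr_prcfg2;
} DemoCPUState;

/* Called once per CPU object: set up values that never change. */
static void demo_cpu_init(DemoCPUState *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->csr_prcfg2 = 0x3000;
}

/* Called on every reset: clear volatile state, then apply reset values. */
static void demo_cpu_reset(DemoCPUState *s, uint64_t cpu_index)
{
    assert(s != NULL);
    memset(s, 0, offsetof(DemoCPUState, end_reset_fields));
    s->csr_tid = cpu_index; /* a reset value that is not zero */
}
```

With this split, a register that resets to a non-zero value (like the TID
above) is still assigned explicitly in the reset path, which matches the
point that not every CSR resets to 0.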

RE: [PATCH v5 3/7] plugins: extend API to get latest memory value accessed

2024-07-04 Thread Xingtao Yao (Fujitsu)

Reviewed-by: Xingtao Yao 

> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of
> Pierrick Bouvier
> Sent: Friday, July 5, 2024 8:34 AM
> To: qemu-devel@nongnu.org
> Cc: Alexandre Iooss ; Richard Henderson
> ; Marcel Apfelbaum
> ; Pierrick Bouvier ;
> Alex Bennée ; Paolo Bonzini ;
> Yanan Wang ; Mahmoud Mandour
> ; Eduardo Habkost ; Philippe
> Mathieu-Daudé 
> Subject: [PATCH v5 3/7] plugins: extend API to get latest memory value 
> accessed
> 
> This value can be accessed only during a memory callback, using
> new qemu_plugin_mem_get_value function.
> 
> Returned value can be extended when QEMU will support accesses wider
> than 128 bits.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1719
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2152
> Reviewed-by: Richard Henderson 
> Signed-off-by: Pierrick Bouvier 
> ---
>  include/qemu/qemu-plugin.h   | 32
>  plugins/api.c                | 33 +
>  plugins/qemu-plugins.symbols |  1 +
>  3 files changed, 66 insertions(+)
> 
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index c71c705b699..649ce89815f 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -262,6 +262,29 @@ enum qemu_plugin_mem_rw {
>  QEMU_PLUGIN_MEM_RW,
>  };
> 
> +enum qemu_plugin_mem_value_type {
> +QEMU_PLUGIN_MEM_VALUE_U8,
> +QEMU_PLUGIN_MEM_VALUE_U16,
> +QEMU_PLUGIN_MEM_VALUE_U32,
> +QEMU_PLUGIN_MEM_VALUE_U64,
> +QEMU_PLUGIN_MEM_VALUE_U128,
> +};
> +
> +/* typedef qemu_plugin_mem_value - value accessed during a load/store */
> +typedef struct {
> +enum qemu_plugin_mem_value_type type;
> +union {
> +uint8_t u8;
> +uint16_t u16;
> +uint32_t u32;
> +uint64_t u64;
> +struct {
> +uint64_t low;
> +uint64_t high;
> +} u128;
> +} data;
> +} qemu_plugin_mem_value;
> +
>  /**
>   * enum qemu_plugin_cond - condition to enable callback
>   *
> @@ -551,6 +574,15 @@ bool
> qemu_plugin_mem_is_big_endian(qemu_plugin_meminfo_t info);
>  QEMU_PLUGIN_API
>  bool qemu_plugin_mem_is_store(qemu_plugin_meminfo_t info);
> 
> +/**
> + * qemu_plugin_mem_get_value() - return last value loaded/stored
> + * @info: opaque memory transaction handle
> + *
> + * Returns: memory value
> + */
> +QEMU_PLUGIN_API
> +qemu_plugin_mem_value
> qemu_plugin_mem_get_value(qemu_plugin_meminfo_t info);
> +
>  /**
>   * qemu_plugin_get_hwaddr() - return handle for memory operation
>   * @info: opaque memory info structure
> diff --git a/plugins/api.c b/plugins/api.c
> index 2ff13d09de6..3316d4a04d4 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -351,6 +351,39 @@ bool
> qemu_plugin_mem_is_store(qemu_plugin_meminfo_t info)
>  return get_plugin_meminfo_rw(info) & QEMU_PLUGIN_MEM_W;
>  }
> 
> +qemu_plugin_mem_value
> qemu_plugin_mem_get_value(qemu_plugin_meminfo_t info)
> +{
> +uint64_t low = current_cpu->neg.plugin_mem_value_low;
> +qemu_plugin_mem_value value;
> +
> +switch (qemu_plugin_mem_size_shift(info)) {
> +case 0:
> +value.type = QEMU_PLUGIN_MEM_VALUE_U8;
> +value.data.u8 = (uint8_t)low;
> +break;
> +case 1:
> +value.type = QEMU_PLUGIN_MEM_VALUE_U16;
> +value.data.u16 = (uint16_t)low;
> +break;
> +case 2:
> +value.type = QEMU_PLUGIN_MEM_VALUE_U32;
> +value.data.u32 = (uint32_t)low;
> +break;
> +case 3:
> +value.type = QEMU_PLUGIN_MEM_VALUE_U64;
> +value.data.u64 = low;
> +break;
> +case 4:
> +value.type = QEMU_PLUGIN_MEM_VALUE_U128;
> +value.data.u128.low = low;
> +value.data.u128.high = current_cpu->neg.plugin_mem_value_high;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +return value;
> +}
> +
>  /*
>   * Virtual Memory queries
>   */
> diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
> index ca773d8d9fe..eed9d8abd90 100644
> --- a/plugins/qemu-plugins.symbols
> +++ b/plugins/qemu-plugins.symbols
> @@ -13,6 +13,7 @@
>qemu_plugin_insn_size;
>qemu_plugin_insn_symbol;
>qemu_plugin_insn_vaddr;
> +  qemu_plugin_mem_get_value;
>qemu_plugin_mem_is_big_endian;
>qemu_plugin_mem_is_sign_extended;
>qemu_plugin_mem_is_store;
> --
> 2.39.2
>

RE: [PATCH v5 6/7] tests/plugin/mem: add option to print memory accesses

2024-07-04 Thread Xingtao Yao (Fujitsu)

Reviewed-by: Xingtao Yao 

> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of
> Pierrick Bouvier
> Sent: Friday, July 5, 2024 8:34 AM
> To: qemu-devel@nongnu.org
> Cc: Alexandre Iooss ; Richard Henderson
> ; Marcel Apfelbaum
> ; Pierrick Bouvier ;
> Alex Bennée ; Paolo Bonzini ;
> Yanan Wang ; Mahmoud Mandour
> ; Eduardo Habkost ; Philippe
> Mathieu-Daudé 
> Subject: [PATCH v5 6/7] tests/plugin/mem: add option to print memory accesses
> 
> By using "print-accesses=true" option, mem plugin will now print every
> value accessed, with associated size, type (store vs load), symbol,
> instruction address and phys/virt address accessed.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: Pierrick Bouvier 
> ---
>  tests/plugin/mem.c | 69 +-
>  1 file changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
> index b650dddcce1..086e6f5bdfc 100644
> --- a/tests/plugin/mem.c
> +++ b/tests/plugin/mem.c
> @@ -21,10 +21,15 @@ typedef struct {
>  uint64_t io_count;
>  } CPUCount;
> 
> +typedef struct {
> +uint64_t vaddr;
> +const char *sym;
> +} InsnInfo;
> +
>  static struct qemu_plugin_scoreboard *counts;
>  static qemu_plugin_u64 mem_count;
>  static qemu_plugin_u64 io_count;
> -static bool do_inline, do_callback;
> +static bool do_inline, do_callback, do_print_accesses;
>  static bool do_haddr;
>  static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
> 
> @@ -60,6 +65,44 @@ static void vcpu_mem(unsigned int cpu_index,
> qemu_plugin_meminfo_t meminfo,
>  }
>  }
> 
> +static void print_access(unsigned int cpu_index, qemu_plugin_meminfo_t
> meminfo,
> + uint64_t vaddr, void *udata)
> +{
> +InsnInfo *insn_info = udata;
> +unsigned size = 8 << qemu_plugin_mem_size_shift(meminfo);
> +const char *type = qemu_plugin_mem_is_store(meminfo) ? "store" : "load";
> +qemu_plugin_mem_value value = qemu_plugin_mem_get_value(meminfo);
> +uint64_t hwaddr =
> +qemu_plugin_hwaddr_phys_addr(qemu_plugin_get_hwaddr(meminfo,
> vaddr));
> +g_autoptr(GString) out = g_string_new("");
> +g_string_printf(out,
> +"0x%"PRIx64",%s,0x%"PRIx64",0x%"PRIx64",%d,%s,",
> +insn_info->vaddr, insn_info->sym,
> +vaddr, hwaddr, size, type);
> +switch (value.type) {
> +case QEMU_PLUGIN_MEM_VALUE_U8:
> +g_string_append_printf(out, "0x%02"PRIx8, value.data.u8);
> +break;
> +case QEMU_PLUGIN_MEM_VALUE_U16:
> +g_string_append_printf(out, "0x%04"PRIx16, value.data.u16);
> +break;
> +case QEMU_PLUGIN_MEM_VALUE_U32:
> +g_string_append_printf(out, "0x%08"PRIx32, value.data.u32);
> +break;
> +case QEMU_PLUGIN_MEM_VALUE_U64:
> +g_string_append_printf(out, "0x%016"PRIx64, value.data.u64);
> +break;
> +case QEMU_PLUGIN_MEM_VALUE_U128:
> +g_string_append_printf(out, "0x%016"PRIx64"%016"PRIx64,
> +   value.data.u128.high, value.data.u128.low);
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +g_string_append_printf(out, "\n");
> +qemu_plugin_outs(out->str);
> +}
> +
>  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>  {
>  size_t n = qemu_plugin_tb_n_insns(tb);
> @@ -79,6 +122,16 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct
> qemu_plugin_tb *tb)
>   QEMU_PLUGIN_CB_NO_REGS,
>   rw, NULL);
>  }
> +if (do_print_accesses) {
> +/* we leak this pointer, to avoid locking to keep track of it */
> +InsnInfo *insn_info = g_malloc(sizeof(InsnInfo));
> +const char *sym = qemu_plugin_insn_symbol(insn);
> +insn_info->sym = sym ? sym : "";
> +insn_info->vaddr = qemu_plugin_insn_vaddr(insn);
> +qemu_plugin_register_vcpu_mem_cb(insn, print_access,
> + QEMU_PLUGIN_CB_NO_REGS,
> + rw, (void *) insn_info);
> +}
>  }
>  }
> 
> @@ -117,6 +170,12 @@ QEMU_PLUGIN_EXPORT int
> qemu_plugin_install(qemu_plugin_id_t id,
>  fprintf(stderr, "boolean argument parsing failed: %s\n", 
> opt);
>  return -1;
>  }
> +} else if (g_strcmp0(tokens[0], "print-accesses") == 0) {
> +if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
> +&do_print_accesses)) {
> +fprintf(stderr, "boolean argument parsing failed: %s\n", 
> opt);
> +return -1;
> +}
>  } else {
>  fprintf(stderr, "option parsing failed: %s\n", opt);
>  return -1;
> @@ -129,6 +188,14 @@ QEMU_PLUGIN_EXPORT int
> qemu_plugin_insta

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread Nicholas Piggin

On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> On Sat, 29 Jun 2024 at 04:17, David Gibson  
> wrote:
> >
> > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki  
> > > wrote:
> > > >
> > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > >
> > > > Signed-off-by: Akihiko Odaki 
> > > > ---
> > > >  hw/ppc/vof.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > --- a/hw/ppc/vof.c
> > > > +++ b/hw/ppc/vof.c
> > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > GArray *claimed, uint64_t base)
> > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > >  if (sc == 2) {
> > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > >  } else {
> > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > >  }
> > >
> > > I did wonder if there was a better way to do what this is doing,
> > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > provide one.
> >
> > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
>
> fdt32_ld() and friends only do the "load from this bit of memory"
> part, which we already have QEMU utility functions for (and which
> this patch uses).
>
> This particular bit of code is dealing with an fdt property ("memory")
> that is an array of (address, size) tuples where address and size
> can independently be either 32 or 64 bits, and it wants the
> size value of tuple 0. So the missing functionality is something at
> a higher level than fdt32_ld() which would let you say "give me
> tuple N field X" with some way to specify the tuple layout. (Which
> is an awkward kind of API to write in C.)
>
> Slightly less general, but for this case we could perhaps have
> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
>
>   uint64_t value_array[2];
>   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>ac, sc);
>   /*
>* fills in value_array[0] with address, value_array[1] with size,
>* probably barfs if the varargs-list of cell-sizes doesn't
>* cover the whole property, similar to the current assert on
>* proplen.
>*/
>   mem0_end = value_array[0];

Since 4/8 byte cells are most common and size is probably
normally known, what about something simpler to start with?

Thanks,
Nick

---
diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h
index 0677fea..c4b6355 100644
--- a/libfdt/libfdt.h
+++ b/libfdt/libfdt.h
@@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p)
| bp[3];
 }
 
+/*
+ * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned
+ * so can use a single load.
+ */
+static inline uint32_t fdt32_ld_prop(const fdt32_t *p)
+{
+   return fdt32_to_cpu(*p);
+}
+
 static inline void fdt32_st(void *property, uint32_t value)
 {
uint8_t *bp = (uint8_t *)property;
@@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p)
| bp[7];
 }
 
+/*
+ * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned
+ * so can use two loads.
+ */
+static inline uint64_t fdt64_ld_prop(const fdt64_t *p)
+{
+   const fdt32_t *_p = (const fdt32_t *)p;
+
+   return ((uint64_t)fdt32_to_cpu(_p[0]) << 32)
+   | fdt32_to_cpu(_p[1]);
+}
+
 static inline void fdt64_st(void *property, uint64_t value)
 {
uint8_t *bp = (uint8_t *)property;
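
As an illustration of the "give me tuple N field X" idea discussed in this
thread, the following standalone sketch decodes tuple 0 of an (address, size)
property where each field is one or two 32-bit big-endian cells. It does not
use libfdt or any QEMU helper; the demo_* names are hypothetical, and the
byte-wise loads are just one way to stay safe with 4-byte-aligned data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one big-endian 32-bit cell byte by byte, so alignment is irrelevant. */
static uint32_t demo_ld_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Load a value that spans ncells consecutive cells (1 or 2). */
static uint64_t demo_ld_cells(const uint8_t *p, int ncells)
{
    uint64_t v = 0;

    assert(ncells == 1 || ncells == 2);
    for (int i = 0; i < ncells; i++) {
        v = (v << 32) | demo_ld_cell(p + 4 * i);
    }
    return v;
}

/*
 * Decode tuple 0 of a property laid out as (address, size), where the
 * address takes ac cells and the size takes sc cells.
 */
static void demo_reg_tuple0(const uint8_t *prop, size_t proplen,
                            int ac, int sc,
                            uint64_t *addr, uint64_t *size)
{
    /* The property must at least cover one whole tuple. */
    assert(proplen >= 4 * (size_t)(ac + sc));
    *addr = demo_ld_cells(prop, ac);
    *size = demo_ld_cells(prop + 4 * ac, sc);
}
```

A caller that knows ac and sc from the parent node can then pick whichever
field it needs without open-coding pointer casts at each site.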

Re: [PATCH 1/3] hw/cxl: Get rid of unused cfmw_list

2024-07-04 Thread Zhijian Li (Fujitsu)

Jonathan,


There is a new user for cfmw_list now
https://lore.kernel.org/qemu-devel/20240704093404.1848132-1-zhao1@linux.intel.com/

So I think we should drop this patch.


On 02/07/2024 22:34, Jonathan Cameron wrote:
> From: Li Zhijian 
> 
> There is no user for this member. All '-M cxl-fmw.N' options have
> been parsed and saved to CXLState.fixed_windows.
> 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Jonathan Cameron 
> ---
>   include/hw/cxl/cxl.h | 1 -
>   hw/cxl/cxl-host.c| 1 -
>   2 files changed, 2 deletions(-)
> 
> diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
> index 75e47b6864..e3ecbef038 100644
> --- a/include/hw/cxl/cxl.h
> +++ b/include/hw/cxl/cxl.h
> @@ -43,7 +43,6 @@ typedef struct CXLState {
>   MemoryRegion host_mr;
>   unsigned int next_mr_idx;
>   GList *fixed_windows;
> -CXLFixedMemoryWindowOptionsList *cfmw_list;
>   } CXLState;
>   
>   struct CXLHost {
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index c5f5fcfd64..926d3d3da7 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -335,7 +335,6 @@ static void machine_set_cfmw(Object *obj, Visitor *v, 
> const char *name,
>   for (it = cfmw_list; it; it = it->next) {
>   cxl_fixed_memory_window_config(state, it->value, errp);
>   }
> -state->cfmw_list = cfmw_list;
>   }
>   
>   void cxl_machine_init(Object *obj, CXLState *state)

RE: [PATCH] hw/cxl/cxl-host: Fix guest crash when getting cxl-fmw property

2024-07-04 Thread Xingtao Yao (Fujitsu)




> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of Zhao
> Liu
> Sent: Thursday, July 4, 2024 5:34 PM
> To: Jonathan Cameron ; Fan Ni
> 
> Cc: qemu-devel@nongnu.org; qemu-sta...@nongnu.org; Zhao Liu
> 
> Subject: [PATCH] hw/cxl/cxl-host: Fix guest crash when getting cxl-fmw 
> property
> 
> From: Zhao Liu 
> 
> Guest crashes (Segmentation fault) when getting cxl-fmw property via
> qmp:
> 
> (QEMU) qom-get path=machine property=cxl-fmw
> 
> This issue is caused by accessing wrong callback (opaque) type in
> machine_get_cfmw().
> 
> cxl_machine_init() sets the callback as `CXLState *` type but
> machine_get_cfmw() treats the callback as
> `CXLFixedMemoryWindowOptionsList **`.
> 
> Fix this error by casting opaque to `CXLState *` type in
> machine_get_cfmw().
> 
> Fixes: 03b39fcf64bc ("hw/cxl: Make the CXL fixed memory window setup a
> machine parameter.")
> Signed-off-by: Zhao Liu 
> ---
>  hw/cxl/cxl-host.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index c5f5fcfd64d0..e9f2543c43c6 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -315,7 +315,8 @@ static void machine_set_cxl(Object *obj, Visitor *v, const
> char *name,
>  static void machine_get_cfmw(Object *obj, Visitor *v, const char *name,
>   void *opaque, Error **errp)
>  {
> -CXLFixedMemoryWindowOptionsList **list = opaque;
> +CXLState *state = opaque;
> +CXLFixedMemoryWindowOptionsList **list = &state->cfmw_list;
> 
>  visit_type_CXLFixedMemoryWindowOptionsList(v, name, list, errp);
>  }
> --
> 2.34.1
> 

Reviewed-by: Xingtao Yao
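
To make the class of bug easier to see, here is a small standalone sketch of
a property getter that receives a void *opaque. The getter has to cast the
opaque back to the type that was actually registered, and only then take the
address of the member it wants. All demo_* names are hypothetical and are not
QOM APIs; only the shape of the fix follows the patch above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoList {
    int value;
    struct DemoList *next;
} DemoList;

/* Stand-in for a state struct that owns the list as one of its members. */
typedef struct DemoState {
    int is_enabled;
    DemoList *cfmw_list;
} DemoState;

typedef DemoList *(*DemoGetter)(void *opaque);

typedef struct DemoProperty {
    DemoGetter get;
    void *opaque;
} DemoProperty;

/*
 * Correct getter: the opaque is a DemoState *, so the list pointer is
 * reached through the struct. Treating opaque itself as a DemoList **
 * would reinterpret the start of DemoState (here: is_enabled) as a pointer.
 */
static DemoList *demo_get_cfmw(void *opaque)
{
    DemoState *state = opaque;
    DemoList **list = &state->cfmw_list;

    return *list;
}

/* Registration passes the whole state, as the machine init code does. */
static DemoProperty demo_register(DemoState *state)
{
    DemoProperty prop = { demo_get_cfmw, state };

    return prop;
}

static DemoList *demo_property_get(const DemoProperty *prop)
{
    assert(prop != NULL && prop->get != NULL);
    return prop->get(prop->opaque);
}
```

Since void * carries no type information, the compiler cannot flag a
mismatch between registration and getter; the two have to be checked against
each other by hand.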

Re: [PATCH v4 0/7] plugins: access values during a memory read/write

2024-07-04 Thread Pierrick Bouvier


Posted v5.

On 7/2/24 11:44, Pierrick Bouvier wrote:

This series allows plugins to know which value is read/written during a memory
access.

For every memory access, we now copy this value before calling mem callbacks,
and those can query it using a new API function:
- qemu_plugin_mem_get_value

The mem plugin was extended to print accesses, and a new test was added to check
that the functionality works as expected. A bug was found where callbacks were not
called as expected.

This will open new use cases for plugins, such as following specific values in
memory.

v4
- fix prototype for stubs qemu_plugin_vcpu_mem_cb (inverted low/high parameters
   names)
- link gitlab bugs resolved (thanks @Anton Kochkov for reporting)
   https://gitlab.com/qemu-project/qemu/-/issues/1719
   https://gitlab.com/qemu-project/qemu/-/issues/2152

v3
- simplify API: return an algebraic data type for value accessed
   this can be easily extended when QEMU will support wider accesses
- fix Makefile test (use quiet-command instead of manually run the command)
- rename upper/lower to high/low
- reorder functions parameters and code to low/high instead of high/low, to
   follow current convention in QEMU codebase

v2
- fix compilation on aarch64 (missing undef in accel/tcg/atomic_template.h)

v3
- add info when printing memory accesses (insn_vaddr,mem_vaddr,mem_hwaddr)

Pierrick Bouvier (7):
   plugins: fix mem callback array size
   plugins: save value during memory accesses
   plugins: extend API to get latest memory value accessed
   tests/tcg: add mechanism to run specific tests with plugins
   tests/tcg: allow to check output of plugins
   tests/plugin/mem: add option to print memory accesses
   tests/tcg/x86_64: add test for plugin memory access

  accel/tcg/atomic_template.h | 66 +--
  include/qemu/plugin.h   |  8 ++
  include/qemu/qemu-plugin.h  | 32 
  accel/tcg/plugin-gen.c  |  3 +-
  plugins/api.c   | 34 
  plugins/core.c  |  7 ++
  tcg/tcg-op-ldst.c   | 72 +++--
  tests/plugin/mem.c  | 69 +++-
  tests/tcg/x86_64/test-plugin-mem-access.c   | 89 +
  accel/tcg/atomic_common.c.inc   | 13 ++-
  accel/tcg/ldst_common.c.inc | 38 +
  plugins/qemu-plugins.symbols|  1 +
  tests/tcg/Makefile.target   | 10 ++-
  tests/tcg/x86_64/Makefile.target|  7 ++
  tests/tcg/x86_64/check-plugin-mem-access.sh | 48 +++
  15 files changed, 462 insertions(+), 35 deletions(-)
  create mode 100644 tests/tcg/x86_64/test-plugin-mem-access.c
  create mode 100755 tests/tcg/x86_64/check-plugin-mem-access.sh

Re: [PATCH 2/3] hw/isa/vt82c686: Resolve intermediate IRQ forwarder

2024-07-04 Thread BALATON Zoltan


On Fri, 5 Jul 2024, BALATON Zoltan wrote:

On Thu, 4 Jul 2024, Bernhard Beschow wrote:
When @cpu_intr is populated before vt82xx's realize(), it can be directly passed
to i8259_init(), avoiding the need for the intermediate
via_isa_request_i8259_irq() handler. The result is less code and runtime
overhead, and a fixed memory leak caused by qemu_allocate_irqs().

Inspired-by: Philippe Mathieu-Daudé 
Signed-off-by: Bernhard Beschow 
---
hw/isa/vt82c686.c   | 12 ++--
hw/mips/fuloong2e.c |  2 +-
hw/ppc/amigaone.c   |  8 
hw/ppc/pegasos2.c   |  4 ++--
4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 505b44c4e6..ca02ad4c20 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -624,6 +624,7 @@ static void via_isa_init(Object *obj)
object_initialize_child(obj, "uhci2", &s->uhci[1], 
TYPE_VT82C686B_USB_UHCI);

object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+qdev_init_gpio_out_named(DEVICE(obj), &s->cpu_intr, "intr", 1);
}

static const TypeInfo via_isa_info = {
@@ -704,24 +705,15 @@ static void via_isa_pirq(void *opaque, int pin, int 
level)

via_isa_set_irq(opaque, pin, level);
}

-static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
-{
-ViaISAState *s = opaque;
-qemu_set_irq(s->cpu_intr, level);
-}
-
static void via_isa_realize(PCIDevice *d, Error **errp)
{
ViaISAState *s = VIA_ISA(d);
DeviceState *dev = DEVICE(d);
PCIBus *pci_bus = pci_get_bus(d);
-qemu_irq *isa_irq;
ISABus *isa_bus;
int i;

-qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);


I still don't like how this makes handling of out and in gpios different, and 
it also prevents creating the device with pci_create_simple_multifunction() 
and requires tweaking before realize. I think the fix should be in i8259 and not 
in this device.


I mean users of this device should not need changing.


Regards,
BALATON Zoltan


-isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
isa_bus = isa_bus_new(dev, pci_address_space(d), 
pci_address_space_io(d),

  errp);

@@ -729,7 +721,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
return;
}

-s->isa_irqs_in = i8259_init(isa_bus, *isa_irq);
+s->isa_irqs_in = i8259_init(isa_bus, s->cpu_intr);
isa_bus_register_input_irqs(isa_bus, s->isa_irqs_in);
i8254_pit_init(isa_bus, 0x40, 0, NULL);
i8257_dma_init(OBJECT(d), isa_bus, 0);
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index 6e4303ba47..e6487c34d8 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -286,6 +286,7 @@ static void mips_fuloong2e_init(MachineState *machine)
/* South bridge -> IP5 */
pci_dev = pci_new_multifunction(PCI_DEVFN(FULOONG2E_VIA_SLOT, 0),
TYPE_VT82C686B_ISA);
+qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);

/* Set properties on individual devices before realizing the south 
bridge */

if (machine->audiodev) {
@@ -299,7 +300,6 @@ static void mips_fuloong2e_init(MachineState *machine)
  object_resolve_path_component(OBJECT(pci_dev),
"rtc"),
  "date");
-qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);

dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
pci_ide_create_devs(PCI_DEVICE(dev));
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 9dcc486c1a..2110875f56 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -148,13 +148,13 @@ static void amigaone_init(MachineState *machine)
pci_bus = PCI_BUS(qdev_get_child_bus(dev, "pci.0"));

/* VIA VT82c686B South Bridge (multifunction PCI device) */
-via = OBJECT(pci_create_simple_multifunction(pci_bus, PCI_DEVFN(7, 0),
- TYPE_VT82C686B_ISA));
+via = OBJECT(pci_new_multifunction(PCI_DEVFN(7, 0), 
TYPE_VT82C686B_ISA));

+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in(DEVICE(cpu), 
PPC6xx_INPUT_INT));

+pci_realize_and_unref(PCI_DEVICE(via), pci_bus, &error_abort);
object_property_add_alias(OBJECT(machine), "rtc-time",
  object_resolve_path_component(via, "rtc"),
  "date");
-qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
-qdev_get_gpio_in(DEVICE(cpu), 
PPC6xx_INPUT_INT));

for (i = 0; i < PCI_NUM_PINS; i++) {
qdev_connect_gpio_out(dev, i, qdev_get_gpio_in_named(DEVICE(via),
 "pirq", i));
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 9b0a6b70ab..54e60

[PATCH v5 3/7] plugins: extend API to get latest memory value accessed

2024-07-04 Thread Pierrick Bouvier

This value can be accessed only during a memory callback, using the
new qemu_plugin_mem_get_value() function.

The returned value can be extended when QEMU supports accesses wider
than 128 bits.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1719
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2152
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 include/qemu/qemu-plugin.h   | 32 
 plugins/api.c| 33 +
 plugins/qemu-plugins.symbols |  1 +
 3 files changed, 66 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index c71c705b699..649ce89815f 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -262,6 +262,29 @@ enum qemu_plugin_mem_rw {
 QEMU_PLUGIN_MEM_RW,
 };
 
+enum qemu_plugin_mem_value_type {
+QEMU_PLUGIN_MEM_VALUE_U8,
+QEMU_PLUGIN_MEM_VALUE_U16,
+QEMU_PLUGIN_MEM_VALUE_U32,
+QEMU_PLUGIN_MEM_VALUE_U64,
+QEMU_PLUGIN_MEM_VALUE_U128,
+};
+
+/* typedef qemu_plugin_mem_value - value accessed during a load/store */
+typedef struct {
+enum qemu_plugin_mem_value_type type;
+union {
+uint8_t u8;
+uint16_t u16;
+uint32_t u32;
+uint64_t u64;
+struct {
+uint64_t low;
+uint64_t high;
+} u128;
+} data;
+} qemu_plugin_mem_value;
+
 /**
  * enum qemu_plugin_cond - condition to enable callback
  *
@@ -551,6 +574,15 @@ bool qemu_plugin_mem_is_big_endian(qemu_plugin_meminfo_t 
info);
 QEMU_PLUGIN_API
 bool qemu_plugin_mem_is_store(qemu_plugin_meminfo_t info);
 
+/**
+ * qemu_plugin_mem_get_value() - return last value loaded/stored
+ * @info: opaque memory transaction handle
+ *
+ * Returns: memory value
+ */
+QEMU_PLUGIN_API
+qemu_plugin_mem_value qemu_plugin_mem_get_value(qemu_plugin_meminfo_t info);
+
 /**
  * qemu_plugin_get_hwaddr() - return handle for memory operation
  * @info: opaque memory info structure
diff --git a/plugins/api.c b/plugins/api.c
index 2ff13d09de6..3316d4a04d4 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -351,6 +351,39 @@ bool qemu_plugin_mem_is_store(qemu_plugin_meminfo_t info)
 return get_plugin_meminfo_rw(info) & QEMU_PLUGIN_MEM_W;
 }
 
+qemu_plugin_mem_value qemu_plugin_mem_get_value(qemu_plugin_meminfo_t info)
+{
+uint64_t low = current_cpu->neg.plugin_mem_value_low;
+qemu_plugin_mem_value value;
+
+switch (qemu_plugin_mem_size_shift(info)) {
+case 0:
+value.type = QEMU_PLUGIN_MEM_VALUE_U8;
+value.data.u8 = (uint8_t)low;
+break;
+case 1:
+value.type = QEMU_PLUGIN_MEM_VALUE_U16;
+value.data.u16 = (uint16_t)low;
+break;
+case 2:
+value.type = QEMU_PLUGIN_MEM_VALUE_U32;
+value.data.u32 = (uint32_t)low;
+break;
+case 3:
+value.type = QEMU_PLUGIN_MEM_VALUE_U64;
+value.data.u64 = low;
+break;
+case 4:
+value.type = QEMU_PLUGIN_MEM_VALUE_U128;
+value.data.u128.low = low;
+value.data.u128.high = current_cpu->neg.plugin_mem_value_high;
+break;
+default:
+g_assert_not_reached();
+}
+return value;
+}
+
 /*
  * Virtual Memory queries
  */
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index ca773d8d9fe..eed9d8abd90 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -13,6 +13,7 @@
   qemu_plugin_insn_size;
   qemu_plugin_insn_symbol;
   qemu_plugin_insn_vaddr;
+  qemu_plugin_mem_get_value;
   qemu_plugin_mem_is_big_endian;
   qemu_plugin_mem_is_sign_extended;
   qemu_plugin_mem_is_store;
-- 
2.39.2
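
As a rough illustration of how a consumer might handle the tagged value
returned by the new API, the standalone sketch below mirrors the type added
in this patch and formats each variant as hex. The demo_* types and functions
are hypothetical copies made so the example builds without QEMU headers; a
real plugin would use qemu_plugin_mem_value from qemu-plugin.h instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the enum and struct added by the patch. */
enum demo_mem_value_type {
    DEMO_MEM_VALUE_U8,
    DEMO_MEM_VALUE_U16,
    DEMO_MEM_VALUE_U32,
    DEMO_MEM_VALUE_U64,
    DEMO_MEM_VALUE_U128,
};

typedef struct {
    enum demo_mem_value_type type;
    union {
        uint8_t u8;
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;
        struct {
            uint64_t low;
            uint64_t high;
        } u128;
    } data;
} demo_mem_value;

/* Build a value from a size shift (0..4 selects 8..128 bits). */
static demo_mem_value demo_make_value(unsigned size_shift,
                                      uint64_t low, uint64_t high)
{
    demo_mem_value v;

    memset(&v, 0, sizeof(v));
    switch (size_shift) {
    case 0:
        v.type = DEMO_MEM_VALUE_U8;
        v.data.u8 = (uint8_t)low;
        break;
    case 1:
        v.type = DEMO_MEM_VALUE_U16;
        v.data.u16 = (uint16_t)low;
        break;
    case 2:
        v.type = DEMO_MEM_VALUE_U32;
        v.data.u32 = (uint32_t)low;
        break;
    case 3:
        v.type = DEMO_MEM_VALUE_U64;
        v.data.u64 = low;
        break;
    case 4:
        v.type = DEMO_MEM_VALUE_U128;
        v.data.u128.low = low;
        v.data.u128.high = high;
        break;
    default:
        assert(!"unsupported access size");
    }
    return v;
}

/* Format the value as zero-padded hex; returns snprintf's result or -1. */
static int demo_format_value(const demo_mem_value *v, char *buf, size_t len)
{
    switch (v->type) {
    case DEMO_MEM_VALUE_U8:
        return snprintf(buf, len, "0x%02" PRIx8, v->data.u8);
    case DEMO_MEM_VALUE_U16:
        return snprintf(buf, len, "0x%04" PRIx16, v->data.u16);
    case DEMO_MEM_VALUE_U32:
        return snprintf(buf, len, "0x%08" PRIx32, v->data.u32);
    case DEMO_MEM_VALUE_U64:
        return snprintf(buf, len, "0x%016" PRIx64, v->data.u64);
    case DEMO_MEM_VALUE_U128:
        return snprintf(buf, len, "0x%016" PRIx64 "%016" PRIx64,
                        v->data.u128.high, v->data.u128.low);
    }
    return -1;
}
```

Switching on the type tag rather than on the access size keeps the consumer
independent of how the size is encoded, and a wider variant added later would
show up as an unhandled enum value.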

Re: [PATCH 2/3] hw/isa/vt82c686: Resolve intermediate IRQ forwarder

2024-07-04 Thread BALATON Zoltan


On Thu, 4 Jul 2024, Bernhard Beschow wrote:

When @cpu_intr is populated before vt82xx's realize(), it can be directly passed
to i8259_init(), avoiding the need for the intermediate
via_isa_request_i8259_irq() handler. The result is less code and runtime
overhead, and a fixed memory leak caused by qemu_allocate_irqs().

Inspired-by: Philippe Mathieu-Daudé 
Signed-off-by: Bernhard Beschow 
---
hw/isa/vt82c686.c   | 12 ++--
hw/mips/fuloong2e.c |  2 +-
hw/ppc/amigaone.c   |  8 
hw/ppc/pegasos2.c   |  4 ++--
4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 505b44c4e6..ca02ad4c20 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -624,6 +624,7 @@ static void via_isa_init(Object *obj)
object_initialize_child(obj, "uhci2", &s->uhci[1], TYPE_VT82C686B_USB_UHCI);
object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+qdev_init_gpio_out_named(DEVICE(obj), &s->cpu_intr, "intr", 1);
}

static const TypeInfo via_isa_info = {
@@ -704,24 +705,15 @@ static void via_isa_pirq(void *opaque, int pin, int level)
via_isa_set_irq(opaque, pin, level);
}

-static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
-{
-ViaISAState *s = opaque;
-qemu_set_irq(s->cpu_intr, level);
-}
-
static void via_isa_realize(PCIDevice *d, Error **errp)
{
ViaISAState *s = VIA_ISA(d);
DeviceState *dev = DEVICE(d);
PCIBus *pci_bus = pci_get_bus(d);
-qemu_irq *isa_irq;
ISABus *isa_bus;
int i;

-qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);


I still don't like how this makes handling of out and in gpios different, 
and it also prevents creating the device with 
pci_create_simple_multifunction() and requires tweaking before realize. I 
think the fix should be in i8259 and not in this device.


Regards,
BALATON Zoltan


-isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
  errp);

@@ -729,7 +721,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
return;
}

-s->isa_irqs_in = i8259_init(isa_bus, *isa_irq);
+s->isa_irqs_in = i8259_init(isa_bus, s->cpu_intr);
isa_bus_register_input_irqs(isa_bus, s->isa_irqs_in);
i8254_pit_init(isa_bus, 0x40, 0, NULL);
i8257_dma_init(OBJECT(d), isa_bus, 0);
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index 6e4303ba47..e6487c34d8 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -286,6 +286,7 @@ static void mips_fuloong2e_init(MachineState *machine)
/* South bridge -> IP5 */
pci_dev = pci_new_multifunction(PCI_DEVFN(FULOONG2E_VIA_SLOT, 0),
TYPE_VT82C686B_ISA);
+qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);

/* Set properties on individual devices before realizing the south bridge */
if (machine->audiodev) {
@@ -299,7 +300,6 @@ static void mips_fuloong2e_init(MachineState *machine)
  object_resolve_path_component(OBJECT(pci_dev),
"rtc"),
  "date");
-qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);

dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
pci_ide_create_devs(PCI_DEVICE(dev));
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 9dcc486c1a..2110875f56 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -148,13 +148,13 @@ static void amigaone_init(MachineState *machine)
pci_bus = PCI_BUS(qdev_get_child_bus(dev, "pci.0"));

/* VIA VT82c686B South Bridge (multifunction PCI device) */
-via = OBJECT(pci_create_simple_multifunction(pci_bus, PCI_DEVFN(7, 0),
- TYPE_VT82C686B_ISA));
+via = OBJECT(pci_new_multifunction(PCI_DEVFN(7, 0), TYPE_VT82C686B_ISA));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in(DEVICE(cpu), 
PPC6xx_INPUT_INT));
+pci_realize_and_unref(PCI_DEVICE(via), pci_bus, &error_abort);
object_property_add_alias(OBJECT(machine), "rtc-time",
  object_resolve_path_component(via, "rtc"),
  "date");
-qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
-qdev_get_gpio_in(DEVICE(cpu), 
PPC6xx_INPUT_INT));
for (i = 0; i < PCI_NUM_PINS; i++) {
qdev_connect_gpio_out(dev, i, qdev_get_gpio_in_named(DEVICE(via),
 "pirq", i));
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 9b0a6b70ab..54e60082ce 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -181,6 +181,8 @@ static void pegasos2_init(Machi

[PATCH v5 2/7] plugins: save value during memory accesses

2024-07-04 Thread Pierrick Bouvier

Different code paths handle memory accesses:
- tcg generated code
- load/store helpers
- atomic helpers

The accessed value is saved in cpu->neg.plugin_mem_value_{high,low}. Values are
written only for accessed word size (upper bits are not set).
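
Since only the accessed word size is written, a consumer presumably has to
restrict the saved words to that size before using them. A minimal sketch,
assuming a hypothetical SavedMemValue struct standing in for the two save
slots (the names below are illustrative, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the low/high 64-bit save slots. */
typedef struct {
    uint64_t low;
    uint64_t high;
} SavedMemValue;

/*
 * Return the low word restricted to the accessed size.
 * size_shift is log2 of the access size in bytes (0 = 1 byte, 4 = 16 bytes).
 */
static uint64_t saved_value_low(const SavedMemValue *v, unsigned size_shift)
{
    unsigned bits = 8u << size_shift;

    if (bits >= 64) {
        return v->low;
    }
    /* Bits above the accessed size are not meaningful: mask them off. */
    return v->low & ((UINT64_C(1) << bits) - 1);
}

/* Only a 128-bit access defines the high word; report 0 otherwise. */
static uint64_t saved_value_high(const SavedMemValue *v, unsigned size_shift)
{
    return (8u << size_shift) == 128 ? v->high : 0;
}
```

The helpers take the size shift explicitly so that whatever the slots held
from a previous, wider access is never reported.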

Atomic operations read and write at the same time, so we generate
two memory callbacks instead of one, to allow plugins to access distinct
values.

For now, accesses are limited to 128 bits, so the value is split into two
64-bit words. When QEMU supports wider operations, we'll be able to
reconsider this.

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 accel/tcg/atomic_template.h   | 66 ++-
 include/hw/core/cpu.h |  4 +++
 include/qemu/plugin.h |  4 +++
 plugins/core.c|  6 
 tcg/tcg-op-ldst.c | 66 +++
 accel/tcg/atomic_common.c.inc | 13 ++-
 accel/tcg/ldst_common.c.inc   | 38 
 7 files changed, 167 insertions(+), 30 deletions(-)

diff --git a/accel/tcg/atomic_template.h b/accel/tcg/atomic_template.h
index 1dc2151dafd..89593b2502f 100644
--- a/accel/tcg/atomic_template.h
+++ b/accel/tcg/atomic_template.h
@@ -53,6 +53,14 @@
 # error unsupported data size
 #endif
 
+#if DATA_SIZE == 16
+# define VALUE_LOW(val) int128_getlo(val)
+# define VALUE_HIGH(val) int128_gethi(val)
+#else
+# define VALUE_LOW(val) val
+# define VALUE_HIGH(val) 0
+#endif
+
 #if DATA_SIZE >= 4
 # define ABI_TYPE  DATA_TYPE
 #else
@@ -83,7 +91,12 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr 
addr,
 ret = qatomic_cmpxchg__nocheck(haddr, cmpv, newv);
 #endif
 ATOMIC_MMU_CLEANUP;
-atomic_trace_rmw_post(env, addr, oi);
+atomic_trace_rmw_post(env, addr,
+  VALUE_LOW(ret),
+  VALUE_HIGH(ret),
+  VALUE_LOW(newv),
+  VALUE_HIGH(newv),
+  oi);
 return ret;
 }
 
@@ -97,7 +110,12 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr addr, 
ABI_TYPE val,
 
 ret = qatomic_xchg__nocheck(haddr, val);
 ATOMIC_MMU_CLEANUP;
-atomic_trace_rmw_post(env, addr, oi);
+atomic_trace_rmw_post(env, addr,
+  VALUE_LOW(ret),
+  VALUE_HIGH(ret),
+  VALUE_LOW(val),
+  VALUE_HIGH(val),
+  oi);
 return ret;
 }
 
@@ -109,7 +127,12 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,   
 \
 haddr = atomic_mmu_lookup(env_cpu(env), addr, oi, DATA_SIZE, retaddr);   \
 ret = qatomic_##X(haddr, val);  \
 ATOMIC_MMU_CLEANUP; \
-atomic_trace_rmw_post(env, addr, oi);   \
+atomic_trace_rmw_post(env, addr,\
+  VALUE_LOW(ret),   \
+  VALUE_HIGH(ret),  \
+  VALUE_LOW(val),   \
+  VALUE_HIGH(val),  \
+  oi);  \
 return ret; \
 }
 
@@ -145,7 +168,12 @@ ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,   
 \
 cmp = qatomic_cmpxchg__nocheck(haddr, old, new);\
 } while (cmp != old);   \
 ATOMIC_MMU_CLEANUP; \
-atomic_trace_rmw_post(env, addr, oi);   \
+atomic_trace_rmw_post(env, addr,\
+  VALUE_LOW(old),   \
+  VALUE_HIGH(old),  \
+  VALUE_LOW(xval),  \
+  VALUE_HIGH(xval), \
+  oi);  \
 return RET; \
 }
 
@@ -188,7 +216,12 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr 
addr,
 ret = qatomic_cmpxchg__nocheck(haddr, BSWAP(cmpv), BSWAP(newv));
 #endif
 ATOMIC_MMU_CLEANUP;
-atomic_trace_rmw_post(env, addr, oi);
+atomic_trace_rmw_post(env, addr,
+  VALUE_LOW(ret),
+  VALUE_HIGH(ret),
+  VALUE_LOW(newv),
+  VALUE_HIGH(newv),
+  oi);
 return BSWAP(ret);
 }
 
@@ -202,7 +235,12 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr 
addr, ABI_TYPE val,
 
 ret = qatomic_xchg__nocheck(haddr, BSWAP(val));
 ATOMIC_MMU_CLEANUP;
-atomic_trace_rmw_post(env, addr, oi);
+atomic_trace_rmw_post(

[PATCH v5 6/7] tests/plugin/mem: add option to print memory accesses

2024-07-04 Thread Pierrick Bouvier

By using "print-accesses=true" option, mem plugin will now print every
value accessed, with associated size, type (store vs load), symbol,
instruction address and phys/virt address accessed.
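
To illustrate the record layout, here is a minimal sketch that formats one
line for a 32-bit access using the same column order as the patch below
(insn_vaddr, insn_symbol, mem_vaddr, mem_hwaddr, access_size, access_type,
mem_value). The helper name format_access_u32 and the sample addresses are
hypothetical; the plugin itself builds the line with GString.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one access record for a 32-bit value. */
static int format_access_u32(char *buf, size_t len,
                             uint64_t insn_vaddr, const char *sym,
                             uint64_t mem_vaddr, uint64_t mem_hwaddr,
                             int is_store, uint32_t value)
{
    /* The value is printed at a fixed width of 8 hex digits for 32 bits. */
    return snprintf(buf, len,
                    "0x%" PRIx64 ",%s,0x%" PRIx64 ",0x%" PRIx64
                    ",%d,%s,0x%08" PRIx32 "\n",
                    insn_vaddr, sym, mem_vaddr, mem_hwaddr,
                    32, is_store ? "store" : "load", value);
}
```

A fixed-width value column keeps the lines easy to match with grep in a
checking script.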

Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 tests/plugin/mem.c | 69 +-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index b650dddcce1..086e6f5bdfc 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -21,10 +21,15 @@ typedef struct {
 uint64_t io_count;
 } CPUCount;
 
+typedef struct {
+uint64_t vaddr;
+const char *sym;
+} InsnInfo;
+
 static struct qemu_plugin_scoreboard *counts;
 static qemu_plugin_u64 mem_count;
 static qemu_plugin_u64 io_count;
-static bool do_inline, do_callback;
+static bool do_inline, do_callback, do_print_accesses;
 static bool do_haddr;
 static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
 
@@ -60,6 +65,44 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
 }
 }
 
+static void print_access(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
+ uint64_t vaddr, void *udata)
+{
+InsnInfo *insn_info = udata;
+unsigned size = 8 << qemu_plugin_mem_size_shift(meminfo);
+const char *type = qemu_plugin_mem_is_store(meminfo) ? "store" : "load";
+qemu_plugin_mem_value value = qemu_plugin_mem_get_value(meminfo);
+uint64_t hwaddr =
+qemu_plugin_hwaddr_phys_addr(qemu_plugin_get_hwaddr(meminfo, vaddr));
+g_autoptr(GString) out = g_string_new("");
+g_string_printf(out,
+"0x%"PRIx64",%s,0x%"PRIx64",0x%"PRIx64",%d,%s,",
+insn_info->vaddr, insn_info->sym,
+vaddr, hwaddr, size, type);
+switch (value.type) {
+case QEMU_PLUGIN_MEM_VALUE_U8:
+g_string_append_printf(out, "0x%02"PRIx8, value.data.u8);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U16:
+g_string_append_printf(out, "0x%04"PRIx16, value.data.u16);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U32:
+g_string_append_printf(out, "0x%08"PRIx32, value.data.u32);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U64:
+g_string_append_printf(out, "0x%016"PRIx64, value.data.u64);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U128:
+g_string_append_printf(out, "0x%016"PRIx64"%016"PRIx64,
+   value.data.u128.high, value.data.u128.low);
+break;
+default:
+g_assert_not_reached();
+}
+g_string_append_printf(out, "\n");
+qemu_plugin_outs(out->str);
+}
+
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
 {
 size_t n = qemu_plugin_tb_n_insns(tb);
@@ -79,6 +122,16 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct 
qemu_plugin_tb *tb)
  QEMU_PLUGIN_CB_NO_REGS,
  rw, NULL);
 }
+if (do_print_accesses) {
+/* we leak this pointer, to avoid locking to keep track of it */
+InsnInfo *insn_info = g_malloc(sizeof(InsnInfo));
+const char *sym = qemu_plugin_insn_symbol(insn);
+insn_info->sym = sym ? sym : "";
+insn_info->vaddr = qemu_plugin_insn_vaddr(insn);
+qemu_plugin_register_vcpu_mem_cb(insn, print_access,
+ QEMU_PLUGIN_CB_NO_REGS,
+ rw, (void *) insn_info);
+}
 }
 }
 
@@ -117,6 +170,12 @@ QEMU_PLUGIN_EXPORT int 
qemu_plugin_install(qemu_plugin_id_t id,
 fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
 return -1;
 }
+} else if (g_strcmp0(tokens[0], "print-accesses") == 0) {
+if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
+&do_print_accesses)) {
+fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
+return -1;
+}
 } else {
 fprintf(stderr, "option parsing failed: %s\n", opt);
 return -1;
@@ -129,6 +188,14 @@ QEMU_PLUGIN_EXPORT int 
qemu_plugin_install(qemu_plugin_id_t id,
 return -1;
 }
 
+if (do_print_accesses) {
+g_autoptr(GString) out = g_string_new("");
+g_string_printf(out,
+"insn_vaddr,insn_symbol,mem_vaddr,mem_hwaddr,"
+"access_size,access_type,mem_value\n");
+qemu_plugin_outs(out->str);
+}
+
 counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
 mem_count = qemu_plugin_scoreboard_u64_in_struct(
 counts, CPUCount, mem_count);
-- 
2.39.2

[PATCH v5 5/7] tests/tcg: allow to check output of plugins

2024-07-04 Thread Pierrick Bouvier

A specific plugin test can now read and check a plugin output, to ensure
it contains expected values.

Tested-by: Xingtao Yao 
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 tests/tcg/Makefile.target | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index dc5c8b7a3b4..b78fd99c337 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -90,6 +90,7 @@ CFLAGS=
 LDFLAGS=
 
 QEMU_OPTS=
+CHECK_PLUGIN_OUTPUT_COMMAND=true
 
 
 # If TCG debugging, or TCI is enabled things are a lot slower
@@ -180,6 +181,9 @@ run-plugin-%:
-plugin $(PLUGIN_LIB)/$(call extract-plugin,$@)$(PLUGIN_ARGS) \
-d plugin -D $*.pout \
 $(call strip-plugin,$<))
+   $(call quiet-command, $(CHECK_PLUGIN_OUTPUT_COMMAND) $*.pout, \
+  TEST, check plugin $(call extract-plugin,$@) output \
+  with $(call strip-plugin,$<))
 else
 run-%: %
$(call run-test, $<, \
@@ -194,6 +198,9 @@ run-plugin-%:
  -plugin $(PLUGIN_LIB)/$(call extract-plugin,$@)$(PLUGIN_ARGS) 
\
  -d plugin -D $*.pout \
  $(QEMU_OPTS) $(call strip-plugin,$<))
+   $(call quiet-command, $(CHECK_PLUGIN_OUTPUT_COMMAND) $*.pout, \
+  TEST, check plugin $(call extract-plugin,$@) output \
+  with $(call strip-plugin,$<))
 endif
 
 gdb-%: %
-- 
2.39.2

[PATCH v5 7/7] tests/tcg/x86_64: add test for plugin memory access

2024-07-04 Thread Pierrick Bouvier

Add an explicit test to check expected memory values are read/written.
For sizes 8, 16, 32, 64 and 128, we generate a load/store operation.
For size 8 -> 64, we generate an atomic __sync_val_compare_and_swap too.
For 128bits memory access, we rely on SSE2 instructions.

By default, atomic accesses are non atomic if a single cpu is running,
so we force creation of a second one by creating a new thread first.
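
As a reminder of what the atomic operation in the test does, here is a minimal
sketch of the compare-and-swap step on its own. The helper name
seed_and_swap_u32 is hypothetical; it relies on the same GCC/Clang builtin as
the test.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the test's atomic op: seed the location
 * with 0x42, then swap in new_value only if it still holds 0x42.
 * Returns the value observed before the swap.
 */
static uint32_t seed_and_swap_u32(uint32_t *p, uint32_t new_value)
{
    *p = 0x42;
    return __sync_val_compare_and_swap(p, 0x42, new_value);
}
```

The builtin returns the old contents whether or not the swap happened, which
is presumably why a plugin can see both a read value (0x42) and a written
value for this single instruction.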

load/store helpers code path can't be triggered easily in user mode (no
softmmu), so we can't test it here.

Can be run with:
make -C build/tests/tcg/x86_64-linux-user 
run-plugin-test-plugin-mem-access-with-libmem.so

Tested-by: Xingtao Yao 
Signed-off-by: Pierrick Bouvier 
---
 tests/tcg/x86_64/test-plugin-mem-access.c   | 89 +
 tests/tcg/x86_64/Makefile.target|  7 ++
 tests/tcg/x86_64/check-plugin-mem-access.sh | 48 +++
 3 files changed, 144 insertions(+)
 create mode 100644 tests/tcg/x86_64/test-plugin-mem-access.c
 create mode 100755 tests/tcg/x86_64/check-plugin-mem-access.sh

diff --git a/tests/tcg/x86_64/test-plugin-mem-access.c 
b/tests/tcg/x86_64/test-plugin-mem-access.c
new file mode 100644
index 000..7fdd6a55829
--- /dev/null
+++ b/tests/tcg/x86_64/test-plugin-mem-access.c
@@ -0,0 +1,89 @@
+#include <emmintrin.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+static void *data;
+
+#define DEFINE_STORE(name, type, value) \
+static void store_##name(void)  \
+{   \
+*((type *)data) = value;\
+}
+
+#define DEFINE_ATOMIC_OP(name, type, value) \
+static void atomic_op_##name(void)  \
+{   \
+*((type *)data) = 0x42; \
+__sync_val_compare_and_swap((type *)data, 0x42, value); \
+}
+
+#define DEFINE_LOAD(name, type) \
+static void load_##name(void)   \
+{   \
+register type var asm("eax") = *((type *) data);\
+(void)var;  \
+}
+
+DEFINE_STORE(u8, uint8_t, 0xf1)
+DEFINE_ATOMIC_OP(u8, uint8_t, 0xf1)
+DEFINE_LOAD(u8, uint8_t)
+DEFINE_STORE(u16, uint16_t, 0xf123)
+DEFINE_ATOMIC_OP(u16, uint16_t, 0xf123)
+DEFINE_LOAD(u16, uint16_t)
+DEFINE_STORE(u32, uint32_t, 0xff112233)
+DEFINE_ATOMIC_OP(u32, uint32_t, 0xff112233)
+DEFINE_LOAD(u32, uint32_t)
+DEFINE_STORE(u64, uint64_t, 0xf123456789abcdef)
+DEFINE_ATOMIC_OP(u64, uint64_t, 0xf123456789abcdef)
+DEFINE_LOAD(u64, uint64_t)
+
+static void store_u128(void)
+{
+_mm_store_si128(data, _mm_set_epi32(0xf1223344, 0x55667788,
+0xf1234567, 0x89abcdef));
+}
+
+static void load_u128(void)
+{
+__m128i var = _mm_load_si128(data);
+(void)var;
+}
+
+static void *f(void *p)
+{
+return NULL;
+}
+
+int main(void)
+{
+/*
+ * We force creation of a second thread to enable cpu flag CF_PARALLEL.
+ * This will generate atomic operations when needed.
+ */
+pthread_t thread;
+pthread_create(&thread, NULL, &f, NULL);
+pthread_join(thread, NULL);
+
+data = malloc(sizeof(__m128i));
+atomic_op_u8();
+store_u8();
+load_u8();
+
+atomic_op_u16();
+store_u16();
+load_u16();
+
+atomic_op_u32();
+store_u32();
+load_u32();
+
+atomic_op_u64();
+store_u64();
+load_u64();
+
+store_u128();
+load_u128();
+
+free(data);
+}
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index eda9bd7396c..3edc29b924d 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -16,6 +16,7 @@ X86_64_TESTS += noexec
 X86_64_TESTS += cmpxchg
 X86_64_TESTS += adox
 X86_64_TESTS += test-1648
+PLUGINS_TESTS += test-plugin-mem-access
 TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
 else
 TESTS=$(MULTIARCH_TESTS)
@@ -26,6 +27,12 @@ adox: CFLAGS=-O2
 run-test-i386-ssse3: QEMU_OPTS += -cpu max
 run-plugin-test-i386-ssse3-%: QEMU_OPTS += -cpu max
 
+run-plugin-test-plugin-mem-access-with-libmem.so: \
+   PLUGIN_ARGS=$(COMMA)print-accesses=true
+run-plugin-test-plugin-mem-access-with-libmem.so: \
+   CHECK_PLUGIN_OUTPUT_COMMAND= \
+   $(SRC_PATH)/tests/tcg/x86_64/check-plugin-mem-access.sh
+
 test-x86_64: LDFLAGS+=-lm -lc
 test-x86_64: test-i386.c test-i386.h test-i386-shift.h test-i386-muldiv.h
$(CC) $(CFLAGS) $< -o $@ $(LDFLAGS)
diff --git a/tests/tcg/x86_64/check-plugin-mem-access.sh 
b/tests/tcg/x86_64/check-plugin-mem-access.sh
new file mode 100755
index 000..163f1cfad34
--- /dev/null
+++ b/tests/tcg/x86_64/check-plugin-mem-access.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+die()
+{
+echo "$@" 1>&2
+exit 1
+}
+
+check()
+{
+file=$1
+pattern=$2
+grep "$pattern" "$file" > /dev/null || die "\"$pattern\" not found in 
$file"
+}
+
+[ $# -eq 1 ] || die "usage: plugin_out_file"
+
+plugi

[PATCH v5 0/7] plugins: access values during a memory read/write

2024-07-04 Thread Pierrick Bouvier

This series allows plugins to know which value is read/written during a memory
access.

For every memory access, we now copy this value before calling mem callbacks,
and those can query it using a new API function:
- qemu_plugin_mem_get_value

The mem plugin was extended to print accesses, and a new test was added to check
that the functionality works as expected. A bug was found where callbacks were not
called as expected.

This will open new use cases for plugins, such as following specific values in
memory.
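
As a rough illustration of "following specific values", here is a minimal
sketch of how a consumer might compare a tagged value against a watched
constant. The MemValue type below is a hypothetical, simplified mirror of a
tagged union, not the definition added to qemu-plugin.h, and it leaves out the
128-bit case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a tagged memory value; not the QEMU definition. */
typedef enum { VAL_U8, VAL_U16, VAL_U32, VAL_U64 } ValueType;

typedef struct {
    ValueType type;
    union {
        uint8_t u8;
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;
    } data;
} MemValue;

/* Widen any supported value to 64 bits so callers can compare uniformly. */
static uint64_t mem_value_to_u64(MemValue v)
{
    switch (v.type) {
    case VAL_U8:
        return v.data.u8;
    case VAL_U16:
        return v.data.u16;
    case VAL_U32:
        return v.data.u32;
    case VAL_U64:
        return v.data.u64;
    }
    return 0;
}

/* True when the accessed value equals the constant being followed. */
static bool matches_watched(MemValue v, uint64_t watched)
{
    return mem_value_to_u64(v) == watched;
}
```

In a real plugin the check would sit inside a memory callback and use the
value returned by qemu_plugin_mem_get_value instead.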

v5
- fixed width output for mem values in mem plugin
- move plugin_mem_value to CPUNegativeOffset
- tcg/tcg-op-ldst.c: only store word size mem access (do not set upper bits)

v4
- fix prototype for stubs qemu_plugin_vcpu_mem_cb (inverted low/high parameters
  names)
- link gitlab bugs resolved (thanks @Anton Kochkov for reporting)
  https://gitlab.com/qemu-project/qemu/-/issues/1719
  https://gitlab.com/qemu-project/qemu/-/issues/2152

v3
- simplify API: return an algebraic data type for value accessed
  this can be easily extended when QEMU will support wider accesses
- fix Makefile test (use quiet-command instead of manually run the command)
- rename upper/lower to high/low
- reorder functions parameters and code to low/high instead of high/low, to
  follow current convention in QEMU codebase

v2
- fix compilation on aarch64 (missing undef in accel/tcg/atomic_template.h)

v3
- add info when printing memory accesses (insn_vaddr,mem_vaddr,mem_hwaddr)

Pierrick Bouvier (7):
  plugins: fix mem callback array size
  plugins: save value during memory accesses
  plugins: extend API to get latest memory value accessed
  tests/tcg: add mechanism to run specific tests with plugins
  tests/tcg: allow to check output of plugins
  tests/plugin/mem: add option to print memory accesses
  tests/tcg/x86_64: add test for plugin memory access

 accel/tcg/atomic_template.h | 66 +--
 include/hw/core/cpu.h   |  4 +
 include/qemu/plugin.h   |  4 +
 include/qemu/qemu-plugin.h  | 32 
 accel/tcg/plugin-gen.c  |  3 +-
 plugins/api.c   | 33 
 plugins/core.c  |  6 ++
 tcg/tcg-op-ldst.c   | 66 +--
 tests/plugin/mem.c  | 69 +++-
 tests/tcg/x86_64/test-plugin-mem-access.c   | 89 +
 accel/tcg/atomic_common.c.inc   | 13 ++-
 accel/tcg/ldst_common.c.inc | 38 +
 plugins/qemu-plugins.symbols|  1 +
 tests/tcg/Makefile.target   | 10 ++-
 tests/tcg/x86_64/Makefile.target|  7 ++
 tests/tcg/x86_64/check-plugin-mem-access.sh | 48 +++
 16 files changed, 455 insertions(+), 34 deletions(-)
 create mode 100644 tests/tcg/x86_64/test-plugin-mem-access.c
 create mode 100755 tests/tcg/x86_64/check-plugin-mem-access.sh

-- 
2.39.2

[PATCH v5 1/7] plugins: fix mem callback array size

2024-07-04 Thread Pierrick Bouvier

data was correctly copied, but size of array was not set
(g_array_sized_new only reserves memory, but does not set size).

As a result, callbacks were not called for code path relying on
plugin_register_vcpu_mem_cb().

Found when trying to trigger mem access callbacks for atomic
instructions.
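
The difference between reserving memory and setting the length can be shown
with a minimal plain-C sketch. TinyArray and its helpers are hypothetical
stand-ins for GArray, used here only so that the example builds without GLib.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array: capacity and length are tracked separately. */
typedef struct {
    int *data;
    size_t len;       /* elements considered valid */
    size_t capacity;  /* elements reserved */
} TinyArray;

/* Reserve room for `reserved` elements; the length stays 0. */
static TinyArray tiny_array_sized_new(size_t reserved)
{
    TinyArray a = { malloc(reserved * sizeof(int)), 0, reserved };
    assert(a.data != NULL);
    return a;
}

/* Buggy pattern: the bytes are copied, but len is never updated. */
static void copy_raw(TinyArray *dst, const int *src, size_t n)
{
    memcpy(dst->data, src, n * sizeof(int));
}

/* Fixed pattern: appending copies the data and updates len. */
static void append_vals(TinyArray *dst, const int *src, size_t n)
{
    assert(dst->len + n <= dst->capacity);
    memcpy(dst->data + dst->len, src, n * sizeof(int));
    dst->len += n;
}
```

Any code that iterates up to len sees an empty array after copy_raw, even
though the data is in place, which matches the symptom described above.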

Reviewed-by: Xingtao Yao 
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 accel/tcg/plugin-gen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index b6bae32b997..ec89a085b43 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -85,8 +85,7 @@ static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
 len = insn->mem_cbs->len;
 arr = g_array_sized_new(false, false,
 sizeof(struct qemu_plugin_dyn_cb), len);
-memcpy(arr->data, insn->mem_cbs->data,
-   len * sizeof(struct qemu_plugin_dyn_cb));
+g_array_append_vals(arr, insn->mem_cbs->data, len);
 qemu_plugin_add_dyn_cb_arr(arr);
 
 tcg_gen_st_ptr(tcg_constant_ptr((intptr_t)arr), tcg_env,
-- 
2.39.2

[PATCH v5 4/7] tests/tcg: add mechanism to run specific tests with plugins

2024-07-04 Thread Pierrick Bouvier

Only multiarch tests are run with plugins, and we want to be able to run
per-arch tests with plugins too.

Tested-by: Xingtao Yao 
Reviewed-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 tests/tcg/Makefile.target | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index f21be50d3b2..dc5c8b7a3b4 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -152,10 +152,11 @@ PLUGINS=$(patsubst %.c, lib%.so, $(notdir $(wildcard 
$(PLUGIN_SRC)/*.c)))
 # only expand MULTIARCH_TESTS which are common on most of our targets
 # to avoid an exponential explosion as new tests are added. We also
 # add some special helpers the run-plugin- rules can use below.
+# In addition, extra tests can be added using the PLUGINS_TESTS variable.
 
 ifneq ($(MULTIARCH_TESTS),)
 $(foreach p,$(PLUGINS), \
-   $(foreach t,$(MULTIARCH_TESTS),\
+   $(foreach t,$(MULTIARCH_TESTS) $(PLUGINS_TESTS),\
$(eval run-plugin-$(t)-with-$(p): $t $p) \
$(eval RUN_TESTS+=run-plugin-$(t)-with-$(p
 endif # MULTIARCH_TESTS
-- 
2.39.2

Re: [PATCH v4 2/7] plugins: save value during memory accesses

2024-07-04 Thread Pierrick Bouvier


On 7/3/24 11:58, Richard Henderson wrote:

On 7/2/24 11:44, Pierrick Bouvier wrote:

Different code paths handle memory accesses:
- tcg generated code
- load/store helpers
- atomic helpers

This value is saved in cpu->plugin_state.

Atomic operations read and write at the same time, so we generate
two memory callbacks instead of one, to allow plugins to access distinct
values.

For now, accesses are limited to 128 bits, so the value is split into two
64-bit words. When QEMU supports wider operations, we'll be able to
reconsider this.

Signed-off-by: Pierrick Bouvier 
---
   accel/tcg/atomic_template.h   | 66 
   include/qemu/plugin.h |  8 
   plugins/core.c|  7 
   tcg/tcg-op-ldst.c | 72 +++
   accel/tcg/atomic_common.c.inc | 13 ++-
   accel/tcg/ldst_common.c.inc   | 38 +++---
   6 files changed, 173 insertions(+), 31 deletions(-)


It looks correct so,
Reviewed-by: Richard Henderson 

Possibilities for follow-up improvement:



--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -148,14 +148,24 @@ static TCGv_i64 plugin_maybe_preserve_addr(TCGTemp *addr)
   return NULL;
   }
   
+#ifdef CONFIG_PLUGIN

   static void
-plugin_gen_mem_callbacks(TCGv_i64 copy_addr, TCGTemp *orig_addr, MemOpIdx oi,
+plugin_gen_mem_callbacks(TCGv_i64 value_low, TCGv_i64 value_high,
+ TCGv_i64 copy_addr, TCGTemp *orig_addr, MemOpIdx oi,
enum qemu_plugin_mem_rw rw)
   {
-#ifdef CONFIG_PLUGIN
   if (tcg_ctx->plugin_insn != NULL) {
   qemu_plugin_meminfo_t info = make_plugin_meminfo(oi, rw);
   
+TCGv_ptr plugin_state = tcg_temp_ebb_new_ptr();

+tcg_gen_ld_ptr(plugin_state, tcg_env,
+   offsetof(CPUState, plugin_state) - sizeof(CPUState));
+tcg_gen_st_i64(value_low, plugin_state,
+   offsetof(CPUPluginState, mem_value_low));
+tcg_gen_st_i64(value_high, plugin_state,
+   offsetof(CPUPluginState, mem_value_high));


Maybe better to place this data at the beginning of CPUNegativeOffsetState?
This would eliminate a load dependency, and most hosts would be able to use a
(relatively) small negative offset efficiently.



That's a good suggestion. Moved it here for v5.


+static void
+plugin_gen_mem_callbacks_i32(TCGv_i32 val,
+ TCGv_i64 copy_addr, TCGTemp *orig_addr,
+ MemOpIdx oi, enum qemu_plugin_mem_rw rw)
+{
+#ifdef CONFIG_PLUGIN
+if (tcg_ctx->plugin_insn != NULL) {
+TCGv_i64 ext_val = tcg_temp_ebb_new_i64();
+tcg_gen_extu_i32_i64(ext_val, val);
+plugin_gen_mem_callbacks(ext_val, tcg_constant_i64(0),


This zero extension got me to thinking:
(1) why zero extension and not sign-extension based on MO_SIGN from oi?


This was to truncate the upper value, but as you mentioned, I simply clip it
later, so it's not important.



(2) given that the callback will have oi, do we really need any extension
  at all here?  We could allow the bits outside oi be garbage.
  This would eliminate the store to value_high entirely for most ops,
  and would allow this i32 op to avoid the extension -- simply perform
  a 32-bit store into the low half of value_low.



I implemented selective store for the plugin_gen_mem_callbacks function for
the next version.


However, for helper-based memory accesses, I prefer to keep the existing
version. Changing it would require creating several small functions
(atomic_trace_rmw_post, plugin_load/store_cb and qemu_plugin_vcpu_mem_cb
for every size). It's something that could be desirable later, when we
introduce a bigger tcg word size.



That appears to be what you're doing anyway with qemu_plugin_mem_value in the 
next patch.


r~

Re: [PATCH 1/3] hw/isa/vt82c686: Turn "intr" irq into a named gpio

2024-07-04 Thread BALATON Zoltan


On Thu, 4 Jul 2024, Bernhard Beschow wrote:

Makes the code more comprehensible, matches the datasheet and the piix4 device
model.

Signed-off-by: Bernhard Beschow 
---
hw/isa/vt82c686.c   | 2 +-
hw/mips/fuloong2e.c | 2 +-
hw/ppc/amigaone.c   | 4 ++--
hw/ppc/pegasos2.c   | 4 ++--
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 8582ac0322..505b44c4e6 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -719,7 +719,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
ISABus *isa_bus;
int i;

-qdev_init_gpio_out(dev, &s->cpu_intr, 1);
+qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index a45aac368c..6e4303ba47 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -299,7 +299,7 @@ static void mips_fuloong2e_init(MachineState *machine)
  object_resolve_path_component(OBJECT(pci_dev),
"rtc"),
  "date");
-qdev_connect_gpio_out(DEVICE(pci_dev), 0, env->irq[5]);
+qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);


I was wondering why we still have 0 when we have a name, so I checked the doc
comment in include/hw/qdev-core.h and found that the docs for
qdev_connect_gpio_out_named are mostly a copy&paste of those for
qdev_connect_gpio_out. They also talk about an output GPIO array but then
say input GPIOs in that array. I stopped reading at that point as this
text makes little sense. Somebody who knows how this actually works might
want to update that doc comment.


But that's unrelated to this patch so this is

Reviewed-by: BALATON Zoltan 



dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
pci_ide_create_devs(PCI_DEVICE(dev));
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index ddfa09457a..9dcc486c1a 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -153,8 +153,8 @@ static void amigaone_init(MachineState *machine)
object_property_add_alias(OBJECT(machine), "rtc-time",
  object_resolve_path_component(via, "rtc"),
  "date");
-qdev_connect_gpio_out(DEVICE(via), 0,
-  qdev_get_gpio_in(DEVICE(cpu), PPC6xx_INPUT_INT));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in(DEVICE(cpu), 
PPC6xx_INPUT_INT));
for (i = 0; i < PCI_NUM_PINS; i++) {
qdev_connect_gpio_out(dev, i, qdev_get_gpio_in_named(DEVICE(via),
 "pirq", i));
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index c1bd8dfa21..9b0a6b70ab 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -195,8 +195,8 @@ static void pegasos2_init(MachineState *machine)
object_property_add_alias(OBJECT(machine), "rtc-time",
  object_resolve_path_component(via, "rtc"),
  "date");
-qdev_connect_gpio_out(DEVICE(via), 0,
-  qdev_get_gpio_in_named(pm->mv, "gpp", 31));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in_named(pm->mv, "gpp", 31));

dev = PCI_DEVICE(object_resolve_path_component(via, "ide"));
pci_ide_create_devs(dev);

Re: [PATCH 11/43] target/ppc/mmu_common.c: Remove pte_update_flags()

2024-07-04 Thread Nicholas Piggin

On Thu Jul 4, 2024 at 10:34 PM AEST, BALATON Zoltan wrote:
> On Thu, 4 Jul 2024, Nicholas Piggin wrote:
> > On Mon May 27, 2024 at 9:12 AM AEST, BALATON Zoltan wrote:
> >> This function is used only once, its return value is ignored and one
> >> of its parameter is a return value from a previous call. It is better
> >> to inline it in the caller and remove it.
> >>
> >> Signed-off-by: BALATON Zoltan 
> >> ---
> >>  target/ppc/mmu_common.c | 41 +
> >>  1 file changed, 13 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
> >> index e3537c63c0..c4902b7632 100644
> >> --- a/target/ppc/mmu_common.c
> >> +++ b/target/ppc/mmu_common.c
> >> @@ -119,39 +119,14 @@ static int ppc6xx_tlb_pte_check(mmu_ctx_t *ctx, 
> >> target_ulong pte0,
> >>  }
> >>  }
> >>
> >> -static int pte_update_flags(mmu_ctx_t *ctx, target_ulong *pte1p,
> >> -int ret, MMUAccessType access_type)
> >> -{
> >> -int store = 0;
> >> -
> >> -/* Update page flags */
> >> -if (!(*pte1p & 0x0100)) {
> >> -/* Update accessed flag */
> >> -*pte1p |= 0x0100;
> >> -store = 1;
> >> -}
> >> -if (!(*pte1p & 0x0080)) {
> >> -if (access_type == MMU_DATA_STORE && ret == 0) {
> >> -/* Update changed flag */
> >> -*pte1p |= 0x0080;
> >> -store = 1;
> >> -} else {
> >> -/* Force page fault for first write access */
> >> -ctx->prot &= ~PAGE_WRITE;
> >> -}
> >> -}
> >> -
> >> -return store;
> >> -}
> >> -
> >>  /* Software driven TLB helpers */
> >>
> >>  static int ppc6xx_tlb_check(CPUPPCState *env, mmu_ctx_t *ctx,
> >>  target_ulong eaddr, MMUAccessType access_type)
> >>  {
> >>  ppc6xx_tlb_t *tlb;
> >> -int nr, best, way;
> >> -int ret;
> >> +target_ulong *pte1p;
> >> +int nr, best, way, ret;
> >>
> >>  best = -1;
> >>  ret = -1; /* No TLB found */
> >> @@ -204,7 +179,17 @@ done:
> >>" prot=%01x ret=%d\n",
> >>ctx->raddr & TARGET_PAGE_MASK, ctx->prot, ret);
> >>  /* Update page flags */
> >> -pte_update_flags(ctx, &env->tlb.tlb6[best].pte1, ret, 
> >> access_type);
> >> +pte1p = &env->tlb.tlb6[best].pte1;
> >> +*pte1p |= 0x0100; /* Update accessed flag */
> >> +if (!(*pte1p & 0x0080)) {
> >> +if (access_type == MMU_DATA_STORE && ret == 0) {
> >> +/* Update changed flag */
> >> +*pte1p |= 0x0080;
> >> +} else {
> >> +/* Force page fault for first write access */
> >> +ctx->prot &= ~PAGE_WRITE;
> >
> > Out of curiosity, I guess this unusual part is because ctx->prot can get
> > PAGE_WRITE set in the bat lookup, then it has to be cleared if the PTE
> > does not have changed bit?
>
> I have no idea. I was just trying to clean up this code to make it simpler 

Yeah that's fine I wouldn't expect it to change here, just wondering
if you'd dug into it more. I *think* that is the reason for it.

Thanks,
Nick

> with this series. I think historically there was a single function that 
> handled all models but as these became too different it was split up by 
> MMU models. It could be some of this are remnants of some old code where 
> some other model needed it and not needed any more or this could be merged 
> with hash32 but I did not try to find that out, just try to make sure not 
> to break it any more than it might already be broken.
>
> Regards,
> BALATON Zoltan
>
> >> +}
> >> +}
> >>  }
> >
> > Reviewed-by: Nicholas Piggin 
> >
> >>  #if defined(DUMP_PAGE_TABLES)
> >>  if (qemu_loglevel_mask(CPU_LOG_MMU)) {
> >
> >

Re: [PULL 02/12] tests/qtest/migration-test: enable on s390x with TCG

2024-07-04 Thread Nicholas Piggin

On Thu Jul 4, 2024 at 9:48 PM AEST, Thomas Huth wrote:
> On 04/07/2024 13.20, Nicholas Piggin wrote:
> > On Tue Jul 2, 2024 at 8:33 PM AEST, Thomas Huth wrote:
> >> From: Nicholas Piggin 
> >>
> >> s390x with TCG is more stable now. Enable it.
> > 
> > Ah, you did a more complete version of my flic fix that migrates all the
> > state. I didn't see that go by but yeah I suspect that was probably the
> > correct thing to do. Thanks for that.
>
> Drat, seems like I forgot to CC: you on that patch, sorry for that, that was 
> by accident and certainly not on purpose :-(

Ah that's fine I was leaving it for s390x people as I said, and
you're s390x people :)

> > Could the s390x flic migrate fix go to stable, perhaps?
>
> We need a new machine type for enabling the fix, so it does not make much 
> sense on stable, unfortunately.

Okay.

> > There's some kvm-unit-tests s390x migration tests that can be enabled
> > after the fix too don't forget.
>
> Right, I'll try to remember to enable it once QEMU 9.1 has been released.

Great.

Thanks,
Nick

Re: [PATCH RFC V3 13/29] arm/virt: Make ARM vCPU present status ACPI persistent

2024-07-04 Thread Nicholas Piggin

On Thu Jul 4, 2024 at 9:23 PM AEST, Salil Mehta wrote:
> Hi Nick,
>
> Thanks for taking time to review. Please find my replies inline.
>
> >  From: Nicholas Piggin 
> >  Sent: Thursday, July 4, 2024 3:49 AM
> >  To: Salil Mehta ; qemu-devel@nongnu.org;
> >  qemu-...@nongnu.org; m...@redhat.com
> >  
> >  On Fri Jun 14, 2024 at 9:36 AM AEST, Salil Mehta wrote:
> >  > ARM arch does not allow CPUs presence to be changed [1] after kernel
> >  has booted.
> >  > Hence, firmware/ACPI/Qemu must ensure persistent view of the vCPUs  to
> >  > the Guest kernel even when they are not present in the QoM i.e. are
> >  > unplugged or are yet-to-be-plugged
> >  
> >  Do you need arch-independent state for this? If ARM always requires it
> >  then can it be implemented between arm and acpi interface?
>
>
> Yes, we do need as we cannot say if the same constraint applies to other
> architectures as well. Above stated constraint affects how the architecture
> common ACPI CPU code is initialized.

Right, but could it be done with an ACPI property that the arch can
change, or an argument from arch code to an ACPI init routine? Or
even a machine property that ACPI could query.

> >  If not, then perhaps could it be done in the patch that introduces all the
> >  other state?
> >  
> >  > References:
> >  > [1] Check comment 5 in the bugzilla entry
> >  >Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> >  
> >  If I understand correctly (and I don't know ACPI, so it's likely I don't),
> >  that is an update to the ACPI spec to say some bit in an ACPI table must
> >  remain set regardless of CPU hotplug state.
>
>
> ARM does not claim anything related to CPU hotplug right now. It simply
> does not exist. The ACPI update is simply reinforcing the existing fact that
> the _STA.Present bit in the ACPI spec cannot be changed after the system has
> booted.
>
> This is because for the ARM arch there are many other initializations which
> depend upon the exact CPU count being available during boot, and they do not
> expect that to change after boot. For example, there are many per-CPU
> features, the GIC CPU interface etc., which all expect this to be fixed at
> boot time. This is an immutable requirement from ARM.
>
>
> >  
> >  Reference links are good, I think it would be nice to add a small summary
> >  in the changelog too.
>
> sure. I will do.

Thanks. Something like what you wrote above would work.

Thanks,
Nick

Re: [PATCH v4 6/7] tests/plugin/mem: add option to print memory accesses

2024-07-04 Thread Pierrick Bouvier


On 7/2/24 18:56, Xingtao Yao (Fujitsu) wrote:

Tested-by: Xingtao Yao 

one small suggestion:
Printing the addresses and values at a fixed width in the output message can
improve the readability of logs.


Ok, I'll do it for every size.


like:

+case QEMU_PLUGIN_MEM_VALUE_U8:
+g_string_append_printf(out, "0x%"PRIx8, value.data.u8);
+break;

case QEMU_PLUGIN_MEM_VALUE_U8:
 g_string_append_printf(out, "0x%02"PRIx8, value.data.u8);
 break;



-Original Message-
From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
 On Behalf Of
Pierrick Bouvier
Sent: Wednesday, July 3, 2024 2:45 AM
To: qemu-devel@nongnu.org
Cc: Alex Bennée ; Mahmoud Mandour
; Pierrick Bouvier ;
Alexandre Iooss ; Philippe Mathieu-Daudé
; Paolo Bonzini ; Richard Henderson
; Eduardo Habkost 
Subject: [PATCH v4 6/7] tests/plugin/mem: add option to print memory accesses

By using "print-accesses=true" option, mem plugin will now print every
value accessed, with associated size, type (store vs load), symbol,
instruction address and phys/virt address accessed.

Signed-off-by: Pierrick Bouvier 
---
  tests/plugin/mem.c | 69
+-
  1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index b650dddcce1..aecbad8e68d 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -21,10 +21,15 @@ typedef struct {
  uint64_t io_count;
  } CPUCount;

+typedef struct {
+uint64_t vaddr;
+const char *sym;
+} InsnInfo;
+
  static struct qemu_plugin_scoreboard *counts;
  static qemu_plugin_u64 mem_count;
  static qemu_plugin_u64 io_count;
-static bool do_inline, do_callback;
+static bool do_inline, do_callback, do_print_accesses;
  static bool do_haddr;
  static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;

@@ -60,6 +65,44 @@ static void vcpu_mem(unsigned int cpu_index,
qemu_plugin_meminfo_t meminfo,
  }
  }

+static void print_access(unsigned int cpu_index, qemu_plugin_meminfo_t
meminfo,
+ uint64_t vaddr, void *udata)
+{
+InsnInfo *insn_info = udata;
+unsigned size = 8 << qemu_plugin_mem_size_shift(meminfo);
+const char *type = qemu_plugin_mem_is_store(meminfo) ? "store" : "load";
+qemu_plugin_mem_value value = qemu_plugin_mem_get_value(meminfo);
+uint64_t hwaddr =
+qemu_plugin_hwaddr_phys_addr(qemu_plugin_get_hwaddr(meminfo,
vaddr));
+g_autoptr(GString) out = g_string_new("");
+g_string_printf(out,
+"0x%"PRIx64",%s,0x%"PRIx64",0x%"PRIx64",%d,%s,",
+insn_info->vaddr, insn_info->sym,
+vaddr, hwaddr, size, type);
+switch (value.type) {
+case QEMU_PLUGIN_MEM_VALUE_U8:
+g_string_append_printf(out, "0x%"PRIx8, value.data.u8);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U16:
+g_string_append_printf(out, "0x%"PRIx16, value.data.u16);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U32:
+g_string_append_printf(out, "0x%"PRIx32, value.data.u32);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U64:
+g_string_append_printf(out, "0x%"PRIx64, value.data.u64);
+break;
+case QEMU_PLUGIN_MEM_VALUE_U128:
+g_string_append_printf(out, "0x%.0"PRIx64"%"PRIx64,
+   value.data.u128.high, value.data.u128.low);
+break;
+default:
+g_assert_not_reached();
+}
+g_string_append_printf(out, "\n");
+qemu_plugin_outs(out->str);
+}
+
  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
  {
  size_t n = qemu_plugin_tb_n_insns(tb);
@@ -79,6 +122,16 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct
qemu_plugin_tb *tb)
   QEMU_PLUGIN_CB_NO_REGS,
   rw, NULL);
  }
+if (do_print_accesses) {
+/* we leak this pointer, to avoid locking to keep track of it */
+InsnInfo *insn_info = g_malloc(sizeof(InsnInfo));
+const char *sym = qemu_plugin_insn_symbol(insn);
+insn_info->sym = sym ? sym : "";
+insn_info->vaddr = qemu_plugin_insn_vaddr(insn);
+qemu_plugin_register_vcpu_mem_cb(insn, print_access,
+ QEMU_PLUGIN_CB_NO_REGS,
+ rw, (void *) insn_info);
+}
  }
  }

@@ -117,6 +170,12 @@ QEMU_PLUGIN_EXPORT int
qemu_plugin_install(qemu_plugin_id_t id,
  fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
  return -1;
  }
+} else if (g_strcmp0(tokens[0], "print-accesses") == 0) {
+if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
+&do_print_accesses)) {
+fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
+return -1;
+}
  } else {
  fprintf(stder

Re: [PATCH v4 6/7] tests/plugin/mem: add option to print memory accesses

2024-07-04 Thread Pierrick Bouvier


On 7/4/24 09:30, Richard Henderson wrote:

On 7/2/24 11:44, Pierrick Bouvier wrote:

+case QEMU_PLUGIN_MEM_VALUE_U128:
+g_string_append_printf(out, "0x%.0"PRIx64"%"PRIx64,
+   value.data.u128.high, value.data.u128.low);


PRIx64 does not pad.
You need %016"PRIx64 for the low value.



Oops, indeed. I'll output all values with fixed width.
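
The padding issue can be checked with a small standalone sketch. The two
helpers below are hypothetical and not part of the plugin; they format a
128-bit value passed as two 64-bit halves, once with each half zero-padded to
16 hex digits and once without padding, so the loss of the low half's leading
zeros becomes visible.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from tests/plugin/mem.c): fixed-width output.
 * %016 zero-pads each half to 16 hex digits, so the result is always
 * "0x" followed by 32 digits.
 */
static int format_u128_hex(char *buf, size_t len, uint64_t high, uint64_t low)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%016" PRIx64 "%016" PRIx64, high, low);
}

/*
 * Unpadded variant for comparison: leading zeros of the low half are
 * dropped, so high=1, low=2 is printed as if it were the value 0x12.
 */
static int format_u128_hex_unpadded(char *buf, size_t len,
                                    uint64_t high, uint64_t low)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%" PRIx64 "%" PRIx64, high, low);
}
```

With high=1 and low=2, the padded helper produces 34 characters, while the
unpadded one produces only 4, which is ambiguous for anyone parsing the log.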


Otherwise,
Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 00/10] target/s390x: pc-relative translation

2024-07-04 Thread Richard Henderson


Ping.  It rebases onto master just fine.

r~

On 6/5/24 14:57, Richard Henderson wrote:

v1: 20220906101747.344559-1-richard.hender...@linaro.org

A lot has changed in the 20 months since, including generic
cleanups and splitting out the PER fixes.


r~


Richard Henderson (10):
   target/s390x: Change help_goto_direct to work on displacements
   target/s390x: Introduce gen_psw_addr_disp
   target/s390x: Remove pc argument to pc_to_link_into
   target/s390x: Use gen_psw_addr_disp in pc_to_link_info
   target/s390x: Use gen_psw_addr_disp in save_link_info
   target/s390x: Use deposit in save_link_info
   target/s390x: Use gen_psw_addr_disp in op_sam
   target/s390x: Use ilen instead in branches
   target/s390x: Assert masking of psw.addr in cpu_get_tb_cpu_state
   target/s390x: Enable CF_PCREL

  target/s390x/cpu.c   |  23 +
  target/s390x/tcg/translate.c | 190 +--
  2 files changed, 138 insertions(+), 75 deletions(-)

Re: [PATCH v2 2/3] intel_iommu: make types match

2024-07-04 Thread Michael S. Tsirkin

On Thu, Jul 04, 2024 at 03:12:48PM +, CLEMENT MATHIEU--DRIF wrote:
> From: Clément Mathieu--Drif 
> 
> The 'level' field in vtd_iotlb_key is an unsigned integer.
> We don't need to store level as an int in vtd_lookup_iotlb.
> 
> VTDIOTLBPageInvInfo.mask is used in binary operations with addresses.

This last sentence is a bit opaque. Is there a bug? E.g.
can mask ever get so big that it does not fit in a u8?
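
To make the question concrete, here is a standalone sketch of what would
happen if a wide mask were stored in an 8-bit field. It assumes,
hypothetically, that the mask is built as ~((1 << am) - 1) from an address-mask
order and then ANDed with a 64-bit address; the helper names are illustrative
and do not come from intel_iommu.c, and whether the real code path computes the
mask this way is not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: build a mask that clears the low `am` bits of an address. */
static uint64_t make_mask(unsigned am)
{
    assert(am < 64); /* shifting by 64 or more would be undefined */
    return ~((UINT64_C(1) << am) - 1);
}

/* Store the mask in an 8-bit field first: the upper 56 bits are lost. */
static uint64_t apply_mask_u8(uint64_t addr, unsigned am)
{
    uint8_t mask = (uint8_t)make_mask(am);
    return addr & mask;
}

/* Keep the mask at full width: only the low `am` bits are cleared. */
static uint64_t apply_mask_u64(uint64_t addr, unsigned am)
{
    uint64_t mask = make_mask(am);
    return addr & mask;
}
```

Under that assumption, an 8-bit field silently drops every address bit above
bit 7, which is the kind of truncation the type change would avoid.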

> Signed-off-by: Clément Mathieu--Drif 
> ---
>  hw/i386/intel_iommu.c  | 2 +-
>  hw/i386/intel_iommu_internal.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 37c21a0aec..be0cb39b5c 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -358,7 +358,7 @@ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState 
> *s, uint16_t source_id,
>  {
>  struct vtd_iotlb_key key;
>  VTDIOTLBEntry *entry;
> -int level;
> +unsigned level;
>  
>  for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
>  key.gfn = vtd_get_iotlb_gfn(addr, level);
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index cbc4030031..5fcbe2744f 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -436,7 +436,7 @@ struct VTDIOTLBPageInvInfo {
>  uint16_t domain_id;
>  uint32_t pasid;
>  uint64_t addr;
> -uint8_t mask;
> +uint64_t mask;
>  };
>  typedef struct VTDIOTLBPageInvInfo VTDIOTLBPageInvInfo;
>  
> -- 
> 2.45.2

Re: [PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS

2024-07-04 Thread Richard Henderson


On 7/4/24 08:18, Richard Henderson wrote:

On 7/4/24 07:50, Ilya Leoshkevich wrote:

On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:

While looking into Zoltan's attempt to speed up ppc64 DCBZ
(data cache block set to zero), I wondered what AArch64 was
doing differently.  It turned out that Arm is the only user
of tlb_vaddr_to_host.

None of the code sequences in use between AArch64, Power64 and S390X
are 100% safe, with race conditions vs mmap et al, however, AArch64
is the only one that will fail this single threaded test case.  Use
of these new functions fixes the race condition as well, though I
have not yet touched the other guests.

I thought about exposing accel/tcg/user-retaddr.h for direct use
from the targets, but perhaps these wrappers are cleaner.  RFC?


r~


Richard Henderson (2):
   accel/tcg: Introduce memset_ra, memmove_ra
   target/arm: Use memset_ra, memmove_ra in helper-a64.c

  include/exec/cpu_ldst.h    | 40 
  accel/tcg/user-exec.c  | 22 +
  target/arm/tcg/helper-a64.c    | 10 ++--
  tests/tcg/multiarch/memset-fault.c | 77
++
  4 files changed, 144 insertions(+), 5 deletions(-)
  create mode 100644 tests/tcg/multiarch/memset-fault.c


This sounds good to me.

I haven't debugged it, but I wonder why s390x doesn't fail here.
For XC with src == dst, it does access_memset() -> do_access_memset()
-> memset() without setting the RA. And I don't think that anything
around it sets the RA either.


s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises 
the exception when it isn't.  In contrast, for user-only, tlb_vaddr_to_host *only* 
performs the guest -> host address mapping, i.e. (addr + guest_base).


I should clarify: probe_access_flags verifies that the page is mapped *at that moment*, 
but does not take the mmap_lock.  So the race is that the page can be unmapped by another 
thread after probe_access_flags and before the memset completes.



r~

Re: [PATCH v2 7/7] crypto: Add test suite for ECDSA algorithm

2024-07-04 Thread Michael S. Tsirkin

On Thu, Jul 04, 2024 at 11:11:20PM +0200, Giacomo Parmeggiani wrote:
> PING
> Hello Li He, maintainers,
> 
> Any chance to revive this thread? 
> The patch series no longer applies to latest QEMU and it would be a useful 
> feature to have.
> 
> BR
> Giacomo

Not sure why I didn't apply this.
Care to rebase?

-- 
MST

[PATCH RFC] virtio-balloon: make it spec compliant

2024-07-04 Thread Michael S. Tsirkin

Currently, if VIRTIO_BALLOON_F_FREE_PAGE_HINT is off but
VIRTIO_BALLOON_F_REPORTING is on, then the reporting vq
gets number 3 while spec says it's number 4.
It happens to work because the Linux virtio PCI driver
is *also* out of spec.

To fix:
1. add vq4 as per spec
2. to help out the buggy Linux driver, in the above configuration,
   also create vq3, and handle it exactly as we do vq4.
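
The numbering mismatch described above can be sketched as follows. This is an
illustration only: the enum and function names are hypothetical, and the sketch
assumes that queues 0..2 always exist, which the message does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Queue numbers as described above: the spec puts reporting at 4. */
enum {
    BALLOON_VQ_FREE_PAGE = 3,
    BALLOON_VQ_REPORTING = 4,
};

/* Spec-style numbering: the reporting index ignores other features. */
static int reporting_vq_fixed(bool has_free_page_hint)
{
    (void)has_free_page_hint;
    return BALLOON_VQ_REPORTING;
}

/*
 * Packed numbering (the out-of-spec behaviour): queues are numbered in
 * creation order, so a missing free-page-hint queue shifts reporting down.
 */
static int reporting_vq_packed(bool has_free_page_hint)
{
    int next = BALLOON_VQ_FREE_PAGE; /* assumption: queues 0..2 exist */

    if (has_free_page_hint) {
        next++; /* the hint queue takes slot 3 */
    }
    assert(next <= BALLOON_VQ_REPORTING);
    return next;
}
```

The two schemes agree when the free-page-hint feature is on and differ only
when it is off, which is exactly the configuration the patch works around.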

I think that some clever hack is doable to address the issue
for existing machine types (which would get it into users' hands
sooner), but I'm not 100% sure what, exactly.

This is a simpler, straightforward approach.

Reported-by: Xuan Zhuo 
Signed-off-by: Michael S. Tsirkin 
---

I don't think I'll stop here, I want to fix existing machine types,
but sending this here for comparison.
I'll send a Linux patch later.


 include/hw/virtio/virtio-balloon.h | 1 +
 hw/core/machine.c  | 1 +
 hw/virtio/virtio-balloon.c | 5 -
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 5139cf8ab6..d4426404ed 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+bool reporting_vq_is_vq3;
 uint32_t poison_val;
 };
 
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f4cba6496c..353a143b2b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -39,6 +39,7 @@ GlobalProperty hw_compat_9_0[] = {
 {"scsi-disk-base", "migrate-emulated-scsi-request", "false" },
 {"vfio-pci", "skip-vsc-check", "false" },
 { "virtio-pci", "x-pcie-pm-no-soft-reset", "off" },
+{ "virtio-balloon-device", "x-reporting-vq-is-vq3", "true" },
 };
 const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 50da34d6cc..4712bee521 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -892,7 +892,8 @@ static void virtio_balloon_device_realize(DeviceState *dev, 
Error **errp)
 
 if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
 /* Work around Linux driver which is buggy in this configuration */
-if (!virtio_has_feature(s->host_features, 
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+if (!s->reporting_vq_is_vq3 &&
+!virtio_has_feature(s->host_features, 
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, 32,
virtio_balloon_handle_report);
 }
@@ -1021,6 +1022,8 @@ static Property virtio_balloon_properties[] = {
  */
 DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
  qemu_4_0_config_size, false),
+DEFINE_PROP_BOOL("x-reporting-vq-is-vq3", VirtIOBalloon,
+ reporting_vq_is_vq3, false),
 DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
-- 
MST

Re: [PATCH v2 7/7] crypto: Add test suite for ECDSA algorithm

2024-07-04 Thread Giacomo Parmeggiani

PING
Hello Li He, maintainers,

Any chance to revive this thread? 
The patch series no longer applies to latest QEMU and it would be a useful 
feature to have.

BR
Giacomo

> On 22 Jun 2022, at 11:15, Lei He  wrote:
> 
> 1. add test suite for ecdsa algorithm.
> 2. use qcrypto_akcihper_max_xxx_len to help create buffers in
> 
> Signed-off-by: lei he 
> Reviewed-by: Daniel P. Berrangé 
> ---
> tests/unit/test-crypto-akcipher.c | 338 --
> 1 file changed, 323 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/unit/test-crypto-akcipher.c 
> b/tests/unit/test-crypto-akcipher.c
> index 4f1f4214dd..414387cfb4 100644
> --- a/tests/unit/test-crypto-akcipher.c
> +++ b/tests/unit/test-crypto-akcipher.c
> @@ -314,12 +314,117 @@ static const uint8_t rsa2048_public_key[] = {
>0xed, 0x02, 0x03, 0x01, 0x00, 0x01
> };
> 
> +static const uint8_t ecdsa_p192_public_key[] = {
> +0x04, 0xc4, 0x16, 0xb3, 0xff, 0xac, 0xd5, 0x87,
> +0x98, 0xf7, 0xd9, 0x45, 0xfe, 0xd3, 0x5c, 0x17,
> +0x9d, 0xb2, 0x36, 0x22, 0xcc, 0x07, 0xb3, 0x6d,
> +0x3c, 0x4e, 0x04, 0x5f, 0xeb, 0xb6, 0x52, 0x58,
> +0xfb, 0x36, 0x10, 0x52, 0xb7, 0x01, 0x62, 0x0e,
> +0x94, 0x51, 0x1d, 0xe2, 0xef, 0x10, 0x82, 0x88,
> +0x78,
> +};
> +
> +static const uint8_t ecdsa_p192_private_key[] = {
> +0x30, 0x53, 0x02, 0x01, 0x01, 0x04, 0x18, 0xcb,
> +0xc8, 0x86, 0x0e, 0x66, 0x3c, 0xf7, 0x5a, 0x44,
> +0x13, 0xb8, 0xef, 0xea, 0x1d, 0x7b, 0xa6, 0x1c,
> +0xda, 0xf4, 0x1b, 0xc7, 0x67, 0x6b, 0x35, 0xa1,
> +0x34, 0x03, 0x32, 0x00, 0x04, 0xc4, 0x16, 0xb3,
> +0xff, 0xac, 0xd5, 0x87, 0x98, 0xf7, 0xd9, 0x45,
> +0xfe, 0xd3, 0x5c, 0x17, 0x9d, 0xb2, 0x36, 0x22,
> +0xcc, 0x07, 0xb3, 0x6d, 0x3c, 0x4e, 0x04, 0x5f,
> +0xeb, 0xb6, 0x52, 0x58, 0xfb, 0x36, 0x10, 0x52,
> +0xb7, 0x01, 0x62, 0x0e, 0x94, 0x51, 0x1d, 0xe2,
> +0xef, 0x10, 0x82, 0x88, 0x78,
> +};
> +
> +static const uint8_t ecdsa_p256_private_key[] = {
> +0x30, 0x77, 0x02, 0x01, 0x01, 0x04, 0x20, 0xf6,
> +0x92, 0xdd, 0x29, 0x1c, 0x6e, 0xef, 0xb6, 0xb2,
> +0x73, 0x9f, 0x40, 0x1b, 0xb3, 0x2a, 0x28, 0xd2,
> +0x37, 0xd6, 0x4a, 0x5b, 0xe4, 0x40, 0x4c, 0x6a,
> +0x95, 0x99, 0xfa, 0xf7, 0x92, 0x49, 0xbe, 0xa0,
> +0x0a, 0x06, 0x08, 0x2a, 0x86, 0x48, 0xce, 0x3d,
> +0x03, 0x01, 0x07, 0xa1, 0x44, 0x03, 0x42, 0x00,
> +0x04, 0xed, 0x42, 0x9c, 0x67, 0x79, 0xbe, 0x46,
> +0x83, 0x88, 0x3e, 0x8c, 0xc1, 0x33, 0xf3, 0xc3,
> +0xf6, 0x2c, 0xf3, 0x13, 0x6a, 0x00, 0xc2, 0xc9,
> +0x3e, 0x87, 0x7f, 0x86, 0x39, 0xe6, 0xae, 0xe3,
> +0xb9, 0xba, 0x2f, 0x58, 0x63, 0x32, 0x62, 0x62,
> +0x54, 0x07, 0x27, 0xf9, 0x5a, 0x3a, 0xc7, 0x3a,
> +0x6b, 0x5b, 0xbc, 0x0d, 0x33, 0xba, 0xbb, 0xd4,
> +0xa3, 0xff, 0x4f, 0x9e, 0xdd, 0xf5, 0x59, 0xc0,
> +0xf6,
> +};
> +
> +static const uint8_t ecdsa_p256_public_key[] = {
> +0x04, 0xed, 0x42, 0x9c, 0x67, 0x79, 0xbe, 0x46,
> +0x83, 0x88, 0x3e, 0x8c, 0xc1, 0x33, 0xf3, 0xc3,
> +0xf6, 0x2c, 0xf3, 0x13, 0x6a, 0x00, 0xc2, 0xc9,
> +0x3e, 0x87, 0x7f, 0x86, 0x39, 0xe6, 0xae, 0xe3,
> +0xb9, 0xba, 0x2f, 0x58, 0x63, 0x32, 0x62, 0x62,
> +0x54, 0x07, 0x27, 0xf9, 0x5a, 0x3a, 0xc7, 0x3a,
> +0x6b, 0x5b, 0xbc, 0x0d, 0x33, 0xba, 0xbb, 0xd4,
> +0xa3, 0xff, 0x4f, 0x9e, 0xdd, 0xf5, 0x59, 0xc0,
> +0xf6,
> +};
> +
> +static const uint8_t ecdsa_p384_public_key[] = {
> +0x04, 0xab, 0xd5, 0xf8, 0x87, 0x1d, 0x23, 0x9b,
> +0x26, 0xb9, 0x57, 0x7e, 0x97, 0x78, 0x10, 0xcd,
> +0x13, 0xe3, 0x98, 0x25, 0xa8, 0xd6, 0xab, 0x66,
> +0x35, 0x26, 0x68, 0x8a, 0x0e, 0x49, 0xd9, 0x4a,
> +0x91, 0x7d, 0x6c, 0x94, 0x06, 0x06, 0x99, 0xf1,
> +0x8d, 0x2a, 0x25, 0x8d, 0xf9, 0xbf, 0x40, 0xfa,
> +0xb7, 0xcb, 0xe1, 0x14, 0x22, 0x0a, 0xa7, 0xfb,
> +0x0a, 0xb4, 0x02, 0x05, 0x8b, 0x98, 0xaa, 0x78,
> +0xcd, 0x53, 0x00, 0x1e, 0xd1, 0x79, 0x6a, 0x5f,
> +0x09, 0x01, 0x88, 0xb4, 0xbc, 0x32, 0x62, 0x83,
> +0x92, 0x84, 0x2d, 0xc6, 0xf8, 0xda, 0xc4, 0x7f,
> +0x10, 0xa3, 0x18, 0x1d, 0xae, 0x0d, 0xa4, 0x41,
> +0x9f,
> +};
> +
> +static const uint8_t ecdsa_p384_private_key[] = {
> +0x30, 0x81, 0x9b, 0x02, 0x01, 0x01, 0x04, 0x30,
> +0xb6, 0x04, 0xef, 0xb1, 0x2c, 0x98, 0xdf, 0xcf,
> +0xd4, 0x16, 0x31, 0xd4, 0x69, 0x0c, 0x27, 0x81,
> +0x4a, 0xac, 0x1a, 0x83, 0x3c, 0xe4, 0xef, 0x65,
> +0xe1, 0x7a, 0x6a, 0xc6, 0xd6, 0xf7, 0xea, 0x79,
> +0xbe, 0xf1, 0x00, 0x3c, 0xdf, 0x6e, 0x9d, 0x10,
> +0x22, 0x61, 0x1b, 0x11, 0xcf, 0x49, 0x6e, 0x62,
> +0xa1, 0x64, 0x03, 0x62, 0x00, 0x04, 0xab, 0xd5,
> +0xf8, 0x87, 0x1d, 0x23, 0x9b, 0x26, 0xb9, 0x57,
> +0x7e, 0x97, 0x78, 0x10, 0xcd, 0x13, 0xe3, 0x98,
> +0x25, 0xa8, 0xd6, 0xab, 0x66, 0x35, 0x26, 0x68,
> +0x8a, 0x0e, 0x49, 0xd9, 0x4a, 0x91, 0x7d, 0x6c,
> +0x94, 0x06, 0x06, 0x99, 0xf1, 0x8d, 0x2a, 0x25,
> +0x8d, 0xf9, 0xbf, 0x40, 0xfa, 0xb7, 0xcb, 0xe1,
> +0x14, 0x22, 0x0a, 0xa7, 0xfb, 0x0a, 0xb4, 0x02,
> +0x05, 0x8b, 0x98, 0xaa, 0x78, 0xcd, 0x53, 0x00,
>

Re: [PATCH] hw/ufs: Fix mcq register range determination logic

2024-07-04 Thread Minwoo Im

On 24-07-03 17:54:10, Jeuk Kim wrote:
> The function ufs_is_mcq_reg() only evaluated the range of the
> mcq_op_reg offset, which is defined as a constant.
> Therefore, it was possible for ufs_is_mcq_reg() to return true
> even when the UFS device is configured not to support MCQ.
> This could cause ufs_mmio_read()/ufs_mmio_write() to overflow the
> buffer. So fix it.
> 
> Fixes: 5c079578d2e4 ("hw/ufs: Add support MCQ of UFSHCI 4.0")
> Signed-off-by: Jeuk Kim 

Reviewed-by: Minwoo Im

Re: [PATCH 2/2] hw/isa/vt82c686.c: Embed i8259 irq in device state instead of allocating

2024-07-04 Thread Bernhard Beschow




On 3 July 2024 11:13:08 UTC, BALATON Zoltan wrote:
>On Wed, 3 Jul 2024, Bernhard Beschow wrote:
>> On 3 July 2024 00:09:45 UTC, BALATON Zoltan wrote:
>>> On Tue, 2 Jul 2024, Bernhard Beschow wrote:
 On 2 July 2024 18:42:23 UTC, Bernhard Beschow wrote:
> On 1 July 2024 12:58:15 UTC, Peter Maydell wrote:
>> On Sat, 29 Jun 2024 at 21:01, BALATON Zoltan  wrote:
>>> 
>>> To avoid a warning about unfreed qemu_irq embed the i8259 irq in the
>>> device state instead of allocating it.
>>> 
>>> Signed-off-by: BALATON Zoltan 
>>> ---
>>>  hw/isa/vt82c686.c | 7 ---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>>> index 8582ac0322..834051abeb 100644
>>> --- a/hw/isa/vt82c686.c
>>> +++ b/hw/isa/vt82c686.c
>>> @@ -592,6 +592,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(ViaISAState, VIA_ISA)
>>> 
>>>  struct ViaISAState {
>>>  PCIDevice dev;
>>> +
>>> +IRQState i8259_irq;
>>>  qemu_irq cpu_intr;
>>>  qemu_irq *isa_irqs_in;
>>>  uint16_t irq_state[ISA_NUM_IRQS];
>>> @@ -715,13 +717,12 @@ static void via_isa_realize(PCIDevice *d, Error 
>>> **errp)
>>>  ViaISAState *s = VIA_ISA(d);
>>>  DeviceState *dev = DEVICE(d);
>>>  PCIBus *pci_bus = pci_get_bus(d);
>>> -qemu_irq *isa_irq;
>>>  ISABus *isa_bus;
>>>  int i;
>>> 
>>>  qdev_init_gpio_out(dev, &s->cpu_intr, 1);
>>>  qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
>>> -isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
>>> +qemu_init_irq(&s->i8259_irq, via_isa_request_i8259_irq, s, 0);
>>>  isa_bus = isa_bus_new(dev, pci_address_space(d), 
>>> pci_address_space_io(d),
>>>errp);
>> 
>> So if I understand correctly, this IRQ line isn't visible
>> from outside this chip,
> 
> Actually it is, in the form of the INTR pin. Assuming similar naming
>>> 
>>> The INTR pin corresponds to qemu_irq cpu_intr not the i8259_irq.
>>> 
> conventions in vt82xx and piix, one can confirm this by consulting the 
> piix4 datasheet, "Figure 5. Interrupt Controller Block Diagram". 
> Moreover, the pegasos2 schematics (linked in the QEMU documentation) 
> suggest that this pin is actually used there, although not modeled in 
> QEMU.
 
 Well, QEMU does actually wire the intr pin in the pegasos2 board code, 
 except that it isn't a named gpio like in piix4. If we allow this pin to
>>> 
>>> I could make that named to make it clearer, now it's the only output gpio 
>>> so did not name it as usually devices that only have one output don't use 
>>> named gpios for that.
>>> 
 be wired before the south bridge's realize we might be able to eliminate 
 the "intermediate irq forwarder" as Phil used to name it, resulting in 
 less and more efficient code. This solution would basically follow the 
 pattern I outlined under below link.
>>> 
>>> I think the problem here is that i8259 does not provide an output gpio for 
>>> this interrupt that the VT82xx could pass on but instead i8259_init() needs 
>>> a qemu_irq to be passed that the i8259 model will set. This seems to be a 
>>> legacy init function so the fix may be to Qdev-ify i8259 and add an output 
>>> irq to it then its users could instantiate and connect its IRQs as usual 
>>> and we don't need to create a qemu_irq to pass it to i8259_init().
>> 
>> I've implemented the approach avoiding the intermediate IRQ forwarders here: 
>> https://github.com/shentok/qemu/commits/upstream/vt82c686-irq/ . I'd send 
>> this series to the list as soon as I resolve some email authentication 
>> issues.
>
>This connects the gpio out before the device is realized. I don't think that's 
>the right fix and confuses all the users of this device as they will need to 
>remember to do this. I think the current interrupt forwarder is OK until i8259 
>is Qdev-ified and solves this within the device. I'm OK with the patch that 
>makes intr named if you can submit just that.

I've now sent a series with naming the gpio as the first patch: 
https://lore.kernel.org/qemu-devel/20240704205854.18537-1-shen...@gmail.com/

Best regards,
Bernhard

>
>Regards,
>BALATON Zoltan

[Stable-9.0.2 07/22] accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded

2024-07-04 Thread Michael Tokarev

From: Anton Johansson 

For TBs crossing page boundaries, the 2nd page will never be
recorded/removed, as the index of the 2nd page is computed from the
address of the 1st page. This is due to a typo, fix it.
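
The effect of the typo can be reproduced in isolation. The sketch below is not
the QEMU code: PAGE_BITS, NO_PAGE and the function names are assumptions chosen
for illustration (4 KiB pages, and an all-ones value standing in for the -1
"no second page" marker).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12            /* assumption: 4 KiB pages */
#define NO_PAGE   UINT64_MAX    /* stands in for the -1 marker */

/* With the typo, both indexes come from paddr0 and can never differ. */
static bool second_page_seen_buggy(uint64_t paddr0, uint64_t paddr1)
{
    uint64_t pindex0 = paddr0 >> PAGE_BITS;
    uint64_t pindex1 = paddr0 >> PAGE_BITS; /* typo: should be paddr1 */

    assert(paddr0 != NO_PAGE);
    return paddr1 != NO_PAGE && pindex0 != pindex1;
}

/* With the fix, a block that crosses a page boundary is detected. */
static bool second_page_seen_fixed(uint64_t paddr0, uint64_t paddr1)
{
    uint64_t pindex0 = paddr0 >> PAGE_BITS;
    uint64_t pindex1 = paddr1 >> PAGE_BITS;

    assert(paddr0 != NO_PAGE);
    return paddr1 != NO_PAGE && pindex0 != pindex1;
}
```

For a block starting at 0x1ff0 whose second page starts at 0x2000, the buggy
variant reports no second page, so it would never be recorded or removed.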

Cc: qemu-sta...@nongnu.org
Fixes: deba78709a ("accel/tcg: Always lock pages before translation")
Signed-off-by: Anton Johansson 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Message-Id: <20240612133031.15298-1-a...@rev.ng>
Signed-off-by: Richard Henderson 
(cherry picked from commit 3b279f73fa37bec8d3ba04a15f5153d6491cffaf)
Signed-off-by: Michael Tokarev 

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index da39a43bd8..653397eca3 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -712,7 +712,7 @@ static void tb_record(TranslationBlock *tb)
 tb_page_addr_t paddr0 = tb_page_addr0(tb);
 tb_page_addr_t paddr1 = tb_page_addr1(tb);
 tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS;
-tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS;
+tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS;
 
 assert(paddr0 != -1);
 if (unlikely(paddr1 != -1) && pindex0 != pindex1) {
@@ -744,7 +744,7 @@ static void tb_remove(TranslationBlock *tb)
 tb_page_addr_t paddr0 = tb_page_addr0(tb);
 tb_page_addr_t paddr1 = tb_page_addr1(tb);
 tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS;
-tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS;
+tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS;
 
 assert(paddr0 != -1);
 if (unlikely(paddr1 != -1) && pindex0 != pindex1) {
-- 
2.39.2

[Stable-9.0.2 18/22] qcow2: Don't open data_file with BDRV_O_NO_IO

2024-07-04 Thread Michael Tokarev

From: Kevin Wolf 

One use case for 'qemu-img info' is verifying that untrusted images
don't reference an unwanted external file, be it as a backing file or an
external data file. To make sure that calling 'qemu-img info' can't
already have undesired side effects with a malicious image, just don't
open the data file at all with BDRV_O_NO_IO. If nothing ever tries to do
I/O, we don't need to have it open.

This changes the output of iotests case 061, which used 'qemu-img info'
to show that opening an image with an invalid data file fails. After
this patch, it succeeds. Replace this part of the test with a qemu-io
call, but keep the final 'qemu-img info' to show that the invalid data
file is correctly displayed in the output.

Fixes: CVE-2024-4467
Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hanna Czenczek 
(cherry picked from commit bd385a5298d7062668e804d73944d52aec9549f1)
Signed-off-by: Michael Tokarev 

diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..4c78665bcb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1636,7 +1636,22 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int 
flags,
 goto fail;
 }
 
-if (open_data_file) {
+if (open_data_file && (flags & BDRV_O_NO_IO)) {
+/*
+ * Don't open the data file for 'qemu-img info' so that it can be used
+ * to verify that an untrusted qcow2 image doesn't refer to external
+ * files.
+ *
+ * Note: This still makes has_data_file() return true.
+ */
+if (s->incompatible_features & QCOW2_INCOMPAT_DATA_FILE) {
+s->data_file = NULL;
+} else {
+s->data_file = bs->file;
+}
+qdict_extract_subqdict(options, NULL, "data-file.");
+qdict_del(options, "data-file");
+} else if (open_data_file) {
 /* Open external data file */
 bdrv_graph_co_rdunlock();
 s->data_file = bdrv_co_open_child(NULL, options, "data-file", bs,
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 53c7d428e3..b71ac097d1 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -326,12 +326,14 @@ $QEMU_IMG amend -o "data_file=foo" "$TEST_IMG"
 echo
 _make_test_img -o "compat=1.1,data_file=$TEST_IMG.data" 64M
 $QEMU_IMG amend -o "data_file=foo" "$TEST_IMG"
-_img_info --format-specific
+$QEMU_IO -c "read 0 4k" "$TEST_IMG" 2>&1 | _filter_testdir | _filter_imgfmt
+$QEMU_IO -c "open -o 
data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" -c "read 0 4k" | 
_filter_qemu_io
 TEST_IMG="data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" _img_info 
--format-specific --image-opts
 
 echo
 $QEMU_IMG amend -o "data_file=" --image-opts 
"data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG"
-_img_info --format-specific
+$QEMU_IO -c "read 0 4k" "$TEST_IMG" 2>&1 | _filter_testdir | _filter_imgfmt
+$QEMU_IO -c "open -o 
data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" -c "read 0 4k" | 
_filter_qemu_io
 TEST_IMG="data-file.filename=$TEST_IMG.data,file.filename=$TEST_IMG" _img_info 
--format-specific --image-opts
 
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 139fc68177..24c33add7c 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -545,7 +545,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 qemu-img: data-file can only be set for images that use an external data file
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
data_file=TEST_DIR/t.IMGFMT.data
-qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Could not open 'foo': No such 
file or directory
+qemu-io: can't open device TEST_DIR/t.IMGFMT: Could not open 'foo': No such 
file or directory
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
@@ -560,7 +562,9 @@ Format specific information:
 corrupt: false
 extended l2: false
 
-qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this 
image
+qemu-io: can't open device TEST_DIR/t.IMGFMT: 'data-file' is required for this 
image
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 64 MiB (67108864 bytes)
-- 
2.39.2

[Stable-9.0.2 17/22] tests: add testing of parameter=1 for SMP topology

2024-07-04 Thread Michael Tokarev

From: Daniel P. Berrangé 

Validate that it is possible to pass 'parameter=1' for any SMP topology
parameter, since unsupported parameters are implicitly considered to
always have a value of 1.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Zhao Liu 
Reviewed-by: Ján Tomko 
Message-ID: <20240513123358.612355-3-berra...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
(cherry picked from commit e68dcbb07923df0886802727edc3b21a10b0d342)
Signed-off-by: Michael Tokarev 

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 56165e6644..9fdba24fce 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -330,6 +330,14 @@ static const struct SMPTestData data_generic_valid[] = {
 .config = SMP_CONFIG_GENERIC(T, 8, T, 2, T, 4, T, 2, T, 16),
 .expect_prefer_sockets = CPU_TOPOLOGY_GENERIC(8, 2, 4, 2, 16),
 .expect_prefer_cores   = CPU_TOPOLOGY_GENERIC(8, 2, 4, 2, 16),
+}, {
+/*
+ * Unsupported parameters are always allowed to be set to '1'
+ * config: -smp 
8,books=1,drawers=1,sockets=2,modules=1,dies=1,cores=2,threads=2,maxcpus=8
+ * expect: cpus=8,sockets=2,cores=2,threads=2,maxcpus=8 */
+.config = SMP_CONFIG_WITH_FULL_TOPO(8, 1, 1, 2, 1, 1, 2, 2, 8),
+.expect_prefer_sockets = CPU_TOPOLOGY_GENERIC(8, 2, 2, 2, 8),
+.expect_prefer_cores   = CPU_TOPOLOGY_GENERIC(8, 2, 2, 2, 8),
 },
 };
 
-- 
2.39.2

[Stable-9.0.2 05/22] hw/audio/virtio-snd: Always use little endian audio format

2024-07-04 Thread Michael Tokarev

From: Philippe Mathieu-Daudé 

The VIRTIO Sound Device conforms to the Virtio spec v1.2,
and thus only uses little endianness.

Remove the suspicious target_words_bigendian() noticed during
code review.

Cc: qemu-sta...@nongnu.org
Fixes: eb9ad377bb ("virtio-sound: handle control messages and streams")
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
Message-Id: <20240422211830.25606-1-phi...@linaro.org>
(cherry picked from commit a276ec8e2632c9015d0f9b4e47194e4e91dfa8bb)
Signed-off-by: Michael Tokarev 

diff --git a/hw/audio/virtio-snd.c b/hw/audio/virtio-snd.c
index c80b58bf5d..4a56c00ec9 100644
--- a/hw/audio/virtio-snd.c
+++ b/hw/audio/virtio-snd.c
@@ -401,7 +401,7 @@ static void virtio_snd_get_qemu_audsettings(audsettings *as,
 as->nchannels = MIN(AUDIO_MAX_CHANNELS, params->channels);
 as->fmt = virtio_snd_get_qemu_format(params->format);
 as->freq = virtio_snd_get_qemu_freq(params->rate);
-as->endianness = target_words_bigendian() ? 1 : 0;
+as->endianness = 0; /* Conforming to VIRTIO 1.0: always little endian. */
 }

 /*
-- 
2.39.2

[Stable-9.0.2 03/22] ui/gtk: Draw guest frame at refresh cycle

2024-07-04 Thread Michael Tokarev

From: Dongwon Kim 

Draw routine needs to be manually invoked in the next refresh
if there is a scanout blob from the guest. This is to prevent
a situation where there is a scheduled draw event but it won't
happen because the window is currently in an inactive state
(minimized or tabified). If draw is not done for a long time,
gl_block timeout and/or fence timeout (on the guest) will happen
eventually.

v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c

Suggested-by: Vivek Kasireddy 
Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Daniel P. Berrangé 
Signed-off-by: Dongwon Kim 
Acked-by: Marc-André Lureau 
Message-Id: <20240426225059.3871283-1-dongwon@intel.com>
(cherry picked from commit 77bf310084dad38b3a2badf01766c659056f1cf2)
Signed-off-by: Michael Tokarev 

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 955234429d..bceeeb0352 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -150,6 +150,7 @@ void gd_egl_refresh(DisplayChangeListener *dcl)
 vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
 if (vc->gfx.guest_fb.dmabuf && vc->gfx.guest_fb.dmabuf->draw_submitted) {
+gd_egl_draw(vc);
 return;
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 7fffd0544e..b490727402 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -126,6 +126,7 @@ void gd_gl_area_refresh(DisplayChangeListener *dcl)
 gd_update_monitor_refresh_rate(vc, vc->window ? vc->window : vc->gfx.drawing_area);
 
 if (vc->gfx.guest_fb.dmabuf && vc->gfx.guest_fb.dmabuf->draw_submitted) {
+gd_gl_area_draw(vc);
 return;
 }
 
-- 
2.39.2

[Stable-9.0.2 15/22] target/arm: Fix FJCVTZS vs flush-to-zero

2024-07-04 Thread Michael Tokarev

From: Richard Henderson 

Input denormals cause the JavaScript inexact bit
(output to Z) to be set.

Cc: qemu-sta...@nongnu.org
Fixes: 6c1f6f2733a ("target/arm: Implement ARMv8.3-JSConv")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2375
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240625183536.1672454-4-richard.hender...@linaro.org
[PMM: fixed hardcoded tab in test case]
Signed-off-by: Peter Maydell 
(cherry picked from commit 7619129f0d4a14d918227c5c47ad7433662e9ccc)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 3e5e37abbe..ff59bc5522 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -1121,8 +1121,8 @@ const FloatRoundMode arm_rmode_to_sf_map[] = {
 uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
 {
 float_status *status = vstatus;
-uint32_t inexact, frac;
-uint32_t e_old, e_new;
+uint32_t frac, e_old, e_new;
+bool inexact;
 
 e_old = get_float_exception_flags(status);
 set_float_exception_flags(0, status);
@@ -1130,13 +1130,13 @@ uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
 e_new = get_float_exception_flags(status);
 set_float_exception_flags(e_old | e_new, status);
 
-if (value == float64_chs(float64_zero)) {
-/* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
-inexact = 1;
-} else {
-/* Normal inexact or overflow or NaN */
-inexact = e_new & (float_flag_inexact | float_flag_invalid);
-}
+/* Normal inexact, denormal with flush-to-zero, or overflow or NaN */
+inexact = e_new & (float_flag_inexact |
+   float_flag_input_denormal |
+   float_flag_invalid);
+
+/* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
+inexact |= value == float64_chs(float64_zero);
 
 /* Pack the result and the env->ZF representation of Z together.  */
 return deposit64(frac, 32, 32, inexact);
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 70d728ae9a..4ecbca6a41 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -41,8 +41,9 @@ endif
 
 # Pauth Tests
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
-AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5
+AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5 test-2375
 pauth-%: CFLAGS += -march=armv8.3-a
+test-2375: CFLAGS += -march=armv8.3-a
 run-pauth-1: QEMU_OPTS += -cpu max
 run-pauth-2: QEMU_OPTS += -cpu max
 # Choose a cpu with FEAT_Pauth but without FEAT_FPAC for pauth-[45].
diff --git a/tests/tcg/aarch64/test-2375.c b/tests/tcg/aarch64/test-2375.c
new file mode 100644
index 0000000000..84c7e7de71
--- /dev/null
+++ b/tests/tcg/aarch64/test-2375.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) 2024 Linaro Ltd */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2375 */
+
+#include <assert.h>
+
+int main(void)
+{
+   int r, z;
+
+   asm("msr fpcr, %2\n\t"
+   "fjcvtzs %w0, %d3\n\t"
+   "cset %1, eq"
+   : "=r"(r), "=r"(z)
+   : "r"(0x01000000L),  /* FZ = 1 */
+ "w"(0xfcff00L));   /* denormal */
+
+assert(r == 0);
+assert(z == 0);
+return 0;
+}
-- 
2.39.2
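
The flag logic of the FJCVTZS fix above can be sketched in isolation. This is
a minimal model, not QEMU's helper: the flag bit values and the name
js_inexact are made up for illustration, and only the combination rule
(inexact, input denormal, invalid, plus the -0.0 special case) is taken from
the patch.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for softfloat exception flags (values are assumptions). */
enum {
    FLAG_INVALID        = 1u << 0,
    FLAG_INEXACT        = 1u << 1,
    FLAG_INPUT_DENORMAL = 1u << 2,
};

/*
 * Hypothetical helper: decide whether a conversion counts as inexact for
 * JavaScript, given the exception flags raised by the conversion and the
 * original input value.
 */
static bool js_inexact(uint32_t new_flags, double value)
{
    /* Normal inexact, denormal input flushed to zero, or overflow/NaN. */
    bool inexact = (new_flags & (FLAG_INEXACT |
                                 FLAG_INPUT_DENORMAL |
                                 FLAG_INVALID)) != 0;

    /* -0.0 is exact for IEEE FP but inexact for JavaScript. */
    inexact |= (value == 0.0 && signbit(value));

    return inexact;
}
```

With this rule a flushed denormal input is reported as inexact even though the
integer result is 0, which is the case the new test-2375 checks.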

[Stable-9.0.2 08/22] linux-user: Make TARGET_NR_setgroups affect only the current thread

2024-07-04 Thread Michael Tokarev

From: Ilya Leoshkevich 

Like TARGET_NR_setuid, TARGET_NR_setgroups should affect only the
calling thread, and not the entire process. Therefore, implement it
using a syscall, and not a libc call.

Cc: qemu-sta...@nongnu.org
Fixes: 19b84f3c35d7 ("added setgroups and getgroups syscalls")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20240614154710.1078766-1-...@linux.ibm.com>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
(cherry picked from commit 54b27921026df384f67df86f04c39539df375c60)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 59fb3e911f..2edbd1ef15 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7210,11 +7210,17 @@ static inline int tswapid(int id)
 #else
 #define __NR_sys_setresgid __NR_setresgid
 #endif
+#ifdef __NR_setgroups32
+#define __NR_sys_setgroups __NR_setgroups32
+#else
+#define __NR_sys_setgroups __NR_setgroups
+#endif
 
 _syscall1(int, sys_setuid, uid_t, uid)
 _syscall1(int, sys_setgid, gid_t, gid)
 _syscall3(int, sys_setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 _syscall3(int, sys_setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
+_syscall2(int, sys_setgroups, int, size, gid_t *, grouplist)
 
 void syscall_init(void)
 {
@@ -11892,7 +11898,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 unlock_user(target_grouplist, arg2,
 gidsetsize * sizeof(target_id));
 }
-return get_errno(setgroups(gidsetsize, grouplist));
+return get_errno(sys_setgroups(gidsetsize, grouplist));
 }
 case TARGET_NR_fchown:
 return get_errno(fchown(arg1, low2highuid(arg2), low2highgid(arg3)));
@@ -12228,7 +12234,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 }
 unlock_user(target_grouplist, arg2, 0);
 }
-return get_errno(setgroups(gidsetsize, grouplist));
+return get_errno(sys_setgroups(gidsetsize, grouplist));
 }
 #endif
 #ifdef TARGET_NR_fchown32
-- 
2.39.2

[Stable-9.0.2 12/22] tests: Update our CI to use CentOS Stream 9 instead of 8

2024-07-04 Thread Michael Tokarev

From: Thomas Huth 

RHEL 9 (and thus also its derivatives) has been available for two
years now, so according to QEMU's support policy, we can drop the active
support for the previous major version 8 now.

Another reason for doing this is that CentOS Stream 8 will go EOL soon:

https://blog.centos.org/2023/04/end-dates-are-coming-for-centos-stream-8-and-centos-linux-7/

  "After May 31, 2024, CentOS Stream 8 will be archived
   and no further updates will be provided."

Thus upgrade our CentOS Stream container to major version 9 now.

Reviewed-by: Daniel P. Berrangé 
Message-ID: <20240418101056.302103-5-th...@redhat.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 641b1efe01b2dd6e7ac92f23d392dcee73508746)
Signed-off-by: Michael Tokarev 

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 92e65bb78e..8440bc8ef6 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -158,9 +158,9 @@ build-system-centos:
 - .native_build_job_template
 - .native_build_artifact_template
   needs:
-job: amd64-centos8-container
+job: amd64-centos9-container
   variables:
-IMAGE: centos8
+IMAGE: centos9
 CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-vfio-user-server
   --enable-modules --enable-trace-backends=dtrace --enable-docs
 TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
@@ -242,7 +242,7 @@ check-system-centos:
 - job: build-system-centos
   artifacts: true
   variables:
-IMAGE: centos8
+IMAGE: centos9
 MAKE_CHECK_ARGS: check
 
 avocado-system-centos:
@@ -251,7 +251,7 @@ avocado-system-centos:
 - job: build-system-centos
   artifacts: true
   variables:
-IMAGE: centos8
+IMAGE: centos9
 MAKE_CHECK_ARGS: check-avocado
 AVOCADO_TAGS: arch:ppc64 arch:or1k arch:s390x arch:x86_64 arch:rx
   arch:sh4 arch:nios2
@@ -327,9 +327,9 @@ avocado-system-flaky:
 build-tcg-disabled:
   extends: .native_build_job_template
   needs:
-job: amd64-centos8-container
+job: amd64-centos9-container
   variables:
-IMAGE: centos8
+IMAGE: centos9
   script:
 - mkdir build
 - cd build
@@ -654,9 +654,9 @@ build-tci:
 build-without-defaults:
   extends: .native_build_job_template
   needs:
-job: amd64-centos8-container
+job: amd64-centos9-container
   variables:
-IMAGE: centos8
+IMAGE: centos9
 CONFIGURE_ARGS:
   --without-default-devices
   --without-default-features
diff --git a/.gitlab-ci.d/container-core.yml b/.gitlab-ci.d/container-core.yml
index 08f8450fa1..5459447676 100644
--- a/.gitlab-ci.d/container-core.yml
+++ b/.gitlab-ci.d/container-core.yml
@@ -1,10 +1,10 @@
 include:
   - local: '/.gitlab-ci.d/container-template.yml'
 
-amd64-centos8-container:
+amd64-centos9-container:
   extends: .container_job_template
   variables:
-NAME: centos8
+NAME: centos9
 
 amd64-fedora-container:
   extends: .container_job_template
diff --git a/tests/docker/dockerfiles/centos8.docker b/tests/docker/dockerfiles/centos9.docker
similarity index 82%
rename from tests/docker/dockerfiles/centos8.docker
rename to tests/docker/dockerfiles/centos9.docker
index d97c30e96a..9fc9b27eb7 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos9.docker
@@ -1,15 +1,14 @@
 # THIS FILE WAS AUTO-GENERATED
 #
-#  $ lcitool dockerfile --layers all centos-stream-8 qemu
+#  $ lcitool dockerfile --layers all centos-stream-9 qemu
 #
 # https://gitlab.com/libvirt/libvirt-ci
 
-FROM quay.io/centos/centos:stream8
+FROM quay.io/centos/centos:stream9
 
 RUN dnf distro-sync -y && \
 dnf install 'dnf-command(config-manager)' -y && \
-dnf config-manager --set-enabled -y powertools && \
-dnf install -y centos-release-advanced-virtualization && \
+dnf config-manager --set-enabled -y crb && \
 dnf install -y epel-release && \
 dnf install -y epel-next-release && \
 dnf install -y \
@@ -42,7 +41,6 @@ RUN dnf distro-sync -y && \
 glib2-static \
 glibc-langpack-en \
 glibc-static \
-glusterfs-api-devel \
 gnutls-devel \
 gtk3-devel \
 hostname \
@@ -82,6 +80,7 @@ RUN dnf distro-sync -y && \
 lzo-devel \
 make \
 mesa-libgbm-devel \
+meson \
 mtools \
 ncurses-devel \
 nettle-devel \
@@ -95,25 +94,25 @@ RUN dnf distro-sync -y && \
 pixman-devel \
 pkgconfig \
 pulseaudio-libs-devel \
-python38 \
-python38-PyYAML \
-python38-numpy \
-python38-pip \
-python38-setuptools \
-python38-wheel \
+python3 \
+python3-PyYAML \
+python3-numpy \
+python3-pillow \
+python3-pip \
+python3-sphinx \
+python3-sphinx_rtd_theme \
+python3-tomli \
 rdma-core-devel \
 sed \
 snappy-devel \
 socat \
 spice-protocol \
-spice-server-devel \
 swtpm \

[Stable-9.0.2 19/22] iotests/244: Don't store data-file with protocol in image

2024-07-04 Thread Michael Tokarev

From: Kevin Wolf 

We want to disable filename parsing for data files because it's too easy
to abuse in malicious image files. Make the test ready for the change by
passing the data file explicitly in command line options.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hanna Czenczek 
(cherry picked from commit 2eb42a728d27a43fdcad5f37d3f65706ce6deba5)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/244 b/tests/qemu-iotests/244
index 3e61fa25bb..bb9cc6512f 100755
--- a/tests/qemu-iotests/244
+++ b/tests/qemu-iotests/244
@@ -215,9 +215,22 @@ $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
 $QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
 
 # blkdebug doesn't support copy offloading, so this tests the error path
-$QEMU_IMG amend -f $IMGFMT -o "data_file=blkdebug::$TEST_IMG.data" "$TEST_IMG"
-$QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
-$QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
+test_img_with_blkdebug="json:{
+'driver': 'qcow2',
+'file': {
+'driver': 'file',
+'filename': '$TEST_IMG'
+},
+'data-file': {
+'driver': 'blkdebug',
+'image': {
+'driver': 'file',
+'filename': '$TEST_IMG.data'
+}
+}
+}"
+$QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$test_img_with_blkdebug"
+$QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$test_img_with_blkdebug"
 
 echo
 echo "=== Flushing should flush the data file ==="
-- 
2.39.2

[Stable-9.0.2 21/22] block: Parse filenames only when explicitly requested

2024-07-04 Thread Michael Tokarev

From: Kevin Wolf 

When handling image filenames from legacy options such as -drive or from
tools, these filenames are parsed for protocol prefixes, including for
the json:{} pseudo-protocol.

This behaviour is intended for filenames that come directly from the
command line and for backing files, which may come from the image file
itself. Higher level management tools generally take care to verify that
untrusted images don't contain a bad (or any) backing file reference;
'qemu-img info' is a suitable tool for this.

However, for other files that can be referenced in images, such as
qcow2 data files or VMDK extents, the string from the image file is
usually not verified by management tools - and 'qemu-img info' wouldn't
be suitable because in contrast to backing files, it already opens these
other referenced files. So here the string should be interpreted as a
literal local filename. More complex configurations need to be specified
explicitly on the command line or in QMP.

This patch changes bdrv_open_inherit() so that it only parses filenames
if a new parameter parse_filename is true. It is set for the top level
in bdrv_open(), for the file child and for the backing file child. All
other callers pass false and disable filename parsing this way.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hanna Czenczek 
(cherry picked from commit 7ead946998610657d38d1a505d5f25300d4ca613)
Signed-off-by: Michael Tokarev 

diff --git a/block.c b/block.c
index 468cf5e67d..50bdd197b7 100644
--- a/block.c
+++ b/block.c
@@ -86,6 +86,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
BlockDriverState *parent,
const BdrvChildClass *child_class,
BdrvChildRole child_role,
+   bool parse_filename,
Error **errp);
 
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
@@ -2058,7 +2059,8 @@ static void parse_json_protocol(QDict *options, const char **pfilename,
  * block driver has been specified explicitly.
  */
 static int bdrv_fill_options(QDict **options, const char *filename,
- int *flags, Error **errp)
+ int *flags, bool allow_parse_filename,
+ Error **errp)
 {
 const char *drvname;
 bool protocol = *flags & BDRV_O_PROTOCOL;
@@ -2100,7 +2102,7 @@ static int bdrv_fill_options(QDict **options, const char *filename,
 if (protocol && filename) {
 if (!qdict_haskey(*options, "filename")) {
 qdict_put_str(*options, "filename", filename);
-parse_filename = true;
+parse_filename = allow_parse_filename;
 } else {
 error_setg(errp, "Can't specify 'file' and 'filename' options at "
  "the same time");
@@ -3663,7 +3665,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
 }
 
 backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
-   &child_of_bds, bdrv_backing_role(bs), errp);
+   &child_of_bds, bdrv_backing_role(bs), true,
+   errp);
 if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
@@ -3697,7 +3700,8 @@ free_exit:
 static BlockDriverState *
 bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
BlockDriverState *parent, const BdrvChildClass *child_class,
-   BdrvChildRole child_role, bool allow_none, Error **errp)
+   BdrvChildRole child_role, bool allow_none,
+   bool parse_filename, Error **errp)
 {
 BlockDriverState *bs = NULL;
 QDict *image_options;
@@ -3728,7 +3732,8 @@ bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
 }
 
 bs = bdrv_open_inherit(filename, reference, image_options, 0,
-   parent, child_class, child_role, errp);
+   parent, child_class, child_role, parse_filename,
+   errp);
 if (!bs) {
 goto done;
 }
@@ -3738,6 +3743,33 @@ done:
 return bs;
 }
 
+static BdrvChild *bdrv_open_child_common(const char *filename,
+ QDict *options, const char *bdref_key,
+ BlockDriverState *parent,
+ const BdrvChildClass *child_class,
+ BdrvChildRole child_role,
+ bool allow_none, bool parse_filename,
+ Error **errp)
+{
+BlockDriverState *bs;

[Stable-9.0.2 22/22] tcg/optimize: Fix TCG_COND_TST* simplification of setcond2

2024-07-04 Thread Michael Tokarev

From: Richard Henderson 

Argument ordering for setcond2 is:

  output, a_low, a_high, b_low, b_high, cond

The test is supposed to be against b_low, not a_high.

Cc: qemu-sta...@nongnu.org
Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
Signed-off-by: Richard Henderson 
Tested-by: Alex Bennée 
Message-Id: <20240701024623.1265028-1-richard.hender...@linaro.org>
(cherry picked from commit a71d9dfbf63db42d6e6ae87fc112d1f5502183bd)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2e9e5725a9..8c49229d6f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2274,7 +2274,7 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 
 case TCG_COND_TSTEQ:
 case TCG_COND_TSTNE:
-if (arg_is_const_val(op->args[2], 0)) {
+if (arg_is_const_val(op->args[3], 0)) {
 goto do_setcond_high;
 }
 if (arg_is_const_val(op->args[4], 0)) {
diff --git a/tests/tcg/x86_64/Makefile.target b/tests/tcg/x86_64/Makefile.target
index e64aab1b81..1d427cdc2c 100644
--- a/tests/tcg/x86_64/Makefile.target
+++ b/tests/tcg/x86_64/Makefile.target
@@ -8,6 +8,8 @@
 
 include $(SRC_PATH)/tests/tcg/i386/Makefile.target
 
+X86_64_TESTS += test-2413
+
 ifeq ($(filter %-linux-user, $(TARGET)),$(TARGET))
 X86_64_TESTS += vsyscall
 X86_64_TESTS += noexec
diff --git a/tests/tcg/x86_64/test-2413.c b/tests/tcg/x86_64/test-2413.c
new file mode 100644
index 0000000000..456e5332fc
--- /dev/null
+++ b/tests/tcg/x86_64/test-2413.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2024 Linaro, Ltd. */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2413 */
+
+#include <assert.h>
+
+void test(unsigned long *a, unsigned long *d, unsigned long c)
+{
+asm("xorl %%eax, %%eax\n\t"
+"xorl %%edx, %%edx\n\t"
+"testb $0x20, %%cl\n\t"
+"sete %%al\n\t"
+"setne %%dl\n\t"
+"shll %%cl, %%eax\n\t"
+"shll %%cl, %%edx\n\t"
+: "=a"(*a), "=d"(*d)
+: "c"(c));
+}
+
+int main(void)
+{
+unsigned long a, c, d;
+
+for (c = 0; c < 64; c++) {
+test(&a, &d, c);
+assert(a == (c & 0x20 ? 0 : 1u << (c & 0x1f)));
+assert(d == (c & 0x20 ? 1u << (c & 0x1f) : 0));
+}
+return 0;
+}
-- 
2.39.2
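
The operand ordering quoted in the setcond2 message above can be made concrete
with a small model. This is an illustrative sketch, not TCG code: the enum
names and both helper functions are hypothetical. It shows why the test may be
reduced to the high halves only when b_low (operand 3) is zero, and why
checking a_high (operand 2) instead gives wrong answers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operand positions of setcond2, following the ordering given above. */
enum {
    SC2_OUTPUT, SC2_A_LOW, SC2_A_HIGH, SC2_B_LOW, SC2_B_HIGH, SC2_COND
};

/* TSTNE on a 64-bit value split in halves: true when (a & b) != 0. */
static bool tstne64(uint32_t a_low, uint32_t a_high,
                    uint32_t b_low, uint32_t b_high)
{
    return ((a_low & b_low) | (a_high & b_high)) != 0;
}

/* The same test restricted to the high halves. */
static bool tstne_high(uint32_t a_high, uint32_t b_high)
{
    return (a_high & b_high) != 0;
}
```

When b_low is 0 the low halves cannot contribute, so tstne64() and
tstne_high() agree for any a_low. When only a_high is 0, a set bit in
a_low & b_low still makes the full test true while the high-only test is
false.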

[Stable-9.0.2 13/22] i386/cpu: fixup number of addressable IDs for processor cores in the physical package

2024-07-04 Thread Michael Tokarev

From: Chuang Xu 

When QEMU is started with:
-cpu host,host-cache-info=on,l3-cache=off \
-smp 2,sockets=1,dies=1,cores=1,threads=2
Guest can't acquire maximum number of addressable IDs for processor cores in
the physical package from CPUID[04H].

When creating a CPU topology of 1 core per package, host-cache-info only
uses the Host's addressable core IDs field (CPUID.04H.EAX[bits 31-26]),
resulting in a conflict (on a multicore Host) between the Guest core
topology information in this field and the Guest's actual number of cores.

Fix it by removing the unnecessary condition to cover 1 core per package
case. This is safe because cores_per_pkg will not be 0 and will be at
least 1.

Fixes: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
Signed-off-by: Guixiong Wei 
Signed-off-by: Yipeng Yin 
Signed-off-by: Chuang Xu 
Reviewed-by: Zhao Liu 
Message-ID: <20240611032314.64076-1-xuchuangxc...@bytedance.com>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 903916f0a017fe4b7789f1c6c6982333a5a71876)
Signed-off-by: Michael Tokarev 
(Mjt: fixup for 9.0 due to other changes in this area past 9.0)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e693f8ca9a..02a2da04a7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6097,10 +6097,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 if (*eax & 31) {
 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-if (cs->nr_cores > 1) {
-*eax &= ~0xFC000000;
-*eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
-}
+*eax &= ~0xFC000000;
+*eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
 if (host_vcpus_per_cache > vcpus_per_socket) {
 *eax &= ~0x3FFC000;
 *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
-- 
2.39.2
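
The field update performed by the CPUID fix above can be tried out as a
standalone function. This is a minimal sketch under stated assumptions: the
names CORE_ID_SHIFT, CORE_ID_MASK, pow2ceil_u32 and set_addressable_core_ids
are hypothetical, and pow2ceil_u32 is a simple stand-in for QEMU's pow2ceil().
The field is CPUID.04H:EAX[31:26], which holds the maximum number of
addressable core IDs minus one.

```c
#include <stdint.h>

#define CORE_ID_SHIFT 26
#define CORE_ID_MASK  0xFC000000u   /* CPUID.04H:EAX[31:26] */

/* Stand-in for pow2ceil(): smallest power of two >= v (returns 1 for v <= 1). */
static uint32_t pow2ceil_u32(uint32_t v)
{
    uint32_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * Replace the host's addressable-core-IDs field with the guest's value.
 * Done unconditionally, so nr_cores == 1 yields a field value of 0 instead
 * of leaking the host's value.
 */
static uint32_t set_addressable_core_ids(uint32_t eax, uint32_t nr_cores)
{
    eax &= ~CORE_ID_MASK;
    eax |= (pow2ceil_u32(nr_cores) - 1) << CORE_ID_SHIFT;
    return eax;
}
```

For example, a host EAX of 0x3C004121 (field value 15) becomes 0x00004121 for
a 1-core guest and 0x1C004121 for a 6-core guest (rounded up to 8, minus 1).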

[Stable-9.0.2 14/22] target/arm: Fix VCMLA Dd, Dn, Dm[idx]

2024-07-04 Thread Michael Tokarev

From: Richard Henderson 

The inner loop, bounded by eltspersegment, must not be
larger than the outer loop, bounded by elements.

Cc: qemu-sta...@nongnu.org
Fixes: 18fc2405781 ("target/arm: Implement SVE fp complex multiply add (indexed)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2376
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240625183536.1672454-2-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
(cherry picked from commit 76bccf3cb9d9383da0128bbc6d1300cddbe3ae8f)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 1f93510b85..cc7cab338c 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -843,7 +843,7 @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
 intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
 uint32_t neg_real = flip ^ neg_imag;
 intptr_t elements = opr_sz / sizeof(float16);
-intptr_t eltspersegment = 16 / sizeof(float16);
+intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
 intptr_t i, j;
 
 /* Shift boolean to the sign bit so we can xor to negate.  */
@@ -905,7 +905,7 @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
 intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
 uint32_t neg_real = flip ^ neg_imag;
 intptr_t elements = opr_sz / sizeof(float32);
-intptr_t eltspersegment = 16 / sizeof(float32);
+intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
 intptr_t i, j;
 
 /* Shift boolean to the sign bit so we can xor to negate.  */
-- 
2.39.2
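
The effect of clamping eltspersegment in the VCMLA fix above can be seen with
a simplified model of the two nested loops. This is a sketch, not the QEMU
helper: highest_index_touched and min_size are hypothetical names, and the
loop body (one real/imaginary pair per step) is an assumption about how
elements are consumed. Only the loop bounds come from the commit message.

```c
#include <stddef.h>

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Return the highest element index touched when walking a vector of
 * opr_sz bytes in 16-byte segments of elt_size-byte elements.
 * With clamp == 0 the segment length is always 16 / elt_size; with
 * clamp != 0 it is limited to the number of elements in the vector.
 */
static size_t highest_index_touched(size_t opr_sz, size_t elt_size, int clamp)
{
    size_t elements = opr_sz / elt_size;
    size_t eltspersegment = 16 / elt_size;
    size_t i, j, highest = 0;

    if (clamp) {
        eltspersegment = min_size(eltspersegment, elements);
    }
    for (i = 0; i < elements; i += eltspersegment) {
        for (j = i; j < i + eltspersegment; j += 2) {
            highest = j + 1;    /* each step handles elements j and j + 1 */
        }
    }
    return highest;
}
```

For an 8-byte (Dd) operand of 2-byte elements there are only 4 elements, yet
the unclamped inner loop reaches index 7; with the clamp it stops at index 3.
For a 16-byte operand both variants behave the same.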

[Stable-9.0.2 06/22] stdvga: fix screen blanking

2024-07-04 Thread Michael Tokarev

From: Gerd Hoffmann 

In case the display surface uses a shared buffer (i.e. uses vga vram
directly instead of a shadow) go unshare the buffer before clearing it.

This avoids vga memory corruption, which in turn fixes unblanking not
working properly with X11.

Cc: qemu-sta...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2067
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Marc-André Lureau 
Message-ID: <20240605131444.797896-2-kra...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
(cherry picked from commit b1cf266c82cb1211ee2785f1813a6a3f3e693390)
Signed-off-by: Michael Tokarev 

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 77f59e8c11..40adeb3e2f 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1772,6 +1772,13 @@ static void vga_draw_blank(VGACommonState *s, int full_update)
 if (s->last_scr_width <= 0 || s->last_scr_height <= 0)
 return;
 
+if (is_buffer_shared(surface)) {
+/* unshare buffer, otherwise the blanking corrupts vga vram */
+surface = qemu_create_displaysurface(s->last_scr_width,
+ s->last_scr_height);
+dpy_gfx_replace_surface(s->con, surface);
+}
+
 w = s->last_scr_width * surface_bytes_per_pixel(surface);
 d = surface_data(surface);
 for(i = 0; i < s->last_scr_height; i++) {
-- 
2.39.2

[Stable-9.0.2 20/22] iotests/270: Don't store data-file with json: prefix in image

2024-07-04 Thread Michael Tokarev

From: Kevin Wolf 

We want to disable filename parsing for data files because it's too easy
to abuse in malicious image files. Make the test ready for the change by
passing the data file explicitly in command line options.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hanna Czenczek 
(cherry picked from commit 7e1110664ecbc4826f3c978ccb06b6c1bce823e6)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/270 b/tests/qemu-iotests/270
index 74352342db..c37b674aa2 100755
--- a/tests/qemu-iotests/270
+++ b/tests/qemu-iotests/270
@@ -60,8 +60,16 @@ _make_test_img -o cluster_size=2M,data_file="$TEST_IMG.orig" \
 # "write" 2G of data without using any space.
 # (qemu-img create does not like it, though, because null-co does not
 # support image creation.)
-$QEMU_IMG amend -o data_file="json:{'driver':'null-co',,'size':'4294967296'}" \
-"$TEST_IMG"
+test_img_with_null_data="json:{
+'driver': '$IMGFMT',
+'file': {
+'filename': '$TEST_IMG'
+},
+'data-file': {
+'driver': 'null-co',
+'size':'4294967296'
+}
+}"
 
 # This gives us a range of:
 #   2^31 - 512 + 768 - 1 = 2^31 + 255 > 2^31
@@ -74,7 +82,7 @@ $QEMU_IMG amend -o data_file="json:{'driver':'null-co',,'size':'4294967296'}" \
 # on L2 boundaries, we need large L2 tables; hence the cluster size of
 # 2 MB.  (Anything from 256 kB should work, though, because then one L2
 # table covers 8 GB.)
-$QEMU_IO -c "write 768 $((2 ** 31 - 512))" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write 768 $((2 ** 31 - 512))" "$test_img_with_null_data" | _filter_qemu_io
 
 _check_test_img
 
-- 
2.39.2

[Stable-9.0.2 16/22] hw/core: allow parameter=1 for SMP topology on any machine

2024-07-04 Thread Michael Tokarev

From: Daniel P. Berrangé 

This effectively reverts

  commit 54c4ea8f3ae614054079395842128a856a73dbf9
  Author: Zhao Liu 
  Date:   Sat Mar 9 00:01:37 2024 +0800

hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP configurations

but is not done as a 'git revert' since the part of the changes to the
file hw/core/machine-smp.c which add 'has_XXX' checks remain desirable.
Furthermore, we have to tweak the subsequently added unit test to
account for the differing warning message.

The rationale for the original deprecation was:

  "Currently, it was allowed for users to specify the unsupported
   topology parameter as "1". For example, x86 PC machine doesn't
   support drawer/book/cluster topology levels, but user could specify
   "-smp drawers=1,books=1,clusters=1".

   This is meaningless and confusing, so that the support for this kind
   of configurations is marked deprecated since 9.0."

There are varying POVs on the topic of 'unsupported' topology levels.

It is common to say that on a system without hyperthreading there
is always 1 thread. Likewise, when new CPUs introduced a concept of
multiple 'dies', it was reasonable to say that all historical CPUs
before that implicitly had 1 'die'. Likewise for the more recently
introduced 'modules' and 'clusters' parameters. From this POV, it is
valid to set 'parameter=1' on the -smp command line for any machine;
only a value > 1 is strictly an error condition.

It doesn't cause any functional difficulty for QEMU, because internally
the QEMU code is itself assuming that all "unsupported" parameters
implicitly have a value of '1'.

At the libvirt level, we've allowed applications to set 'parameter=1'
when configuring a guest, and pass that through to QEMU.

Deprecating this creates extra difficulty for libvirt because there's no info
exposed from QEMU about which machine types "support" which parameters.
Thus, libvirt can't know whether it is valid to pass 'parameter=1' for
a given machine type, or whether it will trigger deprecation messages.

Since there's no apparent functional benefit to deleting this deprecated
behaviour from QEMU, and it creates problems for consumers of QEMU,
remove this deprecation.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Zhao Liu 
Reviewed-by: Ján Tomko 
Message-ID: <20240513123358.612355-2-berra...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
(cherry picked from commit 9d7950edb0cdf8f4e5746e220e6e8a9e713bad16)
Signed-off-by: Michael Tokarev 
(Mjt: remove hunk about modules in hw/core/machine-smp.c introduced in
 v9.0.0-155-g8ec0a4634798 "hw/core/machine: Support modules in -smp")

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 27864c9507..b5e3849d3d 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -112,62 +112,38 @@ void machine_parse_smp_config(MachineState *ms,
 }

 /*
- * If not supported by the machine, a topology parameter must be
- * omitted.
+ * If not supported by the machine, a topology parameter must
+ * not be set to a value greater than 1.
  */
-if (!mc->smp_props.clusters_supported && config->has_clusters) {
-if (config->clusters > 1) {
-error_setg(errp, "clusters not supported by this "
-   "machine's CPU topology");
-return;
-} else {
-/* Here clusters only equals 1 since we've checked zero case. */
-warn_report("Deprecated CPU topology (considered invalid): "
-"Unsupported clusters parameter mustn't be "
-"specified as 1");
-}
+if (!mc->smp_props.clusters_supported &&
+config->has_clusters && config->clusters > 1) {
+error_setg(errp,
+   "clusters > 1 not supported by this machine's CPU topology");
+return;
 }
 clusters = clusters > 0 ? clusters : 1;

-if (!mc->smp_props.dies_supported && config->has_dies) {
-if (config->dies > 1) {
-error_setg(errp, "dies not supported by this "
-   "machine's CPU topology");
-return;
-} else {
-/* Here dies only equals 1 since we've checked zero case. */
-warn_report("Deprecated CPU topology (considered invalid): "
-"Unsupported dies parameter mustn't be "
-"specified as 1");
-}
+if (!mc->smp_props.dies_supported &&
+config->has_dies && config->dies > 1) {
+error_setg(errp,
+   "dies > 1 not supported by this machine's CPU topology");
+return;
 }
 dies = dies > 0 ? dies : 1;

-if (!mc->smp_props.books_supported && config->has_books) {
-if (config->books > 1) {
-error_setg(errp, "books not supported by this "
-   "machine's CPU topology");
-return;
-} else {
-/* Here books only equals 1 since we've checked zero case. */
-warn_r

[Stable-9.0.2 04/22] Revert "monitor: use aio_co_reschedule_self()"

2024-07-04 Thread Michael Tokarev

From: Stefan Hajnoczi 

Commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()") was a code
cleanup that uses aio_co_reschedule_self() instead of open coding
coroutine rescheduling.

Bug RHEL-34618 was reported and Kevin Wolf identified
the root cause. I missed that aio_co_reschedule_self() ->
qemu_get_current_aio_context() only knows about
qemu_aio_context/IOThread AioContexts and not about iohandler_ctx. It
does not function correctly when going back from the iohandler_ctx to
qemu_aio_context.

Go back to open coding the AioContext transitions to avoid this bug.

This reverts commit 1f25c172f83704e350c0829438d832384084a74d.

Cc: qemu-sta...@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-34618
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240506190622.56095-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit 719c6819ed9a9838520fa732f9861918dc693bda)
Signed-off-by: Michael Tokarev 

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index f3488afeef..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,7 +212,8 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
  * executing the command handler so that it can make progress if it
  * involves an AIO_WAIT_WHILE().
  */
-aio_co_reschedule_self(qemu_get_aio_context());
+aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
 }
 
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -226,7 +227,9 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
  * Move back to iohandler_ctx so that nested event loops for
  * qemu_aio_context don't start new monitor commands.
  */
-aio_co_reschedule_self(iohandler_get_aio_context());
+aio_co_schedule(iohandler_get_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
 }
 } else {
/*
-- 
2.39.2

[Stable-9.0.2 11/22] migration: Fix file migration with fdset

2024-07-04 Thread Michael Tokarev

From: Fabiano Rosas 

When the "file:" migration support was added we missed the special
case in the qemu_open_old implementation that allows for a particular
file name format to be used to refer to a set of file descriptors that
have been previously provided to QEMU via the add-fd QMP command.

When using this fdset feature, we should not truncate the migration
file because being given an fd means that the management layer is in
control of the file and will likely already have some data written to
it. This is further indicated by the presence of the 'offset'
argument, which indicates the start of the region where QEMU is
allowed to write.

Fix the issue by replacing the O_TRUNC flag on open by an ftruncate
call, which will take the offset into consideration.
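
The difference between O_TRUNC and ftruncate(fd, offset) can be shown with a
small standalone POSIX sketch. It is not QEMU code: open_keep_prefix() and
demo_keep_prefix() are hypothetical names, and the 16-byte payload and the
offset of 8 are made-up values standing in for data the management layer
already wrote.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for ftruncate() and mkstemp() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { DATA_LEN = 16, KEEP = 8 };

/*
 * Hypothetical helper, not QEMU code: open `path` for writing WITHOUT
 * O_TRUNC, then cut the file at `offset` so that bytes [0, offset) survive.
 * Returns the fd, or -1 on error.
 */
static int open_keep_prefix(const char *path, off_t offset)
{
    int fd;

    assert(offset >= 0);
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, offset) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Demo: pretend the management layer wrote DATA_LEN bytes, reopen the file
 * with an offset of KEEP, check that the first KEEP bytes are unchanged, and
 * return the resulting file size (-1 on any failure).
 */
static long demo_keep_prefix(void)
{
    static const char data[] = "0123456789abcdef";
    char path[] = "/tmp/fdset-demo-XXXXXX";
    char back[KEEP];
    struct stat st;
    long size = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    if (write(fd, data, DATA_LEN) == DATA_LEN) {
        close(fd);
        fd = open_keep_prefix(path, KEEP);
        if (fd >= 0 && stat(path, &st) == 0) {
            int rfd = open(path, O_RDONLY);

            if (rfd >= 0 && read(rfd, back, KEEP) == KEEP &&
                memcmp(back, data, KEEP) == 0) {
                size = (long)st.st_size;
            }
            if (rfd >= 0) {
                close(rfd);
            }
        }
    }
    if (fd >= 0) {
        close(fd);
    }
    unlink(path);
    return size;
}
```

With O_TRUNC the reopened file would be empty; here everything before the
offset is kept and only the tail is discarded.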

Fixes: 385f510df5 ("migration: file URI offset")
Suggested-by: Daniel P. Berrangé 
Reviewed-by: Prasad Pandit 
Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
(cherry picked from commit 6d3279655ac49b806265f08415165f471d33e032)
Signed-off-by: Michael Tokarev 

diff --git a/migration/file.c b/migration/file.c
index ab18ba505a..ba5b5c44ff 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -84,12 +84,19 @@ void file_start_outgoing_migration(MigrationState *s,
 
 trace_migration_file_outgoing(filename);
 
-fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
- 0600, errp);
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY, 0600, errp);
 if (!fioc) {
 return;
 }
 
+if (ftruncate(fioc->fd, offset)) {
+error_setg_errno(errp, errno,
+ "failed to truncate migration file to offset %" PRIx64,
+ offset);
+object_unref(OBJECT(fioc));
+return;
+}
+
 outgoing_args.fname = g_strdup(filename);
 
 ioc = QIO_CHANNEL(fioc);
-- 
2.39.2

[Stable-9.0.2 10/22] tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers

2024-07-04 Thread Michael Tokarev

From: Richard Henderson 

Simplify the logic for two-part, 32-bit pc-relative addresses.
Rather than assume all such fit in int32_t, do some arithmetic
and assert a result, do some arithmetic first and then check
to see if the pieces are in range.
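
As a rough standalone illustration of "do the arithmetic first, then check the
pieces", the sketch below computes the page delta and tests whether it fits
the signed 20-bit immediate of pcalau12i. The helper names (sext_low,
split_pcrel, rebuild) are hypothetical and only mirror the arithmetic visible
in the diff; an arithmetic right shift of negative values is assumed, as on
the usual compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `len` bits of `val` (stand-in for sextreg(val, 0, len)). */
static int64_t sext_low(int64_t val, unsigned len)
{
    assert(len > 0 && len < 64);
    return (int64_t)((uint64_t)val << (64 - len)) >> (64 - len);
}

/*
 * Split `val` into a page delta relative to `pc` plus the low 12 bits.
 * Returns false when the page delta does not fit in a signed 20-bit field,
 * in which case the caller has to fall back to another sequence.
 */
static bool split_pcrel(int64_t val, int64_t pc, int64_t *hi20, int64_t *lo12)
{
    int64_t page_delta = (val >> 12) - (pc >> 12);

    if (page_delta != sext_low(page_delta, 20)) {
        return false;
    }
    *hi20 = page_delta;
    *lo12 = val & 0xfff;
    return true;
}

/* Value that a pcalau12i + ori pair would produce from the two pieces. */
static int64_t rebuild(int64_t pc, int64_t hi20, int64_t lo12)
{
    return ((pc >> 12) + hi20) * 4096 + lo12;
}
```

For example, with pc = 0xfff and val = 0x80000ffe the byte offset is
0x7fffffff, which fits in int32_t, yet the page delta is 0x80000 and does not
fit in 20 signed bits. A range check on the pieces rejects this case instead
of asserting.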

Cc: qemu-sta...@nongnu.org
Fixes: dacc51720db ("tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi")
Reviewed-by: Song Gao 
Reported-by: Song Gao 
Signed-off-by: Richard Henderson 
(cherry picked from commit 521d7fb3ebdf88112ed13556a93e3037742b9eb8)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 06ca1ab11c..8f68bd3e51 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -366,8 +366,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
  * back to the slow path.
  */
 
-intptr_t pc_offset;
-tcg_target_long val_lo, val_hi, pc_hi, offset_hi;
+intptr_t src_rx, pc_offset;
 tcg_target_long hi12, hi32, hi52;
 
 /* Value fits in signed i32.  */
@@ -377,24 +376,23 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 }
 
 /* PC-relative cases.  */
-pc_offset = tcg_pcrel_diff(s, (void *)val);
-if (pc_offset == sextreg(pc_offset, 0, 22) && (pc_offset & 3) == 0) {
-/* Single pcaddu2i.  */
-tcg_out_opc_pcaddu2i(s, rd, pc_offset >> 2);
-return;
+src_rx = (intptr_t)tcg_splitwx_to_rx(s->code_ptr);
+if ((val & 3) == 0) {
+pc_offset = val - src_rx;
+if (pc_offset == sextreg(pc_offset, 0, 22)) {
+/* Single pcaddu2i.  */
+tcg_out_opc_pcaddu2i(s, rd, pc_offset >> 2);
+return;
+}
 }
 
-if (pc_offset == (int32_t)pc_offset) {
-/* Offset within 32 bits; load with pcalau12i + ori.  */
-val_lo = sextreg(val, 0, 12);
-val_hi = val >> 12;
-pc_hi = (val - pc_offset) >> 12;
-offset_hi = val_hi - pc_hi;
-
-tcg_debug_assert(offset_hi == sextreg(offset_hi, 0, 20));
-tcg_out_opc_pcalau12i(s, rd, offset_hi);
+pc_offset = (val >> 12) - (src_rx >> 12);
+if (pc_offset == sextreg(pc_offset, 0, 20)) {
+/* Load with pcalau12i + ori.  */
+tcg_target_long val_lo = val & 0xfff;
+tcg_out_opc_pcalau12i(s, rd, pc_offset);
 if (val_lo != 0) {
-tcg_out_opc_ori(s, rd, rd, val_lo & 0xfff);
+tcg_out_opc_ori(s, rd, rd, val_lo);
 }
 return;
 }
-- 
2.39.2

[Stable-9.0.2 02/22] virtio-net: drop too short packets early

2024-07-04 Thread Michael Tokarev

From: Alexey Dobriyan 

Reproducer from https://gitlab.com/qemu-project/qemu/-/issues/1451
creates a small packet (1 segment, len = 10 == n->guest_hdr_len),
then destroys the queue.

"if (n->host_hdr_len != n->guest_hdr_len)" is triggered, and the if body creates
a zero-length/zero-segment packet, as there is nothing after the guest header.

qemu_sendv_packet_async() tries to send it.

slirp discards it because it is smaller than an Ethernet header,
but returns 0 because tx hooks are supposed to return the total length of data.

0 is propagated upwards and is interpreted as "packet has been sent",
which is terrible because the queue is being destroyed, nobody is waiting for TX
to complete, and an assert is triggered.

The fix is to discard such empty packets instead of sending them.

Length 1 packets will go via a different codepath:

virtqueue_push(q->tx_vq, elem, 0);
virtio_notify(vdev, q->tx_vq);
g_free(elem);

and aren't problematic.
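
The check can be pictured with a standalone sketch that strips a header from a
scatter-gather list and refuses to send when nothing is left. This is not the
virtio-net code: strip_header() and classify_packet() are hypothetical
stand-ins, and the 8-entry scratch array is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

enum tx_action { TX_SEND, TX_REJECT };

/*
 * Copy the part of in[0..in_num) that lies past the first `skip` bytes into
 * `out` and return how many segments remain. Segments that are entirely
 * header (or empty) are not copied.
 */
static unsigned strip_header(struct iovec *out, unsigned out_max,
                             const struct iovec *in, unsigned in_num,
                             size_t skip)
{
    unsigned n = 0;

    assert(out_max > 0);
    for (unsigned i = 0; i < in_num && n < out_max; i++) {
        if (skip >= in[i].iov_len) {
            skip -= in[i].iov_len;
            continue;
        }
        out[n].iov_base = (char *)in[i].iov_base + skip;
        out[n].iov_len = in[i].iov_len - skip;
        skip = 0;
        n++;
    }
    return n;
}

/* Reject a packet that has no payload segments once the header is removed. */
static enum tx_action classify_packet(const struct iovec *in, unsigned in_num,
                                      size_t hdr_len)
{
    struct iovec out[8];
    unsigned out_num = strip_header(out, 8, in, in_num, hdr_len);

    return out_num < 1 ? TX_REJECT : TX_SEND;
}
```

A single 10-byte segment with a 10-byte header leaves zero segments and is
rejected, while an 11-byte segment still has one byte to send.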

Signed-off-by: Alexey Dobriyan 
Signed-off-by: Jason Wang 
(cherry picked from commit 2c3e4e2de699cd4d9f6c71f30a22d8f125cd6164)
Signed-off-by: Michael Tokarev 

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 24e5e7d347..3644bfd91b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2749,18 +2749,14 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 out_sg = elem->out_sg;
 if (out_num < 1) {
 virtio_error(vdev, "virtio-net header not in first element");
-virtqueue_detach_element(q->tx_vq, elem, 0);
-g_free(elem);
-return -EINVAL;
+goto detach;
 }
 
 if (n->has_vnet_hdr) {
 if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
-virtqueue_detach_element(q->tx_vq, elem, 0);
-g_free(elem);
-return -EINVAL;
+goto detach;
 }
 if (n->needs_vnet_hdr_swap) {
 virtio_net_hdr_swap(vdev, (void *) &vhdr);
@@ -2791,6 +2787,11 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
  n->guest_hdr_len, -1);
 out_num = sg_num;
 out_sg = sg;
+
+if (out_num < 1) {
+virtio_error(vdev, "virtio-net nothing to send");
+goto detach;
+}
 }
 
 ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
@@ -2811,6 +2812,11 @@ drop:
 }
 }
 return num_packets;
+
+detach:
+virtqueue_detach_element(q->tx_vq, elem, 0);
+g_free(elem);
+return -EINVAL;
 }
 
 static void virtio_net_tx_timer(void *opaque);
-- 
2.39.2

[Stable-9.0.2 09/22] target/sparc: use signed denominator in sdiv helper

2024-07-04 Thread Michael Tokarev

From: Clément Chigot 

The division has to be done with the signed denominator (b32) instead of
the unsigned value passed as an argument (b).
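
A minimal standalone sketch of why the type of the divisor matters is given
below. It assumes a 32-bit unsigned argument (uint32_t stands in for
target_ulong here) and uses hypothetical function names; it is not the helper
itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy shape: the divisor keeps its unsigned type, so 0xfffffffe is
 * converted to the large positive value 4294967294 before dividing.
 */
static int64_t div_by_unsigned(int64_t a64, uint32_t b)
{
    assert(b != 0);
    return a64 / b;
}

/*
 * Fixed shape: reinterpret the low 32 bits as a signed value first, so
 * 0xfffffffe becomes -2.
 */
static int64_t div_by_signed(int64_t a64, uint32_t b)
{
    int32_t b32 = (int32_t)b;

    assert(b32 != 0);
    return a64 / b32;
}
```

Dividing -10 by the bit pattern 0xfffffffe yields 0 with the unsigned divisor
and 5 with the signed one.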

Cc: qemu-sta...@nongnu.org
Fixes: 1326010322d6 ("target/sparc: Remove CC_OP_DIV")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2319
Signed-off-by: Clément Chigot 
Reviewed-by: Richard Henderson 
Message-Id: <20240606144331.698361-1-chi...@adacore.com>
Signed-off-by: Richard Henderson 
(cherry picked from commit 6b4965373e561b77f91cfbdf41353635c9661358)
Signed-off-by: Michael Tokarev 

diff --git a/target/sparc/helper.c b/target/sparc/helper.c
index 2247e243b5..7846ddd6f6 100644
--- a/target/sparc/helper.c
+++ b/target/sparc/helper.c
@@ -121,7 +121,7 @@ uint64_t helper_sdiv(CPUSPARCState *env, target_ulong a, target_ulong b)
 return (uint32_t)(b32 < 0 ? INT32_MAX : INT32_MIN) | (-1ull << 32);
 }
 
-a64 /= b;
+a64 /= b32;
 r = a64;
 if (unlikely(r != a64)) {
 return (uint32_t)(a64 < 0 ? INT32_MIN : INT32_MAX) | (-1ull << 32);
-- 
2.39.2

[Stable-9.0.2 00/22] Patch Round-up for stable 9.0.2, freeze on 2024-07-14

2024-07-04 Thread Michael Tokarev

The following patches are queued for QEMU stable v9.0.2:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-9.0

Patch freeze is 2024-07-14, and the release is planned for 2024-07-16:

  https://wiki.qemu.org/Planning/9.0

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

Thanks!

/mjt

--
01 3973615e7fba Mark Cave-Ayland:
   target/i386: fix size of EBP writeback in gen_enter()
02 2c3e4e2de699 Alexey Dobriyan:
   virtio-net: drop too short packets early
03 77bf310084da Dongwon Kim:
   ui/gtk: Draw guest frame at refresh cycle
04 719c6819ed9a Stefan Hajnoczi:
   Revert "monitor: use aio_co_reschedule_self()"
05 a276ec8e2632 Philippe Mathieu-Daudé:
   hw/audio/virtio-snd: Always use little endian audio format
06 b1cf266c82cb Gerd Hoffmann:
   stdvga: fix screen blanking
07 3b279f73fa37 Anton Johansson:
   accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded
08 54b27921026d Ilya Leoshkevich:
   linux-user: Make TARGET_NR_setgroups affect only the current thread
09 6b4965373e56 Clément Chigot:
   target/sparc: use signed denominator in sdiv helper
10 521d7fb3ebdf Richard Henderson:
   tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers
11 6d3279655ac4 Fabiano Rosas:
   migration: Fix file migration with fdset
12 641b1efe01b2 Thomas Huth:
   tests: Update our CI to use CentOS Stream 9 instead of 8
13 903916f0a017 Chuang Xu:
   i386/cpu: fixup number of addressable IDs for processor cores in the 
   physical package
14 76bccf3cb9d9 Richard Henderson:
   target/arm: Fix VCMLA Dd, Dn, Dm[idx]
15 7619129f0d4a Richard Henderson:
   target/arm: Fix FJCVTZS vs flush-to-zero
16 9d7950edb0cd Daniel P. Berrangé:
   hw/core: allow parameter=1 for SMP topology on any machine
17 e68dcbb07923 Daniel P. Berrangé:
   tests: add testing of parameter=1 for SMP topology
18 bd385a5298d7 Kevin Wolf:
   qcow2: Don't open data_file with BDRV_O_NO_IO
19 2eb42a728d27 Kevin Wolf:
   iotests/244: Don't store data-file with protocol in image
20 7e1110664ecb Kevin Wolf:
   iotests/270: Don't store data-file with json: prefix in image
21 7ead94699861 Kevin Wolf:
   block: Parse filenames only when explicitly requested
22 a71d9dfbf63d Richard Henderson:
   tcg/optimize: Fix TCG_COND_TST* simplification of setcond2

[Stable-9.0.2 01/22] target/i386: fix size of EBP writeback in gen_enter()

2024-07-04 Thread Michael Tokarev

From: Mark Cave-Ayland 

The calculation of FrameTemp is done using the size indicated by mo_pushpop()
before being written back to EBP, but the final writeback to EBP is done using
the size indicated by mo_stacksize().

In the case where mo_pushpop() is MO_32 and mo_stacksize() is MO_16 then the
final writeback to EBP is done using MO_16 which can leave junk in the top
16-bits of EBP after executing ENTER.

Change the writeback of EBP to use the same size indicated by mo_pushpop() to
ensure that the full value is written back.
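
The effect of the narrower writeback can be modelled in a few lines of
standalone C. The names below (op_size, write_reg) are hypothetical and the
register values are made up; the sketch only assumes that a 16-bit register
write leaves the upper 16 bits untouched.

```c
#include <assert.h>
#include <stdint.h>

enum op_size { OP_16, OP_32 };

/*
 * Write `val` into a 32-bit register that currently holds `reg`.
 * A 16-bit write replaces only the low half and keeps the old upper half.
 */
static uint32_t write_reg(uint32_t reg, uint32_t val, enum op_size size)
{
    assert(size == OP_16 || size == OP_32);
    if (size == OP_16) {
        return (reg & 0xffff0000u) | (val & 0xffffu);
    }
    return val;
}
```

If EBP held 0xdeadbeef and FrameTemp is 0x00012340, a 16-bit writeback gives
0xdead2340 (stale upper half), while a 32-bit writeback gives 0x00012340.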

Signed-off-by: Mark Cave-Ayland 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
Message-ID: <20240606095319.229650-5-mark.cave-ayl...@ilande.co.uk>
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 3973615e7fbaeef1deeaa067577e373781ced70a)
Signed-off-by: Michael Tokarev 

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a55df176c6..26ed900f34 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2684,7 +2684,7 @@ static void gen_enter(DisasContext *s, int esp_addend, int level)
 }
 
 /* Copy the FrameTemp value to EBP.  */
-gen_op_mov_reg_v(s, a_ot, R_EBP, s->T1);
+gen_op_mov_reg_v(s, d_ot, R_EBP, s->T1);
 
 /* Compute the final value of ESP.  */
 tcg_gen_subi_tl(s->T1, s->T1, esp_addend + size * level);
-- 
2.39.2

[PATCH 2/3] hw/isa/vt82c686: Resolve intermediate IRQ forwarder

2024-07-04 Thread Bernhard Beschow

When @cpu_intr is populated before vt82xx's realize(), it can be directly passed
to i8259_init(), avoiding the need for the intermediate
via_isa_request_i8259_irq() handler. The result is less code and runtime
overhead, and a fixed memory leak caused by qemu_allocate_irqs().

Inspired-by: Philippe Mathieu-Daudé 
Signed-off-by: Bernhard Beschow 
---
 hw/isa/vt82c686.c   | 12 ++--
 hw/mips/fuloong2e.c |  2 +-
 hw/ppc/amigaone.c   |  8 
 hw/ppc/pegasos2.c   |  4 ++--
 4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 505b44c4e6..ca02ad4c20 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -624,6 +624,7 @@ static void via_isa_init(Object *obj)
 object_initialize_child(obj, "uhci2", &s->uhci[1], TYPE_VT82C686B_USB_UHCI);
 object_initialize_child(obj, "ac97", &s->ac97, TYPE_VIA_AC97);
 object_initialize_child(obj, "mc97", &s->mc97, TYPE_VIA_MC97);
+qdev_init_gpio_out_named(DEVICE(obj), &s->cpu_intr, "intr", 1);
 }
 
 static const TypeInfo via_isa_info = {
@@ -704,24 +705,15 @@ static void via_isa_pirq(void *opaque, int pin, int level)
 via_isa_set_irq(opaque, pin, level);
 }
 
-static void via_isa_request_i8259_irq(void *opaque, int irq, int level)
-{
-ViaISAState *s = opaque;
-qemu_set_irq(s->cpu_intr, level);
-}
-
 static void via_isa_realize(PCIDevice *d, Error **errp)
 {
 ViaISAState *s = VIA_ISA(d);
 DeviceState *dev = DEVICE(d);
 PCIBus *pci_bus = pci_get_bus(d);
-qemu_irq *isa_irq;
 ISABus *isa_bus;
 int i;
 
-qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
 qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
-isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
 isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
   errp);
 
@@ -729,7 +721,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 return;
 }
 
-s->isa_irqs_in = i8259_init(isa_bus, *isa_irq);
+s->isa_irqs_in = i8259_init(isa_bus, s->cpu_intr);
 isa_bus_register_input_irqs(isa_bus, s->isa_irqs_in);
 i8254_pit_init(isa_bus, 0x40, 0, NULL);
 i8257_dma_init(OBJECT(d), isa_bus, 0);
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index 6e4303ba47..e6487c34d8 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -286,6 +286,7 @@ static void mips_fuloong2e_init(MachineState *machine)
 /* South bridge -> IP5 */
 pci_dev = pci_new_multifunction(PCI_DEVFN(FULOONG2E_VIA_SLOT, 0),
 TYPE_VT82C686B_ISA);
+qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);
 
 /* Set properties on individual devices before realizing the south bridge */
 if (machine->audiodev) {
@@ -299,7 +300,6 @@ static void mips_fuloong2e_init(MachineState *machine)
   object_resolve_path_component(OBJECT(pci_dev),
 "rtc"),
   "date");
-qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);
 
 dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
 pci_ide_create_devs(PCI_DEVICE(dev));
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index 9dcc486c1a..2110875f56 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -148,13 +148,13 @@ static void amigaone_init(MachineState *machine)
 pci_bus = PCI_BUS(qdev_get_child_bus(dev, "pci.0"));
 
 /* VIA VT82c686B South Bridge (multifunction PCI device) */
-via = OBJECT(pci_create_simple_multifunction(pci_bus, PCI_DEVFN(7, 0),
- TYPE_VT82C686B_ISA));
+via = OBJECT(pci_new_multifunction(PCI_DEVFN(7, 0), TYPE_VT82C686B_ISA));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in(DEVICE(cpu), PPC6xx_INPUT_INT));
+pci_realize_and_unref(PCI_DEVICE(via), pci_bus, &error_abort);
 object_property_add_alias(OBJECT(machine), "rtc-time",
   object_resolve_path_component(via, "rtc"),
   "date");
-qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
-qdev_get_gpio_in(DEVICE(cpu), PPC6xx_INPUT_INT));
 for (i = 0; i < PCI_NUM_PINS; i++) {
 qdev_connect_gpio_out(dev, i, qdev_get_gpio_in_named(DEVICE(via),
  "pirq", i));
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 9b0a6b70ab..54e60082ce 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -181,6 +181,8 @@ static void pegasos2_init(MachineState *machine)
 
 /* VIA VT8231 South Bridge (multifunction PCI device) */
 via = OBJECT(pci_new_multifunction(PCI_DEVFN(12, 0), TYPE_VT8231_ISA));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in_named(

[PATCH 3/3] hw/isa/piix: Resolve intermediate IRQ forwarder

2024-07-04 Thread Bernhard Beschow

When @cpu_intr is populated before piix4's realize(), it can be directly passed
to i8259_init(), avoiding the need for the intermediate piix_request_i8259_irq()
handler. The result is less code and runtime overhead, and a fixed memory leak
caused by qemu_allocate_irqs().

Inspired-by: Philippe Mathieu-Daudé 
Signed-off-by: Bernhard Beschow 
---
 hw/isa/piix.c   | 13 ++---
 hw/mips/malta.c |  4 +---
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/hw/isa/piix.c b/hw/isa/piix.c
index 2d30711b17..e070628f25 100644
--- a/hw/isa/piix.c
+++ b/hw/isa/piix.c
@@ -81,12 +81,6 @@ static void piix_set_pci_irq(void *opaque, int pirq, int level)
 piix_set_pci_irq_level(s, pirq, level);
 }
 
-static void piix_request_i8259_irq(void *opaque, int irq, int level)
-{
-PIIXState *s = opaque;
-qemu_set_irq(s->cpu_intr, level);
-}
-
 static PCIINTxRoute piix_route_intx_pin_to_irq(void *opaque, int pin)
 {
 PCIDevice *pci_dev = opaque;
@@ -315,9 +309,7 @@ static void pci_piix_realize(PCIDevice *dev, const char *uhci_type,
 
 /* PIC */
 if (d->has_pic) {
-qemu_irq *i8259_out_irq = qemu_allocate_irqs(piix_request_i8259_irq, d,
- 1);
-qemu_irq *i8259 = i8259_init(isa_bus, *i8259_out_irq);
+qemu_irq *i8259 = i8259_init(isa_bus, d->cpu_intr);
 size_t i;
 
 for (i = 0; i < ISA_NUM_IRQS; i++) {
@@ -325,8 +317,6 @@ static void pci_piix_realize(PCIDevice *dev, const char *uhci_type,
 }
 
 g_free(i8259);
-
-qdev_init_gpio_out_named(DEVICE(dev), &d->cpu_intr, "intr", 1);
 }
 
 isa_bus_register_input_irqs(isa_bus, d->isa_irqs_in);
@@ -402,6 +392,7 @@ static void pci_piix_init(Object *obj)
 {
 PIIXState *d = PIIX_PCI_DEVICE(obj);
 
+qdev_init_gpio_out_named(DEVICE(obj), &d->cpu_intr, "intr", 1);
 qdev_init_gpio_out_named(DEVICE(obj), d->isa_irqs_in, "isa-irqs",
  ISA_NUM_IRQS);
 
diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 664a2ae0a9..50823bd5fb 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -1238,15 +1238,13 @@ void mips_malta_init(MachineState *machine)
 /* Southbridge */
 piix4 = pci_new_multifunction(PIIX4_PCI_DEVFN, TYPE_PIIX4_PCI_DEVICE);
 qdev_prop_set_uint32(DEVICE(piix4), "smb_io_base", 0x1100);
+qdev_connect_gpio_out_named(DEVICE(piix4), "intr", 0, i8259_irq);
 pci_realize_and_unref(piix4, pci_bus, &error_fatal);
 isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix4), "isa.0"));
 
 dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "ide"));
 pci_ide_create_devs(PCI_DEVICE(dev));
 
-/* Interrupt controller */
-qdev_connect_gpio_out_named(DEVICE(piix4), "intr", 0, i8259_irq);
-
 /* generate SPD EEPROM data */
 dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pm"));
 smbus = I2C_BUS(qdev_get_child_bus(dev, "i2c"));
-- 
2.45.2

[PATCH 0/3] Resolve vt82c686 and piix4 qemu_irq memory leaks

2024-07-04 Thread Bernhard Beschow

This series first turns vt82c686's "INTR" pin into a named GPIO for better
comprehensibility. It then continues fixing qemu_irq memory leaks in vt82c686
and piix4 by connecting out IRQs of the south bridges before they get realized.
This approach is already used in the pc machines after it had been discussed at
KVM Forum '23.

Observe that out IRQs are callbacks, such as an INTR IRQ handler in a CPU, which a
south bridge wants to trigger. If, as an implementation detail, the south bridge
wants to pass this callback to a child device, such as the PIC, then this
callback must be known to the south bridge before it gets realized. Otherwise
board code would have to wire the PIC device itself, breaking encapsulation. This
means that qdev_connect_gpio_out*() has to be called before realize(), which this
series implements. Another way to look at it is that callbacks are apparently
resources, like memory regions, which are also populated before realize().
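
The ordering argument can be sketched with a toy model in plain C. None of
this is the qdev API: toy_bridge, toy_pic, bridge_connect_intr() and the other
names are hypothetical, and the only assumption carried over is that realize
hands the bridge's out line to the child PIC by value at that moment.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*irq_handler)(void *opaque, int level);

struct irq_line {
    irq_handler handler;
    void *opaque;
};

struct toy_pic {
    struct irq_line out;        /* where the PIC delivers its interrupt */
};

struct toy_bridge {
    struct irq_line cpu_intr;   /* out IRQ, wired by board code */
    struct toy_pic pic;         /* child device, an implementation detail */
    int realized;
};

/* Board code: connect the bridge's out IRQ to a handler in the CPU. */
static void bridge_connect_intr(struct toy_bridge *b, irq_handler h,
                                void *opaque)
{
    b->cpu_intr.handler = h;
    b->cpu_intr.opaque = opaque;
}

/* Realize: the bridge passes whatever it knows right now to its PIC. */
static void bridge_realize(struct toy_bridge *b)
{
    assert(!b->realized);
    b->pic.out = b->cpu_intr;
    b->realized = 1;
}

/* The PIC raises its output; an unconnected line is silently ignored. */
static void pic_raise(struct toy_pic *pic, int level)
{
    if (pic->out.handler != NULL) {
        pic->out.handler(pic->out.opaque, level);
    }
}

/* Stand-in for the CPU's INTR handler: record the level on a pin. */
static void toy_cpu_intr(void *opaque, int level)
{
    *(int *)opaque = level;
}
```

Connecting before bridge_realize() lets the interrupt reach the CPU pin.
Connecting afterwards leaves the PIC holding an empty line, which in this toy
model plays the role that the intermediate forwarder used to paper over.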

Please check if the above paragraph makes sense.

Best regards,
Bernhard

See also:
* https://lore.kernel.org/qemu-devel/0FFB5FD2-08CE-4CEC-9001-E7AC24407A44@gmail.com/
* "Remove intermediate IRQ forwarder" patches in
https://lore.kernel.org/qemu-devel/20230210163744.32182-1-phi...@linaro.org/

Testing done:
* Boot amigaone machine into Linux
* Boot pegasos2 machine into MorphOS
* Start fuloong2e machine and check that it doesn't abort
* Boot malta machine with https://people.debian.org/~gio/dqib/

Bernhard Beschow (3):
  hw/isa/vt82c686: Turn "intr" irq into a named gpio
  hw/isa/vt82c686: Resolve intermediate IRQ forwarder
  hw/isa/piix: Resolve intermediate IRQ forwarder

 hw/isa/piix.c   | 13 ++---
 hw/isa/vt82c686.c   | 12 ++--
 hw/mips/fuloong2e.c |  2 +-
 hw/mips/malta.c |  4 +---
 hw/ppc/amigaone.c   |  8 
 hw/ppc/pegasos2.c   |  4 ++--
 6 files changed, 12 insertions(+), 31 deletions(-)

-- 
2.45.2

[PATCH 1/3] hw/isa/vt82c686: Turn "intr" irq into a named gpio

2024-07-04 Thread Bernhard Beschow

Makes the code more comprehensible, matches the datasheet and the piix4 device
model.

Signed-off-by: Bernhard Beschow 
---
 hw/isa/vt82c686.c   | 2 +-
 hw/mips/fuloong2e.c | 2 +-
 hw/ppc/amigaone.c   | 4 ++--
 hw/ppc/pegasos2.c   | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 8582ac0322..505b44c4e6 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -719,7 +719,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 ISABus *isa_bus;
 int i;
 
-qdev_init_gpio_out(dev, &s->cpu_intr, 1);
+qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
 qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
 isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
 isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
diff --git a/hw/mips/fuloong2e.c b/hw/mips/fuloong2e.c
index a45aac368c..6e4303ba47 100644
--- a/hw/mips/fuloong2e.c
+++ b/hw/mips/fuloong2e.c
@@ -299,7 +299,7 @@ static void mips_fuloong2e_init(MachineState *machine)
   object_resolve_path_component(OBJECT(pci_dev),
 "rtc"),
   "date");
-qdev_connect_gpio_out(DEVICE(pci_dev), 0, env->irq[5]);
+qdev_connect_gpio_out_named(DEVICE(pci_dev), "intr", 0, env->irq[5]);
 
 dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
 pci_ide_create_devs(PCI_DEVICE(dev));
diff --git a/hw/ppc/amigaone.c b/hw/ppc/amigaone.c
index ddfa09457a..9dcc486c1a 100644
--- a/hw/ppc/amigaone.c
+++ b/hw/ppc/amigaone.c
@@ -153,8 +153,8 @@ static void amigaone_init(MachineState *machine)
 object_property_add_alias(OBJECT(machine), "rtc-time",
   object_resolve_path_component(via, "rtc"),
   "date");
-qdev_connect_gpio_out(DEVICE(via), 0,
-  qdev_get_gpio_in(DEVICE(cpu), PPC6xx_INPUT_INT));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in(DEVICE(cpu), PPC6xx_INPUT_INT));
 for (i = 0; i < PCI_NUM_PINS; i++) {
 qdev_connect_gpio_out(dev, i, qdev_get_gpio_in_named(DEVICE(via),
  "pirq", i));
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index c1bd8dfa21..9b0a6b70ab 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -195,8 +195,8 @@ static void pegasos2_init(MachineState *machine)
 object_property_add_alias(OBJECT(machine), "rtc-time",
   object_resolve_path_component(via, "rtc"),
   "date");
-qdev_connect_gpio_out(DEVICE(via), 0,
-  qdev_get_gpio_in_named(pm->mv, "gpp", 31));
+qdev_connect_gpio_out_named(DEVICE(via), "intr", 0,
+qdev_get_gpio_in_named(pm->mv, "gpp", 31));
 
 dev = PCI_DEVICE(object_resolve_path_component(via, "ide"));
 pci_ide_create_devs(dev);
-- 
2.45.2

Re: [PATCH v4 00/14] test/tcg: Clang build fixes for arm/aarch64

2024-07-04 Thread Alex Bennée

Richard Henderson  writes:

> Supercedes: 20240629-tcg-v3-0-fa57918bd...@daynix.com
> ("[PATCH v3 0/7] tests/tcg/aarch64: Fix inline assemblies for clang")
>
> On top of Akihiko's patches for aarch64, additional changes are
> required for arm, both as a host and as a guest.

Queued to testing/next, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v6 09/10] hw/nvme: add reservation protocal command

2024-07-04 Thread Stefan Hajnoczi

I will skip this since Klaus Jensen's review is required for NVMe anyway.

Stefan



Re: [PATCH v6 10/10] block/iscsi: add persistent reservation in/out driver

2024-07-04 Thread Stefan Hajnoczi

On Thu, Jun 13, 2024 at 03:13:27PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations for iscsi driver.
> The following methods are implemented: bdrv_co_pr_read_keys,
> bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
> bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/iscsi.c | 443 ++
>  1 file changed, 443 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 2ff14b7472..d94ebe35bd 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -96,6 +96,7 @@ typedef struct IscsiLun {
>  unsigned long *allocmap_valid;
>  long allocmap_size;
>  int cluster_size;
> +uint8_t pr_cap;
>  bool use_16_for_rw;
>  bool write_protected;
>  bool lbpme;
> @@ -280,6 +281,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
> status,
>  iTask->err_code = -error;
>  iTask->err_str = g_strdup(iscsi_get_error(iscsi));
>  }
> +} else if (status == SCSI_STATUS_RESERVATION_CONFLICT) {
> +iTask->err_code = -EBADE;

Should err_str be set too? For example, iscsi_co_writev() seems to
assume err_str is set if the iSCSI task fails.

>  }
>  }
>  }
> @@ -1792,6 +1795,52 @@ static void iscsi_save_designator(IscsiLun *lun,
>  }
>  }
>  
> +static void iscsi_get_pr_cap_sync(IscsiLun *iscsilun, Error **errp)
> +{
> +struct scsi_task *task = NULL;
> +struct scsi_persistent_reserve_in_report_capabilities *rc = NULL;
> +int retries = ISCSI_CMD_RETRIES;
> +int xferlen = sizeof(struct 
> scsi_persistent_reserve_in_report_capabilities);
> +
> +do {
> +if (task != NULL) {
> +scsi_free_scsi_task(task);
> +task = NULL;
> +}
> +
> +task = iscsi_persistent_reserve_in_sync(iscsilun->iscsi,
> +   iscsilun->lun, SCSI_PR_IN_REPORT_CAPABILITIES, xferlen);
> +if (task != NULL && task->status == SCSI_STATUS_GOOD) {
> +rc = scsi_datain_unmarshall(task);
> +if (rc == NULL) {
> +error_setg(errp,
> +"iSCSI: Failed to unmarshall report capabilities data.");
> +} else {
> +iscsilun->pr_cap =
> +
> scsi_pr_cap_to_block(rc->persistent_reservation_type_mask);
> +iscsilun->pr_cap |= (rc->ptpl_a) ? BLK_PR_CAP_PTPL : 0;
> +}
> +break;
> +}
> +
> +if (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> +&& task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> +break;
> +}
> +
> +} while (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> + && task->sense.key == SCSI_SENSE_UNIT_ATTENTION
> + && retries-- > 0);
> +
> +if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +error_setg(errp, "iSCSI: failed to send report capabilities 
> command");
> +}
> +
> +if (task) {
> +scsi_free_scsi_task(task);
> +}
> +}
> +
>  static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>Error **errp)
>  {
> @@ -2024,6 +2073,11 @@ static int iscsi_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
>  }
>  
> +iscsi_get_pr_cap_sync(iscsilun, &local_err);
> +if (local_err != NULL) {
> +error_propagate(errp, local_err);
> +ret = -EINVAL;
> +}
>  out:
>  qemu_opts_del(opts);
>  g_free(initiator_name);
> @@ -2110,6 +2164,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
>  iscsilun->block_size);
>  }
> +
> +bs->bl.pr_cap = iscsilun->pr_cap;
>  }
>  
>  /* Note that this will not re-establish a connection with an iSCSI target - 
> it
> @@ -2408,6 +2464,385 @@ out_unlock:
>  return r;
>  }
>  
> +static int coroutine_fn
> +iscsi_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
> +  uint32_t num_keys, uint64_t *keys)
> +{
> +IscsiLun *iscsilun = bs->opaque;
> +QEMUIOVector qiov;
> +struct IscsiTask iTask;
> +int xferlen = sizeof(struct scsi_persistent_reserve_in_read_keys) +
> +  sizeof(uint64_t) * num_keys;
> +uint8_t *buf = g_malloc0(xferlen);
> +int32_t num_collect_keys = 0;
> +int r = 0;
> +
> +qemu_iovec_init_buf(&qiov, buf, xferlen);
> +iscsi_co_init_iscsitask(iscsilun, &iTask);
> +qemu_mutex_lock(&iscsilun->mutex);
> +retry:
> +iTask.task = iscsi_persistent_reserve_in_task(iscsilun->iscsi,
> + iscsilun->lun, SCSI_PR_IN_READ_KEYS, xferlen,
> + iscsi_co_generic_

Re: [PATCH v6 06/10] block/nvme: add reservation command protocol constants

2024-07-04 Thread Stefan Hajnoczi

On Thu, Jun 13, 2024 at 03:13:23PM +0800, Changqi Lu wrote:
> Add constants for the NVMe persistent command protocol.
> The constants include the reservation command opcode and
> reservation type values defined in section 7 of the NVMe
> 2.0 specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/block/nvme.h | 61 
>  1 file changed, 61 insertions(+)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index bb231d0b9a..da6ccb0f3b 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -633,6 +633,10 @@ enum NvmeIoCommands {
>  NVME_CMD_WRITE_ZEROES   = 0x08,
>  NVME_CMD_DSM= 0x09,
>  NVME_CMD_VERIFY = 0x0c,
> +NVME_CMD_RESV_REGISTER  = 0x0d,
> +NVME_CMD_RESV_REPORT= 0x0e,
> +NVME_CMD_RESV_ACQUIRE   = 0x11,
> +NVME_CMD_RESV_RELEASE   = 0x15,
>  NVME_CMD_IO_MGMT_RECV   = 0x12,

Keep NVME_CMD_IO_MGMT_RECV (0x12) before NVME_CMD_RESV_RELEASE (0x15) in
sorted order?

>  NVME_CMD_COPY   = 0x19,
>  NVME_CMD_IO_MGMT_SEND   = 0x1d,
> @@ -641,6 +645,63 @@ enum NvmeIoCommands {
>  NVME_CMD_ZONE_APPEND= 0x7d,
>  };
>  
> +typedef enum {
> +NVME_RESV_REGISTER_ACTION_REGISTER  = 0x00,
> +NVME_RESV_REGISTER_ACTION_UNREGISTER= 0x01,
> +NVME_RESV_REGISTER_ACTION_REPLACE   = 0x02,
> +} NvmeReservationRegisterAction;
> +
> +typedef enum {
> +NVME_RESV_RELEASE_ACTION_RELEASE= 0x00,
> +NVME_RESV_RELEASE_ACTION_CLEAR  = 0x01,
> +} NvmeReservationReleaseAction;
> +
> +typedef enum {
> +NVME_RESV_ACQUIRE_ACTION_ACQUIRE= 0x00,
> +NVME_RESV_ACQUIRE_ACTION_PREEMPT= 0x01,
> +NVME_RESV_ACQUIRE_ACTION_PREEMPT_AND_ABORT  = 0x02,
> +} NvmeReservationAcquireAction;
> +
> +typedef enum {
> +NVME_RESV_WRITE_EXCLUSIVE   = 0x01,
> +NVME_RESV_EXCLUSIVE_ACCESS  = 0x02,
> +NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY = 0x03,
> +NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY= 0x04,
> +NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS  = 0x05,
> +NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS = 0x06,
> +} NvmeResvType;
> +
> +typedef enum {
> +NVME_RESV_PTPL_NO_CHANGE = 0x00,
> +NVME_RESV_PTPL_DISABLE   = 0x02,
> +NVME_RESV_PTPL_ENABLE= 0x03,
> +} NvmeResvPTPL;
> +
> +typedef enum NVMEPrCap {
> +/* Persist Through Power Loss */
> +NVME_PR_CAP_PTPL = 1 << 0,
> +/* Write Exclusive reservation type */
> +NVME_PR_CAP_WR_EX = 1 << 1,
> +/* Exclusive Access reservation type */
> +NVME_PR_CAP_EX_AC = 1 << 2,
> +/* Write Exclusive Registrants Only reservation type */
> +NVME_PR_CAP_WR_EX_RO = 1 << 3,
> +/* Exclusive Access Registrants Only reservation type */
> +NVME_PR_CAP_EX_AC_RO = 1 << 4,
> +/* Write Exclusive All Registrants reservation type */
> +NVME_PR_CAP_WR_EX_AR = 1 << 5,
> +/* Exclusive Access All Registrants reservation type */
> +NVME_PR_CAP_EX_AC_AR = 1 << 6,
> +
> +NVME_PR_CAP_ALL = (NVME_PR_CAP_PTPL |
> +  NVME_PR_CAP_WR_EX |
> +  NVME_PR_CAP_EX_AC |
> +  NVME_PR_CAP_WR_EX_RO |
> +  NVME_PR_CAP_EX_AC_RO |
> +  NVME_PR_CAP_WR_EX_AR |
> +  NVME_PR_CAP_EX_AC_AR),
> +} NvmePrCap;
> +
>  typedef struct QEMU_PACKED NvmeDeleteQ {
>  uint8_t opcode;
>  uint8_t flags;
> -- 
> 2.20.1
> 



Re: [PATCH v6 08/10] hw/nvme: enable ONCS and rescap function

2024-07-04 Thread Stefan Hajnoczi

On Thu, Jun 13, 2024 at 03:13:25PM +0800, Changqi Lu wrote:
> This commit enables ONCS to support the reservation
> function at the controller level. Also enables rescap
> function in the namespace by detecting the supported reservation
> function in the backend driver.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/nvme/ctrl.c | 3 ++-
>  hw/nvme/ns.c   | 5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 127c3d2383..182307a48b 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8248,7 +8248,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
>  id->nn = cpu_to_le32(NVME_MAX_NAMESPACES);
>  id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
> NVME_ONCS_FEATURES | NVME_ONCS_DSM |
> -   NVME_ONCS_COMPARE | NVME_ONCS_COPY);
> +   NVME_ONCS_COMPARE | NVME_ONCS_COPY |
> +   NVME_ONCS_RESRVATIONS);

RESRVATIONS -> RESERVATIONS typo?

>  
>  /*
>   * NOTE: If this device ever supports a command set that does NOT use 0x0
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index ea8db175db..320c9bf658 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -20,6 +20,7 @@
>  #include "qemu/bitops.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/block-backend.h"
> +#include "block/block_int.h"
>  
>  #include "nvme.h"
>  #include "trace.h"
> @@ -33,6 +34,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  BlockDriverInfo bdi;
>  int npdg, ret;
>  int64_t nlbas;
> +uint8_t blk_pr_cap;
>  
>  ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
>  ns->lbasz = 1 << ns->lbaf.ds;
> @@ -55,6 +57,9 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  }
>  
>  id_ns->npda = id_ns->npdg = npdg - 1;
> +
> +blk_pr_cap = blk_bs(ns->blkconf.blk)->file->bs->bl.pr_cap;

Kevin: This unprotected block graph access and the assumption that
->file->bs exists could be problematic. What is the best practice for
making this code safe and defensive?
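
To make the second half of the concern concrete, here is a minimal sketch of the NULL checks alone. The Sketch* types are hypothetical stand-ins, not the real QEMU block-layer definitions, and the sketch does not model graph locking at all, which would still have to be handled with whatever mechanism the block layer prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the block-layer types; NOT the QEMU structs. */
typedef struct SketchBDS SketchBDS;

typedef struct SketchChild {
    SketchBDS *bs;
} SketchChild;

typedef struct SketchLimits {
    uint8_t pr_cap;
} SketchLimits;

struct SketchBDS {
    SketchChild *file;   /* may be NULL for a node without a file child */
    SketchLimits bl;
};

/*
 * Return the pr_cap of the node below bs, or 0 (no capabilities advertised)
 * if any link in the bs->file->bs chain is missing.
 */
static uint8_t sketch_file_pr_cap(const SketchBDS *bs)
{
    if (!bs || !bs->file || !bs->file->bs) {
        return 0;
    }
    return bs->file->bs->bl.pr_cap;
}
```

Falling back to 0 here means the namespace would simply advertise no reservation capabilities when the chain is incomplete; whether that is the right fallback is a separate question.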

> +id_ns->rescap = block_pr_cap_to_nvme(blk_pr_cap);
>  }
>  
>  static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
> -- 
> 2.20.1
> 



Re: [PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-04 Thread Stefan Hajnoczi

On Thu, Jun 13, 2024 at 03:13:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/scsi/scsi-disk.c | 352 
>  1 file changed, 352 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..0e964dbd87 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
>  #include 
> @@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;
> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;
> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +/* The request is used as the AIO opaque value, so add a ref.  */
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t additional_len = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
> +memcpy(&buf[0], &blk_rsv->generation, 4);
> +if (ret) {
> +additional_len = cpu_to_be32(16);
> +blk_rsv->key = cpu_to_be64(blk_rsv->key);
> +memcpy(&buf[8], &blk_rsv->key, 8);
> +buf[21] = block_pr_type_to_scsi(blk_rsv->type) & 0xf;
> +} else {
> +additional_len =

Re: [PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-04 Thread Stefan Hajnoczi

On Thu, Jun 13, 2024 at 03:13:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 

As mentioned in my reply to a previous version, I don't understand the
buffer allocation/sizing in hw/scsi/ so I haven't been able to fully
review this code for buffer overflows and input validation. cmd.xfer
isn't consistently used for size checks in the new functions. Maybe some
checks are missing?
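
One way to make the size checks explicit is to derive every write from the buffer length the function is handed. A minimal standalone sketch, assuming the READ KEYS layout used in the quoted patch (4-byte generation, 4-byte additional length, then 8-byte keys, all big-endian). The sketch_ names are hypothetical and not part of QEMU, and buflen stands for whatever limit the caller settles on, for example the smaller of cmd.xfer and the request buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PR_HDR_LEN 8u  /* generation (4) + additional length (4) */

/* Store a 32-bit value in big-endian byte order. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store a 64-bit value in big-endian byte order. */
static void sketch_put_be64(uint8_t *p, uint64_t v)
{
    sketch_put_be32(p, (uint32_t)(v >> 32));
    sketch_put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill buf with READ KEYS parameter data without writing past buflen.
 * As in the quoted patch, the additional length covers the keys actually
 * copied. Returns the number of bytes written, or 0 if even the header
 * does not fit.
 */
static size_t sketch_build_read_keys(uint8_t *buf, size_t buflen,
                                     uint32_t generation,
                                     const uint64_t *keys, size_t num_keys)
{
    size_t fit, n, i;

    if (buflen < SKETCH_PR_HDR_LEN) {
        return 0;
    }
    fit = (buflen - SKETCH_PR_HDR_LEN) / sizeof(uint64_t);
    n = num_keys < fit ? num_keys : fit;
    if (n > UINT32_MAX / sizeof(uint64_t)) {
        n = UINT32_MAX / sizeof(uint64_t);  /* keep the length field in range */
    }

    sketch_put_be32(&buf[0], generation);
    sketch_put_be32(&buf[4], (uint32_t)(n * sizeof(uint64_t)));
    for (i = 0; i < n; i++) {
        sketch_put_be64(&buf[SKETCH_PR_HDR_LEN + i * sizeof(uint64_t)],
                        keys[i]);
    }
    return SKETCH_PR_HDR_LEN + n * sizeof(uint64_t);
}
```

With a 24-byte buffer and three keys, only two keys are copied and the function reports 24 bytes written, so the caller never has to trust the key count on its own.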

> ---
>  hw/scsi/scsi-disk.c | 352 
>  1 file changed, 352 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..0e964dbd87 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
>  #include 
> @@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;

Why is this field void * instead of SCSIDiskReq *?

> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;

Same here.

> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);

If buflen is an untrusted input, then num_keys < 0 (and maybe num_keys == 0)
needs to be rejected with an error.
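
Note that sizeof yields a size_t, so the subtraction above is done in unsigned arithmetic and wraps around when buflen is smaller than 8; the result only turns into a negative (or otherwise bogus) value when it is converted back to int. A minimal standalone sketch of a check that avoids this, with hypothetical names that are not part of the patch. It rejects a header-only buffer as well, which is a policy choice rather than a requirement:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PR_HDR_LEN (sizeof(uint32_t) * 2)  /* generation + length */

/*
 * Compute how many 8-byte keys fit into a buffer of buflen bytes.
 * Returns 0 and stores the count in *num_keys on success, or -1 if the
 * buffer cannot hold the header plus at least one key.
 */
static int sketch_pr_max_keys(size_t buflen, size_t *num_keys)
{
    /* Compare before subtracting so the unsigned math cannot wrap. */
    if (buflen < SKETCH_PR_HDR_LEN + sizeof(uint64_t)) {
        return -1;
    }
    *num_keys = (buflen - SKETCH_PR_HDR_LEN) / sizeof(uint64_t);
    return 0;
}
```

A caller would fail the request on -1 before allocating the keys array, so the allocation size is never derived from a wrapped value.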

> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +/* The request is used as the AIO opaque value, so add a ref.  */
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t additional_len = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_
