date:20230125

Re: [PATCH v4 3/5] hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper

2023-01-25 Thread Cédric Le Goater


Hello Ninad,

On 1/25/23 17:53, Ninad S Palsule wrote:

Signed-off-by: Peter Delevoryas pe...@pjd.dev 

Reviewed-by: Joel Stanley j...@jms.id.au 

Tested-by: Ninad Palsule ninadpals...@us.ibm.com 


Hi Peter,

I applied your patches and made sure that different EEPROM images can be loaded
from the appropriate image files, and it is working as expected.


Maybe you could contribute an EEPROM qtest? I would put the data under
tests/data/eeprom.

Thanks,

C.




# Used the following command to invoke QEMU.

qemu-system-arm -M rainier-bmc -nographic \
   -kernel fitImage-linux.bin \
   -dtb aspeed-bmc-ibm-rainier.dtb \
   -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
   -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
   -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a" \
   -device at24c-eeprom,bus=aspeed.i2c.bus.0,address=0x51,drive=a,rom-size=32768 -drive file=tpm.eeprom.bin,format=raw,if=none,id=a \
   -device at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x50,drive=b,rom-size=65536 -drive file=oppanel.eeprom.bin,format=raw,if=none,id=b \
   -device at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x51,drive=c,rom-size=65536 -drive file=lcd.eeprom.bin,format=raw,if=none,id=c \
   -device at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x50,drive=d,rom-size=65536 -drive file=baseboard.eeprom.bin,format=raw,if=none,id=d \
   -device at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x51,drive=e,rom-size=65536 -drive file=bmc.eeprom.bin,format=raw,if=none,id=e \
   -device at24c-eeprom,bus=aspeed.i2c.bus.9,address=0x50,drive=f,rom-size=131072 -drive file=vrm.eeprom.bin,format=raw,if=none,id=f \
   -device at24c-eeprom,bus=aspeed.i2c.bus.10,address=0x50,drive=g,rom-size=131072 -drive file=vrm.eeprom.bin,format=raw,if=none,id=g \
   -device at24c-eeprom,bus=aspeed.i2c.bus.13,address=0x50,drive=h,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=h \
   -device at24c-eeprom,bus=aspeed.i2c.bus.14,address=0x50,drive=i,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=i \
   -device at24c-eeprom,bus=aspeed.i2c.bus.15,address=0x50,drive=j,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=j

Re: [PATCH v2 0/7] hw/cxl: RAS error emulation and injection

2023-01-25 Thread Ira Weiny

Jonathan Cameron wrote:
> v2: Thanks to Mike Maslenkin for review.
> - Fix wrong parameter type to ct3d_qmp_cor_err_to_cxl()
> - Rework use of CXLError local variable in ct3d_reg_write() to improve
>   code readability.
> 
> CXL error reporting is complex. This series only covers the protocol
> related errors reported via PCIE AER - Ira Weiny has posted support for
> Event log based injection and I will post an update of Poison list injection
> shortly. My proposal is to upstream this one first, followed by Ira's Event
> Log series, then finally the Poison List handling. That is based on likely
> order of Linux kernel support (the support for this type of error reporting
> went in during the recent merge window, the others are still under review).
> Note we may propose other non error related features in between!
> The current revisions of all the error injection can be found at:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-11

Thanks!

I see all of the patches for the event log stuff have landed in this
tree.

I see the following:

1) I have cleanup patches for[*]
a) The timestamp change
b) the g_new0() allocation

2)  [PATCH v2 7/8] bswap: Add the ability to store to an unaligned 24 bit field
was left alone.  I'm good with that.  But you said you
wanted to move it into the CXL specific code.  Did you
change your mind?

3) Thank you so much for fixing the optional variable stuff!  :-D

4) And thanks for the CXLRetCode fix.  Thanks!

5) In the latest code from 1/20 I see you fixed the static const
   UUID.  Thanks!

For the event stuff I have tested what is on this branch with the cleanup
patches.

I was not sure if you wanted me to re-roll them or just send fixup
patches.  But I'd like to move forward with the fixes submitted if that is
ok.  Those are all minor issues which don't affect the behavior much at
this point.

[*] 
https://lore.kernel.org/all/20230125-ira-cxl-events-fixups-2023-01-11-v1-0-193137851...@intel.com/

Thank you,
Ira

> 
> In order to test the kernel support for RAS error handling, I previously
> provided this series via gitlab, enabling David Jiang's kernel patches
> to be tested.
> 
> Now that Linux kernel support is upstream, this series is proposing the
> support for upstream inclusion in QEMU. Note that support for Multiple
> Header Recording has been added to QEMU in the meantime and a kernel
> patch to use that feature has been sent out.
> 
> https://lore.kernel.org/linux-cxl/20230113154058.16227-1-jonathan.came...@huawei.com/T/#t
> 
> There are two generic PCI AER precursor feature additions.
> 1) The PCI_ERR_UCOR_MASK register has not been implemented until now
>    and is necessary for correct emulation.
> 2) The routing for AER errors, via existing AER error injection, only
>    covered one of two paths given in the PCIe base specification,
>    unfortunately not the one used by the Linux kernel CXL support.
> 
> The use of MSI for the CXL root ports both makes sense from the point
> of view of how it may well be implemented, and works around the documented
> lack of PCI interrupt routing in i386/q35. I have a hack that lets
> us correctly route those interrupts but don't currently plan to post it.
> 
> The actual CXL error injection uses a new QMP interface as documented
> in the final patch description. The existing AER error injection
> internals are reused though its HMP interface is not.
> 
> Injection via QMP:
> { "execute": "qmp_capabilities" }
> ...
> { "execute": "cxl-inject-uncorrectable-errors",
>   "arguments": {
> "path": "/machine/peripheral/cxl-pmem0",
> "errors": [
> {
> "type": "cache-address-parity",
> "header": [ 3, 4]
> },
> {
> "type": "cache-data-parity",
> "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
> },
> {
> "type": "internal",
> "header": [ 1, 2, 4]
> }
> ]
>   }}
> ...
> { "execute": "cxl-inject-correctable-error",
> "arguments": {
> "path": "/machine/peripheral/cxl-pmem0",
> "type": "physical",
> "header": [ 3, 4]
> } }
> 
> Based on top of:
> https://lore.kernel.org/all/20230112102644.27830-1-jonathan.came...@huawei.com/
> [PATCH v2 0/8] hw/cxl: CXL emulation cleanups and minor fix

[PATCH 1/2] hw/cxl: Fix event log time stamp fields

2023-01-25 Thread Ira Weiny

CXL 3.0 8.2.9.4.2 Set Timestamp and 8.2.9.4.1 Get Timestamp define the
way for software to set and get the time stamp of a device.  Events
should use a time stamp consistent with the Get Timestamp mailbox
command.

In addition avoid setting the time stamp twice.

Fixes: fb64c5661d5f ("hw/cxl/events: Wire up get/clear event mailbox commands")
Reported-by: Jonathan Cameron 
Signed-off-by: Ira Weiny 
---
 hw/cxl/cxl-device-utils.c   | 15 +++
 hw/cxl/cxl-events.c |  4 +++-
 hw/cxl/cxl-mailbox-utils.c  | 11 +--
 hw/mem/cxl_type3.c  |  1 -
 include/hw/cxl/cxl_device.h |  2 ++
 5 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 7f29d40be04a..5876a3703e85 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -325,3 +325,18 @@ void cxl_device_register_init_swcci(CXLDeviceState 
*cxl_dstate)
 
 cxl_initialize_mailbox(cxl_dstate, true);
 }
+
+uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
+{
+uint64_t time, delta;
+uint64_t final_time = 0;
+
+if (cxl_dstate->timestamp.set) {
+/* First find the delta from the last time the host set the time. */
+time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+delta = time - cxl_dstate->timestamp.last_set;
+final_time = cxl_dstate->timestamp.host_set + delta;
+}
+
+return final_time;
+}
diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
index 08fd52b66188..2536aafc55fb 100644
--- a/hw/cxl/cxl-events.c
+++ b/hw/cxl/cxl-events.c
@@ -100,7 +100,7 @@ bool cxl_event_insert(CXLDeviceState *cxlds,
   enum cxl_event_log_type log_type,
   struct cxl_event_record_raw *event)
 {
-uint64_t time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+uint64_t time;
 struct cxl_event_log *log;
 CXLEvent *entry;
 
@@ -108,6 +108,8 @@ bool cxl_event_insert(CXLDeviceState *cxlds,
 return false;
 }
 
+time = cxl_device_get_timestamp(cxlds);
+
 log = &cxlds->event_logs[log_type];
 
 QEMU_LOCK_GUARD(&log->lock);
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 75703023434b..0e64873c2395 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -394,17 +394,8 @@ static CXLRetCode cmd_timestamp_get(struct cxl_cmd *cmd,
 CXLDeviceState *cxl_dstate,
 uint16_t *len)
 {
-uint64_t time, delta;
-uint64_t final_time = 0;
-
-if (cxl_dstate->timestamp.set) {
-/* First find the delta from the last time the host set the time. */
-time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-delta = time - cxl_dstate->timestamp.last_set;
-final_time = cxl_dstate->timestamp.host_set + delta;
-}
+uint64_t final_time = cxl_device_get_timestamp(cxl_dstate);
 
-/* Then adjust the actual time */
 stq_le_p(cmd->payload, final_time);
 *len = 8;
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a7b587780af2..42e291dd9f76 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1291,7 +1291,6 @@ static void cxl_assign_event_header(struct 
cxl_event_record_hdr *hdr,
 hdr->flags[0] = flags;
 hdr->length = length;
 memcpy(&hdr->id, uuid, sizeof(hdr->id));
-hdr->timestamp = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
 
 static const QemuUUID gen_media_uuid = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index cbb37c541c44..31579af342f1 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -426,4 +426,6 @@ CXLRetCode cxl_event_clear_records(CXLDeviceState *cxlds,
 
 void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
+uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
+
 #endif

-- 
2.39.1
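
The timestamp calculation that the patch above factors into
cxl_device_get_timestamp() can be sketched outside QEMU as follows. This is
only an illustration under stated assumptions: struct timestamp_state,
device_timestamp() and the explicit now_ns parameter (standing in for
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) are hypothetical names chosen so the
example is self-contained, and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the timestamp fields kept in CXLDeviceState. */
struct timestamp_state {
    bool set;           /* has the host issued Set Timestamp yet? */
    uint64_t last_set;  /* virtual clock value (ns) when the host set the time */
    uint64_t host_set;  /* timestamp value the host wrote */
};

/*
 * Returns 0 until the host has set the time; afterwards returns the
 * host-provided value plus the time elapsed since it was set.
 * now_ns is an assumed parameter replacing the QEMU virtual clock read.
 */
static uint64_t device_timestamp(const struct timestamp_state *ts,
                                 uint64_t now_ns)
{
    assert(ts != NULL);

    if (!ts->set) {
        return 0;
    }
    return ts->host_set + (now_ns - ts->last_set);
}
```

With host_set = 5000000, last_set = 1000 and now_ns = 1750, the function
returns 5000750. Using one helper for both the Get Timestamp command and the
event records is what keeps the two values consistent.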

[PATCH 0/2] hw/cxl: CXL Event processing fixups

2023-01-25 Thread Ira Weiny

During review of the CXL Event processing series[1] these minor fixes were
caught but I did not have time to respin before Jonathan picked them up.

Make the fixes now.

These are based on Jonathan's latest branch:

https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-20

[1] 
https://lore.kernel.org/all/20221221-ira-cxl-events-2022-11-17-v2-0-2ce2ecc06...@intel.com/

To: Jonathan Cameron 
Cc: Michael Tsirkin 
Cc: Ben Widawsky 
Cc: Peter Maydell 
Signed-off-by: Ira Weiny 

---
Ira Weiny (2):
  hw/cxl: Fix event log time stamp fields
  hw/cxl: Remove check for g_new0() failure

 hw/cxl/cxl-device-utils.c   | 15 +++
 hw/cxl/cxl-events.c | 10 +++---
 hw/cxl/cxl-mailbox-utils.c  | 11 +--
 hw/mem/cxl_type3.c  |  1 -
 include/hw/cxl/cxl_device.h |  2 ++
 5 files changed, 21 insertions(+), 18 deletions(-)
---
base-commit: bb3f9b2853f9723c11a38c6b7bca7368677f2b43
change-id: 20230125-ira-cxl-events-fixups-2023-01-11-337953e87f5d

Best regards,
-- 
Ira Weiny

[PATCH 2/2] hw/cxl: Remove check for g_new0() failure

2023-01-25 Thread Ira Weiny

g_new0() will terminate the application if it fails.  Remove the check.

Fixes: fb64c5661d5f ("hw/cxl/events: Wire up get/clear event mailbox commands")
Reported-by: Jonathan Cameron 
Signed-off-by: Ira Weiny 
---
 hw/cxl/cxl-events.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
index 2536aafc55fb..4cbc838e8ff4 100644
--- a/hw/cxl/cxl-events.c
+++ b/hw/cxl/cxl-events.c
@@ -124,13 +124,7 @@ bool cxl_event_insert(CXLDeviceState *cxlds,
 }
 
 entry = g_new0(CXLEvent, 1);
-if (!entry) {
-error_report("Failed to allocate memory for event log entry");
-return false;
-}
-
 memcpy(&entry->data, event, sizeof(*event));
-
 entry->data.hdr.handle = cpu_to_le16(log->next_handle);
 log->next_handle++;
 /* 0 handle is never valid */

-- 
2.39.1
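
The reasoning in the patch above (an allocator that terminates the program on
failure makes a NULL check dead code) can be sketched in plain C without GLib.
The names xnew0(), struct event and event_new() are hypothetical and exist
only for this illustration; xnew0() is an assumed stand-in that mimics the
abort-on-failure contract the commit message relies on for g_new0().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event record, standing in for CXLEvent. */
struct event {
    unsigned short handle;
    unsigned char payload[16];
};

/*
 * Hypothetical abort-on-failure allocator: returns zeroed memory and never
 * returns NULL for a nonzero request, because failure ends the program.
 */
static void *xnew0(size_t count, size_t size)
{
    void *p = calloc(count, size);

    if (!p) {
        abort();
    }
    return p;
}

static struct event *event_new(unsigned short handle)
{
    struct event *e;

    assert(handle != 0);        /* 0 handle is never valid */
    e = xnew0(1, sizeof(*e));   /* no NULL check needed after this call */
    e->handle = handle;
    return e;
}
```

A caller can use the returned pointer directly; the only failure mode is
process termination inside the allocator, so an error path after the call
would never run.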

Re: [QEMU][PATCH v4 07/10] hw/xen/xen-hvm-common: Use g_new and error_setg_errno

2023-01-25 Thread Frediano Ziglio

On Wed, 25 Jan 2023 at 22:07, Stefano Stabellini wrote:
>
> On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> > Replace g_malloc with g_new and perror with error_setg_errno.
> >

error_setg_errno -> error_report ?

Also in the title

> > Signed-off-by: Vikram Garhwal 

Frediano

[PATCH v5 3/4] checkpatch: add qemu_bh_new/aio_bh_new checks

2023-01-25 Thread Alexander Bulekov

Advise authors to use the _guarded versions of the APIs, instead.

Signed-off-by: Alexander Bulekov 
---
 scripts/checkpatch.pl | 8 
 1 file changed, 8 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6ecabfb2b5..61bb4b0a19 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2865,6 +2865,14 @@ sub process {
if ($line =~ /\bsignal\s*\(/ && !($line =~ /SIG_(?:IGN|DFL)/)) {
ERROR("use sigaction to establish signal handlers; 
signal is not portable\n" . $herecurr);
}
+# recommend qemu_bh_new_guarded instead of qemu_bh_new
+if ($line =~ /\bqemu_bh_new\s*\(/) {
+   ERROR("use qemu_bh_new_guarded() instead of 
qemu_bh_new() to avoid reentrancy problems\n" . $herecurr);
+   }
+# recommend aio_bh_new_guarded instead of aio_bh_new
+if ($line =~ /\baio_bh_new\s*\(/) {
+   ERROR("use aio_bh_new_guarded() instead of aio_bh_new() 
to avoid reentrancy problems\n" . $herecurr);
+   }
 # check for module_init(), use category-specific init macros explicitly please
if ($line =~ /^module_init\s*\(/) {
ERROR("please use block_init(), type_init() etc. 
instead of module_init()\n" . $herecurr);
-- 
2.39.0
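
The checkpatch patterns above match qemu_bh_new( and aio_bh_new( only as whole
identifiers, so calls to the _guarded variants are not flagged. The matching
rule (word boundary, name, optional whitespace, opening parenthesis) can be
sketched in C as below. This is a hand-written approximation for illustration,
not how checkpatch.pl is implemented; calls_function() and is_word_char() are
hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Returns true if line contains name followed by optional whitespace and
 * '(', where name is not the tail of a longer identifier.
 */
static bool calls_function(const char *line, const char *name)
{
    size_t len = strlen(name);
    const char *p;

    assert(len > 0);
    for (p = strstr(line, name); p; p = strstr(p + 1, name)) {
        const char *q = p + len;

        if (p != line && is_word_char(p[-1])) {
            continue;   /* preceded by a word character: no boundary */
        }
        while (isspace((unsigned char)*q)) {
            q++;
        }
        if (*q == '(') {
            return true;
        }
    }
    return false;
}
```

For the line "s->bh = qemu_bh_new_guarded(cb, s, g);" the name qemu_bh_new is
followed by '_' rather than whitespace or '(', so the check does not fire.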

[PATCH v5 4/4] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Alexander Bulekov

This protects devices from bh->mmio reentrancy issues.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
---
 hw/9pfs/xen-9p-backend.c| 4 +++-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/block/virtio-blk.c   | 5 +++--
 hw/char/virtio-serial-bus.c | 3 ++-
 hw/display/qxl.c| 9 ++---
 hw/display/virtio-gpu.c | 6 --
 hw/ide/ahci.c   | 3 ++-
 hw/ide/core.c   | 3 ++-
 hw/misc/imx_rngc.c  | 6 --
 hw/misc/macio/mac_dbdma.c   | 2 +-
 hw/net/virtio-net.c | 3 ++-
 hw/nvme/ctrl.c  | 6 --
 hw/scsi/mptsas.c| 3 ++-
 hw/scsi/scsi-bus.c  | 3 ++-
 hw/scsi/vmw_pvscsi.c| 3 ++-
 hw/usb/dev-uas.c| 3 ++-
 hw/usb/hcd-dwc2.c   | 3 ++-
 hw/usb/hcd-ehci.c   | 3 ++-
 hw/usb/hcd-uhci.c   | 2 +-
 hw/usb/host-libusb.c| 6 --
 hw/usb/redirect.c   | 6 --
 hw/usb/xen-usb.c| 3 ++-
 hw/virtio/virtio-balloon.c  | 5 +++--
 hw/virtio/virtio-crypto.c   | 3 ++-
 25 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 65c4979c3c..f077c1b255 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -441,7 +441,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
 xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
XEN_FLEX_RING_SIZE(ring_order);
 
-xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, 
&xen_9pdev->rings[i]);
+xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+ &xen_9pdev->rings[i],
+ 
&DEVICE(xen_9pdev)->mem_reentrancy_guard);
 xen_9pdev->rings[i].out_cons = 0;
 xen_9pdev->rings[i].out_size = 0;
 xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 26f965cabc..191a8c90aa 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+   &DEVICE(s)->mem_reentrancy_guard);
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 2785b9e849..e31806b317 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -632,8 +632,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice 
*xendev,
 } else {
 dataplane->ctx = qemu_get_aio_context();
 }
-dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-   dataplane);
+dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+   dataplane,
+   &DEVICE(xendev)->mem_reentrancy_guard);
 
 return dataplane;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f717550fdc..e9f516e633 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -866,8 +866,9 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool 
running,
  * requests will be processed while starting the data plane.
  */
 if (!s->bh && !virtio_bus_ioeventfd_enabled(bus)) {
-s->bh = aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
-   virtio_blk_dma_restart_bh, s);
+s->bh = aio_bh_new_guarded(blk_get_aio_context(s->conf.conf.blk),
+   virtio_blk_dma_restart_bh, s,
+   &DEVICE(s)->mem_reentrancy_guard);
 blk_inc_in_flight(s->conf.conf.blk);
 qemu_bh_schedule(s->bh);
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..dd619f0731 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-port->bh = qemu_bh_new(flush_queued_data_bh, port);
+port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+   &dev->mem_reentrancy_guard);
 port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 6772849dec..67efa3c3ef 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2223,11 +2223,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error 
**errp)
 
 qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-qxl->update_

[PATCH v5 2/4] async: Add an optional reentrancy guard to the BH API

2023-01-25 Thread Alexander Bulekov

Devices can pass their MemoryReentrancyGuard (from their DeviceState),
when creating new BHes. Then, the async API will toggle the guard
before/after calling the BH call-back. This prevents bh->mmio reentrancy
issues.

Signed-off-by: Alexander Bulekov 
---
 docs/devel/multiple-iothreads.txt |  7 +++
 include/block/aio.h   | 18 --
 include/qemu/main-loop.h  |  7 +--
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 7 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt 
b/docs/devel/multiple-iothreads.txt
index 343120f2ef..a3e949f6b3 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
  * LEGACY timer_new_ms() - create a timer
  * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
  * LEGACY qemu_aio_wait() - run an event loop iteration
 
 Since they implicitly work on the main loop they cannot be used in code that
@@ -72,8 +73,14 @@ Instead, use the AioContext functions directly (see 
include/block/aio.h):
  * aio_set_event_notifier() - monitor an event notifier
  * aio_timer_new() - create a timer
  * aio_bh_new() - create a BH
+ * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
  * aio_poll() - run an event loop iteration
 
+The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding DeviceState and named "mem_reentrancy_guard".
+
 The AioContext can be obtained from the IOThread using
 iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
 Code that takes an AioContext argument works both in IOThreads or the main
diff --git a/include/block/aio.h b/include/block/aio.h
index 0f65a3cc9e..94d661ff7e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -23,6 +23,8 @@
 #include "qemu/thread.h"
 #include "qemu/timer.h"
 #include "block/graph-lock.h"
+#include "hw/qdev-core.h"
+
 
 typedef struct BlockAIOCB BlockAIOCB;
 typedef void BlockCompletionFunc(void *opaque, int ret);
@@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, 
QEMUBHFunc *cb, void *opaque,
  * is opaque and must be allocated prior to its use.
  *
  * @name: A human-readable identifier for debugging purposes.
+ * @reentrancy_guard: A guard set when entering a cb to prevent
+ * device-reentrancy issues
  */
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-const char *name);
+const char *name, MemReentrancyGuard 
*reentrancy_guard);
 
 /**
  * aio_bh_new: Allocate a new bottom half structure
@@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, 
void *opaque,
  * string.
  */
 #define aio_bh_new(ctx, cb, opaque) \
-aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
+
+/**
+ * aio_bh_new_guarded: Allocate a new bottom half structure with a
+ * reentrancy_guard
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index c25f390696..84d1ce57f0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
+#define qemu_bh_new_guarded(cb, opaque, guard) \
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
 #define qemu_bh_new(cb, opaque) \
-qemu_bh_new_full((cb), (opaque), (stringify(cb)))
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+ MemReentrancyGuard *reentrancy_guard);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index f5e75a96b6..24d5413f9d 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int 
attr_mask)
 return deadline;
 }
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *n

[PATCH v5 0/4] memory: prevent dma-reentracy issues

2023-01-25 Thread Alexander Bulekov

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we don't have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible.

v4-> v5: 
- Add corresponding checkpatch checks
- Save/restore reentrancy-flag when entering/exiting BHs
- Improve documentation
- Check object_dynamic_cast return value

v3 -> v4: Instead of changing all of the DMA APIs, add an
optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
attempting to guess DeviceStates in BHs.

Alexander Bulekov (4):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  checkpatch: add qemu_bh_new/aio_bh_new checks
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  7 +++
 hw/9pfs/xen-9p-backend.c  |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c|  5 +++--
 hw/block/virtio-blk.c |  5 +++--
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/display/qxl.c  |  9 ++---
 hw/display/virtio-gpu.c   |  6 --
 hw/ide/ahci.c |  3 ++-
 hw/ide/core.c |  3 ++-
 hw/misc/imx_rngc.c|  6 --
 hw/misc/macio/mac_dbdma.c |  2 +-
 hw/net/virtio-net.c   |  3 ++-
 hw/nvme/ctrl.c|  6 --
 hw/scsi/mptsas.c  |  3 ++-
 hw/scsi/scsi-bus.c|  3 ++-
 hw/scsi/vmw_pvscsi.c  |  3 ++-
 hw/usb/dev-uas.c  |  3 ++-
 hw/usb/hcd-dwc2.c |  3 ++-
 hw/usb/hcd-ehci.c |  3 ++-
 hw/usb/hcd-uhci.c |  2 +-
 hw/usb/host-libusb.c  |  6 --
 hw/usb/redirect.c |  6 --
 hw/usb/xen-usb.c  |  3 ++-
 hw/virtio/virtio-balloon.c|  5 +++--
 hw/virtio/virtio-crypto.c |  3 ++-
 include/block/aio.h   | 18 --
 include/hw/qdev-core.h|  7 +++
 include/qemu/main-loop.h  |  7 +--
 scripts/checkpatch.pl |  8 
 softmmu/memory.c  | 17 +
 softmmu/trace-events  |  1 +
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 36 files changed, 150 insertions(+), 43 deletions(-)

-- 
2.39.0
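
The "bh -> dma write -> mmio" handling described in the cover letter above
(associate a guard with a BH, then save and restore the flag around the
callback) can be sketched as follows. This is a simplified model written for
illustration, not the util/async.c change: SketchBH, SketchDevice,
sketch_bh_call() and sketch_device_bh() are hypothetical names, and only the
MemReentrancyGuard layout is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool engaged_in_io;
} MemReentrancyGuard;

typedef void BHFunc(void *opaque);

/* Hypothetical, simplified bottom half: callback plus optional guard. */
typedef struct {
    BHFunc *cb;
    void *opaque;
    MemReentrancyGuard *guard;   /* NULL for an unguarded BH */
} SketchBH;

/* Run the callback with the guard set, then restore the previous value. */
static void sketch_bh_call(SketchBH *bh)
{
    bool saved = false;

    assert(bh && bh->cb);
    if (bh->guard) {
        saved = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }
    bh->cb(bh->opaque);
    if (bh->guard) {
        bh->guard->engaged_in_io = saved;
    }
}

/* Hypothetical device whose BH records the guard state it observed. */
typedef struct {
    MemReentrancyGuard guard;
    bool seen_engaged;
} SketchDevice;

static void sketch_device_bh(void *opaque)
{
    SketchDevice *dev = opaque;

    dev->seen_engaged = dev->guard.engaged_in_io;
}
```

While a guarded BH runs, any MMIO access to the same device would see
engaged_in_io set and could be rejected; an unguarded BH (guard == NULL)
behaves as before.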

[PATCH v5 1/4] memory: prevent dma-reentracy issues

2023-01-25 Thread Alexander Bulekov

Add a flag to the DeviceState that indicates when a device is engaged in
PIO/MMIO/DMA. This flag is checked and set prior to calling a device's
MemoryRegion handlers, and set when device code initiates DMA.  The purpose of
this flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
---
 include/hw/qdev-core.h |  7 +++
 softmmu/memory.c   | 17 +
 softmmu/trace-events   |  1 +
 3 files changed, 25 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 35fddb19a6..8858195262 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
 QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@ struct DeviceState {
 int alias_required_for_version;
 ResettableState reset;
 GSList *unplug_blockers;
+
+/* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e05332d07f..daffb48493 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 uint64_t access_mask;
 unsigned access_size;
 unsigned i;
+DeviceState *dev = NULL;
 MemTxResult r = MEMTX_OK;
 
 if (!access_size_min) {
@@ -542,6 +543,19 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_size_max = 4;
 }
 
+/* Do not allow more than one simultanous access to a device's IO Regions 
*/
+if (mr->owner &&
+!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+if (dev) {
+if (dev->mem_reentrancy_guard.engaged_in_io) {
+trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, 
size);
+return MEMTX_ERROR;
+}
+dev->mem_reentrancy_guard.engaged_in_io = true;
+}
+}
+
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
 access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +570,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_mask, attrs);
 }
 }
+if (dev) {
+dev->mem_reentrancy_guard.engaged_in_io = false;
+}
 return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 22606dc27b..62d04ea9a7 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t 
addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t 
value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 
0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t 
value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size 
%u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t 
value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size 
%u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned 
size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t 
value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, 
uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" 
size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr 
'%s' listener '%s' synced (global=%d)"
-- 
2.39.0
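
The "mmio -> dma -> mmio" case handled by the patch above can be modelled in a
few lines of standalone C. This is a sketch under stated assumptions, not the
softmmu/memory.c code: SketchDev, sketch_mmio_access(),
sketch_reentrant_handler() and the SKETCH_OK / SKETCH_ERROR constants
(stand-ins for MEMTX_OK / MEMTX_ERROR) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool engaged_in_io;
} MemReentrancyGuard;

enum { SKETCH_OK = 0, SKETCH_ERROR = 1 };

typedef struct SketchDev SketchDev;
struct SketchDev {
    MemReentrancyGuard guard;
    int (*handler)(SketchDev *dev);   /* the device's MMIO handler */
    int nested_result;                /* result of the re-entrant access */
    int handler_runs;                 /* how many times the handler ran */
};

/* Dispatch an access to the device unless it is already engaged in I/O. */
static int sketch_mmio_access(SketchDev *dev)
{
    int r;

    assert(dev && dev->handler);
    if (dev->guard.engaged_in_io) {
        return SKETCH_ERROR;          /* re-entrant access is rejected */
    }
    dev->guard.engaged_in_io = true;
    r = dev->handler(dev);
    dev->guard.engaged_in_io = false;
    return r;
}

/* A handler whose "DMA" lands back on the device's own MMIO region. */
static int sketch_reentrant_handler(SketchDev *dev)
{
    dev->handler_runs++;
    dev->nested_result = sketch_mmio_access(dev);
    return SKETCH_OK;
}
```

The outer access succeeds and runs the handler once; the nested access issued
from inside the handler is refused, which breaks the recursion that could
otherwise exhaust the stack.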

[PATCH v5 0/4] memory: prevent dma-reentracy issues

2023-01-25 Thread Alexander Bulekov

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we dont have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible

v4-> v5: 
- Add corresponding checkpatch checks
- Save/restore reentrancy-flag when entering/exiting BHs
- Improve documentation
- Check object_dynamic_cast return value

v3 -> v4: Instead of changing all of the DMA APIs, instead add an
optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
attempting to guess DeviceStates in BHs.

Alexander Bulekov (4):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  checkpatch: add qemu_bh_new/aio_bh_new checks
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  7 +++
 hw/9pfs/xen-9p-backend.c  |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c|  5 +++--
 hw/block/virtio-blk.c |  5 +++--
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/display/qxl.c  |  9 ++---
 hw/display/virtio-gpu.c   |  6 --
 hw/ide/ahci.c |  3 ++-
 hw/ide/core.c |  3 ++-
 hw/misc/imx_rngc.c|  6 --
 hw/misc/macio/mac_dbdma.c |  2 +-
 hw/net/virtio-net.c   |  3 ++-
 hw/nvme/ctrl.c|  6 --
 hw/scsi/mptsas.c  |  3 ++-
 hw/scsi/scsi-bus.c|  3 ++-
 hw/scsi/vmw_pvscsi.c  |  3 ++-
 hw/usb/dev-uas.c  |  3 ++-
 hw/usb/hcd-dwc2.c |  3 ++-
 hw/usb/hcd-ehci.c |  3 ++-
 hw/usb/hcd-uhci.c |  2 +-
 hw/usb/host-libusb.c  |  6 --
 hw/usb/redirect.c |  6 --
 hw/usb/xen-usb.c  |  3 ++-
 hw/virtio/virtio-balloon.c|  5 +++--
 hw/virtio/virtio-crypto.c |  3 ++-
 include/block/aio.h   | 18 --
 include/hw/qdev-core.h|  7 +++
 include/qemu/main-loop.h  |  7 +--
 scripts/checkpatch.pl |  8 
 softmmu/memory.c  | 17 +
 softmmu/trace-events  |  1 +
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 36 files changed, 150 insertions(+), 43 deletions(-)

-- 
2.39.0

[PATCH v5 1/4] memory: prevent dma-reentracy issues

2023-01-25 Thread Alexander Bulekov

Add a flag to the DeviceState that is set while a device is engaged in
PIO/MMIO/DMA.  This flag is set/checked prior to calling a device's
MemoryRegion handlers, and set when device code initiates DMA.  The
purpose of this flag is to prevent two types of DMA-based reentrancy
issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.
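
To make the mmio -> dma -> mmio case concrete, here is a minimal
standalone model of the check. It is only a sketch under stated
assumptions: ToyDev, ToyResult, toy_handler and toy_mmio_access are
hypothetical names, and the "DMA" is modelled as a direct nested call
into the same access path.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool engaged_in_io;
} MemReentrancyGuard;

/* Hypothetical stand-in for MemTxResult. */
typedef enum { TOY_OK, TOY_ERROR } ToyResult;

typedef struct ToyDev {
    MemReentrancyGuard guard;
    int handler_runs;        /* how many times the handler body ran */
    ToyResult nested_result; /* result of the nested (reentrant) access */
} ToyDev;

static ToyResult toy_mmio_access(ToyDev *dev);

/* Misbehaving handler: its "DMA" targets the device's own MMIO region. */
static void toy_handler(ToyDev *dev)
{
    dev->handler_runs++;
    dev->nested_result = toy_mmio_access(dev);
}

/* Guarded access path: reject the access if the device is already in IO. */
static ToyResult toy_mmio_access(ToyDev *dev)
{
    if (dev->guard.engaged_in_io) {
        return TOY_ERROR;
    }
    dev->guard.engaged_in_io = true;
    toy_handler(dev);
    dev->guard.engaged_in_io = false;
    return TOY_OK;
}
```

Without the flag, the nested call would recurse until the stack is
exhausted; with it, the handler body runs once and the nested access
fails.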

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
---
 include/hw/qdev-core.h |  7 +++
 softmmu/memory.c   | 17 +
 softmmu/trace-events   |  1 +
 3 files changed, 25 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 35fddb19a6..8858195262 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
 QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@ struct DeviceState {
 int alias_required_for_version;
 ResettableState reset;
 GSList *unplug_blockers;
+
+/* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e05332d07f..daffb48493 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 uint64_t access_mask;
 unsigned access_size;
 unsigned i;
+DeviceState *dev = NULL;
 MemTxResult r = MEMTX_OK;
 
 if (!access_size_min) {
@@ -542,6 +543,19 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_size_max = 4;
 }
 
+/* Do not allow more than one simultaneous access to a device's IO Regions */
+if (mr->owner &&
+!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+if (dev) {
+if (dev->mem_reentrancy_guard.engaged_in_io) {
+trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+return MEMTX_ERROR;
+}
+dev->mem_reentrancy_guard.engaged_in_io = true;
+}
+}
+
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
 access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +570,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_mask, attrs);
 }
 }
+if (dev) {
+dev->mem_reentrancy_guard.engaged_in_io = false;
+}
 return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 22606dc27b..62d04ea9a7 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
-- 
2.39.0

Re: [PATCH v4 1/3] memory: prevent dma-reentracy issues

2023-01-25 Thread Alexander Bulekov

On 230120 1447, Peter Maydell wrote:
> On Fri, 20 Jan 2023 at 14:42, Darren Kenny  wrote:
> > Generally, this looks good, but I do have a comment below...
> >
> > On Thursday, 2023-01-19 at 02:00:02 -05, Alexander Bulekov wrote:
> > > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > > This flag is set/checked prior to calling a device's MemoryRegion
> > > handlers, and set when device code initiates DMA.  The purpose of this
> > > flag is to prevent two types of DMA-based reentrancy issues:
> 
> > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > index e05332d07f..90ffaaa4f5 100644
> > > --- a/softmmu/memory.c
> > > +++ b/softmmu/memory.c
> > > @@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> > >  uint64_t access_mask;
> > >  unsigned access_size;
> > >  unsigned i;
> > > +DeviceState *dev = NULL;
> > >  MemTxResult r = MEMTX_OK;
> > >
> > >  if (!access_size_min) {
> > > @@ -542,6 +543,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> > >  access_size_max = 4;
> > >  }
> > >
> > > +/* Do not allow more than one simultanous access to a device's IO Regions */
> > > +if (mr->owner &&
> > > +!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> > > +dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> >
> > I don't know how likely this is to happen, but according to:
> >
> > - https://qemu-project.gitlab.io/qemu/devel/qom.html#c.object_dynamic_cast
> >
> > it is possible for the object_dynamic_cast() function to return NULL,
> > so it might make sense to wrap the subsequent calls in a test of dev !=
> > NULL.
> 
> Yes. This came up in a previous version of this:
> https://lore.kernel.org/qemu-devel/CAFEAcA8E4nDoAWcj-v-dED-0hDtXGjJNSp3A=kdgf8uocw0...@mail.gmail.com/
> 
> It's generally a bug to call object_dynamic_cast() and then not check
> the return value.
> 

Sorry I missed that - Will be fixed in V5.
-Alex

> thanks
> -- PMM
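
The review point in this exchange can be sketched with a toy checked
cast. Nothing below is the QOM API: ToyObject, ToyDeviceState,
toy_dynamic_cast and toy_try_engage are hypothetical, and the cast only
does an exact type-name match, whereas a real implementation would also
walk the type hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyObject {
    const char *type_name;
} ToyObject;

typedef struct ToyDeviceState {
    ToyObject parent;        /* must be first so the downcast is valid */
    int engaged_in_io;
} ToyDeviceState;

/* Returns obj if it is of the named type, NULL otherwise. */
static ToyObject *toy_dynamic_cast(ToyObject *obj, const char *type_name)
{
    if (obj && strcmp(obj->type_name, type_name) == 0) {
        return obj;
    }
    return NULL;
}

/* Returns 1 if the guard was taken, 0 if the owner is not a device. */
static int toy_try_engage(ToyObject *owner)
{
    ToyDeviceState *dev =
        (ToyDeviceState *)toy_dynamic_cast(owner, "toy-device");

    if (!dev) {
        return 0;            /* some other object type; nothing to guard */
    }
    dev->engaged_in_io = 1;
    return 1;
}
```

The NULL check is what keeps a non-device owner from being dereferenced
as a device.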

[PATCH v5 06/36] tcg: Introduce tcg_target_call_oarg_reg

2023-01-25 Thread Richard Henderson

Replace the flat array tcg_target_call_oarg_regs[] with
a function call including the TCGCallReturnKind.

Extend the set of registers for ARM to r0-r3 to match the ABI:
https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#result-return
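
A minimal sketch of the array-to-function change, using made-up names
(ToyReturnKind, ToyReg, toy_call_oarg_reg) rather than the TCG types. It
assumes, as the per-target hunks in this patch do, that the return
registers are numbered consecutively.

```c
#include <assert.h>

/* Hypothetical stand-ins for TCGCallReturnKind and TCGReg. */
typedef enum {
    TOY_RET_NORMAL,
    TOY_RET_BY_REF
} ToyReturnKind;

typedef enum { TOY_R0, TOY_R1, TOY_R2, TOY_R3, TOY_R4 } ToyReg;

/*
 * Before: a flat table such as
 *     static const int toy_call_oarg_regs[2] = { TOY_R0, TOY_R1 };
 * After: a function that can validate both the kind and the slot.
 */
static ToyReg toy_call_oarg_reg(ToyReturnKind kind, int slot)
{
    assert(kind == TOY_RET_NORMAL);
    assert(slot >= 0 && slot <= 3);
    return (ToyReg)(TOY_R0 + slot);
}
```

A function leaves room for later patches to return a different register
class for a different return kind, which a single flat array cannot
express.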

Reviewed-by: Alex Bennée 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  9 ++---
 tcg/aarch64/tcg-target.c.inc | 10 +++---
 tcg/arm/tcg-target.c.inc | 10 +++---
 tcg/i386/tcg-target.c.inc| 16 ++--
 tcg/loongarch64/tcg-target.c.inc | 10 ++
 tcg/mips/tcg-target.c.inc| 10 ++
 tcg/ppc/tcg-target.c.inc | 10 ++
 tcg/riscv/tcg-target.c.inc   | 10 ++
 tcg/s390x/tcg-target.c.inc   |  9 ++---
 tcg/sparc64/tcg-target.c.inc | 12 ++--
 tcg/tci/tcg-target.c.inc | 12 ++--
 11 files changed, 72 insertions(+), 46 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 644dc53196..72ac76926a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -151,6 +151,7 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 TCGReg base, intptr_t ofs);
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target,
  const TCGHelperInfo *info);
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot);
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
 static int tcg_out_ldst_finalize(TCGContext *s);
@@ -740,14 +741,16 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_s64:
 info->nr_out = 64 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL;
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+/* Query the last register now to trigger any assert early. */
+tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
 case dh_typecode_i128:
 info->nr_out = 128 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
 switch (/* TODO */ TCG_CALL_RET_NORMAL) {
 case TCG_CALL_RET_NORMAL:
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+/* Query the last register now to trigger any assert early. */
+tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
 case TCG_CALL_RET_BY_REF:
 /*
@@ -4585,7 +4588,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 case TCG_CALL_RET_NORMAL:
 for (i = 0; i < nb_oargs; i++) {
 TCGTemp *ts = arg_temp(op->args[i]);
-TCGReg reg = tcg_target_call_oarg_regs[i];
+TCGReg reg = tcg_target_call_oarg_reg(TCG_CALL_RET_NORMAL, i);
 
 /* ENV should not be modified.  */
 tcg_debug_assert(!temp_readonly(ts));
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index bd6da72678..fde3b30ad1 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -63,9 +63,13 @@ static const int tcg_target_call_iarg_regs[8] = {
 TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
 TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7
 };
-static const int tcg_target_call_oarg_regs[1] = {
-TCG_REG_X0
-};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+tcg_debug_assert(slot >= 0 && slot <= 1);
+return TCG_REG_X0 + slot;
+}
 
 #define TCG_REG_TMP TCG_REG_X30
 #define TCG_VEC_TMP TCG_REG_V31
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 6e9e9b9b3f..d06ac60c15 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -79,9 +79,13 @@ static const int tcg_target_reg_alloc_order[] = {
 static const int tcg_target_call_iarg_regs[4] = {
 TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R3
 };
-static const int tcg_target_call_oarg_regs[2] = {
-TCG_REG_R0, TCG_REG_R1
-};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+tcg_debug_assert(slot >= 0 && slot <= 3);
+return TCG_REG_R0 + slot;
+}
 
 #define TCG_REG_TMP  TCG_REG_R12
 #define TCG_VEC_TMP  TCG_REG_Q15
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 7b573bd287..2f0a9521bf 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -109,12 +109,16 @@ static const int tcg_target_call_iarg_regs[] = {
 #endif
 };
 
-static const int tcg_target_call_oarg_regs[] = {
-TCG_REG_EAX,
-#if TCG_TARGET_REG_BITS == 32
-TCG_REG_EDX
-#endif
-};
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+switch (kind) {
+case TCG_CALL_RET_NORMAL:
+tcg_debug_assert(slot >= 0 && slot <= 1);
+return slot ? TCG_REG_EDX : TCG_REG_EAX;
+default:
+g_assert_

[PATCH v5 09/36] tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-01-25 Thread Richard Henderson

Fill in the parameters for the host ABI for Int128.
Adjust tcg_target_call_oarg_reg for _WIN64, and
tcg_out_call for i386 sysv.  Allow TCG_TYPE_V128
stores without AVX enabled.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h | 10 ++
 tcg/i386/tcg-target.c.inc | 30 +-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 5797a55ea0..d4f2a6f8c2 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -100,6 +100,16 @@ typedef enum {
 #endif
 #define TCG_TARGET_CALL_ARG_I32  TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_NORMAL
+#if defined(_WIN64)
+# define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_BY_REF
+# define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_VEC
+#elif TCG_TARGET_REG_BITS == 64
+# define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
+#else
+# define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_REF
+#endif
 
 extern bool have_bmi1;
 extern bool have_popcnt;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 2f0a9521bf..883ced8168 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -115,6 +115,11 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 case TCG_CALL_RET_NORMAL:
 tcg_debug_assert(slot >= 0 && slot <= 1);
 return slot ? TCG_REG_EDX : TCG_REG_EAX;
+#ifdef _WIN64
+case TCG_CALL_RET_BY_VEC:
+tcg_debug_assert(slot == 0);
+return TCG_REG_XMM0;
+#endif
 default:
 g_assert_not_reached();
 }
@@ -1188,9 +1193,16 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
  * The gvec infrastructure is asserts that v128 vector loads
  * and stores use a 16-byte aligned offset.  Validate that the
  * final pointer is aligned by using an insn that will SIGSEGV.
+ *
+ * This specific instance is also used by TCG_CALL_RET_BY_VEC,
+ * for _WIN64, which must have SSE2 but may not have AVX.
  */
 tcg_debug_assert(arg >= 16);
-tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
+if (have_avx1) {
+tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
+} else {
+tcg_out_modrm_offset(s, OPC_MOVDQA_WxVx, arg, arg1, arg2);
+}
 break;
 case TCG_TYPE_V256:
 /*
@@ -1677,6 +1689,22 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest,
  const TCGHelperInfo *info)
 {
 tcg_out_branch(s, 1, dest);
+
+#ifndef _WIN32
+if (TCG_TARGET_REG_BITS == 32 && info->out_kind == TCG_CALL_RET_BY_REF) {
+/*
+ * The sysv i386 abi for struct return places a reference as the
+ * first argument of the stack, and pops that argument with the
+ * return statement.  Since we want to retain the aligned stack
+ * pointer for the callee, we do not want to actually push that
+ * argument before the call but rely on the normal store to the
+ * stack slot.  But we do need to compensate for the pop in order
+ * to reset our correct stack pointer value.
+ * Pushing a garbage value back onto the stack is quickest.
+ */
+tcg_out_push(s, TCG_REG_EAX);
+}
+#endif
 }
 
 static void tcg_out_jmp(TCGContext *s, const tcg_insn_unit *dest)
-- 
2.34.1

[PATCH v5 36/36] target/i386: Inline cmpxchg16b

2023-01-25 Thread Richard Henderson

Use tcg_gen_atomic_cmpxchg_i128 for the atomic case,
and tcg_gen_qemu_ld/st_i128 otherwise.
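
The "determine success after the fact" step in the translator can be
modelled in plain C. This is a sketch with hypothetical names (Toy128,
toy_nonatomic_cmpxchg, toy_cmpxchg_succeeded); unlike the removed
unlocked helper, the toy does not write the old value back to memory on
failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Toy128;

/* Non-atomic compare-and-swap: returns the old memory value. */
static Toy128 toy_nonatomic_cmpxchg(Toy128 *mem, Toy128 cmp, Toy128 newv)
{
    Toy128 old = *mem;

    if (old.lo == cmp.lo && old.hi == cmp.hi) {
        *mem = newv;
    }
    return old;
}

/* Success iff old == cmp: XOR each half, OR the results, compare to 0. */
static bool toy_cmpxchg_succeeded(Toy128 old, Toy128 cmp)
{
    uint64_t t0 = old.lo ^ cmp.lo;
    uint64_t t1 = old.hi ^ cmp.hi;

    return (t0 | t1) == 0;
}
```

Because the operation returns only the old value, comparing it with the
expected value is enough to recover the Z flag without a separate success
output.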

Signed-off-by: Richard Henderson 
---
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
---
 target/i386/helper.h |  4 ---
 target/i386/tcg/mem_helper.c | 69 
 target/i386/tcg/translate.c  | 44 ---
 3 files changed, 39 insertions(+), 78 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index 2df8049f91..e627a93107 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -66,10 +66,6 @@ DEF_HELPER_1(rsm, void, env)
 #endif /* !CONFIG_USER_ONLY */
 
 DEF_HELPER_2(into, void, env, int)
-#ifdef TARGET_X86_64
-DEF_HELPER_2(cmpxchg16b_unlocked, void, env, tl)
-DEF_HELPER_2(cmpxchg16b, void, env, tl)
-#endif
 DEF_HELPER_FLAGS_1(single_step, TCG_CALL_NO_WG, noreturn, env)
 DEF_HELPER_1(rechecking_single_step, void, env)
 DEF_HELPER_1(cpuid, void, env)
diff --git a/target/i386/tcg/mem_helper.c b/target/i386/tcg/mem_helper.c
index 814786bb87..3ef84e90d9 100644
--- a/target/i386/tcg/mem_helper.c
+++ b/target/i386/tcg/mem_helper.c
@@ -27,75 +27,6 @@
 #include "tcg/tcg.h"
 #include "helper-tcg.h"
 
-#ifdef TARGET_X86_64
-void helper_cmpxchg16b_unlocked(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-Int128 oldv, cmpv, newv;
-uint64_t o0, o1;
-int eflags;
-bool success;
-
-if ((a0 & 0xf) != 0) {
-raise_exception_ra(env, EXCP0D_GPF, GETPC());
-}
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
-newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
-
-o0 = cpu_ldq_data_ra(env, a0 + 0, ra);
-o1 = cpu_ldq_data_ra(env, a0 + 8, ra);
-
-oldv = int128_make128(o0, o1);
-success = int128_eq(oldv, cmpv);
-if (!success) {
-newv = oldv;
-}
-
-cpu_stq_data_ra(env, a0 + 0, int128_getlo(newv), ra);
-cpu_stq_data_ra(env, a0 + 8, int128_gethi(newv), ra);
-
-if (success) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = int128_getlo(oldv);
-env->regs[R_EDX] = int128_gethi(oldv);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-}
-
-void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-
-if ((a0 & 0xf) != 0) {
-raise_exception_ra(env, EXCP0D_GPF, ra);
-} else if (HAVE_CMPXCHG128) {
-int eflags = cpu_cc_compute_all(env, CC_OP);
-
-Int128 cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
-Int128 newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
-
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi = make_memop_idx(MO_TE | MO_128 | MO_ALIGN, mem_idx);
-Int128 oldv = cpu_atomic_cmpxchgo_le_mmu(env, a0, cmpv, newv, oi, ra);
-
-if (int128_eq(oldv, cmpv)) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = int128_getlo(oldv);
-env->regs[R_EDX] = int128_gethi(oldv);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-} else {
-cpu_loop_exit_atomic(env_cpu(env), ra);
-}
-}
-#endif
-
 void helper_boundw(CPUX86State *env, target_ulong a0, int v)
 {
 int low, high;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b542b084a6..9d9392b009 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3053,15 +3053,49 @@ static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
 #ifdef TARGET_X86_64
 static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
 {
+MemOp mop = MO_TE | MO_128 | MO_ALIGN;
+TCGv_i64 t0, t1;
+TCGv_i128 cmp, val;
+
 gen_lea_modrm(env, s, modrm);
 
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg16b(cpu_env, s->A0);
+cmp = tcg_temp_new_i128();
+val = tcg_temp_new_i128();
+tcg_gen_concat_i64_i128(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+tcg_gen_concat_i64_i128(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+/* Only require atomic with LOCK; non-parallel handled in generator. */
+if (s->prefix & PREFIX_LOCK) {
+tcg_gen_atomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
 } else {
-gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
+tcg_gen_nonatomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
 }
-set_cc_op(s, CC_OP_EFLAGS);
+
+tcg_gen_extr_i128_i64(s->T0, s->T1, val);
+tcg_temp_free_i128(cmp);
+tcg_temp_free_i128(val);
+
+/* Determine success after the fact. */
+t0 = tcg_temp_new_i64();
+t1 = tcg_temp_new_i64();
+tcg_gen_xor_i64(t0, s->T0, cpu_regs[R_EAX]);
+tcg_gen_xor_i64(t1, s->T1, cpu_regs[R_EDX]);
+tcg_gen_or_i64(t0, t0, t1);
+tcg_temp_free_i64(t1);
+
+/* Update Z. */
+gen_compute_eflags(s);
+tcg_gen_setcondi_i64(TCG_COND_EQ, t0, t0, 0);
+tcg_gen_deposit_tl(cpu_cc_src, cpu_c

[PATCH v5 07/36] tcg: Add TCG_CALL_RET_BY_VEC

2023-01-25 Thread Richard Henderson

This will be used by _WIN64 to return i128.  Not yet used,
because allocation is not yet enabled.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |  1 +
 tcg/tcg.c  | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 2ec1ea01df..33f1d8b411 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -37,6 +37,7 @@
 typedef enum {
 TCG_CALL_RET_NORMAL, /* by registers */
 TCG_CALL_RET_BY_REF, /* for i128, by reference */
+TCG_CALL_RET_BY_VEC, /* for i128, by vector register */
 } TCGCallReturnKind;
 
 typedef enum {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 72ac76926a..084e3c3a54 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -752,6 +752,10 @@ static void init_call_layout(TCGHelperInfo *info)
 /* Query the last register now to trigger any assert early. */
 tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
+case TCG_CALL_RET_BY_VEC:
+/* Query the single register now to trigger any assert early. */
+tcg_target_call_oarg_reg(TCG_CALL_RET_BY_VEC, 0);
+break;
 case TCG_CALL_RET_BY_REF:
 /*
  * Allocate the first argument to the output.
@@ -4598,6 +4602,21 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 }
 break;
 
+case TCG_CALL_RET_BY_VEC:
+{
+TCGTemp *ts = arg_temp(op->args[0]);
+
+tcg_debug_assert(ts->base_type == TCG_TYPE_I128);
+tcg_debug_assert(ts->temp_subindex == 0);
+if (!ts->mem_allocated) {
+temp_allocate_frame(s, ts);
+}
+tcg_out_st(s, TCG_TYPE_V128,
+   tcg_target_call_oarg_reg(TCG_CALL_RET_BY_VEC, 0),
+   ts->mem_base->reg, ts->mem_offset);
+}
+/* fall through to mark all parts in memory */
+
 case TCG_CALL_RET_BY_REF:
 /* The callee has performed a write through the reference. */
 for (i = 0; i < nb_oargs; i++) {
-- 
2.34.1

[PATCH v5 08/36] include/qemu/int128: Use Int128 structure for TCI

2023-01-25 Thread Richard Henderson

We are about to allow passing Int128 to/from tcg helper functions,
but libffi doesn't support __int128_t, so use the structure.

In order for atomic128.h to continue working, we must provide
a mechanism to frob between real __int128_t and the structure.
Provide a new union, Int128Alias, for this.  We cannot modify
Int128 itself, as any changed alignment would also break libffi.
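
The aliasing idea can be tried in isolation with the sketch below. It
assumes a GCC- or Clang-style compiler with __int128 support and a
little-endian host, and it uses hypothetical names (ToyInt128,
ToyInt128Alias, toy_divu). The transparent_union attribute from the patch
is left out because the toy only needs member access.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__SIZEOF_INT128__)
# error "this sketch assumes a compiler with __int128 support"
#endif
#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
# error "this sketch assumes a little-endian host (low half first)"
#endif

/* Structure form: what a libffi-style interface can pass around. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} ToyInt128;

/* Union used to move between the structure and the native types. */
typedef union {
    ToyInt128 s;
    __int128_t i;
    __uint128_t u;
} ToyInt128Alias;

/* Unsigned division done on the native type, returned as the structure. */
static ToyInt128 toy_divu(ToyInt128 a_s, ToyInt128 b_s)
{
    ToyInt128Alias r, a, b;

    a.s = a_s;
    b.s = b_s;
    r.u = a.u / b.u;
    return r.s;
}
```

The structure keeps its 8-byte alignment while the union takes on the
alignment of the native type, which is why the conversion lives in a
separate type rather than in the structure itself.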

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/qemu/atomic128.h | 29 +--
 include/qemu/int128.h| 25 +---
 util/int128.c| 42 
 3 files changed, 87 insertions(+), 9 deletions(-)

diff --git a/include/qemu/atomic128.h b/include/qemu/atomic128.h
index adb9a1a260..d0ba0b9c65 100644
--- a/include/qemu/atomic128.h
+++ b/include/qemu/atomic128.h
@@ -44,13 +44,23 @@
 #if defined(CONFIG_ATOMIC128)
 static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
 {
-return qatomic_cmpxchg__nocheck(ptr, cmp, new);
+Int128Alias r, c, n;
+
+c.s = cmp;
+n.s = new;
+r.i = qatomic_cmpxchg__nocheck((__int128_t *)ptr, c.i, n.i);
+return r.s;
 }
 # define HAVE_CMPXCHG128 1
 #elif defined(CONFIG_CMPXCHG128)
 static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
 {
-return __sync_val_compare_and_swap_16(ptr, cmp, new);
+Int128Alias r, c, n;
+
+c.s = cmp;
+n.s = new;
+r.i = __sync_val_compare_and_swap_16((__int128_t *)ptr, c.i, n.i);
+return r.s;
 }
 # define HAVE_CMPXCHG128 1
 #elif defined(__aarch64__)
@@ -89,12 +99,18 @@ Int128 QEMU_ERROR("unsupported atomic")
 #if defined(CONFIG_ATOMIC128)
 static inline Int128 atomic16_read(Int128 *ptr)
 {
-return qatomic_read__nocheck(ptr);
+Int128Alias r;
+
+r.i = qatomic_read__nocheck((__int128_t *)ptr);
+return r.s;
 }
 
 static inline void atomic16_set(Int128 *ptr, Int128 val)
 {
-qatomic_set__nocheck(ptr, val);
+Int128Alias v;
+
+v.s = val;
+qatomic_set__nocheck((__int128_t *)ptr, v.i);
 }
 
 # define HAVE_ATOMIC128 1
@@ -132,7 +148,8 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
 static inline Int128 atomic16_read(Int128 *ptr)
 {
 /* Maybe replace 0 with 0, returning the old value.  */
-return atomic16_cmpxchg(ptr, 0, 0);
+Int128 z = int128_make64(0);
+return atomic16_cmpxchg(ptr, z, z);
 }
 
 static inline void atomic16_set(Int128 *ptr, Int128 val)
@@ -141,7 +158,7 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
 do {
 cmp = old;
 old = atomic16_cmpxchg(ptr, cmp, val);
-} while (old != cmp);
+} while (int128_ne(old, cmp));
 }
 
 # define HAVE_ATOMIC128 1
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index d2b76ca6ac..f62a46b48c 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -3,7 +3,12 @@
 
 #include "qemu/bswap.h"
 
-#ifdef CONFIG_INT128
+/*
+ * With TCI, we need to use libffi for interfacing with TCG helpers.
+ * But libffi does not support __int128_t, and therefore cannot pass
+ * or return values of this type, force use of the Int128 struct.
+ */
+#if defined(CONFIG_INT128) && !defined(CONFIG_TCG_INTERPRETER)
 typedef __int128_t Int128;
 
 static inline Int128 int128_make64(uint64_t a)
@@ -460,8 +465,7 @@ Int128 int128_divu(Int128, Int128);
 Int128 int128_remu(Int128, Int128);
 Int128 int128_divs(Int128, Int128);
 Int128 int128_rems(Int128, Int128);
-
-#endif /* CONFIG_INT128 */
+#endif /* CONFIG_INT128 && !CONFIG_TCG_INTERPRETER */
 
 static inline void bswap128s(Int128 *s)
 {
@@ -472,4 +476,19 @@ static inline void bswap128s(Int128 *s)
 #define INT128_MAX int128_make128(UINT64_MAX, INT64_MAX)
 #define INT128_MIN int128_make128(0, INT64_MIN)
 
+/*
+ * When compiler supports a 128-bit type, define a combination of
+ * a possible structure and the native types.  Ease parameter passing
+ * via use of the transparent union extension.
+ */
+#ifdef CONFIG_INT128
+typedef union {
+Int128 s;
+__int128_t i;
+__uint128_t u;
+} Int128Alias __attribute__((transparent_union));
+#else
+typedef Int128 Int128Alias;
+#endif /* CONFIG_INT128 */
+
 #endif /* INT128_H */
diff --git a/util/int128.c b/util/int128.c
index ed8f25fef1..df6c6331bd 100644
--- a/util/int128.c
+++ b/util/int128.c
@@ -144,4 +144,46 @@ Int128 int128_rems(Int128 a, Int128 b)
 return r;
 }
 
+#elif defined(CONFIG_TCG_INTERPRETER)
+
+Int128 int128_divu(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.u = a.u / b.u;
+return r.s;
+}
+
+Int128 int128_remu(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.u = a.u % b.u;
+return r.s;
+}
+
+Int128 int128_divs(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.i = a.i / b.i;
+return r.s;
+}
+
+Int128 int128_rems(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.i = a

[PATCH v5 27/36] target/s390x: Use Int128 for return from CKSM

2023-01-25 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++
 target/s390x/tcg/translate.c  | 6 --
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 25c2dd0b3c..03b29efa3e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -103,7 +103,7 @@ DEF_HELPER_4(tre, i64, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
-DEF_HELPER_4(cksm, i64, env, i64, i64, i64)
+DEF_HELPER_4(cksm, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_5(calc_cc, TCG_CALL_NO_RWG_SE, i32, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sfpc, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(sfas, TCG_CALL_NO_WG, void, env, i64)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 9be42851d8..b0b403e23a 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1350,8 +1350,8 @@ uint32_t HELPER(clclu)(CPUS390XState *env, uint32_t r1, uint64_t a2,
 }
 
 /* checksum */
-uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
-  uint64_t src, uint64_t src_len)
+Int128 HELPER(cksm)(CPUS390XState *env, uint64_t r1,
+uint64_t src, uint64_t src_len)
 {
 uintptr_t ra = GETPC();
 uint64_t max_len, len;
@@ -1392,8 +1392,7 @@ uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
 env->cc_op = (len == src_len ? 0 : 3);
 
 /* Return both cksm and processed length.  */
-env->retxl = cksm;
-return len;
+return int128_make128(cksm, len);
 }
 
 void HELPER(pack)(CPUS390XState *env, uint32_t len, uint64_t dest, uint64_t src)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 8397fe2bd8..1a7aa9e4ae 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2041,11 +2041,13 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
 static DisasJumpType op_cksm(DisasContext *s, DisasOps *o)
 {
 int r2 = get_field(s, r2);
+TCGv_i128 pair = tcg_temp_new_i128();
 TCGv_i64 len = tcg_temp_new_i64();
 
-gen_helper_cksm(len, cpu_env, o->in1, o->in2, regs[r2 + 1]);
+gen_helper_cksm(pair, cpu_env, o->in1, o->in2, regs[r2 + 1]);
 set_cc_static(s);
-return_low128(o->out);
+tcg_gen_extr_i128_i64(o->out, len, pair);
+tcg_temp_free_i128(pair);
 
 tcg_gen_add_i64(regs[r2], regs[r2], len);
 tcg_gen_sub_i64(regs[r2 + 1], regs[r2 + 1], len);
-- 
2.34.1

[PATCH v5 32/36] target/s390x: Use tcg_gen_atomic_cmpxchg_i128 for CDSG

2023-01-25 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
Cc: David Hildenbrand 
Cc: Ilya Leoshkevich 
---
 target/s390x/helper.h|  2 --
 target/s390x/tcg/insn-data.h.inc |  2 +-
 target/s390x/tcg/mem_helper.c| 52 ---
 target/s390x/tcg/translate.c | 60 
 4 files changed, 38 insertions(+), 78 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bccd3bfca6..341bc51ec2 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -35,8 +35,6 @@ DEF_HELPER_3(cxgb, i128, env, s64, i32)
 DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
 DEF_HELPER_3(cxlgb, i128, env, i64, i32)
-DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
-DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 893f4b48db..ea34b4a277 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -276,7 +276,7 @@
 /* COMPARE DOUBLE AND SWAP */
 D(0xbb00, CDS, RS_a,  Z,   r3_D32, r1_D32, new, r1_D32, cs, 0, MO_TEUQ)
 D(0xeb31, CDSY,RSY_a, LD,  r3_D32, r1_D32, new, r1_D32, cs, 0, MO_TEUQ)
-C(0xeb3e, CDSG,RSY_a, Z,   0, 0, 0, 0, cdsg, 0)
+C(0xeb3e, CDSG,RSY_a, Z,   la2, r3_D64, r1_D64, r1_D64, cdsg, 0)
 /* COMPARE AND SWAP AND STORE */
 C(0xc802, CSST,SSF,   CASS, la1, a2, 0, 0, csst, 0)
 
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 49969abda7..d6725fd18c 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1771,58 +1771,6 @@ uint32_t HELPER(trXX)(CPUS390XState *env, uint32_t r1, uint32_t r2,
 return cc;
 }
 
-void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
-  uint32_t r1, uint32_t r3)
-{
-uintptr_t ra = GETPC();
-Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
-Int128 newv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
-Int128 oldv;
-uint64_t oldh, oldl;
-bool fail;
-
-check_alignment(env, addr, 16, ra);
-
-oldh = cpu_ldq_data_ra(env, addr + 0, ra);
-oldl = cpu_ldq_data_ra(env, addr + 8, ra);
-
-oldv = int128_make128(oldl, oldh);
-fail = !int128_eq(oldv, cmpv);
-if (fail) {
-newv = oldv;
-}
-
-cpu_stq_data_ra(env, addr + 0, int128_gethi(newv), ra);
-cpu_stq_data_ra(env, addr + 8, int128_getlo(newv), ra);
-
-env->cc_op = fail;
-env->regs[r1] = int128_gethi(oldv);
-env->regs[r1 + 1] = int128_getlo(oldv);
-}
-
-void HELPER(cdsg_parallel)(CPUS390XState *env, uint64_t addr,
-   uint32_t r1, uint32_t r3)
-{
-uintptr_t ra = GETPC();
-Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
-Int128 newv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
-int mem_idx;
-MemOpIdx oi;
-Int128 oldv;
-bool fail;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_TE | MO_128 | MO_ALIGN, mem_idx);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
-fail = !int128_eq(oldv, cmpv);
-
-env->cc_op = fail;
-env->regs[r1] = int128_gethi(oldv);
-env->regs[r1 + 1] = int128_getlo(oldv);
-}
-
 static uint32_t do_csst(CPUS390XState *env, uint32_t r3, uint64_t a1,
 uint64_t a2, bool parallel)
 {
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index d422a1e62b..0dafa27dab 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2224,31 +2224,22 @@ static DisasJumpType op_cs(DisasContext *s, DisasOps *o)
 static DisasJumpType op_cdsg(DisasContext *s, DisasOps *o)
 {
 int r1 = get_field(s, r1);
-int r3 = get_field(s, r3);
-int d2 = get_field(s, d2);
-int b2 = get_field(s, b2);
-DisasJumpType ret = DISAS_NEXT;
-TCGv_i64 addr;
-TCGv_i32 t_r1, t_r3;
 
-/* Note that R1:R1+1 = expected value and R3:R3+1 = new value.  */
-addr = get_address(s, 0, b2, d2);
-t_r1 = tcg_const_i32(r1);
-t_r3 = tcg_const_i32(r3);
-if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cdsg(cpu_env, addr, t_r1, t_r3);
-} else if (HAVE_CMPXCHG128) {
-gen_helper_cdsg_parallel(cpu_env, addr, t_r1, t_r3);
-} else {
-gen_helper_exit_atomic(cpu_env);
-ret = DISAS_NORETURN;
-}
-tcg_temp_free_i64(addr);
-tcg_temp_free_i32(t_r1);
-tcg_temp_free_i32(t_r3);
+/* Note out (R1:R1+1) = expected value and in2 (R3:R3+1) = new value.  */
+tcg_gen_atomic_cmpxchg_i128(o->out_128, o->addr1, o->out_128, o->in2_128,
+get_mem_index(s), MO_BE | MO_128 | MO_ALIGN);
 
-set_cc_static(s);
-return ret;
+/*
+ * Extract result into cc_dst:cc_src, compa

[PATCH v5 26/36] target/s390x: Use Int128 for return from CLST

2023-01-25 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h |  2 +-
 target/s390x/tcg/mem_helper.c | 11 ---
 target/s390x/tcg/translate.c  |  8 ++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 593f3c8bee..25c2dd0b3c 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -16,7 +16,7 @@ DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
-DEF_HELPER_4(clst, i64, env, i64, i64, i64)
+DEF_HELPER_4(clst, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mvn, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvo, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvpg, TCG_CALL_NO_WG, i32, env, i64, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index cb82cd1c1d..9be42851d8 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -886,7 +886,7 @@ void HELPER(srstu)(CPUS390XState *env, uint32_t r1, uint32_t r2)
 }
 
 /* unsigned string compare (c is string terminator) */
-uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
+Int128 HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
 {
 uintptr_t ra = GETPC();
 uint32_t len;
@@ -904,23 +904,20 @@ uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
 if (v1 == c) {
 /* Equal.  CC=0, and don't advance the registers.  */
 env->cc_op = 0;
-env->retxl = s2;
-return s1;
+return int128_make128(s2, s1);
 }
 } else {
 /* Unequal.  CC={1,2}, and advance the registers.  Note that
the terminator need not be zero, but the string that contains
the terminator is by definition "low".  */
 env->cc_op = (v1 == c ? 1 : v2 == c ? 2 : v1 < v2 ? 1 : 2);
-env->retxl = s2 + len;
-return s1 + len;
+return int128_make128(s2 + len, s1 + len);
 }
 }
 
 /* CPU-determined bytes equal; advance the registers.  */
 env->cc_op = 3;
-env->retxl = s2 + len;
-return s1 + len;
+return int128_make128(s2 + len, s1 + len);
 }
 
 /* move page */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 6953b81de7..8397fe2bd8 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2164,9 +2164,13 @@ static DisasJumpType op_clm(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_clst(DisasContext *s, DisasOps *o)
 {
-gen_helper_clst(o->in1, cpu_env, regs[0], o->in1, o->in2);
+TCGv_i128 pair = tcg_temp_new_i128();
+
+gen_helper_clst(pair, cpu_env, regs[0], o->in1, o->in2);
+tcg_gen_extr_i128_i64(o->in2, o->in1, pair);
+tcg_temp_free_i128(pair);
+
 set_cc_static(s);
-return_low128(o->in2);
 return DISAS_NEXT;
 }
 
-- 
2.34.1
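
The CLST change above replaces the env->retxl side channel with a single 128-bit return value that the translator splits back into two registers. The following is a minimal standalone sketch of that pack/extract round trip, not QEMU code: pair128, pair128_make, pair128_extract and clst_advance are hypothetical names invented for illustration, and only the (lo, hi) operand order is taken from the int128_make128() and tcg_gen_extr_i128_i64() calls in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Int128; field names are illustrative. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair128;

/* Same argument order as int128_make128(lo, hi) in the patch. */
static pair128 pair128_make(uint64_t lo, uint64_t hi)
{
    pair128 p = { .lo = lo, .hi = hi };
    return p;
}

/* Same operand order as tcg_gen_extr_i128_i64(lo, hi, pair). */
static void pair128_extract(uint64_t *lo, uint64_t *hi, pair128 p)
{
    assert(lo != NULL && hi != NULL);
    *lo = p.lo;
    *hi = p.hi;
}

/*
 * Model of the "advance the registers" return paths of helper_clst:
 * the updated s2 travels in the low half, the updated s1 in the high half.
 */
static pair128 clst_advance(uint64_t s1, uint64_t s2, uint32_t len)
{
    return pair128_make(s2 + len, s1 + len);
}
```

Calling pair128_extract(&in2, &in1, clst_advance(s1, s2, len)) mirrors the translator side, where o->in2 receives the low half and o->in1 the high half.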

[PATCH v5 00/36] tcg: Support for Int128 with helpers

2023-01-25 Thread Richard Henderson

Branch: https://gitlab.com/rth7680/qemu/-/tree/tcg-i128
Based-on: 20230124020507.3732200-1-richard.hender...@linaro.org
("[PULL v2 00/15] tcg patch queue")

Changes for v5:
  * Rebase, minor conflicts fixed.

Patches lacking review:
  common:
03-tcg-Allocate-objects-contiguously-in-temp_allocat.patch
05-tcg-Add-TCG_CALL_-RET-ARG-_BY_REF.patch
09-tcg-i386-Add-TCG_TARGET_CALL_-RET-ARG-_I128.patch
11-tcg-tci-Add-TCG_TARGET_CALL_-RET-ARG-_I128.patch
15-tcg-Add-guest-load-store-primitives-for-TCGv_i128.patch
16-tcg-Add-tcg_gen_-non-atomic_cmpxchg_i128.patch
17-tcg-Split-out-tcg_gen_nonatomic_cmpxchg_i-32-64.patch
  target/s390x/
24-target-s390x-Use-a-single-return-for-helper_divs3.patch
29-target-s390x-Copy-wout_x1-to-wout_x1_P.patch
31-target-s390x-Use-Int128-for-passing-float128.patch
32-target-s390x-Use-tcg_gen_atomic_cmpxchg_i128-for-.patch
33-target-s390x-Implement-CC_OP_NZ-in-gen_op_calc_cc.patch
  target/i386/
35-target-i386-Inline-cmpxchg8b.patch
36-target-i386-Inline-cmpxchg16b.patch


r~


Ilya Leoshkevich (2):
  tests/tcg/s390x: Add div.c
  tests/tcg/s390x: Add clst.c

Richard Henderson (34):
  tcg: Define TCG_TYPE_I128 and related helper macros
  tcg: Handle dh_typecode_i128 with TCG_CALL_{RET,ARG}_NORMAL
  tcg: Allocate objects contiguously in temp_allocate_frame
  tcg: Introduce tcg_out_addi_ptr
  tcg: Add TCG_CALL_{RET,ARG}_BY_REF
  tcg: Introduce tcg_target_call_oarg_reg
  tcg: Add TCG_CALL_RET_BY_VEC
  include/qemu/int128: Use Int128 structure for TCI
  tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg/tci: Fix big-endian return register ordering
  tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add temp allocation for TCGv_i128
  tcg: Add basic data movement for TCGv_i128
  tcg: Add guest load/store primitives for TCGv_i128
  tcg: Add tcg_gen_{non}atomic_cmpxchg_i128
  tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}
  target/arm: Use tcg_gen_atomic_cmpxchg_i128 for STXP
  target/arm: Use tcg_gen_atomic_cmpxchg_i128 for CASP
  target/ppc: Use tcg_gen_atomic_cmpxchg_i128 for STQCX
  tests/tcg/s390x: Add long-double.c
  target/s390x: Use a single return for helper_divs32/u32
  target/s390x: Use a single return for helper_divs64/u64
  target/s390x: Use Int128 for return from CLST
  target/s390x: Use Int128 for return from CKSM
  target/s390x: Use Int128 for return from TRE
  target/s390x: Copy wout_x1 to wout_x1_P
  target/s390x: Use Int128 for returning float128
  target/s390x: Use Int128 for passing float128
  target/s390x: Use tcg_gen_atomic_cmpxchg_i128 for CDSG
  target/s390x: Implement CC_OP_NZ in gen_op_calc_cc
  target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b
  target/i386: Inline cmpxchg8b
  target/i386: Inline cmpxchg16b

 accel/tcg/tcg-runtime.h  |  11 +
 include/exec/cpu_ldst.h  |  10 +
 include/exec/helper-head.h   |   7 +
 include/qemu/atomic128.h |  29 ++-
 include/qemu/int128.h|  25 +-
 include/tcg/tcg-op.h |  15 ++
 include/tcg/tcg.h|  49 +++-
 target/arm/helper-a64.h  |   8 -
 target/i386/helper.h |   6 -
 target/ppc/helper.h  |   2 -
 target/s390x/helper.h|  54 ++---
 tcg/aarch64/tcg-target.h |   2 +
 tcg/arm/tcg-target.h |   2 +
 tcg/i386/tcg-target.h|  10 +
 tcg/loongarch64/tcg-target.h |   2 +
 tcg/mips/tcg-target.h|   2 +
 tcg/riscv/tcg-target.h   |   3 +
 tcg/s390x/tcg-target.h   |   2 +
 tcg/sparc64/tcg-target.h |   2 +
 tcg/tcg-internal.h   |  17 ++
 tcg/tci/tcg-target.h |   3 +
 target/s390x/tcg/insn-data.h.inc |  60 ++---
 accel/tcg/cputlb.c   | 112 +
 accel/tcg/user-exec.c|  66 ++
 target/arm/helper-a64.c  | 147 
 target/arm/translate-a64.c   | 121 +-
 target/i386/tcg/mem_helper.c | 126 --
 target/i386/tcg/translate.c  | 126 --
 target/ppc/mem_helper.c  |  44 
 target/ppc/translate.c   | 102 
 target/s390x/tcg/fpu_helper.c| 103 
 target/s390x/tcg/int_helper.c|  64 ++---
 target/s390x/tcg/mem_helper.c|  77 +-
 target/s390x/tcg/translate.c | 217 +++--
 tcg/tcg-op.c | 393 ++-
 tcg/tcg.c| 303 +---
 tcg/tci.c|  65 ++---
 tests/tcg/s390x/clst.c   |  82 +++
 tests/tcg/s390x/div.c|  75 ++
 tests/tcg/s390x/long-double.c|  24 ++
 util/int128.c|  42 
 accel/tcg/atomic_common.c.inc|  45 
 tcg/aarch64/tcg-target.c.inc |  17 +-
 tcg/arm/tcg-target.c.inc |  30 ++-
 tcg/i386/tcg-target.c.inc|  52 +++-
 tcg/loongarch64/tcg-target.c.inc |  17 +-
 tcg/mips/tcg-target.c.inc|  17 +-
 tcg/ppc/tcg-target.c.inc |

[PATCH v5 05/36] tcg: Add TCG_CALL_{RET,ARG}_BY_REF

2023-01-25 Thread Richard Henderson

These will be used by some hosts, both 32 and 64-bit, to pass and
return i128.  Not yet used, because allocation is not yet enabled.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |   3 +
 tcg/tcg.c  | 135 -
 2 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 6e50aeba3a..2ec1ea01df 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -36,6 +36,7 @@
  */
 typedef enum {
 TCG_CALL_RET_NORMAL, /* by registers */
+TCG_CALL_RET_BY_REF, /* for i128, by reference */
 } TCGCallReturnKind;
 
 typedef enum {
@@ -44,6 +45,8 @@ typedef enum {
 TCG_CALL_ARG_EXTEND, /* for i32, as a sign/zero-extended i64 */
 TCG_CALL_ARG_EXTEND_U,   /*  ... as a zero-extended i64 */
 TCG_CALL_ARG_EXTEND_S,   /*  ... as a sign-extended i64 */
+TCG_CALL_ARG_BY_REF, /* for i128, by reference, first */
+TCG_CALL_ARG_BY_REF_N,   /*   ... by reference, subsequent */
 } TCGCallArgumentKind;
 
 typedef struct TCGCallArgumentLoc {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a561ef3ced..644dc53196 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -104,8 +104,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
-static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long)
-__attribute__((unused));
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -683,6 +682,38 @@ static void layout_arg_normal_n(TCGCumulativeArgs *cum,
 cum->arg_slot += n;
 }
 
+static void layout_arg_by_ref(TCGCumulativeArgs *cum, TCGHelperInfo *info)
+{
+TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx];
+int n = 128 / TCG_TARGET_REG_BITS;
+
+/* The first subindex carries the pointer. */
+layout_arg_1(cum, info, TCG_CALL_ARG_BY_REF);
+
+/*
+ * The callee is allowed to clobber memory associated with
+ * structure pass by-reference.  Therefore we must make copies.
+ * Allocate space from "ref_slot", which will be adjusted to
+ * follow the parameters on the stack.
+ */
+loc[0].ref_slot = cum->ref_slot;
+
+/*
+ * Subsequent words also go into the reference slot, but
+ * do not accumulate into the regular arguments.
+ */
+for (int i = 1; i < n; ++i) {
+loc[i] = (TCGCallArgumentLoc){
+.kind = TCG_CALL_ARG_BY_REF_N,
+.arg_idx = cum->arg_idx,
+.tmp_subindex = i,
+.ref_slot = cum->ref_slot + i,
+};
+}
+cum->info_in_idx += n;
+cum->ref_slot += n;
+}
+
 static void init_call_layout(TCGHelperInfo *info)
 {
 int max_reg_slots = ARRAY_SIZE(tcg_target_call_iarg_regs);
@@ -718,6 +749,14 @@ static void init_call_layout(TCGHelperInfo *info)
 case TCG_CALL_RET_NORMAL:
 assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
 break;
+case TCG_CALL_RET_BY_REF:
+/*
+ * Allocate the first argument to the output.
+ * We don't need to store this anywhere, just make it
+ * unavailable for use in the input loop below.
+ */
+cum.arg_slot = 1;
+break;
 default:
 qemu_build_not_reached();
 }
@@ -796,6 +835,9 @@ static void init_call_layout(TCGHelperInfo *info)
 case TCG_CALL_ARG_NORMAL:
 layout_arg_normal_n(&cum, info, 128 / TCG_TARGET_REG_BITS);
 break;
+case TCG_CALL_ARG_BY_REF:
+layout_arg_by_ref(&cum, info);
+break;
 default:
 qemu_build_not_reached();
 }
@@ -811,7 +853,39 @@ static void init_call_layout(TCGHelperInfo *info)
 assert(cum.info_in_idx <= ARRAY_SIZE(info->in));
 /* Validate the backend has enough argument space. */
 assert(cum.arg_slot <= max_reg_slots + max_stk_slots);
-assert(cum.ref_slot <= max_stk_slots);
+
+/*
+ * Relocate the "ref_slot" area to the end of the parameters.
+ * Minimizing this stack offset helps code size for x86,
+ * which has a signed 8-bit offset encoding.
+ */
+if (cum.ref_slot != 0) {
+int ref_base = 0;
+
+if (cum.arg_slot > max_reg_slots) {
+int align = __alignof(Int128) / sizeof(tcg_target_long);
+
+ref_base = cum.arg_slot - max_reg_slots;
+if (align > 1) {
+ref_base = ROUND_UP(ref_base, align);
+}
+}
+assert(ref_base + cum.ref_slot <= max_stk_slots);
+
+
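
The relocation rule in the last hunk above can be sketched in isolation. This is a rough model under stated assumptions, not the QEMU implementation: ref_base_slots and round_up are hypothetical names, all quantities are counted in stack slots, and align stands for __alignof(Int128) / sizeof(tcg_target_long) from the patch.

```c
#include <assert.h>

/* Round n up to the next multiple of align (align must be positive). */
static int round_up(int n, int align)
{
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical model of the "ref_base" computation: the by-reference
 * copies are placed directly after any arguments that spilled to the
 * stack, rounded up to the alignment of the 128-bit type.
 */
static int ref_base_slots(int arg_slot, int max_reg_slots, int align)
{
    int ref_base = 0;

    assert(align >= 1);
    if (arg_slot > max_reg_slots) {
        /* Only arguments beyond the register slots occupy the stack. */
        ref_base = arg_slot - max_reg_slots;
        if (align > 1) {
            ref_base = round_up(ref_base, align);
        }
    }
    return ref_base;
}
```

With six register slots and a two-slot alignment, seven argument slots give a base of 2 rather than 1, so the copy stays aligned while remaining as close to the stack pointer as possible.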

[PATCH v5 28/36] target/s390x: Use Int128 for return from TRE

2023-01-25 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++
 target/s390x/tcg/translate.c  | 7 +--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 03b29efa3e..b4170a4256 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -99,7 +99,7 @@ DEF_HELPER_FLAGS_4(unpka, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_4(unpku, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_3(tp, TCG_CALL_NO_WG, i32, env, i64, i32)
 DEF_HELPER_FLAGS_4(tr, TCG_CALL_NO_WG, void, env, i32, i64, i64)
-DEF_HELPER_4(tre, i64, env, i64, i64, i64)
+DEF_HELPER_4(tre, i128, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index b0b403e23a..49969abda7 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1632,8 +1632,8 @@ void HELPER(tr)(CPUS390XState *env, uint32_t len, uint64_t array,
 do_helper_tr(env, len, array, trans, GETPC());
 }
 
-uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
- uint64_t len, uint64_t trans)
+Int128 HELPER(tre)(CPUS390XState *env, uint64_t array,
+   uint64_t len, uint64_t trans)
 {
 uintptr_t ra = GETPC();
 uint8_t end = env->regs[0] & 0xff;
@@ -1668,8 +1668,7 @@ uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
 }
 
 env->cc_op = cc;
-env->retxl = len - i;
-return array + i;
+return int128_make128(len - i, array + i);
 }
 
 static inline uint32_t do_helper_trt(CPUS390XState *env, int len,
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 1a7aa9e4ae..f3e4b70ed9 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -4905,8 +4905,11 @@ static DisasJumpType op_tr(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_tre(DisasContext *s, DisasOps *o)
 {
-gen_helper_tre(o->out, cpu_env, o->out, o->out2, o->in2);
-return_low128(o->out2);
+TCGv_i128 pair = tcg_temp_new_i128();
+
+gen_helper_tre(pair, cpu_env, o->out, o->out2, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, pair);
+tcg_temp_free_i128(pair);
 set_cc_static(s);
 return DISAS_NEXT;
 }
-- 
2.34.1

[PATCH v5 02/36] tcg: Handle dh_typecode_i128 with TCG_CALL_{RET,ARG}_NORMAL

2023-01-25 Thread Richard Henderson

Many hosts pass and return 128-bit quantities like sequential
64-bit quantities.  Treat this just like we currently break
down 64-bit quantities for a 32-bit host.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index d502327be2..ffddda96ed 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -707,11 +707,22 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_s64:
 info->nr_out = 64 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL;
+assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+break;
+case dh_typecode_i128:
+info->nr_out = 128 / TCG_TARGET_REG_BITS;
+info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
+switch (/* TODO */ TCG_CALL_RET_NORMAL) {
+case TCG_CALL_RET_NORMAL:
+assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+break;
+default:
+qemu_build_not_reached();
+}
 break;
 default:
 g_assert_not_reached();
 }
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
 
 /*
  * Parse and place function arguments.
@@ -733,6 +744,9 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_ptr:
 type = TCG_TYPE_PTR;
 break;
+case dh_typecode_i128:
+type = TCG_TYPE_I128;
+break;
 default:
 g_assert_not_reached();
 }
@@ -772,6 +786,19 @@ static void init_call_layout(TCGHelperInfo *info)
 }
 break;
 
+case TCG_TYPE_I128:
+switch (/* TODO */ TCG_CALL_ARG_NORMAL) {
+case TCG_CALL_ARG_EVEN:
+layout_arg_even(&cum);
+/* fall through */
+case TCG_CALL_ARG_NORMAL:
+layout_arg_normal_n(&cum, info, 128 / TCG_TARGET_REG_BITS);
+break;
+default:
+qemu_build_not_reached();
+}
+break;
+
 default:
 g_assert_not_reached();
 }
@@ -1690,11 +1717,13 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
 op->args[pi++] = temp_arg(ret);
 break;
 case 2:
+case 4:
 tcg_debug_assert(ret != NULL);
-tcg_debug_assert(ret->base_type == ret->type + 1);
+tcg_debug_assert(ret->base_type == ret->type + ctz32(n));
 tcg_debug_assert(ret->temp_subindex == 0);
-op->args[pi++] = temp_arg(ret);
-op->args[pi++] = temp_arg(ret + 1);
+for (i = 0; i < n; ++i) {
+op->args[pi++] = temp_arg(ret + i);
+}
 break;
 default:
 g_assert_not_reached();
-- 
2.34.1
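
The assertion ret->base_type == ret->type + ctz32(n) in the last hunk relies on the integer types being ordered so that each step doubles the width. The sketch below illustrates that relationship under this assumption; int_type, type_bits, parts_for, ctz32_sketch and part_type are hypothetical names and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the ordering I32 < I64 < I128. */
typedef enum { TYPE_I32, TYPE_I64, TYPE_I128 } int_type;

/* Width in bits of an integer type: 32, 64 or 128. */
static int type_bits(int_type t)
{
    return 32 << t;
}

/* Number of host registers needed to hold one value of type 'base'. */
static int parts_for(int_type base, int reg_bits)
{
    int bits = type_bits(base);
    return bits <= reg_bits ? 1 : bits / reg_bits;
}

/* Portable count of trailing zero bits; n must be nonzero. */
static int ctz32_sketch(uint32_t n)
{
    int count = 0;

    assert(n != 0);
    while ((n & 1) == 0) {
        n >>= 1;
        count++;
    }
    return count;
}

/* Type of each part: the base type stepped down once per halving. */
static int_type part_type(int_type base, int reg_bits)
{
    return (int_type)(base - ctz32_sketch((uint32_t)parts_for(base, reg_bits)));
}
```

On a 64-bit host an i128 splits into n = 2 parts of type I64 (offset 1); on a 32-bit host it splits into n = 4 parts of type I32 (offset 2), which is why the patch handles case 2 and case 4 together.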

[PATCH v5 04/36] tcg: Introduce tcg_out_addi_ptr

2023-01-25 Thread Richard Henderson

Implement the function for arm, i386, and s390x, which will use it.
Add stubs for all other backends.

Reviewed-by: Alex Bennée 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  2 ++
 tcg/aarch64/tcg-target.c.inc |  7 +++
 tcg/arm/tcg-target.c.inc | 20 
 tcg/i386/tcg-target.c.inc|  8 
 tcg/loongarch64/tcg-target.c.inc |  7 +++
 tcg/mips/tcg-target.c.inc|  7 +++
 tcg/ppc/tcg-target.c.inc |  7 +++
 tcg/riscv/tcg-target.c.inc   |  7 +++
 tcg/s390x/tcg-target.c.inc   |  7 +++
 tcg/sparc64/tcg-target.c.inc |  7 +++
 tcg/tci/tcg-target.c.inc |  7 +++
 11 files changed, 86 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ff30f5e141..a561ef3ced 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -104,6 +104,8 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long)
+__attribute__((unused));
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 330d26b395..bd6da72678 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1102,6 +1102,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 tcg_out_insn(s, 3305, LDR, 0, rd);
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 /* Define something more legible for general use.  */
 #define tcg_out_ldst_r  tcg_out_insn_3310
 
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 0f5f9f4925..6e9e9b9b3f 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2581,6 +2581,26 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out_movi32(s, COND_AL, ret, arg);
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+int enc, opc = ARITH_ADD;
+
+/* All of the easiest immediates to encode are positive. */
+if (imm < 0) {
+imm = -imm;
+opc = ARITH_SUB;
+}
+enc = encode_imm(imm);
+if (enc >= 0) {
+tcg_out_dat_imm(s, COND_AL, opc, rd, rs, enc);
+} else {
+tcg_out_movi32(s, COND_AL, TCG_REG_TMP, imm);
+tcg_out_dat_reg(s, COND_AL, opc, rd, rs,
+TCG_REG_TMP, SHIFT_IMM_LSL(0));
+}
+}
+
 /* Type is always V128, with I64 elements.  */
 static void tcg_out_dup2_vec(TCGContext *s, TCGReg rd, TCGReg rl, TCGReg rh)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index c71c3e664d..7b573bd287 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1069,6 +1069,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_out_modrm_offset(s, OPC_LEA, rd, rs, imm);
+}
+
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
 {
 if (val == (int8_t)val) {
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ce4a153887..b6e2ff6213 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -417,6 +417,13 @@ static void tcg_out_addi(TCGContext *s, TCGType type, TCGReg rd,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg)
 {
 tcg_out_opc_andi(s, ret, arg, 0xff);
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 6e000d8e69..d419c4c1fc 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -550,6 +550,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
 {
 /* ret and arg can't be register tmp0 */
diff --git a/t

[PATCH v5 34/36] target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b

2023-01-25 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
---
 target/i386/tcg/translate.c | 48 -
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 7e0b2a709a..a82131d635 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2993,6 +2993,34 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
 #include "emit.c.inc"
 #include "decode-new.c.inc"
 
+static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
+{
+gen_lea_modrm(env, s, modrm);
+
+if ((s->prefix & PREFIX_LOCK) &&
+(tb_cflags(s->base.tb) & CF_PARALLEL)) {
+gen_helper_cmpxchg8b(cpu_env, s->A0);
+} else {
+gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
+}
+set_cc_op(s, CC_OP_EFLAGS);
+}
+
+#ifdef TARGET_X86_64
+static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
+{
+gen_lea_modrm(env, s, modrm);
+
+if ((s->prefix & PREFIX_LOCK) &&
+(tb_cflags(s->base.tb) & CF_PARALLEL)) {
+gen_helper_cmpxchg16b(cpu_env, s->A0);
+} else {
+gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
+}
+set_cc_op(s, CC_OP_EFLAGS);
+}
+#endif
+
 /* convert one instruction. s->base.is_jmp is set if the translation must
be stopped. Return the next pc value */
 static bool disas_insn(DisasContext *s, CPUState *cpu)
@@ -3844,28 +3872,14 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (!(s->cpuid_ext_features & CPUID_EXT_CX16)) {
 goto illegal_op;
 }
-gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg16b(cpu_env, s->A0);
-} else {
-gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
-}
-set_cc_op(s, CC_OP_EFLAGS);
+gen_cmpxchg16b(s, env, modrm);
 break;
 }
-#endif
+#endif
 if (!(s->cpuid_features & CPUID_CX8)) {
 goto illegal_op;
 }
-gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg8b(cpu_env, s->A0);
-} else {
-gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
-}
-set_cc_op(s, CC_OP_EFLAGS);
+gen_cmpxchg8b(s, env, modrm);
 break;
 
 case 7: /* RDSEED */
-- 
2.34.1

[PATCH v5 24/36] target/s390x: Use a single return for helper_divs32/u32

2023-01-25 Thread Richard Henderson

Pack the quotient and remainder into a single uint64_t.

Signed-off-by: Richard Henderson 
---
v2: Fix operand ordering; use tcg_gen_extr32_i64.
Cc: David Hildenbrand 
Cc: Ilya Leoshkevich 
---
 target/s390x/helper.h |  2 +-
 target/s390x/tcg/int_helper.c | 26 +-
 target/s390x/tcg/translate.c  |  8 
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 93923ca153..bc828d976b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -10,7 +10,7 @@ DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_3(mvcl, i32, env, i32, i32)
 DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
-DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, s64, env, s64, s64)
+DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 954542388a..7260583cf2 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -34,45 +34,45 @@
 #endif
 
 /* 64/32 -> 32 signed division */
-int64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
+uint64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
 {
-int32_t ret, b = b64;
-int64_t q;
+int32_t b = b64;
+int64_t q, r;
 
 if (b == 0) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-ret = q = a / b;
-env->retxl = a % b;
+q = a / b;
+r = a % b;
 
 /* Catch non-representable quotient.  */
-if (ret != q) {
+if (q != (int32_t)q) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-return ret;
+return deposit64(q, 32, 32, r);
 }
 
 /* 64/32 -> 32 unsigned division */
 uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 {
-uint32_t ret, b = b64;
-uint64_t q;
+uint32_t b = b64;
+uint64_t q, r;
 
 if (b == 0) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-ret = q = a / b;
-env->retxl = a % b;
+q = a / b;
+r = a % b;
 
 /* Catch non-representable quotient.  */
-if (ret != q) {
+if (q != (uint32_t)q) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-return ret;
+return deposit64(q, 32, 32, r);
 }
 
 /* 64/64 -> 64 signed division */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index a339b277e9..169f7ee1b2 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2395,15 +2395,15 @@ static DisasJumpType op_diag(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs32(DisasContext *s, DisasOps *o)
 {
-gen_helper_divs32(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+gen_helper_divs32(o->out, cpu_env, o->in1, o->in2);
+tcg_gen_extr32_i64(o->out2, o->out, o->out);
 return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 {
-gen_helper_divu32(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+gen_helper_divu32(o->out, cpu_env, o->in1, o->in2);
+tcg_gen_extr32_i64(o->out2, o->out, o->out);
 return DISAS_NEXT;
 }
 
-- 
2.34.1
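
The packing used by helper_divs32 above (quotient in the low 32 bits, remainder in the high 32 bits of one uint64_t) can be modelled in plain C. This is a sketch only: pack_qr, divs32_sketch, unpack_quotient and unpack_remainder are hypothetical names, the boolean return stands in for raising PGM_FIXPT_DIVIDE, and the INT64_MIN / -1 guard is an addition for the standalone example that is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for deposit64(q, 32, 32, r): r replaces bits 32..63 of q. */
static uint64_t pack_qr(int64_t q, int64_t r)
{
    return ((uint64_t)r << 32) | ((uint64_t)q & 0xffffffffu);
}

/* 64/32 -> 32 signed division; returns false where the helper would trap. */
static bool divs32_sketch(int64_t a, int32_t b, uint64_t *packed)
{
    int64_t q, r;

    assert(packed != NULL);
    if (b == 0) {
        return false;               /* divide by zero */
    }
    if (a == INT64_MIN && b == -1) {
        return false;               /* avoid C overflow; not in the patch */
    }
    q = a / b;
    r = a % b;
    if (q < INT32_MIN || q > INT32_MAX) {
        return false;               /* non-representable quotient */
    }
    *packed = pack_qr(q, r);
    return true;
}

/* Mirrors tcg_gen_extr32_i64(lo, hi, arg): the low half is the quotient. */
static int32_t unpack_quotient(uint64_t packed)
{
    return (int32_t)(uint32_t)packed;
}

static int32_t unpack_remainder(uint64_t packed)
{
    return (int32_t)(uint32_t)(packed >> 32);
}
```

Dividing -4241 by 101 with this model yields quotient -41 and remainder -100, the values checked by test_dr in the div.c test added earlier in the series.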

[PATCH v5 13/36] tcg: Add temp allocation for TCGv_i128

2023-01-25 Thread Richard Henderson

This enables allocation of i128.  The type is not yet
usable, as we have not yet added data movement ops.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 32 +
 tcg/tcg.c | 60 +--
 2 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 8b7e61e7a5..7a8e4bbdd7 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -685,6 +685,11 @@ static inline TCGTemp *tcgv_i64_temp(TCGv_i64 v)
 return tcgv_i32_temp((TCGv_i32)v);
 }
 
+static inline TCGTemp *tcgv_i128_temp(TCGv_i128 v)
+{
+return tcgv_i32_temp((TCGv_i32)v);
+}
+
 static inline TCGTemp *tcgv_ptr_temp(TCGv_ptr v)
 {
 return tcgv_i32_temp((TCGv_i32)v);
@@ -705,6 +710,11 @@ static inline TCGArg tcgv_i64_arg(TCGv_i64 v)
 return temp_arg(tcgv_i64_temp(v));
 }
 
+static inline TCGArg tcgv_i128_arg(TCGv_i128 v)
+{
+return temp_arg(tcgv_i128_temp(v));
+}
+
 static inline TCGArg tcgv_ptr_arg(TCGv_ptr v)
 {
 return temp_arg(tcgv_ptr_temp(v));
@@ -726,6 +736,11 @@ static inline TCGv_i64 temp_tcgv_i64(TCGTemp *t)
 return (TCGv_i64)temp_tcgv_i32(t);
 }
 
+static inline TCGv_i128 temp_tcgv_i128(TCGTemp *t)
+{
+return (TCGv_i128)temp_tcgv_i32(t);
+}
+
 static inline TCGv_ptr temp_tcgv_ptr(TCGTemp *t)
 {
 return (TCGv_ptr)temp_tcgv_i32(t);
@@ -851,6 +866,11 @@ static inline void tcg_temp_free_i64(TCGv_i64 arg)
 tcg_temp_free_internal(tcgv_i64_temp(arg));
 }
 
+static inline void tcg_temp_free_i128(TCGv_i128 arg)
+{
+tcg_temp_free_internal(tcgv_i128_temp(arg));
+}
+
 static inline void tcg_temp_free_ptr(TCGv_ptr arg)
 {
 tcg_temp_free_internal(tcgv_ptr_temp(arg));
@@ -899,6 +919,18 @@ static inline TCGv_i64 tcg_temp_local_new_i64(void)
 return temp_tcgv_i64(t);
 }
 
+static inline TCGv_i128 tcg_temp_new_i128(void)
+{
+TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I128, false);
+return temp_tcgv_i128(t);
+}
+
+static inline TCGv_i128 tcg_temp_local_new_i128(void)
+{
+TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I128, true);
+return temp_tcgv_i128(t);
+}
+
 static inline TCGv_ptr tcg_global_mem_new_ptr(TCGv_ptr reg, intptr_t offset,
   const char *name)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 63e0753ded..d449bb0864 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1273,26 +1273,45 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 tcg_debug_assert(ts->base_type == type);
 tcg_debug_assert(ts->kind == kind);
 } else {
+int i, n;
+
+switch (type) {
+case TCG_TYPE_I32:
+case TCG_TYPE_V64:
+case TCG_TYPE_V128:
+case TCG_TYPE_V256:
+n = 1;
+break;
+case TCG_TYPE_I64:
+n = 64 / TCG_TARGET_REG_BITS;
+break;
+case TCG_TYPE_I128:
+n = 128 / TCG_TARGET_REG_BITS;
+break;
+default:
+g_assert_not_reached();
+}
+
 ts = tcg_temp_alloc(s);
-if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
-TCGTemp *ts2 = tcg_temp_alloc(s);
+ts->base_type = type;
+ts->temp_allocated = 1;
+ts->kind = kind;
 
-ts->base_type = type;
-ts->type = TCG_TYPE_I32;
-ts->temp_allocated = 1;
-ts->kind = kind;
-
-tcg_debug_assert(ts2 == ts + 1);
-ts2->base_type = TCG_TYPE_I64;
-ts2->type = TCG_TYPE_I32;
-ts2->temp_allocated = 1;
-ts2->temp_subindex = 1;
-ts2->kind = kind;
-} else {
-ts->base_type = type;
+if (n == 1) {
 ts->type = type;
-ts->temp_allocated = 1;
-ts->kind = kind;
+} else {
+ts->type = TCG_TYPE_REG;
+
+for (i = 1; i < n; ++i) {
+TCGTemp *ts2 = tcg_temp_alloc(s);
+
+tcg_debug_assert(ts2 == ts + i);
+ts2->base_type = type;
+ts2->type = TCG_TYPE_REG;
+ts2->temp_allocated = 1;
+ts2->temp_subindex = i;
+ts2->kind = kind;
+}
 }
 }
 
@@ -3381,9 +3400,14 @@ static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 case TCG_TYPE_V64:
 align = 8;
 break;
+case TCG_TYPE_I128:
 case TCG_TYPE_V128:
 case TCG_TYPE_V256:
-/* Note that we do not require aligned storage for V256. */
+/*
+ * Note that we do not require aligned storage for V256,
+ * and that we provide alignment for I128 to match V128,
+ * even if that's above what the host ABI requires.
+ */
 align = 16;
 break;
 default:
-- 
2.34.1
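
The part-count switch added to tcg_temp_new_internal above can be sketched
in isolation. This is only an illustration: the enumerators and the function
name are hypothetical stand-ins, not the QEMU definitions, and the host
register width is passed as a parameter instead of TCG_TARGET_REG_BITS.

```c
#include <assert.h>

/* Hypothetical stand-ins for the TCG type enumerators (assumption). */
enum sketch_type { SK_I32, SK_I64, SK_I128, SK_V64, SK_V128, SK_V256 };

/*
 * Return how many host-register-sized pieces a temporary of the given
 * type occupies on a host with reg_bits-wide registers (32 or 64).
 * Vector types and I32 always take a single temporary.
 */
int sketch_temp_parts(enum sketch_type type, int reg_bits)
{
    switch (type) {
    case SK_I32:
    case SK_V64:
    case SK_V128:
    case SK_V256:
        return 1;
    case SK_I64:
        return 64 / reg_bits;
    case SK_I128:
        return 128 / reg_bits;
    }
    /* Not reached for valid enumerators. */
    assert(0);
    return 0;
}
```

With this model an I128 temporary takes two pieces on a 64-bit host and four
on a 32-bit host, which is why the patch replaces the single ts2 allocation
with a loop over subindexes 1..n-1.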

[PATCH v5 21/36] tests/tcg/s390x: Add div.c

2023-01-25 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <2022110300.2539919-1-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/div.c   | 40 +
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 41 insertions(+)
 create mode 100644 tests/tcg/s390x/div.c

diff --git a/tests/tcg/s390x/div.c b/tests/tcg/s390x/div.c
new file mode 100644
index 00..5807295614
--- /dev/null
+++ b/tests/tcg/s390x/div.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+#include <stdint.h>
+
+static void test_dr(void)
+{
+register int32_t r0 asm("r0") = -1;
+register int32_t r1 asm("r1") = -4241;
+int32_t b = 101, q, r;
+
+asm("dr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == -41);
+assert(r == -100);
+}
+
+static void test_dlr(void)
+{
+register uint32_t r0 asm("r0") = 0;
+register uint32_t r1 asm("r1") = 4243;
+uint32_t b = 101, q, r;
+
+asm("dlr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == 42);
+assert(r == 1);
+}
+
+int main(void)
+{
+test_dr();
+test_dlr();
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 07fcc6d0ce..ab7a3bcfb2 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -24,6 +24,7 @@ TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
+TESTS+=div
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1

[PATCH v5 22/36] tests/tcg/s390x: Add clst.c

2023-01-25 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20221025213008.2209006-2-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/clst.c  | 82 +
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 83 insertions(+)
 create mode 100644 tests/tcg/s390x/clst.c

diff --git a/tests/tcg/s390x/clst.c b/tests/tcg/s390x/clst.c
new file mode 100644
index 00..ed2fe7326c
--- /dev/null
+++ b/tests/tcg/s390x/clst.c
@@ -0,0 +1,82 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+
+static int clst(char sep, const char **s1, const char **s2)
+{
+const char *r1 = *s1;
+const char *r2 = *s2;
+int cc;
+
+do {
+register int r0 asm("r0") = sep;
+
+asm("clst %[r1],%[r2]\n"
+"ipm %[cc]\n"
+"srl %[cc],28"
+: [r1] "+r" (r1), [r2] "+r" (r2), "+r" (r0), [cc] "=r" (cc)
+:
+: "cc");
+*s1 = r1;
+*s2 = r2;
+} while (cc == 3);
+
+return cc;
+}
+
+static const struct test {
+const char *name;
+char sep;
+const char *s1;
+const char *s2;
+int exp_cc;
+int exp_off;
+} tests[] = {
+{
+.name = "cc0",
+.sep = 0,
+.s1 = "aa",
+.s2 = "aa",
+.exp_cc = 0,
+.exp_off = 0,
+},
+{
+.name = "cc1",
+.sep = 1,
+.s1 = "a\x01",
+.s2 = "aa\x01",
+.exp_cc = 1,
+.exp_off = 1,
+},
+{
+.name = "cc2",
+.sep = 2,
+.s1 = "abc\x02",
+.s2 = "abb\x02",
+.exp_cc = 2,
+.exp_off = 2,
+},
+};
+
+int main(void)
+{
+const struct test *t;
+const char *s1, *s2;
+size_t i;
+int cc;
+
+for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+t = &tests[i];
+s1 = t->s1;
+s2 = t->s2;
+cc = clst(t->sep, &s1, &s2);
+if (cc != t->exp_cc ||
+s1 != t->s1 + t->exp_off ||
+s2 != t->s2 + t->exp_off) {
+fprintf(stderr, "%s\n", t->name);
+return EXIT_FAILURE;
+}
+}
+
+return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index ab7a3bcfb2..79250f31dd 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -25,6 +25,7 @@ TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=div
+TESTS+=clst
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1

[PATCH v5 20/36] target/ppc: Use tcg_gen_atomic_cmpxchg_i128 for STQCX

2023-01-25 Thread Richard Henderson

Note that the previous direct reference to reserve_val,

-   tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
-? offsetof(CPUPPCState, reserve_val2)
-: offsetof(CPUPPCState, reserve_val)));

was incorrect because all references should have gone through
cpu_reserve_val.  Create a cpu_reserve_val2 tcg temp to fix this.

Signed-off-by: Richard Henderson 
Reviewed-by: Daniel Henrique Barboza 
Message-Id: <20221112061122.2720163-2-richard.hender...@linaro.org>
---
 target/ppc/helper.h |   2 -
 target/ppc/mem_helper.c |  44 -
 target/ppc/translate.c  | 102 ++--
 3 files changed, 47 insertions(+), 101 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 8dd22a35e4..0beaca5c7a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -818,6 +818,4 @@ DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
 DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
-DEF_HELPER_5(stqcx_le_parallel, i32, env, tl, i64, i64, i32)
-DEF_HELPER_5(stqcx_be_parallel, i32, env, tl, i64, i64, i32)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index d1163f316c..1578887a8f 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -413,50 +413,6 @@ void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
 val = int128_make128(lo, hi);
 cpu_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
 }
-
-uint32_t helper_stqcx_le_parallel(CPUPPCState *env, target_ulong addr,
-  uint64_t new_lo, uint64_t new_hi,
-  uint32_t opidx)
-{
-bool success = false;
-
-/* We will have raised EXCP_ATOMIC from the translator.  */
-assert(HAVE_CMPXCHG128);
-
-if (likely(addr == env->reserve_addr)) {
-Int128 oldv, cmpv, newv;
-
-cmpv = int128_make128(env->reserve_val2, env->reserve_val);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv,
-  opidx, GETPC());
-success = int128_eq(oldv, cmpv);
-}
-env->reserve_addr = -1;
-return env->so + success * CRF_EQ_BIT;
-}
-
-uint32_t helper_stqcx_be_parallel(CPUPPCState *env, target_ulong addr,
-  uint64_t new_lo, uint64_t new_hi,
-  uint32_t opidx)
-{
-bool success = false;
-
-/* We will have raised EXCP_ATOMIC from the translator.  */
-assert(HAVE_CMPXCHG128);
-
-if (likely(addr == env->reserve_addr)) {
-Int128 oldv, cmpv, newv;
-
-cmpv = int128_make128(env->reserve_val2, env->reserve_val);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv,
-  opidx, GETPC());
-success = int128_eq(oldv, cmpv);
-}
-env->reserve_addr = -1;
-return env->so + success * CRF_EQ_BIT;
-}
 #endif
 
 /*/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index edb3daa9b5..1c17d5a558 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -72,6 +72,7 @@ static TCGv cpu_cfar;
 static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_reserve_val;
+static TCGv cpu_reserve_val2;
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
@@ -141,8 +142,11 @@ void ppc_translate_init(void)
  offsetof(CPUPPCState, reserve_addr),
  "reserve_addr");
 cpu_reserve_val = tcg_global_mem_new(cpu_env,
- offsetof(CPUPPCState, reserve_val),
- "reserve_val");
+ offsetof(CPUPPCState, reserve_val),
+ "reserve_val");
+cpu_reserve_val2 = tcg_global_mem_new(cpu_env,
+  offsetof(CPUPPCState, reserve_val2),
+  "reserve_val2");
 
 cpu_fpscr = tcg_global_mem_new(cpu_env,
offsetof(CPUPPCState, fpscr), "fpscr");
@@ -3998,78 +4002,66 @@ static void gen_lqarx(DisasContext *ctx)
 /* stqcx. */
 static void gen_stqcx_(DisasContext *ctx)
 {
+TCGLabel *lab_fail, *lab_over;
 int rs = rS(ctx->opcode);
-TCGv EA, hi, lo;
+TCGv EA, t0, t1;
+TCGv_i128 cmp, val;
 
 if (unlikely(rs & 1)) {
 gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
 return;
 }
 
+lab_fail = gen_new_label();
+lab_over = gen_new_label();
+
 gen_set_access_type(ctx, ACCESS_RES);
 EA = tcg_temp_new();
 gen_addr_reg_index(ctx, EA);
 
+tcg_gen_brcon

[PATCH v5 29/36] target/s390x: Copy wout_x1 to wout_x1_P

2023-01-25 Thread Richard Henderson

Make a copy of wout_x1 before modifying it, named wout_x1_P to
emphasize that it operates on the out/out2 pair.  The insns
that use x1_P are data movement that will not change to Int128.

Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/insn-data.h.inc | 12 ++--
 target/s390x/tcg/translate.c |  8 
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 79c6ab509a..d0814cb218 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -422,7 +422,7 @@
 F(0x3800, LER, RR_a,  Z,   0, e2, 0, cond_e1e2, mov2, 0, IF_AFP1 | IF_AFP2)
 F(0x7800, LE,  RX_a,  Z,   0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
 F(0xed64, LEY, RXY_a, LD,  0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
-F(0xb365, LXR, RRE,   Z,   x2h, x2l, 0, x1, movx, 0, IF_AFP1)
+F(0xb365, LXR, RRE,   Z,   x2h, x2l, 0, x1_P, movx, 0, IF_AFP1)
 /* LOAD IMMEDIATE */
 C(0xc001, LGFI,RIL_a, EI,  0, i2, 0, r1, mov2, 0)
 /* LOAD RELATIVE LONG */
@@ -461,7 +461,7 @@
 C(0xe332, LTGF,RXY_a, GIE, 0, a2, r1, 0, ld32s, s64)
 F(0xb302, LTEBR,   RRE,   Z,   0, e2, 0, cond_e1e2, mov2, f32, IF_BFP)
 F(0xb312, LTDBR,   RRE,   Z,   0, f2, 0, f1, mov2, f64, IF_BFP)
-F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1, movx, f128, IF_BFP)
+F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1_P, movx, f128, IF_BFP)
 /* LOAD AND TRAP */
 C(0xe39f, LAT, RXY_a, LAT, 0, m2_32u, r1, 0, lat, 0)
 C(0xe385, LGAT,RXY_a, LAT, 0, a2, r1, 0, lgat, 0)
@@ -483,7 +483,7 @@
 C(0xb913, LCGFR,   RRE,   Z,   0, r2_32s, r1, 0, neg, neg64)
 F(0xb303, LCEBR,   RRE,   Z,   0, e2, new, e1, negf32, f32, IF_BFP)
 F(0xb313, LCDBR,   RRE,   Z,   0, f2, new, f1, negf64, f64, IF_BFP)
-F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1, negf128, f128, IF_BFP)
+F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, negf128, f128, IF_BFP)
 F(0xb373, LCDFR,   RRE,   FPSSH, 0, f2, new, f1, negf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD COUNT TO BLOCK BOUNDARY */
 C(0xe727, LCBB,RXE,   V,   la2, 0, r1, 0, lcbb, 0)
@@ -552,7 +552,7 @@
 C(0xb911, LNGFR,   RRE,   Z,   0, r2_32s, r1, 0, nabs, nabs64)
 F(0xb301, LNEBR,   RRE,   Z,   0, e2, new, e1, nabsf32, f32, IF_BFP)
 F(0xb311, LNDBR,   RRE,   Z,   0, f2, new, f1, nabsf64, f64, IF_BFP)
-F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1, nabsf128, f128, IF_BFP)
+F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, nabsf128, f128, IF_BFP)
 F(0xb371, LNDFR,   RRE,   FPSSH, 0, f2, new, f1, nabsf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD ON CONDITION */
 C(0xb9f2, LOCR,RRF_c, LOC, r1, r2, new, r1_32, loc, 0)
@@ -577,7 +577,7 @@
 C(0xb910, LPGFR,   RRE,   Z,   0, r2_32s, r1, 0, abs, abs64)
 F(0xb300, LPEBR,   RRE,   Z,   0, e2, new, e1, absf32, f32, IF_BFP)
 F(0xb310, LPDBR,   RRE,   Z,   0, f2, new, f1, absf64, f64, IF_BFP)
-F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1, absf128, f128, IF_BFP)
+F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, absf128, f128, IF_BFP)
 F(0xb370, LPDFR,   RRE,   FPSSH, 0, f2, new, f1, absf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD REVERSED */
 C(0xb91f, LRVR,RRE,   Z,   0, r2_32u, new, r1_32, rev32, 0)
@@ -588,7 +588,7 @@
 /* LOAD ZERO */
 F(0xb374, LZER,RRE,   Z,   0, 0, 0, e1, zero, 0, IF_AFP1)
 F(0xb375, LZDR,RRE,   Z,   0, 0, 0, f1, zero, 0, IF_AFP1)
-F(0xb376, LZXR,RRE,   Z,   0, 0, 0, x1, zero2, 0, IF_AFP1)
+F(0xb376, LZXR,RRE,   Z,   0, 0, 0, x1_P, zero2, 0, IF_AFP1)
 
 /* LOAD FPC */
 F(0xb29d, LFPC,S, Z,   0, m2_32u, 0, 0, sfpc, 0, IF_BFP)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index f3e4b70ed9..d25b6f3c03 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -5518,6 +5518,14 @@ static void wout_x1(DisasContext *s, DisasOps *o)
 }
 #define SPEC_wout_x1 SPEC_r1_f128
 
+static void wout_x1_P(DisasContext *s, DisasOps *o)
+{
+int f1 = get_field(s, r1);
+store_freg(f1, o->out);
+store_freg(f1 + 2, o->out2);
+}
+#define SPEC_wout_x1_P SPEC_r1_f128
+
 static void wout_cond_r1r2_32(DisasContext *s, DisasOps *o)
 {
 if (get_field(s, r1) != get_field(s, r2)) {
-- 
2.34.1

[PATCH v5 25/36] target/s390x: Use a single return for helper_divs64/u64

2023-01-25 Thread Richard Henderson

Pack the quotient and remainder into a single Int128.
Use the divu128 primitive to remove the cpu_abort on
32-bit hosts.
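
As an illustration of the 128-by-64 division described above, here is a
minimal portable sketch that needs no __int128. It is not QEMU's divu128;
the struct and function names are hypothetical, and the failure return
models the cases in which the helper raises PGM_FIXPT_DIVIDE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair standing in for the packed Int128 result (assumption). */
struct divres {
    uint64_t quot;
    uint64_t rem;
};

/*
 * Divide the 128-bit value ah:al by b with shift-and-subtract long division.
 * Returns false on division by zero or when the quotient does not fit in
 * 64 bits, which is exactly the case ah >= b.
 */
bool sketch_divu128_64(uint64_t ah, uint64_t al, uint64_t b,
                       struct divres *out)
{
    uint64_t rem = ah;      /* invariant: rem < b */
    uint64_t quot = 0;
    int i;

    assert(out);
    if (b == 0 || ah >= b) {
        return false;
    }
    for (i = 63; i >= 0; i--) {
        bool carry = rem >> 63;             /* bit shifted out of rem */

        rem = (rem << 1) | ((al >> i) & 1);
        quot <<= 1;
        if (carry || rem >= b) {
            rem -= b;       /* unsigned wrap gives the right value on carry */
            quot |= 1;
        }
    }
    out->quot = quot;
    out->rem = rem;
    return true;
}
```

Because the whole result travels in one value, the translator no longer has
to fetch the second half from env->retxl after the call.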

Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
v2: Extended div test case to cover these insns.
---
 target/s390x/helper.h |  4 ++--
 target/s390x/tcg/int_helper.c | 38 +--
 target/s390x/tcg/translate.c  | 14 +
 tests/tcg/s390x/div.c | 35 
 4 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bc828d976b..593f3c8bee 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -12,8 +12,8 @@ DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
 DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
-DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
+DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
 DEF_HELPER_4(clst, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 7260583cf2..eb8e6dd1b5 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -76,46 +76,26 @@ uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 }
 
 /* 64/64 -> 64 signed division */
-int64_t HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
+Int128 HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
 {
 /* Catch divide by zero, and non-representable quotient (MIN / -1).  */
 if (b == 0 || (b == -1 && a == (1ll << 63))) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
-env->retxl = a % b;
-return a / b;
+return int128_make128(a / b, a % b);
 }
 
 /* 128 -> 64/64 unsigned division */
-uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
-uint64_t b)
+Int128 HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t b)
 {
-uint64_t ret;
-/* Signal divide by zero.  */
-if (b == 0) {
-tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
-}
-if (ah == 0) {
-/* 64 -> 64/64 case */
-env->retxl = al % b;
-ret = al / b;
-} else {
-/* ??? Move i386 idivq helper to host-utils.  */
-#ifdef CONFIG_INT128
-__uint128_t a = ((__uint128_t)ah << 64) | al;
-__uint128_t q = a / b;
-env->retxl = a % b;
-ret = q;
-if (ret != q) {
-tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
+if (b != 0) {
+uint64_t r = divu128(&al, &ah, b);
+if (ah == 0) {
+return int128_make128(al, r);
 }
-#else
-/* 32-bit hosts would need special wrapper functionality - just abort if
-   we encounter such a case; it's very unlikely anyways. */
-cpu_abort(env_cpu(env), "128 -> 64/64 division not implemented\n");
-#endif
 }
-return ret;
+/* divide by zero or overflow */
+tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
 uint64_t HELPER(cvd)(int32_t reg)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 169f7ee1b2..6953b81de7 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2409,15 +2409,21 @@ static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs64(DisasContext *s, DisasOps *o)
 {
-gen_helper_divs64(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+TCGv_i128 t = tcg_temp_new_i128();
+
+gen_helper_divs64(t, cpu_env, o->in1, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, t);
+tcg_temp_free_i128(t);
 return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu64(DisasContext *s, DisasOps *o)
 {
-gen_helper_divu64(o->out2, cpu_env, o->out, o->out2, o->in2);
-return_low128(o->out);
+TCGv_i128 t = tcg_temp_new_i128();
+
+gen_helper_divu64(t, cpu_env, o->out, o->out2, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, t);
+tcg_temp_free_i128(t);
 return DISAS_NEXT;
 }
 
diff --git a/tests/tcg/s390x/div.c b/tests/tcg/s390x/div.c
index 5807295614..6ad9900e08 100644
--- a/tests/tcg/s390x/div.c
+++ b/tests/tcg/s390x/div.c
@@ -33,8 +33,43 @@ static void test_dlr(void)
 assert(r == 1);
 }
 
+static void test_dsgr(void)
+{
+register int64_t r0 asm("r0") = -1;
+register int64_t r1 asm("r1") = -4241;
+int64_t b = 101, q, r;
+
+asm("dsgr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == -41);
+assert(r == -

[PATCH v5 17/36] tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}

2023-01-25 Thread Richard Henderson

Normally this is automatically handled by the CF_PARALLEL checks
within tcg_gen_atomic_cmpxchg_i{32,64}, but x86 has a special
case of !PREFIX_LOCK where it always wants the non-atomic version.

Split these out so that x86 does not have to roll its own.
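
For readers unfamiliar with the non-atomic expansion, its effect on guest
memory can be modelled in plain C. This is a sketch of the semantics only
(load, select, unconditional store); the function name is hypothetical and
no TCG ops are emitted here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Non-atomic compare-and-exchange on a 32-bit location.
 * Mirrors the ld / movcond / st sequence: the store always happens,
 * writing back the old value when the comparison fails.
 * Returns the value that was in memory before the operation.
 */
uint32_t sketch_nonatomic_cmpxchg32(uint32_t *mem, uint32_t cmpv,
                                    uint32_t newv)
{
    uint32_t oldv, store;

    assert(mem);
    oldv = *mem;                            /* tcg_gen_qemu_ld_i32 */
    store = (oldv == cmpv) ? newv : oldv;   /* tcg_gen_movcond_i32, EQ */
    *mem = store;                           /* tcg_gen_qemu_st_i32 */
    return oldv;
}
```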

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |   4 ++
 tcg/tcg-op.c | 154 +++
 2 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 31bf3d287e..839d91c0c7 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -910,6 +910,10 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
 void tcg_gen_atomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
  TCGArg, MemOp);
 
+void tcg_gen_nonatomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
+   TCGArg, MemOp);
+void tcg_gen_nonatomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
+   TCGArg, MemOp);
 void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
 TCGArg, MemOp);
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 5811ecd3e7..c581ae77c4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3325,82 +3325,122 @@ static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 WITH_ATOMIC128([MO_128 | MO_BE] = gen_helper_atomic_cmpxchgo_be)
 };
 
+void tcg_gen_nonatomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
+   TCGv_i32 newv, TCGArg idx, MemOp memop)
+{
+TCGv_i32 t1 = tcg_temp_new_i32();
+TCGv_i32 t2 = tcg_temp_new_i32();
+
+tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
+
+tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
+tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+tcg_temp_free_i32(t2);
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i32(retv, t1, memop);
+} else {
+tcg_gen_mov_i32(retv, t1);
+}
+tcg_temp_free_i32(t1);
+}
+
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 TCGv_i32 newv, TCGArg idx, MemOp memop)
 {
-memop = tcg_canonicalize_memop(memop, 0, 0);
+gen_atomic_cx_i32 gen;
+MemOpIdx oi;
 
 if (!(tcg_ctx->gen_tb->cflags & CF_PARALLEL)) {
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-
-tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
-
-tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
-tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
-tcg_gen_qemu_st_i32(t2, addr, idx, memop);
-tcg_temp_free_i32(t2);
-
-if (memop & MO_SIGN) {
-tcg_gen_ext_i32(retv, t1, memop);
-} else {
-tcg_gen_mov_i32(retv, t1);
-}
-tcg_temp_free_i32(t1);
-} else {
-gen_atomic_cx_i32 gen;
-MemOpIdx oi;
-
-gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
-tcg_debug_assert(gen != NULL);
-
-oi = make_memop_idx(memop & ~MO_SIGN, idx);
-gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
-
-if (memop & MO_SIGN) {
-tcg_gen_ext_i32(retv, retv, memop);
-}
+tcg_gen_nonatomic_cmpxchg_i32(retv, addr, cmpv, newv, idx, memop);
+return;
 }
+
+memop = tcg_canonicalize_memop(memop, 0, 0);
+gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+tcg_debug_assert(gen != NULL);
+
+oi = make_memop_idx(memop & ~MO_SIGN, idx);
+gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i32(retv, retv, memop);
+}
+}
+
+void tcg_gen_nonatomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
+   TCGv_i64 newv, TCGArg idx, MemOp memop)
+{
+TCGv_i64 t1, t2;
+
+if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
+tcg_gen_nonatomic_cmpxchg_i32(TCGV_LOW(retv), addr, TCGV_LOW(cmpv),
+  TCGV_LOW(newv), idx, memop);
+if (memop & MO_SIGN) {
+tcg_gen_sari_i32(TCGV_HIGH(retv), TCGV_LOW(retv), 31);
+} else {
+tcg_gen_movi_i32(TCGV_HIGH(retv), 0);
+}
+return;
+}
+
+t1 = tcg_temp_new_i64();
+t2 = tcg_temp_new_i64();
+
+tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
+
+tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
+tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+tcg_temp_free_i64(t2);
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i64(retv, t1, memop);
+} else {
+tcg_gen_mov_i64(retv, t1);
+}
+tcg_temp_free_i64(t1);
 }
 
 void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 TCGv_i64 newv, TCGArg idx, Mem

[PATCH v5 14/36] tcg: Add basic data movement for TCGv_i128

2023-01-25 Thread Richard Henderson

Add code generation functions for data movement between
TCGv_i128 (mov) and to/from TCGv_i64 (concat, extract).
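
The value-level meaning of concat and extract can be shown with a small
model. The two-limb struct and the sketch_ names are hypothetical; QEMU's
Int128 and TCGv_i128 are not used, and host endianness (which the patch
handles in TCGV128_LOW/HIGH) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-limb 128-bit value (assumption, not QEMU's Int128). */
struct u128 {
    uint64_t lo;
    uint64_t hi;
};

/* Build a 128-bit value from its low and high 64-bit halves. */
struct u128 sketch_concat_i64_i128(uint64_t lo, uint64_t hi)
{
    struct u128 r = { .lo = lo, .hi = hi };
    return r;
}

/* Split a 128-bit value into its low and high 64-bit halves. */
void sketch_extr_i128_i64(uint64_t *lo, uint64_t *hi, struct u128 arg)
{
    assert(lo && hi);
    *lo = arg.lo;
    *hi = arg.hi;
}
```

Extracting what was just concatenated returns the original halves in the
same (lo, hi) argument order that the TCG functions use.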

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |  4 
 tcg/tcg-internal.h   | 13 +
 tcg/tcg-op.c | 20 
 3 files changed, 37 insertions(+)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 79b1cf786f..c4276767d1 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -712,6 +712,10 @@ void tcg_gen_extrh_i64_i32(TCGv_i32 ret, TCGv_i64 arg);
 void tcg_gen_extr_i64_i32(TCGv_i32 lo, TCGv_i32 hi, TCGv_i64 arg);
 void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg);
 
+void tcg_gen_mov_i128(TCGv_i128 dst, TCGv_i128 src);
+void tcg_gen_extr_i128_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i128 arg);
+void tcg_gen_concat_i64_i128(TCGv_i128 ret, TCGv_i64 lo, TCGv_i64 hi);
+
 static inline void tcg_gen_concat32_i64(TCGv_i64 ret, TCGv_i64 lo, TCGv_i64 hi)
 {
 tcg_gen_deposit_i64(ret, lo, hi, 32, 32);
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 33f1d8b411..e542a4e9b7 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -117,4 +117,17 @@ extern TCGv_i32 TCGV_LOW(TCGv_i64) QEMU_ERROR("32-bit code path is reachable");
 extern TCGv_i32 TCGV_HIGH(TCGv_i64) QEMU_ERROR("32-bit code path is reachable");
 #endif
 
+static inline TCGv_i64 TCGV128_LOW(TCGv_i128 t)
+{
+/* For 32-bit, offset by 2, which may then have TCGV_{LOW,HIGH} applied. */
+int o = HOST_BIG_ENDIAN ? 64 / TCG_TARGET_REG_BITS : 0;
+return temp_tcgv_i64(tcgv_i128_temp(t) + o);
+}
+
+static inline TCGv_i64 TCGV128_HIGH(TCGv_i128 t)
+{
+int o = HOST_BIG_ENDIAN ? 0 : 64 / TCG_TARGET_REG_BITS;
+return temp_tcgv_i64(tcgv_i128_temp(t) + o);
+}
+
 #endif /* TCG_INTERNAL_H */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 326a9180ef..cb83d2375d 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2747,6 +2747,26 @@ void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg)
 tcg_gen_shri_i64(hi, arg, 32);
 }
 
+void tcg_gen_extr_i128_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i128 arg)
+{
+tcg_gen_mov_i64(lo, TCGV128_LOW(arg));
+tcg_gen_mov_i64(hi, TCGV128_HIGH(arg));
+}
+
+void tcg_gen_concat_i64_i128(TCGv_i128 ret, TCGv_i64 lo, TCGv_i64 hi)
+{
+tcg_gen_mov_i64(TCGV128_LOW(ret), lo);
+tcg_gen_mov_i64(TCGV128_HIGH(ret), hi);
+}
+
+void tcg_gen_mov_i128(TCGv_i128 dst, TCGv_i128 src)
+{
+if (dst != src) {
+tcg_gen_mov_i64(TCGV128_LOW(dst), TCGV128_LOW(src));
+tcg_gen_mov_i64(TCGV128_HIGH(dst), TCGV128_HIGH(src));
+}
+}
+
 /* QEMU specific operations.  */
 
 void tcg_gen_exit_tb(const TranslationBlock *tb, unsigned idx)
-- 
2.34.1

[PATCH v5 35/36] target/i386: Inline cmpxchg8b

2023-01-25 Thread Richard Henderson

Use tcg_gen_atomic_cmpxchg_i64 for the atomic case,
and tcg_gen_nonatomic_cmpxchg_i64 otherwise.
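
As a reference for what the inlined code must compute, the following sketch
models the unlocked cmpxchg8b semantics on a 64-bit register file, following
the removed helper_cmpxchg8b_unlocked. The struct and function names are
hypothetical, and the Z flag is returned as a bool instead of being folded
into eflags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file; only the four registers involved (assumption). */
struct sketch_regs {
    uint64_t eax, ebx, ecx, edx;
};

/*
 * Compare EDX:EAX with *mem.  On a match store ECX:EBX and report Z=1,
 * leaving EAX and EDX untouched.  On a mismatch write the old value back,
 * load it zero-extended into EDX:EAX and report Z=0.
 */
bool sketch_cmpxchg8b(struct sketch_regs *r, uint64_t *mem)
{
    uint64_t cmp, val, old;
    bool z;

    assert(r && mem);
    cmp = (uint64_t)(uint32_t)r->eax | ((uint64_t)(uint32_t)r->edx << 32);
    val = (uint64_t)(uint32_t)r->ebx | ((uint64_t)(uint32_t)r->ecx << 32);
    old = *mem;
    z = (old == cmp);
    *mem = z ? val : old;               /* always do the store */
    if (!z) {
        r->eax = (uint32_t)old;         /* zero-extended low half */
        r->edx = (uint32_t)(old >> 32); /* zero-extended high half */
    }
    return z;
}
```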

Signed-off-by: Richard Henderson 
---
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
---
 target/i386/helper.h |  2 --
 target/i386/tcg/mem_helper.c | 57 
 target/i386/tcg/translate.c  | 54 ++
 3 files changed, 49 insertions(+), 64 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index b7de5429ef..2df8049f91 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -66,8 +66,6 @@ DEF_HELPER_1(rsm, void, env)
 #endif /* !CONFIG_USER_ONLY */
 
 DEF_HELPER_2(into, void, env, int)
-DEF_HELPER_2(cmpxchg8b_unlocked, void, env, tl)
-DEF_HELPER_2(cmpxchg8b, void, env, tl)
 #ifdef TARGET_X86_64
 DEF_HELPER_2(cmpxchg16b_unlocked, void, env, tl)
 DEF_HELPER_2(cmpxchg16b, void, env, tl)
diff --git a/target/i386/tcg/mem_helper.c b/target/i386/tcg/mem_helper.c
index e3cdafd2d4..814786bb87 100644
--- a/target/i386/tcg/mem_helper.c
+++ b/target/i386/tcg/mem_helper.c
@@ -27,63 +27,6 @@
 #include "tcg/tcg.h"
 #include "helper-tcg.h"
 
-void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-uint64_t oldv, cmpv, newv;
-int eflags;
-
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
-newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
-
-oldv = cpu_ldq_data_ra(env, a0, ra);
-newv = (cmpv == oldv ? newv : oldv);
-/* always do the store */
-cpu_stq_data_ra(env, a0, newv, ra);
-
-if (oldv == cmpv) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = (uint32_t)oldv;
-env->regs[R_EDX] = (uint32_t)(oldv >> 32);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-}
-
-void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
-{
-#ifdef CONFIG_ATOMIC64
-uint64_t oldv, cmpv, newv;
-int eflags;
-
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
-newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
-
-{
-uintptr_t ra = GETPC();
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi = make_memop_idx(MO_TEUQ, mem_idx);
-oldv = cpu_atomic_cmpxchgq_le_mmu(env, a0, cmpv, newv, oi, ra);
-}
-
-if (oldv == cmpv) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = (uint32_t)oldv;
-env->regs[R_EDX] = (uint32_t)(oldv >> 32);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-#else
-cpu_loop_exit_atomic(env_cpu(env), GETPC());
-#endif /* CONFIG_ATOMIC64 */
-}
-
 #ifdef TARGET_X86_64
 void helper_cmpxchg16b_unlocked(CPUX86State *env, target_ulong a0)
 {
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a82131d635..b542b084a6 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2995,15 +2995,59 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
 
 static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
 {
+TCGv_i64 cmp, val, old;
+TCGv Z;
+
 gen_lea_modrm(env, s, modrm);
 
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg8b(cpu_env, s->A0);
+cmp = tcg_temp_new_i64();
+val = tcg_temp_new_i64();
+old = tcg_temp_new_i64();
+
+/* Construct the comparison values from the register pair. */
+tcg_gen_concat_tl_i64(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+tcg_gen_concat_tl_i64(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+/* Only require atomic with LOCK; non-parallel handled in generator. */
+if (s->prefix & PREFIX_LOCK) {
+tcg_gen_atomic_cmpxchg_i64(old, s->A0, cmp, val, s->mem_index, MO_TEUQ);
 } else {
-gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
+tcg_gen_nonatomic_cmpxchg_i64(old, s->A0, cmp, val,
+  s->mem_index, MO_TEUQ);
 }
-set_cc_op(s, CC_OP_EFLAGS);
+tcg_temp_free_i64(val);
+
+/* Set tmp0 to match the required value of Z. */
+tcg_gen_setcond_i64(TCG_COND_EQ, cmp, old, cmp);
+Z = tcg_temp_new();
+tcg_gen_trunc_i64_tl(Z, cmp);
+tcg_temp_free_i64(cmp);
+
+/*
+ * Extract the result values for the register pair.
+ * For 32-bit, we may do this unconditionally, because on success (Z=1),
+ * the old value matches the previous value in EDX:EAX.  For x86_64,
+ * the store must be conditional, because we must leave the source
+ * registers unchanged on success, and zero-extend the writeback
+ * on failure (Z=0).
+ */
+if (TARGET_LONG_BITS == 32) {
+tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], old);
+} else {
+TCGv zero = tcg_constant_tl(0);
+
+tcg_gen_extr_i64_tl(s->T0, s->T1, old);
+tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EAX], Z, zero,
+

[PATCH v5 16/36] tcg: Add tcg_gen_{non}atomic_cmpxchg_i128

2023-01-25 Thread Richard Henderson

This will allow targets to avoid rolling their own.

Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h   | 11 +
 include/tcg/tcg-op.h  |  5 +++
 tcg/tcg-op.c  | 85 +++
 accel/tcg/atomic_common.c.inc | 45 +++
 4 files changed, 146 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 37cbd722bf..e141a6ab24 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -55,6 +55,17 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
i64, env, tl, i64, i64, i32)
 #endif
+#ifdef CONFIG_CMPXCHG128
+DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+#endif
+
+DEF_HELPER_FLAGS_5(nonatomic_cmpxchgo_be, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+DEF_HELPER_FLAGS_5(nonatomic_cmpxchgo_le, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
 
 #ifdef CONFIG_ATOMIC64
 #define GEN_ATOMIC_HELPERS(NAME)  \
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index e5f5b63c37..31bf3d287e 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -907,6 +907,11 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
 TCGArg, MemOp);
 void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
 TCGArg, MemOp);
+void tcg_gen_atomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
+ TCGArg, MemOp);
+
+void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
+TCGArg, MemOp);
 
 void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, MemOp);
 void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, MemOp);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 33ef325f6e..5811ecd3e7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3295,6 +3295,8 @@ typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
   TCGv_i32, TCGv_i32, TCGv_i32);
 typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
   TCGv_i64, TCGv_i64, TCGv_i32);
+typedef void (*gen_atomic_cx_i128)(TCGv_i128, TCGv_env, TCGv,
+   TCGv_i128, TCGv_i128, TCGv_i32);
 typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv,
   TCGv_i32, TCGv_i32);
 typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
@@ -3305,6 +3307,11 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
 #else
 # define WITH_ATOMIC64(X)
 #endif
+#ifdef CONFIG_CMPXCHG128
+# define WITH_ATOMIC128(X) X,
+#else
+# define WITH_ATOMIC128(X)
+#endif
 
 static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_8] = gen_helper_atomic_cmpxchgb,
@@ -3314,6 +3321,8 @@ static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
 WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le)
 WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be)
+WITH_ATOMIC128([MO_128 | MO_LE] = gen_helper_atomic_cmpxchgo_le)
+WITH_ATOMIC128([MO_128 | MO_BE] = gen_helper_atomic_cmpxchgo_be)
 };
 
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
@@ -3412,6 +3421,82 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 }
 }
 
+void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128 retv, TCGv addr, TCGv_i128 cmpv,
+TCGv_i128 newv, TCGArg idx, MemOp memop)
+{
+if (TCG_TARGET_REG_BITS == 32) {
+/* Inline expansion below is simply too large for 32-bit hosts. */
+gen_atomic_cx_i128 gen = ((memop & MO_BSWAP) == MO_LE
+  ? gen_helper_nonatomic_cmpxchgo_le 
+  : gen_helper_nonatomic_cmpxchgo_be);
+MemOpIdx oi = make_memop_idx(memop, idx);
+
+tcg_debug_assert((memop & MO_SIZE) == MO_128);
+tcg_debug_assert((memop & MO_SIGN) == 0);
+
+gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
+} else {
+TCGv_i128 oldv = tcg_temp_new_i128();
+TCGv_i128 tmpv = tcg_temp_new_i128();
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 z = tcg_constant_i64(0);
+
+tcg_gen_qemu_ld_i128(oldv, addr, idx, memop);
+
+/* Compare i128 */
+tcg_gen_xor_i64(t0, TCGV128_LOW(oldv), TCGV128_LOW(cmpv));
+tcg_gen_xor_i64(t1, TCGV128_HIGH(oldv), TCGV128_HIGH(cmpv));
+tcg_gen_or_i64(t0, t0, t1);
+
+/* tmpv = equal ? newv : oldv */
+tcg_gen_movcond_i64(TCG_COND_EQ, TCG

[PATCH v5 01/36] tcg: Define TCG_TYPE_I128 and related helper macros

2023-01-25 Thread Richard Henderson

Begin staging in support for TCGv_i128 with Int128.
Define the type enumerator, the typedef, and the
helper-head.h macros.

This cannot yet be used, because you can't allocate
temporaries of this new type.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/exec/helper-head.h |  7 +++
 include/tcg/tcg.h  | 17 ++---
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/exec/helper-head.h b/include/exec/helper-head.h
index bc6698b19f..b8d1140dc7 100644
--- a/include/exec/helper-head.h
+++ b/include/exec/helper-head.h
@@ -26,6 +26,7 @@
 #define dh_alias_int i32
 #define dh_alias_i64 i64
 #define dh_alias_s64 i64
+#define dh_alias_i128 i128
 #define dh_alias_f16 i32
 #define dh_alias_f32 i32
 #define dh_alias_f64 i64
@@ -40,6 +41,7 @@
 #define dh_ctype_int int
 #define dh_ctype_i64 uint64_t
 #define dh_ctype_s64 int64_t
+#define dh_ctype_i128 Int128
 #define dh_ctype_f16 uint32_t
 #define dh_ctype_f32 float32
 #define dh_ctype_f64 float64
@@ -71,6 +73,7 @@
 #define dh_retvar_decl0_noreturn void
 #define dh_retvar_decl0_i32 TCGv_i32 retval
 #define dh_retvar_decl0_i64 TCGv_i64 retval
+#define dh_retvar_decl0_i128 TCGv_i128 retval
 #define dh_retvar_decl0_ptr TCGv_ptr retval
 #define dh_retvar_decl0(t) glue(dh_retvar_decl0_, dh_alias(t))
 
@@ -78,6 +81,7 @@
 #define dh_retvar_decl_noreturn
 #define dh_retvar_decl_i32 TCGv_i32 retval,
 #define dh_retvar_decl_i64 TCGv_i64 retval,
+#define dh_retvar_decl_i128 TCGv_i128 retval,
 #define dh_retvar_decl_ptr TCGv_ptr retval,
 #define dh_retvar_decl(t) glue(dh_retvar_decl_, dh_alias(t))
 
@@ -85,6 +89,7 @@
 #define dh_retvar_noreturn NULL
 #define dh_retvar_i32 tcgv_i32_temp(retval)
 #define dh_retvar_i64 tcgv_i64_temp(retval)
+#define dh_retvar_i128 tcgv_i128_temp(retval)
 #define dh_retvar_ptr tcgv_ptr_temp(retval)
 #define dh_retvar(t) glue(dh_retvar_, dh_alias(t))
 
@@ -95,6 +100,7 @@
 #define dh_typecode_i64 4
 #define dh_typecode_s64 5
 #define dh_typecode_ptr 6
+#define dh_typecode_i128 7
 #define dh_typecode_int dh_typecode_s32
 #define dh_typecode_f16 dh_typecode_i32
 #define dh_typecode_f32 dh_typecode_i32
@@ -104,6 +110,7 @@
 
 #define dh_callflag_i32  0
 #define dh_callflag_i64  0
+#define dh_callflag_i128 0
 #define dh_callflag_ptr  0
 #define dh_callflag_void 0
 #define dh_callflag_noreturn TCG_CALL_NO_RETURN
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 9a0ae7d20b..8b7e61e7a5 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -270,6 +270,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
 TCG_TYPE_I32,
 TCG_TYPE_I64,
+TCG_TYPE_I128,
 
 TCG_TYPE_V64,
 TCG_TYPE_V128,
@@ -351,13 +352,14 @@ typedef tcg_target_ulong TCGArg;
in tcg/README. Target CPU front-end code uses these types to deal
with TCG variables as it emits TCG code via the tcg_gen_* functions.
They come in several flavours:
-* TCGv_i32 : 32 bit integer type
-* TCGv_i64 : 64 bit integer type
-* TCGv_ptr : a host pointer type
-* TCGv_vec : a host vector type; the exact size is not exposed
- to the CPU front-end code.
-* TCGv : an integer type the same size as target_ulong
- (an alias for either TCGv_i32 or TCGv_i64)
+* TCGv_i32  : 32 bit integer type
+* TCGv_i64  : 64 bit integer type
+* TCGv_i128 : 128 bit integer type
+* TCGv_ptr  : a host pointer type
+* TCGv_vec  : a host vector type; the exact size is not exposed
+  to the CPU front-end code.
+* TCGv  : an integer type the same size as target_ulong
+  (an alias for either TCGv_i32 or TCGv_i64)
The compiler's type checking will complain if you mix them
up and pass the wrong sized TCGv to a function.
 
@@ -377,6 +379,7 @@ typedef tcg_target_ulong TCGArg;
 
 typedef struct TCGv_i32_d *TCGv_i32;
 typedef struct TCGv_i64_d *TCGv_i64;
+typedef struct TCGv_i128_d *TCGv_i128;
 typedef struct TCGv_ptr_d *TCGv_ptr;
 typedef struct TCGv_vec_d *TCGv_vec;
 typedef TCGv_ptr TCGv_env;
-- 
2.34.1

[PATCH v5 19/36] target/arm: Use tcg_gen_atomic_cmpxchg_i128 for CASP

2023-01-25 Thread Richard Henderson

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-Id: <20221112042555.2622152-3-richard.hender...@linaro.org>
---
 target/arm/helper-a64.h|  2 --
 target/arm/helper-a64.c| 43 ---
 target/arm/translate-a64.c | 61 +++---
 3 files changed, 18 insertions(+), 88 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 94065d1917..ff56807247 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -50,8 +50,6 @@ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
-DEF_HELPER_5(casp_le_parallel, void, env, i32, i64, i64, i64)
-DEF_HELPER_5(casp_be_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
 DEF_HELPER_FLAGS_3(advsimd_minh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
 DEF_HELPER_FLAGS_3(advsimd_maxnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 7dbdb2c233..0972a4bdd0 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -505,49 +505,6 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 return crc32c(acc, buf, bytes) ^ 0xffffffff;
 }
 
-/* Writes back the old data into Rs.  */
-void HELPER(casp_le_parallel)(CPUARMState *env, uint32_t rs, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->xregs[rs], env->xregs[rs + 1]);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
-
-env->xregs[rs] = int128_getlo(oldv);
-env->xregs[rs + 1] = int128_gethi(oldv);
-}
-
-void HELPER(casp_be_parallel)(CPUARMState *env, uint32_t rs, uint64_t addr,
-  uint64_t new_hi, uint64_t new_lo)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->xregs[rs + 1], env->xregs[rs]);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
-
-env->xregs[rs + 1] = int128_getlo(oldv);
-env->xregs[rs] = int128_gethi(oldv);
-}
-
 /*
  * AdvSIMD half-precision
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index bd97666ddc..6678894ec7 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2694,53 +2694,28 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
 tcg_gen_extr32_i64(s2, s1, cmp);
 }
 tcg_temp_free_i64(cmp);
-} else if (tb_cflags(s->base.tb) & CF_PARALLEL) {
-if (HAVE_CMPXCHG128) {
-TCGv_i32 tcg_rs = tcg_constant_i32(rs);
-if (s->be_data == MO_LE) {
-gen_helper_casp_le_parallel(cpu_env, tcg_rs,
-clean_addr, t1, t2);
-} else {
-gen_helper_casp_be_parallel(cpu_env, tcg_rs,
-clean_addr, t1, t2);
-}
-} else {
-gen_helper_exit_atomic(cpu_env);
-s->base.is_jmp = DISAS_NORETURN;
-}
 } else {
-TCGv_i64 d1 = tcg_temp_new_i64();
-TCGv_i64 d2 = tcg_temp_new_i64();
-TCGv_i64 a2 = tcg_temp_new_i64();
-TCGv_i64 c1 = tcg_temp_new_i64();
-TCGv_i64 c2 = tcg_temp_new_i64();
-TCGv_i64 zero = tcg_constant_i64(0);
+TCGv_i128 cmp = tcg_temp_new_i128();
+TCGv_i128 val = tcg_temp_new_i128();
 
-/* Load the two words, in memory order.  */
-tcg_gen_qemu_ld_i64(d1, clean_addr, memidx,
-MO_64 | MO_ALIGN_16 | s->be_data);
-tcg_gen_addi_i64(a2, clean_addr, 8);
-tcg_gen_qemu_ld_i64(d2, a2, memidx, MO_64 | s->be_data);
+if (s->be_data == MO_LE) {
+tcg_gen_concat_i64_i128(val, t1, t2);
+tcg_gen_concat_i64_i128(cmp, s1, s2);
+} else {
+tcg_gen_concat_i64_i128(val, t2, t1);
+tcg_gen_concat_i64_i128(cmp, s2, s1);
+}
 
-/* Compare the two words, also in memory order.  */
-tcg_gen_setcond_i64(TCG_COND_EQ, c1, d1, s1);
-tcg_gen_setcond_i64(TCG_COND_EQ, c2, d2, s2);
-tcg_gen_and_i64(c2, c2, c1);
+tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx,
+

[PATCH v5 12/36] tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-01-25 Thread Richard Henderson

Fill in the parameters for the host ABI for Int128 for
those backends which require no extra modification.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h | 2 ++
 tcg/arm/tcg-target.h | 2 ++
 tcg/loongarch64/tcg-target.h | 2 ++
 tcg/mips/tcg-target.h| 2 ++
 tcg/riscv/tcg-target.h   | 3 +++
 tcg/s390x/tcg-target.h   | 2 ++
 tcg/sparc64/tcg-target.h | 2 ++
 tcg/tcg.c| 6 +++---
 tcg/ppc/tcg-target.c.inc | 3 +++
 9 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8d244292aa..c0b0f614ba 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -54,6 +54,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div_i32  1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 91b8954804..def2a189e6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -91,6 +91,8 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_CALL_STACK_OFFSET   0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_REF
 
 /* optional instructions */
 #define TCG_TARGET_HAS_ext8s_i321
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 8b151e7f6f..17b8193aa5 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -92,6 +92,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_movcond_i32  1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 7bc8e15293..68b11e4d48 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -89,6 +89,8 @@ typedef enum {
 # define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_NORMAL
 #endif
 #define TCG_TARGET_CALL_ARG_I32   TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128  TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128  TCG_CALL_RET_NORMAL
 
 /* MOVN/MOVZ instructions detection */
 #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 1)) || \
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1337bc1f1e..0deb33701f 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -85,9 +85,12 @@ typedef enum {
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #if TCG_TARGET_REG_BITS == 32
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
 #else
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
 #endif
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_movcond_i32  0
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e597e47e60..a05b473117 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -169,6 +169,8 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_CALL_STACK_OFFSET   160
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_EXTEND
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_BY_REF
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_REF
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP   1
 
diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 1d6a5c8b07..ffe22b1d21 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -73,6 +73,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET(128 + 6*8 + TCG_TARGET_STACK_BIAS)
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_EXTEND
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 #if defined(__VIS__) && __VIS__ >= 0x300
 #define use_vis3_instructions  1
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4c43fd28ba..63e0753ded 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -765,8 +765,8 @@ static void init_call_layout(TCGHelperInfo *info)
 break;
 case dh_typecode_i128:
 info->nr_out = 128 / TCG_TARGET_REG_BITS;
-info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
-switch (/* TODO */ TCG_CALL_RET_NORMAL) {
+info->out_kind

[PATCH v5 10/36] tcg/tci: Fix big-endian return register ordering

2023-01-25 Thread Richard Henderson

We expect the backend to require register pairs in
host-endian ordering, thus for big-endian the first
register of a pair contains the high part.
We were forcing R0 to contain the low part for calls.
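
To illustrate the ordering rule described above, here is a minimal standalone
sketch. It is not the TCI code: RegPair, host_is_big_endian and
store_pair_host_order are hypothetical names, and the two-element array only
stands in for the R0/R1 pair on a 32-bit host. Copying the 64-bit result with
memcpy leaves the halves in host memory order, so the first register holds the
high part only on a big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a pair of 32-bit interpreter registers. */
typedef struct {
    uint32_t r[2];
} RegPair;

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Copy a 64-bit value into the pair without reordering its halves. */
static RegPair store_pair_host_order(uint64_t value)
{
    RegPair p;

    assert(sizeof(p.r) == sizeof(value));
    memcpy(p.r, &value, sizeof(value));
    return p;
}
```

With the value 0x1122334455667788, r[0] receives 0x55667788 on a little-endian
host and 0x11223344 on a big-endian host.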

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 05a24163d3..eeccdde8bc 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -520,27 +520,28 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 ffi_call(pptr[1], pptr[0], stack, call_slots);
 }
 
-/* Any result winds up "left-aligned" in the stack[0] slot. */
 switch (len) {
 case 0: /* void */
 break;
 case 1: /* uint32_t */
 /*
+ * The result winds up "left-aligned" in the stack[0] slot.
  * Note that libffi has an odd special case in that it will
  * always widen an integral result to ffi_arg.
  */
-if (sizeof(ffi_arg) == 4) {
-regs[TCG_REG_R0] = *(uint32_t *)stack;
-break;
-}
-/* fall through */
-case 2: /* uint64_t */
-if (TCG_TARGET_REG_BITS == 32) {
-tci_write_reg64(regs, TCG_REG_R1, TCG_REG_R0, stack[0]);
+if (sizeof(ffi_arg) == 8) {
+regs[TCG_REG_R0] = (uint32_t)stack[0];
 } else {
-regs[TCG_REG_R0] = stack[0];
+regs[TCG_REG_R0] = *(uint32_t *)stack;
 }
 break;
+case 2: /* uint64_t */
+/*
+ * For TCG_TARGET_REG_BITS == 32, the register pair
+ * must stay in host memory order.
+ */
+memcpy(&regs[TCG_REG_R0], stack, 8);
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.34.1

[PATCH v5 30/36] target/s390x: Use Int128 for returning float128

2023-01-25 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
v2: Remove extraneous return_low128.
Cc: David Hildenbrand 
Cc: Ilya Leoshkevich 
---
 target/s390x/helper.h| 22 +++---
 target/s390x/tcg/insn-data.h.inc | 20 ++---
 target/s390x/tcg/fpu_helper.c| 29 +-
 target/s390x/tcg/translate.c | 51 +---
 4 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b4170a4256..d40aeb471f 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -31,32 +31,32 @@ DEF_HELPER_4(clcle, i32, env, i32, i64, i32)
 DEF_HELPER_4(clclu, i32, env, i32, i64, i32)
 DEF_HELPER_3(cegb, i64, env, s64, i32)
 DEF_HELPER_3(cdgb, i64, env, s64, i32)
-DEF_HELPER_3(cxgb, i64, env, s64, i32)
+DEF_HELPER_3(cxgb, i128, env, s64, i32)
 DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
-DEF_HELPER_3(cxlgb, i64, env, i64, i32)
+DEF_HELPER_3(cxlgb, i128, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
 DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
-DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
+DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
@@ -79,7 +79,7 @@ DEF_HELPER_3(clfdb, i64, env, i64, i32)
 DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
@@ -89,7 +89,7 @@ DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index d0814cb218..517a4500ae 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -306,10 +306,10 @@
 /* CONVERT FROM FIXED */
 F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
 F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
-F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_P, x1, cxgb, 0, IF_BFP)
+F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_x, x1, cxgb, 0, IF_BFP)
 F(0xb3a4, CEGBR,   RRF_e, Z,   0, r2_o, new, e1, cegb, 0, IF_BFP)
 F(0xb3a5, CDGBR,   RRF_e, Z,   0, r2_o, new, f1, cdgb, 0, IF_BFP)
-F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_P, x1, cxgb, 0, IF_BFP)
+F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_x, x

[PATCH v5 03/36] tcg: Allocate objects contiguously in temp_allocate_frame

2023-01-25 Thread Richard Henderson

When allocating a temp to the stack frame, consider the
base type and allocate all parts at once.
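
As a rough illustration of "allocate all parts at once", the sketch below
assigns consecutive offsets to every part of a subdivided object. Part and
assign_parts are hypothetical, simplified stand-ins (the real TCGTemp carries
much more state); only the idea of stepping back to part 0 and walking forward
is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one part of a subdivided temp. */
typedef struct {
    int subindex;          /* position of this part within the object */
    ptrdiff_t mem_offset;  /* frame offset assigned to this part */
    int mem_allocated;     /* nonzero once a slot has been assigned */
} Part;

/*
 * Give every part of the object containing @p a slot in one
 * contiguous region that starts at @off.
 */
static void assign_parts(Part *p, int part_count, int part_size,
                         ptrdiff_t off)
{
    Part *first = p - p->subindex;   /* step back to part 0 */

    assert(part_count > 0 && part_size > 0);
    for (int i = 0; i < part_count; ++i) {
        first[i].mem_offset = off + (ptrdiff_t)i * part_size;
        first[i].mem_allocated = 1;
    }
}
```

Whichever part triggers the allocation, all siblings end up adjacent in the
frame, which is what a later 16-byte access to the whole object relies on.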

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ffddda96ed..ff30f5e141 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3264,11 +3264,12 @@ static bool liveness_pass_2(TCGContext *s)
 
 static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 {
-int size = tcg_type_size(ts->type);
-int align;
 intptr_t off;
+int size, align;
 
-switch (ts->type) {
+/* When allocating an object, look at the full type. */
+size = tcg_type_size(ts->base_type);
+switch (ts->base_type) {
 case TCG_TYPE_I32:
 align = 4;
 break;
@@ -3299,13 +3300,26 @@ static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 tcg_raise_tb_overflow(s);
 }
 s->current_frame_offset = off + size;
-
-ts->mem_offset = off;
 #if defined(__sparc__)
-ts->mem_offset += TCG_TARGET_STACK_BIAS;
+off += TCG_TARGET_STACK_BIAS;
 #endif
-ts->mem_base = s->frame_temp;
-ts->mem_allocated = 1;
+
+/* If the object was subdivided, assign memory to all the parts. */
+if (ts->base_type != ts->type) {
+int part_size = tcg_type_size(ts->type);
+int part_count = size / part_size;
+
+ts -= ts->temp_subindex;
+for (int i = 0; i < part_count; ++i) {
+ts[i].mem_offset = off + i * part_size;
+ts[i].mem_base = s->frame_temp;
+ts[i].mem_allocated = 1;
+}
+} else {
+ts->mem_offset = off;
+ts->mem_base = s->frame_temp;
+ts->mem_allocated = 1;
+}
 }
 
 /* Assign @reg to @ts, and update reg_to_temp[]. */
-- 
2.34.1

[PATCH v5 23/36] tests/tcg/s390x: Add long-double.c

2023-01-25 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/long-double.c   | 24 
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 25 insertions(+)
 create mode 100644 tests/tcg/s390x/long-double.c

diff --git a/tests/tcg/s390x/long-double.c b/tests/tcg/s390x/long-double.c
new file mode 100644
index 0000000000..757a6262fd
--- /dev/null
+++ b/tests/tcg/s390x/long-double.c
@@ -0,0 +1,24 @@
+/*
+ * Perform some basic arithmetic with long double, as a sanity check.
+ * With small integral numbers, we can cross-check with integers.
+ */
+
+#include <assert.h>
+
+int main()
+{
+int i, j;
+
+for (i = 1; i < 5; i++) {
+for (j = 1; j < 5; j++) {
+long double la = (long double)i + j;
+long double lm = (long double)i * j;
+long double ls = (long double)i - j;
+
+assert(la == i + j);
+assert(lm == i * j);
+assert(ls == i - j);
+}
+}
+return 0;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 79250f31dd..1d454270c0 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -26,6 +26,7 @@ TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=div
 TESTS+=clst
+TESTS+=long-double
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1

[PATCH v5 33/36] target/s390x: Implement CC_OP_NZ in gen_op_calc_cc

2023-01-25 Thread Richard Henderson

This case is trivial to implement inline.
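
For reference, a plain C sketch of the computation, assuming (as the
TCG_COND_NE setcond in the diff below suggests) that CC_OP_NZ yields 0 for a
zero result and 1 otherwise. calc_cc_nz is a hypothetical name, not an existing
helper.

```c
#include <assert.h>
#include <stdint.h>

/* Condition code for "result is nonzero": 0 if dst == 0, else 1. */
static uint32_t calc_cc_nz(uint64_t dst)
{
    return dst != 0;
}
```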

Signed-off-by: Richard Henderson 
---
Cc: David Hildenbrand 
Cc: Ilya Leoshkevich 
---
 target/s390x/tcg/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 0dafa27dab..b8cb21c395 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -625,6 +625,9 @@ static void gen_op_calc_cc(DisasContext *s)
 /* env->cc_op already is the cc value */
 break;
 case CC_OP_NZ:
+tcg_gen_setcondi_i64(TCG_COND_NE, cc_dst, cc_dst, 0);
+tcg_gen_extrl_i64_i32(cc_op, cc_dst);
+break;
 case CC_OP_ABS_64:
 case CC_OP_NABS_64:
 case CC_OP_ABS_32:
-- 
2.34.1

[PATCH v5 15/36] tcg: Add guest load/store primitives for TCGv_i128

2023-01-25 Thread Richard Henderson

These are not yet considering atomicity of the 16-byte value;
this is a direct replacement for the current target code which
uses a pair of 8-byte operations.
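
The "pair of 8-byte operations" can be sketched in standalone C as follows.
U128 and the load helpers are hypothetical names working on a plain byte
buffer; they only show which half comes first in memory for each endianness
and make no attempt at atomicity, matching the caveat above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128;

/* Read 8 bytes, least significant byte first. */
static uint64_t load8_le(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Read 8 bytes, most significant byte first. */
static uint64_t load8_be(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Little-endian: the low half is stored first. */
static U128 load16_le(const uint8_t *p)
{
    U128 r;

    assert(p != NULL);
    r.lo = load8_le(p);
    r.hi = load8_le(p + 8);
    return r;
}

/* Big-endian: the high half is stored first. */
static U128 load16_be(const uint8_t *p)
{
    U128 r;

    assert(p != NULL);
    r.hi = load8_be(p);
    r.lo = load8_be(p + 8);
    return r;
}
```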

Signed-off-by: Richard Henderson 
---
 include/exec/cpu_ldst.h |  10 +++
 include/tcg/tcg-op.h|   2 +
 accel/tcg/cputlb.c  | 112 +
 accel/tcg/user-exec.c   |  66 
 tcg/tcg-op.c| 134 
 5 files changed, 324 insertions(+)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index d0c7c0d5fe..09b55cc0ee 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -220,6 +220,11 @@ uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr ptr,
 uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr ptr,
 MemOpIdx oi, uintptr_t ra);
 
+Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra);
+Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra);
+
 void cpu_stb_mmu(CPUArchState *env, abi_ptr ptr, uint8_t val,
  MemOpIdx oi, uintptr_t ra);
 void cpu_stw_be_mmu(CPUArchState *env, abi_ptr ptr, uint16_t val,
@@ -235,6 +240,11 @@ void cpu_stl_le_mmu(CPUArchState *env, abi_ptr ptr, uint32_t val,
 void cpu_stq_le_mmu(CPUArchState *env, abi_ptr ptr, uint64_t val,
 MemOpIdx oi, uintptr_t ra);
 
+void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx oi, uintptr_t ra);
+void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx oi, uintptr_t ra);
+
 uint32_t cpu_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
  uint32_t cmpv, uint32_t newv,
  MemOpIdx oi, uintptr_t retaddr);
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index c4276767d1..e5f5b63c37 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -845,6 +845,8 @@ void tcg_gen_qemu_ld_i32(TCGv_i32, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_st_i32(TCGv_i32, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_ld_i64(TCGv_i64, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_st_i64(TCGv_i64, TCGv, TCGArg, MemOp);
+void tcg_gen_qemu_ld_i128(TCGv_i128, TCGv, TCGArg, MemOp);
+void tcg_gen_qemu_st_i128(TCGv_i128, TCGv, TCGArg, MemOp);
 
 static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
 {
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4e040a1cb9..e3604ad313 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2187,6 +2187,64 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
 return cpu_load_helper(env, addr, oi, ra, helper_le_ldq_mmu);
 }
 
+Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra)
+{
+MemOp mop = get_memop(oi);
+int mmu_idx = get_mmuidx(oi);
+MemOpIdx new_oi;
+unsigned a_bits;
+uint64_t h, l;
+
+tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_BE|MO_128));
+a_bits = get_alignment_bits(mop);
+
+/* Handle CPU specific unaligned behaviour */
+if (addr & ((1 << a_bits) - 1)) {
+cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
+ mmu_idx, ra);
+}
+
+/* Construct an unaligned 64-bit replacement MemOpIdx. */
+mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
+new_oi = make_memop_idx(mop, mmu_idx);
+
+h = helper_be_ldq_mmu(env, addr, new_oi, ra);
+l = helper_be_ldq_mmu(env, addr + 8, new_oi, ra);
+
+qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+return int128_make128(l, h);
+}
+
+Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra)
+{
+MemOp mop = get_memop(oi);
+int mmu_idx = get_mmuidx(oi);
+MemOpIdx new_oi;
+unsigned a_bits;
+uint64_t h, l;
+
+tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_LE|MO_128));
+a_bits = get_alignment_bits(mop);
+
+/* Handle CPU specific unaligned behaviour */
+if (addr & ((1 << a_bits) - 1)) {
+cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
+ mmu_idx, ra);
+}
+
+/* Construct an unaligned 64-bit replacement MemOpIdx. */
+mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
+new_oi = make_memop_idx(mop, mmu_idx);
+
+l = helper_le_ldq_mmu(env, addr, new_oi, ra);
+h = helper_le_ldq_mmu(env, addr + 8, new_oi, ra);
+
+qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+return int128_make128(l, h);
+}
+
 /*
  * Store Helpers
  */
@@ -2541,6 +2599,60 @@ void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 cpu_store_helper(env, addr, val, oi, retaddr, helper_le_stq_mmu);
 }
 
+void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx oi, uintptr_t ra)
+{
+M

[PATCH v5 11/36] tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-01-25 Thread Richard Henderson

Fill in the parameters for libffi for Int128.
Adjust the interpreter to allow for 16-byte return values.
Adjust tcg_out_call to record the return value length.

Call parameters are no longer all the same size, so we
cannot reuse the same call_slots array for every function.
Compute it each time now, but only fill in slots required
for the call we're about to make.
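
A minimal sketch of the per-call slot computation, under the assumption that
each argument occupies a whole number of 8-byte stack slots.
compute_call_slots is a hypothetical name and works on plain byte sizes rather
than libffi types.

```c
#include <assert.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * For each argument size in bytes, record the index of the first
 * 8-byte stack slot it occupies.  Returns the number of slots used.
 */
static size_t compute_call_slots(const size_t *arg_sizes, size_t nargs,
                                 size_t *first_slot)
{
    size_t s = 0;

    for (size_t i = 0; i < nargs; ++i) {
        assert(arg_sizes[i] > 0);
        first_slot[i] = s;
        s += DIV_ROUND_UP(arg_sizes[i], 8);
    }
    return s;
}
```

A 16-byte argument takes two slots, so the arguments after it start one slot
later than they would if every argument were 8 bytes, which is why a single
fixed table no longer fits every call.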

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  3 +++
 tcg/tcg.c| 19 +
 tcg/tci.c| 44 
 tcg/tci/tcg-target.c.inc | 10 -
 4 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 1414ab4d5b..7140a76a73 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -160,10 +160,13 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 32
 # define TCG_TARGET_CALL_ARG_I32TCG_CALL_ARG_EVEN
 # define TCG_TARGET_CALL_ARG_I64TCG_CALL_ARG_EVEN
+# define TCG_TARGET_CALL_ARG_I128   TCG_CALL_ARG_EVEN
 #else
 # define TCG_TARGET_CALL_ARG_I32TCG_CALL_ARG_NORMAL
 # define TCG_TARGET_CALL_ARG_I64TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_ARG_I128   TCG_CALL_ARG_NORMAL
 #endif
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 #define HAVE_TCG_QEMU_TB_EXEC
 #define TCG_TARGET_NEED_POOL_LABELS
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 084e3c3a54..4c43fd28ba 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -570,6 +570,22 @@ static GHashTable *helper_table;
 #ifdef CONFIG_TCG_INTERPRETER
 static ffi_type *typecode_to_ffi(int argmask)
 {
+/*
+ * libffi does not support __int128_t, so we have forced Int128
+ * to use the structure definition instead of the builtin type.
+ */
+static ffi_type *ffi_type_i128_elements[3] = {
+&ffi_type_uint64,
+&ffi_type_uint64,
+NULL
+};
+static ffi_type ffi_type_i128 = {
+.size = 16,
+.alignment = __alignof__(Int128),
+.type = FFI_TYPE_STRUCT,
+.elements = ffi_type_i128_elements,
+};
+
 switch (argmask) {
 case dh_typecode_void:
 return &ffi_type_void;
@@ -583,6 +599,8 @@ static ffi_type *typecode_to_ffi(int argmask)
 return &ffi_type_sint64;
 case dh_typecode_ptr:
 return &ffi_type_pointer;
+case dh_typecode_i128:
+return &ffi_type_i128;
 }
 g_assert_not_reached();
 }
@@ -613,6 +631,7 @@ static void init_ffi_layouts(void)
 /* Ignoring the return type, find the last non-zero field. */
 nargs = 32 - clz32(typemask >> 3);
 nargs = DIV_ROUND_UP(nargs, 3);
+assert(nargs <= MAX_CALL_IARGS);
 
 ca = g_malloc0(sizeof(*ca) + nargs * sizeof(ffi_type *));
 ca->cif.rtype = typecode_to_ffi(typemask & 7);
diff --git a/tcg/tci.c b/tcg/tci.c
index eeccdde8bc..022fe9d0f8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -470,12 +470,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong regs[TCG_TARGET_NB_REGS];
 uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
/ sizeof(uint64_t)];
-void *call_slots[TCG_STATIC_CALL_ARGS_SIZE / sizeof(uint64_t)];
 
 regs[TCG_AREG0] = (tcg_target_ulong)env;
 regs[TCG_REG_CALL_STACK] = (uintptr_t)stack;
-/* Other call_slots entries initialized at first use (see below). */
-call_slots[0] = NULL;
 tci_assert(tb_ptr);
 
 for (;;) {
@@ -498,26 +495,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 switch (opc) {
 case INDEX_op_call:
-/*
- * Set up the ffi_avalue array once, delayed until now
- * because many TB's do not make any calls. In tcg_gen_callN,
- * we arranged for every real argument to be "left-aligned"
- * in each 64-bit slot.
- */
-if (unlikely(call_slots[0] == NULL)) {
-for (int i = 0; i < ARRAY_SIZE(call_slots); ++i) {
-call_slots[i] = &stack[i];
-}
-}
-
-tci_args_nl(insn, tb_ptr, &len, &ptr);
-
-/* Helper functions may need to access the "return address" */
-tci_tb_ptr = (uintptr_t)tb_ptr;
-
 {
-void **pptr = ptr;
-ffi_call(pptr[1], pptr[0], stack, call_slots);
+void *call_slots[MAX_CALL_IARGS];
+ffi_cif *cif;
+void *func;
+unsigned i, s, n;
+
+tci_args_nl(insn, tb_ptr, &len, &ptr);
+func = ((void **)ptr)[0];
+cif = ((void **)ptr)[1];
+
+n = cif->nargs;
+for (i = s = 0; i < n; ++i) {
+ffi_type *t = cif->arg_types[i];
+call_slots[i] = &stack[s];
+s += DIV_ROUND_UP(t->size, 8);
+}
+
+/* Helper functions may need to access t
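
The slot-assignment loop in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the code from the patch: assign_call_slots() and its parameters are hypothetical names, and plain byte sizes stand in for the libffi argument types. It shows how an argument wider than 8 bytes (such as the new 128-bit struct) consumes two consecutive 64-bit stack slots.

```c
#include <assert.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical stand-in for the loop in the patch: given the byte size of
 * each argument, record the index of the first 64-bit stack slot used by
 * each argument and return the total number of slots consumed.
 */
static unsigned assign_call_slots(const size_t *arg_sizes, unsigned nargs,
                                  unsigned *first_slot)
{
    unsigned i, s;

    for (i = s = 0; i < nargs; ++i) {
        first_slot[i] = s;
        /* Every argument occupies a whole number of 8-byte slots. */
        s += DIV_ROUND_UP(arg_sizes[i], 8);
    }
    return s;
}
```

With sizes of 8, 16 and 4 bytes, the arguments would start at slots 0, 1 and 3, and four slots are used in total.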

[PATCH v5 18/36] target/arm: Use tcg_gen_atomic_cmpxchg_i128 for STXP

2023-01-25 Thread Richard Henderson

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-Id: <20221112042555.2622152-2-richard.hender...@linaro.org>
---
 target/arm/helper-a64.h|   6 ---
 target/arm/helper-a64.c| 104 -
 target/arm/translate-a64.c |  60 -
 3 files changed, 35 insertions(+), 135 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 7b706571bb..94065d1917 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -50,12 +50,6 @@ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
-   i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
-   i64, env, i64, i64, i64)
 DEF_HELPER_5(casp_le_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_5(casp_be_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 77a8502b6b..7dbdb2c233 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -505,110 +505,6 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 return crc32c(acc, buf, bytes) ^ 0x;
 }
 
-uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
-{
-Int128 cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
-Int128 newv = int128_make128(new_lo, new_hi);
-Int128 oldv;
-uintptr_t ra = GETPC();
-uint64_t o0, o1;
-bool success;
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi0 = make_memop_idx(MO_LEUQ | MO_ALIGN_16, mem_idx);
-MemOpIdx oi1 = make_memop_idx(MO_LEUQ, mem_idx);
-
-o0 = cpu_ldq_le_mmu(env, addr + 0, oi0, ra);
-o1 = cpu_ldq_le_mmu(env, addr + 8, oi1, ra);
-oldv = int128_make128(o0, o1);
-
-success = int128_eq(oldv, cmpv);
-if (success) {
-cpu_stq_le_mmu(env, addr + 0, int128_getlo(newv), oi1, ra);
-cpu_stq_le_mmu(env, addr + 8, int128_gethi(newv), oi1, ra);
-}
-
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_le_parallel)(CPUARMState *env, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-bool success;
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
-
-success = int128_eq(oldv, cmpv);
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
-{
-/*
- * High and low need to be switched here because this is not actually a
- * 128bit store but two doublewords stored consecutively
- */
-Int128 cmpv = int128_make128(env->exclusive_high, env->exclusive_val);
-Int128 newv = int128_make128(new_hi, new_lo);
-Int128 oldv;
-uintptr_t ra = GETPC();
-uint64_t o0, o1;
-bool success;
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi0 = make_memop_idx(MO_BEUQ | MO_ALIGN_16, mem_idx);
-MemOpIdx oi1 = make_memop_idx(MO_BEUQ, mem_idx);
-
-o1 = cpu_ldq_be_mmu(env, addr + 0, oi0, ra);
-o0 = cpu_ldq_be_mmu(env, addr + 8, oi1, ra);
-oldv = int128_make128(o0, o1);
-
-success = int128_eq(oldv, cmpv);
-if (success) {
-cpu_stq_be_mmu(env, addr + 0, int128_gethi(newv), oi1, ra);
-cpu_stq_be_mmu(env, addr + 8, int128_getlo(newv), oi1, ra);
-}
-
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-bool success;
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_BE | MO_128 | MO_ALIGN, mem_idx);
-
-/*
- * High and low need to be switched here because this is not actually a
- * 128bit store but two doublewords stored consecutively
- */
-cmpv = int128_make128(env->exclusive_high, env->exclusive_val);
-newv = int128_make128(ne
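
For readers following the removed helpers, their compare-and-store convention can be sketched without any QEMU types. This is an illustration only: Pair128 and paired_cmpxchg_sketch() are hypothetical names, the operation here is deliberately not atomic, and it does not model memory ordering, endianness or faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Pair128;

/*
 * Non-atomic illustration only: store newv to *mem if *mem equals cmpv.
 * Returns 0 on success and 1 on failure, matching the "!success"
 * convention of the removed helpers.
 */
static uint64_t paired_cmpxchg_sketch(Pair128 *mem, Pair128 cmpv, Pair128 newv)
{
    bool success = mem->lo == cmpv.lo && mem->hi == cmpv.hi;

    if (success) {
        *mem = newv;
    }
    return !success;
}
```

A second attempt with the same stale comparison value would fail and leave memory untouched, which is the behaviour a store-exclusive pair relies on.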

[PATCH v5 31/36] target/s390x: Use Int128 for passing float128

2023-01-25 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
v2: Fix SPEC_in1_x1.
Cc: David Hildenbrand 
Cc: Ilya Leoshkevich 
---
 target/s390x/helper.h| 32 ++--
 target/s390x/tcg/insn-data.h.inc | 30 +--
 target/s390x/tcg/fpu_helper.c| 88 ++--
 target/s390x/tcg/translate.c | 76 ++-
 4 files changed, 121 insertions(+), 105 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index d40aeb471f..bccd3bfca6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -41,55 +41,55 @@ DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(axb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(mxb, TCG_CALL_NO_WG, i128, env, i128, i128)
+DEF_HELPER_FLAGS_3(mxdb, TCG_CALL_NO_WG, i128, env, i128, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(ldxb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(lexb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(cdb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(cxb, TCG_CALL_NO_WG_SE, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(cxb, TCG_CALL_NO_WG_SE, i32, env, i128, i128)
 DEF_HELPER_FLAGS_3(keb, TCG_CALL_NO_WG, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(kdb, TCG_CALL_NO_WG, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(kxb, TCG_CALL_NO_WG, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(kxb, TCG_CALL_NO_WG, i32, env, i128, i128)
 DEF_HELPER_3(cgeb, i64, env, i64, i32)
 DEF_HELPER_3(cgdb, i64, env, i64, i32)
-DEF_HELPER_4(cgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cgxb, i64, env, i128, i32)
 DEF_HELPER_3(cfeb, i64, env, i64, i32)
 DEF_HELPER_3(cfdb, i64, env, i64, i32)
-DEF_HELPER_4(cfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cfxb, i64, env, i128, i32)
 DEF_HELPER_3(clgeb, i64, env, i64, i32)
 DEF_HELPER_3(clgdb, i64, env, i64, i32)
-DEF_HELPER_4(clgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clgxb, i64, env, i128, i32)
 DEF_HELPER_3(clfeb, i64, env, i64, i32)
 DEF_HELPER_3(clfdb, i64, env, i64, i32)
-DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clfxb, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(fixb, TCG_CALL_NO_WG, i128, env, i128, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
+DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 517a4500ae..893f4b48db 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-dat
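
The signature changes in helper.h above replace pairs of i64 arguments with a single i128 argument. A rough sketch of what that means for a helper body follows, under stated assumptions: I128Sketch, i128_make() and both comparison functions are hypothetical, the (high, low) argument order of the old style is assumed, and the comparison is on raw bits rather than floating-point values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value passed as one argument. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} I128Sketch;

static I128Sketch i128_make(uint64_t lo, uint64_t hi)
{
    I128Sketch v = { lo, hi };
    return v;
}

/* Old calling style: each 128-bit operand arrives as two 64-bit halves. */
static bool same_bits_old(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
    return ah == bh && al == bl;
}

/* New calling style: each operand is a single 128-bit argument. */
static bool same_bits_new(I128Sketch a, I128Sketch b)
{
    return a.hi == b.hi && a.lo == b.lo;
}
```

Both styles carry the same information; the new one simply lets the caller hand over each operand as one value instead of splitting it.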

Re: [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API

2023-01-25 Thread Alexander Bulekov

On 230125 1624, Stefan Hajnoczi wrote:
> On Thu, Jan 19, 2023 at 02:03:07AM -0500, Alexander Bulekov wrote:
> > Devices can pass their MemoryReentrancyGuard (from their DeviceState),
> > when creating new BHes. Then, the async API will toggle the guard
> > before/after calling the BH call-back. This prevents bh->mmio reentrancy
> > issues.
> > 
> > Signed-off-by: Alexander Bulekov 
> > ---
> >  docs/devel/multiple-iothreads.txt |  2 ++
> >  include/block/aio.h   | 18 --
> >  include/qemu/main-loop.h  |  7 +--
> >  tests/unit/ptimer-test-stubs.c|  3 ++-
> >  util/async.c  | 12 +++-
> >  util/main-loop.c  |  5 +++--
> >  6 files changed, 39 insertions(+), 8 deletions(-)
> > 
> > diff --git a/docs/devel/multiple-iothreads.txt 
> > b/docs/devel/multiple-iothreads.txt
> > index 343120f2ef..e4fafed9d9 100644
> > --- a/docs/devel/multiple-iothreads.txt
> > +++ b/docs/devel/multiple-iothreads.txt
> > @@ -61,6 +61,7 @@ There are several old APIs that use the main loop 
> > AioContext:
> >   * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
> >   * LEGACY timer_new_ms() - create a timer
> >   * LEGACY qemu_bh_new() - create a BH
> > + * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy 
> > guard
> >   * LEGACY qemu_aio_wait() - run an event loop iteration
> >  
> >  Since they implicitly work on the main loop they cannot be used in code 
> > that
> > @@ -72,6 +73,7 @@ Instead, use the AioContext functions directly (see 
> > include/block/aio.h):
> >   * aio_set_event_notifier() - monitor an event notifier
> >   * aio_timer_new() - create a timer
> >   * aio_bh_new() - create a BH
> > + * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
> >   * aio_poll() - run an event loop iteration
> >  
> >  The AioContext can be obtained from the IOThread using
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index 0f65a3cc9e..94d661ff7e 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -23,6 +23,8 @@
> >  #include "qemu/thread.h"
> >  #include "qemu/timer.h"
> >  #include "block/graph-lock.h"
> > +#include "hw/qdev-core.h"
> > +
> >  
> >  typedef struct BlockAIOCB BlockAIOCB;
> >  typedef void BlockCompletionFunc(void *opaque, int ret);
> > @@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, 
> > QEMUBHFunc *cb, void *opaque,
> >   * is opaque and must be allocated prior to its use.
> >   *
> >   * @name: A human-readable identifier for debugging purposes.
> > + * @reentrancy_guard: A guard set when entering a cb to prevent
> > + * device-reentrancy issues
> >   */
> >  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> > -const char *name);
> > +const char *name, MemReentrancyGuard 
> > *reentrancy_guard);
> >  
> >  /**
> >   * aio_bh_new: Allocate a new bottom half structure
> > @@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc 
> > *cb, void *opaque,
> >   * string.
> >   */
> >  #define aio_bh_new(ctx, cb, opaque) \
> > -aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
> > +aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
> > +
> > +/**
> > + * aio_bh_new_guarded: Allocate a new bottom half structure with a
> > + * reentrancy_guard
> > + *
> > + * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
> > + * string.
> > + */
> > +#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
> > +aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
> >  
> >  /**
> >   * aio_notify: Force processing of pending events.
> > diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> > index c25f390696..84d1ce57f0 100644
> > --- a/include/qemu/main-loop.h
> > +++ b/include/qemu/main-loop.h
> > @@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int 
> > ms);
> >  
> >  void qemu_fd_register(int fd);
> >  
> > +#define qemu_bh_new_guarded(cb, opaque, guard) \
> > +qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
> >  #define qemu_bh_new(cb, opaque) \
> > -qemu_bh_new_full((cb), (opaque), (stringify(cb)))
> > -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
> > +qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
> > +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> > + MemReentrancyGuard *reentrancy_guard);
> >  void qemu_bh_schedule_idle(QEMUBH *bh);
> >  
> >  enum {
> > diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
> > index f5e75a96b6..24d5413f9d 100644
> > --- a/tests/unit/ptimer-test-stubs.c
> > +++ b/tests/unit/ptimer-test-stubs.c
> > @@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, 
> > int attr_mask)
> >  return deadline;
> >  }
> >  
> > -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, c
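
The commit message quoted above says the async API toggles the guard before and after calling the BH callback. A minimal sketch of that idea, assuming a guard that is just a boolean flag and assuming the previous value is saved and restored: GuardSketch, ProbeSketch, call_bh_guarded() and record_guard_state() are all hypothetical names, not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the guard described in the patch. */
typedef struct {
    bool engaged_in_io;
} GuardSketch;

typedef void BHFuncSketch(void *opaque);

/*
 * Call cb with the guard engaged, then restore the previous state.
 * guard may be NULL, which corresponds to an unguarded BH.
 */
static void call_bh_guarded(BHFuncSketch *cb, void *opaque, GuardSketch *guard)
{
    bool last = false;

    if (guard) {
        last = guard->engaged_in_io;
        guard->engaged_in_io = true;
    }
    cb(opaque);
    if (guard) {
        guard->engaged_in_io = last;
    }
}

/* Example payload: lets a callback observe the guard while it runs. */
typedef struct {
    GuardSketch *guard;
    bool seen_engaged;
} ProbeSketch;

static void record_guard_state(void *opaque)
{
    ProbeSketch *p = opaque;

    p->seen_engaged = p->guard->engaged_in_io;
}
```

Code reached from the callback, such as an MMIO dispatch path, could then check the flag and refuse to re-enter the device while it is set.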

Re: [PATCH v4 06/36] tcg: Introduce tcg_target_call_oarg_reg

2023-01-25 Thread Richard Henderson


On 1/25/23 11:09, Alex Bennée wrote:

-static const int tcg_target_call_oarg_regs[2] = {
-TCG_REG_R0, TCG_REG_R1
-};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+tcg_debug_assert(slot >= 0 && slot <= 3);
+return TCG_REG_R0 + slot;
+}


So this is now returning allocations of TCG_REG_R0 to TCG_REG_R3?


Yes, should have mentioned in the patch description.  Done.



Do we
have to take care to get things right if slot is ever bigger w.r.t.
tcg_target_reg_alloc_order?


No, reg_alloc_order is an optimization for call-saved vs call-clobbered vs call arguments.
It should not affect correctness at all.  Nor will it ever affect call return -- those
registers die immediately before the call, and become live with these values immediately
after the call.



r~
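
The slot-to-register mapping discussed in this reply can be sketched in isolation. This is an illustration with hypothetical names (RegSketch, RetKindSketch, call_oarg_reg_sketch), assuming the return registers are numbered consecutively as in the quoted hunk.

```c
#include <assert.h>

/* Hypothetical register and return-kind enumerations. */
typedef enum { REG_R0, REG_R1, REG_R2, REG_R3, REG_R4 } RegSketch;
typedef enum { CALL_RET_NORMAL, CALL_RET_BY_REF } RetKindSketch;

/*
 * Map a return-value slot to a register. Only the "normal" return kind
 * and slots 0..3 are accepted, mirroring the asserts in the quoted code.
 */
static RegSketch call_oarg_reg_sketch(RetKindSketch kind, int slot)
{
    assert(kind == CALL_RET_NORMAL);
    assert(slot >= 0 && slot <= 3);
    return (RegSketch)(REG_R0 + slot);
}
```

Compared with a fixed two-entry table, a function like this can cover four slots and can reject return kinds the backend does not support.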

Re: [PATCH qemu 1/1] Remove a stray "@end table" marker

2023-01-25 Thread Richard Henderson


On 1/25/23 16:12, ~gurjeet wrote:

From: Gurjeet Singh 

Signed-off-by: Gurjeet Singh 
---
  docs/system/cpu-models-x86.rst.inc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/system/cpu-models-x86.rst.inc 
b/docs/system/cpu-models-x86.rst.inc
index 7f6368f999..261da6e21d 100644
--- a/docs/system/cpu-models-x86.rst.inc
+++ b/docs/system/cpu-models-x86.rst.inc
@@ -25,7 +25,7 @@ Two ways to configure CPU models with QEMU / KVM
  typically refer to specific generations of hardware released by
  Intel and AMD.  These allow the guest VMs to have a degree of
  isolation from the host CPU, allowing greater flexibility in live
-migrating between hosts with differing hardware.  @end table
+migrating between hosts with differing hardware.
  
  In both cases, it is possible to optionally add or remove individual CPU

  features, to alter what is presented to the guest by default.


Cc: qemu-trivial.

Reviewed-by: Richard Henderson 

r~

Re: ARM: ptw.c:S1_ptw_translate

2023-01-25 Thread Richard Henderson


On 1/25/23 13:27, Sid Manning wrote:

On 7.2 VA to PA mappings are not consistent:

  Thread 10 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xeb7ac0, 
mmu_idx=0x2, vaddr=0xff809977f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1090
tlb_set_page_full: vaddr=ff809977f000 paddr=0x000f35f32000 prot=3 idx=2
Thread 14 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xf185e0, mmu_idx=0x2, 
vaddr=0xff809977f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1090
tlb_set_page_full: vaddr=ff809977f000 paddr=0x000f42a16000 prot=3 idx=2

Using the monitor to view the memory I see that on 7.2 the first entry appears 
to be accurate.
xp /2x 0x000f35f32018
000f35f32018: 0x9977eff0 0xff80

And the second is not:
xp /2x 0x000f42a16018
000f42a16018: 0x 0x

7.2 is calling arm_cpu_tlb_fill more often now and I don't know if that is 
related to the problem I'm seeing or a natural result of the changes made to 
S1_ptw_translate between the releases.


Well, there are more calls to tlb_fill, since we're now also using tlb_fill for the stage2 
translation, and for the translation tables themselves.  It's possible that there's a bug 
in the stage2 tlb flushing that wouldn't have been visible before (and also not visible 
from the monitor, since that avoids tlb_fill entirely).


While it would still be handier to have a test case, the next best thing may be for me to 
add some tracepoints within ptw.c.  I'll work on that later today or tomorrow.



r~

Re: [QEMU][PATCH v4 09/10] hw/arm: introduce xenpvh machine

2023-01-25 Thread Vikram Garhwal


Hi Stefano,

On 1/25/23 2:20 PM, Stefano Stabellini wrote:

On Wed, 25 Jan 2023, Vikram Garhwal wrote:

Add a new machine xenpvh which creates an IOREQ server to register/connect with
the Xen hypervisor.

Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on the host machine via a chardev
socket to support TPM functionality for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
 -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
 -tpmdev emulator,id=tpm0,chardev=chrtpm \
 -machine tpm-base-addr=0x0c00 \

swtpm implements a TPM software emulator (TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interfaces.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
 mkdir /tmp/vtpm2
 swtpm socket --tpmstate dir=/tmp/vtpm2 \
 --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  docs/system/arm/xenpvh.rst|  34 +++
  docs/system/target-arm.rst|   1 +
  hw/arm/meson.build|   2 +
  hw/arm/xen_arm.c  | 184 ++
  include/hw/arm/xen_arch_hvm.h |   9 ++
  include/hw/xen/arch_hvm.h |   2 +
  6 files changed, 232 insertions(+)
  create mode 100644 docs/system/arm/xenpvh.rst
  create mode 100644 hw/arm/xen_arm.c
  create mode 100644 include/hw/arm/xen_arch_hvm.h

diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
new file mode 100644
index 00..e1655c7ab8
--- /dev/null
+++ b/docs/system/arm/xenpvh.rst
@@ -0,0 +1,34 @@
+XENPVH (``xenpvh``)
+=
+This machine creates a IOREQ server to register/connect with Xen Hypervisor.
+
+When TPM is enabled, this machine also creates a tpm-tis-device at a user input
+tpm base address, adds a TPM emulator and connects to a swtpm application
+running on host machine via chardev socket. This enables xenpvh to support TPM
+functionalities for a guest domain.
+
+More information about TPM use and installing swtpm linux application can be
+found at: docs/specs/tpm.rst.
+
+Example for starting swtpm on host machine:
+.. code-block:: console
+
+mkdir /tmp/vtpm2
+swtpm socket --tpmstate dir=/tmp/vtpm2 \
+--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
+
+Sample QEMU xenpvh commands for running and connecting with Xen:
+.. code-block:: console
+
+qemu-system-aarch64 -xen-domid 1 \
+-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
+-mon chardev=libxl-cmd,mode=control \
+-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off 
\
+-mon chardev=libxenstat-cmd,mode=control \
+-xen-attach -name guest0 -vnc none -display none -nographic \
+-machine xenpvh -m 1301 \
+-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
+
+In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
+via chardev socket.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 91ebc26c6d..af8d7c77d6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
 arm/stm32
 arm/virt
 arm/xlnx-versal-virt
+   arm/xenpvh
  
  Emulated CPU architecture support

  =
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index b036045603..06bddbfbb8 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -61,6 +61,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
files('fsl-imx7.c', 'mcimx7d-sabre.
  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 
'mcimx6ul-evk.c'))
  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
+arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add_all(xen_ss)
  
  softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))

  softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
new file mode 100644
index 00..12b19e3609
--- /dev/null
+++ b/hw/arm/xen_arm.c
@@ -0,0 +1,184 @@
+/*
+ * QEMU ARM Xen PV Machine

^ PVH



+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be

Re: [QEMU][PATCH v4 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-01-25 Thread Vikram Garhwal


Hi Philippe,

On 1/25/23 2:59 PM, Philippe Mathieu-Daudé wrote:

On 25/1/23 09:53, Vikram Garhwal wrote:
xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Moving it out from hw/i386/xen to hw/xen to make it
accessible for both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  hw/i386/meson.build  | 1 +
  hw/i386/xen/meson.build  | 1 -
  hw/i386/xen/trace-events | 5 -
  hw/xen/meson.build   | 4 
  hw/xen/trace-events  | 5 +
  hw/{i386 => }/xen/xen-mapcache.c | 0
  6 files changed, 10 insertions(+), 6 deletions(-)
  rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
  subdir('xen')
    i386_ss.add_all(xenpv_ss)
+i386_ss.add_all(xen_ss)
    hw_arch += {'i386': i386_ss}
diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..2fcc46e6ca 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
  i386_ss.add(when: 'CONFIG_XEN', if_true: files(
    'xen-hvm.c',
-  'xen-mapcache.c',
    'xen_apic.c',
    'xen_platform.c',
    'xen_pvdevice.c',
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) 
"id: %u addr: %p"
  cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, 
uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u 
data=0x%x"
  cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, 
uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u 
data=0x%x"

  -# xen-mapcache.c
-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index ae0ace3046..19d0637c46 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -22,3 +22,7 @@ else
  endif
    specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: 
xen_specific_ss)

+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))


Can't we add it to softmmu_ss directly?

I tried adding this to softmmu_ss as per your comment in v2, but it
fails with the following error:

/mnt/qemu_ioreq_upstream/include/sysemu/xen-mapcache.h:16:8: error: attempt to use poisoned "CONFIG_XEN"
 #ifdef CONFIG_XEN
        ^
../hw/xen/xen-mapcache.c:106:6: error: redefinition of 'xen_map_cache_init'
 void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)

I couldn't find an easy way to fix it.


diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 3da3fd8348..2c8f238f42 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
  xs_node_vscanf(char *path, char *value) "%s %s"
  xs_node_watch(char *path) "%s"
  xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c

Re: [PATCH v8 09/13] vfio/migration: Implement VFIO migration protocol v2

2023-01-25 Thread Alex Williamson

On Sun, 22 Jan 2023 12:31:33 +0200
Avihai Horon  wrote:

> On 21/01/2023 1:07, Alex Williamson wrote:
> > On Mon, 16 Jan 2023 16:11:31 +0200
> > Avihai Horon  wrote:
> >  
> >> Implement the basic mandatory part of VFIO migration protocol v2.
> >> This includes all functionality that is necessary to support
> >> VFIO_MIGRATION_STOP_COPY part of the v2 protocol.
> >>
> >> The two protocols, v1 and v2, will co-exist and in the following patches
> >> v1 protocol code will be removed.
> >>
> >> There are several main differences between v1 and v2 protocols:
> >> - VFIO device state is now represented as a finite state machine instead
> >>of a bitmap.
> >>
> >> - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
> >>ioctl and normal read() and write() instead of the migration region.
> >>
> >> - Pre-copy is made optional in v2 protocol. Support for pre-copy will be
> >>added later on.
> >>
> >> Detailed information about VFIO migration protocol v2 and its difference
> >> compared to v1 protocol can be found here [1].
> >>
> >> [1]
> >> https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/
> >>
> >> Signed-off-by: Avihai Horon 
> >> Reviewed-by: Cédric Le Goater 
> >> ---
> >>   include/hw/vfio/vfio-common.h |   5 +
> >>   hw/vfio/common.c  |  19 +-
> >>   hw/vfio/migration.c   | 455 +++---
> >>   hw/vfio/trace-events  |   7 +
> >>   4 files changed, 447 insertions(+), 39 deletions(-)
> >>
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index bbaf72ba00..6d7d850bfe 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
> >>   int vm_running;
> >>   Notifier migration_state;
> >>   uint64_t pending_bytes;
> >> +uint32_t device_state;
> >> +int data_fd;
> >> +void *data_buffer;
> >> +size_t data_buffer_size;
> >> +bool v2;
> >>   } VFIOMigration;
> >>
> >>   typedef struct VFIOAddressSpace {
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 550b2d7ded..dcaa77d2a8 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -355,10 +355,18 @@ static bool 
> >> vfio_devices_all_dirty_tracking(VFIOContainer *container)
> >>   return false;
> >>   }
> >>
> >> -if ((vbasedev->pre_copy_dirty_page_tracking == 
> >> ON_OFF_AUTO_OFF) &&
> >> +if (!migration->v2 &&
> >> +(vbasedev->pre_copy_dirty_page_tracking == 
> >> ON_OFF_AUTO_OFF) &&
> >>   (migration->device_state_v1 & 
> >> VFIO_DEVICE_STATE_V1_RUNNING)) {
> >>   return false;
> >>   }
> >> +
> >> +if (migration->v2 &&
> >> +(vbasedev->pre_copy_dirty_page_tracking == 
> >> ON_OFF_AUTO_OFF) &&
> >> +(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> >> + migration->device_state == 
> >> VFIO_DEVICE_STATE_RUNNING_P2P)) {
> >> +return false;
> >> +}
> >>   }
> >>   }
> >>   return true;
> >> @@ -385,7 +393,14 @@ static bool 
> >> vfio_devices_all_running_and_mig_active(VFIOContainer *container)
> >>   return false;
> >>   }
> >>
> >> -if (migration->device_state_v1 & 
> >> VFIO_DEVICE_STATE_V1_RUNNING) {
> >> +if (!migration->v2 &&
> >> +migration->device_state_v1 & 
> >> VFIO_DEVICE_STATE_V1_RUNNING) {
> >> +continue;
> >> +}
> >> +
> >> +if (migration->v2 &&
> >> +(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> >> + migration->device_state == 
> >> VFIO_DEVICE_STATE_RUNNING_P2P)) {
> >>   continue;
> >>   } else {
> >>   return false;
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 9df859f4d3..f19ada0f4f 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -10,6 +10,7 @@
> >>   #include "qemu/osdep.h"
> >>   #include "qemu/main-loop.h"
> >>   #include "qemu/cutils.h"
> >> +#include "qemu/units.h"
> >>   #include 
> >>   #include 
> >>
> >> @@ -44,8 +45,103 @@
> >>   #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
> >>   #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
> >>
> >> +/*
> >> + * This is an arbitrary size based on migration of mlx5 devices, where 
> >> typically
> >> + * total device migration size is on the order of 100s of MB. Testing with
> >> + * larger values, e.g. 128MB and 1GB, did not show a performance 
> >> improvement.
> >> + */
> >> +#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
> >> +
> >>   static int64_t bytes_transferred;
> >>
> >> +static const char *mig_state_to_str(enum vfio_device_mig_state s

Re: [PATCH v2 34/35] cpu-exec: assert that plugin_mem_cbs is NULL after execution

2023-01-25 Thread Philippe Mathieu-Daudé


On 24/1/23 19:01, Alex Bennée wrote:

From: Emilio Cota 

Fixes: #1381

Signed-off-by: Emilio Cota 
Message-Id: <20230108165107.62488-1-c...@braap.org>
[AJB: manually applied follow-up fix]
Signed-off-by: Alex Bennée 
---
  include/qemu/plugin.h | 4 
  accel/tcg/cpu-exec.c  | 2 ++
  2 files changed, 6 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

RE: ARM: ptw.c:S1_ptw_translate

2023-01-25 Thread Sid Manning



> -Original Message-
> From: qemu-devel-bounces+sidneym=quicinc@nongnu.org  devel-bounces+sidneym=quicinc@nongnu.org> On Behalf Of Sid
> Manning
> Sent: Thursday, January 5, 2023 7:08 PM
> To: 'Richard Henderson' ; qemu-
> de...@nongnu.org
> Cc: phi...@linaro.org; Mark Burton 
> Subject: RE: ARM: ptw.c:S1_ptw_translate
> 
> > -Original Message-
> > From: Richard Henderson 
> > Sent: Wednesday, January 4, 2023 11:42 PM
> > To: Sid Manning ; qemu-devel@nongnu.org
> > Cc: phi...@linaro.org; Mark Burton 
> > Subject: Re: ARM: ptw.c:S1_ptw_translate
> >
> > On 1/4/23 08:55, Sid Manning wrote:
> > > ptw.c:S1_ptw_translate
> > >
> > > > After migrating to v7.2.0, an issue was found where we were not
> > > > getting the correct virtual address from a load insn.  Reading the
> > > > address used in the load insn from the debugger resulted in the
> > > > execution of the insn getting the correct value but simply stepping
> > > > over the insn did not.
> > >
> > > This is the instruction:
> > >
> > > ldr   x0, [x1, #24]
> > >
> > > > The debug path varies based on the regime: if the regime is NOT stage 2,
> > > > out_phys is set to addr; if the regime is stage 2, out_phys is set to
> > > > s2.f.phys_addr.  In the non-debug path out_phys is always set to
> > > > full->phys_addr.
> > > >
> > > > I got around this by only using full->phys_addr if regime_is_stage2
> > > > was true:
> > >
> > > > diff --git a/target/arm/ptw.c b/target/arm/ptw.c
> > > > index 3745ac9723..87bc6754a6 100644
> > > > --- a/target/arm/ptw.c
> > > > +++ b/target/arm/ptw.c
> > > > @@ -266,7 +266,12 @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
> > > >          if (unlikely(flags & TLB_INVALID_MASK)) {
> > > >              goto fail;
> > > >          }
> > > > -        ptw->out_phys = full->phys_addr;
> > > > +
> > > > +        if (regime_is_stage2(s2_mmu_idx)) {
> > > > +            ptw->out_phys = full->phys_addr;
> > > > +        } else {
> > > > +            ptw->out_phys = addr;
> > > > +        }
> > > >          ptw->out_rw = full->prot & PAGE_WRITE;
> > > >          pte_attrs = full->pte_attrs;
> > > >          pte_secure = full->attrs.secure;
> > > >
> > >
> > > This change got me the answer I wanted but I’m not familiar enough
> > > with the code to know if this is correct or not.
> >
> > > This is incorrect.  If you are getting the wrong value here, then
> > > something has gone wrong elsewhere, as the s2_mmu_idx result was
> > > logged.
> >
> > Do you have a test case you can share?
> 
> This happens while booting QNX so I can't share it.  I don't have the source
> code either just the object code.  A number of cores are being started and
> the address happens to be what will eventually become the stack.
> 
> > I'll see what I can come up with to better characterize this problem.

I have not been able to isolate the cause of this issue.  I have pulled in 
recent updates to ptw.c/cputlb.c/tlb_helper.c to see if one of the recent 
changes would help but they have not.

I'm running the same QNX images between a version of QEMU based on 7.1 and 
another based on 7.2.  QEMU has been patched to allow it to integrate into a 
System-C Virtual Platform but this problem seems to be contained in QEMU.

I defined DEBUG_TLB in cputlb.c and set a breakpoint in tlb_add_large_page.

On 7.1 I see consistent PA to VA mappings:

Thread 10 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xeb3360, mmu_idx=0xa, 
vaddr=0xff809975f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1079
tlb_set_page_with_attrs: vaddr=ff809975f000 paddr=0x000f35f3a000 prot=3 
idx=10
Thread 13 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xee26e0, mmu_idx=0xa, 
vaddr=0xff809975f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1079
tlb_set_page_with_attrs: vaddr=ff809975f000 paddr=0x000f35f3a000 prot=3 
idx=10


On 7.2 VA to PA mappings are not consistent:

 Thread 10 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xeb7ac0, 
mmu_idx=0x2, vaddr=0xff809977f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1090
tlb_set_page_full: vaddr=ff809977f000 paddr=0x000f35f32000 prot=3 idx=2
Thread 14 "vp" hit Breakpoint 1, tlb_add_large_page (env=0xf185e0, mmu_idx=0x2, 
vaddr=0xff809977f000, size=0x1000) at 
../../../../../../src/qemu/accel/tcg/cputlb.c:1090
tlb_set_page_full: vaddr=ff809977f000 paddr=0x000f42a16000 prot=3 idx=2

Using the monitor to view the memory I see that on 7.2 the first entry appears 
to be accurate.
xp /2x 0x000f35f32018
000f35f32018: 0x9977eff0 0xff80

And the second is not:
xp /2x 0x

Re: [PATCH v2 35/35] plugins: Iterate on cb_lists in qemu_plugin_user_exit

2023-01-25 Thread Philippe Mathieu-Daudé


On 24/1/23 19:01, Alex Bennée wrote:

From: Richard Henderson 

Rather than iterate over all plugins for all events,
iterate over plugins that have registered a given event.

Signed-off-by: Richard Henderson 
Message-Id: <20230117035701.168514-4-richard.hender...@linaro.org>


(Missing Alex's S-o-b)


---
  plugins/core.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 23/35] semihosting: Write back semihosting data before completion callback

2023-01-25 Thread Philippe Mathieu-Daudé


On 24/1/23 19:01, Alex Bennée wrote:

From: Keith Packard 

'lock_user' allocates a host buffer to shadow a target buffer,
'unlock_user' copies that host buffer back to the target and frees the
host memory. If the completion function uses the target buffer, it
must be called after unlock_user to ensure the data are present.

This caused the arm-compatible TARGET_SYS_READC to fail as the
completion function, common_semi_readc_cb, pulled data from the target
buffer which would not yet have gotten the console data.

I decided to fix all instances of this pattern instead of just the
console_read function to make things consistent and potentially fix
bugs in other cases.

Signed-off-by: Keith Packard 
Reviewed-by: Richard Henderson 
Message-Id: <20221012014822.1242170-1-kei...@keithp.com>
Signed-off-by: Alex Bennée 
---
  semihosting/syscalls.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
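
The ordering problem described in the commit message above can be illustrated
with a small stand-alone sketch. This is not QEMU code: target_mem,
sketch_lock_user(), sketch_unlock_user() and sketch_complete() are hypothetical
stand-ins that only mimic the shadow-buffer behaviour the message attributes to
lock_user/unlock_user. The "buggy" variant runs the completion callback before
the write-back, the "fixed" variant after it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for guest (target) memory. */
static unsigned char target_mem[16];
/* Last byte the completion callback read from the target buffer. */
static unsigned char seen_by_cb;

/* Allocate a host shadow of target_mem[addr .. addr + len). */
static void *sketch_lock_user(size_t addr, size_t len)
{
    assert(len <= sizeof(target_mem) && addr <= sizeof(target_mem) - len);
    void *host = malloc(len);
    if (host) {
        memcpy(host, &target_mem[addr], len);
    }
    return host;
}

/* Copy the host shadow back to the target and free it. */
static void sketch_unlock_user(void *host, size_t addr, size_t len)
{
    memcpy(&target_mem[addr], host, len);
    free(host);
}

/* Completion callback that pulls its result from the target buffer. */
static void sketch_complete(size_t addr)
{
    seen_by_cb = target_mem[addr];
}

/* Buggy order: the callback runs before the data is written back. */
static unsigned char read_char_buggy(size_t addr, unsigned char input)
{
    unsigned char *host = sketch_lock_user(addr, 1);
    if (!host) {
        return 0;
    }
    host[0] = input;                   /* data lands in the shadow only */
    sketch_complete(addr);             /* target buffer is still stale */
    sketch_unlock_user(host, addr, 1);
    return seen_by_cb;
}

/* Fixed order: write back first, then run the callback. */
static unsigned char read_char_fixed(size_t addr, unsigned char input)
{
    unsigned char *host = sketch_lock_user(addr, 1);
    if (!host) {
        return 0;
    }
    host[0] = input;
    sketch_unlock_user(host, addr, 1); /* target buffer now holds the data */
    sketch_complete(addr);
    return seen_by_cb;
}
```

With a zeroed target buffer, the buggy variant makes the callback observe the
old byte, while the fixed variant makes it observe the byte that was read.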

Re: [XEN PATCH v2 0/3] Configure qemu upstream correctly by default for igd-passthru

2023-01-25 Thread Chuck Zmudzinski

On 1/25/2023 6:37 AM, Anthony PERARD wrote:
> On Tue, Jan 10, 2023 at 02:32:01AM -0500, Chuck Zmudzinski wrote:
> > I call attention to the commit message of the first patch which points
> > out that using the "pc" machine and adding the xen platform device on
> > the qemu upstream command line is not functionally equivalent to using
> > the "xenfv" machine which automatically adds the xen platform device
> > earlier in the guest creation process. As a result, there is a noticeable
> > reduction in the performance of the guest during startup with the "pc"
> > machine type even if the xen platform device is added via the qemu
> > command line options, although eventually both Linux and Windows guests
> > perform equally well once the guest operating system is fully loaded.
>
> There shouldn't be a difference between "xenfv" machine or using the
> "pc" machine while adding the "xen-platform" device, at least with
> regards to access to disk or network.
>
> The first patch of the series is using the "pc" machine without any
> "xen-platform" device, so we can't compare startup performance based on
> that.
>
> > Specifically, startup time is longer and neither the grub vga drivers
> > nor the windows vga drivers in early startup perform as well when the
> > xen platform device is added via the qemu command line instead of being
> > added immediately after the other emulated i440fx pci devices when the
> > "xenfv" machine type is used.
>
> The "xen-platform" device is mostly a hint to a guest that they can use
> pv-disk and pv-network devices. I don't think it would change anything
> with regards to graphics.
>
> > For example, when using the "pc" machine, which adds the xen platform
> > device using a command line option, the Linux guest could not display
> > the grub boot menu at the native resolution of the monitor, but with the
> > "xenfv" machine, the grub menu is displayed at the full 1920x1080
> > native resolution of the monitor for testing. So improved startup
> > performance is an advantage for the patch for qemu.
>
> I've just found out that when doing IGD passthrough, the "xenfv" and
> "pc" machines differ much more than I thought ... :-(
> pc_xen_hvm_init_pci() in QEMU changes the pci-host device, which in
> turn copies some information from the real host bridge.
> I guess this new host bridge helps when the firmware sets up the graphics
> for grub.

I am surprised it works at all with the "pc" machine, that is, without the
TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE that is used in the "xenfv"
machine. This only seems to affect the legacy grub vga driver and the legacy
Windows vga driver during early boot. Still, I much prefer keeping the "xenfv"
machine for Intel IGD than this workaround of patching libxl to use the "pc"
machine.

>
> > I also call attention to the last point of the commit message of the
> > second patch and the comments for reviewers section of the second patch.
> > This approach, as opposed to fixing this in qemu upstream, makes
> > maintaining the code in libxl__build_device_model_args_new more
> > difficult and therefore increases the chances of problems caused by
> > coding errors and typos for users of libxl. So that is another advantage
> > of the patch for qemu.
>
> We would just need to use a different approach in libxl when generating
> the command line. We could probably avoid duplications. I was hoping to
> have a patch series for libxl that would change the machine used to start
> using "pc" instead of "xenfv" for all configurations, but based on the
> point above (IGD specific change to "xenfv"), I guess we can't
> really do anything from libxl to fix IGD passthrough.

We could switch to the "pc" machine, but we would need to patch
qemu also so the "pc" machine uses the special device the "xenfv"
machine uses (TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE).
So it is simpler to just use the other patch to qemu and not patch
libxl at all to fix this.

>
> > OTOH, fixing this in qemu causes newer qemu versions to behave
> > differently than previous versions of qemu, which the qemu community
> > does not like, although they seem OK with the other patch since it only
> > affects qemu "xenfv" machine types, but they do not want the patch to
> > affect toolstacks like libvirt that do not use qemu upstream's
> > autoconfiguration options as much as libxl does, and, of course, libvirt
> > can manage qemu "xenfv" machines so existing "xenfv" guests configured
> > manually by libvirt could be adversely affected by the patch to qemu,
> > but only if those same guests are also configured for igd-passthrough,
> > which is likely a very small number of possibly affected libvirt users
> > of qemu.
> > 
> > A year or two ago I tried to configure guests for pci passthrough on xen
> > using libvirt's tool to convert a libxl xl.cfg file to libvirt xml. It
> > could not convert an xl.cfg file with a configuration item
> > pci = [ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...] for pci passthrough.

Re: [PATCH 4/4] hw/ppc/e500.c: Attach eSDHC unimplemented region to ccsr_addr_space

2023-01-25 Thread Philippe Mathieu-Daudé


On 25/1/23 14:00, Bernhard Beschow wrote:

Makes the unimplemented region move together with the CCSR address space
if moved by a bootloader. Moving the CCSR address space isn't
implemented yet but this patch is a preparation for it.

Signed-off-by: Bernhard Beschow 
---
  hw/ppc/e500.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 3/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-25 Thread Philippe Mathieu-Daudé


On 25/1/23 23:40, Peter Xu wrote:

Teach QEMU to use /dev/userfaultfd when it exists and fall back to the
system call if either it's not there or doesn't have enough permission.

Firstly, as long as the app has permission to access /dev/userfaultfd, it
always has the ability to trap kernel faults which QEMU mostly wants.
Meanwhile, in some context (e.g. containers) the userfaultfd syscall can be
forbidden, so it can be the major way to use postcopy in a restricted
environment with strict seccomp setup.

Signed-off-by: Peter Xu 
---
  util/trace-events  |  1 +
  util/userfaultfd.c | 36 
  2 files changed, 37 insertions(+)

diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9f..16f78d8fe5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, 
uint64_t region_siz
  qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map 
region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 0x%x host %p"
  
  #userfaultfd.c

+uffd_detect_open_mode(int mode) "%d"
  uffd_query_features_nosys(int err) "errno: %i"
  uffd_query_features_api_failed(int err) "errno: %i"
  uffd_create_fd_nosys(int err) "errno: %i"
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index 9845a2ec81..360ecf8084 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -18,10 +18,46 @@
  #include 
  #include 
  #include 
+#include 
+
+typedef enum {
+UFFD_UNINITIALIZED = 0,
+UFFD_USE_DEV_PATH,
+UFFD_USE_SYSCALL,
+} uffd_open_mode;
+
+static uffd_open_mode open_mode;


'open_mode' could be reduced to uffd_detect_open_mode()'s
scope.


+static int uffd_dev;
+
+static uffd_open_mode uffd_detect_open_mode(void)
+{
+if (open_mode == UFFD_UNINITIALIZED) {
+/*
+ * Make /dev/userfaultfd the default approach because it has better
+ * permission controls, meanwhile allows kernel faults without any
+ * privilege requirement (e.g. SYS_CAP_PTRACE).
+ */
+uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+if (uffd_dev >= 0) {
+open_mode = UFFD_USE_DEV_PATH;
+} else {
+/* Fallback to the system call */
+open_mode = UFFD_USE_SYSCALL;
+}
+trace_uffd_detect_open_mode(open_mode);
+}
+
+return open_mode;


If 'open_mode' isn't relevant, this function could return uffd_dev/-1 
instead. Not really an improvement :)


Reviewed-by: Philippe Mathieu-Daudé 


+}
  
  int uffd_open(int flags)

  {
  #if defined(__linux__) && defined(__NR_userfaultfd)
+if (uffd_detect_open_mode() == UFFD_USE_DEV_PATH) {
+assert(uffd_dev >= 0);
+return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
+}
+
  return syscall(__NR_userfaultfd, flags);
  #else
  return -EINVAL;

Re: [PATCH 2/3] util/userfaultfd: Add uffd_open()

2023-01-25 Thread Philippe Mathieu-Daudé


On 25/1/23 23:40, Peter Xu wrote:

Add a helper to create the uffd handle.

Signed-off-by: Peter Xu 
---
  include/qemu/userfaultfd.h   |  1 +
  migration/postcopy-ram.c | 11 +--
  tests/qtest/migration-test.c |  3 ++-
  util/userfaultfd.c   | 13 +++--
  4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/include/qemu/userfaultfd.h b/include/qemu/userfaultfd.h
index 6b74f92792..a19a05d5f7 100644
--- a/include/qemu/userfaultfd.h
+++ b/include/qemu/userfaultfd.h
@@ -17,6 +17,7 @@
  #include "exec/hwaddr.h"
  #include 
  
+int uffd_open(int flags);


Preferably documenting what this function returns:
Reviewed-by: Philippe Mathieu-Daudé

Re: [QEMU][PATCH v4 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-01-25 Thread Philippe Mathieu-Daudé


On 25/1/23 09:53, Vikram Garhwal wrote:

xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Moving it out from hw/i386/xen to hw/xen to make it
accessible for both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  hw/i386/meson.build  | 1 +
  hw/i386/xen/meson.build  | 1 -
  hw/i386/xen/trace-events | 5 -
  hw/xen/meson.build   | 4 
  hw/xen/trace-events  | 5 +
  hw/{i386 => }/xen/xen-mapcache.c | 0
  6 files changed, 10 insertions(+), 6 deletions(-)
  rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
  subdir('xen')
  
  i386_ss.add_all(xenpv_ss)

+i386_ss.add_all(xen_ss)
  
  hw_arch += {'i386': i386_ss}

diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..2fcc46e6ca 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
  i386_ss.add(when: 'CONFIG_XEN', if_true: files(
'xen-hvm.c',
-  'xen-mapcache.c',
'xen_apic.c',
'xen_platform.c',
'xen_pvdevice.c',
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: 
%p"
  cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t 
data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
  cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t 
data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
  
-# xen-mapcache.c

-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index ae0ace3046..19d0637c46 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -22,3 +22,7 @@ else
  endif
  
  specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)

+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))


Can't we add it to softmmu_ss directly?


diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 3da3fd8348..2c8f238f42 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
  xs_node_vscanf(char *path, char *value) "%s %s"
  xs_node_watch(char *path) "%s"
  xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c

Re: [PATCH v4 34/36] target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b

2023-01-25 Thread Philippe Mathieu-Daudé


On 8/1/23 03:37, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/i386/tcg/translate.c | 48 -
  1 file changed, 31 insertions(+), 17 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-25 Thread Peter Xu

The new /dev/userfaultfd handle is superior to the system call: it offers
better permission control and also works in a restricted seccomp
environment.

The new device was only introduced in v6.1 so we need a header update.

Please have a look, thanks.

Peter Xu (3):
  linux-headers: Update to v6.1
  util/userfaultfd: Add uffd_open()
  util/userfaultfd: Support /dev/userfaultfd

 include/qemu/userfaultfd.h|   1 +
 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 migration/postcopy-ram.c  |  11 +-
 tests/qtest/migration-test.c  |   3 +-
 util/trace-events |   1 +
 util/userfaultfd.c|  49 +-
 18 files changed, 354 insertions(+), 29 deletions(-)

-- 
2.37.3

[PATCH 2/3] util/userfaultfd: Add uffd_open()

2023-01-25 Thread Peter Xu

Add a helper to create the uffd handle.

Signed-off-by: Peter Xu 
---
 include/qemu/userfaultfd.h   |  1 +
 migration/postcopy-ram.c | 11 +--
 tests/qtest/migration-test.c |  3 ++-
 util/userfaultfd.c   | 13 +++--
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/include/qemu/userfaultfd.h b/include/qemu/userfaultfd.h
index 6b74f92792..a19a05d5f7 100644
--- a/include/qemu/userfaultfd.h
+++ b/include/qemu/userfaultfd.h
@@ -17,6 +17,7 @@
 #include "exec/hwaddr.h"
 #include 
 
+int uffd_open(int flags);
 int uffd_query_features(uint64_t *features);
 int uffd_create_fd(uint64_t features, bool non_blocking);
 void uffd_close_fd(int uffd_fd);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b9a37ef255..0c55df0e52 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -37,6 +37,7 @@
 #include "qemu-file.h"
 #include "yank_functions.h"
 #include "tls.h"
+#include "qemu/userfaultfd.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -226,11 +227,9 @@ static bool receive_ufd_features(uint64_t *features)
 int ufd;
 bool ret = true;
 
-/* if we are here __NR_userfaultfd should exists */
-ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+ufd = uffd_open(O_CLOEXEC);
 if (ufd == -1) {
-error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
- strerror(errno));
+error_report("%s: uffd_open() failed: %s", __func__, strerror(errno));
 return false;
 }
 
@@ -375,7 +374,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState 
*mis)
 goto out;
 }
 
-ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+ufd = uffd_open(O_CLOEXEC);
 if (ufd == -1) {
 error_report("%s: userfaultfd not available: %s", __func__,
  strerror(errno));
@@ -1160,7 +1159,7 @@ static int 
postcopy_temp_pages_setup(MigrationIncomingState *mis)
 int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 {
 /* Open the fd for the kernel to give us userfaults */
-mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+mis->userfault_fd = uffd_open(O_CLOEXEC | O_NONBLOCK);
 if (mis->userfault_fd == -1) {
 error_report("%s: Failed to open userfault fd: %s", __func__,
  strerror(errno));
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1dd32c9506..7a5d1922dd 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -62,13 +62,14 @@ static bool uffd_feature_thread_id;
 #include 
 #include 
 #include 
+#include "qemu/userfaultfd.h"
 
 static bool ufd_version_check(void)
 {
 struct uffdio_api api_struct;
 uint64_t ioctl_mask;
 
-int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+int ufd = uffd_open(O_CLOEXEC);
 
 if (ufd == -1) {
 g_test_message("Skipping test: userfaultfd not available");
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index f1cd6af2b1..9845a2ec81 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -19,6 +19,15 @@
 #include 
 #include 
 
+int uffd_open(int flags)
+{
+#if defined(__linux__) && defined(__NR_userfaultfd)
+return syscall(__NR_userfaultfd, flags);
+#else
+return -EINVAL;
+#endif
+}
+
 /**
  * uffd_query_features: query UFFD features
  *
@@ -32,7 +41,7 @@ int uffd_query_features(uint64_t *features)
 struct uffdio_api api_struct = { 0 };
 int ret = -1;
 
-uffd_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+uffd_fd = uffd_open(O_CLOEXEC);
 if (uffd_fd < 0) {
 trace_uffd_query_features_nosys(errno);
 return -1;
@@ -69,7 +78,7 @@ int uffd_create_fd(uint64_t features, bool non_blocking)
 uint64_t ioctl_mask = BIT(_UFFDIO_REGISTER) | BIT(_UFFDIO_UNREGISTER);
 
 flags = O_CLOEXEC | (non_blocking ? O_NONBLOCK : 0);
-uffd_fd = syscall(__NR_userfaultfd, flags);
+uffd_fd = uffd_open(flags);
 if (uffd_fd < 0) {
 trace_uffd_create_fd_nosys(errno);
 return -1;
-- 
2.37.3

[PATCH 1/3] linux-headers: Update to v6.1

2023-01-25 Thread Peter Xu

Signed-off-by: Peter Xu 
---
 include/standard-headers/drm/drm_fourcc.h |  34 -
 include/standard-headers/linux/ethtool.h  |  63 +++-
 include/standard-headers/linux/fuse.h |   6 +-
 .../linux/input-event-codes.h |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h|  26 ++--
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   4 +
 linux-headers/linux/kvm.h |   1 +
 linux-headers/linux/psci.h|  14 ++
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 142 ++
 13 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..b868488f93 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID 0
 
 /* color index */
+#define DRM_FORMAT_C1  fourcc_code('C', '1', ' ', ' ') /* [7:0] 
C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2  fourcc_code('C', '2', ' ', ' ') /* [7:0] 
C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4  fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 
4:4 two pixels/byte */
 #define DRM_FORMAT_C8  fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D1  fourcc_code('D', '1', ' ', ' ') /* [7:0] 
D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D2  fourcc_code('D', '2', ' ', ' ') /* [7:0] 
D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D4  fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 
4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D8  fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1  fourcc_code('R', '1', ' ', ' ') /* [7:0] 
R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2  fourcc_code('R', '2', ' ', ' ') /* [7:0] 
R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4  fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 
4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 
6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12 fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 
4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R 
little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUYfourcc_code('V', 'Y', 'U', 'Y') /* 
[31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* 
[31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] 
A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUVfourcc_code('X', 'Y', 'U', 'V') /* [31:0] 
X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] 
X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888  fourcc_code('V', 'U', '2', '4') /* [23:0] 
Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010   fourcc_code('V', 'U', '3', '0') /* Y followed 
by U then V, 10:10:10. Non-linear modifier only */
 
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 4537da20cc..1dc56cdc0a 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -736,6 +736,51 @@ enum ethtool_module_power_mode {
ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/**
+ * enum ethtool_podl_pse_admin_state - operational state of the PoDL PSE
+ * functions. IEEE 802.3-

[PATCH 3/3] util/userfaultfd: Support /dev/userfaultfd

2023-01-25 Thread Peter Xu

Teach QEMU to use /dev/userfaultfd when it exists and fall back to the
system call if either it's not there or doesn't have enough permission.

Firstly, as long as the app has permission to access /dev/userfaultfd, it
always has the ability to trap kernel faults which QEMU mostly wants.
Meanwhile, in some context (e.g. containers) the userfaultfd syscall can be
forbidden, so it can be the major way to use postcopy in a restricted
environment with strict seccomp setup.

Signed-off-by: Peter Xu 
---
 util/trace-events  |  1 +
 util/userfaultfd.c | 36 
 2 files changed, 37 insertions(+)

diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9f..16f78d8fe5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, 
uint64_t region_siz
 qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, 
int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 
0x%x host %p"
 
 #userfaultfd.c
+uffd_detect_open_mode(int mode) "%d"
 uffd_query_features_nosys(int err) "errno: %i"
 uffd_query_features_api_failed(int err) "errno: %i"
 uffd_create_fd_nosys(int err) "errno: %i"
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index 9845a2ec81..360ecf8084 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -18,10 +18,46 @@
 #include 
 #include 
 #include 
+#include 
+
+typedef enum {
+UFFD_UNINITIALIZED = 0,
+UFFD_USE_DEV_PATH,
+UFFD_USE_SYSCALL,
+} uffd_open_mode;
+
+static uffd_open_mode open_mode;
+static int uffd_dev;
+
+static uffd_open_mode uffd_detect_open_mode(void)
+{
+if (open_mode == UFFD_UNINITIALIZED) {
+/*
+ * Make /dev/userfaultfd the default approach because it has better
+ * permission controls, meanwhile allows kernel faults without any
+ * privilege requirement (e.g. SYS_CAP_PTRACE).
+ */
+uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+if (uffd_dev >= 0) {
+open_mode = UFFD_USE_DEV_PATH;
+} else {
+/* Fallback to the system call */
+open_mode = UFFD_USE_SYSCALL;
+}
+trace_uffd_detect_open_mode(open_mode);
+}
+
+return open_mode;
+}
 
 int uffd_open(int flags)
 {
 #if defined(__linux__) && defined(__NR_userfaultfd)
+if (uffd_detect_open_mode() == UFFD_USE_DEV_PATH) {
+assert(uffd_dev >= 0);
+return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
+}
+
 return syscall(__NR_userfaultfd, flags);
 #else
 return -EINVAL;
-- 
2.37.3
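
The detect-once-then-fall-back logic in uffd_detect_open_mode() can be tried outside QEMU. What follows is only a minimal sketch of that pattern, not the patch itself: the type and function names (probe_mode, probe_cache, detect_open_mode) are hypothetical, the device path is passed in as a parameter so that any file can stand in for /dev/userfaultfd, and the USERFAULTFD_IOC_NEW ioctl and the syscall fallback are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical names; they mirror the patch but are not QEMU code. */
typedef enum {
    PROBE_UNINITIALIZED = 0,
    PROBE_USE_DEV_PATH,
    PROBE_USE_SYSCALL,
} probe_mode;

typedef struct {
    probe_mode mode;   /* cached result of the first probe */
    int dev_fd;        /* valid only when mode == PROBE_USE_DEV_PATH */
} probe_cache;

/*
 * Probe dev_path once and remember the outcome. Later calls return the
 * cached mode without touching the filesystem again, even if a different
 * path is passed in.
 */
static probe_mode detect_open_mode(probe_cache *cache, const char *dev_path)
{
    if (cache->mode == PROBE_UNINITIALIZED) {
        cache->dev_fd = open(dev_path, O_RDWR | O_CLOEXEC);
        if (cache->dev_fd >= 0) {
            cache->mode = PROBE_USE_DEV_PATH;
        } else {
            /* Device node missing or not permitted: use the fallback. */
            cache->mode = PROBE_USE_SYSCALL;
        }
    }
    return cache->mode;
}
```

Because the result is cached, a device node that appears after the first probe is not picked up. The patch has the same property through its static open_mode variable.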

Re: [QEMU][PATCH v4 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-01-25 Thread Vikram Garhwal


Hi Stefano,

On 1/25/23 1:55 PM, Stefano Stabellini wrote:

On Wed, 25 Jan 2023, Vikram Garhwal wrote:

From: Stefano Stabellini 

This patch does following:
1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
 preparation for moving most of xen-hvm code to an arch-neutral location,
 move the x86-specific portion of xen_set_memory to arch_xen_set_memory.
 Also, move handle_vmport_ioreq to arch_handle_ioreq.

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
 Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
 hw/xen/xen-hvm-common.c. These common functions are useful for creating
 an IOREQ server.

 xen_hvm_init_pc() contains the architecture independent code for creating
 and mapping an IOREQ server, connecting memory and IO listeners, initializing
 a xen bus and registering backends. Moved this common xen code to a new
 function xen_register_ioreq() which can be used by both x86 and ARM machines.

 Following functions are moved to hw/xen/xen-hvm-common.c:
 xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
 xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
 xen_device_realize(), xen_device_unrealize(),
 cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
 do_outp(), rw_phys_req_item(), read_phys_req_item(),
 write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
 cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
 handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
 xen_hvm_change_state_handler(), xen_exit_notifier(),
 xen_map_ioreq_server(), destroy_hvm_domain() and
 xen_shutdown_fatal_error()

3. Removed static type from below functions:
 1. xen_region_add()
 2. xen_region_del()
 3. xen_io_add()
 4. xen_io_del()
 5. xen_device_realize()
 6. xen_device_unrealize()
 7. xen_hvm_change_state_handler()
 8. cpu_ioreq_pio()
 9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 

One comment below

[...]


+void xen_exit_notifier(Notifier *n, void *data)
+{
+XenIOState *state = container_of(n, XenIOState, exit);
+
+xen_destroy_ioreq_server(xen_domid, state->ioservid);

In the original code we had:

-if (state->fres != NULL) {
-xenforeignmemory_unmap_resource(xen_fmem, state->fres);
-}

Should we add it here?


I went through the manual process of comparing all the code additions
and deletions (not fun!) and everything checks out except for this.

Thanks for catching this. There were two recent commits upstream and
I missed them. I rechecked, and there are actually three other lines
which need updating. I will address this in v5.



+xenevtchn_close(state->xce_handle);
+xs_daemon_close(state->xenstore);
+}

Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Stefan Hajnoczi

On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov 
> ---
>  hw/9pfs/xen-9p-backend.c| 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c   | 5 +++--
>  hw/char/virtio-serial-bus.c | 3 ++-
>  hw/display/qxl.c| 9 ++---
>  hw/display/virtio-gpu.c | 6 --
>  hw/ide/ahci.c   | 3 ++-
>  hw/ide/core.c   | 3 ++-
>  hw/misc/imx_rngc.c  | 6 --
>  hw/misc/macio/mac_dbdma.c   | 2 +-
>  hw/net/virtio-net.c | 3 ++-
>  hw/nvme/ctrl.c  | 6 --
>  hw/scsi/mptsas.c| 3 ++-
>  hw/scsi/scsi-bus.c  | 3 ++-
>  hw/scsi/vmw_pvscsi.c| 3 ++-
>  hw/usb/dev-uas.c| 3 ++-
>  hw/usb/hcd-dwc2.c   | 3 ++-
>  hw/usb/hcd-ehci.c   | 3 ++-
>  hw/usb/hcd-uhci.c   | 2 +-
>  hw/usb/host-libusb.c| 6 --
>  hw/usb/redirect.c   | 6 --
>  hw/usb/xen-usb.c| 3 ++-
>  hw/virtio/virtio-balloon.c  | 5 +++--
>  hw/virtio/virtio-crypto.c   | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Should scripts/checkpatch.pl complain when qemu_bh_new() or aio_bh_new()
are called from hw/? Adding a check is important so new instances cannot
be added accidentally in the future.

Stefan



Re: [QEMU][PATCH v4 09/10] hw/arm: introduce xenpvh machine

2023-01-25 Thread Stefano Stabellini

On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> Add a new machine xenpvh which creates an IOREQ server to register/connect with
> Xen Hypervisor.
> 
> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
> TPM emulator and connects to swtpm running on host machine via chardev socket
> and supports TPM functionalities for a guest domain.
> 
> Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
> -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
> -tpmdev emulator,id=tpm0,chardev=chrtpm \
> -machine tpm-base-addr=0x0c00 \
> 
> swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
> provides access to TPM functionality over socket, chardev and CUSE interface.
> Github repo: https://github.com/stefanberger/swtpm
> Example for starting swtpm on host machine:
> mkdir /tmp/vtpm2
> swtpm socket --tpmstate dir=/tmp/vtpm2 \
> --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
> 
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  docs/system/arm/xenpvh.rst|  34 +++
>  docs/system/target-arm.rst|   1 +
>  hw/arm/meson.build|   2 +
>  hw/arm/xen_arm.c  | 184 ++
>  include/hw/arm/xen_arch_hvm.h |   9 ++
>  include/hw/xen/arch_hvm.h |   2 +
>  6 files changed, 232 insertions(+)
>  create mode 100644 docs/system/arm/xenpvh.rst
>  create mode 100644 hw/arm/xen_arm.c
>  create mode 100644 include/hw/arm/xen_arch_hvm.h
> 
> diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
> new file mode 100644
> index 00..e1655c7ab8
> --- /dev/null
> +++ b/docs/system/arm/xenpvh.rst
> @@ -0,0 +1,34 @@
> +XENPVH (``xenpvh``)
> +=
> +This machine creates an IOREQ server to register/connect with Xen Hypervisor.
> +
> +When TPM is enabled, this machine also creates a tpm-tis-device at a user input
> +tpm base address, adds a TPM emulator and connects to a swtpm application
> +running on host machine via chardev socket. This enables xenpvh to support TPM
> +functionalities for a guest domain.
> +
> +More information about TPM use and installing swtpm linux application can be
> +found at: docs/specs/tpm.rst.
> +
> +Example for starting swtpm on host machine:
> +.. code-block:: console
> +
> +mkdir /tmp/vtpm2
> +swtpm socket --tpmstate dir=/tmp/vtpm2 \
> +--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
> +
> +Sample QEMU xenpvh commands for running and connecting with Xen:
> +.. code-block:: console
> +
> +qemu-system-aarch64 -xen-domid 1 \
> +-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
> +-mon chardev=libxl-cmd,mode=control \
> +-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off \
> +-mon chardev=libxenstat-cmd,mode=control \
> +-xen-attach -name guest0 -vnc none -display none -nographic \
> +-machine xenpvh -m 1301 \
> +-chardev socket,id=chrtpm,path=/tmp/vtpm2/swtpm-sock \
> +-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
> +
> +In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
> +via chardev socket.
> diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
> index 91ebc26c6d..af8d7c77d6 100644
> --- a/docs/system/target-arm.rst
> +++ b/docs/system/target-arm.rst
> @@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
> arm/stm32
> arm/virt
> arm/xlnx-versal-virt
> +   arm/xenpvh
>  
>  Emulated CPU architecture support
>  =
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index b036045603..06bddbfbb8 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -61,6 +61,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
> files('fsl-imx7.c', 'mcimx7d-sabre.
>  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
>  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
> +arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
> +arm_ss.add_all(xen_ss)
>  
>  softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
>  softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
> diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
> new file mode 100644
> index 00..12b19e3609
> --- /dev/null
> +++ b/hw/arm/xen_arm.c
> @@ -0,0 +1,184 @@
> +/*
> + * QEMU ARM Xen PV Machine
   ^ PVH


> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to w

Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Stefan Hajnoczi

On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov 
> ---
>  hw/9pfs/xen-9p-backend.c| 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c   | 5 +++--
>  hw/char/virtio-serial-bus.c | 3 ++-
>  hw/display/qxl.c| 9 ++---
>  hw/display/virtio-gpu.c | 6 --
>  hw/ide/ahci.c   | 3 ++-
>  hw/ide/core.c   | 3 ++-
>  hw/misc/imx_rngc.c  | 6 --
>  hw/misc/macio/mac_dbdma.c   | 2 +-
>  hw/net/virtio-net.c | 3 ++-
>  hw/nvme/ctrl.c  | 6 --
>  hw/scsi/mptsas.c| 3 ++-
>  hw/scsi/scsi-bus.c  | 3 ++-
>  hw/scsi/vmw_pvscsi.c| 3 ++-
>  hw/usb/dev-uas.c| 3 ++-
>  hw/usb/hcd-dwc2.c   | 3 ++-
>  hw/usb/hcd-ehci.c   | 3 ++-
>  hw/usb/hcd-uhci.c   | 2 +-
>  hw/usb/host-libusb.c| 6 --
>  hw/usb/redirect.c   | 6 --
>  hw/usb/xen-usb.c| 3 ++-
>  hw/virtio/virtio-balloon.c  | 5 +++--
>  hw/virtio/virtio-crypto.c   | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v4 5/5] hw/nvram/eeprom_at24c: Make reset behavior more like hardware

2023-01-25 Thread Peter Delevoryas

On Wed, Jan 25, 2023 at 03:41:30PM -0600, Corey Minyard wrote:
> On Tue, Jan 17, 2023 at 06:42:14PM -0800, Peter Delevoryas wrote:
> > EEPROMs are a form of non-volatile memory. After power-cycling an EEPROM,
> > I would expect the I2C state machine to be reset to default values, but I
> > wouldn't really expect the memory to change at all.
> 
> Yes, I agree, I was actually wondering about this reviewing earlier
> changes.  Thanks for fixing this.

Oh great! I'm glad everyone has agreed with this so far.

- Peter

> 
> Reviewed-by: Corey Minyard 
> 
> > 
> > The current implementation of the at24c EEPROM resets its internal memory on
> > reset. This matches the specification in docs/devel/reset.rst:
> > 
> >   Cold reset is supported by every resettable object. In QEMU, it means we reset
> >   to the initial state corresponding to the start of QEMU; this might differ
> >   from what is a real hardware cold reset. It differs from other resets (like
> >   warm or bus resets) which may keep certain parts untouched.
> > 
> > But differs from my intuition. For example, if someone writes some information
> > to an EEPROM, then AC power cycles their board, they would expect the EEPROM to
> > retain that information. It's very useful to be able to test things like this
> > in QEMU as well, to verify software instrumentation like determining the cause
> > of a reboot.
> > 
> > Fixes: 5d8424dbd3e8 ("nvram: add AT24Cx i2c eeprom")
> > Signed-off-by: Peter Delevoryas 
> > Reviewed-by: Joel Stanley 
> > ---
> >  hw/nvram/eeprom_at24c.c | 22 ++
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> > index f8d751fa278d..5074776bff04 100644
> > --- a/hw/nvram/eeprom_at24c.c
> > +++ b/hw/nvram/eeprom_at24c.c
> > @@ -185,18 +185,6 @@ static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
> >  }
> >  
> >  ee->mem = g_malloc0(ee->rsize);
> > -
> > -}
> > -
> > -static
> > -void at24c_eeprom_reset(DeviceState *state)
> > -{
> > -EEPROMState *ee = AT24C_EE(state);
> > -
> > -ee->changed = false;
> > -ee->cur = 0;
> > -ee->haveaddr = 0;
> > -
> >  memset(ee->mem, 0, ee->rsize);
> >  
> >  if (ee->init_rom) {
> > @@ -214,6 +202,16 @@ void at24c_eeprom_reset(DeviceState *state)
> >  }
> >  }
> >  
> > +static
> > +void at24c_eeprom_reset(DeviceState *state)
> > +{
> > +EEPROMState *ee = AT24C_EE(state);
> > +
> > +ee->changed = false;
> > +ee->cur = 0;
> > +ee->haveaddr = 0;
> > +}
> > +
> >  static Property at24c_eeprom_props[] = {
> >  DEFINE_PROP_UINT32("rom-size", EEPROMState, rsize, 0),
> >  DEFINE_PROP_BOOL("writable", EEPROMState, writable, true),
> > -- 
> > 2.39.0
> > 
> >
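
The split described in the commit message (memory is set up once at realize time, reset touches only the I2C state machine) can be modelled in a few lines. This is a toy sketch under stated assumptions, not the QEMU device: ToyEeprom, toy_eeprom_realize() and toy_eeprom_reset() are hypothetical names, the field names are borrowed from the patch, and the block backend is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the device state; field names follow the patch. */
typedef struct {
    uint8_t *mem;      /* non-volatile contents, survives reset */
    uint32_t rsize;    /* size of mem in bytes */
    uint16_t cur;      /* current address pointer (volatile) */
    uint8_t haveaddr;  /* number of address bytes received (volatile) */
    bool changed;      /* dirty flag (volatile) */
} ToyEeprom;

/* One-time setup: allocate zeroed memory and copy the initial image. */
static int toy_eeprom_realize(ToyEeprom *ee, uint32_t rsize,
                              const uint8_t *init_rom, uint32_t init_rom_size)
{
    if (init_rom_size > rsize) {
        return -1;              /* image does not fit */
    }
    ee->mem = calloc(rsize, 1);
    if (!ee->mem) {
        return -1;
    }
    ee->rsize = rsize;
    if (init_rom) {
        memcpy(ee->mem, init_rom, init_rom_size);
    }
    ee->cur = 0;
    ee->haveaddr = 0;
    ee->changed = false;
    return 0;
}

/* Reset: only the state machine goes back to defaults, mem is untouched. */
static void toy_eeprom_reset(ToyEeprom *ee)
{
    ee->changed = false;
    ee->cur = 0;
    ee->haveaddr = 0;
}
```

With this split, a byte written by the guest before a reset is still readable afterwards, which is the power-cycle behaviour the commit message argues for.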

Re: [QEMU][PATCH v4 07/10] hw/xen/xen-hvm-common: Use g_new and error_setg_errno

2023-01-25 Thread Stefano Stabellini

On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> Replace g_malloc with g_new and perror with error_setg_errno.
> 
> Signed-off-by: Vikram Garhwal 
> ---
>  hw/xen/xen-hvm-common.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index 94dbbe97ed..01c8ec1956 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -34,7 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
>  trace_xen_ram_alloc(ram_addr, size);
>  
>  nr_pfn = size >> TARGET_PAGE_BITS;
> -pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
> +pfn_list = g_new(xen_pfn_t, nr_pfn);
>  
>  for (i = 0; i < nr_pfn; i++) {
>  pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
> @@ -726,7 +726,7 @@ void destroy_hvm_domain(bool reboot)
>  return;
>  }
>  if (errno != ENOTTY /* old Xen */) {
> -perror("xendevicemodel_shutdown failed");
> +error_report("xendevicemodel_shutdown failed with error %d", errno);

You can use strerror(errno), here and below.

Either way:

Reviewed-by: Stefano Stabellini 



>  }
>  /* well, try the old thing then */
>  }
> @@ -797,7 +797,7 @@ static void xen_do_ioreq_register(XenIOState *state,
>  }
>  
>  /* Note: cpus is empty at this point in init */
> -state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
> +state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
>  
>  rc = xen_set_ioreq_server_state(xen_domid, state->ioservid, true);
>  if (rc < 0) {
> @@ -806,7 +806,7 @@ static void xen_do_ioreq_register(XenIOState *state,
>  goto err;
>  }
>  
> -state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
> +state->ioreq_local_port = g_new0(evtchn_port_t, max_cpus);
>  
>  /* FIXME: how about if we overflow the page here? */
>  for (i = 0; i < max_cpus; i++) {
> @@ -860,13 +860,13 @@ void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
>  
>  state->xce_handle = xenevtchn_open(NULL, 0);
>  if (state->xce_handle == NULL) {
> -perror("xen: event channel open");
> +error_report("xen: event channel open failed with error %d", errno);
>  goto err;
>  }
>  
>  state->xenstore = xs_daemon_open();
>  if (state->xenstore == NULL) {
> -perror("xen: xenstore open");
> +error_report("xen: xenstore open failed with error %d", errno);
>  goto err;
>  }
>  
> -- 
> 2.17.0
> 
>
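
To illustrate the strerror(errno) suggestion made in the review comment above: the sketch below formats an error string that carries both the readable text and the numeric value. The helper name format_errno_message() is hypothetical and the code is plain C rather than QEMU code. In the patch, the same format string and arguments would presumably be passed to error_report() instead of snprintf().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<what> failed: <strerror text> (errno N)" into buf.
 * Returns the number of characters snprintf() would have written.
 * The caller passes a saved errno value so that later library calls
 * cannot overwrite it before the message is built.
 */
static int format_errno_message(char *buf, size_t size,
                                const char *what, int err)
{
    return snprintf(buf, size, "%s failed: %s (errno %d)",
                    what, strerror(err), err);
}
```

Saving errno in a local variable right after the failing call, then passing that copy, keeps the reported value from being changed by intervening calls.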

Re: [PATCH v4 3/5] hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper

2023-01-25 Thread Peter Delevoryas

On Wed, Jan 25, 2023 at 03:36:23PM -0600, Corey Minyard wrote:
> On Tue, Jan 17, 2023 at 06:42:12PM -0800, Peter Delevoryas wrote:
> > Allows users to specify binary data to initialize an EEPROM, allowing users to
> > emulate data programmed at manufacturing time.
> > 
> > - Added init_rom and init_rom_size attributes to TYPE_AT24C_EE
> > - Added at24c_eeprom_init_rom helper function to initialize attributes
> > - If -drive property is provided, it overrides init_rom data
> > 
> > Signed-off-by: Peter Delevoryas 
> > Reviewed-by: Joel Stanley 
> > ---
> >  hw/nvram/eeprom_at24c.c | 37 -
> >  include/hw/nvram/eeprom_at24c.h | 16 ++
> >  2 files changed, 48 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> > index 98857e3626b9..f8d751fa278d 100644
> > --- a/hw/nvram/eeprom_at24c.c
> > +++ b/hw/nvram/eeprom_at24c.c
> > @@ -50,6 +50,9 @@ struct EEPROMState {
> >  uint8_t *mem;
> >  
> >  BlockBackend *blk;
> > +
> > +const uint8_t *init_rom;
> > +uint32_t init_rom_size;
> >  };
> >  
> >  static
> > @@ -131,19 +134,38 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
> >  
> >  I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size)
> >  {
> > -I2CSlave *i2c_dev = i2c_slave_new(TYPE_AT24C_EE, address);
> > -DeviceState *dev = DEVICE(i2c_dev);
> > +return at24c_eeprom_init_rom(bus, address, rom_size, NULL, 0);
> > +}
> > +
> > +I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, uint8_t address, uint32_t rom_size,
> > +const uint8_t *init_rom, uint32_t init_rom_size)
> > +{
> > +EEPROMState *s;
> > +
> > +s = AT24C_EE(qdev_new(TYPE_AT24C_EE));
> > +
> > +qdev_prop_set_uint8(DEVICE(s), "address", address);
> 
> Why did you switch from using i2c_slave_new()?  Using it is more
> documentation and future-proofing than convenience.

Oh, yeah that's my bad. I was probably doing it so that all the qdev_prop_set
calls to the object are in the same place, but I probably should have just kept
i2c_slave_new() and initialized only the at24c-eeprom properties here, instead
of initializing the I2CSlave ones too.

- Peter

> 
> Other than that, looks good to me.
> 
> Reviewed-by: Corey Minyard 
> 
> > +qdev_prop_set_uint32(DEVICE(s), "rom-size", rom_size);
> >  
> > -qdev_prop_set_uint32(dev, "rom-size", rom_size);
> > -i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
> > +/* TODO: Model init_rom with QOM properties. */
> > +s->init_rom = init_rom;
> > +s->init_rom_size = init_rom_size;
> >  
> > -return i2c_dev;
> > +i2c_slave_realize_and_unref(I2C_SLAVE(s), bus, &error_abort);
> > +
> > +return I2C_SLAVE(s);
> >  }
> >  
> >  static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
> >  {
> >  EEPROMState *ee = AT24C_EE(dev);
> >  
> > +if (ee->init_rom_size > ee->rsize) {
> > +error_setg(errp, "%s: init rom is larger than rom: %u > %u",
> > +   TYPE_AT24C_EE, ee->init_rom_size, ee->rsize);
> > +return;
> > +}
> > +
> >  if (ee->blk) {
> >  int64_t len = blk_getlength(ee->blk);
> >  
> > @@ -163,6 +185,7 @@ static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
> >  }
> >  
> >  ee->mem = g_malloc0(ee->rsize);
> > +
> >  }
> >  
> >  static
> > @@ -176,6 +199,10 @@ void at24c_eeprom_reset(DeviceState *state)
> >  
> >  memset(ee->mem, 0, ee->rsize);
> >  
> > +if (ee->init_rom) {
> > +memcpy(ee->mem, ee->init_rom, MIN(ee->init_rom_size, ee->rsize));
> > +}
> > +
> >  if (ee->blk) {
> >  int ret = blk_pread(ee->blk, 0, ee->rsize, ee->mem, 0);
> >  
> > diff --git a/include/hw/nvram/eeprom_at24c.h b/include/hw/nvram/eeprom_at24c.h
> > index 196db309d451..acb9857b2add 100644
> > --- a/include/hw/nvram/eeprom_at24c.h
> > +++ b/include/hw/nvram/eeprom_at24c.h
> > @@ -20,4 +20,20 @@
> >   */
> >  I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size);
> >  
> > +
> > +/*
> > + * Create and realize an AT24C EEPROM device on the heap with initial data.
> > + * @bus: I2C bus to put it on
> > + * @address: I2C address of the EEPROM slave when put on a bus
> > + * @rom_size: size of the EEPROM
> > + * @init_rom: Array of bytes to initialize EEPROM memory with
> > + * @init_rom_size: Size of @init_rom, must be less than or equal to @rom_size
> > + *
> > + * Create the device state structure, initialize it, put it on the specified
> > + * @bus, and drop the reference to it (the device is realized). Copies the data
> > + * from @init_rom to the beginning of the EEPROM memory buffer.
> > + */
> > +I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, uint8_t address, uint32_t rom_size,
> > +const uint8_t *init_rom, uint32_t init_rom_size);
> > +
> >  #endif
> > -- 
> > 2.39.0
> > 
> >
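
The fill order this patch sets up (zero the memory, copy init_rom to the start, then let a -drive backend override it) can be sketched without any QEMU types. The helper name eeprom_fill() and the plain byte buffer standing in for the block backend are hypothetical. The requirement that the drive image covers the whole ROM is an assumption of this sketch, since the corresponding length check is not visible in the quoted hunks.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill mem (rom_size bytes) in the same order as the patch:
 *   1. zero everything,
 *   2. copy init_rom to the beginning, if present,
 *   3. overwrite with the drive contents, if present.
 * Returns 0 on success, -1 if an image does not fit the ROM.
 */
static int eeprom_fill(uint8_t *mem, uint32_t rom_size,
                       const uint8_t *init_rom, uint32_t init_rom_size,
                       const uint8_t *drive, uint32_t drive_size)
{
    if (init_rom_size > rom_size) {
        return -1;      /* mirrors the "init rom is larger than rom" check */
    }

    memset(mem, 0, rom_size);

    if (init_rom) {
        memcpy(mem, init_rom, init_rom_size);
    }

    if (drive) {
        if (drive_size < rom_size) {
            return -1;  /* assumption: drive must cover the whole ROM */
        }
        memcpy(mem, drive, rom_size);
    }

    return 0;
}
```

Doing the size check before any copy means a rejected image leaves the buffer untouched, which matches the patch returning from realize before g_malloc0() is reached.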

Re: [QEMU][PATCH v4 06/10] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2023-01-25 Thread Stefano Stabellini

On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails continue
> to the PV backends initialization.
> 
> Also, moved the IOREQ registration and mapping subroutine to new function
> xen_do_ioreq_register().
> 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

as per my previous reply, even though I am listed as co-author, for
tracking that I did review this version of the patch:

Reviewed-by: Stefano Stabellini 


> ---
>  hw/xen/xen-hvm-common.c | 53 -
>  1 file changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index e748d8d423..94dbbe97ed 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -777,25 +777,12 @@ err:
>  exit(1);
>  }
>  
> -void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
> -MemoryListener xen_memory_listener)
> +static void xen_do_ioreq_register(XenIOState *state,
> +   unsigned int max_cpus,
> +   MemoryListener xen_memory_listener)
>  {
>  int i, rc;
>  
> -state->xce_handle = xenevtchn_open(NULL, 0);
> -if (state->xce_handle == NULL) {
> -perror("xen: event channel open");
> -goto err;
> -}
> -
> -state->xenstore = xs_daemon_open();
> -if (state->xenstore == NULL) {
> -perror("xen: xenstore open");
> -goto err;
> -}
> -
> -xen_create_ioreq_server(xen_domid, &state->ioservid);
> -
>  state->exit.notify = xen_exit_notifier;
>  qemu_add_exit_notifier(&state->exit);
>  
> @@ -859,12 +846,44 @@ void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
>  QLIST_INIT(&state->dev_list);
>  device_listener_register(&state->device_listener);
>  
> +return;
> +
> +err:
> +error_report("xen hardware virtual machine initialisation failed");
> +exit(1);
> +}
> +
> +void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
> +MemoryListener xen_memory_listener)
> +{
> +int rc;
> +
> +state->xce_handle = xenevtchn_open(NULL, 0);
> +if (state->xce_handle == NULL) {
> +perror("xen: event channel open");
> +goto err;
> +}
> +
> +state->xenstore = xs_daemon_open();
> +if (state->xenstore == NULL) {
> +perror("xen: xenstore open");
> +goto err;
> +}
> +
> +rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
> +if (!rc) {
> +xen_do_ioreq_register(state, max_cpus, xen_memory_listener);
> +} else {
> +warn_report("xen: failed to create ioreq server");
> +}
> +
>  xen_bus_init();
>  
>  xen_register_backend(state);
>  
>  return;
> +
>  err:
> -error_report("xen hardware virtual machine initialisation failed");
> +error_report("xen hardware virtual machine backend registration failed");
>  exit(1);
>  }
> -- 
> 2.17.0
> 
>
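
The control flow this patch introduces (treat IOREQ server creation as optional, warn on failure, and carry on to backend registration either way) is shown here in isolation. All names in this sketch are hypothetical stand-ins (ToyXenState, toy_register_ioreq(), create_ok(), create_fail()); it only models the branching, not the Xen calls.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool ioreq_registered;     /* set only if the IOREQ server was created */
    bool backends_registered;  /* set unconditionally */
} ToyXenState;

/* Stand-in for xen_create_ioreq_server(): 0 on success, non-zero on failure. */
typedef int (*create_ioreq_fn)(void);

static int create_ok(void)   { return 0; }
static int create_fail(void) { return -1; }

static void toy_register_ioreq(ToyXenState *s, create_ioreq_fn create)
{
    int rc = create();

    if (!rc) {
        /* Corresponds to calling xen_do_ioreq_register() in the patch. */
        s->ioreq_registered = true;
    } else {
        /* Not fatal: a PV-only machine can run without an IOREQ server. */
        fprintf(stderr, "warning: failed to create ioreq server\n");
    }

    /* Corresponds to xen_bus_init() + xen_register_backend(). */
    s->backends_registered = true;
}
```

The caller can later inspect ioreq_registered to decide whether IOREQ-dependent paths are available.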

Re: [QEMU][PATCH v4 05/10] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2023-01-25 Thread Stefano Stabellini

On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> This is done to prepare for enabling xenpv support for ARM architecture.
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails,
> continue to the PV backends initialization.
> 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

I know I am co-author of the patch but just for record-keeping to
remember that I also reviewed this patch:

Reviewed-by: Stefano Stabellini 


> ---
>  include/hw/xen/xen_common.h | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 9a13a756ae..9ec69582b3 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -467,9 +467,10 @@ static inline void xen_unmap_pcidev(domid_t dom,
>  {
>  }
>  
> -static inline void xen_create_ioreq_server(domid_t dom,
> -   ioservid_t *ioservid)
> +static inline int xen_create_ioreq_server(domid_t dom,
> +  ioservid_t *ioservid)
>  {
> +return 0;
>  }
>  
>  static inline void xen_destroy_ioreq_server(domid_t dom,
> @@ -600,8 +601,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
>PCI_FUNC(pci_dev->devfn));
>  }
>  
> -static inline void xen_create_ioreq_server(domid_t dom,
> -   ioservid_t *ioservid)
> +static inline int xen_create_ioreq_server(domid_t dom,
> +  ioservid_t *ioservid)
>  {
>  int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
>  HVM_IOREQSRV_BUFIOREQ_ATOMIC,
> @@ -609,12 +610,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
>  
>  if (rc == 0) {
>  trace_xen_ioreq_server_create(*ioservid);
> -return;
> +return rc;
>  }
>  
>  *ioservid = 0;
>  use_default_ioreq_server = true;
>  trace_xen_default_ioreq_server();
> +
> +return rc;
>  }
>  
>  static inline void xen_destroy_ioreq_server(domid_t dom,
> -- 
> 2.17.0
> 
>

Re: [QEMU][PATCH v4 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-01-25 Thread Stefano Stabellini

On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> This patch does following:
> 1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
> preparation for moving most of xen-hvm code to an arch-neutral location,
> move the x86-specific portion of xen_set_memory to arch_xen_set_memory.
> Also, move handle_vmport_ioreq to arch_handle_ioreq.
> 
> 2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
> Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
> hw/xen/xen-hvm-common.c. These common functions are useful for creating
> an IOREQ server.
> 
> xen_hvm_init_pc() contains the architecture independent code for creating
> and mapping an IOREQ server, connecting memory and IO listeners, initializing
> a xen bus and registering backends. Moved this common xen code to a new
> function xen_register_ioreq() which can be used by both x86 and ARM machines.
> 
> Following functions are moved to hw/xen/xen-hvm-common.c:
> xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
> xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
> xen_device_realize(), xen_device_unrealize(),
> cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
> do_outp(), rw_phys_req_item(), read_phys_req_item(),
> write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
> cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
> handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
> xen_hvm_change_state_handler(), xen_exit_notifier(),
> xen_map_ioreq_server(), destroy_hvm_domain() and
> xen_shutdown_fatal_error()
> 
> 3. Removed static type from below functions:
> 1. xen_region_add()
> 2. xen_region_del()
> 3. xen_io_add()
> 4. xen_io_del()
> 5. xen_device_realize()
> 6. xen_device_unrealize()
> 7. xen_hvm_change_state_handler()
> 8. cpu_ioreq_pio()
> 9. xen_exit_notifier()
> 
> 4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.
> 
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 

One comment below

[...]

> +void xen_exit_notifier(Notifier *n, void *data)
> +{
> +XenIOState *state = container_of(n, XenIOState, exit);
> +
> +xen_destroy_ioreq_server(xen_domid, state->ioservid);

In the original code we had:

-if (state->fres != NULL) {
-xenforeignmemory_unmap_resource(xen_fmem, state->fres);
-}

Should we add it here?


I went through the manual process of comparing all the code additions
and deletions (not fun!) and everything checks out except for this.


> +xenevtchn_close(state->xce_handle);
> +xs_daemon_close(state->xenstore);
> +}

Re: [PATCH v4 00/36] tcg: Support for Int128 with helpers

2023-01-25 Thread Alex Bennée

Richard Henderson  writes:

> Changes for v4:
>   * About half of the v3 series has been merged,
>   * AArch64 host requires even argument register.
>   * target/{arm,ppc,s390x,i386} uses included here.

Have you got a branch or a new re-base? I tried applying but got messy
conflicts I couldn't cleanly resolve.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v4 5/5] hw/nvram/eeprom_at24c: Make reset behavior more like hardware

2023-01-25 Thread Corey Minyard

On Tue, Jan 17, 2023 at 06:42:14PM -0800, Peter Delevoryas wrote:
> EEPROMs are a form of non-volatile memory. After power-cycling an EEPROM,
> I would expect the I2C state machine to be reset to default values, but I
> wouldn't really expect the memory to change at all.

Yes, I agree, I was actually wondering about this reviewing earlier
changes.  Thanks for fixing this.

Reviewed-by: Corey Minyard 

> 
> The current implementation of the at24c EEPROM resets its internal memory on
> reset. This matches the specification in docs/devel/reset.rst:
> 
>   Cold reset is supported by every resettable object. In QEMU, it means we reset
>   to the initial state corresponding to the start of QEMU; this might differ
>   from what is a real hardware cold reset. It differs from other resets (like
>   warm or bus resets) which may keep certain parts untouched.
> 
> But differs from my intuition. For example, if someone writes some information
> to an EEPROM, then AC power cycles their board, they would expect the EEPROM to
> retain that information. It's very useful to be able to test things like this
> in QEMU as well, to verify software instrumentation like determining the cause
> of a reboot.
> 
> Fixes: 5d8424dbd3e8 ("nvram: add AT24Cx i2c eeprom")
> Signed-off-by: Peter Delevoryas 
> Reviewed-by: Joel Stanley 
> ---
>  hw/nvram/eeprom_at24c.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> index f8d751fa278d..5074776bff04 100644
> --- a/hw/nvram/eeprom_at24c.c
> +++ b/hw/nvram/eeprom_at24c.c
> @@ -185,18 +185,6 @@ static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
>  }
>  
>  ee->mem = g_malloc0(ee->rsize);
> -
> -}
> -
> -static
> -void at24c_eeprom_reset(DeviceState *state)
> -{
> -EEPROMState *ee = AT24C_EE(state);
> -
> -ee->changed = false;
> -ee->cur = 0;
> -ee->haveaddr = 0;
> -
>  memset(ee->mem, 0, ee->rsize);
>  
>  if (ee->init_rom) {
> @@ -214,6 +202,16 @@ void at24c_eeprom_reset(DeviceState *state)
>  }
>  }
>  
> +static
> +void at24c_eeprom_reset(DeviceState *state)
> +{
> +EEPROMState *ee = AT24C_EE(state);
> +
> +ee->changed = false;
> +ee->cur = 0;
> +ee->haveaddr = 0;
> +}
> +
>  static Property at24c_eeprom_props[] = {
>  DEFINE_PROP_UINT32("rom-size", EEPROMState, rsize, 0),
>  DEFINE_PROP_BOOL("writable", EEPROMState, writable, true),
> -- 
> 2.39.0
> 
>

Re: [PATCH v4 4/5] hw/arm/aspeed: Add aspeed_eeprom.c

2023-01-25 Thread Corey Minyard

On Tue, Jan 17, 2023 at 06:42:13PM -0800, Peter Delevoryas wrote:
> - Create aspeed_eeprom.c and aspeed_eeprom.h
> - Include aspeed_eeprom.c in CONFIG_ASPEED meson source files
> - Include aspeed_eeprom.h in aspeed.c
> - Add fby35_bmc_fruid data
> - Use new at24c_eeprom_init_rom helper to initialize BMC FRUID EEPROM with data
>   from aspeed_eeprom.c

Reviewed-by: Corey Minyard 

> 
> wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
> qemu-system-aarch64 -machine fby35-bmc -nographic -mtdblock fby35.mtd
> ...
> user: root
> pass: 0penBmc
> ...
> root@bmc-oob:~# fruid-util bb
> 
> FRU Information   : Baseboard
> ---   : --
> Chassis Type  : Rack Mount Chassis
> Chassis Part Number   : N/A
> Chassis Serial Number : N/A
> Board Mfg Date: Fri Jan  7 10:30:00 2022
> Board Mfg : XX
> Board Product : Management Board wBMC
> Board Serial  : X
> Board Part Number : XX
> Board FRU ID  : 1.0
> Board Custom Data 1   : X
> Board Custom Data 2   : XX
> Product Manufacturer  : XX
> Product Name  : Yosemite V3.5 EVT2
> Product Part Number   : XX
> Product Version   : EVT2
> Product Serial: X
> Product Asset Tag : XXX
> Product FRU ID: 1.0
> Product Custom Data 1 : X
> Product Custom Data 2 : N/A
> root@bmc-oob:~# fruid-util bmc
> 
> FRU Information   : BMC
> ---   : --
> Board Mfg Date: Mon Jan 10 21:42:00 2022
> Board Mfg : XX
> Board Product : BMC Storage Module
> Board Serial  : X
> Board Part Number : XX
> Board FRU ID  : 1.0
> Board Custom Data 1   : X
> Board Custom Data 2   : XX
> Product Manufacturer  : XX
> Product Name  : Yosemite V3.5 EVT2
> Product Part Number   : XX
> Product Version   : EVT2
> Product Serial: X
> Product Asset Tag : XXX
> Product FRU ID: 1.0
> Product Custom Data 1 : X
> Product Custom Data 2 : Config A
> root@bmc-oob:~# fruid-util nic
> 
> FRU Information   : NIC
> ---   : --
> Board Mfg Date: Tue Nov  2 08:51:00 2021
> Board Mfg : 
> Board Product : Mellanox ConnectX-6 DX OCP3.0
> Board Serial  : 
> Board Part Number : X
> Board FRU ID  : FRU Ver 0.02
> Product Manufacturer  : 
> Product Name  : Mellanox ConnectX-6 DX OCP3.0
> Product Part Number   : X
> Product Version   : A9
> Product Serial: 
> Product Custom Data 3 : ConnectX-6 DX
> 
> Signed-off-by: Peter Delevoryas 
> Reviewed-by: Cédric Le Goater 
> Reviewed-by: Joel Stanley 
> ---
>  hw/arm/aspeed.c| 10 --
>  hw/arm/aspeed_eeprom.c | 78 ++
>  hw/arm/aspeed_eeprom.h | 16 +
>  hw/arm/meson.build |  1 +
>  4 files changed, 102 insertions(+), 3 deletions(-)
>  create mode 100644 hw/arm/aspeed_eeprom.c
>  create mode 100644 hw/arm/aspeed_eeprom.h
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index c929c61d582a..382965f82c38 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -14,6 +14,7 @@
>  #include "hw/arm/boot.h"
>  #include "hw/arm/aspeed.h"
>  #include "hw/arm/aspeed_soc.h"
> +#include "hw/arm/aspeed_eeprom.h"
>  #include "hw/i2c/i2c_mux_pca954x.h"
>  #include "hw/i2c/smbus_eeprom.h"
>  #include "hw/misc/pca9552.h"
> @@ -940,9 +941,12 @@ static void fby35_i2c_init(AspeedMachineState *bmc)
>  
>  at24c_eeprom_init(i2c[4], 0x51, 128 * KiB);
>  at24c_eeprom_init(i2c[6], 0x51, 128 * KiB);
> -at24c_eeprom_init(i2c[8], 0x50, 32 * KiB);
> -at24c_eeprom_init(i2c[11], 0x51, 128 * KiB);
> -at24c_eeprom_init(i2c[11], 0x54, 128 * KiB);
> +at24c_eeprom_init_rom(i2c[8], 0x50, 32 * KiB, fby35_nic_fruid,
> +  sizeof(fby35_nic_fruid));
> +at24c_eeprom_init_rom(i2c[11], 0x51, 128 * KiB, fby35_bb_fruid,
> +  sizeof(fby35_bb_fruid));
> +at24c_eeprom_init_rom(i2c[11], 0x54, 128 * KiB, fby35_bmc_fruid,
> +  sizeof(fby35_bmc_fruid));
>  
>  /*
>   * TODO: There is a multi-master i2c connection to an AST1030 MiniBMC on
> diff --git a/hw/arm/aspeed_eeprom.c b/hw/arm/aspeed_eeprom.c
> new file mode 100644
> index ..9d0700d4b709
> --- /dev/null
> +++ b/hw/arm/aspeed_eeprom.c
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright (c) Meta Platforms, Inc. and affiliates.
> + *
> + * SPDX-Lic

Re: [PATCH v4 1/5] hw/arm: Extract at24c_eeprom_init helper from Aspeed and Nuvoton boards

2023-01-25 Thread Corey Minyard

On Tue, Jan 17, 2023 at 06:42:10PM -0800, Peter Delevoryas wrote:
> This helper is useful in board initialization because it lets users
> initialize and realize an EEPROM on an I2C bus with a single function call.

This is a good improvement.

Reviewed-by: Corey Minyard 

> 
> Signed-off-by: Peter Delevoryas 
> Reviewed-by: Cédric Le Goater 
> Reviewed-by: Joel Stanley 
> ---
>  hw/arm/aspeed.c | 10 +-
>  hw/arm/npcm7xx_boards.c | 20 +---
>  hw/nvram/eeprom_at24c.c | 12 
>  include/hw/nvram/eeprom_at24c.h | 23 +++
>  4 files changed, 41 insertions(+), 24 deletions(-)
>  create mode 100644 include/hw/nvram/eeprom_at24c.h
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 55f114ef729f..1f9799d4321e 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -17,6 +17,7 @@
>  #include "hw/i2c/i2c_mux_pca954x.h"
>  #include "hw/i2c/smbus_eeprom.h"
>  #include "hw/misc/pca9552.h"
> +#include "hw/nvram/eeprom_at24c.h"
>  #include "hw/sensor/tmp105.h"
>  #include "hw/misc/led.h"
>  #include "hw/qdev-properties.h"
> @@ -429,15 +430,6 @@ static void aspeed_machine_init(MachineState *machine)
>  arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
>  }
>  
> -static void at24c_eeprom_init(I2CBus *bus, uint8_t addr, uint32_t rsize)
> -{
> -I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
> -DeviceState *dev = DEVICE(i2c_dev);
> -
> -qdev_prop_set_uint32(dev, "rom-size", rsize);
> -i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
> -}
> -
>  static void palmetto_bmc_i2c_init(AspeedMachineState *bmc)
>  {
>  AspeedSoCState *soc = &bmc->soc;
> diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
> index 6bc6f5d2fe29..9b31207a06e9 100644
> --- a/hw/arm/npcm7xx_boards.c
> +++ b/hw/arm/npcm7xx_boards.c
> @@ -21,6 +21,7 @@
>  #include "hw/i2c/i2c_mux_pca954x.h"
>  #include "hw/i2c/smbus_eeprom.h"
>  #include "hw/loader.h"
> +#include "hw/nvram/eeprom_at24c.h"
>  #include "hw/qdev-core.h"
>  #include "hw/qdev-properties.h"
>  #include "qapi/error.h"
> @@ -140,17 +141,6 @@ static I2CBus *npcm7xx_i2c_get_bus(NPCM7xxState *soc, uint32_t num)
>  return I2C_BUS(qdev_get_child_bus(DEVICE(&soc->smbus[num]), "i2c-bus"));
>  }
>  
> -static void at24c_eeprom_init(NPCM7xxState *soc, int bus, uint8_t addr,
> -  uint32_t rsize)
> -{
> -I2CBus *i2c_bus = npcm7xx_i2c_get_bus(soc, bus);
> -I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
> -DeviceState *dev = DEVICE(i2c_dev);
> -
> -qdev_prop_set_uint32(dev, "rom-size", rsize);
> -i2c_slave_realize_and_unref(i2c_dev, i2c_bus, &error_abort);
> -}
> -
>  static void npcm7xx_init_pwm_splitter(NPCM7xxMachine *machine,
>  NPCM7xxState *soc, const int *fan_counts)
>  {
> @@ -253,8 +243,8 @@ static void quanta_gsj_i2c_init(NPCM7xxState *soc)
>  i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 3), "tmp105", 0x5c);
>  i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 4), "tmp105", 0x5c);
>  
> -at24c_eeprom_init(soc, 9, 0x55, 8192);
> -at24c_eeprom_init(soc, 10, 0x55, 8192);
> +at24c_eeprom_init(npcm7xx_i2c_get_bus(soc, 9), 0x55, 8192);
> +at24c_eeprom_init(npcm7xx_i2c_get_bus(soc, 10), 0x55, 8192);
>  
>  /*
>   * i2c-11:
> @@ -360,7 +350,7 @@ static void kudo_bmc_i2c_init(NPCM7xxState *soc)
>  
>  i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 4), TYPE_PCA9548, 0x77);
>  
> -at24c_eeprom_init(soc, 4, 0x50, 8192); /* mbfru */
> +at24c_eeprom_init(npcm7xx_i2c_get_bus(soc, 4), 0x50, 8192); /* mbfru */
>  
>  i2c_mux = i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 13),
>TYPE_PCA9548, 0x77);
> @@ -371,7 +361,7 @@ static void kudo_bmc_i2c_init(NPCM7xxState *soc)
>  i2c_slave_create_simple(pca954x_i2c_get_bus(i2c_mux, 4), "tmp105", 0x48);
>  i2c_slave_create_simple(pca954x_i2c_get_bus(i2c_mux, 5), "tmp105", 0x49);
>  
> -at24c_eeprom_init(soc, 14, 0x55, 8192); /* bmcfru */
> +at24c_eeprom_init(npcm7xx_i2c_get_bus(soc, 14), 0x55, 8192); /* bmcfru */
>  
>  /* TODO: Add remaining i2c devices. */
>  }
> diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> index 2d4d8b952f38..98857e3626b9 100644
> --- a/hw/nvram/eeprom_at24c.c
> +++ b/hw/nvram/eeprom_at24c.c
> @@ -12,6 +12,7 @@
>  #include "qapi/error.h"
>  #include "qemu/module.h"
>  #include "hw/i2c/i2c.h"
> +#include "hw/nvram/eeprom_at24c.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/qdev-properties-system.h"
>  #include "sysemu/block-backend.h"
> @@ -128,6 +129,17 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
>  return 0;
>  }
>  
> +I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size)
> +{
> +I2CSlave *i2c_dev = i2c_slave_new(TYPE_AT24C_EE, address);
> +DeviceState *dev = DEVICE(i2c_dev);
> +
> +qdev_prop_set_uint32(

Re: [PATCH v4 2/5] hw/arm/aspeed: Replace aspeed_eeprom_init with at24c_eeprom_init

2023-01-25 Thread Corey Minyard

On Tue, Jan 17, 2023 at 06:42:11PM -0800, Peter Delevoryas wrote:
> aspeed_eeprom_init is an exact copy of at24c_eeprom_init, not needed.

Reviewed-by: Corey Minyard 
> 
> Signed-off-by: Peter Delevoryas 
> Reviewed-by: Cédric Le Goater 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Joel Stanley 
> ---
>  hw/arm/aspeed.c | 95 ++---
>  1 file changed, 43 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 1f9799d4321e..c929c61d582a 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -660,15 +660,6 @@ static void g220a_bmc_i2c_init(AspeedMachineState *bmc)
>eeprom_buf);
>  }
>  
> -static void aspeed_eeprom_init(I2CBus *bus, uint8_t addr, uint32_t rsize)
> -{
> -I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
> -DeviceState *dev = DEVICE(i2c_dev);
> -
> -qdev_prop_set_uint32(dev, "rom-size", rsize);
> -i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
> -}
> -
>  static void fp5280g2_bmc_i2c_init(AspeedMachineState *bmc)
>  {
>  AspeedSoCState *soc = &bmc->soc;
> @@ -701,7 +692,7 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>  AspeedSoCState *soc = &bmc->soc;
>  I2CSlave *i2c_mux;
>  
> -aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51, 32 * KiB);
> +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 0), 0x51, 32 * KiB);
>  
>  create_pca9552(soc, 3, 0x61);
>  
> @@ -714,9 +705,9 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>   0x4a);
>  i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 4),
>"pca9546", 0x70);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x52, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x52, 64 * KiB);
>  create_pca9552(soc, 4, 0x60);
>  
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5), TYPE_TMP105,
> @@ -727,8 +718,8 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>  create_pca9552(soc, 5, 0x61);
>  i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 5),
>"pca9546", 0x70);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
>  
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 6), TYPE_TMP105,
>   0x48);
> @@ -738,10 +729,10 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>   0x4b);
>  i2c_mux = i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 6),
>"pca9546", 0x70);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x50, 64 * KiB);
> -aspeed_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 3), 0x51, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 0), 0x50, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 1), 0x51, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 2), 0x50, 64 * KiB);
> +at24c_eeprom_init(pca954x_i2c_get_bus(i2c_mux, 3), 0x51, 64 * KiB);
>  
>  create_pca9552(soc, 7, 0x30);
>  create_pca9552(soc, 7, 0x31);
> @@ -754,15 +745,15 @@ static void rainier_bmc_i2c_init(AspeedMachineState *bmc)
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), TYPE_TMP105,
>   0x48);
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "max31785", 0x52);
> -aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 7), 0x50, 64 * KiB);
> -aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 7), 0x51, 64 * KiB);
> +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 7), 0x50, 64 * KiB);
> +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 7), 0x51, 64 * KiB);
>  
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), TYPE_TMP105,
>   0x48);
>  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 8), TYPE_TMP105,
>   0x4a);
> -aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0x50, 64 * KiB);
> -aspeed_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0x51, 64 * KiB);
> +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0x50, 64 * KiB);
> +at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 8), 0

Re: [PATCH v4 3/5] hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper

2023-01-25 Thread Corey Minyard

On Tue, Jan 17, 2023 at 06:42:12PM -0800, Peter Delevoryas wrote:
> Allows users to specify binary data to initialize an EEPROM, allowing users to
> emulate data programmed at manufacturing time.
> 
> - Added init_rom and init_rom_size attributes to TYPE_AT24C_EE
> - Added at24c_eeprom_init_rom helper function to initialize attributes
> - If -drive property is provided, it overrides init_rom data
> 
> Signed-off-by: Peter Delevoryas 
> Reviewed-by: Joel Stanley 
> ---
>  hw/nvram/eeprom_at24c.c | 37 -
>  include/hw/nvram/eeprom_at24c.h | 16 ++
>  2 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> index 98857e3626b9..f8d751fa278d 100644
> --- a/hw/nvram/eeprom_at24c.c
> +++ b/hw/nvram/eeprom_at24c.c
> @@ -50,6 +50,9 @@ struct EEPROMState {
>  uint8_t *mem;
>  
>  BlockBackend *blk;
> +
> +const uint8_t *init_rom;
> +uint32_t init_rom_size;
>  };
>  
>  static
> @@ -131,19 +134,38 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
>  
>  I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size)
>  {
> -I2CSlave *i2c_dev = i2c_slave_new(TYPE_AT24C_EE, address);
> -DeviceState *dev = DEVICE(i2c_dev);
> +return at24c_eeprom_init_rom(bus, address, rom_size, NULL, 0);
> +}
> +
> +I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, uint8_t address, uint32_t rom_size,
> +const uint8_t *init_rom, uint32_t init_rom_size)
> +{
> +EEPROMState *s;
> +
> +s = AT24C_EE(qdev_new(TYPE_AT24C_EE));
> +
> +qdev_prop_set_uint8(DEVICE(s), "address", address);

Why did you switch from using i2c_slave_new()?  Using it is more about
documentation and future-proofing than convenience.

Other than that, looks good to me.

Reviewed-by: Corey Minyard 

> +qdev_prop_set_uint32(DEVICE(s), "rom-size", rom_size);
>  
> -qdev_prop_set_uint32(dev, "rom-size", rom_size);
> -i2c_slave_realize_and_unref(i2c_dev, bus, &error_abort);
> +/* TODO: Model init_rom with QOM properties. */
> +s->init_rom = init_rom;
> +s->init_rom_size = init_rom_size;
>  
> -return i2c_dev;
> +i2c_slave_realize_and_unref(I2C_SLAVE(s), bus, &error_abort);
> +
> +return I2C_SLAVE(s);
>  }
>  
>  static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
>  {
>  EEPROMState *ee = AT24C_EE(dev);
>  
> +if (ee->init_rom_size > ee->rsize) {
> +error_setg(errp, "%s: init rom is larger than rom: %u > %u",
> +   TYPE_AT24C_EE, ee->init_rom_size, ee->rsize);
> +return;
> +}
> +
>  if (ee->blk) {
>  int64_t len = blk_getlength(ee->blk);
>  
> @@ -163,6 +185,7 @@ static void at24c_eeprom_realize(DeviceState *dev, Error **errp)
>  }
>  
>  ee->mem = g_malloc0(ee->rsize);
> +
>  }
>  
>  static
> @@ -176,6 +199,10 @@ void at24c_eeprom_reset(DeviceState *state)
>  
>  memset(ee->mem, 0, ee->rsize);
>  
> +if (ee->init_rom) {
> +memcpy(ee->mem, ee->init_rom, MIN(ee->init_rom_size, ee->rsize));
> +}
> +
>  if (ee->blk) {
>  int ret = blk_pread(ee->blk, 0, ee->rsize, ee->mem, 0);
>  
> diff --git a/include/hw/nvram/eeprom_at24c.h b/include/hw/nvram/eeprom_at24c.h
> index 196db309d451..acb9857b2add 100644
> --- a/include/hw/nvram/eeprom_at24c.h
> +++ b/include/hw/nvram/eeprom_at24c.h
> @@ -20,4 +20,20 @@
>   */
>  I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size);
>  
> +
> +/*
> + * Create and realize an AT24C EEPROM device on the heap with initial data.
> + * @bus: I2C bus to put it on
> + * @address: I2C address of the EEPROM slave when put on a bus
> + * @rom_size: size of the EEPROM
> + * @init_rom: Array of bytes to initialize EEPROM memory with
> + * @init_rom_size: Size of @init_rom, must be less than or equal to @rom_size
> + *
> + * Create the device state structure, initialize it, put it on the specified
> + * @bus, and drop the reference to it (the device is realized). Copies the
> + * data from @init_rom to the beginning of the EEPROM memory buffer.
> + */
> +I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, uint8_t address, uint32_t rom_size,
> +const uint8_t *init_rom, uint32_t init_rom_size);
> +
>  #endif
> -- 
> 2.39.0
> 
>

Re: [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API

2023-01-25 Thread Stefan Hajnoczi

On Thu, Jan 19, 2023 at 02:03:07AM -0500, Alexander Bulekov wrote:
> Devices can pass their MemoryReentrancyGuard (from their DeviceState),
> when creating new BHes. Then, the async API will toggle the guard
> before/after calling the BH call-back. This prevents bh->mmio reentrancy
> issues.
> 
> Signed-off-by: Alexander Bulekov 
> ---
>  docs/devel/multiple-iothreads.txt |  2 ++
>  include/block/aio.h   | 18 --
>  include/qemu/main-loop.h  |  7 +--
>  tests/unit/ptimer-test-stubs.c|  3 ++-
>  util/async.c  | 12 +++-
>  util/main-loop.c  |  5 +++--
>  6 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
> index 343120f2ef..e4fafed9d9 100644
> --- a/docs/devel/multiple-iothreads.txt
> +++ b/docs/devel/multiple-iothreads.txt
> @@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
>   * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>   * LEGACY timer_new_ms() - create a timer
>   * LEGACY qemu_bh_new() - create a BH
> + * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
>   * LEGACY qemu_aio_wait() - run an event loop iteration
>  
>  Since they implicitly work on the main loop they cannot be used in code that
> @@ -72,6 +73,7 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
>   * aio_set_event_notifier() - monitor an event notifier
>   * aio_timer_new() - create a timer
>   * aio_bh_new() - create a BH
> + * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
>   * aio_poll() - run an event loop iteration
>  
>  The AioContext can be obtained from the IOThread using
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0f65a3cc9e..94d661ff7e 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -23,6 +23,8 @@
>  #include "qemu/thread.h"
>  #include "qemu/timer.h"
>  #include "block/graph-lock.h"
> +#include "hw/qdev-core.h"
> +
>  
>  typedef struct BlockAIOCB BlockAIOCB;
>  typedef void BlockCompletionFunc(void *opaque, int ret);
> @@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
>   * is opaque and must be allocated prior to its use.
>   *
>   * @name: A human-readable identifier for debugging purposes.
> + * @reentrancy_guard: A guard set when entering a cb to prevent
> + * device-reentrancy issues
>   */
>  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> -const char *name);
> +const char *name, MemReentrancyGuard *reentrancy_guard);
>  
>  /**
>   * aio_bh_new: Allocate a new bottom half structure
> @@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
>   * string.
>   */
>  #define aio_bh_new(ctx, cb, opaque) \
> -aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
> +aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
> +
> +/**
> + * aio_bh_new_guarded: Allocate a new bottom half structure with a
> + * reentrancy_guard
> + *
> + * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
> + * string.
> + */
> +#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
> +aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
>  
>  /**
>   * aio_notify: Force processing of pending events.
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index c25f390696..84d1ce57f0 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
>  
>  void qemu_fd_register(int fd);
>  
> +#define qemu_bh_new_guarded(cb, opaque, guard) \
> +qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
>  #define qemu_bh_new(cb, opaque) \
> -qemu_bh_new_full((cb), (opaque), (stringify(cb)))
> -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
> +qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
> +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> + MemReentrancyGuard *reentrancy_guard);
>  void qemu_bh_schedule_idle(QEMUBH *bh);
>  
>  enum {
> diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
> index f5e75a96b6..24d5413f9d 100644
> --- a/tests/unit/ptimer-test-stubs.c
> +++ b/tests/unit/ptimer-test-stubs.c
> @@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
>  return deadline;
>  }
>  
> -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
> +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> + MemReentrancyGuard *reentrancy_guard)
>  {
>  QEMUBH *bh = g_new(QEMUBH, 1);
>  
> diff --git a/util/async.c b/util/async.c
> index 14d63b3091..08924c3212 100
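
Since the util/async.c hunk is cut off here, the following is only a sketch of the idea in the commit message, namely that the guard is toggled around the BH callback. All Model* names are hypothetical, the engaged_in_io field name is an assumption, and saving and restoring the previous guard value is a design choice of this sketch rather than something the quoted text confirms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MemReentrancyGuard; the field name is assumed. */
typedef struct {
    bool engaged_in_io;
} ModelGuard;

typedef void ModelBHFunc(void *opaque);

/* Hypothetical stand-in for a bottom half with an optional guard. */
typedef struct {
    ModelBHFunc *cb;
    void *opaque;
    ModelGuard *guard;   /* NULL means the BH is unguarded */
} ModelBH;

/* Set the guard for the duration of the callback, then restore it. */
static void model_bh_call(ModelBH *bh)
{
    bool last = false;

    if (bh->guard) {
        last = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }

    bh->cb(bh->opaque);

    if (bh->guard) {
        bh->guard->engaged_in_io = last;
    }
}

/* Example callback: records whether the guard was set while it ran. */
typedef struct {
    ModelGuard *guard;
    bool seen_engaged;
} ModelProbe;

static void model_probe_cb(void *opaque)
{
    ModelProbe *p = opaque;
    p->seen_engaged = p->guard->engaged_in_io;
}
```

Any MMIO dispatch that checks the same guard while the callback runs would then see it set, which is how the bh->mmio case gets caught.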

Re: [PATCH v4 07/36] tcg: Add TCG_CALL_RET_BY_VEC

2023-01-25 Thread Alex Bennée



Richard Henderson  writes:

> This will be used by _WIN64 to return i128.  Not yet used,
> because allocation is not yet enabled.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v4 1/3] memory: prevent dma-reentracy issues

2023-01-25 Thread Stefan Hajnoczi

On Thu, Jan 19, 2023 at 02:03:06AM -0500, Alexander Bulekov wrote:
> Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> This flag is set/checked prior to calling a device's MemoryRegion
> handlers, and set when device code initiates DMA.  The purpose of this
> flag is to prevent two types of DMA-based reentrancy issues:
> 
> 1.) mmio -> dma -> mmio case
> 2.) bh -> dma write -> mmio case
> 
> These issues have led to problems such as stack-exhaustion and
> use-after-frees.
> 
> Summary of the problem from Peter Maydell:
> https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> Signed-off-by: Alexander Bulekov 
> ---
>  include/hw/qdev-core.h |  7 +++
>  softmmu/memory.c   | 15 +++
>  softmmu/trace-events   |  1 +
>  3 files changed, 23 insertions(+)

Reviewed-by: Stefan Hajnoczi 
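
For readers following along, case 1 (mmio -> dma -> mmio) can be modelled in a few lines of standalone C. This is a sketch under assumptions: ModelDevice, model_mmio_access() and the engaged_in_io field are hypothetical names, and the handler re-entering itself directly stands in for a DMA that targets the device's own MMIO region.

```c
#include <stdbool.h>

/* Hypothetical device model; not the QEMU DeviceState. */
typedef struct {
    bool engaged_in_io;  /* set while a handler of this device is running */
    int handled;         /* accesses that reached the handler */
    int rejected;        /* accesses refused because of reentrancy */
    bool reenter;        /* if true, the handler accesses the device again */
} ModelDevice;

static bool model_mmio_access(ModelDevice *dev);

/* Handler that may trigger a nested access to its own device. */
static void model_handler(ModelDevice *dev)
{
    dev->handled++;
    if (dev->reenter) {
        model_mmio_access(dev);
    }
}

/* Check the flag before dispatch; refuse the access if already engaged. */
static bool model_mmio_access(ModelDevice *dev)
{
    if (dev->engaged_in_io) {
        dev->rejected++;
        return false;
    }
    dev->engaged_in_io = true;
    model_handler(dev);
    dev->engaged_in_io = false;
    return true;
}
```

Without the check, the nested call in model_handler() would recurse until the stack is exhausted, which is one of the failure modes the commit message lists.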



Re: [PATCH v4 06/36] tcg: Introduce tcg_target_call_oarg_reg

2023-01-25 Thread Alex Bennée



Richard Henderson  writes:

> Replace the flat array tcg_target_call_oarg_regs[] with
> a function call including the TCGCallReturnKind.
>
> Reviewed-by: Daniel Henrique Barboza 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/tcg.c|  9 ++---
>  tcg/aarch64/tcg-target.c.inc | 10 +++---
>  tcg/arm/tcg-target.c.inc | 10 +++---
>  tcg/i386/tcg-target.c.inc| 16 ++--
>  tcg/loongarch64/tcg-target.c.inc | 10 ++
>  tcg/mips/tcg-target.c.inc| 10 ++
>  tcg/ppc/tcg-target.c.inc | 10 ++
>  tcg/riscv/tcg-target.c.inc   | 10 ++
>  tcg/s390x/tcg-target.c.inc   |  9 ++---
>  tcg/sparc64/tcg-target.c.inc | 12 ++--
>  tcg/tci/tcg-target.c.inc | 12 ++--
>  11 files changed, 72 insertions(+), 46 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 93d1331f93..092cdaf422 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -148,6 +148,7 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
>  TCGReg base, intptr_t ofs);
>  static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target,
>   const TCGHelperInfo *info);
> +static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot);
>  static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
>  #ifdef TCG_TARGET_NEED_LDST_LABELS
>  static int tcg_out_ldst_finalize(TCGContext *s);
> @@ -719,14 +720,16 @@ static void init_call_layout(TCGHelperInfo *info)
>  case dh_typecode_s64:
>  info->nr_out = 64 / TCG_TARGET_REG_BITS;
>  info->out_kind = TCG_CALL_RET_NORMAL;
> -assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
> +/* Query the last register now to trigger any assert early. */
> +tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
>  break;
>  case dh_typecode_i128:
>  info->nr_out = 128 / TCG_TARGET_REG_BITS;
>  info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
>  switch (/* TODO */ TCG_CALL_RET_NORMAL) {
>  case TCG_CALL_RET_NORMAL:
> -assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
> +/* Query the last register now to trigger any assert early. */
> +tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
>  break;
>  case TCG_CALL_RET_BY_REF:
>  /*
> @@ -4563,7 +4566,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>  case TCG_CALL_RET_NORMAL:
>  for (i = 0; i < nb_oargs; i++) {
>  TCGTemp *ts = arg_temp(op->args[i]);
> -TCGReg reg = tcg_target_call_oarg_regs[i];
> +TCGReg reg = tcg_target_call_oarg_reg(TCG_CALL_RET_NORMAL, i);
>  
>  /* ENV should not be modified.  */
>  tcg_debug_assert(!temp_readonly(ts));
> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
> index 2279a14c11..dfe569dd8c 100644
> --- a/tcg/aarch64/tcg-target.c.inc
> +++ b/tcg/aarch64/tcg-target.c.inc
> @@ -63,9 +63,13 @@ static const int tcg_target_call_iarg_regs[8] = {
>  TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
>  TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7
>  };
> -static const int tcg_target_call_oarg_regs[1] = {
> -TCG_REG_X0
> -};
> +
> +static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> +{
> +tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
> +tcg_debug_assert(slot >= 0 && slot <= 1);
> +return TCG_REG_X0 + slot;
> +}
>  
>  #define TCG_REG_TMP TCG_REG_X30
>  #define TCG_VEC_TMP TCG_REG_V31
> diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
> index 8b24481d8c..4e1d06dcd8 100644
> --- a/tcg/arm/tcg-target.c.inc
> +++ b/tcg/arm/tcg-target.c.inc
> @@ -79,9 +79,13 @@ static const int tcg_target_reg_alloc_order[] = {
>  static const int tcg_target_call_iarg_regs[4] = {
>  TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R3
>  };
> -static const int tcg_target_call_oarg_regs[2] = {
> -TCG_REG_R0, TCG_REG_R1
> -};
> +
> +static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> +{
> +tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
> +tcg_debug_assert(slot >= 0 && slot <= 3);
> +return TCG_REG_R0 + slot;
> +}

So this is now returning allocations of TCG_REG_R0 to TCG_REG_R3? Do we
have to take care to get things right if slot is ever bigger w.r.t.
tcg_target_reg_alloc_order?
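
To make the question concrete, here is a minimal standalone sketch of the two styles. The MODEL_* enums and function names are hypothetical and do not come from tcg; the sketch assumes the register enum values are consecutive, which is what lets the function compute "first register + slot".

```c
#include <assert.h>

/* Hypothetical register and return-kind enums; values are consecutive. */
typedef enum {
    MODEL_REG_R0, MODEL_REG_R1, MODEL_REG_R2, MODEL_REG_R3, MODEL_REG_R4
} ModelReg;

typedef enum {
    MODEL_CALL_RET_NORMAL, MODEL_CALL_RET_BY_REF
} ModelRetKind;

/* Old style: a flat table whose length is the implicit slot bound. */
static const ModelReg model_oarg_regs[2] = { MODEL_REG_R0, MODEL_REG_R1 };

/* New style: a function with explicit asserts on kind and slot. */
static ModelReg model_call_oarg_reg(ModelRetKind kind, int slot)
{
    assert(kind == MODEL_CALL_RET_NORMAL);
    assert(slot >= 0 && slot <= 3);
    return (ModelReg)(MODEL_REG_R0 + slot);
}
```

In this model the function accepts slots 2 and 3, where the old two-entry table would have been out of bounds, so the accepted range is wider than before.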

>  
>  #define TCG_REG_TMP  TCG_REG_R12
>  #define TCG_VEC_TMP  TCG_REG_Q15
> diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
> index 6a021dda8b..ab6881a4f3 100644
> --- a/tcg/i386/tcg-target.c.inc
> +++ b/tcg/i386/tcg-target.c.inc
> @@ -109,12 +109,16 @@ static const int tcg_target_call_iarg_regs[] = {
>  #endif
>  };
>  
> -static const int tcg_target_call_oarg_regs[] = {
> -TCG_REG_EAX,
> -#if TCG_TARGET_REG_BITS == 32
> -TCG_REG_EDX
> -#endif
> -};
> +st

Re: [PATCH v4 3/5] hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper

2023-01-25 Thread Ninad S Palsule

Signed-off-by: Peter Delevoryas pe...@pjd.dev
Reviewed-by: Joel Stanley j...@jms.id.au

Tested-by: Ninad Palsule ninadpals...@us.ibm.com

Hi Peter,
I applied your patches and made sure that different EEPROM images can be loaded
from appropriate image files and it is working as expected.

# Used the following command to invoke QEMU.
qemu-system-arm -M rainier-bmc -nographic \
  -kernel fitImage-linux.bin \
  -dtb aspeed-bmc-ibm-rainier.dtb \
  -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
  -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
  -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a" \
  -device at24c-eeprom,bus=aspeed.i2c.bus.0,address=0x51,drive=a,rom-size=32768 -drive file=tpm.eeprom.bin,format=raw,if=none,id=a \
  -device at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x50,drive=b,rom-size=65536 -drive file=oppanel.eeprom.bin,format=raw,if=none,id=b \
  -device at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x51,drive=c,rom-size=65536 -drive file=lcd.eeprom.bin,format=raw,if=none,id=c \
  -device at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x50,drive=d,rom-size=65536 -drive file=baseboard.eeprom.bin,format=raw,if=none,id=d \
  -device at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x51,drive=e,rom-size=65536 -drive file=bmc.eeprom.bin,format=raw,if=none,id=e \
  -device at24c-eeprom,bus=aspeed.i2c.bus.9,address=0x50,drive=f,rom-size=131072 -drive file=vrm.eeprom.bin,format=raw,if=none,id=f \
  -device at24c-eeprom,bus=aspeed.i2c.bus.10,address=0x50,drive=g,rom-size=131072 -drive file=vrm.eeprom.bin,format=raw,if=none,id=g \
  -device at24c-eeprom,bus=aspeed.i2c.bus.13,address=0x50,drive=h,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=h \
  -device at24c-eeprom,bus=aspeed.i2c.bus.14,address=0x50,drive=i,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=i \
  -device at24c-eeprom,bus=aspeed.i2c.bus.15,address=0x50,drive=j,rom-size=65536 -drive file=nvme.eeprom.bin,format=raw,if=none,id=j

Re: [XEN PATCH v2 0/3] Configure qemu upstream correctly by default for igd-passthru

2023-01-25 Thread Chuck Zmudzinski

On 1/25/2023 6:37 AM, Anthony PERARD wrote:
> On Tue, Jan 10, 2023 at 02:32:01AM -0500, Chuck Zmudzinski wrote:
> > I call attention to the commit message of the first patch which points
> > out that using the "pc" machine and adding the xen platform device on
> > the qemu upstream command line is not functionally equivalent to using
> > the "xenfv" machine which automatically adds the xen platform device
> > earlier in the guest creation process. As a result, there is a noticeable
> > reduction in the performance of the guest during startup with the "pc"
> > machine type even if the xen platform device is added via the qemu
> > command line options, although eventually both Linux and Windows guests
> > perform equally well once the guest operating system is fully loaded.
>
> There shouldn't be a difference between "xenfv" machine or using the
> "pc" machine while adding the "xen-platform" device, at least with
> regards to access to disk or network.
>
> The first patch of the series is using the "pc" machine without any
> "xen-platform" device, so we can't compare startup performance based on
> that.
>
> > Specifically, startup time is longer and neither the grub vga drivers
> > nor the windows vga drivers in early startup perform as well when the
> > xen platform device is added via the qemu command line instead of being
> > added immediately after the other emulated i440fx pci devices when the
> > "xenfv" machine type is used.
>
> The "xen-platform" device is mostly a hint to a guest that they can use
> pv-disk and pv-network devices. I don't think it would change anything
> with regards to graphics.
>
> > For example, when using the "pc" machine, which adds the xen platform
> > device using a command line option, the Linux guest could not display
> > the grub boot menu at the native resolution of the monitor, but with the
> > "xenfv" machine, the grub menu is displayed at the full 1920x1080
> > native resolution of the monitor for testing. So improved startup
> > performance is an advantage for the patch for qemu.
>
> I've just found out that when doing IGD passthrough, the "xenfv" and
> "pc" machines differ much more than I thought ... :-(
> pc_xen_hvm_init_pci() in QEMU changes the pci-host device, which in
> turn copies some information from the real host bridge.
> I guess this new host bridge helps when the firmware sets up the
> graphics for grub.
>
> > I also call attention to the last point of the commit message of the
> > second patch and the comments for reviewers section of the second patch.
> > This approach, as opposed to fixing this in qemu upstream, makes
> > maintaining the code in libxl__build_device_model_args_new more
> > difficult and therefore increases the chances of problems caused by
> > coding errors and typos for users of libxl. So that is another advantage
> > of the patch for qemu.
>
> We would just need to use a different approach in libxl when generating
> the command line. We could probably avoid duplication. I was hoping to
> have a patch series for libxl that would change the machine used, to start
> using "pc" instead of "xenfv" for all configurations, but based on the
> point above (IGD specific change to "xenfv"), I guess we can't
> really do anything from libxl to fix IGD passthrough.
>
> > OTOH, fixing this in qemu causes newer qemu versions to behave
> > differently than previous versions of qemu, which the qemu community
> > does not like, although they seem OK with the other patch since it only
> > affects qemu "xenfv" machine types, but they do not want the patch to
> > affect toolstacks like libvirt that do not use qemu upstream's
> > autoconfiguration options as much as libxl does, and, of course, libvirt
> > can manage qemu "xenfv" machines so existing "xenfv" guests configured
> > manually by libvirt could be adversely affected by the patch to qemu,
> > but only if those same guests are also configured for igd-passthrough,
> > which is likely a very small number of possibly affected libvirt users
> > of qemu.
> > 
> > A year or two ago I tried to configure guests for pci passthrough on xen
> > using libvirt's tool to convert a libxl xl.cfg file to libvirt xml. It
> > could not convert an xl.cfg file with a configuration item
> > pci = [ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...] for pci passthrough.
> > So it is unlikely there are any users out there using libvirt to
> > configure xen hvm guests for igd passthrough on xen, and those are the
> > only users that could be adversely affected by the simpler patch to qemu
> > to fix this.
>
> FYI, libvirt should be using libxl to create guests; I don't think there
> is another way for libvirt to create xen guests.

I have had success using libvirt as a frontend to libxl for most of my xen guests,
except for HVM guests that have pci devices passed through because the
tool to convert an xl.cfg file to libvirt xml was not able to convert the
pci = ... line in xl.cfg. Perhaps newer versions of libvirt can do it (I ha

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-25 Thread Stefan Hajnoczi

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> qemu update without stopping the VM. In most cases that makes sense
> because qemu has no way to transfer FUSE session state.
> 
> But we can give an option to orchestrator to override this if it can
> guarantee that state will be preserved (e.g. it uses migration to
> update qemu and dst will run on the same host as src and use the same
> socket endpoints).
> 
> This patch keeps default behavior that prevents migration with such devices
> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> 
> Signed-off-by: Anton Kuchin 
> ---
>  hw/virtio/vhost-user-fs.c | 25 -
>  qapi/migration.json   |  7 ++-
>  2 files changed, 30 insertions(+), 2 deletions(-)

Hi Anton,
Sorry for holding up your work with the discussions that we had. I still
feel it's important to agree on command-line and/or vhost-user protocol
changes that will be able to support non-migratable, stateless
migration/reconnect, and stateful migration vhost-user-fs back-ends. All
three will exist.

As a next step, could you share your code that implements the QEMU side
of stateless migration?

I think that will make it clearer whether a command-line option
(migration capability or per-device) is sufficient or whether the
vhost-user protocol needs to be extended.

If the vhost-user protocol is extended then maybe no user-visible
changes are necessary. QEMU will know if the vhost-user-fs backend
supports migration and which type of migration. It can block migration
in cases where it's not possible.

Thanks,
Stefan
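
For readers following the thread, here is a minimal sketch of how an
orchestrator might request the override described in the patch. It only
builds the QMP message; sending it over a QMP socket is left out. The
capability name 'vhost-user-fs' is the one proposed in this patch and is an
assumption here: it may not exist in any released QEMU. The helper name
build_set_capability is hypothetical.

```python
import json


def build_set_capability(name: str, enabled: bool) -> str:
    """Return a QMP 'migrate-set-capabilities' command as a JSON string."""
    command = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            # One entry per capability to change; others keep their state.
            "capabilities": [{"capability": name, "state": enabled}],
        },
    }
    return json.dumps(command)


# 'vhost-user-fs' is only proposed by the patch under discussion (assumption).
print(build_set_capability("vhost-user-fs", True))
```

With the default behaviour kept, an orchestrator that never sends this
command would still see migration blocked for vhost-user-fs devices.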


Re: [PATCH v4 3/5] hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper

2023-01-25 Thread Peter Delevoryas

On Wed, Jan 25, 2023 at 04:53:20PM +, Ninad S Palsule wrote:
> Signed-off-by: Peter Delevoryas pe...@pjd.dev
> Reviewed-by: Joel Stanley j...@jms.id.au
> 
> Tested-by: Ninad Palsule 
> ninadpals...@us.ibm.com
> 
> Hi Peter,
> I applied your patches and made sure that different EEPROM images can be 
> loaded from
> appropriate image files and it is working as expected.

Thanks Ninad, this is a good regression test to make sure I didn't break the
existing drive property.

- Peter

> 
> # Used following command to invoke the qemu.
> qemu-system-arm -M rainier-bmc -nographic \
>   -kernel fitImage-linux.bin \
>   -dtb aspeed-bmc-ibm-rainier.dtb \
>   -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
>   -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
>   -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a" \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.0,address=0x51,drive=a,rom-size=32768 -drive 
> file=tpm.eeprom.bin,format=raw,if=none,id=a \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x50,drive=b,rom-size=65536 -drive 
> file=oppanel.eeprom.bin,format=raw,if=none,id=b \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.7,address=0x51,drive=c,rom-size=65536 -drive 
> file=lcd.eeprom.bin,format=raw,if=none,id=c \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x50,drive=d,rom-size=65536 -drive 
> file=baseboard.eeprom.bin,format=raw,if=none,id=d \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.8,address=0x51,drive=e,rom-size=65536 -drive 
> file=bmc.eeprom.bin,format=raw,if=none,id=e \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.9,address=0x50,drive=f,rom-size=131072 -drive 
> file=vrm.eeprom.bin,format=raw,if=none,id=f \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.10,address=0x50,drive=g,rom-size=131072 
> -drive file=vrm.eeprom.bin,format=raw,if=none,id=g \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.13,address=0x50,drive=h,rom-size=65536 -drive 
> file=nvme.eeprom.bin,format=raw,if=none,id=h \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.14,address=0x50,drive=i,rom-size=65536 -drive 
> file=nvme.eeprom.bin,format=raw,if=none,id=i \
>   -device 
> at24c-eeprom,bus=aspeed.i2c.bus.15,address=0x50,drive=j,rom-size=65536 -drive 
> file=nvme.eeprom.bin,format=raw,if=none,id=j
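
To reproduce a similar setup without the real VPD images, a rough sketch
like the following could generate placeholder files whose sizes match the
rom-size values in the command above. The 0xFF fill (an erased EEPROM) and
the helper name write_blank_image are assumptions for illustration; the
images used in the test above contain real data and are not reproduced here.

```python
from pathlib import Path

# File names and sizes taken from the command line above (subset).
ROM_SIZES = {
    "tpm.eeprom.bin": 32768,
    "oppanel.eeprom.bin": 65536,
    "vrm.eeprom.bin": 131072,
}


def write_blank_image(path, size, fill=0xFF):
    """Write `size` bytes of `fill` to `path` and return the file size."""
    target = Path(path)
    # Assumption: 0xFF stands in for an erased EEPROM; real images hold VPD.
    target.write_bytes(bytes([fill]) * size)
    return target.stat().st_size
```

Calling write_blank_image(name, size) for each entry in ROM_SIZES gives
files that can be passed to -drive file=...,format=raw,if=none as above.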

Re: [PATCH 00/40] x86: fixing and cleaning up ACPI PCI code part 3

2023-01-25 Thread Michael S. Tsirkin

On Wed, Jan 25, 2023 at 06:02:09PM +0100, Igor Mammedov wrote:
> On Thu, 12 Jan 2023 15:02:32 +0100
> Igor Mammedov  wrote:
> 
> ping,
> 
> Michael can you take a look at this series and queue it
> if it looks acceptable to you.

Yes it's tagged already. RSN.

> 
> PS:
> I'm waiting on this being merged, to send acpi-index
> support on non-hotpluggable ports (due to heavy dependency
> on this refactoring). After which I plan to post
> series that allows resources reallocation on bridges.
> (both should be doable in 8.0 timeframe)
> 
> (the rest: acpi-index for pxb and other targets/machines,
> pci-hostbridge cleanups will be after that but probably
> won't make into 8.0)
> 
> > Series continues refactoring on top of [1].
> > 
> > It focuses on isolating creation of non hotplug and
> > hotplug slot descriptions. In the state it's posted, it's
> > not complete, but pretty close to it. The series contains
> > stable patches from refactoring and is already too large
> > to keep it to myself, hence I'm publishing it for review.
> > 
> > It will be followed by separate series on top of this one,
> > that will finish concrete feature[s] in following order:
> >1 introduce acpi-index support for non-hotpluggable PCI
> >  devices (i.e. NICs directly attached to Q35 host-bridge)
> >  making acpi-index support complete within pc/q35 machines.
> >2 let the guest OS re-arrange bridge resources when ACPI PCI
> >  hotplug is enabled. (should fix insufficient resources issue
> >  during PCI hotplug)
> >3 finish isolating hotplug code from non-hotplug one,
> >  which should allow to re-use non-hotplug parts in other
> >  machines (arm/virt and microvm) and bring acpi-index
> >  support there.
> > 
> > PS:
> > Refactoring also adds testing for various corner cases
> > and fixes (present/latent/imagined) bugs where they were
> > spotted.
> > 
> > 1) "[PATCH 00/11] x86: clean up ACPI PCI code part 2"
> >https://www.mail-archive.com/qemu-devel@nongnu.org/msg915493.html
> > 
> > CC: "Michael S. Tsirkin" 
> > CC: Ani Sinha 
> > 
> > Igor Mammedov (40):
> >   tests: qtest: print device_add error before failing test
> >   tests: acpi: cleanup arguments to make them more readable
> >   tests: acpi: whitelist DSDT blobs for tests that use pci-bridges
> >   tests: acpi: extend pcihp with nested bridges
> >   tests: acpi: update expected blobs
> >   tests: acpi: cleanup use_uefi argument usage
> >   pci_bridge: remove whitespace
> >   x86: acpi: pcihp: clean up duplicate bridge_in_acpi assignment
> >   pci: acpi hotplug: rename x-native-hotplug to
> > x-do-not-expose-native-hotplug-cap
> >   pcihp: piix4: do not call acpi_pcihp_reset() when ACPI PCI hotplug is
> > disabled
> >   pci: acpihp: assign BSEL only to coldplugged bridges
> >   x86: pcihp: fix invalid AML PCNT calls to hotplugged bridges
> >   tests: boot_sector_test: avoid crashing if status is not available yet
> >   tests: acpi: extend bridge tests with hotplugged bridges
> >   tests: boot_sector_test(): make it multi-shot
> >   tests: acpi: add reboot cycle to bridge test
> >   tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug
> > machinery
> >   pcihp: drop pcihp_bridge_en dependency when composing PCNT method
> >   tests: acpi: update expected blobs
> >   tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug
> > machinery
> >   pcihp: compose PCNT callchain right before its user _GPE._E01
> >   pcihp: do not put empty PCNT in DSDT
> >   tests: acpi: update expected blobs
> >   whitelist DSDT before adding endpoint devices to bridge testcases
> >   tests: acpi: add endpoint devices to bridges
> >   tests: acpi: update expected blobs
> >   x86: pcihp: acpi: prepare slot ignore rule to work with self
> > describing bridges
> >   pci: acpi: wire up AcpiDevAmlIf interface to generic bridge
> >   pcihp: make bridge describe itself using
> > AcpiDevAmlIfClass:build_dev_aml
> >   pci: make sure pci_bus_is_express() won't error out with  "discards
> > ‘const’ qualifier"
> >   pcihp: isolate rule whether slot should be described in DSDT
> >   tests: acpi: whitelist DSDT before decoupling PCI hotplug code from
> > basic slots description
> >   pcihp: acpi: decouple hotplug and generic slots description
> >   tests: acpi: update expected blobs
> >   tests: acpi: whitelist DSDT blobs before removing dynamic _DSM on
> > coldplugged bridges
> >   pcihp: acpi: ignore coldplugged bridges when composing hotpluggable
> > slots
> >   tests: acpi: update expected blobs
> >   tests: acpi: whitelist DSDT before moving non-hotpluggble slots
> > description from hotplug path
> >   pcihp: generate populated non-hotpluggble slot descriptions on
> > non-hotplug path
> >   tests: acpi: update expected blobs
> > 
> >  include/hw/acpi/pci.h |   4 +
> >  include/hw/pci/pci.h  |   2 +-
> >  include/hw/pci/pcie_port.h

[PATCH] docs/s390x/pcidevices: document pci devices on s390x

2023-01-25 Thread Sebastian Mitterle

Add some documentation about the zpci device and how
to use it with pci devices on s390x.

Used source: Cornelia Huck's blog post
https://people.redhat.com/~cohuck/2018/02/19/notes-on-pci-on-s390x.html

Signed-off-by: Sebastian Mitterle 
---
 docs/system/s390x/pcidevices.rst | 34 
 docs/system/target-s390x.rst |  1 +
 2 files changed, 35 insertions(+)
 create mode 100644 docs/system/s390x/pcidevices.rst

diff --git a/docs/system/s390x/pcidevices.rst b/docs/system/s390x/pcidevices.rst
new file mode 100644
index 00..2086c1ca29
--- /dev/null
+++ b/docs/system/s390x/pcidevices.rst
@@ -0,0 +1,34 @@
+PCI devices on s390x
+====================
+
+PCI devices on s390x work differently than on other architectures.
+
+To start with, using a PCI device requires the additional ``zpci`` device. For example,
+in order to pass a PCI device ``:00:00.0`` through you'd specify::
+
+ qemu-system-s390x ... \
+   -device zpci,uid=1,fid=0,target=hostdev0,id=zpci1 \
+   -device vfio-pci,host=:00:00.0,id=hostdev0
+
+Here, the zpci device is joined with the PCI device via the ``target`` property.
+
+Note that we don't set bus, slot or function here for the guest as is common in other
+PCI implementations. Topology information is not available on s390x. Instead, ``uid``
+and ``fid`` determine how the device is presented to the guest operating system.
+
+In case of Linux, ``uid`` will be used in the ``domain`` part of the PCI identifier, and
+``fid`` identifies the physical slot, i.e.::
+
+ qemu-system-s390x ... \
+   -device zpci,uid=7,fid=8,target=hostdev0,id=zpci1 \
+   ...
+
+will be presented in the guest as::
+
+ # lspci -v
+ 0007:00:00.0 ...
+ Physical Slot: 0008
+ ...
+
+Finally, note that you might have to enable the ``zpci`` feature in the cpu model in order to use
+it.
diff --git a/docs/system/target-s390x.rst b/docs/system/target-s390x.rst
index c636f64113..fe8251bdef 100644
--- a/docs/system/target-s390x.rst
+++ b/docs/system/target-s390x.rst
@@ -33,3 +33,4 @@ Architectural features
 .. toctree::
s390x/bootdevices
s390x/protvirt
+   s390x/pcidevices
-- 
2.37.3
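
As a small illustration of the uid/fid mapping documented in the patch, the
sketch below formats the values the way the lspci example shows them. The
four-digit hexadecimal width and the fixed 00:00.0 bus/device/function are
assumptions read off that single example, and guest_pci_view is a
hypothetical helper, not part of QEMU.

```python
def guest_pci_view(uid: int, fid: int) -> tuple[str, str]:
    """Return (PCI address, physical slot) as a Linux guest might show them."""
    # Assumption: uid becomes the 4-digit hex PCI domain, as in the example.
    address = f"{uid:04x}:00:00.0"
    # Assumption: fid is shown as a 4-digit hex physical slot.
    slot = f"{fid:04x}"
    return address, slot


address, slot = guest_pci_view(uid=7, fid=8)
print(address)                   # 0007:00:00.0
print(f"Physical Slot: {slot}")  # Physical Slot: 0008
```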

Re: [PATCH 00/40] x86: fixing and cleaning up ACPI PCI code part 3

2023-01-25 Thread Igor Mammedov

On Thu, 12 Jan 2023 15:02:32 +0100
Igor Mammedov  wrote:

ping,

Michael can you take a look at this series and queue it
if it looks acceptable to you.


PS:
I'm waiting on this being merged, to send acpi-index
support on non-hotpluggable ports (due to heavy dependency
on this refactoring). After which I plan to post
series that allows resources reallocation on bridges.
(both should be doable in 8.0 timeframe)

(the rest: acpi-index for pxb and other targets/machines,
pci-hostbridge cleanups will be after that but probably
won't make into 8.0)

> Series continues refactoring on top of [1].
> 
> It focuses on isolating creation of non hotplug and
> hotplug slot descriptions. In the state it's posted, it's
> not complete, but pretty close to it. The series contains
> stable patches from refactoring and is already too large
> to keep it to myself, hence I'm publishing it for review.
> 
> It will be followed by separate series on top of this one,
> that will finish concrete feature[s] in following order:
>1 introduce acpi-index support for non-hotpluggable PCI
>  devices (i.e. NICs directly attached to Q35 host-bridge)
>  making acpi-index support complete within pc/q35 machines.
>2 let the guest OS re-arrange bridge resources when ACPI PCI
>  hotplug is enabled. (should fix insufficient resources issue
>  during PCI hotplug)
>3 finish isolating hotplug code from non-hotplug one,
>  which should allow to re-use non-hotplug parts in other
>  machines (arm/virt and microvm) and bring acpi-index
>  support there.
> 
> PS:
> Refactoring also adds testing for various corner cases
> and fixes (present/latent/imagined) bugs where they were
> spotted.
> 
> 1) "[PATCH 00/11] x86: clean up ACPI PCI code part 2"
>https://www.mail-archive.com/qemu-devel@nongnu.org/msg915493.html
> 
> CC: "Michael S. Tsirkin" 
> CC: Ani Sinha 
> 
> Igor Mammedov (40):
>   tests: qtest: print device_add error before failing test
>   tests: acpi: cleanup arguments to make them more readable
>   tests: acpi: whitelist DSDT blobs for tests that use pci-bridges
>   tests: acpi: extend pcihp with nested bridges
>   tests: acpi: update expected blobs
>   tests: acpi: cleanup use_uefi argument usage
>   pci_bridge: remove whitespace
>   x86: acpi: pcihp: clean up duplicate bridge_in_acpi assignment
>   pci: acpi hotplug: rename x-native-hotplug to
> x-do-not-expose-native-hotplug-cap
>   pcihp: piix4: do not call acpi_pcihp_reset() when ACPI PCI hotplug is
> disabled
>   pci: acpihp: assign BSEL only to coldplugged bridges
>   x86: pcihp: fix invalid AML PCNT calls to hotplugged bridges
>   tests: boot_sector_test: avoid crashing if status is not available yet
>   tests: acpi: extend bridge tests with hotplugged bridges
>   tests: boot_sector_test(): make it multi-shot
>   tests: acpi: add reboot cycle to bridge test
>   tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug
> machinery
>   pcihp: drop pcihp_bridge_en dependency when composing PCNT method
>   tests: acpi: update expected blobs
>   tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug
> machinery
>   pcihp: compose PCNT callchain right before its user _GPE._E01
>   pcihp: do not put empty PCNT in DSDT
>   tests: acpi: update expected blobs
>   whitelist DSDT before adding endpoint devices to bridge testcases
>   tests: acpi: add endpoint devices to bridges
>   tests: acpi: update expected blobs
>   x86: pcihp: acpi: prepare slot ignore rule to work with self
> describing bridges
>   pci: acpi: wire up AcpiDevAmlIf interface to generic bridge
>   pcihp: make bridge describe itself using
> AcpiDevAmlIfClass:build_dev_aml
>   pci: make sure pci_bus_is_express() won't error out with  "discards
> ‘const’ qualifier"
>   pcihp: isolate rule whether slot should be described in DSDT
>   tests: acpi: whitelist DSDT before decoupling PCI hotplug code from
> basic slots description
>   pcihp: acpi: decouple hotplug and generic slots description
>   tests: acpi: update expected blobs
>   tests: acpi: whitelist DSDT blobs before removing dynamic _DSM on
> coldplugged bridges
>   pcihp: acpi: ignore coldplugged bridges when composing hotpluggable
> slots
>   tests: acpi: update expected blobs
>   tests: acpi: whitelist DSDT before moving non-hotpluggble slots
> description from hotplug path
>   pcihp: generate populated non-hotpluggble slot descriptions on
> non-hotplug path
>   tests: acpi: update expected blobs
> 
>  include/hw/acpi/pci.h |   4 +
>  include/hw/pci/pci.h  |   2 +-
>  include/hw/pci/pcie_port.h|   3 +-
>  hw/acpi/Kconfig   |   4 +
>  hw/acpi/meson.build   |   4 +-
>  hw/acpi/pci-bridge-stub.c |  20 ++
>  hw/acpi/pci-bridge.c  |  27 ++
>  hw/acpi/pcihp.c   |  35 ++-
>  hw/acpi/

Re: [PATCH 0/7] ACPI controller cleanup

2023-01-25 Thread Igor Mammedov

On Sun, 22 Jan 2023 18:07:17 +0100
Bernhard Beschow  wrote:

> This series brings the PIIX4 PM device closer to reality and resolves some
> redundant code along the way.

I'm done reviewing this series.

> 
> Testing done:
> - `make check`
> - Starting a live CD under pc and q35 machines and check that the GPE accesses
>   are traced
> 
> Bernhard Beschow (7):
>   hw/acpi/{ich9,piix4}: Reuse existing attributes for QOM properties
>   hw/acpi/ich9: Remove unneeded assignments
>   hw/acpi/{ich9,piix4}: Resolve redundant io_base address attributes
>   hw/acpi/ich9: Use ICH9_PMIO_GPE0_STS just once
>   hw/acpi/piix4: Fix offset of GPE0 registers
>   hw/acpi: Trace GPE access in all device models, not just PIIX4
>   hw/acpi/core: Trace enable and status registers of GPE separately
> 
>  include/hw/acpi/ich9.h  |  1 -
>  include/hw/acpi/piix4.h |  3 +--
>  hw/acpi/core.c  |  9 +
>  hw/acpi/ich9.c  | 26 --
>  hw/acpi/piix4.c | 31 ---
>  hw/acpi/trace-events| 10 ++
>  6 files changed, 44 insertions(+), 36 deletions(-)
>
