> On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <[email protected]> wrote:
>
> Hi Ani and Paolo,
>
> We have tested the code by applying both the original commit
> (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit
> 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le.
> However, the issue persists. We've conducted GDB debugging that shows the
> hang is occurring in a different location than what the fix addresses.
>
> Since the original patch is breaking KVM guest bringup completely on ppc64le,
> and the fix patch does not resolve the issue, given the severity of this
> regression (complete KVM breakage on ppc64le), we should either find a quick
> fix or consider reverting the patch until a proper solution can be identified.
Based on what you just described, the issue does not seem to be related to
98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. Can you confirm whether
reverting this patch in your local tree makes the issue go away?
>
> Analysis:
> 1. This is not a confidential guest. This is a regular KVM guest running on
> ppc64le.
> 2. The execution flow shows that qemu_system_reset() completes successfully
> and never enters the code path at line 529-543
This is what I expected. No code related to CoCo (confidential) guest
rebuilding is being executed, so your issue seems to be somewhere else.
> 3. The hang occurs later in qemu_default_main() at system/main.c:49, after
> calling bql_lock()
> 4. The ppc KVM guest boots fine with the previous commit -
> df8df3cb6b743372ebb335bd8404bc3d748da350
> 5. This suggests the issue is not with error handling of -EOPNOTSUPP during
> reset, but bql_lock() getting stuck in qemu_default_main()
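For context on point 5: a thread that appears stuck in bql_lock() is usually
waiting for another thread that already holds the lock, so the backtraces of
all threads (`thread apply all bt` in GDB) are typically more informative than
the main thread alone. Below is a minimal, self-contained sketch of that
situation using plain pthreads. The names big_lock, holder_thread() and
probe_lock_while_held() are hypothetical stand-ins and not QEMU code; the
probe uses pthread_mutex_trylock() so that the example reports EBUSY where a
blocking lock call would simply hang.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the BQL; hypothetical, not QEMU's actual mutex object. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake state so the prober knows when the holder owns big_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool holder_has_lock;
static bool holder_may_release;

/* Plays the role of a thread that takes the lock and keeps holding it. */
static void *holder_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&big_lock);

    pthread_mutex_lock(&state_lock);
    holder_has_lock = true;
    pthread_cond_broadcast(&state_cond);
    while (!holder_may_release) {
        pthread_cond_wait(&state_cond, &state_lock);
    }
    pthread_mutex_unlock(&state_lock);

    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Try to take big_lock while holder_thread() owns it.
 * Returns the pthread_mutex_trylock() result (EBUSY when the lock is
 * held elsewhere), or -1 if the holder thread could not be created.
 * A blocking pthread_mutex_lock() at the same point would not return
 * until the holder released the lock.
 */
static int probe_lock_while_held(void)
{
    pthread_t tid;
    int ret;

    holder_has_lock = false;
    holder_may_release = false;
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0) {
        return -1;
    }

    /* Wait until the holder really owns big_lock. */
    pthread_mutex_lock(&state_lock);
    while (!holder_has_lock) {
        pthread_cond_wait(&state_cond, &state_lock);
    }
    pthread_mutex_unlock(&state_lock);

    ret = pthread_mutex_trylock(&big_lock);
    if (ret == 0) {
        pthread_mutex_unlock(&big_lock);
    }

    /* Let the holder finish so the sketch terminates cleanly. */
    pthread_mutex_lock(&state_lock);
    holder_may_release = true;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(tid, NULL);

    return ret;
}
```

Calling probe_lock_while_held() from a small main() and printing the result is
enough to try it. In the real hang, the interesting question is the same one
the sketch models: which thread owns the lock, and why it never releases it.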
>
> GDB Trace Analysis:
> We set breakpoints at qemu_system_reset() and qemu_default_main() to trace
> the execution flow. The system successfully completes qemu_system_reset()
> without entering the problematic code path where the fix provided by you
> applies (system/runstate.c:529-543).
>
> # gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
> pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1
> -nographic -serial pty -device virtio-balloon -device
> virtio-scsi-pci,id=scsi0 -drive
> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>
> (gdb) handle SIGUSR1 pass nostop noprint
> Signal Stop Print Pass to program Description
> SIGUSR1 No No Yes User defined signal 1
> (gdb) b qemu_system_reset
> Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
> (gdb) b qemu_default_main
> Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
> (gdb) r
>
> Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
> pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1
> -nographic -serial pty -device virtio-balloon -device
> virtio-scsi-pci,id=scsi0 -drive
> file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev
> bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
>
> Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset
> (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
> 513 AccelClass *ac = ACCEL_GET_CLASS(current_accel());
> (gdb) n
> 517 mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
> (gdb) n
> 519 cpu_synchronize_all_states();
> (gdb) n
> 521 switch (reason) {
> (gdb) n
> 529 if (!cpus_are_resettable() &&
> (gdb) n
> 553 if (mc && mc->reset) {
> (gdb) n
> 554 mc->reset(current_machine, type);
> (gdb) n
> 558 switch (reason) {
> (gdb) n
> 574 if (cpus_are_resettable()) {
> (gdb) n
> 583 cpu_synchronize_all_post_reset();
> (gdb) n
> 587 vm_set_suspended(false);
> (gdb) n
> qdev_machine_creation_done () at ../hw/core/machine.c:1814
> 1814 register_global_state();
> (gdb) n
> qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at
> ../system/vl.c:2785
> 2785 if (machine->cgs && !machine->cgs->ready) {
> (gdb) n
> 2791 foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
> (gdb) n
> 2793 if (!vga_interface_created && !default_vga &&
> (gdb) n
> qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at
> ../system/vl.c:2815
> 2815 if (loadvm) {
> (gdb) n
> 2820 if (replay_mode != REPLAY_MODE_NONE) {
> (gdb) n
> 2824 if (incoming) {
> (gdb) n
> 2837 } else if (autostart) {
> (gdb) n
> 2838 qmp_cont(NULL);
> (gdb) n
> qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
> 3849 qemu_init_displays();
> (gdb) n
> 3850 accel_setup_post(current_machine);
> (gdb) n
> 3851 if (migrate_mode() != MIG_MODE_CPR_EXEC) {
> (gdb) n
> 3852 os_setup_post();
> (gdb) n
> 3854 resume_mux_open();
> (gdb) n
> main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
> 84 bql_unlock();
> (gdb) n
> 85 replay_mutex_unlock();
> (gdb) n
> 87 if (qemu_main) {
> (gdb) n
> 93 qemu_default_main(NULL);
> (gdb) n
>
> Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main
> (opaque=opaque@entry=0x0) at ../system/main.c:48
> 48 replay_mutex_lock();
> (gdb) n
> 49 bql_lock();
> (gdb) n
>
> <hangs>
> <system becomes unresponsive at this point>
>
>
> Thanks,
> Misbah Anjum N <[email protected]>
>
>
>
> On 2026-03-09 18:53, Ani Sinha wrote:
>> Yes seems this is an issue and I will fix it. Not sure if the fix will
>> address your issue though ...
>> Can you try the following patch?
>> From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
>> From: Ani Sinha <[email protected]>
>> Date: Mon, 9 Mar 2026 18:44:40 +0530
>> Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
>> Signed-off-by: Ani Sinha <[email protected]>
>> ---
>> system/runstate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>> diff --git a/system/runstate.c b/system/runstate.c
>> index eca722b43c..c1f41284c9 100644
>> --- a/system/runstate.c
>> +++ b/system/runstate.c
>> @@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
>> (current_machine->new_accel_vmfd_on_reset ||
>> !cpus_are_resettable())) {
>> if (ac->rebuild_guest) {
>> ret = ac->rebuild_guest(current_machine);
>> - if (ret < 0) {
>> + if (ret < 0 && ret != -EOPNOTSUPP) {
>> error_report("unable to rebuild guest: %s(%d)",
>> strerror(-ret), ret);
>> vm_stop(RUN_STATE_INTERNAL_ERROR);
>> + } else if (ret == -EOPNOTSUPP) {
>> + error_report("accelerator does not support reset!");
>> } else {
>> info_report("virtual machine state has been rebuilt with new "
>> "guest file handle.");
>> --
>> 2.42.0
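For anyone skimming the diff above, the three-way handling it introduces for
the rebuild_guest() return value can be sketched in isolation as follows. The
enum and the function classify_rebuild_result() are hypothetical names made up
for this illustration and do not exist in QEMU; only the branch order mirrors
the patch.

```c
#include <errno.h>

/* Hypothetical outcome labels for illustration; not part of QEMU. */
typedef enum {
    REBUILD_OK,           /* patch: info_report(), guest was rebuilt */
    REBUILD_UNSUPPORTED,  /* patch: error_report(), VM is not stopped */
    REBUILD_FAILED,       /* patch: error_report() and vm_stop() */
} RebuildOutcome;

/*
 * Classify a rebuild_guest()-style return value the way the patch does:
 * -EOPNOTSUPP is reported but not treated as fatal, any other negative
 * value is a hard failure, and everything else counts as success.
 */
static RebuildOutcome classify_rebuild_result(int ret)
{
    if (ret == -EOPNOTSUPP) {
        return REBUILD_UNSUPPORTED;
    }
    if (ret < 0) {
        return REBUILD_FAILED;
    }
    return REBUILD_OK;
}
```

With this split, an accelerator that returns -EOPNOTSUPP no longer drives the
VM into RUN_STATE_INTERNAL_ERROR, which is the behaviour the patch is after.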
>>> Is this a confidential guest that cannot be normally reset?
>