Re: [BUG] [powerpc] KVM guest boot failure - hangs on startup after commit 98884e0c

Misbah Anjum N Tue, 10 Mar 2026 02:10:06 -0700

On 2026-03-10 14:24, Ani Sinha wrote:

On 10 Mar 2026, at 2:09 PM, Misbah Anjum N <[email protected]> wrote:
Hi Ani and Paolo,
We have tested the code by applying both the original commit (98884e0cc10997a17ce9abfd6ff10be19224ca6a) and your fix patch (commit 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14) on ppc64le. However, the issue persists. We've conducted GDB debugging that shows the hang is occurring in a different location than what the fix addresses.
Since the original patch is breaking KVM guest bringup completely on ppc64le, and the fix patch does not resolve the issue, given the severity of this regression (complete KVM breakage on ppc64le), we should either find a quick fix or consider reverting the patch until a proper solution can be identified.
Based on what you just described, it does not seem like the issue is
related to 98884e0cc10997a17ce9abfd6ff10be19224ca6a at all. If you
revert this patch in your local tree, can you confirm that your issue
gets fixed?


Yes, the issue is not seen with the immediate previous commit:

commit df8df3cb6b743372ebb335bd8404bc3d748da350 (ani-df8df3cb)
Author: Ani Sinha <[email protected]>
Date:   Wed Feb 25 09:19:09 2026 +0530

    system/physmem: add helper to reattach existing memory after KVM VMfd change

    After the guest KVM file descriptor has changed as a part of the process of
    confidential guest reset mechanism, existing memory needs to be reattached to
    the new file descriptor. This change adds a helper function ram_block_rebind()
    for this purpose. The next patch will make use of this function.

    Signed-off-by: Ani Sinha <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Paolo Bonzini <[email protected]>

It looks like the next patch enables the functionality of the previous patches in a way that causes bql_lock() to get stuck on architectures (ppc64le in this case) that do not support this feature yet.

Did you validate your patches on other architectures that do not support this feature yet?

Analysis:
1. This is not a confidential guest. This is a regular KVM guest running on ppc64le.
2. The execution flow shows that qemu_system_reset() completes successfully and never enters the code path at lines 529-543.
This is what I expected and therefore, no code related to coco guest
rebuilding is getting executed. Your issue seems to be somewhere else.

The issue occurs only with the introduction of this patch and not with the previous upstream commit, as explained above.

3. The hang occurs later in qemu_default_main() at system/main.c:49, after calling bql_lock().
4. The ppc KVM guest boots fine with the previous commit - df8df3cb6b743372ebb335bd8404bc3d748da350.
5. This suggests the issue is not with error handling of -EOPNOTSUPP during reset, but bql_lock() getting stuck in qemu_default_main().
GDB Trace Analysis:
We set breakpoints at qemu_system_reset() and qemu_default_main() to trace the execution flow. The system successfully completes qemu_system_reset() without entering the problematic code path where the fix provided by you applies (system/runstate.c:529-543).
# gdb --args /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
(gdb) handle SIGUSR1 pass nostop noprint
Signal        Stop Print Pass to program Description
SIGUSR1       No No Yes User defined signal 1
(gdb) b qemu_system_reset
Breakpoint 1 at 0x69a688: file ../system/runstate.c, line 510.
(gdb) b qemu_default_main
Breakpoint 2 at 0xa9aeb8: file ../system/main.c, line 45.
(gdb) r
Starting program: /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm -enable-kvm -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty -device virtio-balloon -device virtio-scsi-pci,id=scsi0 -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Thread 1 "qemu-system-ppc" hit Breakpoint 1, qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:513
513     AccelClass *ac = ACCEL_GET_CLASS(current_accel());
(gdb) n
517     mc = current_machine ? MACHINE_GET_CLASS(current_machine) : NULL;
(gdb) n
519     cpu_synchronize_all_states();
(gdb) n
521     switch (reason) {
(gdb) n
529     if (!cpus_are_resettable() &&
(gdb) n
553     if (mc && mc->reset) {
(gdb) n
554         mc->reset(current_machine, type);
(gdb) n
558     switch (reason) {
(gdb) n
574     if (cpus_are_resettable()) {
(gdb) n
583             cpu_synchronize_all_post_reset();
(gdb) n
587     vm_set_suspended(false);
(gdb) n
qdev_machine_creation_done () at ../hw/core/machine.c:1814
1814    register_global_state();
(gdb) n
qemu_machine_creation_done (errp=0x10123e028 <error_fatal>) at ../system/vl.c:2785
2785    if (machine->cgs && !machine->cgs->ready) {
(gdb) n
2791    foreach_device_config_or_exit(DEV_GDB, gdbserver_start);
(gdb) n
2793    if (!vga_interface_created && !default_vga &&
(gdb) n
qmp_x_exit_preconfig (errp=errp@entry=0x10123e028 <error_fatal>) at ../system/vl.c:2815
2815    if (loadvm) {
(gdb) n
2820    if (replay_mode != REPLAY_MODE_NONE) {
(gdb) n
2824    if (incoming) {
(gdb) n
2837    } else if (autostart) {
(gdb) n
2838        qmp_cont(NULL);
(gdb) n
qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3849
3849    qemu_init_displays();
(gdb) n
3850    accel_setup_post(current_machine);
(gdb) n
3851    if (migrate_mode() != MIG_MODE_CPR_EXEC) {
(gdb) n
3852        os_setup_post();
(gdb) n
3854    resume_mux_open();
(gdb) n
main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:84
84      bql_unlock();
(gdb) n
85      replay_mutex_unlock();
(gdb) n
87      if (qemu_main) {
(gdb) n
93          qemu_default_main(NULL);
(gdb) n
Thread 1 "qemu-system-ppc" hit Breakpoint 2, qemu_default_main (opaque=opaque@entry=0x0) at ../system/main.c:48
48      replay_mutex_lock();
(gdb) n
49      bql_lock();
(gdb) n

<hangs>
<system becomes unresponsive at this point>
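
For readers unfamiliar with this failure mode: a blocking lock call such as bql_lock() waits for as long as some other thread owns the lock. The trace above does not show which thread, if any, holds the BQL at this point, so the following is only a generic illustration using plain POSIX threads, not QEMU code. All names in it (big_lock, holder_thread, probe_lock_while_held) are hypothetical. It uses pthread_mutex_trylock() so the probe reports EBUSY instead of hanging the way a blocking lock call would.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for a global lock such as the BQL; this is not QEMU code. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake state so the probe runs only while the holder owns big_lock. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sync_cond = PTHREAD_COND_INITIALIZER;
static int holder_owns_lock;
static int holder_may_release;

/* Takes big_lock and keeps it until the probing thread allows release. */
static void *holder_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);

    pthread_mutex_lock(&sync_lock);
    holder_owns_lock = 1;
    pthread_cond_broadcast(&sync_cond);
    while (!holder_may_release) {
        pthread_cond_wait(&sync_cond, &sync_lock);
    }
    pthread_mutex_unlock(&sync_lock);

    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Returns the result of pthread_mutex_trylock() on big_lock while another
 * thread owns it, or -1 if the holder thread could not be created.
 * A blocking pthread_mutex_lock() at the same point would not return
 * until the holder released the lock.
 */
static int probe_lock_while_held(void)
{
    pthread_t holder;
    int rc;

    holder_owns_lock = 0;
    holder_may_release = 0;
    if (pthread_create(&holder, NULL, holder_thread, NULL) != 0) {
        return -1;
    }

    /* Wait until the holder really owns big_lock. */
    pthread_mutex_lock(&sync_lock);
    while (!holder_owns_lock) {
        pthread_cond_wait(&sync_cond, &sync_lock);
    }
    pthread_mutex_unlock(&sync_lock);

    rc = pthread_mutex_trylock(&big_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&big_lock);
    }

    /* Let the holder finish so the thread can be joined. */
    pthread_mutex_lock(&sync_lock);
    holder_may_release = 1;
    pthread_cond_broadcast(&sync_cond);
    pthread_mutex_unlock(&sync_lock);

    pthread_join(holder, NULL);
    return rc;
}
```

In a live session, listing the backtraces of all threads at the point of the hang (gdb: thread apply all bt) is the usual way to see which thread owns the lock.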


Thanks,
Misbah Anjum N <[email protected]>



On 2026-03-09 18:53, Ani Sinha wrote:
Yes seems this is an issue and I will fix it. Not sure if the fix will
address your issue though ...
Can you try the following patch?
From 9e5a6945181d4c1fce7f8438e1b6213f1eb79c14 Mon Sep 17 00:00:00 2001
From: Ani Sinha <[email protected]>
Date: Mon, 9 Mar 2026 18:44:40 +0530
Subject: [PATCH] Fix reset for non-x86 archs that do not support reset yet
Signed-off-by: Ani Sinha <[email protected]>
---
system/runstate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/system/runstate.c b/system/runstate.c
index eca722b43c..c1f41284c9 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -531,10 +531,12 @@ void qemu_system_reset(ShutdownCause reason)
(current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
        if (ac->rebuild_guest) {
            ret = ac->rebuild_guest(current_machine);
-            if (ret < 0) {
+            if (ret < 0 && ret != -EOPNOTSUPP) {
                error_report("unable to rebuild guest: %s(%d)",
                             strerror(-ret), ret);
                vm_stop(RUN_STATE_INTERNAL_ERROR);
+            } else if (ret == -EOPNOTSUPP) {
+                error_report("accelerator does not support reset!");
            } else {
                 info_report("virtual machine state has been rebuilt with new "
                            "guest file handle.");
--
2.42.0
Is this a confidential guest that cannot be normally reset?
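
The return-code handling in the patch above can be looked at in isolation. The following is a minimal, self-contained C sketch, not QEMU code: RebuildOutcome and classify_rebuild_result() are hypothetical names introduced here for illustration, and only the order of the branches mirrors the diff. It shows that -EOPNOTSUPP is routed to its own branch, while every other negative value is still treated as a hard failure.

```c
#include <errno.h>

/* Hypothetical outcome categories; these names do not exist in QEMU. */
typedef enum {
    REBUILD_OK,           /* rebuild succeeded */
    REBUILD_UNSUPPORTED,  /* accelerator returned -EOPNOTSUPP */
    REBUILD_FAILED        /* any other negative error code */
} RebuildOutcome;

/*
 * Classifies a return value in the same branch order as the patched
 * qemu_system_reset() hunk: real errors first, then the "not supported"
 * case, and success otherwise.
 */
static RebuildOutcome classify_rebuild_result(int ret)
{
    if (ret < 0 && ret != -EOPNOTSUPP) {
        return REBUILD_FAILED;
    } else if (ret == -EOPNOTSUPP) {
        return REBUILD_UNSUPPORTED;
    }
    return REBUILD_OK;
}
```

In the patch itself, the failed case reports the error and calls vm_stop(RUN_STATE_INTERNAL_ERROR), and the unsupported case only reports an error. As noted in the GDB analysis earlier in this thread, the ppc64le run never enters this block, so this handling is not exercised by the reported hang.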
