Hi Ani, Paolo,

I think the problem lies here:

For architectures that don't support VM fd change, we bail out in kvm_reset_vmfd() as shown below:


    /*
     * bail if the current architecture does not support VM file
     * descriptor change.
     */
    if (!kvm_arch_supports_vmfd_change()) {
        error_report("This target architecture does not support KVM VM "
                     "file descriptor change.");
        return -EOPNOTSUPP;
    }

However, when rebuild_guest (i.e. kvm_reset_vmfd) is called from
qemu_system_reset() here:

    if ((reason == SHUTDOWN_CAUSE_GUEST_RESET ||
         reason == SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET) &&
        (current_machine->new_accel_vmfd_on_reset || !cpus_are_resettable())) {
        if (ac->rebuild_guest) {
            ret = ac->rebuild_guest(current_machine);
            if (ret < 0) {
                error_report("unable to rebuild guest: %s(%d)",
                             strerror(-ret), ret);
                vm_stop(RUN_STATE_INTERNAL_ERROR);
            } else {
                info_report("virtual machine state has been rebuilt with new "
                            "guest file handle.");
                guest_state_rebuilt = true;
            }
        } else if (!cpus_are_resettable()) {
            error_report("accelerator does not support reset!");
        } else {
            error_report("accelerator does not support rebuilding guest state,"
                         " proceeding with normal reset!");
        }
    }


it simply calls vm_stop() whenever rebuild_guest() returns a negative value,
including -EOPNOTSUPP.

IMHO, this should handle -EOPNOTSUPP gracefully.
Please advise if this needs to be handled differently.

regards,
Harsh

On 09/03/26 1:58 pm, Misbah Anjum N wrote:
Hi Ani and Paolo,
Following up on my previous report, I've attempted additional debugging to isolate the issue on ppc64le.

I implemented the architecture-specific hooks for ppc64le. After applying the changes below, recompiling QEMU, and testing with the direct qemu-system-ppc64 command, the hang persists with the same symptoms: no output and complete unresponsiveness.

Could you suggest what additional changes are needed to ensure the VM FD change doesn't affect architectures that don't support this feature?

Tested with the following changes:
File: stubs/kvm.c
Changed the abort() call to return 0:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    return 0; /* Changed from abort() */
}

File: target/ppc/kvm.c
Added the following stubs:
int kvm_arch_on_vmfd_change(MachineState *ms, KVMState *s)
{
    /* ppc64le doesn't support VM FD changes for confidential guests */
    return 0;
}

bool kvm_arch_supports_vmfd_change(void)
{
    return false;
}

GDB Backtrace:
I ran QEMU under GDB to capture the hang state. The backtrace shows the vCPU thread is waiting on a condition variable:

Thread 4 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/powerpc/syscall_cancel.S:77
#1  0x00007ffff58a9678 in __internal_syscall_cancel (nr=221) at cancellation.c:49
#2  0x00007ffff58aa220 in __futex_abstimed_wait_common64 (futex_word=0x10131ba10, expected=0, op=393, abstime=0x0, cancel=true) at futex-internal.c:57
#3  __futex_abstimed_wait_common (futex_word=0x10131ba10, expected=0, clockid=0, abstime=<optimized out>, private=0, cancel=true) at futex-internal.c:87
#4  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x10131ba10, expected=0, clockid=0, abstime=0x0, private=0) at futex-internal.c:139
#5  0x00007ffff58ae0bc in __pthread_cond_wait_common (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
#6  ___pthread_cond_wait (cond=0x10131b9f0, mutex=0x101222ce0 <bql>) at pthread_cond_wait.c:458
#7  0x0000000100b9bea8 in qemu_cond_wait_impl (cond=0x10131b9f0, mutex=0x101222ce0 <bql>, file=0x100c59900 "../system/cpus.c", line=472) at ../util/qemu-thread-posix.c:240
#8  0x00000001006a0408 in qemu_process_cpu_events (cpu=0x1019dd260) at ../system/cpus.c:472
#9  0x0000000100913354 in kvm_vcpu_thread_fn (arg=0x1019dd260) at ../accel/kvm/kvm-accel-ops.c:50
#10 0x0000000100b9b30c in qemu_thread_start (args=0x1019f1fe0) at ../util/qemu-thread-posix.c:414
#11 0x00007ffff58aed94 in start_thread (arg=0x7ffff0bce320) at pthread_create.c:448
#12 0x00007ffff59555f8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone3.S:114

Thanks,
Misbah Anjum N <[email protected]>


On 2026-03-06 16:22, Misbah Anjum N wrote:
Hi,
I'm reporting a critical regression on ppc64le that causes all KVM
guests to hang immediately during startup. Git bisect identified
commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a as the first bad
commit. The commit completely breaks KVM functionality on ppc64le.

Regression Details:
Working Version: QEMU 10.2.50 (v10.2.0-1669-gffcf1a7981)
Broken Version: QEMU 10.2.50 (v10.2.0-1816-g3fb456e9a0)
Bad Commit: 98884e0cc10997a17ce9abfd6ff10be19224ca6a "accel/kvm: add
changes required to support KVM VM file descriptor change"
Commit Link:
https://gitlab.com/qemu-project/qemu/-/commit/98884e0cc10997a17ce9abfd6ff10be19224ca6a

Environment:
Host: Fedora 42, Kernel 7.0.0-rc2, Power11 (ppc64le)
Libvirt: 12.1.0
Guest: Fedora 42, Kernel 7.0.0-rc2
Machine Type: pseries with KVM acceleration

Build Configuration:
git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
git submodule init
git submodule update --recursive
./configure --target-list=ppc64-softmmu --disable-tcg --prefix=/usr
make && make install

Reproduction:
Using virt-install:
/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate \
  --name 'avocado-vt-vm1' --machine pseries --memory=32768 \
  --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics \
  --os-variant rhel8.0 --serial pty --memballoon model=virtio \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 \
  --network=bridge=virbr0,model=virtio \
  --boot emulator=/usr/bin/qemu-system-ppc64
Result: Starting install...
        <hangs indefinitely with no output>

Using direct QEMU command:
/usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
  -machine pseries,accel=kvm -enable-kvm -m 32768 \
  -smp 32,sockets=1,cores=32,threads=1 -nographic -serial pty \
  -device virtio-balloon \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive-scsi0-0-0,format=qcow2 \
  -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0 \
  -netdev bridge,id=net0,br=virbr0 -device virtio-net-pci,netdev=net0
Result: <hangs indefinitely with no output>

Analysis:
The commit introduces VM file descriptor change support with
architecture-specific hooks.
I attempted the following fixes without success:
1. Changed abort() to "return 0;" in stubs/kvm.c
2. Added an early return in kvm_reset_vmfd() when
kvm_arch_supports_vmfd_change() returns false

Git Bisect Log:
# git bisect bad
98884e0cc10997a17ce9abfd6ff10be19224ca6a is the first bad commit
commit 98884e0cc10997a17ce9abfd6ff10be19224ca6a (HEAD)
Author: Ani Sinha <[email protected]>
Date:   Wed Feb 25 09:19:10 2026 +0530

    accel/kvm: add changes required to support KVM VM file descriptor change

    This change adds common kvm specific support to handle KVM VM file descriptor
    change. KVM VM file descriptor can change as a part of confidential guest reset
    mechanism. A new function api kvm_arch_on_vmfd_change() per
    architecture platform is added in order to implement architecture specific
    changes required to support it. A subsequent patch will add x86 specific
    implementation for kvm_arch_on_vmfd_change() as currently only x86 supports
    confidential guest reset.

    Signed-off-by: Ani Sinha <[email protected]>
    Link: https://lore.kernel.org/r/20260225035000.385950-6-[email protected]
    Signed-off-by: Paolo Bonzini <[email protected]>

 MAINTAINERS            |  6 ++++++
 accel/kvm/kvm-all.c    | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 accel/kvm/trace-events |  1 +
 include/system/kvm.h   |  3 +++
 stubs/kvm.c            | 22 ++++++++++++++++++++++
 stubs/meson.build      |  1 +
 target/i386/kvm/kvm.c  | 10 ++++++++++
 7 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 stubs/kvm.c

# git bisect log
git bisect start
git bisect good ffcf1a7981793973ffbd8100a7c3c6042d02ae23
git bisect bad 3fb456e9a0e9eef6a71d9b49bfff596a0f0046e9
git bisect bad e76c30bb13ecb9dc716fa629954bfb6253056ce2
git bisect good 9bdc612a18588975f5776ee4e562df607fea1b2c
git bisect bad 40c015e96942fd2a3e4d5ace6063b3333a3dd372
git bisect good df8df3cb6b743372ebb335bd8404bc3d748da350
git bisect bad 0f53f021ad1ede28dc8944686544e496cab02e5e
git bisect bad 9f0c2b3032639315faf141010a2603b0dbf56230
git bisect bad 98884e0cc10997a17ce9abfd6ff10be19224ca6a
first bad commit: [98884e0cc10997a17ce9abfd6ff10be19224ca6a]

Thanks,
Misbah Anjum N <[email protected]>

