Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
On 28.07.22 at 12:13, Yan Vugenfirer wrote:
> Hi Fabian,
>
> Can you save the dump file with QEMU monitor using dump-guest-memory or
> with virsh dump? Then you can use elf2dmp (compiled with QEMU and found
> in the "contrib" folder) to convert the dump file to WinDbg format and
> examine the stack.

Hi Yan,

thank you for the suggestion!

For the two VMs in the KVM_EXIT_SHUTDOWN/qemu_system_reset loop, I get:

> 2 CPU states has been found
> CPU #0 CR3 is 0x0080
> DirectoryTableBase = 0x000fffd08000 has been found from CPU #0 as
> interrupt handling CR3
> [1] 4169758 segmentation fault (core dumped)  elf2dmp memdump.elf memdump.dmp

I tried twice more, hoping for better timing, but the results were the
same (I haven't looked into why it segfaults yet). For the second one,
there is no segfault, but still an error upon conversion:

> 2 CPU states has been found
> CPU #0 CR3 is 0x0080
> DirectoryTableBase = 0x has been found from CPU #0 as
> interrupt handling CR3
> Failed to find paging base

For the VM with the spinning circles, the dump was converted
successfully at least, but I don't have any experience with WinDbg and
nothing interesting pops out to me:

> Microsoft (R) Windows Debugger Version 10.0.22621.1 AMD64
> Copyright (c) Microsoft Corporation. All rights reserved.
>
> Loading Dump File [F:\win-reboot-dump\memdump-circles.dmp]
> Kernel Complete Dump File: Full address space is available
>
> Comment: 'Hello from elf2dmp!'
> Symbol search path is: srv*
> Executable search path is:
> Windows 8.1 Kernel Version 9600 MP (2 procs) Free x64
> Product: Server, suite: TerminalServer SingleUserTS
> Edition build lab: 9600.19913.amd64fre.winblue_ltsb_escrow.201207-1920
> Machine Name:
> Kernel base = 0xf802`05073000 PsLoadedModuleList = 0xf802`053385d0
> System Uptime: 0 days 0:00:52.919
> Loading Kernel Symbols
> ...
> Loading User Symbols
> Loading unloaded module list
> ..
> Unknown exception - code (first/second chance not available)
> For analysis of this file, run !analyze -v
> 0: kd> ~0
> 0: kd> kb
>  # RetAddr : Args to Child : Call Site
> 00 f802`0526a0ad : e002`00519650 e002`00519410 ` f802`05354180 : hal!HalProcessorIdle+0xf
> 01 f802`05168a50 : f802`05354180 f802`066a2300 ` e002`00519410 : nt!PpmIdleDefaultExecute+0x1d
> 02 f802`050dd186 : f802`05354180 f802`066a23cc f802`066a23d0 f802`066a23d8 : nt!PpmIdleExecuteTransition+0x400
> 03 f802`051b71ac : f802`05354180 f802`05354180 f802`053bba00 ` : nt!PoIdle+0x2f6
> 04 ` : f802`066a3000 f802`0669c000 ` ` : nt!KiIdleLoop+0x2c
> 0: kd> r
> rax= rbx= rcx=0086
> rdx= rsi= rdi=e00200519410
> rip=f8020501e81f rsp=f802066a2258 rbp=f802066a2339
> r8= r9= r10=0002
> r11=0001 r12= r13=0001
> r14=1f8de167 r15=
> iopl=0 nv up ei ng nz na pe nc
> cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=0282
> hal!HalProcessorIdle+0xf:
> f802`0501e81f c3 ret
> 0: kd> ~1
> 1: kd> kb
>  # RetAddr : Args to Child : Call Site
> 00 f802`0526a0ad : e002`00521b20 e002`005218e0 ` d000`203da180 : hal!HalProcessorIdle+0xf
> 01 f802`05168a50 : d000`203da180 d000`203f8300 ` 018e`b7f04213 : nt!PpmIdleDefaultExecute+0x1d
> 02 f802`050dd186 : d000`203da180 d000`203f83cc d000`203f83d0 d000`203f83d8 : nt!PpmIdleExecuteTransition+0x400
> 03 f802`051b71ac : d000`203da180 d000`203da180 d000`203ea300 ` : nt!PoIdle+0x2f6
> 04 ` : d000`203f9000 d000`203f2000 ` ` : nt!KiIdleLoop+0x2c
> 1: kd> r
> rax=0020 rbx= rcx=0086
> rdx= rsi= rdi=e002005218e0
> rip=f8020501e81f rsp=d000203f8258 rbp=d000203f8339
> r8= r9= r10=0002
> r11=0001 r12= r13=0001
> r14=000128d624996edd r15=
> iopl=0 nv up ei ng nz na pe nc
> cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=0282
> hal!HalProcessorIdle+0xf:
> f802`0501e81f c3 ret

Is there anything I should be looking at in particular? I took a second
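In case it helps track down the elf2dmp crash, one way to grab a
backtrace would be to rerun the failing conversion under gdb. This is
only a sketch; the binary path and file names are the ones from the
session above and may differ on your build.

```shell
# Run the failing elf2dmp conversion under gdb; --batch runs the given
# commands non-interactively: 'run' starts the program, 'bt' prints a
# backtrace once it crashes.
gdb --batch -ex run -ex bt --args ./contrib/elf2dmp/elf2dmp memdump.elf memdump.dmp
```

Building elf2dmp with debug info (e.g. a QEMU build configured with
--enable-debug) would make the resulting backtrace much more useful.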
Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
Hi Fabian,

Can you save the dump file with QEMU monitor using dump-guest-memory or
with virsh dump? Then you can use elf2dmp (compiled with QEMU and found
in the "contrib" folder) to convert the dump file to WinDbg format and
examine the stack.

Best regards,
Yan.

> On 21 Jul 2022, at 3:49 PM, Fabian Ebner wrote:
>
> Hi,
> since about half a year ago, we're getting user reports about guest
> reboot issues with KVM/QEMU [0].
>
> The most common scenario is a Windows Server VM (2012R2/2016/2019,
> UEFI/OVMF and SeaBIOS) getting stuck on the screen with the Windows
> logo and the spinning circles after a reboot was triggered from within
> the guest. Quitting the kvm process and booting with a fresh instance
> works. The issue seems to become more likely the longer the kvm
> instance runs.
>
> We did not get such reports while we were providing Linux 5.4 and QEMU
> 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x.
>
> I'm just wondering if anybody has seen this issue before or might have
> a hunch what it's about? Any tips on what to look out for when
> debugging are also greatly appreciated!
>
> We do have debug access to a user's test VM and the VM state was saved
> before a problematic reboot, but I can't modify the host system there.
> AFAICT QEMU just executes guest code as usual, but I'm really not sure
> what to look out for.
>
> That VM has CPU type 'host', and a colleague did have a similar enough
> CPU to load the VM state, but for him, the reboot went through
> normally. On the user's system, it triggers consistently after loading
> the VM state and rebooting.
>
> So unfortunately, we didn't manage to reproduce the issue locally yet.
> With two other images provided by users, we ran into a boot loop,
> where QEMU resets the CPUs and does a few KVM_RUNs before the exit
> reason is KVM_EXIT_SHUTDOWN (which to my understanding indicates a
> triple fault) and then it repeats. It's not clear if the issues are
> related.
>
> There are also a few reports about non-Windows VMs, mostly Ubuntu
> 20.04 with UEFI/OVMF, but again, it's not clear if the issues are
> related.
>
> [0]: https://forum.proxmox.com/threads/100744/
> (the forum thread is a bit chaotic, unfortunately).
>
> Best Regards,
> Fabi
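For reference, Yan's suggested workflow can be sketched roughly as
follows. The VM name and file paths are examples, not taken from the
thread.

```shell
# Take a memory-only ELF dump of the running guest via libvirt:
virsh dump --memory-only --format elf myvm /tmp/memdump.elf

# Alternatively, from the QEMU monitor (HMP):
#   (qemu) dump-guest-memory /tmp/memdump.elf

# Convert the ELF dump to a WinDbg-loadable .dmp with elf2dmp, the tool
# from QEMU's contrib/ directory (built as part of a normal QEMU build
# when its dependencies are available):
elf2dmp /tmp/memdump.elf /tmp/memdump.dmp
```

The resulting memdump.dmp can then be opened in WinDbg like a regular
kernel dump for stack inspection.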
Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
On 21.07.22 at 17:51, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 14:49 +0200, Fabian Ebner wrote:
>> Hi,
>> since about half a year ago, we're getting user reports about guest
>> reboot issues with KVM/QEMU [0].
>>
>> The most common scenario is a Windows Server VM (2012R2/2016/2019,
>> UEFI/OVMF and SeaBIOS) getting stuck on the screen with the Windows
>> logo and the spinning circles after a reboot was triggered from
>> within the guest. Quitting the kvm process and booting with a fresh
>> instance works. The issue seems to become more likely the longer the
>> kvm instance runs.
>>
>> We did not get such reports while we were providing Linux 5.4 and
>> QEMU 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x.
>>
>> I'm just wondering if anybody has seen this issue before or might
>> have a hunch what it's about? Any tips on what to look out for when
>> debugging are also greatly appreciated!
>>
>> We do have debug access to a user's test VM and the VM state was
>> saved before a problematic reboot, but I can't modify the host system
>> there. AFAICT QEMU just executes guest code as usual, but I'm really
>> not sure what to look out for.
>>
>> That VM has CPU type 'host', and a colleague did have a similar
>> enough CPU to load the VM state, but for him, the reboot went through
>> normally. On the user's system, it triggers consistently after
>> loading the VM state and rebooting.
>>
>> So unfortunately, we didn't manage to reproduce the issue locally
>> yet. With two other images provided by users, we ran into a boot
>> loop, where QEMU resets the CPUs and does a few KVM_RUNs before the
>> exit reason is KVM_EXIT_SHUTDOWN (which to my understanding indicates
>> a triple fault) and then it repeats. It's not clear if the issues are
>> related.
>
> Does the guest have HyperV enabled in it (that is, nested
> virtualization)?
For all three machines described above,

    Get-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V

indicates that Hyper-V is disabled.

> Intel or AMD?

We have reports for both Intel and AMD.

> Does the VM use secure boot / SMM?

The customer VM which can reliably trigger the issue after loading the
state and rebooting uses SeaBIOS. For the other two VMs,
Confirm-SecureBootUEFI returns "False".

SMM might be a lead! We did disable SMM in the past, because apparently
there were problems with it (I didn't dig out which ones; it was before
I worked here), and the timing of enabling it and the reports coming in
would match. I guess (some) guest OSes don't expect it to be suddenly
turned on? However, there is a report from a user with two clusters
running QEMU 5.2, one with kernel 5.4 without the issue and one with
kernel 5.11 with the issue (Windows VM with spinning circles). So
that's confusing :/

We do use some additional options if the OS type is "Windows" in our
high-level configuration, including Hyper-V enlightenments:

> -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt'
> -no-hpet
> -rtc 'driftfix=slew,base=localtime'
> -global 'kvm-pit.lost_tick_policy=discard'

But one user reported running into the issue even with OS type "other",
i.e. when the above options are not present and the CPU flags should be
just '+kvm_pv_eoi,+kvm_pv_unhalt'. There are also reports with CPU
types different from 'host', e.g. 'kvm64' (where we automatically set
the flags +lahf_lm,+sep).

Thank you and best regards,
Fiona

P.S. Please don't mind the (from your perspective sudden) name change.
I'm still the same person and don't intend to change it again :)

> Best regards,
>     Maxim Levitsky

>> There are also a few reports about non-Windows VMs, mostly Ubuntu
>> 20.04 with UEFI/OVMF, but again, it's not clear if the issues are
>> related.
>>
>> [0]: https://forum.proxmox.com/threads/100744/
>> (the forum thread is a bit chaotic, unfortunately).
>>
>> Best Regards,
>> Fabi
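If SMM really is a factor here, one quick experiment would be to boot
an affected VM with SMM disabled at the machine level and see whether
the reboot hang still occurs. A minimal sketch, assuming a q35 machine
type; the rest of the command line is intentionally elided and would be
whatever the VM normally uses:

```shell
# Start the guest with SMM turned off; 'smm' is a property of the
# pc/q35 machine types (auto|on|off).
qemu-system-x86_64 -machine q35,smm=off -cpu host ...
```

Note that on OVMF builds that keep the variable store in SMM, disabling
it may also affect secure boot behavior, so results from this test need
to be interpreted with that in mind.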
Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
On Thu, 2022-07-21 at 14:49 +0200, Fabian Ebner wrote:
> Hi,
> since about half a year ago, we're getting user reports about guest
> reboot issues with KVM/QEMU [0].
>
> The most common scenario is a Windows Server VM (2012R2/2016/2019,
> UEFI/OVMF and SeaBIOS) getting stuck on the screen with the Windows
> logo and the spinning circles after a reboot was triggered from within
> the guest. Quitting the kvm process and booting with a fresh instance
> works. The issue seems to become more likely the longer the kvm
> instance runs.
>
> We did not get such reports while we were providing Linux 5.4 and QEMU
> 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x.
>
> I'm just wondering if anybody has seen this issue before or might have
> a hunch what it's about? Any tips on what to look out for when
> debugging are also greatly appreciated!
>
> We do have debug access to a user's test VM and the VM state was saved
> before a problematic reboot, but I can't modify the host system there.
> AFAICT QEMU just executes guest code as usual, but I'm really not sure
> what to look out for.
>
> That VM has CPU type 'host', and a colleague did have a similar enough
> CPU to load the VM state, but for him, the reboot went through
> normally. On the user's system, it triggers consistently after loading
> the VM state and rebooting.
>
> So unfortunately, we didn't manage to reproduce the issue locally yet.
> With two other images provided by users, we ran into a boot loop,
> where QEMU resets the CPUs and does a few KVM_RUNs before the exit
> reason is KVM_EXIT_SHUTDOWN (which to my understanding indicates a
> triple fault) and then it repeats. It's not clear if the issues are
> related.

Does the guest have HyperV enabled in it (that is, nested
virtualization)?

Intel or AMD?

Does the VM use secure boot / SMM?
Best regards,
    Maxim Levitsky

> There are also a few reports about non-Windows VMs, mostly Ubuntu
> 20.04 with UEFI/OVMF, but again, it's not clear if the issues are
> related.
>
> [0]: https://forum.proxmox.com/threads/100744/
> (the forum thread is a bit chaotic, unfortunately).
>
> Best Regards,
> Fabi