Hi Vlad,

the ipxe-qemu package in Ubuntu (1.0.0+git-20190109.133f4c4-0ubuntu3) is
built with DOWNLOAD_PROTO_HTTPS enabled (in "src/config/general.h").
According to the Ubuntu changelog, this is a new feature added in
"1.0.0+git-20190109.133f4c4-0ubuntu1".

With DOWNLOAD_PROTO_HTTPS enabled, I can reproduce the issue locally,
with iPXE built from source at git commit 133f4c4 (which you report the
issue for), and also at current iPXE master (9ee70fb95bc2).

The issue does not reproduce (with DOWNLOAD_PROTO_HTTPS enabled) at
commit fbe8c52d. This suggests the problem should be bisectable.

If I disable DOWNLOAD_PROTO_HTTPS, then the problem goes away even at
133f4c4 (i.e., the issue is masked).

I've used current edk2 master to test with (14c7ed8b51f6).

Viewed at 133f4c4:

The DOWNLOAD_PROTO_HTTPS feature test macro seems to result in iPXE
attempting to gather entropy. (Likely for setting up TLS connections.)

For entropy gathering, iPXE seems to use an EFI timer, and to measure
jitter across one timer tick.

In this, iPXE plays some tricks with the UEFI TPL (Task Priority Level).
In general, iPXE seems to want to run at TPL_CALLBACK most of the time,
to mask the timer interrupt in most code locations, and drops down to
TPL_APPLICATION only when it actively wants a timer callback (for the
jitter collection, see above).

When the iPXE driver is launched, the StartImage() UEFI boot service
takes a note of the current TPL. It is TPL_APPLICATION (value 4).

Then iPXE seems to perform the above trickery with TPL_CALLBACK &
entropy collection.

Finally, after installing EfiDriverBindingProtocol and
EfiComponentName2Protocol, the iPXE driver exits (as expected from a
UEFI driver model driver -- the entry point function is only supposed to
perform some setup steps & install some protocol interfaces).

At this point, StartImage() verifies whether the TPL has been restored
to the same as it was before launching the driver.
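
To make that check concrete, here is a minimal sketch in C of the
bookkeeping described above. It is a toy model, not edk2 or iPXE code:
current_tpl, raise_tpl(), restore_tpl(), start_image_tpl_balanced() and
the two entry points are hypothetical names made up for illustration.
Only the TPL values (4, 8, 16) are the real ones.

```c
#include <assert.h>
#include <stdbool.h>

/* TPL values as defined by the UEFI specification. */
enum { TPL_APPLICATION = 4, TPL_CALLBACK = 8, TPL_NOTIFY = 16 };

/* Hypothetical stand-in for the firmware's notion of the current TPL. */
unsigned current_tpl = TPL_APPLICATION;

/* Model of RaiseTPL(): switch to new_tpl and hand back the old level. */
unsigned raise_tpl(unsigned new_tpl)
{
    /* Raising to a level below the current one is not allowed. */
    assert(new_tpl >= current_tpl);
    unsigned old_tpl = current_tpl;
    current_tpl = new_tpl;
    return old_tpl;
}

/* Model of RestoreTPL(): go back to a level saved by raise_tpl(). */
void restore_tpl(unsigned old_tpl)
{
    current_tpl = old_tpl;
}

typedef void (*entry_point_fn)(void);

/*
 * Model of the StartImage() check: the TPL seen after the entry point
 * returns must equal the TPL noted before it was called.
 */
bool start_image_tpl_balanced(entry_point_fn entry)
{
    unsigned tpl_before = current_tpl;
    entry();
    bool balanced = (current_tpl == tpl_before);
    current_tpl = tpl_before; /* reset the toy model for the next call */
    return balanced;
}

/* Hypothetical driver entry point: every raise has a matching restore. */
void balanced_entry(void)
{
    unsigned saved = raise_tpl(TPL_CALLBACK);
    /* ... setup work would happen here ... */
    restore_tpl(saved);
}

/* Hypothetical driver entry point: returns while still at TPL_CALLBACK. */
void unbalanced_entry(void)
{
    unsigned outer = raise_tpl(TPL_CALLBACK);
    unsigned inner = raise_tpl(TPL_NOTIFY);
    restore_tpl(inner);
    (void)outer; /* the matching restore_tpl(outer) is missing */
}
```

In this model, start_image_tpl_balanced(balanced_entry) returns true and
start_image_tpl_balanced(unbalanced_entry) returns false. The second
case is the kind of situation the check in StartImage() is there to
catch.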

Unfortunately, something about the TPL manipulations in iPXE is
unbalanced, because I see the following TPL changes:

- raise: APPLICATION (4) -> CALLBACK (8)
- raise: CALLBACK (8) -> NOTIFY (16)
- raise: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> CALLBACK (8)

Note that the final "restore: CALLBACK (8) -> APPLICATION (4)"
transition is missing, before iPXE exits. This is what StartImage()
catches and reports with the failed ASSERT().

So, as I mentioned, the problem is bisectable. Here's the bisection log:

> git bisect start
> # bad: [9ee70fb95bc266885ff88be228b044a2bb226eeb] [efi] Attempt to
> # connect our driver directly if ConnectController fails
> git bisect bad 9ee70fb95bc266885ff88be228b044a2bb226eeb
> # bad: [133f4c47baef6002b2ccb4904a035cda2303c6e5] [build] Handle
> # R_X86_64_PLT32 from binutils 2.31
> git bisect bad 133f4c47baef6002b2ccb4904a035cda2303c6e5
> # good: [fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f] [ena] Fix spurious
> # uninitialised variable warning on older versions of gcc
> git bisect good fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f
> # bad: [bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc] [librm] Ensure that
> # inline code symbols are unique
> git bisect bad bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc
> # bad: [0778418e29ea16fc897fc5b6e497054f5ba86ebd] [golan] Do not
> # assume all devices are identical
> git bisect bad 0778418e29ea16fc897fc5b6e497054f5ba86ebd
> # good: [f672a27b34220865b403df519593f382859559e0] [efi] Raise TPL
> # within EFI_USB_IO_PROTOCOL entry points
> git bisect good f672a27b34220865b403df519593f382859559e0
> # bad: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi] Drop to
> # TPL_APPLICATION when gathering entropy
> git bisect bad d8c500b7945e57023dde5bd0be2b0e40963315d9
> # good: [c84f9d67272beaed98f98bf308471df16340a3be] [iscsi] Parse IPv6
> # address in root path
> git bisect good c84f9d67272beaed98f98bf308471df16340a3be
> # first bad commit: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi]
> # Drop to TPL_APPLICATION when gathering entropy

The bisection fingers d8c500b7945e ("[efi] Drop to TPL_APPLICATION when
gathering entropy", 2018-03-12) as first bad commit.

Feel free to report this problem on the upstream iPXE mailing list.

Regarding Ubuntu downstream, you should be able to work around this
issue by #undef-ing DOWNLOAD_PROTO_HTTPS again, in
"src/config/general.h" -- *minimally* in the CONFIG=qemu build(s). That
is, in the ipxe-qemu subpackage.

That's because in a CONFIG=qemu build, you totally don't need (or even
*use*) the iPXE HTTPS infrastructure (the entropy gathering that trips
the ASSERT seems spurious to me, with CONFIG=qemu). With CONFIG=qemu,
iPXE provides the UEFI SNP (Simple Network Protocol) interface on top of
the e1000 NIC, and the crypto stuff (if any) is done by the platform
firmware (edk2 / OVMF).

** Project changed: qemu => ipxe

** Package changed: qemu (Ubuntu) => ipxe (Ubuntu)

https://bugs.launchpad.net/bugs/1882671

Title:
  qemu-system-x86_64 (ver 4.2) stuck at boot with OVMF bios

Status in iPXE:
  New
Status in ipxe package in Ubuntu:
  Confirmed

Bug description:
  The version of QEMU (4.2.0) packaged for Ubuntu 20.04 hangs
  indefinitely at boot if an OVMF bios is used. This happens ONLY with
  qemu-system-x86_64. qemu-system-i386 works fine with the latest ia32
  OVMF bios.

  NOTE[1]: the same identical OVMF bios works fine on QEMU 2.x packaged
  with Ubuntu 18.04.

  NOTE[2]: reproducing the fatal bug requires *no* operating system:

     qemu-system-x86_64 -bios OVMF-pure-efi.fd

  On its window QEMU gets stuck at the very first stage:
     "Guest has not initialized the display (yet)."

  NOTE[3]: QEMU gets stuck no matter if KVM is used or not.

  NOTE[4]: By adding the `-d int` option it is possible to observe that
  QEMU is, apparently, stuck in an endless loop of interrupts.
  For the first few seconds, registers' values vary quickly, but at
  some point they reach a final value, while the interrupt counter
  increments:

    2568: v=68 e=0000 i=0 cpl=0 IP=0038:0000000007f1d225 pc=0000000007f1d225 SP=0030:0000000007f0c8d0 env->regs[R_EAX]=0000000000000000
    RAX=0000000000000000 RBX=0000000007f0c920 RCX=0000000000000000 RDX=0000000000000001
    RSI=0000000006d18798 RDI=0000000000008664 RBP=0000000000000000 RSP=0000000007f0c8d0
    R8 =0000000000000001 R9 =0000000000000089 R10=0000000000000000 R11=0000000007f2c987
    R12=0000000000000000 R13=0000000000000000 R14=0000000007087901 R15=0000000000000000
    RIP=0000000007f1d225 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
    CS =0038 0000000000000000 ffffffff 00af9a00 DPL=0 CS64 [-R-]
    SS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
    DS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
    FS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
    GS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
    LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
    TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
    GDT= 00000000079eea98 00000047
    IDT= 000000000758f018 00000fff
    CR0=80010033 CR2=0000000000000000 CR3=0000000007c01000 CR4=00000668
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS
    EFER=0000000000000d00

  NOTE[5]: Just to better help the investigation of the bug, I'd like to
  remark that the issue is NOT caused by an endless loop of
  triple-faults. I tried with -d cpu_reset and there is NO such loop. No
  triple fault whatsoever.

  NOTE[6]: The OVMF version used for the test has been downloaded from:

  https://www.kraxel.org/repos/jenkins/edk2/edk2.git-ovmf-x64-0-20200515.1398.g6ff7c838d0.noarch.rpm

  but the issue is the same with older OVMF versions as well.

  Please take a look at it, as the bug is NOT a corner case. QEMU 4.2.0
  cannot boot with a UEFI firmware (OVMF) while virtualizing an x86_64
  machine AT ALL.

  Thank you very much,
  Vladislav K. Valtchev