On Tue, 29 Aug 2023 at 16:37, Laszlo Ersek <ler...@redhat.com> wrote:
>
> On 8/29/23 15:29, Ard Biesheuvel wrote:
> > Laszlo reports that the efi_gdb.py script fails to produce a full
> > backtrace when attaching it to an ARM firmware build that has halted on
> > an unhandled exception.
> >
> > The reason is that the asm code that processes the exception was not
> > implemented with this in mind, and therefore lacks any handling of it.
> >
> > So let's add this: create a dummy frame record suitable for chasing the
> > frame pointer, and add the CFI metadata to describe where the return
> > value can be found on the stack.
> >
> > When using a GCC5 build, this produces a stack trace such as
> >
> > (gdb) bt
> > #0  0x000000007fd4537c in CpuDeadLoop () at /home/ardb/build/edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
> > #1  0x000000007fd454f8 in DebugAssert (
> >     FileName=FileName@entry=0x7fd4a8a8 <MmioWrite64Internal+3604> "/home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c",
> >     LineNumber=LineNumber@entry=343,
> >     Description=Description@entry=0x7fd4a896 <MmioWrite64Internal+3586> "((BOOLEAN)(0==1))")
> >     at /home/ardb/build/edk2/MdePkg/Library/BaseDebugLibSerialPort/DebugLib.c:235
> > #2  0x000000007fd479ec in DefaultExceptionHandler (ExceptionType=<optimized out>, SystemContext=...)
> >     at /home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c:343
> > #3  0x000000007fd48eb8 in ExceptionHandlersEnd ()
> > #4  0x000000007fcde944 in QemuLoadKernelImage (ImageHandle=<synthetic pointer>)
> >     at /home/ardb/build/edk2/OvmfPkg/Library/GenericQemuLoadImageLib/GenericQemuLoadImageLib.c:201
> > #5  TryRunningQemuKernel () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/QemuKernel.c:46
> > #6  PlatformBootManagerAfterConsole () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c:1139
> > #7  BdsEntry (This=<optimized out>) at /home/ardb/build/edk2/MdeModulePkg/Universal/BdsDxe/BdsEntry.c:931
> > #8  0x000000007ffd0018 in ?? ()
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > when QemuLoadKernelImage() has been tweaked to trigger an exception, as
> > is shown by GDB when walking the call stack:
> >
> > |   0x7fcde938 <BdsEntry+3292>  b.ne  0x7fcdf134 <BdsEntry+5336>  // b.any
> > |   0x7fcde93c <BdsEntry+3296>  mov   x0, #0x40                   // #64
> > |   0x7fcde940 <BdsEntry+3300>  bl    0x7fcd7aec <DebugPrint>
> > | > 0x7fcde944 <BdsEntry+3304>  brk   #0x4d2
> > |   0x7fcde948 <BdsEntry+3308>  bl    0x7fce4354 <ConnectDevicesFromQemu>
> > |   0x7fcde94c <BdsEntry+3312>  tbz   x0, #63, 0x7fcde954 <BdsEntry+3320>
> > |   0x7fcde950 <BdsEntry+3316>  bl    0x7fcd844c <EfiBootManagerConnectAll>
> > |   0x7fcde954 <BdsEntry+3320>  bl    0x7fcd990c <EfiBootManagerRefreshAllBootOption
> >
> > Unfortunately, CLANGDWARF does not seem entirely happy with this
> > arrangement: it identifies the call frame where the exception
> > originated, but does not show any frames above that. (This could be
> > related to the fact that the exception code uses a separate exception
> > stack for handling synchronous exceptions)
>
> First of all, thanks for writing this patch so incredibly quickly. :)
>

My pleasure.

> Second, something must be off with my gdb.
>
> Before your patch, I kept experimenting with manually resetting FP, SP,
> and LR to the values printed in the register dump, using gdb "set"
> commands. Strangely, that did result in complete pre-exception stack
> traces, but *only sometimes*. Most of the time gdb complains about
> "corrupted stack". And I just can't figure out what distinguishes the
> broken from the functional "bt" commands -- I did walk the allegedly
> corrupt stack manually, and there is nothing corrupt in the FP and LR
> parts of the stack frames. They all chain nicely and point to valid
> instructions, respectively. I don't know what it is that gdb doesn't like.
>

I suspect that gdb is filled with heuristics and tweaks, and uses a
combination of the frame records, the actual value of LR and the
unwind data to figure out what the call stack looks like.

> Third, when I test your patch, I seem to experience precisely what you
> describe under CLANGDWARF -- it shows the faulting frame (the frame just
> before the exception), but nothing before it! And I'm not building with
> clang :(
>

Shame. Unfortunately, I don't have a lot of time to spend on this
right now, but it is something I have been wanting to fix forever so
hopefully I'll get back to it at some point.

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#108144): https://edk2.groups.io/g/devel/message/108144
-=-=-=-=-=-=-=-=-=-=-=-
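
For reference, the manual walk of the FP/LR chain mentioned above can be
sketched in a few lines of Python. This is only an illustration, not what
efi_gdb.py or gdb actually does. It assumes the usual AArch64 frame record
layout (the caller's FP at [FP], the saved LR at [FP + 8]) and models memory
as a plain dict. The function name, the dict, and all addresses are
hypothetical. The "caller's record must sit at a higher address" check is a
guess at the kind of sanity test that could make an unwinder give up when the
chain crosses onto a separate exception stack.

```python
def walk_frame_records(memory, fp, pc, max_frames=32):
    """Chase AArch64-style frame records starting from a register dump.

    memory: dict mapping 8-byte-aligned addresses to 64-bit values
            (a hypothetical stand-in for reading target memory).
    fp, pc: values of the frame pointer and program counter at the fault.
    Returns the list of code addresses, innermost first.
    """
    addresses = [pc]                 # innermost frame comes from the PC itself
    visited = set()
    while fp != 0 and len(addresses) < max_frames:
        # Stop on a loop in the chain or an unreadable frame record.
        if fp in visited or fp not in memory or fp + 8 not in memory:
            break
        visited.add(fp)
        caller_fp = memory[fp]       # first word: the caller's frame pointer
        saved_lr = memory[fp + 8]    # second word: return address into the caller
        addresses.append(saved_lr)
        # The stack grows down, so a caller's record normally lives at a
        # higher address. Treat anything else as a broken chain.
        if caller_fp != 0 and caller_fp <= fp:
            break
        fp = caller_fp
    return addresses


# Hypothetical three-record chain; every address here is made up.
example_memory = {
    0x1000: 0x1040, 0x1008: 0xAAAA,
    0x1040: 0x1080, 0x1048: 0xBBBB,
    0x1080: 0x0000, 0x1088: 0xCCCC,
}
trace = walk_frame_records(example_memory, fp=0x1000, pc=0x9999)
print([hex(a) for a in trace])
```

A chain that is intact in memory walks to the end here, so if gdb still stops
early, the cause is more likely its other inputs (the live LR value or the
unwind data) than the frame records themselves.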