On Tue, 29 Aug 2023 at 16:37, Laszlo Ersek <ler...@redhat.com> wrote:
>
> On 8/29/23 15:29, Ard Biesheuvel wrote:
> > Laszlo reports that the efi_gdb.py script fails to produce a full
> > backtrace when attaching it to an ARM firmware build that has halted on
> > an unhandled exception.
> >
> > The reason is that the asm code that processes the exception was not
> > written with stack unwinding in mind: it sets up neither a frame record
> > nor any unwind metadata.
> >
> > So let's add this: create a dummy frame record suitable for chasing the
> > frame pointer, and add the CFI metadata to describe where the return
> > address can be found on the stack.
> >
> > When using a GCC5 build, this produces a stack trace such as
> >
> >   (gdb) bt
> >   #0  0x000000007fd4537c in CpuDeadLoop () at /home/ardb/build/edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
> >   #1  0x000000007fd454f8 in DebugAssert (
> >       FileName=FileName@entry=0x7fd4a8a8 <MmioWrite64Internal+3604> "/home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c",
> >       LineNumber=LineNumber@entry=343, Description=Description@entry=0x7fd4a896 <MmioWrite64Internal+3586> "((BOOLEAN)(0==1))")
> >       at /home/ardb/build/edk2/MdePkg/Library/BaseDebugLibSerialPort/DebugLib.c:235
> >   #2  0x000000007fd479ec in DefaultExceptionHandler (ExceptionType=<optimized out>, SystemContext=...)
> >       at /home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c:343
> >   #3  0x000000007fd48eb8 in ExceptionHandlersEnd ()
> >   #4  0x000000007fcde944 in QemuLoadKernelImage (ImageHandle=<synthetic pointer>) at /home/ardb/build/edk2/OvmfPkg/Library/GenericQemuLoadImageLib/GenericQemuLoadImageLib.c:201
> >   #5  TryRunningQemuKernel () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/QemuKernel.c:46
> >   #6  PlatformBootManagerAfterConsole () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c:1139
> >   #7  BdsEntry (This=<optimized out>) at /home/ardb/build/edk2/MdeModulePkg/Universal/BdsDxe/BdsEntry.c:931
> >   #8  0x000000007ffd0018 in ?? ()
> >   Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > when QemuLoadKernelImage() has been tweaked to trigger an exception, as
> > is shown by GDB when walking the call stack:
> >
> > |    0x7fcde938 <BdsEntry+3292>      b.ne    0x7fcdf134 <BdsEntry+5336>  // b.any
> > |    0x7fcde93c <BdsEntry+3296>      mov     x0, #0x40                       // #64
> > |    0x7fcde940 <BdsEntry+3300>      bl      0x7fcd7aec <DebugPrint>
> > |  > 0x7fcde944 <BdsEntry+3304>      brk     #0x4d2
> > |    0x7fcde948 <BdsEntry+3308>      bl      0x7fce4354 <ConnectDevicesFromQemu>
> > |    0x7fcde94c <BdsEntry+3312>      tbz     x0, #63, 0x7fcde954 <BdsEntry+3320>
> > |    0x7fcde950 <BdsEntry+3316>      bl      0x7fcd844c <EfiBootManagerConnectAll>
> > |    0x7fcde954 <BdsEntry+3320>      bl      0x7fcd990c <EfiBootManagerRefreshAllBootOption
> >
> > Unfortunately, CLANGDWARF does not seem entirely happy with this
> > arrangement: it identifies the call frame where the exception
> > originated, but does not show any of its callers. (This could be
> > related to the fact that the exception code uses a separate exception
> > stack for handling synchronous exceptions.)
>
> First of all, thanks for writing this patch so incredibly quickly. :)
>

My pleasure.

> Second, something must be off with my gdb.
>
> Before your patch, I kept experimenting with manually resetting FP, SP,
> and LR to the values printed in the register dump, using gdb "set"
> commands. Strangely, that did result in complete pre-exception stack
> traces, but *only sometimes*. Most of the time gdb complains about
> "corrupted stack". And I just can't figure out what distinguishes the
> broken from the functional "bt" commands -- I did walk the allegedly
> corrupt stack manually, and there is nothing corrupt in the FP and LR
> parts of the stack frames. They all chain nicely and point to valid
> instructions, respectively. I don't know what it is that gdb doesn't like.
>

I suspect that gdb is filled with heuristics and tweaks, and uses a
combination of the frame records, the actual value of LR and the
unwind data to figure out what the call stack looks like.
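
For reference, the frame-record part of that is easy to reproduce by hand. The sketch below is only an illustration of chasing the AArch64 frame pointer, not what gdb or efi_gdb.py actually does: it assumes the usual layout where [fp] holds the caller's frame pointer and [fp + 8] holds the saved LR, and it ignores the live LR and the unwind data entirely. The helper names and the addresses in the demo stack are made up for the example.

```python
import struct


def make_reader(base, blob):
    """Return a read_u64(addr) function over a little-endian memory snapshot."""
    def read_u64(addr):
        offset = addr - base
        if offset < 0 or offset + 8 > len(blob):
            raise ValueError(f"address {addr:#x} is outside the snapshot")
        return struct.unpack_from("<Q", blob, offset)[0]
    return read_u64


def walk_frame_records(read_u64, fp, max_frames=64):
    """Collect saved return addresses by following the frame-record chain.

    Assumed layout (AArch64 frame record): [fp] is the caller's frame
    pointer, [fp + 8] is the saved link register.
    """
    return_addresses = []
    visited = set()
    while fp != 0 and fp not in visited and len(return_addresses) < max_frames:
        visited.add(fp)
        try:
            caller_fp = read_u64(fp)
            saved_lr = read_u64(fp + 8)
        except ValueError:
            break  # chain left the readable region; stop rather than guess
        return_addresses.append(saved_lr)
        fp = caller_fp
    return return_addresses


def build_demo_stack():
    """Hypothetical stack snapshot holding three chained frame records."""
    base = 0x1000
    blob = bytearray(0x100)
    # (record address, caller's fp, saved lr) -- illustrative values only
    records = [
        (0x1010, 0x1040, 0x7FCDE944),
        (0x1040, 0x1080, 0x7FCD0000),
        (0x1080, 0x0, 0x7FFD0018),
    ]
    for addr, caller_fp, saved_lr in records:
        struct.pack_into("<QQ", blob, addr - base, caller_fp, saved_lr)
    return base, bytes(blob)


if __name__ == "__main__":
    base, blob = build_demo_stack()
    for pc in walk_frame_records(make_reader(base, blob), fp=0x1010):
        print(hex(pc))
```

Inside a gdb session the snapshot reader could presumably be swapped for one built on gdb.selected_inferior().read_memory(), with the starting fp taken from the register dump. A walk like this only trusts the chain itself, which may be why a manual walk can succeed where "bt" gives up.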

> Third, when I test your patch, I seem to experience precisely what you
> describe under CLANGDWARF -- it shows the faulting frame (the frame just
> before the exception), but nothing before it! And I'm not building with
> clang :(
>

Shame. Unfortunately, I don't have a lot of time to spend on this
right now, but it is something I have been wanting to fix forever so
hopefully I'll get back to it at some point.

