On Mon, Feb 11, 2019 at 09:27:11AM -0800, Doug Anderson wrote:
> Hi,
> 
> On Mon, Feb 4, 2019 at 4:31 AM Dave Martin <dave.mar...@arm.com> wrote:
> >
> > On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > > Hi,
> > >
> > > I was wondering if anyone out there has given any thought to
> > > annotating the ARM64 IRQ handling in such a way that we could stack
> > > crawl past el1_irq() when in gdb.
> > >
> > > I spent a bit of time on this a few months ago and documented all my
> > > findings in:
> > >
> > > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> > >
> > > I can copy and paste all the discussion from that bug here, but since
> > > it's public hopefully folks can read the discussion / investigation
> > > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > > with the normal linux stack crawl (which is what kdb uses) but I can't
> > > crawl past "el1_irq" in gdb.  After talking to some of our tools
> > > guys here I'm fairly certain that we could solve this with the right
> > > CFI directives, but when I poked at it I wasn't able to figure out the
> > > magic.
> > >
> > >
> > > Anyway, I figured I'd check to see if anyone here happens to know the
> > > right magic.
> >
> > The kernel (appears to) generate a valid frame record for el1_irq:
> >
> >    0xffffff8008082b94 <+84>:    mrs     x22, elr_el1
> >
> >         [...]
> >
> >    0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
> >    0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130
> >
> > (I note that 0x130 == 304.  Yay binutils.)
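The C below is a sketch of what those two instructions leave behind. It assumes the usual AArch64 frame record layout: the caller's x29 followed by a return address. It models the store into a byte buffer that stands in for the stack. The helper names model_el1_irq_record() and read_record() are hypothetical, not kernel functions. The point is that el1_irq puts ELR_EL1 (the interrupted PC, held in x22) where a normal frame would hold x30, so a plain frame-pointer walk continues into the interrupted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An AArch64 frame record: two 64-bit words. */
struct frame_record {
	uint64_t fp;	/* saved x29 of the interrupted context */
	uint64_t lr;	/* for el1_irq: ELR_EL1, i.e. the interrupted PC */
};

/*
 * Hypothetical model of the two quoted instructions:
 *   stp x29, x22, [sp, #304]   (x22 holds ELR_EL1)
 *   add x29, sp, #0x130
 * 'stack' stands in for the memory at sp; the return value is the new
 * x29, expressed as an offset from sp.
 */
static size_t model_el1_irq_record(uint8_t *stack, uint64_t x29, uint64_t elr)
{
	const size_t off = 304;			/* == 0x130 */

	memcpy(stack + off, &x29, sizeof(x29));	/* [sp + 304] = old x29 */
	memcpy(stack + off + 8, &elr, sizeof(elr));	/* [sp + 312] = ELR_EL1 */
	return 0x130;				/* new x29 = sp + 0x130 */
}

/* Read the frame record that x29 (an offset into 'stack') points at. */
static struct frame_record read_record(const uint8_t *stack, size_t x29)
{
	struct frame_record r;

	memcpy(&r, stack + x29, sizeof(r));
	return r;
}
```

Because x29 ends up pointing exactly at the {old x29, ELR_EL1} pair, an unwinder that only follows frame records needs no extra information to step past el1_irq.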
> 
> Right, this is how the kernel is able to do the crawl.  It's also why
> I was able to manually do the crawl in the bug by chaining together
> frame pointers.
> 
> 
> > From the bug report, I don't see any real investigation into what
> > precisely causes gdb to choke on this frame.
> 
> Right.  I just don't know gdb well enough.  :(  I've had it on my list
> to dig into it, but I need to find time.  ;-)
> 
> 
> > Do you have evidence that CFI annotations help in this case?  And can
> > you explain _why_ they help (i.e., precisely how is gdb relying on the
> > annotations)?
> 
> I spent a tiny bit of time playing around with CFI annotations.
> Mostly it was stumbling around in the dark since I had a hard time
> finding good arm/arm64 examples and the documentation was a little
> hard for me to parse.

You could try compiling a few simple C functions with gcc -S
-fexceptions and see what the compiler spits out.
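For example, a pair of functions like the following is enough to see the directives gcc emits for a frame that saves x29/x30 and some callee-saved registers. The function names are hypothetical and nothing here is kernel-specific. The exact .cfi_* output depends on compiler version and optimization level, so treat this only as a starting point for experimentation.

```c
#include <assert.h>

/* Kept out of line so that outer() really has to make calls. */
__attribute__((noinline)) int leaf(int x)
{
	return x * 3 + 1;
}

/*
 * a, b and s are still needed after a call, so the compiler will most
 * likely keep them in callee-saved registers (x19, x20, ...) and spill
 * those in the prologue along with x29/x30.  The CFI then has to say
 * where each of them was saved.
 */
int outer(int a, int b)
{
	int s = leaf(a);
	int t = leaf(b);

	return s + t + a * b;
}
```

Compiling this with something like `aarch64-linux-gnu-gcc -O1 -S -fexceptions cfi_demo.c` (the cross-compiler and file names are assumptions) should show .cfi_startproc/.cfi_endproc around each function, with .cfi_def_cfa_offset and .cfi_offset lines after the register saves in outer().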

> ...but from my experience with gdb, my guess is that gdb wants more
> than just the simple frame pointers.  It wants to know where _all_ the
> registers are stored on the stack and the only way it's going to get
> that from assembly code (especially assembly code that barfed the
> registers onto the stack somewhere that's not between FUNC and
> ENDFUNC) is with some type of annotation.  My guess is that it doesn't
> fall back to just looking at frame pointer chains.  Specifically as
> you move up the stack frame in gdb and you type "info reg", the set of
> registers changes to be those registers that are correct for the stack
> frame you're on.  Here's a quick example showing how gdb behaves with
> a random register that was barfed, $x22:
> 
> (gdb) frame 3
> #3  0xffffff800846a088 in __handle_sysrq (key=103,
> check_mask=<optimized out>) at .../drivers/tty/sysrq.c:620
> 620                             op_p->handler(key);
> 
> (gdb) disass
> Dump of assembler code for function __handle_sysrq:
>    0xffffff8008469f64 <+0>:     str     x23, [sp, #-64]!
>    0xffffff8008469f68 <+4>:     stp     x22, x21, [sp, #16]
>    0xffffff8008469f6c <+8>:     stp     x20, x19, [sp, #32]
>    0xffffff8008469f70 <+12>:    stp     x29, x30, [sp, #48]
>    0xffffff8008469f74 <+16>:    add     x29, sp, #0x30
> 
> (gdb) print /x $x22
> $13 = 0xffffff8009035000
> 
> (gdb) print /x *(void**)($x29 - 0x30 + 16)
> $14 = 0x8000100
> 
> (gdb) up
> #4  0xffffff800846a0dc in handle_sysrq (key=103) at 
> .../drivers/tty/sysrq.c:649
> 649                     __handle_sysrq(key, true);
> 
> (gdb) print /x $x22
> $15 = 0x8000100
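Working through the __handle_sysrq prologue by hand shows where the gdb expression above comes from, and what CFI would have to tell gdb. The sketch below assumes the CFA is the sp value on entry to the function and tabulates each saved register's offset from it. cfa_offset() and saved_slot_addr() are illustrative names only. The x22 slot works out to x29 - 0x30 + 16 (i.e. x29 - 32), which is the address gdb read 0x8000100 from.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offsets from the CFA (sp on entry), derived from the quoted prologue:
 *   str x23, [sp, #-64]!     -> x23 at CFA-64, sp = CFA-64
 *   stp x22, x21, [sp, #16]  -> x22 at CFA-48, x21 at CFA-40
 *   stp x20, x19, [sp, #32]  -> x20 at CFA-32, x19 at CFA-24
 *   stp x29, x30, [sp, #48]  -> x29 at CFA-16, x30 at CFA-8
 *   add x29, sp, #0x30       -> x29 = CFA-16, so CFA = x29 + 16
 * This is roughly the information .cfi_offset directives would encode.
 */
static int64_t cfa_offset(int reg)
{
	switch (reg) {
	case 23: return -64;
	case 22: return -48;
	case 21: return -40;
	case 20: return -32;
	case 19: return -24;
	case 29: return -16;
	case 30: return -8;
	default: return 0;	/* not saved by this prologue */
	}
}

/*
 * Address of the slot holding the caller's copy of 'reg', given x29
 * after the prologue.  Returns 0 for registers this prologue does not
 * save.
 */
static uint64_t saved_slot_addr(uint64_t x29, int reg)
{
	int64_t off = cfa_offset(reg);
	uint64_t cfa = x29 + 16;

	assert((x29 & 0xf) == 0);	/* sp, and hence x29 here, is 16-byte aligned */
	if (off == 0)
		return 0;
	return cfa + (uint64_t)off;	/* unsigned wraparound subtracts */
}
```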


Indeed, but this requires full DWARF or .eh_frame info, which is not
generally available in the kernel.

Except for code built with -fomit-frame-pointer, you should at least
be able to see a list of frames though: this doesn't require all the
registers of ancestor frames to be recovered, just x29 and lr (which is
what the frame records on the stack contain -- so no other magic info
is required in order to recover these).
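To illustrate why no extra metadata is needed for a bare list of frames, here is a minimal frame-pointer walker over simulated memory. It assumes each frame record is two 64-bit words {saved x29, return address} and that a NULL fp ends the chain. struct mem, read_word() and walk_frames() are hypothetical. The walker recovers only return addresses, and its only protection against loops is the max_frames cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated memory: byte addresses [base, base + size) backed by 'bytes'. */
struct mem {
	uint64_t base;
	const uint8_t *bytes;
	size_t size;
};

/* Read one aligned 64-bit word; returns 0 if the access is not valid. */
static int read_word(const struct mem *m, uint64_t addr, uint64_t *out)
{
	if (m->size < 8 || addr % 8 != 0 || addr < m->base ||
	    addr - m->base > m->size - 8)
		return 0;
	memcpy(out, m->bytes + (addr - m->base), sizeof(*out));
	return 1;
}

/*
 * Follow frame records starting at 'fp', storing each return address
 * in 'pcs'.  Stops at a NULL fp, an unreadable record, or max_frames.
 */
static size_t walk_frames(const struct mem *m, uint64_t fp,
			  uint64_t *pcs, size_t max_frames)
{
	size_t n = 0;

	while (fp != 0 && n < max_frames) {
		uint64_t next_fp, lr;

		if (!read_word(m, fp, &next_fp) || !read_word(m, fp + 8, &lr))
			break;
		pcs[n++] = lr;
		fp = next_fp;
	}
	return n;
}
```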

gdb tries various methods to unwind a frame, and ought to fall back to
this approach if all else fails.  Frame chains that appear to loop
are a problem though, with no straightforward solution.

My hunch is that gdb sees the frame chain attempt to loop backwards
after el1_irq and bails out.  Is your task stack at a lower address than
the IRQ stack?
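A plausible version of that failure mode can be sketched as follows. Assume an unwinder insists that each caller's frame record sits at a strictly higher address than its callee's, which is true within one descending stack. Once the chain jumps from the IRQ stack down to a task stack at lower addresses, the check fires and the walk stops right after el1_irq. naive_accepted() and the addresses in the usage note are hypothetical, and this is not gdb's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Naive loop guard: any fp that does not increase is treated as a loop
 * and the walk stops.  Returns how many entries of 'fps' would be
 * accepted before bailing out.
 */
static size_t naive_accepted(const uint64_t *fps, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (fps[i] <= fps[i - 1])
			break;	/* "loop" detected: give up here */
	return i;
}
```

With IRQ-stack frames at 0x9000, 0x9040 and 0x90a0 followed by task-stack frames at 0x3f00 and 0x3f80, only the first three are accepted, even though the chain is perfectly valid.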

In the kernel we gave up attempting to fully detect backtrace loops
because of this issue, but this involves some cruddy heuristics which
may not be considered acceptable for gdb.  For one thing, our rules are
specific to the kernel, not general-purpose.
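One stack-aware alternative is shown here purely as an illustration; it is not the kernel's actual unwinder. It requires fp to increase only while the walk stays on the same stack. Switching stacks is allowed, but the walk may never return to a stack it has already left, so it still terminates. The stack ranges, MAX_STACKS and the function names are assumptions. Note that the rule depends on knowing where every stack lives, which is part of what makes such heuristics kernel-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STACKS 4	/* hypothetical limit on known stacks */

struct stack_range {
	uint64_t low, high;	/* [low, high) */
};

/* Index of the stack containing fp, or -1 if it is on none of them. */
static int which_stack(const struct stack_range *s, size_t ns, uint64_t fp)
{
	for (size_t i = 0; i < ns; i++)
		if (fp >= s[i].low && fp < s[i].high)
			return (int)i;
	return -1;
}

/* Returns how many entries of 'fps' the stack-aware rule accepts. */
static size_t stack_aware_accepted(const struct stack_range *s, size_t ns,
				   const uint64_t *fps, size_t n)
{
	bool done[MAX_STACKS] = { false };
	int cur = -1;
	size_t i;

	assert(ns <= MAX_STACKS);
	for (i = 0; i < n; i++) {
		int st = which_stack(s, ns, fps[i]);

		if (st < 0 || done[st])
			break;		/* off every known stack, or revisiting one */
		if (st == cur && fps[i] <= fps[i - 1])
			break;		/* went backwards within the same stack */
		if (st != cur && cur >= 0)
			done[cur] = true;	/* leaving 'cur' for good */
		cur = st;
	}
	return i;
}
```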

Cheers
---Dave


_______________________________________________
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
