I think we could describe five levels of eh_frame information:
1 unwind instructions at exception throw locations & locations where a callee
may throw an exception
2 unwind instructions that describe the prologue
3 unwind instructions that describe the epilogue at the end of the function
4 unwind instructions that describe mid-function epilogues (I see these on arm
all the time but don't see them on x86 with compiler-generated code; we don't
use eh_frame on arm at Apple, so I'm just mentioning it for completeness)
5 unwind instructions that describe any changes mid-function needed to unwind
at all instructions ("asynchronous unwind information")
The eh_frame section only guarantees #1. gcc and clang always do #1 and #2.
Modern gcc versions do #3. I don't know whether gcc would do #4 on arm, but it
isn't important here. And as far as I know, no one does #5, even in the DWARF
debug_frame section.
I think it may be possible to detect whether an eh_frame entry fulfills #3 by
checking whether the CFA definition on the last row is the same as the initial
CFA definition. But I'm not sure how a debugger could use heuristics to
determine much else.
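That heuristic could be sketched like this. The `UnwindRow` and `CFARule` types are hypothetical simplifications that keep only the CFA rule (they are not lldb's UnwindPlan classes), and a "false" result means "unknown" rather than "no epilogue".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one row of an unwind plan: from `offset`
// (relative to the function start) onward, CFA = register `cfa_reg` + `cfa_offset`.
struct CFARule {
  int cfa_reg;         // DWARF register number (x86_64: 6 = rbp, 7 = rsp)
  int64_t cfa_offset;  // byte offset added to cfa_reg
  bool operator==(const CFARule &o) const {
    return cfa_reg == o.cfa_reg && cfa_offset == o.cfa_offset;
  }
};

struct UnwindRow {
  uint64_t offset;  // first function offset this row applies to
  CFARule cfa;
};

// Heuristic for level #3: if the last row restores the initial CFA definition,
// the epilogue is probably described. A function that never restores its stack
// (e.g. one that does not return) gives a false negative, so `false` only
// means "cannot tell".
bool DescribesEpilogue(const std::vector<UnwindRow> &rows) {
  if (rows.size() < 2)
    return false;  // a single row says nothing either way
  return rows.back().cfa == rows.front().cfa;
}
```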
In fact, #3 may be the easiest level to detect. I'm not sure the debugger could
really detect #2, except perhaps in one case: if the function has a standard
prologue (push %rbp; mov %rsp, %rbp) and the eh_frame doesn't describe the
effects of those instructions, the debugger can know that the eh_frame does not
describe the prologue.
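A rough version of that #2 check could compare the function's first bytes against the standard prologue encoding and then look for the rows that describing it would require. The helper names are hypothetical, and the check is coarse: it only looks at where rows begin, not at what they say.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standard x86_64 prologue:
//   55        push %rbp
//   48 89 e5  mov  %rsp, %rbp
bool HasStandardPrologue(const std::vector<uint8_t> &bytes) {
  return bytes.size() >= 4 && bytes[0] == 0x55 && bytes[1] == 0x48 &&
         bytes[2] == 0x89 && bytes[3] == 0xe5;
}

// `row_offsets` are the function offsets at which the FDE starts a new row.
// An FDE that describes the standard prologue needs rows starting right after
// the push (offset 1) and right after the mov (offset 4). Returns false when
// there is no standard prologue, because then this heuristic cannot tell.
bool EhFrameMissesPrologue(const std::vector<uint8_t> &bytes,
                           const std::vector<uint64_t> &row_offsets) {
  if (!HasStandardPrologue(bytes))
    return false;
  bool after_push = false;
  bool after_mov = false;
  for (uint64_t off : row_offsets) {
    if (off == 1) after_push = true;
    if (off == 4) after_mov = true;
  }
  return !(after_push && after_mov);
}
```

Both FDEs in the dwarfdump output further down this thread start rows at offsets 0, 1 and 4, so by this check they describe the prologue.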
> On Jul 30, 2014, at 6:58 PM, Tong Shen <[email protected]> wrote:
>
> Ah I understand now.
>
> The prologue now seems to always be included in the CFI from gcc & clang, and
> newer gcc includes the epilogue as well.
> Maybe we can detect and use them when they are available?
>
>
> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <[email protected]> wrote:
> Ah, it looks like gcc changed since I last looked at its eh_frame output.
>
> It's not a bug -- the eh_frame unwind instructions only need to be accurate
> at instructions where an exception can be thrown, or where a callee function
> can throw an exception. There's no requirement to include prologue or
> epilogue instructions in the eh_frame.
>
> And unfortunately from lldb's perspective, when we see eh_frame we'll never
> know how descriptive it is. If it's old-gcc or clang, it won't include
> epilogue instructions. If it's from another compiler, it may not include any
> prologue/epilogue instructions at all.
>
> Maybe we could look over the UnwindPlan rows and see if the CFA definition of
> the last row matches the initial row's CFA definition. That would show that
> the epilogue is described, unless it is a tail-call (aka noreturn) function,
> in which case the stack is never restored.
>
>
>
>
> > On Jul 30, 2014, at 6:32 PM, Tong Shen <[email protected]> wrote:
> >
> > GCC seems to generate a row for epilogue.
> > Do you think this is a clang bug, or at least a discrepancy between clang &
> > gcc?
> >
> > Source:
> > #include <stdio.h>
> >
> > int f() {
> >   puts("HI\n");
> >   return 5;
> > }
> >
> > Compile option: only -g
> >
> > gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
> > clang version 3.5.0 (213114)
> >
> > Env: Ubuntu 14.04, x86_64
> >
> > dwarfdump -F of clang binary:
> > < 2><0x00400530:0x00400559><f><fde offset 0x00000088 length: 0x0000001c><eh aug data len 0x0>
> > 0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
> > 0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
> > 0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
> >
> > dwarfdump -F of gcc binary:
> > < 1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: 0x0000001c><eh aug data len 0x0>
> > 0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
> > 0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
> > 0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
> > 0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
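To see what each table says at a given pc, a row lookup can be sketched as below. `Row` is a hypothetical simplification that keeps only the CFA rule; in the DWARF x86_64 register numbering, r7 is rsp, r6 is rbp and r16 is the return address.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the dwarfdump -F output, reduced to its CFA rule.
struct Row {
  uint64_t addr;  // first address this row applies to
  int cfa_reg;    // 6 = rbp, 7 = rsp
  int cfa_offset;
};

// A row applies from its address up to (not including) the next row's address,
// so the rule for `pc` is the last row whose addr <= pc.
// Returns nullptr if pc precedes the first row.
const Row *RowForPC(const std::vector<Row> &rows, uint64_t pc) {
  const Row *match = nullptr;
  for (const Row &r : rows) {
    if (r.addr > pc)
      break;
    match = &r;
  }
  return match;
}

std::string Describe(const Row &r) {
  const char *reg = r.cfa_reg == 7 ? "rsp" : r.cfa_reg == 6 ? "rbp" : "r?";
  return std::string(reg) + "+" + std::to_string(r.cfa_offset);
}
```

Looking up 0x400541 in gcc's rows gives rsp+8. In clang's rows, every pc from 0x400534 onward gives rbp+16, including the function's last byte, 0x400558 (assumed here to be its `ret`; the dump itself doesn't show the instructions).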
> >
> >
> > On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <[email protected]> wrote:
> > I'm open to trying to trust eh_frame at frame 0 for x86_64. The lack of
> > epilogue descriptions in eh_frame is the biggest problem here.
> >
> > When you "step" or "next" in the debugger, the debugger instruction steps
> > across the source line until it gets to the next source line. Every time
> > it stops after an instruction step, it confirms (1) that it is between the
> > start and end pc values for the source line, and (2) that the "stack id"
> > (start address of the function + CFA address) is unchanged. If it stops and
> > the stack id has changed, for a "next" command, it will backtrace one stack
> > frame to see if it stepped into a function. If so, it sets a breakpoint on
> > the return address and continues.
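The per-stop decision described above can be modelled roughly as follows. `StackID`, `StepAction` and `DecideAfterInstructionStep` are hypothetical names for illustration, not lldb's classes, and the logic covers only the cases described in this message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "stack id": function start address plus CFA.
struct StackID {
  uint64_t func_start;
  uint64_t cfa;
  bool operator==(const StackID &o) const {
    return func_start == o.func_start && cfa == o.cfa;
  }
};

enum class StepAction { KeepStepping, StepOutOfCallee, Stop };

// Called after each instruction step of a simplified "next".
// `start` is the stack id when the step began, `now` is frame 0's stack id at
// this stop, and `parent` is frame 1's stack id (one frame up the backtrace).
StepAction DecideAfterInstructionStep(uint64_t pc, uint64_t line_start,
                                      uint64_t line_end, const StackID &start,
                                      const StackID &now,
                                      const StackID &parent) {
  if (now == start)
    return (pc >= line_start && pc < line_end) ? StepAction::KeepStepping
                                               : StepAction::Stop;
  // The stack id changed. If frame 1 is where we started, we stepped into a
  // callee: set a breakpoint on the return address and continue.
  if (parent == start)
    return StepAction::StepOutOfCallee;
  return StepAction::Stop;
}
```

Because the decision depends on the CFA, a wrong CFA at a single instruction is enough to end the step early.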
> >
> > If you switch lldb to prefer eh_frame instructions for x86_64, e.g.
> >
> > Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
> > ===================================================================
> > --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp (revision 214344)
> > +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp (working copy)
> > @@ -791,6 +791,22 @@
> > }
> > }
> >
> > + // For x86_64 debugging, let's try using the eh_frame instructions even if this is the currently
> > + // executing function (frame zero).
> > + Target *target = exe_ctx.GetTargetPtr();
> > + if (target
> > + && (target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64h
> > + || target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64))
> > + {
> > + unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite (m_current_offset_backed_up_one);
> > + int valid_offset = -1;
> > + if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, valid_offset))
> > + {
> > + UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, preferred over assembly profiling on x86_64", unwind_plan_sp->GetSourceName().GetCString());
> > + return unwind_plan_sp;
> > + }
> > + }
> > +
> > // Typically the NonCallSite UnwindPlan is the unwind created by inspecting the assembly language instructions
> > if (behaves_like_zeroth_frame)
> > {
> >
> >
> > you'll find that you have to "next" twice to step out of a function. Why?
> > With a simple function like:
> >
> > * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at a.c:5, queue = 'com.apple.main-thread', stop reason = step over
> > #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
> > 2 int foo ()
> > 3 {
> > 4 puts("HI");
> > -> 5 return 5;
> > 6 }
> > 7
> > 8 int bar ()
> > (lldb) disass
> > a.out`foo at a.c:3:
> > 0x100000ea0: pushq %rbp
> > 0x100000ea1: movq %rsp, %rbp
> > 0x100000ea4: subq $0x10, %rsp
> > 0x100000ea8: leaq 0x6b(%rip), %rdi ; "HI"
> > 0x100000eaf: callq 0x100000efa ; symbol stub for: puts
> > 0x100000eb4: movl $0x5, %ecx
> > -> 0x100000eb9: movl %eax, -0x4(%rbp)
> > 0x100000ebc: movl %ecx, %eax
> > 0x100000ebe: addq $0x10, %rsp
> > 0x100000ec2: popq %rbp
> > 0x100000ec3: retq
> >
> >
> > If you do "next", lldb will instruction-step, comparing the stack ID at
> > every stop, until it gets to 0x100000ec3, at which point the stack ID will
> > change. The eh_frame tells us the CFA address is rbp+16, but after the
> > popq %rbp, rbp holds the caller's frame pointer, so rbp+16 now evaluates to
> > the caller's CFA address because we're about to return. The eh_frame
> > instructions really need to tell us that the CFA is now rsp+8 at
> > 0x100000ec3.
> >
> > The end result is that you need to "next" twice to step out of a function.
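The arithmetic behind this can be checked with a small worked example. The register values (rsp at entry and the caller's rbp) are made up, and only the instructions that affect rbp and rsp at the `ret` are modelled (the `subq`/`addq $0x10` pair cancels out).

```cpp
#include <cassert>
#include <cstdint>

struct Regs {
  uint64_t rsp;
  uint64_t rbp;
};

// pushq %rbp; movq %rsp, %rbp
// On entry rsp points at the return address, so the true CFA is entry rsp + 8.
Regs AfterPushMov(Regs r) {
  r.rsp -= 8;
  r.rbp = r.rsp;
  return r;
}

// popq %rbp: reload the caller's rbp from the stack.
Regs AfterPopRbp(Regs r, uint64_t saved_rbp) {
  r.rbp = saved_rbp;
  r.rsp += 8;
  return r;
}

uint64_t CfaFromRbp(const Regs &r) { return r.rbp + 16; }  // eh_frame's rule
uint64_t CfaFromRsp(const Regs &r) { return r.rsp + 8; }   // correct at 'ret'
```

In the function body both rules agree. At the `ret`, the rsp+8 rule still yields entry rsp + 8, while the rbp+16 rule yields the caller's rbp + 16, which is why the stack ID appears to change.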
> >
> > AssemblyParse_x86 has a special bit where it looks for the 'ret' instruction
> > sequence at the end of the function:
> >
> > // Now look at the byte at the end of the AddressRange for a limited attempt at describing the
> > // epilogue. We're looking for the sequence
> >
> > // [ 0x5d ] pop %rbp
> > // [ 0xc3 ] ret
> > // [ 0xe8 xx xx xx xx ] call __stack_chk_fail (this is sometimes the final insn in the function)
> >
> > // We want to add a Row describing how to unwind when we're stopped on the 'ret' instruction where the
> > // CFA is no longer defined in terms of rbp, but is now defined in terms of rsp like on function entry.
> >
> >
> > and adds an extra row of unwind details for that instruction.
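A byte-level version of that check might look like the following. `FindFinalRet` is a hypothetical helper written for illustration, not the AssemblyParse_x86 implementation; it assumes the bytes passed in are exactly the function's address range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Return the offset of a trailing `ret` (0xc3) that directly follows
// `pop %rbp` (0x5d). The pair may itself be followed by a 5-byte
// `call rel32` (0xe8 xx xx xx xx), e.g. a call to __stack_chk_fail.
// Plain byte matching can misfire on data or immediates; it is only a sketch.
std::optional<size_t> FindFinalRet(const std::vector<uint8_t> &bytes) {
  size_t end = bytes.size();
  // Skip a trailing `call rel32` if the byte 5 from the end is its opcode.
  if (end >= 7 && bytes[end - 5] == 0xe8)
    end -= 5;
  if (end >= 2 && bytes[end - 2] == 0x5d && bytes[end - 1] == 0xc3)
    return end - 1;
  return std::nullopt;
}
```

When a match is found, the extra row at that offset would say CFA = rsp+8, as on function entry.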
> >
> >
> > I mention x86_64 as a good candidate for trying this because I worry
> > about the i386 picbase sequence (call next-instruction; pop %ebx), which
> > occurs a lot. But for x86_64, my main concern is the epilogues.
> >
> >
> >
> > > On Jul 30, 2014, at 2:52 PM, Tong Shen <[email protected]> wrote:
> > >
> > > Thanks Jason! That's a very informative post; it clarifies things a lot :-)
> > >
> > > Well, I have to admit that my patch targets a specific kind of
> > > function, and now I see that's not the general case.
> > >
> > > I did some experiments with gdb. gdb uses CFI for frame 0 on both x86 and
> > > x86_64. It looks up the FDE for frame 0 and does the CFA calculations
> > > according to it.
> > >
> > > - For compiler-generated functions: I think there are 2 usage scenarios
> > > for frame 0: breakpoints and signals.
> > > - Breakpoints are usually at source line boundaries rather than
> > > arbitrary instruction boundaries, so generally we won't be stopped at
> > > instructions that change the stack pointer, and the CFI is still valid.
> > > - For signals, a synchronous unwind table may not be sufficient.
> > > But only instructions that change the stack cause incorrect CFA
> > > calculations, so it's not always a problem.
> > > - For hand-written assembly functions: from what I've seen, most of the
> > > time CFI is present and actually asynchronous.
> > > So it seems that in most cases, even with only a synchronous unwind table,
> > > the CFI is still correct.
> > >
> > > I believe we can trust eh_frame for frame 0 and use assembly profiling as
> > > a fallback. If both fail, maybe the code owner should build with
> > > -fasynchronous-unwind-tables :-)
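The proposed frame-0 policy (eh_frame first, assembly profiling as a fallback) can be sketched as below. `UnwindPlan`, `PlanSource` and `ChoosePlanForFrameZero` are hypothetical stand-ins, not lldb's API; "the source returns a plan for this pc" loosely plays the role of the `IsUnwindPlanValidForCurrentPC` check in the patch earlier in this thread.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

struct UnwindPlan {
  std::string source;  // e.g. "eh_frame" or "assembly profiling"
};

// A plan source returns a plan if it can describe the given pc.
using PlanSource = std::function<std::optional<UnwindPlan>(unsigned long pc)>;

std::optional<UnwindPlan> ChoosePlanForFrameZero(unsigned long pc,
                                                 const PlanSource &eh_frame,
                                                 const PlanSource &assembly) {
  if (auto plan = eh_frame(pc))
    return plan;  // trust eh_frame when it covers this pc
  return assembly(pc);  // may also be empty, in which case nothing applies
}
```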
> > >
> > >
> > > On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <[email protected]> wrote:
> > > It was a tricky one and got lost in the shuffle of a busy week. I was
> > > always reluctant to try profiling all the instructions in a function. On
> > > x86, compiler generated code (gcc/clang anyway) is very simplistic about
> > > setting up the stack frame at the start and only having one epilogue - so
> > > anything fancier risked making mistakes and could possibly have a
> > > performance impact as we run functions through the disassembler.
> > >
> > > For hand-written assembly functions (which can be very creative with
> > > their prologue/epilogue and where it is placed), my position is that they
> > > should write eh_frame instructions in their assembly source to tell lldb
> > > where to find things. There are one or two libraries on Mac OS X where we
> > > break the "ignore eh_frame for the currently executing function" rule
> > > because there are many hand-written assembly functions in there and the
> > > eh_frame is going to beat our own analysis.
> > >
> > >
> > > After I wrote the x86 unwinder, Greg and Caroline implemented the arm
> > > unwinder where it emulates every instruction in the function looking for
> > > prologue/epilogue instructions. We haven't seen it having a particularly
> > > bad impact performance-wise (lldb only does this disassembly for
> > > functions that it finds on stacks during an execution run, and it saves
> > > the result so it won't re-compute it for a given function). The clang
> > > armv7 codegen often has mid-function epilogues (early returns) which
> > > definitely complicated things and made it necessary to step through the
> > > entire function bodies. There's a bunch of code I added to support these
> > > mid-function epilogues: I have to save the register state when I
> > > see an instruction that looks like an epilogue, and when I see the final
> > > ret instruction (i.e. restoring the saved lr contents into pc), I
> > > re-install the register state from before the epilogue started.
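The save/re-install idea can be illustrated with a toy model that tracks only the CFA offset from sp. The instruction kinds and offsets are invented for illustration; lldb's arm unwinder emulates real instructions and tracks far more state.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical instruction kinds; each records the CFA offset
// (CFA = sp + offset) after it executes.
enum class Kind { Other, EpilogueStart, Ret };

struct Insn {
  Kind kind;
  int cfa_offset_after;
};

// Returns the CFA offset in effect at each instruction, before it executes.
std::vector<int> ProfileCfaOffsets(const std::vector<Insn> &insns,
                                   int entry_offset) {
  std::vector<int> at;
  int cur = entry_offset;
  std::optional<int> saved;  // state captured when an epilogue begins
  for (const Insn &insn : insns) {
    at.push_back(cur);
    if (insn.kind == Kind::EpilogueStart && !saved)
      saved = cur;  // remember the state the function body was in
    if (insn.kind == Kind::Ret && saved) {
      // Code after a mid-function 'ret' can only be reached by a branch from
      // the body, so re-install the state from before the epilogue.
      cur = *saved;
      saved.reset();
      continue;
    }
    cur = insn.cfa_offset_after;
  }
  return at;
}
```

With an early return in the middle of a function, the instruction after the first `ret` gets the body's offset back instead of the post-epilogue one.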
> > >
> > > These things always make me a little nervous because the instruction
> > > analyzer obviously is doing a static analysis so it knows nothing about
> > > flow control. Tong's patch stops when it sees the first CALL instruction
> > > - but that's not right, that's just solving the problem for his
> > > particular function which doesn't have any CALL instructions before his
> > > prologue. :) You could imagine a function which saves a couple of
> > > registers, calls another function, then saves a couple more because it
> > > needs more scratch registers.
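The concern about stopping at the first CALL can be shown with a toy instruction list. `Insn` and `SavedRegisters` are hypothetical and do no real decoding; they only illustrate that registers pushed after a call are missed when the scan stops at the call.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Insn {
  std::string op;   // "push", "call", "ret", or anything else
  std::string reg;  // register operand for "push"
};

// Collect registers saved by `push`. With stop_at_call, mimic a profiler that
// stops scanning at the first call instruction; otherwise scan up to 'ret'.
std::set<std::string> SavedRegisters(const std::vector<Insn> &insns,
                                     bool stop_at_call) {
  std::set<std::string> saved;
  for (const Insn &i : insns) {
    if (i.op == "ret" || (stop_at_call && i.op == "call"))
      break;
    if (i.op == "push")
      saved.insert(i.reg);
  }
  return saved;
}
```

For a function that pushes rbp and rbx, calls something, then pushes r12 and r13, stopping at the call finds only rbp and rbx.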
> > >
> > > If we're going to change to profiling deep into the function -- and I'm
> > > not opposed to doing that, it's been fine on arm -- we should just do the
> > > entire function I think.
> > >
> > >
> > > Another alternative would be to trust eh_frame on x86_64 at frame 0.
> > > This is one of those things where there's not a great solution. The
> > > unwind instructions in eh_frame are only guaranteed to be accurate for
> > > synchronous unwinds -- that is, they are only guaranteed to be accurate
> > > at places where an exception could be thrown, i.e. at call sites. So, for
> > > instance, there's no reason why the compiler has to describe the
> > > function prologue instructions at all. There's no requirement that the
> > > eh_frame instructions describe the epilogue instructions. The
> > > information about spilled registers only needs to be emitted where we
> > > could throw an exception, or where a callee could throw an exception.
> > >
> > > clang/gcc both emit detailed instructions for the prologue setup. But
> > > for i386 codegen if the compiler needs to access some pc-relative data,
> > > it will do a "call next-instruction; pop %eax" to get the current pc
> > > value. (x86_64 has rip-relative addressing, so this isn't needed.) If
> > > you're debugging -fomit-frame-pointer code, that means your CFA is
> > > expressed in terms of the stack pointer and the stack pointer just
> > > changed mid-function --- and eh_frame instructions don't describe this.
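For reference, the i386 "get pc" sequence can be spotted by its encoding: `call` with a zero rel32 displacement (e8 00 00 00 00) followed by a one-byte `pop` (0x58 through 0x5f). This is a hypothetical byte-scanning sketch, not how lldb or gdb handle it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Find offsets of `call next-instruction; pop %reg` sequences. At the pop,
// esp is 4 bytes lower than before the call (the pushed return address), so an
// esp-relative CFA rule that ignores the sequence is off by 4 there.
// Byte scanning can match inside other instructions; a real profiler would
// decode instruction boundaries first.
std::vector<size_t> FindPicBaseSequences(const std::vector<uint8_t> &code) {
  std::vector<size_t> hits;
  for (size_t i = 0; i + 6 <= code.size(); ++i) {
    bool call_next = code[i] == 0xe8 && code[i + 1] == 0 && code[i + 2] == 0 &&
                     code[i + 3] == 0 && code[i + 4] == 0;
    bool pop_reg = code[i + 5] >= 0x58 && code[i + 5] <= 0x5f;
    if (call_next && pop_reg)
      hits.push_back(i);
  }
  return hits;
}
```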
> > >
> > > The end result: If you want accurate unwinds 100% of the time, you can't
> > > rely on the unwind instructions from eh_frame. But they'll get you
> > > accurate unwinds 99.9% of the time ... also, last I checked, neither
> > > clang nor gcc describe the epilogue instructions.
> > >
> > >
> > > In *theory* the unwind instructions from the DWARF debug_frame section
> > > should be asynchronous -- they should describe how to find the CFA
> > > address for every instruction in the function. Which makes sense - you
> > > want eh_frame to be compact because it's bundled into the executable, so
> > > it should only have the information necessary for exception handling and
> > > you can put the verbose stuff in debug_frame DWARF for debuggers. But
> > > instead (again, last time I checked), the compilers put the exact same
> > > thing in debug_frame even if you use the -fasynchronous-unwind-tables (or
> > > whatever that switch was) option.
> > >
> > >
> > > So I don't know, maybe we should just start trusting eh_frame at frame 0
> > > and write off those .1% cases where it isn't correct instead of trying to
> > > get too fancy with the assembly analysis code.
> > >
> > >
> > >
> > > > On Jul 29, 2014, at 4:17 PM, Todd Fiala <[email protected]> wrote:
> > > >
> > > > Hey Jason,
> > > >
> > > > Do you have any feedback on this?
> > > >
> > > > Thanks!
> > > >
> > > > -Todd
> > > >
> > > >
> > > > On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <[email protected]>
> > > > wrote:
> > > > Sorry, wrong version of patch...
> > > >
> > > >
> > > > On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <[email protected]>
> > > > wrote:
> > > > Hi Molenda, lldb-commits,
> > > >
> > > > Currently, the x86 assembly profiler stops after 10 "non-prologue"
> > > > instructions. In practice this may not be sufficient. For example, we
> > > > have a hand-written assembly function that has hundreds of
> > > > instructions before the actual (stack-adjusting) prologue instructions.
> > > >
> > > > One way is to raise the limit to 1000, but there will always be
> > > > functions that break the limit :-) I believe the right thing to do here
> > > > is to parse all instructions before "ret"/"call" as prologue
> > > > instructions.
> > > >
> > > > Here's what I changed:
> > > > - For "push %rbx" and "mov %rbx, -8(%rbp)": only add the first row for
> > > > that register. These may appear multiple times in the function body, but
> > > > as long as one of them appears, the first appearance should be in the
> > > > prologue (if %rbx isn't saved in the prologue, this function doesn't use
> > > > %rbx, so these 2 instructions should not appear at all).
> > > > - Also monitor "add %rsp 0x20".
> > > > - Remove the non-prologue instruction count.
> > > > - Add "call" instruction detection, and stop parsing after it.
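The first bullet's rule could be sketched as a table that keeps only the first spill location per register. `SavedRegisterTable` is a hypothetical name, the offsets are illustrative, and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

class SavedRegisterTable {
 public:
  // Returns true if this spill was recorded, i.e. it is the first one seen.
  // std::map::emplace does not overwrite an existing key.
  bool RecordSpill(const std::string &reg, int64_t cfa_offset) {
    return first_save_.emplace(reg, cfa_offset).second;
  }

  // CFA-relative offset of the first save of `reg`, if any.
  std::optional<int64_t> SaveLocation(const std::string &reg) const {
    auto it = first_save_.find(reg);
    if (it == first_save_.end())
      return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, int64_t> first_save_;
};
```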
> > > >
> > > > Thanks.
> > > >
> > > > --
> > > > Best Regards, Tong Shen
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Tong Shen
> > > >
> > > > _______________________________________________
> > > > lldb-commits mailing list
> > > > [email protected]
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Fiala | Software Engineer | [email protected] |
> > > > 650-943-3180
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Tong Shen
> >
> >
> >
> >
> > --
> > Best Regards, Tong Shen
>
>
>
>
> --
> Best Regards, Tong Shen