Re: [Lldb-commits] [PATCH] Profile Assembly Until Ret Instruction

Jason Molenda Wed, 30 Jul 2014 17:45:14 -0700

I'm open to trying to trust eh_frame at frame 0 for x86_64.  The lack of 
epilogue descriptions in eh_frame is the biggest problem here.


When you "step" or "next" in the debugger, the debugger instruction-steps
across the source line until it gets to the next source line.  Every time it
stops after an instruction step, it confirms (1) that the pc is still between
the start and end pc values for the source line, and (2) that the "stack id"
(start address of the function + CFA address) is unchanged.  If it stops and
the stack id has changed, for a "next" command, it will backtrace one stack
frame to see if it stepped into a function.  If so, it sets a breakpoint on
the return address and continues.
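The per-stop decision described above can be sketched roughly as follows. This is a hypothetical, simplified model: StackIDSketch, StepAction and DecideAtStop are made-up names, not lldb's thread plan classes. It only encodes the two checks and the one-frame backtrace from the paragraph.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "stack id": function start address + CFA.
struct StackIDSketch {
    uint64_t func_start;
    uint64_t cfa;
    bool operator==(const StackIDSketch &other) const {
        return func_start == other.func_start && cfa == other.cfa;
    }
};

enum class StepAction { KeepStepping, RunToReturnAddress, Stop };

// What a "next" does after one instruction step.
//   pc              - pc after the step
//   current         - stack id of frame 0 after the step
//   caller          - stack id of frame 1 (the one-frame backtrace)
//   line_lo/line_hi - [start, end) pc range of the source line
//   start           - stack id recorded when the "next" began
StepAction DecideAtStop(uint64_t pc, const StackIDSketch &current,
                        const StackIDSketch &caller, uint64_t line_lo,
                        uint64_t line_hi, const StackIDSketch &start) {
    assert(line_lo <= line_hi);
    if (current == start) {
        // Same frame: keep going while we are still inside the source line.
        bool in_line = pc >= line_lo && pc < line_hi;
        return in_line ? StepAction::KeepStepping : StepAction::Stop;
    }
    if (caller == start) {
        // One frame up is where we started, so we stepped into a function:
        // set a breakpoint on the return address and continue.
        return StepAction::RunToReturnAddress;
    }
    // The stack id changed for some other reason: stop stepping.
    return StepAction::Stop;
}
```

Note that a single wrong CFA value at one instruction is enough to make `current == start` fail, which ends the step at that instruction.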

If you switch lldb to prefer eh_frame instructions for x86_64, e.g.

Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
===================================================================
--- source/Plugins/Process/Utility/RegisterContextLLDB.cpp      (revision 214344)
+++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp      (working copy)
@@ -791,6 +791,22 @@
         }
     }
 
+    // For x86_64 debugging, let's try using the eh_frame instructions even if this is the currently
+    // executing function (frame zero).
+    Target *target = exe_ctx.GetTargetPtr();
+    if (target
+        && (target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64h
+            || target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64))
+    {
+        unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite (m_current_offset_backed_up_one);
+        int valid_offset = -1;
+        if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, valid_offset))
+        {
+            UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, preferred over assembly profiling on x86_64", unwind_plan_sp->GetSourceName().GetCString());
+            return unwind_plan_sp;
+        }
+    }
+
     // Typically the NonCallSite UnwindPlan is the unwind created by inspecting the assembly language instructions
     if (behaves_like_zeroth_frame)
     {


you'll find that you have to "next" twice to step out of a function.  Why?  
With a simple function like:

* thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at a.c:5, queue = 'com.apple.main-thread', stop reason = step over
    #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
   2    int foo ()
   3    {
   4        puts("HI");
-> 5        return 5;
   6    }
   7    
   8    int bar ()
(lldb) disass
a.out`foo at a.c:3:
   0x100000ea0:  pushq  %rbp
   0x100000ea1:  movq   %rsp, %rbp
   0x100000ea4:  subq   $0x10, %rsp
   0x100000ea8:  leaq   0x6b(%rip), %rdi          ; "HI"
   0x100000eaf:  callq  0x100000efa               ; symbol stub for: puts
   0x100000eb4:  movl   $0x5, %ecx
-> 0x100000eb9:  movl   %eax, -0x4(%rbp)
   0x100000ebc:  movl   %ecx, %eax
   0x100000ebe:  addq   $0x10, %rsp
   0x100000ec2:  popq   %rbp
   0x100000ec3:  retq   


If you do "next", lldb will instruction step, comparing the stack ID at every
stop, until it gets to 0x100000ec3, at which point the stack ID will change.
The CFA address (which the eh_frame tells us is rbp+16) just changed to the
caller's CFA address because we're about to return: the popq at 0x100000ec2
already restored the caller's rbp, so rbp+16 now computes the caller's CFA.
The eh_frame instructions really need to tell us that the CFA is now rsp+8 at
0x100000ec3.
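To make the arithmetic concrete, here is a hypothetical model of foo's unwind rows, with offsets relative to 0x100000ea0. RowSketch, CFABase and ComputeCFA are illustrative names, not lldb's UnwindPlan API, and the register values in any usage are made up. It assumes the caller also keeps a frame pointer, so the caller's rbp+16 is the caller's CFA.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class CFABase { RSP, RBP };

// One unwind row: from start_offset onward, CFA = base register + offset.
struct RowSketch {
    uint64_t start_offset;
    CFABase base;
    int64_t offset;
};

struct RegsSketch {
    uint64_t rsp;
    uint64_t rbp;
};

// Evaluate the CFA with the last row whose start_offset <= func_offset.
// Rows must be sorted by start_offset, and the first row must start at 0.
uint64_t ComputeCFA(const std::vector<RowSketch> &rows, uint64_t func_offset,
                    const RegsSketch &regs) {
    assert(!rows.empty() && rows.front().start_offset == 0);
    const RowSketch *active = &rows.front();
    for (const RowSketch &row : rows)
        if (row.start_offset <= func_offset)
            active = &row;
    uint64_t base = (active->base == CFABase::RSP) ? regs.rsp : regs.rbp;
    return base + static_cast<uint64_t>(active->offset);
}

// foo as eh_frame describes it: no row for the epilogue.
const std::vector<RowSketch> kEhFrameRows = {
    {0x0, CFABase::RSP, 8},   // 0x100000ea0: CFA = rsp+8
    {0x1, CFABase::RSP, 16},  // 0x100000ea1: CFA = rsp+16 after pushq %rbp
    {0x4, CFABase::RBP, 16},  // 0x100000ea4: CFA = rbp+16 after movq
};

// The same rows plus one extra row for the retq.
const std::vector<RowSketch> kWithEpilogueRow = {
    {0x0, CFABase::RSP, 8},
    {0x1, CFABase::RSP, 16},
    {0x4, CFABase::RBP, 16},
    {0x23, CFABase::RSP, 8},  // 0x100000ec3: CFA = rsp+8 again
};
```

With an entry rsp of 0x1000 and a caller rbp of 0x1100, both row sets give 0x1008 in the body, but at the retq the eh_frame rows give 0x1110 (the caller's CFA) while the extra row gives 0x1008.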

The end result is that you need to "next" twice to step out of a function.  

AssemblyParse_x86 has a special bit where it looks for the 'ret' instruction
sequence at the end of the function -

    // Now look at the byte at the end of the AddressRange for a limited attempt at describing the
    // epilogue.  We're looking for the sequence

    //  [ 0x5d ] pop %rbp
    //  [ 0xc3 ] ret
    //  [ 0xe8 xx xx xx xx ] call __stack_chk_fail  (this is sometimes the final insn in the function)

    // We want to add a Row describing how to unwind when we're stopped on the 'ret' instruction where the
    // CFA is no longer defined in terms of rbp, but is now defined in terms of rsp like on function entry.


and adds an extra row of unwind details for that instruction.
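A minimal sketch of that tail check, assuming the function's bytes are already in a vector. FindTrailingRet is a hypothetical name, not the AssemblyParse_x86 code itself. 0x5d is the one-byte encoding of pop %rbp, 0xc3 is ret, and 0xe8 starts a 5-byte call with a 32-bit displacement. When it finds the ret, the profiler would add a row at that offset saying CFA = rsp+8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the offset of the trailing 'ret' if the function ends with
//   5d                pop %rbp
//   c3                ret
// optionally followed by
//   e8 xx xx xx xx    call rel32 (e.g. call __stack_chk_fail)
std::optional<size_t> FindTrailingRet(const std::vector<uint8_t> &bytes) {
    const size_t n = bytes.size();
    // Case 1: "pop %rbp; ret" are the last two bytes.
    if (n >= 2 && bytes[n - 2] == 0x5d && bytes[n - 1] == 0xc3)
        return n - 1;
    // Case 2: "pop %rbp; ret; call rel32" are the last seven bytes.
    if (n >= 7 && bytes[n - 7] == 0x5d && bytes[n - 6] == 0xc3 &&
        bytes[n - 5] == 0xe8)
        return n - 6;
    // No recognizable epilogue: no extra row is added.
    return std::nullopt;
}
```

This is a pattern match on raw bytes, so it only helps for the single, conventional epilogue at the very end of the function.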


I mention x86_64 as a possible good test case here because on i386 I worry
about the picbase sequence (call next-instruction; pop %ebx), which occurs a
lot.  But for x86_64, my main concern is the epilogues.



> On Jul 30, 2014, at 2:52 PM, Tong Shen <[email protected]> wrote:
> 
> Thanks Jason! That's a very informative post; it clarifies things a lot :-)
> 
> Well, I have to admit that my patch is specifically for a certain kind of
> function, and now I see that's not the general case.
> 
> I did some experiments with gdb. gdb uses CFI for frame 0 on both x86 and
> x86_64. It looks up the FDE for frame 0 and does the CFA calculations
> according to it.
> 
> - For compiler-generated functions: I think there are 2 usage scenarios for
> frame 0: breakpoints and signals.
>     - Breakpoints are usually at source line boundaries rather than at
> arbitrary instructions, so generally we won't be stopped at a location where
> the stack pointer is changing, and the CFI is still valid.
>     - For signals, a synchronous unwind table may not be sufficient. But
> only stack-changing instructions will cause an incorrect CFA calculation, so
> it's not always a problem.
> - For hand-written assembly functions: from what I've seen, most of the time
> the CFI is present and actually asynchronous.
> So it seems that in most cases, even with only a synchronous unwind table,
> the CFI is still correct.
> 
> I believe we can trust eh_frame for frame 0 and use assembly profiling as a
> fallback. If both fail, maybe the code owner should use
> -fasynchronous-unwind-tables :-)
> 
> 
> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <[email protected]> wrote:
> It was a tricky one and got lost in the shuffle of a busy week.  I was always 
> reluctant to try profiling all the instructions in a function.  On x86, 
> compiler generated code (gcc/clang anyway) is very simplistic about setting 
> up the stack frame at the start and only having one epilogue - so anything 
> fancier risked making mistakes and could possibly have a performance impact 
> as we run functions through the disassembler.
> 
> For hand-written assembly functions (which can be very creative with their 
> prologue/epilogue and where it is placed), my position is that they should 
> write eh_frame instructions in their assembly source to tell lldb where to 
> find things.  There are one or two libraries on Mac OS X where we break the
> "ignore eh_frame for the currently executing function" rule because there
> are many hand-written assembly functions in there and the eh_frame is going
> to beat our own analysis.
> 
> 
> After I wrote the x86 unwinder, Greg and Caroline implemented the arm 
> unwinder where it emulates every instruction in the function looking for 
> prologue/epilogue instructions.  We haven't seen it having a particularly bad 
> impact performance-wise (lldb only does this disassembly for functions that 
> it finds on stacks during an execution run, and it saves the result so it 
> won't re-compute it for a given function).  The clang armv7 codegen often has 
> mid-function epilogues (early returns) which definitely complicated things 
> and made it necessary to step through the entire function bodies.  There's a 
> bunch of code I added to support these mid-function epilogues - I have to 
> save the register save state when I see an instruction which looks like an 
> epilogue, and when I see the final ret instruction (aka restoring the saved 
> lr contents into pc), I re-install the register save state from before the 
> epilogue started.
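The save/restore trick for mid-function epilogues might look roughly like this. It is a hypothetical, heavily simplified model: instructions are pre-classified as save, restore, return, or other (the real arm unwinder emulates each instruction), and InsnKind, ClassifiedInsn and SavedRegsPerInsn are made-up names. It only shows why the register state has to be re-installed after an early return.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class InsnKind { Save, Restore, Return, Other };

struct ClassifiedInsn {
    InsnKind kind;
    std::string reg;  // register saved/restored; empty for Return/Other
};

// For each instruction, the set of registers saved on the stack *before*
// that instruction executes.
std::vector<std::set<std::string>> SavedRegsPerInsn(
    const std::vector<ClassifiedInsn> &insns) {
    std::vector<std::set<std::string>> rows;
    std::set<std::string> saved;
    std::set<std::string> before_epilogue;
    bool in_epilogue = false;
    for (const ClassifiedInsn &insn : insns) {
        rows.push_back(saved);
        switch (insn.kind) {
        case InsnKind::Save:
            saved.insert(insn.reg);
            break;
        case InsnKind::Restore:
            // First restore of an epilogue: remember the body's state.
            if (!in_epilogue) {
                before_epilogue = saved;
                in_epilogue = true;
            }
            saved.erase(insn.reg);
            break;
        case InsnKind::Return:
            // Code after an early return is reached by a branch from the
            // body, where the prologue's saves are still in effect.
            if (in_epilogue) {
                saved = before_epilogue;
                in_epilogue = false;
            }
            break;
        case InsnKind::Other:
            break;
        }
    }
    return rows;
}
```

Because the analysis is static, the model assumes that whatever follows a return is reached from the body; it cannot check that assumption.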
> 
> These things always make me a little nervous because the instruction analyzer 
> obviously is doing a static analysis so it knows nothing about flow control.  
> Tong's patch stops when it sees the first CALL instruction - but that's not 
> right; that's just solving the problem for his particular function, which
> doesn't have any CALL instructions before its prologue. :) You could imagine 
> a function which saves a couple of registers, calls another function, then 
> saves a couple more because it needs more scratch registers.
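The function imagined above can be made concrete with a toy scan over already-decoded instructions. DecodedInsn and PushedRegs are hypothetical names, and a real profiler would also track stack offsets. Stopping at the first call misses the registers saved after it, while scanning the whole function finds them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct DecodedInsn {
    std::string mnemonic;  // e.g. "push", "call", "mov"
    std::string operand;   // e.g. "%rbx"
};

// Collect the registers pushed by the function. With stop_at_call the scan
// ends at the first call; otherwise it covers the whole function.
std::set<std::string> PushedRegs(const std::vector<DecodedInsn> &insns,
                                 bool stop_at_call) {
    std::set<std::string> regs;
    for (const DecodedInsn &insn : insns) {
        if (stop_at_call && insn.mnemonic == "call")
            break;
        if (insn.mnemonic == "push")
            regs.insert(insn.operand);
    }
    return regs;
}
```

For push %rbx, push %r12, call, push %r13, push %r14, the stop-at-call scan reports two saved registers and the whole-function scan reports four.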
> 
> If we're going to change to profiling deep into the function -- and I'm not 
> opposed to doing that, it's been fine on arm -- we should just do the entire 
> function I think.
> 
> 
> Another alternative would be to trust eh_frame on x86_64 at frame 0.  This is 
> one of those things where there's not a great solution.  The unwind 
> instructions in eh_frame are only guaranteed to be accurate for synchronous 
> unwinds -- that is, they are only guaranteed to be accurate at places where 
> an exception could be thrown - at call sites.  So for instance, there's no 
> reason why the compiler has to describe the function prologue instructions at 
> all.  There's no requirement that the eh_frame instructions describe the 
> epilogue instructions.  The information about spilled registers only needs to 
> be emitted where we could throw an exception, or where a callee could throw 
> an exception.
> 
> clang/gcc both emit detailed instructions for the prologue setup.  But for 
> i386 codegen if the compiler needs to access some pc-relative data, it will 
> do a "call next-instruction; pop %eax" to get the current pc value.  (x86_64 
> has rip-relative addressing so this isn't needed)  If you're debugging 
> -fomit-frame-pointer code, that means your CFA is expressed in terms of the 
> stack pointer and the stack pointer just changed mid-function --- and 
> eh_frame instructions don't describe this.
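A small worked example of that i386 case. It assumes a -fomit-frame-pointer function whose eh_frame rule at the call is CFA = esp + cfa_offset and, as described above, has no row for the call/pop pair. PicbaseCFA and CFAAtPicbasePop are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

struct PicbaseCFA {
    uint32_t actual;   // the real CFA
    uint32_t from_eh;  // CFA computed from the unchanged eh_frame rule
};

// CFA while stopped on the "pop %eax" of "call next-instruction; pop %eax".
PicbaseCFA CFAAtPicbasePop(uint32_t esp_at_call, uint32_t cfa_offset) {
    const uint32_t actual = esp_at_call + cfa_offset;
    // The call pushed a 4-byte return address, so esp is 4 lower at the pop.
    const uint32_t esp_at_pop = esp_at_call - 4;
    // eh_frame still says CFA = esp + cfa_offset, now applied to the lower esp.
    return {actual, esp_at_pop + cfa_offset};
}
```

With esp = 0x1000 and cfa_offset = 12, the real CFA is 0x100c but the rule yields 0x1008. Once the pop executes, esp is back to its old value and the rule is right again, so the error only shows up when stopped on that one instruction.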
> 
> The end result: If you want accurate unwinds 100% of the time, you can't rely 
> on the unwind instructions from eh_frame.  But they'll get you accurate 
> unwinds 99.9% of the time ...  also, last I checked, neither clang nor gcc 
> describe the epilogue instructions.
> 
> 
> In *theory* the unwind instructions from the DWARF debug_frame section should 
> be asynchronous -- they should describe how to find the CFA address for every 
> instruction in the function.  Which makes sense - you want eh_frame to be 
> compact because it's bundled into the executable, so it should only have the 
> information necessary for exception handling and you can put the verbose 
> stuff in debug_frame DWARF for debuggers.  But instead (again, last time I 
> checked), the compilers put the exact same thing in debug_frame even if you 
> use the -fasynchronous-unwind-tables (or whatever that switch was) option.
> 
> 
> So I don't know, maybe we should just start trusting eh_frame at frame 0 and 
> write off those .1% cases where it isn't correct instead of trying to get too 
> fancy with the assembly analysis code.
> 
> 
> 
> > On Jul 29, 2014, at 4:17 PM, Todd Fiala <[email protected]> wrote:
> >
> > Hey Jason,
> >
> > Do you have any feedback on this?
> >
> > Thanks!
> >
> > -Todd
> >
> >
> > On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <[email protected]> wrote:
> > Sorry, wrong version of patch...
> >
> >
> > On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <[email protected]> wrote:
> > Hi Molenda, lldb-commits,
> >
> > For now, the x86 assembly profiler will stop after 10 "non-prologue"
> > instructions. In practice that may not be sufficient. For example, we have
> > a hand-written assembly function which has hundreds of instructions before
> > the actual (stack-adjusting) prologue instructions.
> >
> > One way is to change the limit to 1000, but there will always be functions
> > that break the limit :-) I believe the right thing to do here is to parse
> > all instructions before "ret"/"call" as prologue instructions.
> >
> > Here's what I changed:
> > - For "push %rbx" and "mov %rbx, -8(%rbp)": only add the first row for
> > that register. They may appear multiple times in the function body, but as
> > long as one of them appears, the first appearance should be in the
> > prologue. (If it's not in the prologue, this function will not use %rbx,
> > so these 2 instructions should not appear at all.)
> > - Also monitor "add $0x20, %rsp".
> > - Remove the non-prologue instruction count.
> > - Add "call" instruction detection, and stop parsing after it.
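The rules in this patch description could be modeled roughly as below. This is a hypothetical sketch over pre-decoded instructions (ProfileOp, ProfileResult and ProfileUntilCall are made-up names). It tracks CFA = rsp + cfa_offset, keeps only the first save of each register, follows add/sub on %rsp, and stops at the first call. The mov-to-frame-slot form is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class OpKind { Push, SubRsp, AddRsp, Call, Other };

struct ProfileOp {
    OpKind kind;
    std::string reg;     // pushed register, for Push
    int64_t amount = 0;  // immediate, for SubRsp/AddRsp
};

struct ProfileResult {
    int64_t cfa_offset;                       // CFA = rsp + cfa_offset
    std::map<std::string, int64_t> saved_at;  // reg -> offset from CFA
};

ProfileResult ProfileUntilCall(const std::vector<ProfileOp> &ops) {
    ProfileResult result{8, {}};  // on entry, CFA = rsp + 8
    for (const ProfileOp &op : ops) {
        switch (op.kind) {
        case OpKind::Push:
            result.cfa_offset += 8;
            // Keep only the first save of each register: try_emplace does
            // nothing if the register is already recorded.
            result.saved_at.try_emplace(op.reg, -result.cfa_offset);
            break;
        case OpKind::SubRsp:
            result.cfa_offset += op.amount;
            break;
        case OpKind::AddRsp:
            result.cfa_offset -= op.amount;
            break;
        case OpKind::Call:
            return result;  // stop parsing at the first call
        case OpKind::Other:
            break;
        }
    }
    return result;
}
```

For push %rbp, push %rbx, sub $0x20, add $0x20, push %rbx, call, push %r12, this records %rbp at CFA-16 and %rbx at CFA-24, ignores the second %rbx push, and never sees %r12, which is the situation Jason describes in his reply.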
> >
> > Thanks.
> >
> > --
> > Best Regards, Tong Shen
> >
> >
> >
> > --
> > Best Regards, Tong Shen
> >
> > _______________________________________________
> > lldb-commits mailing list
> > [email protected]
> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
> >
> >
> >
> >
> > --
> > Todd Fiala |   Software Engineer |     [email protected] |     650-943-3180
> >
> 
> 
> 
> 
> -- 
> Best Regards, Tong Shen
