Er I'll "get it" in... eek..
On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <[email protected]> wrote:
> I'm going to test this now. If it all looks good, I'll ge tit in.
>
> On Tue, Aug 19, 2014 at 5:01 PM, Tong Shen <[email protected]> wrote:
>
>> Thanks Jason!
>> I will finish this patch and let's see how it goes.
>>
>> P.S. I know a little about eh_frame stuff; I added CFI to the new Android ahead-of-time Java compiler so AOT'ed code can properly unwind :-)
>>
>> On Tue, Aug 19, 2014 at 4:51 PM, Jason Molenda <[email protected]> wrote:
>>
>>> The CIE sets the initial unwind state -- the CIE may describe the unwind state at the first instruction (as it always does with gcc, clang) but in theory it could describe the unwind state once the prologue had executed.
>>>
>>> The idea is that there is one CIE entry which describes a typical at-first-instruction unwind state and then many FDEs that describe the unwind instructions for specific functions - they all use that one CIE.
>>>
>>> Anyway, that's just an implementation detail of eh_frame. I honestly don't think we should worry about incomplete eh_frame - let's try living on them and see how it works in practice.
>>>
>>> It may be possible to categorize eh_frame to see how complete it is. Compiler-generated x86 prologues are very regular, it would be possible to look at the first few bytes of a function for some pushes or stack pointer changes and see if the eh_frame describes that. We know what the unwind state is on the first instruction of a function (it's determined by the ABI) -- does the eh_frame have the same instructions? Can we scan through the function for an epilogue, and if we find one, does the eh_frame have unwind instructions there?
>>>
>>> But I don't want to have the perfect be the enemy of the good. IMO let's take the plunge and try to use eh_frame and see how that goes.
>>> We can refine it later, or back it out again (it will be a very small change to RegisterContextLLDB) if necessary.
>>>
>>> > On Aug 19, 2014, at 4:41 PM, Tong Shen <[email protected]> wrote:
>>> >
>>> > And for no prologue case:
>>> > We can detect this easily (any CFI for start address?) and bail out, so we will fall back to assembly profiler.
>>> >
>>> > On Tue, Aug 19, 2014 at 4:36 PM, Tong Shen <[email protected]> wrote:
>>> > Ahh sorry I've been working on something else this week and didn't get back to you in time.
>>> > And you've been very patient and informative. Thanks!
>>> >
>>> > I'm only suggesting it for x86 / x86_64. What I am doing here relies on:
>>> > - Compiler describes prologue;
>>> > - We can figure out all mid function CFA changes by inspecting instructions.
>>> >
>>> > For frame 0, the new process for CFA locating will look like this:
>>> > - Find the nearest CFI available before current PC.
>>> > - If the CFI is for current PC, voila :-) If not, continue.
>>> > - Inspect all instructions in between, and make changes to CFA accordingly. This can solve the PC relative addressing case.
>>> > - For epilogue, detect if we are in middle of an epilogue. Considering that there are not many patterns and they are all simple, I think we can enumerate them and handle accordingly.
>>> >
>>> > From what I've seen so far, this actually can solve most of gcc/clang generated code.
>>> > For JIT'ed code or hand written assembly, if there's no asynchronous CFI we are screwed anyway, so trying this won't hurt either (except some extra running time).
>>> >
>>> > I hope I explain my thoughts clearly.
>>> >
>>> > Thank you.
>>> >
>>> > On Tue, Aug 19, 2014 at 4:22 PM, Jason Molenda <[email protected]> wrote:
>>> > Hi Tong, my message was a little rambling. Let's be specific.
>>> >
>>> > We are changing lldb to trust eh_frame instructions on the currently-executing aka 0th frame.
>>> >
>>> > In practice, gcc and clang eh_frame both describe the prologue, so this is OK.
>>> >
>>> > Old gcc and clang eh_frame do not describe the epilogue. So we need to add a pass for i386/x86_64 (at least) to augment the eh_frame-sourced unwind instructions. I don't know if it would be best to augment eh_frame UnwindPlans when we create them in DWARFCallFrameInfo or if it would be better to do it lazily when we are actually using the unwind instructions in RegisterContextLLDB (probably RegisterContextLLDB like you were doing). We should only do it once for a given function, of course.
>>> >
>>> > I think it would be cleanest if the augmentation function lived in the UnwindAssembly class. But I haven't looked how easy it is to get an UnwindAssembly object where we need it.
>>> >
>>> > Thanks for taking this on. It will be interesting to try living entirely off eh_frame and see how that works for all the architectures/environments lldb supports.
>>> >
>>> > I worry a little that we're depending on the generous eh_frame from clang/gcc and if we try to run on icc (Intel's compiler) or something like that, we may have no prologue instructions and stepping will work very poorly. But we'll cross that bridge when we get to it.
>>> >
>>> > > On Aug 15, 2014, at 8:07 PM, Jason Molenda <[email protected]> wrote:
>>> > >
>>> > > Hi Tong, sorry for the delay in replying.
>>> > >
>>> > > I have a couple thoughts about the patch. First, the change in RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of eh_frame unwind instructions ("UnwindPlanAtCallSite" - which normally means the eh_frame unwind instructions) for the currently-executing aka zeroth frame. We've talked about this before, but it's worth noting that this patch includes that change.
>>> > >
>>> > > There's still the problem of detecting how *asynchronous* those eh_frame unwind instructions are.
>>> > > For instance, what do you get for an i386 program that does
>>> > >
>>> > > #include <stdio.h>
>>> > > int main()
>>> > > {
>>> > >   puts ("HI");
>>> > > }
>>> > >
>>> > > Most codegen will use a sequence like
>>> > >
>>> > > call LNextInstruction
>>> > > .LNextInstruction
>>> > > pop ebx
>>> > >
>>> > > this call & pop sequence is establishing the "pic base"; the program will then use that address to find the "HI" constant data. If you compile this -fomit-frame-pointer, so we have to use the stack pointer to find the CFA, do the eh_frame instructions describe this?
>>> > >
>>> > > It's a bit of an extreme example but it's one of those tricky cases where asynchronous ("accurate at every instruction") unwind instructions and synchronous ("accurate at places where we can throw an exception, or a callee can throw an exception") unwind instructions are different.
>>> > >
>>> > > I would use behaves_like_zeroth_frame instead of if (IsFrameZero()) because you can have a frame in the middle of the stack which was the zeroth frame when an asynchronous signal came in -- in which case, the "callee" stack frame will be sigtramp.
>>> > >
>>> > > You'd want to update the UnwindLogMsgVerbose() text, of course.
>>> > >
>>> > > What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is doing is assuming that the unwind plan fails to include an epilogue description, and stepping through all the instructions in the function looking for the epilogue.
>>> > >
>>> > > DWARFCallFrameInfo doesn't seem like the right place for this. There's an assumption that the instructions came from eh_frame and that they are incomplete. It seems like it would more naturally live in the UnwindAssembly plugin and it would have a name like AugmentIncompleteUnwindPlanWithEpilogue or something like that.
>>> > >
>>> > > What if the CFI already does describe the epilogue?
>>> > > I imagine we'll just end up with a doubling of UnwindPlan Rows that describe the epilogue instructions.
>>> > >
>>> > > What if we have a mid-function epilogue? I've never seen gcc/clang generate these for x86, but it's possible. It's a common code sequence on arm/arm64. You can see a messy bit of code in UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which handles these -- saving the UnwindPlan's unwind instructions when we see the beginning of an epilogue, and once the epilogue is complete, restoring the unwind instructions.
>>> > >
>>> > > I'm not opposed to the patch - but it does make the assumption that we're going to use eh_frame for the currently executing function and that the eh_frame instructions do not include a description of the epilogue. (and that there is only one epilogue in the function). Mostly I want to call all of those aspects out so we're clear what we're talking about here. Let's clean it up a bit, put it in and see how it goes.
>>> > >
>>> > > J
>>> > >
>>> > >> On Aug 14, 2014, at 6:31 PM, Tong Shen <[email protected]> wrote:
>>> > >>
>>> > >> Hi Jason,
>>> > >>
>>> > >> Turns out we still need CFI for frame 0 in certain situations...
>>> > >>
>>> > >> A possible approach is to disassemble machine code, and manually adjust CFI for frame 0. For example, if we see "pop ebp; => ret", we set cfa to [esp]; if we see "call next-insn; => pop %ebp", we set cfa_offset+=4.
>>> > >>
>>> > >> Patch attached, now it just implements adjustment for "pop ebp; ret".
>>> > >>
>>> > >> If you think this approach is OK, I will go ahead and add other tricks (i386 pc relative addressing, more styles of epilogue, etc).
>>> > >>
>>> > >> Thank you for your time!
>>> > >>
>>> > >> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <[email protected]> wrote:
>>> > >> I think gdb's rationale for using CFI for leaf function is:
>>> > >> - gcc always generates CFI for prologue, so at function entry, we know the correct CFA;
>>> > >> - any stack pointer altering operation after that (mid-function & epilogue), we can recognize and handle them.
>>> > >> So basically, it assumes 2, hacks its way through 3 & 4, and pretends we are at 5.
>>> > >> Number of hacks we need seems to be small in x86 world, so this tradition is still here.
>>> > >>
>>> > >> Here's what gdb does for epilogue: normally when you run 'n', it will run one instruction at a time till the next line/different stack id. But when it sees "pop %rbp; ret", it won't step into these instructions. Instead it will execute past them directly.
>>> > >> I didn't experiment with x86 pc-relative addressing; but I guess it will also recognize and execute past this pattern directly.
>>> > >>
>>> > >> So for compiler generated functions, what we do now with assembly parser can be done with CFI + those gdb hacks.
>>> > >> And for hand-written assembly, I think CFI is almost always precise at instruction level. In this case, utilizing CFI instead of assembly parser will be a big help.
>>> > >>
>>> > >> So maybe we can apply those hacks, and trust CFI only for x86 & x86_64 targets?
>>> > >>
>>> > >> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <[email protected]> wrote:
>>> > >> I think we could think of five levels of eh_frame information:
>>> > >>
>>> > >> 1 unwind instructions at exception throw locations & locations where a callee may throw an exception
>>> > >>
>>> > >> 2 unwind instructions that describe the prologue
>>> > >>
>>> > >> 3 unwind instructions that describe the epilogue at the end of the function
>>> > >>
>>> > >> 4 unwind instructions that describe mid-function epilogues (I see these on arm all the time, don't see them on x86 with compiler generated code - but we don't use eh_frame on arm at Apple, I'm just mentioning it for completeness)
>>> > >>
>>> > >> 5 unwind instructions that describe any changes mid-function needed to unwind at all instructions ("asynchronous unwind information")
>>> > >>
>>> > >> The eh_frame section only guarantees #1. gcc and clang always do #1 and #2. Modern gcc's do #3. I don't know if gcc would do #4 on arm but it's not important, I just mention it for completeness. And no one does #5 (as far as I know), even in the DWARF debug_frame section.
>>> > >>
>>> > >> I think it may be possible to detect if an eh_frame entry fulfills #3 by looking if the CFA definition on the last row is the same as the initial CFA definition. But I'm not sure how a debugger could use heuristics to determine much else.
>>> > >>
>>> > >> In fact, detecting #3 may be the easiest thing to detect. I'm not sure if the debugger could really detect #2 except maybe if the function had a standard prologue (push rbp, mov rsp rbp) and the eh_frame didn't describe the effects of these instructions, the debugger could know that the eh_frame does not describe the prologue.
>>> > >>
>>> > >>> On Jul 30, 2014, at 6:58 PM, Tong Shen <[email protected]> wrote:
>>> > >>>
>>> > >>> Ah I understand now.
>>> > >>>
>>> > >>> Now prologue seems always included in CFI for gcc & clang; and newer gcc includes epilogue as well.
>>> > >>> Maybe we can detect and use them when they are available?
>>> > >>>
>>> > >>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <[email protected]> wrote:
>>> > >>> Ah, it looks like gcc changed since I last looked at its eh_frame output.
>>> > >>>
>>> > >>> It's not a bug -- the eh_frame unwind instructions only need to be accurate at instructions where an exception can be thrown, or where a callee function can throw an exception. There's no requirement to include prologue or epilogue instructions in the eh_frame.
>>> > >>>
>>> > >>> And unfortunately from lldb's perspective, when we see eh_frame we'll never know how descriptive it is. If it's old-gcc or clang, it won't include epilogue instructions. If it's from another compiler, it may not include any prologue/epilogue instructions at all.
>>> > >>>
>>> > >>> Maybe we could look over the UnwindPlan rows and see if the CFA definition of the last row matches the initial row's CFA definition. That would show that the epilogue is described. Unless it is a tail-call (aka noreturn) function - in which case the stack is never restored.
>>> > >>>
>>> > >>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <[email protected]> wrote:
>>> > >>>>
>>> > >>>> GCC seems to generate a row for epilogue.
>>> > >>>> Do you think this is a clang bug, or at least a discrepancy between clang & gcc?
>>> > >>>>
>>> > >>>> Source:
>>> > >>>> int f() {
>>> > >>>>   puts("HI\n");
>>> > >>>>   return 5;
>>> > >>>> }
>>> > >>>>
>>> > >>>> Compile option: only -g
>>> > >>>>
>>> > >>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
>>> > >>>> clang version 3.5.0 (213114)
>>> > >>>>
>>> > >>>> Env: Ubuntu 14.04, x86_64
>>> > >>>>
>>> > >>>> dwarfdump -F of clang binary:
>>> > >>>> < 2><0x00400530:0x00400559><f><fde offset 0x00000088 length: 0x0000001c><eh aug data len 0x0>
>>> > >>>>     0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
>>> > >>>>     0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>> > >>>>     0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>> > >>>>
>>> > >>>> dwarfdump -F of gcc binary:
>>> > >>>> < 1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: 0x0000001c><eh aug data len 0x0>
>>> > >>>>     0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
>>> > >>>>     0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>> > >>>>     0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>> > >>>>     0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>> > >>>>
>>> > >>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <[email protected]> wrote:
>>> > >>>> I'm open to trying to trust eh_frame at frame 0 for x86_64. The lack of epilogue descriptions in eh_frame is the biggest problem here.
>>> > >>>>
>>> > >>>> When you "step" or "next" in the debugger, the debugger instruction steps across the source line until it gets to the next source line. Every time it stops after an instruction step, it confirms that it is (1) between the start and end pc values for the source line, and (2) that the "stack id" (start address of the function + CFA address) is the same. If it stops and the stack id has changed, for a "next" command, it will backtrace one stack frame to see if it stepped into a function.
>>> > >>>> If so, it sets a breakpoint on the return address and continues.
>>> > >>>>
>>> > >>>> If you switch lldb to prefer eh_frame instructions for x86_64, e.g.
>>> > >>>>
>>> > >>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>> > >>>> ===================================================================
>>> > >>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp (revision 214344)
>>> > >>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp (working copy)
>>> > >>>> @@ -791,6 +791,22 @@
>>> > >>>>          }
>>> > >>>>      }
>>> > >>>>
>>> > >>>> +    // For x86_64 debugging, let's try using the eh_frame instructions even if this is the currently
>>> > >>>> +    // executing function (frame zero).
>>> > >>>> +    Target *target = exe_ctx.GetTargetPtr();
>>> > >>>> +    if (target
>>> > >>>> +        && (target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64h
>>> > >>>> +            || target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64))
>>> > >>>> +    {
>>> > >>>> +        unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite (m_current_offset_backed_up_one);
>>> > >>>> +        int valid_offset = -1;
>>> > >>>> +        if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, valid_offset))
>>> > >>>> +        {
>>> > >>>> +            UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, preferred over assembly profiling on x86_64", unwind_plan_sp->GetSourceName().GetCString());
>>> > >>>> +            return unwind_plan_sp;
>>> > >>>> +        }
>>> > >>>> +    }
>>> > >>>> +
>>> > >>>>      // Typically the NonCallSite UnwindPlan is the unwind created by inspecting the assembly language instructions
>>> > >>>>      if (behaves_like_zeroth_frame)
>>> > >>>>      {
>>> > >>>>
>>> > >>>> you'll find that you have to "next" twice to step out of a function. Why?
>>> > >>>> With a simple function like:
>>> > >>>>
>>> > >>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at a.c:5, queue = 'com.apple.main-thread', stop reason = step over
>>> > >>>>     #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
>>> > >>>>    2    int foo ()
>>> > >>>>    3    {
>>> > >>>>    4        puts("HI");
>>> > >>>> -> 5        return 5;
>>> > >>>>    6    }
>>> > >>>>    7
>>> > >>>>    8    int bar ()
>>> > >>>> (lldb) disass
>>> > >>>> a.out`foo at a.c:3:
>>> > >>>>    0x100000ea0: pushq %rbp
>>> > >>>>    0x100000ea1: movq %rsp, %rbp
>>> > >>>>    0x100000ea4: subq $0x10, %rsp
>>> > >>>>    0x100000ea8: leaq 0x6b(%rip), %rdi ; "HI"
>>> > >>>>    0x100000eaf: callq 0x100000efa ; symbol stub for: puts
>>> > >>>>    0x100000eb4: movl $0x5, %ecx
>>> > >>>> -> 0x100000eb9: movl %eax, -0x4(%rbp)
>>> > >>>>    0x100000ebc: movl %ecx, %eax
>>> > >>>>    0x100000ebe: addq $0x10, %rsp
>>> > >>>>    0x100000ec2: popq %rbp
>>> > >>>>    0x100000ec3: retq
>>> > >>>>
>>> > >>>> if you do "next" lldb will instruction step, comparing the stack ID at every stop, until it gets to 0x100000ec3 at which point the stack ID will change. The CFA address (which the eh_frame tells us is rbp+16) just changed to the caller's CFA address because we're about to return. The eh_frame instructions really need to tell us that the CFA is now rsp+8 at 0x100000ec3.
>>> > >>>>
>>> > >>>> The end result is that you need to "next" twice to step out of a function.
>>> > >>>>
>>> > >>>> AssemblyParse_x86 has a special bit where it looks for the 'ret' instruction sequence at the end of the function -
>>> > >>>>
>>> > >>>> // Now look at the byte at the end of the AddressRange for a limited attempt at describing the
>>> > >>>> // epilogue.
>>> > >>>> // We're looking for the sequence
>>> > >>>>
>>> > >>>> // [ 0x5d ] pop %rbp
>>> > >>>> // [ 0xc3 ] ret
>>> > >>>> // [ 0xe8 xx xx xx xx ] call __stack_chk_fail (this is sometimes the final insn in the function)
>>> > >>>>
>>> > >>>> // We want to add a Row describing how to unwind when we're stopped on the 'ret' instruction where the
>>> > >>>> // CFA is no longer defined in terms of rbp, but is now defined in terms of rsp like on function entry.
>>> > >>>>
>>> > >>>> and adds an extra row of unwind details for that instruction.
>>> > >>>>
>>> > >>>> I mention x86_64 as being a possible good test case here because I worry about the i386 picbase sequence (call next-instruction; pop %ebx) which occurs a lot. But for x86_64, my main concern is the epilogues.
>>> > >>>>
>>> > >>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <[email protected]> wrote:
>>> > >>>>>
>>> > >>>>> Thanks Jason! That's a very informative post, clarifies things a lot :-)
>>> > >>>>>
>>> > >>>>> Well I have to admit that my patch is specifically for a certain kind of function, and now I see that's not the general case.
>>> > >>>>>
>>> > >>>>> I did some experiments with gdb. gdb uses CFI for frame 0, either x86 or x86_64. It looks for the FDE of frame 0, and does CFA calculations according to that.
>>> > >>>>>
>>> > >>>>> - For compiler generated functions: I think there are 2 usage scenarios for frame 0: breakpoint and signal.
>>> > >>>>>   - Breakpoints are usually at source line boundary instead of instruction boundary, and generally we won't be caught at stack pointer changing locations, so CFI is still valid.
>>> > >>>>>   - For signal, synchronous unwind table may not be sufficient here. But only stack changing instructions will cause incorrect CFA calculation, so it's not always the case.
>>> > >>>>> - For hand written assembly functions: from what I've seen, most of the time CFI is present and actually asynchronous.
>>> > >>>>> So it seems that in most cases, even with only synchronous unwind table, CFI is still correct.
>>> > >>>>>
>>> > >>>>> I believe we can trust eh_frame for frame 0 and use assembly profiling as fallback. If both failed, maybe code owner should use -fasynchronous-unwind-tables :-)
>>> > >>>>>
>>> > >>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <[email protected]> wrote:
>>> > >>>>> It was a tricky one and got lost in the shuffle of a busy week. I was always reluctant to try profiling all the instructions in a function. On x86, compiler generated code (gcc/clang anyway) is very simplistic about setting up the stack frame at the start and only having one epilogue - so anything fancier risked making mistakes and could possibly have a performance impact as we run functions through the disassembler.
>>> > >>>>>
>>> > >>>>> For hand-written assembly functions (which can be very creative with their prologue/epilogue and where it is placed), my position is that they should write eh_frame instructions in their assembly source to tell lldb where to find things. There are one or two libraries on Mac OS X where we break the "ignore eh_frame for the currently executing function" because there are many hand-written assembly functions in there and the eh_frame is going to beat our own analysis.
>>> > >>>>>
>>> > >>>>> After I wrote the x86 unwinder, Greg and Caroline implemented the arm unwinder where it emulates every instruction in the function looking for prologue/epilogue instructions. We haven't seen it having a particularly bad impact performance-wise (lldb only does this disassembly for functions that it finds on stacks during an execution run, and it saves the result so it won't re-compute it for a given function).
>>> > >>>>> The clang armv7 codegen often has mid-function epilogues (early returns) which definitely complicated things and made it necessary to step through the entire function bodies. There's a bunch of code I added to support these mid-function epilogues - I have to save the register save state when I see an instruction which looks like an epilogue, and when I see the final ret instruction (aka restoring the saved lr contents into pc), I re-install the register save state from before the epilogue started.
>>> > >>>>>
>>> > >>>>> These things always make me a little nervous because the instruction analyzer obviously is doing a static analysis so it knows nothing about flow control. Tong's patch stops when it sees the first CALL instruction - but that's not right, that's just solving the problem for his particular function which doesn't have any CALL instructions before his prologue. :) You could imagine a function which saves a couple of registers, calls another function, then saves a couple more because it needs more scratch registers.
>>> > >>>>>
>>> > >>>>> If we're going to change to profiling deep into the function -- and I'm not opposed to doing that, it's been fine on arm -- we should just do the entire function I think.
>>> > >>>>>
>>> > >>>>> Another alternative would be to trust eh_frame on x86_64 at frame 0. This is one of those things where there's not a great solution. The unwind instructions in eh_frame are only guaranteed to be accurate for synchronous unwinds -- that is, they are only guaranteed to be accurate at places where an exception could be thrown - at call sites. So for instance, there's no reason why the compiler has to describe the function prologue instructions at all. There's no requirement that the eh_frame instructions describe the epilogue instructions.
>>> > >>>>> The information about spilled registers only needs to be emitted where we could throw an exception, or where a callee could throw an exception.
>>> > >>>>>
>>> > >>>>> clang/gcc both emit detailed instructions for the prologue setup. But for i386 codegen if the compiler needs to access some pc-relative data, it will do a "call next-instruction; pop %eax" to get the current pc value. (x86_64 has rip-relative addressing so this isn't needed) If you're debugging -fomit-frame-pointer code, that means your CFA is expressed in terms of the stack pointer and the stack pointer just changed mid-function --- and eh_frame instructions don't describe this.
>>> > >>>>>
>>> > >>>>> The end result: If you want accurate unwinds 100% of the time, you can't rely on the unwind instructions from eh_frame. But they'll get you accurate unwinds 99.9% of the time ... also, last I checked, neither clang nor gcc describe the epilogue instructions.
>>> > >>>>>
>>> > >>>>> In *theory* the unwind instructions from the DWARF debug_frame section should be asynchronous -- they should describe how to find the CFA address for every instruction in the function. Which makes sense - you want eh_frame to be compact because it's bundled into the executable, so it should only have the information necessary for exception handling and you can put the verbose stuff in debug_frame DWARF for debuggers. But instead (again, last time I checked), the compilers put the exact same thing in debug_frame even if you use the -fasynchronous-unwind-tables (or whatever that switch was) option.
>>> > >>>>>
>>> > >>>>> So I don't know, maybe we should just start trusting eh_frame at frame 0 and write off those .1% cases where it isn't correct instead of trying to get too fancy with the assembly analysis code.
>>> > >>>>>
>>> > >>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <[email protected]> wrote:
>>> > >>>>>>
>>> > >>>>>> Hey Jason,
>>> > >>>>>>
>>> > >>>>>> Do you have any feedback on this?
>>> > >>>>>>
>>> > >>>>>> Thanks!
>>> > >>>>>>
>>> > >>>>>> -Todd
>>> > >>>>>>
>>> > >>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <[email protected]> wrote:
>>> > >>>>>> Sorry, wrong version of patch...
>>> > >>>>>>
>>> > >>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <[email protected]> wrote:
>>> > >>>>>> Hi Molenda, lldb-commits,
>>> > >>>>>>
>>> > >>>>>> For now, x86 assembly profiler will stop after 10 "non-prologue" instructions. In practice it may not be sufficient. For example, we have a hand-written assembly function, which has hundreds of instructions before the actual (stack-adjusting) prologue instructions.
>>> > >>>>>>
>>> > >>>>>> One way is to change the limit to 1000; but there will always be functions that break the limit :-) I believe the right thing to do here is parsing all instructions before "ret"/"call" as prologue instructions.
>>> > >>>>>>
>>> > >>>>>> Here's what I changed:
>>> > >>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add first row for that register. They may appear multiple times in function body. But as long as one of them appears, first appearance should be in prologue (if it's not in prologue, this function will not use %rbx, so these 2 instructions should not appear at all).
>>> > >>>>>> - Also monitor "add %rsp 0x20".
>>> > >>>>>> - Remove non prologue instruction count.
>>> > >>>>>> - Add "call" instruction detection, and stop parsing after it.
>>> > >>>>>>
>>> > >>>>>> Thanks.
>>> > >>>>>>
>>> > >>>>>> --
>>> > >>>>>> Best Regards, Tong Shen
>>> > >>>>>>
>>> > >>>>>> --
>>> > >>>>>> Best Regards, Tong Shen
>>> > >>>>>>
>>> > >>>>>> _______________________________________________
>>> > >>>>>> lldb-commits mailing list
>>> > >>>>>> [email protected]
>>> > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>> > >>>>>>
>>> > >>>>>> --
>>> > >>>>>> Todd Fiala | Software Engineer | [email protected] | 650-943-3180
>>> > >>>>>
>>> > >>>>> --
>>> > >>>>> Best Regards, Tong Shen
>>> > >>>>
>>> > >>>> --
>>> > >>>> Best Regards, Tong Shen
>>> > >>>
>>> > >>> --
>>> > >>> Best Regards, Tong Shen
>>> > >>
>>> > >> --
>>> > >> Best Regards, Tong Shen
>>> > >>
>>> > >> --
>>> > >> Best Regards, Tong Shen
>>> > >> <adjust_cfi_for_frame_zero.patch>
>>> >
>>> > --
>>> > Best Regards, Tong Shen
>>> >
>>> > --
>>> > Best Regards, Tong Shen
>>
>> --
>> Best Regards, Tong Shen
>>
>> _______________________________________________
>> lldb-commits mailing list
>> [email protected]
>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>
> --
> Todd Fiala | Software Engineer | [email protected] | 650-943-3180

--
Todd Fiala | Software Engineer | [email protected] | 650-943-3180
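For readers following the quoted thread: two ideas in it are easy to try on the dwarfdump rows Tong posted. One is Jason's heuristic that an eh_frame entry probably describes the final epilogue if the CFA rule of its last row matches the rule of its first row; the other is Tong's first step of finding the nearest row at or before the current PC. The following is only a rough sketch of both under stated assumptions. The CfaRow struct and the function names are hypothetical stand-ins made up for this note, not lldb's UnwindPlan API, and the tail-call/noreturn exception Jason mentions is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified unwind row (NOT lldb's UnwindPlan::Row).
// The CFA is computed as: value of cfa_register + cfa_offset.
struct CfaRow {
    std::uint64_t address;  // first address the row applies to
    int cfa_register;       // DWARF register number; on x86_64, 6 = rbp, 7 = rsp
    int cfa_offset;
};

// Heuristic from the thread: if the last row restores the CFA rule that was
// in effect on function entry, the final epilogue is probably described.
bool LastRowRestoresEntryCfa(const std::vector<CfaRow> &rows) {
    if (rows.size() < 2)
        return false;  // one row cannot cover both entry and epilogue
    const CfaRow &first = rows.front();
    const CfaRow &last = rows.back();
    return first.cfa_register == last.cfa_register &&
           first.cfa_offset == last.cfa_offset;
}

// Returns the row in effect at pc: the last row whose address is <= pc.
// Assumes rows are sorted by address, as dwarfdump prints them.
const CfaRow &RowForPc(const std::vector<CfaRow> &rows, std::uint64_t pc) {
    assert(!rows.empty() && rows.front().address <= pc);
    const CfaRow *best = &rows.front();
    for (const CfaRow &row : rows)
        if (row.address <= pc)
            best = &row;
    return *best;
}

// Rows copied from the clang 3.5.0 dwarfdump output quoted in the thread.
std::vector<CfaRow> ClangRows() {
    return {{0x00400530, 7, 8}, {0x00400531, 7, 16}, {0x00400534, 6, 16}};
}

// Rows copied from the gcc 4.8.2 dwarfdump output quoted in the thread.
std::vector<CfaRow> GccRows() {
    return {{0x0040052d, 7, 8},
            {0x0040052e, 7, 16},
            {0x00400531, 6, 16},
            {0x00400541, 7, 8}};
}
```

With the quoted data, the gcc rows pass the check (the row at 0x00400541 goes back to rsp+8) and the clang rows do not, which matches what the thread observes about which compiler describes the epilogue.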
