I like that idea.  A bunch of other work just landed on my desk so it might be 
a bit before I do it, but I'll see how that patch looks.

> On Nov 9, 2016, at 3:54 AM, Tamas Berghammer <tbergham...@google.com> wrote:
> 
> Based on your comments I have one more idea for a good heuristic. What if we 
> detect a dynamic branch (e.g. "br <Xn>", "tbb ...", etc...) and store the 
> register state for that place. Then when we find a block with no unwind info 
> for the first instruction then we use the one we saved for the dynamic branch 
> (as we know that the only way that block can be reached is through a dynamic 
> branch). If there is exactly 1 dynamic branch in the code then this should 
> gave us the "perfect" result while if we have multiple dynamic branches then 
> we will pick one "randomly" but for compiler generated code I think it will 
> be good enough. The only tricky case is if we fail to detect the dynamic 
> branch but that should be easy to fix as we already track every branch on ARM 
> (for single stepping) and doing it on AArch64 should be easy as well.
> 
> On Tue, Nov 8, 2016 at 11:10 PM Jason Molenda <jmole...@apple.com> wrote:
> Yeah I was thinking that maybe if we spot an epilogue instruction (ret, b 
> <some-other-function>), and the next instruction doesn't have a reinstated 
> register context, we could backtrack to the initial register context of this 
> block of instructions (and if it's not the beginning of the function), 
> re-instate that register context for the next instruction.
> 
> It doesn't help if we have a dynamic dispatch after the initial part of the 
> function.  For that, we'd need to do something like your suggestion of 
> finding the biggest collection of register saves.
> 
> e.g. if I rearrange/modify my example function a little to make it more 
> interesting (I didn't fix up the +offsets)
> 
> prologue:
> >     0x100007df0 <+0>:   stp    x22, x21, [sp, #-0x30]!
> >     0x100007df4 <+4>:   stp    x20, x19, [sp, #0x10]
> >     0x100007df8 <+8>:   stp    x29, x30, [sp, #0x20]
> >     0x100007dfc <+12>:  add    x29, sp, #0x20            ; =0x20
> 
> direct branch:
> >     0x100007e1c <+44>:  cmp    w20, #0x1d                ; =0x1d
> >     0x100007e20 <+48>:  b.hi   0x100007e4c               ; <+92>  { block 
> > #3 }
> 
> dynamic dispatch:
> >     0x100007e24 <+52>:  adr    x9, #0x90                 ; switcher + 196
> >     0x100007e28 <+56>:  nop
> >     0x100007e2c <+60>:  ldrsw  x8, [x9, x8, lsl #2]
> >     0x100007e30 <+64>:  add    x8, x8, x9
> >     0x100007e34 <+68>:  br     x8
> 
> block #1
> >     0x100007e9c <+172>: sxtw   x8, w19
> >     0x100007ea0 <+176>: str    x8, [sp]
> >     0x100007ea4 <+180>: adr    x0, #0x10f                ; "%c\n"
> >     0x100007ea8 <+184>: nop
> >     0x100007eac <+188>: bl     0x100007f64               ; symbol stub for: 
> > printf
> >     0x100007e70 <+128>: sub    sp, x29, #0x20            ; =0x20
> >     0x100007e74 <+132>: ldp    x29, x30, [sp, #0x20]
> >     0x100007e78 <+136>: ldp    x20, x19, [sp, #0x10]
> >     0x100007e7c <+140>: ldp    x22, x21, [sp], #0x30
> >     0x100007eb0 <+192>: b     0x100007f4c               ; symbol stub for: 
> > abort
> 
> block #2
> >     0x100007e38 <+72>:  sub    sp, x29, #0x20            ; =0x20
> >     0x100007e3c <+76>:  ldp    x29, x30, [sp, #0x20]
> >     0x100007e40 <+80>:  ldp    x20, x19, [sp, #0x10]
> >     0x100007e44 <+84>:  ldp    x22, x21, [sp], #0x30
> >     0x100007e48 <+88>:  ret
> 
> 
> block #3
> >     0x100007e4c <+92>:  add    w0, w0, #0x1              ; =0x1
> >     0x100007e50 <+96>:  b      0x100007e38               ; <+72> at a.c:115
> >     0x100007e54 <+100>: orr    w8, wzr, #0x7
> >     0x100007e58 <+104>: str    x8, [sp, #0x8]
> >     0x100007e5c <+108>: sxtw   x8, w19
> >     0x100007e60 <+112>: str    x8, [sp]
> >     0x100007e64 <+116>: adr    x0, #0x148                ; "%c %d\n"
> >     0x100007e68 <+120>: nop
> >     0x100007e6c <+124>: bl     0x100007f64               ; symbol stub for: 
> > printf
> >     0x100007e70 <+128>: sub    sp, x29, #0x20            ; =0x20
> >     0x100007e74 <+132>: ldp    x29, x30, [sp, #0x20]
> >     0x100007e78 <+136>: ldp    x20, x19, [sp, #0x10]
> >     0x100007e7c <+140>: ldp    x22, x21, [sp], #0x30
> >     0x100007e80 <+144>: b      0x100007f38               ; f3 at b.c:4
> 
> block #4
> >     0x100007e38 <+72>:  sub    sp, x29, #0x20            ; =0x20
> >     0x100007e3c <+76>:  ldp    x29, x30, [sp, #0x20]
> >     0x100007e40 <+80>:  ldp    x20, x19, [sp, #0x10]
> >     0x100007e44 <+84>:  ldp    x22, x21, [sp], #0x30
> >     0x100007e48 <+88>:  ret
> 
> First, an easy one:  When we get to the first instruction of 'block #4', 
> we've seen a complete epilogue ending in 'B other-function' and the first 
> instruction of block #4 is not branched to.  If we find the previous direct 
> branch target -- to the first instruction of 'block #3' was conditionally 
> branched to, we reuse that register context for block #4.  This could easily 
> go wrong for hand-written assembly where you might undo the stack state 
> part-way and then branch to another part of the function.  But I doubt 
> compiler generated code is ever going to do that.
> 
> Second, a trickier one: When we get to the first instruction of 'block #2', 
> we have no previous branch target register context to re-instate.  We could 
> look further into the function (to block target #3 again) and reuse that 
> register state, the assumption being that a function has one prologue that 
> sets up the complete register state and then doesn't change anything outside 
> mid-function epilogues.  I'm not opposed to that idea.  The other way would 
> be to look backwards in the instruction stream for the row with the most 
> registers saved, as you suggested, maybe reusing the earliest one if there 
> are multiple entries with the same # of registers (this would need to ignore 
> IsSame registers).
> 
> 
> Let me see if I can code something along these lines and we can look at how 
> that turns out.
> 
> 
> 
> > On Nov 7, 2016, at 8:10 AM, Tamas Berghammer <tbergham...@google.com> wrote:
> >
> > Hi Jason,
> >
> > I thought about this situation when implemented the original branch 
> > following code and haven't been able to come up with a really good solution.
> >
> > My only idea is the same what you mentioned. We should try to recognize all 
> > unconditional branches and returns (but not calls) and then if the 
> > following instruction don't have any unwind information yet (e.g. haven't 
> > been a branch target so far) then we try to find some reasonable unwind 
> > info from the previous lines.
> >
> > The difficult question is how to find the correct information. One possible 
> > heuristic I have in mind is to try to find any call instruction inside the 
> > function before the current PC and use the unwind info from there. The 
> > reason I like this heuristic because there won't be a call instruction 
> > inside the prologue or epilogue and on ARM based on the ABI every call 
> > instruction have to have the same unwind info. Other possible alternative 
> > (or if we don't have a call instruction) is to use the unwind info line 
> > with the information about the highest number of registers. If multiple 
> > lines have the same number of information then either use the earliest one 
> > or the one with the fewest registers being set to IsSame to avoid picking 
> > something from an epilogue.
> >
> > I don't think any of my suggestions are really good but I don't have any 
> > better idea at the moment.
> >
> > Tamas
> >
> > On Sat, Nov 5, 2016 at 3:01 AM Jason Molenda <jmole...@apple.com> wrote:
> > Hi Tamas & Pavel, I thought you might have some ideas so I wanted to show a 
> > problem I'm looking at right now.  The arm64 instruction unwinder forwards 
> > the unwind state based on branch instructions within the function.  So if 
> > one block of code ends in an epilogue, the next instruction (which is 
> > presumably a branch target) will have the correct original unwind state.  
> > This change went in to UnwindAssemblyInstEmulation.cpp  mid-2015 in r240533 
> > - the code it replaced was poorly written, we're better off with this 
> > approach.
> >
> > However I'm looking at a problem where clang will come up with a branch 
> > table for a bunch of case statements.  e.g. this function:
> >
> >     0x100007df0 <+0>:   stp    x22, x21, [sp, #-0x30]!
> >     0x100007df4 <+4>:   stp    x20, x19, [sp, #0x10]
> >     0x100007df8 <+8>:   stp    x29, x30, [sp, #0x20]
> >     0x100007dfc <+12>:  add    x29, sp, #0x20            ; =0x20
> >     0x100007e00 <+16>:  sub    sp, sp, #0x10             ; =0x10
> >     0x100007e04 <+20>:  mov    x19, x1
> >     0x100007e08 <+24>:  mov    x20, x0
> >     0x100007e0c <+28>:  add    w21, w20, w20, lsl #2
> >     0x100007e10 <+32>:  bl     0x100007f58               ; symbol stub for: 
> > getpid
> >     0x100007e14 <+36>:  add    w0, w0, w21
> >     0x100007e18 <+40>:  mov    w8, w20
> >     0x100007e1c <+44>:  cmp    w20, #0x1d                ; =0x1d
> >     0x100007e20 <+48>:  b.hi   0x100007e4c               ; <+92> at a.c:112
> >     0x100007e24 <+52>:  adr    x9, #0x90                 ; switcher + 196
> >     0x100007e28 <+56>:  nop
> >     0x100007e2c <+60>:  ldrsw  x8, [x9, x8, lsl #2]
> >     0x100007e30 <+64>:  add    x8, x8, x9
> >     0x100007e34 <+68>:  br     x8
> >     0x100007e38 <+72>:  sub    sp, x29, #0x20            ; =0x20
> >     0x100007e3c <+76>:  ldp    x29, x30, [sp, #0x20]
> >     0x100007e40 <+80>:  ldp    x20, x19, [sp, #0x10]
> >     0x100007e44 <+84>:  ldp    x22, x21, [sp], #0x30
> >     0x100007e48 <+88>:  ret
> >     0x100007e4c <+92>:  add    w0, w0, #0x1              ; =0x1
> >     0x100007e50 <+96>:  b      0x100007e38               ; <+72> at a.c:115
> >     0x100007e54 <+100>: orr    w8, wzr, #0x7
> >     0x100007e58 <+104>: str    x8, [sp, #0x8]
> >     0x100007e5c <+108>: sxtw   x8, w19
> >     0x100007e60 <+112>: str    x8, [sp]
> >     0x100007e64 <+116>: adr    x0, #0x148                ; "%c %d\n"
> >     0x100007e68 <+120>: nop
> >     0x100007e6c <+124>: bl     0x100007f64               ; symbol stub for: 
> > printf
> >     0x100007e70 <+128>: sub    sp, x29, #0x20            ; =0x20
> >     0x100007e74 <+132>: ldp    x29, x30, [sp, #0x20]
> >     0x100007e78 <+136>: ldp    x20, x19, [sp, #0x10]
> >     0x100007e7c <+140>: ldp    x22, x21, [sp], #0x30
> >     0x100007e80 <+144>: b      0x100007f38               ; f3 at b.c:4
> >     0x100007e84 <+148>: sxtw   x8, w19
> >     0x100007e88 <+152>: str    x8, [sp]
> >     0x100007e8c <+156>: adr    x0, #0x127                ; "%c\n"
> >     0x100007e90 <+160>: nop
> >     0x100007e94 <+164>: bl     0x100007f64               ; symbol stub for: 
> > printf
> >     0x100007e98 <+168>: bl     0x100007f40               ; f4 at b.c:7
> >     0x100007e9c <+172>: sxtw   x8, w19
> >     0x100007ea0 <+176>: str    x8, [sp]
> >     0x100007ea4 <+180>: adr    x0, #0x10f                ; "%c\n"
> >     0x100007ea8 <+184>: nop
> >     0x100007eac <+188>: bl     0x100007f64               ; symbol stub for: 
> > printf
> >     0x100007eb0 <+192>: bl     0x100007f4c               ; symbol stub for: 
> > abort
> >
> >
> > It loads data from the jump table and branches to the correct block in the 
> > +52 .. +68 instructions.  We have epilogues at 88, 144, and 192.  And we 
> > get an unwind plan like
> >
> > row[0]:    0: CFA=sp +0 =>
> > row[1]:    4: CFA=sp+48 => x21=[CFA-40] x22=[CFA-48]
> > row[2]:    8: CFA=sp+48 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] 
> > x22=[CFA-48]
> > row[3]:   12: CFA=sp+48 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] 
> > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8]
> > row[4]:   20: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] 
> > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8]
> > row[5]:   80: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] 
> > x22=[CFA-48] fp= <same> lr= <same>
> > row[6]:   84: CFA=sp+64 => x19= <same> x20= <same> x21=[CFA-40] 
> > x22=[CFA-48] fp= <same> lr= <same>
> > row[7]:   88: CFA=sp +0 => x19= <same> x20= <same> x21= <same> x22= <same> 
> > fp= <same> lr= <same>
> > row[8]:   92: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] 
> > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8]
> > row[9]:  108: CFA=sp+64 => x8=[CFA-56] x19=[CFA-24] x20=[CFA-32] 
> > x21=[CFA-40] x22=[CFA-48] fp=[CFA-16] lr=[CFA-8]
> > row[10]:  136: CFA=sp+64 => x8=[CFA-56] x19=[CFA-24] x20=[CFA-32] 
> > x21=[CFA-40] x22=[CFA-48] fp= <same> lr= <same>
> > row[11]:  140: CFA=sp+64 => x8=[CFA-56] x19= <same> x20= <same> 
> > x21=[CFA-40] x22=[CFA-48] fp= <same> lr= <same>
> > row[12]:  144: CFA=sp +0 => x8=[CFA-56] x19= <same> x20= <same> x21= <same> 
> > x22= <same> fp= <same> lr= <same>
> >
> > where we have no unwind state for the range 148..192 (I complicated it a 
> > little by calling a noreturn function that ended up being the last one -- 
> > that's why it doesn't do an epilogue sequence at the very end of the 
> > function).
> >
> >
> > I'm not sure how we should address this one - our branch-target approach 
> > can't do the right thing here, there is no indication (for lldb) of the 
> > branch from instruction +68 to +148.  Should we recognize "ret" and "b" 
> > (with a range outside the bounds of the current function) as epilogues and 
> > try to find something that looks like valid unwind state from earlier in 
> > the function body?
> >
> >
> > What actually happens right now (if lldb is stopped in the range 148 - 192) 
> > is that we find a LR with a save status of IsSame which RegisterContextLLDB 
> > rejects as impossible (yes, this IS actually possible - if we're stopped in 
> > the range of +80 - +88 then the return address is in the lr and the ret 
> > instruction will use it when we get there - but let's ignore that separate 
> > bug for now) is that we reject the unwind plan because of the lr being 
> > IsSame and we fall back to the ABI's architectural default unwind plan 
> > which does work in this case, although we won't know about the x19 & x20 
> > register spills to the stack.
> >
> > This is a synthetic example, of course, the real place where I hit this 
> > issue is in some apple internal code, but it's the same instruction pattern.
> >
> > fwiw I reproduced it with these two source files linked together with clang 
> > and -Os optimization; gcc is unlikely to generate the same code.  It's 
> > probably possible to reduce the test case further but I'll add a unit test 
> > of the raw instructions once we figure out how to handle this.
> >
> >
> >
> >
> >
> 

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Reply via email to