I like that idea. A bunch of other work just landed on my desk so it might be a bit before I do it, but I'll see how that patch looks.
> On Nov 9, 2016, at 3:54 AM, Tamas Berghammer <tbergham...@google.com> wrote: > > Based on your comments I have one more idea for a good heuristic. What if we > detect a dynamic branch (e.g. "br <Xn>", "tbb ...", etc...) and store the > register state for that place. Then when we find a block with no unwind info > for the first instruction then we use the one we saved for the dynamic branch > (as we know that the only way that block can be reached is through a dynamic > branch). If there is exactly 1 dynamic branch in the code then this should > gave us the "perfect" result while if we have multiple dynamic branches then > we will pick one "randomly" but for compiler generated code I think it will > be good enough. The only tricky case is if we fail to detect the dynamic > branch but that should be easy to fix as we already track every branch on ARM > (for single stepping) and doing it on AArch64 should be easy as well. > > On Tue, Nov 8, 2016 at 11:10 PM Jason Molenda <jmole...@apple.com> wrote: > Yeah I was thinking that maybe if we spot an epilogue instruction (ret, b > <some-other-function>), and the next instruction doesn't have a reinstated > register context, we could backtrack to the initial register context of this > block of instructions (and if it's not the beginning of the function), > re-instate that register context for the next instruction. > > It doesn't help if we have a dynamic dispatch after the initial part of the > function. For that, we'd need to do something like your suggestion of > finding the biggest collection of register saves. > > e.g. if I rearrange/modify my example function a little to make it more > interesting (I didn't fix up the +offsets) > > prologue: > > 0x100007df0 <+0>: stp x22, x21, [sp, #-0x30]! > > 0x100007df4 <+4>: stp x20, x19, [sp, #0x10] > > 0x100007df8 <+8>: stp x29, x30, [sp, #0x20] > > 0x100007dfc <+12>: add x29, sp, #0x20 ; =0x20 > > direct branch: > > 0x100007e1c <+44>: cmp w20, #0x1d ; =0x1d > > 0x100007e20 <+48>: b.hi 0x100007e4c ; <+92> { block > > #3 } > > dynamic dispatch: > > 0x100007e24 <+52>: adr x9, #0x90 ; switcher + 196 > > 0x100007e28 <+56>: nop > > 0x100007e2c <+60>: ldrsw x8, [x9, x8, lsl #2] > > 0x100007e30 <+64>: add x8, x8, x9 > > 0x100007e34 <+68>: br x8 > > block #1 > > 0x100007e9c <+172>: sxtw x8, w19 > > 0x100007ea0 <+176>: str x8, [sp] > > 0x100007ea4 <+180>: adr x0, #0x10f ; "%c\n" > > 0x100007ea8 <+184>: nop > > 0x100007eac <+188>: bl 0x100007f64 ; symbol stub for: > > printf > > 0x100007e70 <+128>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e74 <+132>: ldp x29, x30, [sp, #0x20] > > 0x100007e78 <+136>: ldp x20, x19, [sp, #0x10] > > 0x100007e7c <+140>: ldp x22, x21, [sp], #0x30 > > 0x100007eb0 <+192>: b 0x100007f4c ; symbol stub for: > > abort > > block #2 > > 0x100007e38 <+72>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e3c <+76>: ldp x29, x30, [sp, #0x20] > > 0x100007e40 <+80>: ldp x20, x19, [sp, #0x10] > > 0x100007e44 <+84>: ldp x22, x21, [sp], #0x30 > > 0x100007e48 <+88>: ret > > > block #3 > > 0x100007e4c <+92>: add w0, w0, #0x1 ; =0x1 > > 0x100007e50 <+96>: b 0x100007e38 ; <+72> at a.c:115 > > 0x100007e54 <+100>: orr w8, wzr, #0x7 > > 0x100007e58 <+104>: str x8, [sp, #0x8] > > 0x100007e5c <+108>: sxtw x8, w19 > > 0x100007e60 <+112>: str x8, [sp] > > 0x100007e64 <+116>: adr x0, #0x148 ; "%c %d\n" > > 0x100007e68 <+120>: nop > > 0x100007e6c <+124>: bl 0x100007f64 ; symbol stub for: > > printf > > 0x100007e70 <+128>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e74 <+132>: ldp x29, x30, [sp, #0x20] > > 0x100007e78 <+136>: ldp x20, x19, [sp, #0x10] > > 0x100007e7c <+140>: ldp x22, x21, [sp], #0x30 > > 0x100007e80 <+144>: b 0x100007f38 ; f3 at b.c:4 > > block #4 > > 0x100007e38 <+72>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e3c <+76>: ldp x29, x30, [sp, #0x20] > > 0x100007e40 <+80>: ldp x20, x19, [sp, #0x10] > > 0x100007e44 <+84>: ldp x22, x21, [sp], #0x30 > > 0x100007e48 <+88>: ret > > First, an easy one: When we get to the first instruction of 'block #4', > we've seen a complete epilogue ending in 'B other-function' and the first > instruction of block #4 is not branched to. If we find the previous direct > branch target -- to the first instruction of 'block #3' was conditionally > branched to, we reuse that register context for block #4. This could easily > go wrong for hand-written assembly where you might undo the stack state > part-way and then branch to another part of the function. But I doubt > compiler generated code is ever going to do that. > > Second, a trickier one: When we get to the first instruction of 'block #2', > we have no previous branch target register context to re-instate. We could > look further into the function (to block target #3 again) and reuse that > register state, the assumption being that a function has one prologue that > sets up the complete register state and then doesn't change anything outside > mid-function epilogues. I'm not opposed to that idea. The other way would > be to look backwards in the instruction stream for the row with the most > registers saved, as you suggested, maybe reusing the earliest one if there > are multiple entries with the same # of registers (this would need to ignore > IsSame registers). > > > Let me see if I can code something along these lines and we can look at how > that turns out. > > > > > On Nov 7, 2016, at 8:10 AM, Tamas Berghammer <tbergham...@google.com> wrote: > > > > Hi Jason, > > > > I thought about this situation when implemented the original branch > > following code and haven't been able to come up with a really good solution. > > > > My only idea is the same what you mentioned. We should try to recognize all > > unconditional branches and returns (but not calls) and then if the > > following instruction don't have any unwind information yet (e.g. haven't > > been a branch target so far) then we try to find some reasonable unwind > > info from the previous lines. > > > > The difficult question is how to find the correct information. One possible > > heuristic I have in mind is to try to find any call instruction inside the > > function before the current PC and use the unwind info from there. The > > reason I like this heuristic because there won't be a call instruction > > inside the prologue or epilogue and on ARM based on the ABI every call > > instruction have to have the same unwind info. Other possible alternative > > (or if we don't have a call instruction) is to use the unwind info line > > with the information about the highest number of registers. If multiple > > lines have the same number of information then either use the earliest one > > or the one with the fewest registers being set to IsSame to avoid picking > > something from an epilogue. > > > > I don't think any of my suggestions are really good but I don't have any > > better idea at the moment. > > > > Tamas > > > > On Sat, Nov 5, 2016 at 3:01 AM Jason Molenda <jmole...@apple.com> wrote: > > Hi Tamas & Pavel, I thought you might have some ideas so I wanted to show a > > problem I'm looking at right now. The arm64 instruction unwinder forwards > > the unwind state based on branch instructions within the function. So if > > one block of code ends in an epilogue, the next instruction (which is > > presumably a branch target) will have the correct original unwind state. > > This change went in to UnwindAssemblyInstEmulation.cpp mid-2015 in r240533 > > - the code it replaced was poorly written, we're better off with this > > approach. > > > > However I'm looking at a problem where clang will come up with a branch > > table for a bunch of case statements. e.g. this function: > > > > 0x100007df0 <+0>: stp x22, x21, [sp, #-0x30]! > > 0x100007df4 <+4>: stp x20, x19, [sp, #0x10] > > 0x100007df8 <+8>: stp x29, x30, [sp, #0x20] > > 0x100007dfc <+12>: add x29, sp, #0x20 ; =0x20 > > 0x100007e00 <+16>: sub sp, sp, #0x10 ; =0x10 > > 0x100007e04 <+20>: mov x19, x1 > > 0x100007e08 <+24>: mov x20, x0 > > 0x100007e0c <+28>: add w21, w20, w20, lsl #2 > > 0x100007e10 <+32>: bl 0x100007f58 ; symbol stub for: > > getpid > > 0x100007e14 <+36>: add w0, w0, w21 > > 0x100007e18 <+40>: mov w8, w20 > > 0x100007e1c <+44>: cmp w20, #0x1d ; =0x1d > > 0x100007e20 <+48>: b.hi 0x100007e4c ; <+92> at a.c:112 > > 0x100007e24 <+52>: adr x9, #0x90 ; switcher + 196 > > 0x100007e28 <+56>: nop > > 0x100007e2c <+60>: ldrsw x8, [x9, x8, lsl #2] > > 0x100007e30 <+64>: add x8, x8, x9 > > 0x100007e34 <+68>: br x8 > > 0x100007e38 <+72>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e3c <+76>: ldp x29, x30, [sp, #0x20] > > 0x100007e40 <+80>: ldp x20, x19, [sp, #0x10] > > 0x100007e44 <+84>: ldp x22, x21, [sp], #0x30 > > 0x100007e48 <+88>: ret > > 0x100007e4c <+92>: add w0, w0, #0x1 ; =0x1 > > 0x100007e50 <+96>: b 0x100007e38 ; <+72> at a.c:115 > > 0x100007e54 <+100>: orr w8, wzr, #0x7 > > 0x100007e58 <+104>: str x8, [sp, #0x8] > > 0x100007e5c <+108>: sxtw x8, w19 > > 0x100007e60 <+112>: str x8, [sp] > > 0x100007e64 <+116>: adr x0, #0x148 ; "%c %d\n" > > 0x100007e68 <+120>: nop > > 0x100007e6c <+124>: bl 0x100007f64 ; symbol stub for: > > printf > > 0x100007e70 <+128>: sub sp, x29, #0x20 ; =0x20 > > 0x100007e74 <+132>: ldp x29, x30, [sp, #0x20] > > 0x100007e78 <+136>: ldp x20, x19, [sp, #0x10] > > 0x100007e7c <+140>: ldp x22, x21, [sp], #0x30 > > 0x100007e80 <+144>: b 0x100007f38 ; f3 at b.c:4 > > 0x100007e84 <+148>: sxtw x8, w19 > > 0x100007e88 <+152>: str x8, [sp] > > 0x100007e8c <+156>: adr x0, #0x127 ; "%c\n" > > 0x100007e90 <+160>: nop > > 0x100007e94 <+164>: bl 0x100007f64 ; symbol stub for: > > printf > > 0x100007e98 <+168>: bl 0x100007f40 ; f4 at b.c:7 > > 0x100007e9c <+172>: sxtw x8, w19 > > 0x100007ea0 <+176>: str x8, [sp] > > 0x100007ea4 <+180>: adr x0, #0x10f ; "%c\n" > > 0x100007ea8 <+184>: nop > > 0x100007eac <+188>: bl 0x100007f64 ; symbol stub for: > > printf > > 0x100007eb0 <+192>: bl 0x100007f4c ; symbol stub for: > > abort > > > > > > It loads data from the jump table and branches to the correct block in the > > +52 .. +68 instructions. We have epilogues at 88, 144, and 192. And we > > get an unwind plan like > > > > row[0]: 0: CFA=sp +0 => > > row[1]: 4: CFA=sp+48 => x21=[CFA-40] x22=[CFA-48] > > row[2]: 8: CFA=sp+48 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] > > x22=[CFA-48] > > row[3]: 12: CFA=sp+48 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] > > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8] > > row[4]: 20: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] > > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8] > > row[5]: 80: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] > > x22=[CFA-48] fp= <same> lr= <same> > > row[6]: 84: CFA=sp+64 => x19= <same> x20= <same> x21=[CFA-40] > > x22=[CFA-48] fp= <same> lr= <same> > > row[7]: 88: CFA=sp +0 => x19= <same> x20= <same> x21= <same> x22= <same> > > fp= <same> lr= <same> > > row[8]: 92: CFA=sp+64 => x19=[CFA-24] x20=[CFA-32] x21=[CFA-40] > > x22=[CFA-48] fp=[CFA-16] lr=[CFA-8] > > row[9]: 108: CFA=sp+64 => x8=[CFA-56] x19=[CFA-24] x20=[CFA-32] > > x21=[CFA-40] x22=[CFA-48] fp=[CFA-16] lr=[CFA-8] > > row[10]: 136: CFA=sp+64 => x8=[CFA-56] x19=[CFA-24] x20=[CFA-32] > > x21=[CFA-40] x22=[CFA-48] fp= <same> lr= <same> > > row[11]: 140: CFA=sp+64 => x8=[CFA-56] x19= <same> x20= <same> > > x21=[CFA-40] x22=[CFA-48] fp= <same> lr= <same> > > row[12]: 144: CFA=sp +0 => x8=[CFA-56] x19= <same> x20= <same> x21= <same> > > x22= <same> fp= <same> lr= <same> > > > > where we have no unwind state for the range 148..192 (I complicated it a > > little by calling a noreturn function that ended up being the last one -- > > that's why it doesn't do an epilogue sequence at the very end of the > > function). > > > > > > I'm not sure how we should address this one - our branch-target approach > > can't do the right thing here, there is no indication (for lldb) of the > > branch from instruction +68 to +148. Should we recognize "ret" and "b" > > (with a range outside the bounds of the current function) as epilogues and > > try to find something that looks like valid unwind state from earlier in > > the function body? > > > > > > What actually happens right now (if lldb is stopped in the range 148 - 192) > > is that we find a LR with a save status of IsSame which RegisterContextLLDB > > rejects as impossible (yes, this IS actually possible - if we're stopped in > > the range of +80 - +88 then the return address is in the lr and the ret > > instruction will use it when we get there - but let's ignore that separate > > bug for now) is that we reject the unwind plan because of the lr being > > IsSame and we fall back to the ABI's architectural default unwind plan > > which does work in this case, although we won't know about the x19 & x20 > > register spills to the stack. > > > > This is a synthetic example, of course, the real place where I hit this > > issue is in some apple internal code, but it's the same instruction pattern. > > > > fwiw I reproduced it with these two source files linked together with clang > > and -Os optimization; gcc is unlikely to generate the same code. It's > > probably possible to reduce the test case further but I'll add a unit test > > of the raw instructions once we figure out how to handle this. > > > > > > > > > > > _______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev