On 12/2/2023 1:26 AM, Ard Biesheuvel wrote:
On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.c...@broadcom.com> wrote:



On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.c...@broadcom.com> wrote:



On 12/1/23 10:07 AM, Steven Rostedt wrote:
On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen <justin.c...@broadcom.com> wrote:

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.

Thanks for the assistance. The gap between the stack frame depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.

That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

     71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve

I am not familiar enough to make a call. But from my limited testing
with ARM, I didn't see any issues. If you would like me to, I can submit
a patch to remove the check entirely. Or maybe only disable it for ARM?


Please try the fix I proposed first.

Just tested it. Seems to do the trick.

Thanks

Either solution works for me.


Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.


Fair enough. I will submit a patch. Thanks for the help.

FWIW I also experimented with LLVM, looks like function_graph just
crashes regardless of the issue being discussed. The disassemble of
LLVM[1] does something completely different.



LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.

In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.

Could you be more specific about the crash?


It just hangs with no prints. I can instrument the hang and see what I can find.

Justin



[1]
LLVM dump
c0c6faa0 <sk_getsockopt>:
c0c6faa0: f0 4f 2d e9   push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
c0c6faa4: 1c b0 8d e2   add     r11, sp, #28
c0c6faa8: ac d0 4d e2   sub     sp, sp, #172
c0c6faac: 00 70 a0 e1   mov     r7, r0
c0c6fab0: c8 0c 04 e3   movw    r0, #19656
c0c6fab4: 80 02 4c e3   movt    r0, #49792
c0c6fab8: 03 50 a0 e1   mov     r5, r3
c0c6fabc: 00 00 90 e5   ldr     r0, [r0]
c0c6fac0: 02 a0 a0 e1   mov     r10, r2
c0c6fac4: 20 00 0b e5   str     r0, [r11, #-32]
c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
c0c6facc: 4b 8b d6 eb   bl      0xc0212800 <__gnu_mcount_nc> @ imm =
#-10867412

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to