On 12/2/2023 1:26 AM, Ard Biesheuvel wrote:
On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.c...@broadcom.com> wrote:On 12/1/23 10:53 PM, Ard Biesheuvel wrote:On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.c...@broadcom.com> wrote:On 12/1/23 10:07 AM, Steven Rostedt wrote:On Fri, 1 Dec 2023 09:25:59 -0800 Justin Chen <justin.c...@broadcom.com> wrote:It appears the sub instruction at 0x6dd0 correctly accounts for the extra 8 bytes, so the frame pointer is valid. So it is our assumption that there are no gaps between the stack frames is invalid.Thanks for the assistance. The gap between the stack frame depends on the function. Most do not have a gap. Some have 8 (as shown above), some have 12. A single assumption here is not going to work. I'm having a hard time finding out the reasoning for this gap. I tried disabling a bunch of gcc flags as well as -O2 and the gap still exists.That code was originally added because of some strange things that gcc did with mcount (for example, it made a copy of the stack frame that it passed to mcount, where the function graph tracer replaced the copy of the return stack making the shadow stack go out of sync and crash). This was very hard to debug and I added this code to detect it if it happened again. Well it's been over a decade since that happened (2009). 71e308a239c09 ("function-graph: add stack frame test") I'm happy assuming that the compiler folks are aware of our tricks with hijacking return calls and I don't expect it to happen again. We can just rip out those checks. That is, if it's only causing false positives, I don't think it's worth keeping around. Has it detected any real issues on the Arm platforms? -- SteveI am not familiar enough to make a call. But from my limited testing with ARM, I didn't see any issues. If you would like me to, I can submit a patch to remove the check entirely. Or maybe only disable it for ARM?Please try the fix I proposed first.Just tested it. Seems to do the trick.ThanksEither solution works for me.Given that this discussion is taking place in the context of the report of an issue identified by GRAPH_FP_TEST, I don't see how removing that would be a reasonable conclusion.
Fair enough. I will submit a patch. Thanks for the help.
FWIW I also experimented with LLVM, looks like function_graph just crashes regardless of the issue being discussed. The disassemble of LLVM[1] does something completely different.LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that the prologue looks different is expected. In the case below, the FP {r11} is correctly made to point to a {r11, lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we rely on unwind information for ftrace in this case, not the frame pointer. Could you be more specific about the crash?
It just hangs with no prints. I can instrument the hang and see what I can find.
Justin
[1] LLVM dump c0c6faa0 <sk_getsockopt>: c0c6faa0: f0 4f 2d e9 push {r4, r5, r6, r7, r8, r9, r10, r11, lr} c0c6faa4: 1c b0 8d e2 add r11, sp, #28 c0c6faa8: ac d0 4d e2 sub sp, sp, #172 c0c6faac: 00 70 a0 e1 mov r7, r0 c0c6fab0: c8 0c 04 e3 movw r0, #19656 c0c6fab4: 80 02 4c e3 movt r0, #49792 c0c6fab8: 03 50 a0 e1 mov r5, r3 c0c6fabc: 00 00 90 e5 ldr r0, [r0] c0c6fac0: 02 a0 a0 e1 mov r10, r2 c0c6fac4: 20 00 0b e5 str r0, [r11, #-32] c0c6fac8: 00 40 2d e9 stmdb sp!, {lr} c0c6facc: 4b 8b d6 eb bl 0xc0212800 <__gnu_mcount_nc> @ imm = #-10867412
smime.p7s
Description: S/MIME Cryptographic Signature