And some more formatting issues. On 30/03/16 10:33, Ramana Radhakrishnan wrote: > > > On 24/03/16 21:02, Charles Baylis wrote: >> When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc >> function are not generated as long calls. >> >> This is the sequel to this patch >> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00881.html >> >> This patch fixes the following problems with the previous patch. >> . Nested functions now work (thanks to Richard E for spotting this) >> . Thumb-1 now works > >> >> This patch works by adding new patterns (one for ARM/Thumb-2 and one >> for Thumb-1) which are placed in the prologue as a placeholder for >> some RTL which describes the address. This is either a SYMBOL_REF for >> targets with MOVW/MOVT, or a literal pool reference for other targets. >> The implementation of ARM_FUNCTION_PROFILER is changed to search for >> this insn so that the the address of the __gnu_mcount_nc function can >> be loaded using an appropriate sequence for the target. > > I'm reasonably happy with the approach but there are nits. > >> >> I also tried generating the profiling call sequence directly in the >> prologue, but this requires some unpleasant hacks to prevent spurious >> register pushes from ASM_OUTPUT_REG_PUSH. >> >> Tested with no new regressions on arm-unknown-linux-gnueabihf on QEMU. >> The generated code sequences have been inspected for normal and nested >> functions on ARM v6, ARM v7, Thumb-1, and Thumb-2 targets. >> >> This does not fix a regression, so I don't expect to apply it for >> GCC6, is it OK for when stage 1 re-opens. > > > I'm not sure how much testing coverage you get by just running the testsuite. > Doing a profiled bootstrap with -mlong-calls and a regression test run for > arm and / or thumb2 would be a useful test to do additionally - Given that > this originally came from kernel builds with allyesconfig how important is > this to be fixed for GCC 6 ? I'd rather take the fix into GCC 6 to get the > kernel building again but that's something we can discuss with the RM's once > the issues with the patch are fixed. >> >> gcc/ChangeLog: >> >> 2016-03-24 Charles Baylis <charles.bay...@linaro.org> >> >> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function. >> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function. >> (arm_expand_prologue): Likewise. >> (thumb1_expand_prologue): Likewise. >> (arm_output_long_call_to_profile_func): Likewise. >> (arm_emit_long_call_profile): Likewise. >> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment. >> * config/arm/arm.md (arm_long_call_profile): New pattern. >> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New >> define. >> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern. >> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE. >> >> gcc/testsuite/ChangeLog: >> >> 2016-03-24 Charles Baylis <charles.bay...@linaro.org> >> >> * gcc.target/arm/pr69770.c: New test. >> >> >> 0001-PR69770-mlong-calls-does-not-affect-calls-to-__gnu_m.patch >> >> >> From 5a39451f34be9b6ca98b3460bf40d879d6ee61a5 Mon Sep 17 00:00:00 2001 >> From: Charles Baylis <charles.bay...@linaro.org> >> Date: Thu, 24 Mar 2016 20:43:25 +0000 >> Subject: [PATCH] PR69770 -mlong-calls does not affect calls to >> __gnu_mcount_nc >> generated by -pg >> >> gcc/ChangeLog: >> >> 2016-03-24 Charles Baylis <charles.bay...@linaro.org> >> >> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function. >> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function. >> (arm_expand_prologue): Likewise. >> (thumb1_expand_prologue): Likewise. >> (arm_output_long_call_to_profile_func): Likewise. >> (arm_emit_long_call_profile): Likewise. >> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment. >> * config/arm/arm.md (arm_long_call_profile): New pattern. >> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New >> define. >> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern. >> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE. >> >> gcc/testsuite/ChangeLog: >> >> 2016-03-24 Charles Baylis <charles.bay...@linaro.org> >> >> * gcc.target/arm/pr69770.c: New test. >> >> Change-Id: I9b8de01fea083f17f729c3801f83174bedb3b0c6 >> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >> index 0083673..324c9f4 100644 >> --- a/gcc/config/arm/arm-protos.h >> +++ b/gcc/config/arm/arm-protos.h >> @@ -343,6 +343,7 @@ extern void arm_register_target_pragmas (void); >> extern void arm_cpu_cpp_builtins (struct cpp_reader *); >> >> extern bool arm_is_constant_pool_ref (rtx); >> +void arm_emit_long_call_profile (); >> >> /* Flags used to identify the presence of processor capabilities. */ >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index c868490..040b255 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -21426,6 +21426,22 @@ output_probe_stack_range (rtx reg1, rtx reg2) >> return ""; >> } >> >> +static void >> +arm_emit_long_call_profile_insn () > > s/()/(void) > >> +{ >> + rtx sym_ref = gen_rtx_SYMBOL_REF (Pmode, "__gnu_mcount_nc"); >> + /* if movt/movw are not available, use a constant pool */ >> + if (!arm_arch_thumb2) > > Please don't do that - use TARGET_USE_MOVT && !target_word_relocations > please. And better still look at adding !target_word_relocations to the > define of TARGET_USE_MOVT. > >> + { >> + sym_ref = force_const_mem(Pmode, sym_ref); > > Space between force_const_mem and `(' > >> + } >> + rtvec vec = gen_rtvec (1, sym_ref); >> + rtx tmp = >> + gen_rtx_UNSPEC_VOLATILE (VOIDmode, vec, VUNSPEC_LONG_CALL_PROFILE); >> + emit_insn (tmp); >> +} >> + > >> + >> /* Generate the prologue instructions for entry into an ARM or Thumb-2 >> function. */ >> void >> @@ -21789,6 +21805,10 @@ arm_expand_prologue (void) >> arm_load_pic_register (mask); >> } >> >> + if (crtl->profile && TARGET_LONG_CALLS >> + && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS) >> + arm_emit_long_call_profile_insn (); >> + >> /* If we are profiling, make sure no instructions are scheduled before >> the call to mcount. Similarly if the user has requested no >> scheduling in the prolog. Similarly if we want non-call exceptions >> @@ -24985,6 +25005,10 @@ thumb1_expand_prologue (void) >> if (frame_pointer_needed) >> thumb_set_frame_pointer (offsets); >> >> + if (crtl->profile && TARGET_LONG_CALLS >> + && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS) >> + arm_emit_long_call_profile_insn (); >> + >> /* If we are profiling, make sure no instructions are scheduled before >> the call to mcount. Similarly if the user has requested no >> scheduling in the prolog. Similarly if we want non-call exceptions >> @@ -30289,4 +30313,70 @@ arm_sched_fusion_priority (rtx_insn *insn, int >> max_pri, >> return; >> } >> >> +static void >> +arm_output_long_call_to_profile_func (rtx * operands, bool push_scratch) >> +{ > > Missing comment above function.
Blank lines between if blocks please > >> + /* operands[0] is the address of the __gnu_mcount_nc function >> + operands[1] is the scratch register we use to load that address */ >> + if (push_scratch) >> + output_asm_insn ("push\t{%1}", operands); Here . >> + output_asm_insn ("push\t{lr}", operands); >> + if (GET_CODE (operands[0]) == SYMBOL_REF) >> + { >> + output_asm_insn ("movw\t%1, #:lower16:%c0", operands); >> + output_asm_insn ("movt\t%1, #:upper16:%c0", operands); >> + } >> + else >> + { >> + output_asm_insn ("ldr\t%1, %0", operands); >> + } Here. >> + if (!arm_arch5) >> + { >> + output_asm_insn ("mov\tlr, pc", operands); >> + output_asm_insn ("mov\tpc, %1", operands); >> + } >> + else >> + output_asm_insn ("blx\t%1", operands); Here. >> + if (push_scratch) >> + output_asm_insn ("pop\t{%1}", operands); Here and check the rest of the patch. >> +} >> + > > >> +void >> +arm_emit_long_call_profile() > > s/()/ (void) > > Missing comment. > >> +{ >> + rtx alcp = NULL; >> + rtx operands[2]; >> + bool push_scratch; >> + /* find the arm_long_call_profile */ >> + for (rtx_insn * insn = get_insns (); insn; insn = NEXT_INSN (insn)) >> + { >> + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END) >> + break; >> + if (INSN_CODE (insn) == CODE_FOR_arm_long_call_profile >> + || INSN_CODE (insn) == CODE_FOR_thumb1_long_call_profile) >> + { >> + alcp = PATTERN (insn); >> + break; >> + } >> + } >> + gcc_assert (alcp); >> + >> + operands[0] = XEXP (XEXP (alcp, 0), 0); >> + if (TARGET_32BIT) >> + { >> + operands[1] = gen_rtx_REG (SImode, IP_REGNUM); >> + push_scratch = false; >> + } >> + else >> + { >> + /* for nested functions, we can set push_scratch to false, since >> + final.c:profile_function.c and ASM_OUTPUT_REG_PUSH preserve it as >> + part of the sequence to preserve ip across the call to the >> + profiling function. */ >> + operands[1] = gen_rtx_REG (SImode, R0_REGNUM + 7); >> + push_scratch = !IS_NESTED (arm_current_func_type ()); >> + } >> + arm_output_long_call_to_profile_func (operands, push_scratch); >> +} >> + >> #include "gt-arm.h" >> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h >> index 6352140..44f1c64 100644 >> --- a/gcc/config/arm/arm.h >> +++ b/gcc/config/arm/arm.h >> @@ -2044,7 +2044,9 @@ extern int making_const_table; >> that ASM_OUTPUT_REG_PUSH will be matched with ASM_OUTPUT_REG_POP, and >> that r7 isn't used by the function profiler, so we can use it as a >> scratch reg. WARNING: This isn't safe in the general case! It may be >> - sensitive to future changes in final.c:profile_function. */ >> + sensitive to future changes in final.c:profile_function. This is also >> + relied on in arm_emit_long_call_profile() which assumes r7 can be >> + used as a scratch register to load the address of the function profiler. >> */ >> #define ASM_OUTPUT_REG_PUSH(STREAM, REGNO) \ >> do \ >> { \ >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index 47171b9..beb44d5 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -11424,6 +11424,15 @@ >> DONE; >> }) >> >> +(define_insn "arm_long_call_profile" >> + [(unspec_volatile [(match_operand:SI 0 "general_operand" "ji,m") >> + ] VUNSPEC_LONG_CALL_PROFILE)] >> + "TARGET_32BIT" >> + "%@ arm_long_call_profile" >> + [(set_attr "arm_pool_range" "*,4096") >> + (set_attr "arm_neg_pool_range" "*,4084")] >> +) > > Please set the length of this insn appropriately. IIUC this is the length of > the instructions from FUNCTION_PROFILER, though I suspect we could push all > of this out into the RTL stream. Underestimating the length of this insn can > cause issues with the minipool placement code later especially with -pg. > >> + >> ;; Vector bits common to IWMMXT and Neon >> (include "vec-common.md") >> ;; Load the Intel Wireless Multimedia Extension patterns >> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h >> index 5d6c4ed..7acf66b 100644 >> --- a/gcc/config/arm/bpabi.h >> +++ b/gcc/config/arm/bpabi.h >> @@ -174,11 +174,18 @@ >> >> #undef NO_PROFILE_COUNTERS >> #define NO_PROFILE_COUNTERS 1 >> +#undef ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS >> +#define ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS 1 > > Why do we need this ? > >> #undef ARM_FUNCTION_PROFILER >> #define ARM_FUNCTION_PROFILER(STREAM, LABELNO) \ >> { \ >> - fprintf (STREAM, "\tpush\t{lr}\n"); >> \ >> - fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n"); >> \ >> + if (TARGET_LONG_CALLS) \ >> + { >> \ >> + arm_emit_long_call_profile(); \ >> + } else { \ >> + fprintf (STREAM, "\tpush\t{lr}\n"); >> \ >> + fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n"); \ >> + } \ >> } >> > > While you are here delete the definition of FUNCTION_PROFILER that is > conditional on THUMB_FUNCTION_PROFILER, looks like that code is dead. > >> #undef SUBTARGET_FRAME_POINTER_REQUIRED >> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md >> index 072ed4d..9b1de5f 100644 >> --- a/gcc/config/arm/thumb1.md >> +++ b/gcc/config/arm/thumb1.md >> @@ -1798,7 +1798,7 @@ >> [(unspec_volatile [(match_operand:SI 0 "s_register_operand" "l")] >> VUNSPEC_EH_RETURN) >> (clobber (match_scratch:SI 1 "=&l"))] >> - "TARGET_THUMB1" >> + "TARGET_THUMB1 && 0" > > This means it wasn't tested for Thumb1 or you had to turn this off to get > proper results for Thumb1 ? > > Please retest. > >> "#" >> "&& reload_completed" >> [(const_int 0)] >> @@ -1809,4 +1809,13 @@ >> }" >> [(set_attr "type" "mov_reg")] >> ) >> + >> +(define_insn "thumb1_long_call_profile" >> + [(unspec_volatile [(match_operand:SI 0 "general_operand" "j,m") >> + ] VUNSPEC_LONG_CALL_PROFILE)] >> + "TARGET_THUMB1" >> + "%@ thumb1_long_call_profile" >> + [(set_attr "pool_range" "1018")] >> +) >> + >> > > Please set length accordingly as well as check on the pool_range here. > >> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md >> index 5744c62..d7ddc3a 100644 >> --- a/gcc/config/arm/unspecs.md >> +++ b/gcc/config/arm/unspecs.md >> @@ -148,6 +148,7 @@ >> VUNSPEC_GET_FPSCR ; Represent fetch of FPSCR content. >> VUNSPEC_SET_FPSCR ; Represent assign of FPSCR content. >> VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing. >> + VUNSPEC_LONG_CALL_PROFILE ; Represent a long call to profile function >> ]) >> >> ;; Enumerators for NEON unspecs. >> diff --git a/gcc/testsuite/gcc.target/arm/pr69770.c >> b/gcc/testsuite/gcc.target/arm/pr69770.c >> new file mode 100644 >> index 0000000..61e5c6d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/pr69770.c >> @@ -0,0 +1,9 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-pg -mlong-calls" } */ >> + >> +extern void g(void); >> + >> +int f() { g(); return 0; } >> + >> +/* { dg-final { scan-assembler-not "bl\[ \t\]+__gnu_mcount_nc" } } */ >> +/* { dg-final { scan-assembler "__gnu_mcount_nc" } } */ >> -- 1.9.1 >>