[PATCH] gcc mcount-nofp was Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-20 Thread Andi Kleen
Steven Rostedt rost...@goodmis.org writes:

 And frame pointers do add a little overhead as well. Too bad the mcount
 ABI wasn't something like this:


   function:
   callmcount
   [...]

 This way, the function address for mcount would have been (%esp) and the
 parent address would be 4(%esp). Mcount would work without frame
 pointers and this whole mess would also become moot.

I did a patch to do this in x86 gcc some time ago. The motivation
was indeed the frame pointer overhead on Atom with tracing.

Unfortunately it also requires glibc changes (I did those too). For
compatibility and catching mistakes the new function was called
__mcount_nofp.

I haven't tried it with current gcc and last time I missed the 
gcc feature merge window with this.

But perhaps you find it useful. Of course it would need more
kernel changes to probe for the new option and handle it.

Here's the old patch. I haven't retested it with a current
gcc version, but I think it still applies at least.

If there's interest I can polish it up and submit formally.

-Andi


Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 149140)
+++ gcc/doc/tm.texi (working copy)
@@ -1884,6 +1884,12 @@
 of words in each data entry.
 @end defmac
 
+...@defmac TARGET_FUNCTION_PROFILE
+Define if the target has a custom function_profiler function.
+The target should not set this macro, it is implicitely set from 
+the PROFILE_BEFORE_PROLOGUE macro.
+...@end defmac
+
 @node Registers
 @section Register Usage
 @cindex register usage
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 149140)
+++ gcc/doc/invoke.texi (working copy)
@@ -593,7 +593,7 @@
 -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
 -mcmod...@var{code-model} -ma...@var{name} @gol
 -m32  -m64 -mlarge-data-thresho...@var{num} @gol
--mfused-madd -mno-fused-madd -msse2avx}
+-mfused-madd -mno-fused-madd -msse2avx -mmcount-nofp}
 
 @emph{IA-64 Options}
 @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
@@ -11749,6 +11749,11 @@
 @opindex msse2avx
 Specify that the assembler should encode SSE instructions with VEX
 prefix.  The option @option{-mavx} turns this on by default.
+
+...@item -mmcount-nofp
+Don't force the frame counter with @option{-pg} function profiling.
+Instead call a new @code{__mcount_nofp} function before a stack 
+frame is set up.
 @end table
 
 These @samp{-m} switches are supported in addition to the above
Index: gcc/target.h
===
--- gcc/target.h(revision 149140)
+++ gcc/target.h(working copy)
@@ -1132,6 +1132,9 @@
*/
   bool arm_eabi_unwinder;
 
+  /* True when the function profiler code is outputted before the prologue. */
+  bool profile_before_prologue;
+
   /* Leave the boolean fields at the end.  */
 };
 
Index: gcc/final.c
===
--- gcc/final.c (revision 149140)
+++ gcc/final.c (working copy)
@@ -1520,10 +1520,8 @@
 
   /* The Sun386i and perhaps other machines don't work right
  if the profiling code comes after the prologue.  */
-#ifdef PROFILE_BEFORE_PROLOGUE
-  if (crtl-profile)
+  if (targetm.profile_before_prologue  crtl-profile)
 profile_function (file);
-#endif /* PROFILE_BEFORE_PROLOGUE */
 
 #if defined (DWARF2_UNWIND_INFO)  defined (HAVE_prologue)
   if (dwarf2out_do_frame ())
@@ -1565,10 +1563,8 @@
 static void
 profile_after_prologue (FILE *file ATTRIBUTE_UNUSED)
 {
-#ifndef PROFILE_BEFORE_PROLOGUE
-  if (crtl-profile)
+  if (!targetm.profile_before_prologue  crtl-profile)
 profile_function (file);
-#endif /* not PROFILE_BEFORE_PROLOGUE */
 }
 
 static void
Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 149140)
+++ gcc/gcc.c   (working copy)
@@ -797,6 +797,12 @@
 # define SYSROOT_HEADERS_SUFFIX_SPEC 
 #endif
 
+/* Target can override this to allow -pg/-fomit-frame-pointer together */
+#ifndef TARGET_PG_OPTION_SPEC
+#define TARGET_PG_OPTION_SPEC \
+%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}}
+#endif
+
 static const char *asm_debug;
 static const char *cpp_spec = CPP_SPEC;
 static const char *cc1_spec = CC1_SPEC;
@@ -866,8 +872,8 @@
 
 /* NB: This is shared amongst all front-ends, except for Ada.  */
 static const char *cc1_options =
-%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}}\
- %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*}\
+ TARGET_PG_OPTION_SPEC
+ %1 %{!Q:-quiet} -dumpbase %B %{d*} %{m*} %{a*}\
  %{fcompare-debug-second:%:compare-debug-auxbase-opt(%b)} \
  %{!fcompare-debug-second:%{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase 
%b}}}%{!c:%{!S:-auxbase %b}} \
  %{g*} %{O*} %{W*pedantic*} %{w} %{std*ansitrigraphs}\
Index: gcc/target-def.h

Re: [PATCH] gcc mcount-nofp was Re: BUG: GCC-4.4.x changes the function frame on some functions

2009-11-20 Thread Steven Rostedt
On Fri, 2009-11-20 at 10:57 +0100, Andi Kleen wrote:
 Steven Rostedt rost...@goodmis.org writes:
 
  And frame pointers do add a little overhead as well. Too bad the mcount
  ABI wasn't something like this:
 
 
  function:
  callmcount
  [...]
 
  This way, the function address for mcount would have been (%esp) and the
  parent address would be 4(%esp). Mcount would work without frame
  pointers and this whole mess would also become moot.
 
 I did a patch to do this in x86 gcc some time ago. The motivation
 was indeed the frame pointer overhead on Atom with tracing.
 

Yes, I remember you talking about this but I don't remember how far it
went.



 Unfortunately it also requires glibc changes (I did those too). For
 compatibility and catching mistakes the new function was called
 __mcount_nofp.

Actually, could you change the name? I really hate the mcount name, it
is legacy and with a new feature, it should be abandoned. Something like
__fentry__ would be nicer.

 
 I haven't tried it with current gcc and last time I missed the 
 gcc feature merge window with this.
 
 But perhaps you find it useful. Of course it would need more
 kernel changes to probe for the new option and handle it.
 
 Here's the old patch. I haven't retested it with a current
 gcc version, but I think it still applies at least.
 
 If there's interest I can polish it up and submit formally.

I would definitely be interested, and I would also be willing to test
it.

Thanks!

-- Steve