* Steven Rostedt <rost...@goodmis.org> wrote:

> On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> > * Linus Torvalds <torva...@linux-foundation.org> wrote:
> >
> > > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > > about smaller and more efficient code, since the mcount call overhead
> > > tends to make the thing moot anyway, but it really looks like a
> > > win-win situation to just fix the mcount call sequence regardless.
> >
> > Just a sidenote: due to dyn-ftrace, which patches out all mcounts during
> > bootup to be NOPs (and opt-in patches them in again if someone runs the
> > function tracer), the cost is not as large as one would have it with say
> > -pg based user-space profiling.
> >
> > It's not completely zero-cost as the pure NOPs balloon the i$ footprint
> > a bit and GCC generates different code too in some cases. But it's
> > certainly good enough that it's generally pretty hard to prove overhead
> > via micro or macro benchmarks that the patched out mcounts call sites
> > are there.
>
> And frame pointers do add a little overhead as well. Too bad the mcount
> ABI wasn't something like this:
>
>
> <function>:
>         call mcount
>         [...]
>
> This way, the function address for mcount would have been (%esp) and
> the parent address would be 4(%esp). Mcount would work without frame
> pointers and this whole mess would also become moot.

In that case we could also fix up static callsites to this address as
well (to jump +5 bytes into the function) and avoid the NOP as well in
most cases. (That would in essence merge any slow-path function epilogue
with the mcount call instruction in terms of I$ footprint - i.e. it would
be an even lower overhead feature.)

If only the kernel had its own compiler.

	Ingo
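
For readers following the thread, here is a rough user-space sketch in C of the byte-level ideas discussed above. It is not kernel code and not how ftrace is implemented. It assumes a 32-bit x86 function whose very first instruction is a 5-byte `call rel32` (opcode 0xE8), as in the layout Steven proposes. All helper names (`starts_with_call`, `patch_out_call`, `entry_past_call`, `read_mcount_stack`) are hypothetical, and the NOP bytes are just one valid 5-byte encoding, not necessarily the one the kernel picks on a given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86 `call rel32` is one opcode byte (0xE8) plus a 4-byte displacement. */
enum { CALL_LEN = 5 };

/* One valid 5-byte x86 NOP encoding (assumption: the real choice is CPU-dependent). */
static const uint8_t nop5[CALL_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical helper: does this function body begin with `call rel32`? */
static int starts_with_call(const uint8_t *func)
{
    return func[0] == 0xE8;
}

/* Hypothetical helper: overwrite the leading call with a same-sized NOP,
 * so the following instructions keep their offsets. */
static void patch_out_call(uint8_t *func)
{
    assert(starts_with_call(func));
    memcpy(func, nop5, CALL_LEN);
}

/* Hypothetical helper: offset a statically known caller could target to
 * skip the leading call entirely (the "+5 bytes" mentioned above).
 * Returns 0 when there is no leading call to skip. */
static size_t entry_past_call(const uint8_t *func)
{
    return starts_with_call(func) ? CALL_LEN : 0;
}

/* What mcount could read from the stack under the proposed layout. */
struct mcount_view {
    uint32_t func_addr;   /* start of the traced function */
    uint32_t parent_addr; /* return address into the traced function's caller */
};

/* esp[0] models (%esp): the return address, which points just past the
 * leading call, so subtracting CALL_LEN recovers the function start.
 * esp[1] models 4(%esp): the parent's return address. No frame pointer
 * is consulted. */
static struct mcount_view read_mcount_stack(const uint32_t *esp)
{
    struct mcount_view v;
    v.func_addr = esp[0] - CALL_LEN;
    v.parent_addr = esp[1];
    return v;
}
```

With a buffer such as `{0xE8, 0x10, 0, 0, 0, 0x55, ...}`, `entry_past_call` returns 5 before patching. After `patch_out_call` the first five bytes are the NOP and the byte at offset 5 is untouched. The subtraction in `read_mcount_stack` only holds under the stated assumption that the call is the first instruction of the function.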