https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer

This document represents a proposed Change. As part of the Changes
process, proposals are publicly announced in order to receive
community feedback. This proposal will only be implemented if approved
by the Fedora Engineering Steering Committee.

== Summary ==

Fedora will add -fno-omit-frame-pointer to the default C/C++
compilation flags, which will improve the effectiveness of profiling
and debugging tools.

== Owner ==
* Name: [[User:daandemeyer| Daan De Meyer]], [[User:Dcavalca| Davide
Cavalca]], [[ Andrii Nakryiko]]
* Email: daandeme...@fb.com, dcava...@fb.com, andr...@fb.com


== Detailed Description ==

Credits to Mirek Klimos, whose internal note on stacktrace unwinding
formed the basis for this change proposal (myre...@gmail.com).

Any performance or efficiency work relies on accurate profiling data.
Sampling profilers probe the target program's call stack at regular
intervals and store the stack traces. If we collect enough of them, we
can closely approximate the real cost of a library or function with
minimal runtime overhead.

Stack trace capture what’s running on a thread. It should start with
clone - if the thread was created via clone syscall - or with _start -
if it’s the main thread of the process. The last function in the stack
trace is code that CPU is currently executing. If a stack starts with
[unknown] or any other symbol, it means it's not complete.

=== Unwinding ===

How does the profiler get the list of function names? There are two parts of it:

# Unwinding the stack - getting a list of virtual addresses pointing
to the executable code
# Symbolization - translating virtual addresses into human-readable
information, like function name, inlined functions at the address, or
file name and line number.

Unwinding is what we're interested in for the purpose of this
proposal. The important things are:

* Data on stack is split into frames, each frame belonging to one function.
* Right before each function call, the return address is put on the
stack. This is the instruction address in the caller to which we will
eventually return — and that's what we care about.
* One register, called the "frame pointer" or "base pointer" register
(RBP), is traditionally used to point to the beginning of the current
frame. Every function should back up RBP onto the stack and set it
properly at the very beginning.

The “frame pointer” part is achieved by adding push %rbp, mov
%rsp,%rbp to the beginning of every function and by adding pop %rbp
before returning. Using this knowledge, stack unwinding boils down to
traversing a linked list:

https://i.imgur.com/P6pFdPD.png

=== Where’s the catch? ===

The frame pointer register is not necessary to run a compiled binary.
It makes it easy to unwind the stack, and some debugging tools rely on
frame pointers, but the compiler knows how much data it put on the
stack, so it can generate code that doesn't need the RBP. Not using
the frame pointer register can make a program more efficient:

* We don’t need to back up the value of the register onto the stack,
which saves 3 instructions per function.
* We can treat the RBP as a general-purpose register and use it for
something else.

Whether the compiler sets frame pointer or not is controlled by the
-fomit-frame-pointer flag and the default is "omit", meaning we can’t
use this method of stack unwinding by default.

To make it possible to rely on the frame pointer being available,
we'll add -fno-omit-frame-pointer to the default C/C++ compilation
flags. This will instruct the compiler to make sure the frame pointer
is always available. This will in turn allow profiling tools to
provide accurate performance data which can drive performance
improvements in core libraries and executables.

== Feedback ==

=== Potential performance impact ===

* Meta builds all its libraries and executables with
-fno-omit-frame-pointer by default. Internal benchmarks did not show
significant impact on performance when omitting the frame pointer for
two of our most performance intensive applications.
* Firefox recently landed a change to preserve the frame pointer in
all jitted code
(https://bugzilla.mozilla.org/show_bug.cgi?id=1426134). No significant
decrease in performance was observed.
* Kernel 4.8 frame pointer benchmarks by Suse showed 5%-10%
regressions in some benchmarks
(https://lore.kernel.org/all/20170602104048.jkkzssljsompj...@suse.de/T/#u)

Should individual libraries or executables notice a significant
performance degradation caused by including the frame pointer
everywhere, these packages can opt-out on an individual basis as
described in 
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_compiler_flags.

=== Alternatives to frame pointers ===

There are a few alternative ways to unwind stacks instead of using the
frame pointer:

* [https://dwarfstd.org DWARF] data - The compiler can emit extra
information that allows us to find the beginning of the frame without
the frame pointer, which means we can walk the stack exactly as
before. The problem is that we need to unwind the stack in kernel
space which isn't implemented in the kernel. Given that the kernel
implemented it's own format (ORC) instead of using DWARF, it's
unlikely that we'll see a DWARF unwinder in the kernel any time soon.
The perf tool allows you to use the DWARF data with
--call-graph=dwarf, but this means that it copies the full stack on
every event and unwinds in user space. This has very high overhead.
* [https://www.kernel.org/doc/html/v5.3/x86/orc-unwinder.html ORC]
(undwarf) - problems with unwinding in kernel led to creation of
another format with the same purpose as DWARF, just much simpler. This
can only be used to unwind kernel stack traces; it doesn't help us
with userspace stacks. More information on ORC can be found
[https://lwn.net/Articles/728339 here].
* [https://lwn.net/Articles/680985 LBR] - New Intel CPUs have a
feature that gives you source and target addresses for the last 16 (or
32, in newer CPUs) branches with no overhead. It can be configured to
record only function calls and to be used as a stack, which means it
can be used to get the stack trace. Sadly, you only get the last X
calls, and not the full stack trace, so the data can be very
incomplete. On top of that, many Fedora users might still be using
CPUs without LBR support which means we wouldn't be able to assume
working profilers on a Fedora system by default.

To summarize, if we want complete stacks with reasonably low overhead
(which we do, there's no other way to get accurate profiling data from
running services), frame pointers are currently the best option.

== Benefit to Fedora ==

Implementing this change will provide profiling tools with easy access
to stacktraces of installed libraries and executables which will lead
to more accurate profiling data in general. This in turn can be used
to implement optimizations to core libraries and executables which
will improve the overall performance of Fedora itself and the wider
Linux ecosystem.

Various debugging tools can also make use of the frame pointer to
access the current stacktrace, although tools like gdb can already do
this to some degree via embedded dwarf debugging info.

== Scope ==
* Proposal owners: Put up a PR to change the rpm macros to build
packages by default with -fno-omit-frame-pointer by default.

* Other developers: Review and merge the PR implementing the Change.

* Release engineering: [https://pagure.io/releng/issues #Releng issue
number]. A mass rebuild is required.

* Policies and guidelines: N/A (not needed for this Change)

* Trademark approval: N/A (not needed for this Change)

* Alignment with Objectives: N/A

== Upgrade/compatibility impact ==

This should not impact upgrades in any way.

== How To Test ==

# Build the package with the updated rpm macros
# Profile the binary with `perf record -g <binary>`
# Inspect the perf data with `perf report -g 'graph,0.5,caller'`
# When expanding hot functions in the perf report, perf should show
the full call graph of the hot function (at least for all functions
that are part of the binary compiled with -fno-omit-frame-pointer)

== User Experience ==

Fedora users will be more likely to have a streamlined experience when
trying to debug/profile system executables/libraries. Tools such as
perf will work out of the box instead of requiring to users to provide
extra options (e.g. --call-graph=dwarf/LBR) or requiring users to
recompile all relevant packages with -fno-omit-frame-pointer.

== Dependencies ==

The rpm macros for Fedora need to be adjusted to include
-fno-omit-frame-pointer in the default C/C++ compilation flags.

== Contingency Plan ==

* Contingency mechanism: The new version can be released without every
package being rebuilt with fno-omit-frame-pointer. Profiling will only
work perfectly once all packages have been rebuilt but there will be
no regression in behavior if not all packages have been rebuilt by the
time of the release. If the Change is found to introduce unacceptable
regressions, the PR implementing it can be reverted and affected
packages can be rebuilt.
* Contingency deadline: Final freeze
* Blocks release? No

== Documentation ==

* Original proposal for in-kernel DWARF unwinder (rejected):
https://lkml.org/lkml/2017/5/5/571

== Release Notes ==

Packages are now compiled with frame pointers included by default.
This will enable a variety of profiling and debugging tools to show
more information out of the box.


-- 
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis
_______________________________________________
devel-announce mailing list -- devel-announce@lists.fedoraproject.org
To unsubscribe send an email to devel-announce-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel-announce@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure

Reply via email to