We currently use -mno-omit-leaf-frame-pointer on various architectures. I think we should remove it because it does not work as expected.
Obviously, this does not work for many glibc string functions because they use hand-written assembly that does not set up a frame pointer. The happy path of the malloc implementation in Fedora rawhide implementation looks like this: Dump of assembler code for function __GI___libc_malloc: <+0>: endbr64 # special case for zero size argument <+4>: test %rdi,%rdi <+7>: js 0x82af8 <__GI___libc_malloc+264> # compute true allocation size <+13>: lea 0x17(%rdi),%rdx <+17>: mov %rdx,%rax <+20>: and $0xfffffffffffffff0,%rax <+24>: cmp $0x1f,%rdx <+28>: mov $0x20,%edx <+33>: cmovbe %rdx,%rax # check if size argument is in tcache range <+37>: cmp 0x1647d4(%rip),%rax # 0x1e71f0 <mp_+112> <+44>: jae 0x82a80 <__GI___libc_malloc+144> # load tcache pointer <+46>: mov 0x16435b(%rip),%rdx # 0x1e6d80 <+53>: mov %fs:(%rdx),%rcx # check for tcache initialization <+57>: test %rcx,%rcx <+60>: je 0x82b10 <__GI___libc_malloc+288> # compute tcache bin <+66>: mov %rax,%rdx <+69>: shr $0x4,%rdx <+73>: lea -0x2(%rdx),%rsi # check for basic tcache range <+77>: cmp $0x3f,%rsi <+81>: ja 0x82a88 <__GI___libc_malloc+152> <+83>: add $0x10,%rdx # load tcache free list and check if it is empty <+87>: mov 0x8(%rcx,%rdx,8),%rax <+92>: test %rax,%rax <+95>: je 0x82a80 <__GI___libc_malloc+144> # allocation alignment check <+97>: test $0xf,%al <+99>: jne 0x82b40 <__GI___libc_malloc+336> # pointer decoding <+105>: mov %rax,%rdi <+108>: shr $0xc,%rdi <+112>: xor (%rax),%rdi # remove from tcache free list and record that there is more room in it <+115>: mov %rdi,0x8(%rcx,%rdx,8) <+120>: addw $0x1,(%rcx,%rsi,2) # prevent leakage of double-free token in user allocation <+125>: movq $0x0,0x8(%rax) <+133>: ret This is compiler-generated code. The -mno-omit-leaf-frame-pointer flag was used in the build flags as requested: gcc -m32 malloc.c -c -std=gnu11 -fgnu89-inline -O2 -g -Wall -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Wwrite-strings -Wundef -Wimplicit-fallthrough -Werror -fmerge-all-constants -frounding-math -fstack-protector-strong -fno-common -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wstrict-prototypes -Wold-style-definition -Wfree-labels -Wmissing-parameter-name -fmath-errno -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fPIC -DMORECORE_CLEARS=2 -ftls-model=initial-exec -I../include -I…/build-x86_64-redhat-linux-32/malloc -I…/build-x86_64-redhat-linux-32 -I../sysdeps/unix/sysv/linux/i386/i686 -I../sysdeps/i386/i686/nptl -I../sysdeps/unix/sysv/linux/i386 -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl -I../sysdeps/i386/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/i386 -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/i386/i686/fpu/multiarch -I../sysdeps/i386/i686/fpu -I../sysdeps/i386/i686/multiarch -I../sysdeps/i386/i686 -I../sysdeps/i386/fpu -I../sysdeps/x86/fpu -I../sysdeps/i386 -I../sysdeps/x86/include -I../sysdeps/x86 -I../sysdeps/wordsize-32 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/15/include -isystem /usr/include -D_LIBC_REENTRANT -include …/build-x86_64-redhat-linux-32/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h -DPIC -DSHARED -DUSE_TCACHE=1 -DTOP_NAMESPACE=glibc -o …/build-x86_64-redhat-linux-32/malloc/malloc.os -MD -MP -MF …/build-x86_64-redhat-linux-32/malloc/malloc.os.dt -MT …/build-x86_64-redhat-linux-32/malloc/malloc.os But clearly it has not the desired effect of setting up a frame pointer. The reason is what GCC calls shrink-wrapping. It's related to this old optimization: Minimizing register usage penalty at procedure calls <https://dl.acm.org/doi/abs/10.1145/53990.53999> Basically it pushes optional function prologue instructions (such as saving callee-saved registers, or setting up the frame pointer) to the code paths that actually need them. Given that most of the hot glibc functions do not use frame pointers (at least not on their happy paths), PLT stubs do not have them, and there is a race condition at the start of functions until the frame pointer is set up, tools need to be aware that the top-most interrupted frame (or the first frame after a signal frame) may use the caller's frame pointer. This does not prevent frame pointer based unwinding because in the impacted frames, the frame pointer register is not changed. Without countermeasures, the immediate caller's frame just vanishes from backtraces. Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags. Thoughts? Thanks, Florian -- _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue