https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120870
Steven Sun <StevenSun2021 at hotmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |StevenSun2021 at hotmail dot com
--- Comment #36 from Steven Sun <StevenSun2021 at hotmail dot com> ---
Thanks for reporting this on the CPython side. I am a CPython maintainer and
did some analysis of this bug. Here is what I found.
## Root Cause: DRAP Register Clobbered in `preserve_none` Functions
### The DRAP Mechanism
When a function needs stricter stack alignment than the ABI guarantees (e.g.,
128-bit/16-byte for AVX/AVX2 on znver2), GCC uses a **DRAP (Dynamic Realign
Argument Pointer)** register:
1. Prologue: choose a register, push it, compute its value = original RSP, then
align the stack.
2. Epilogue: use that register to restore RSP before returning:
`lea -offset(%drap_reg), %rsp`.
**GCC's implicit invariant:** The DRAP register value must survive unchanged
from prologue to epilogue.
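
To make the invariant concrete, here is a minimal sketch of the prologue/epilogue arithmetic as a toy model in Python. It is not GCC's actual code generation: the helper names `prologue` and `epilogue`, the frame size, and the choice `DRAP = entry RSP + 8` are assumptions made for illustration, picked so that an offset of `0x10` lands on the slot where the DRAP register was pushed.

```python
# Toy model of DRAP-based stack realignment; addresses are plain integers.
# Assumption (not from the report): DRAP = entry RSP + 8, so that
# `lea -0x10(%drap_reg), %rsp` lands on the slot where the register was pushed.

def prologue(entry_rsp, align=16, frame=0x28):
    """Return (drap, rsp) after a simplified realigning prologue."""
    rsp = entry_rsp - 8            # push %drap_reg (save the caller's value)
    drap = entry_rsp + 8           # remember where the original stack was
    rsp = (rsp - frame) & -align   # reserve the frame, then round down
    return drap, rsp

def epilogue(drap):
    """Return RSP at `ret`, recovered only from the DRAP register."""
    rsp = drap - 0x10              # lea -0x10(%drap_reg), %rsp
    rsp += 8                       # pop %drap_reg
    return rsp

entry = 0x7FFD00001238             # deliberately only 8-byte aligned
drap, rsp = prologue(entry)
assert rsp % 16 == 0               # the body runs on an aligned stack
assert epilogue(drap) == entry     # holds only while drap is unchanged
```

Because realignment rounds RSP down by an amount that is unknown at compile time, the epilogue cannot undo it with a constant `add`. The DRAP register is the only record of where the original stack was.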
### How `preserve_none` Breaks This Invariant
The `preserve_none` ABI marks all general-purpose registers (except RBP and
RSP) as caller-saved or parameter registers. GCC's register allocator is free
to use any of them for local temporaries without saving/restoring.
In this bug:
1. `find_drap_reg()` selects **RBX** as DRAP for a `preserve_none` function.
2. The function body (a large tail-call interpreter opcode handler) freely
modifies RBX.
3. The epilogue executes `lea -0x10(%rbx), %rsp` — but **RBX now holds a
different value** than what the prologue computed.
4. RSP becomes garbage. The corrupted stack pointer propagates through `jmp`
tail-call chains, eventually causing a segfault.
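
The four steps above can be replayed in a small simulation. This is only a sketch under stated assumptions: `run_handler` is a hypothetical name, the entry address and the `0xDEADBEEF` scratch value are made up, and the `DRAP = entry RSP + 8` convention is chosen to match the `-0x10` offset rather than taken from GCC's source.

```python
# Toy replay of the failing sequence; register values are plain integers.
# The scratch value and the entry address are invented for illustration.

def run_handler(entry_rsp, body_clobbers_rbx):
    """Return RSP as seen by the tail call after the epilogue has run."""
    regs = {"rsp": entry_rsp, "rbx": 0}

    # Prologue: save RBX, point it at the original stack, realign RSP.
    regs["rsp"] -= 8                            # push %rbx
    regs["rbx"] = entry_rsp + 8                 # DRAP (assumed: entry RSP + 8)
    regs["rsp"] = (regs["rsp"] - 0x28) & -16    # sub $0x28, %rsp; realign

    # Body: under preserve_none the allocator may reuse RBX as a temporary.
    if body_clobbers_rbx:
        regs["rbx"] = 0xDEADBEEF

    # Epilogue: RSP is rebuilt from RBX alone.
    regs["rsp"] = regs["rbx"] - 0x10            # lea -0x10(%rbx), %rsp
    regs["rsp"] += 8                            # pop %rbx (reload not modeled)
    return regs["rsp"]

entry = 0x7FFD00001238
print(hex(run_handler(entry, body_clobbers_rbx=False)))  # 0x7ffd00001238
print(hex(run_handler(entry, body_clobbers_rbx=True)))   # 0xdeadbee7
```

In the intact case the handler leaves with the same RSP it entered with. In the clobbered case RSP is derived entirely from the scratch value, so the next handler in the `jmp` chain inherits a stack pointer unrelated to the real stack.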
### Why `-march=znver2` Specifically Triggers It
`-march=znver2` sets `preferred_stack_boundary` to 128 bits (16 bytes), forcing
stack realignment for functions with outgoing stack arguments. Without this
flag, the default 64-bit stack boundary avoids DRAP entirely for most
functions.
That is also why `-march=x86-64-v3` and `-march=x86-64-v4` trigger it: both
enable AVX/AVX2 as well.
### Assembly Evidence
In the miscompiled function with `-march=znver2`:
```asm
push %rbx
sub $0x28, %rsp
...
# function body modifies RBX freely
...
lea -0x10(%rbx), %rsp # uses CORRUPTED RBX
pop %rbx # too late: RSP is already wrong
```
The `lea` uses RBX after it has been overwritten by the function body,
computing a garbage stack pointer.